DevX has an interesting article about batch processing that caught my eye for two reasons: it talks about doing it with J2EE and it discusses when asynchronous processing is a better solution. The article identifies batch jobs as those that possess the following characteristics:
- High volume, involving thousands, hundreds of thousands, or millions of data rows or transactions
- Computationally expensive, and you don't want this cost to be part of your on-line application
- Unable to be triggered by a particular user action because the data is incomplete or unstable; the data stabilises after the fact or when some other business process occurs
- Triggered by a high-level, overarching business or time-based event
The problem is that each of these effectively describes scenarios where you'd want to do asynchronous processing as well. How to choose?
Batch processing does the work off-line. The basic architecture uses a "client" to partition the job into parallel chunks and then fire off EJBs to process those chunks. Real-time asynchronous processing, on the other hand, is applicable when the processing must be performed immediately or when the results must eventually be communicated back to an on-line user. The obvious choice for asynchronous processing in J2EE is the message-driven bean.
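To make the partition-and-dispatch idea concrete, here is a minimal sketch in plain Java. It is not the article's code: a thread pool stands in for the EJB calls, and the names `BatchPartitioner`, `partition`, `processChunk`, and `runBatch` are hypothetical, chosen only for illustration. The per-chunk work is a trivial sum so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: a thread pool stands in for the EJB tier.
class BatchPartitioner {

    // Split the full data set into consecutive chunks of at most chunkSize rows.
    static <T> List<List<T>> partition(List<T> rows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, rows.size());
            chunks.add(new ArrayList<>(rows.subList(start, end)));
        }
        return chunks;
    }

    // Stand-in for the work one bean would do on its chunk (assumed: a simple sum).
    static long processChunk(List<Integer> chunk) {
        long sum = 0;
        for (int value : chunk) {
            sum += value;
        }
        return sum;
    }

    // The "client": partition the job, hand each chunk to a worker, combine the results.
    static long runBatch(List<Integer> rows, int chunkSize, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (List<Integer> chunk : partition(rows, chunkSize)) {
                tasks.add(() -> processChunk(chunk));
            }
            long total = 0;
            // invokeAll blocks until every chunk has been processed.
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            rows.add(i);
        }
        System.out.println(runBatch(rows, 100, 4));
    }
}
```

The chunk size is the main tuning knob in a design like this: larger chunks amortise dispatch overhead, while smaller chunks spread the load more evenly across workers. In a real J2EE deployment the container, not a hand-rolled thread pool, would manage the concurrency.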
Fine-grained, real-time timers and events are suitable for when you need to handle the issues in real-time. However, they can be expensive. You spend CPU cycles raising, maintaining, and checking events. Event/timer processes may contend with your on-line processes for resources and for access to the same data, leading to locking problems. Using many small events to process large volumes is not as efficient as processing large data sets in one go. It's hard to get efficiencies of scale.
Batch processes are data-centric and can efficiently crunch through large volumes off-line without affecting your on-line systems. Batch solutions are ideal for processing that is time or non-real-time event based and/or state based:
- Time-based: The business function executes on a periodic or recurring basis. The function is required to run only on a daily/weekly/monthly/yearly cycle or at a time denoted by the business.
- State-based: The business function operates over data that matches a given state where state can include simple, complex, and time-based criteria.
From High-volume Transaction Processing in J2EE
Referenced Wed Apr 28 2004 15:35:01 GMT-0600