Read this first – https://codinko.wordpress.com/2017/04/16/spring-batch-introduction/

Basics:

  • Spring Batch uses a ‘Chunk Oriented’ processing style within its most common implementation.
  • Chunk oriented processing refers to reading the data one item at a time and creating ‘chunks’ that are written out within a transaction boundary.
  • One item is read in from an ItemReader, handed to an ItemProcessor, and aggregated.
  • Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed.

https://docs.spring.io/spring-batch/docs/current/reference/html/images/chunk-oriented-processing.png

Below is a code representation of the same concepts shown above:

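List<Object> items = new ArrayList<>();
for (int i = 0; i < commitInterval; i++) {
    Object item = itemReader.read();   // read() returns null once the input is exhausted
    if (item == null) {
        break;
    }
    Object processedItem = itemProcessor.process(item);
    items.add(processedItem);
}
itemWriter.write(items);               // the whole chunk is written, then the transaction commits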
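
To see how the chunks actually come out, here is a minimal, self-contained sketch of the same loop. It does not use Spring Batch at all: the class name ChunkLoopSketch, the upper-casing "processor" and the in-memory list standing in for the reader are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a chunk-oriented step; none of these are Spring Batch classes.
class ChunkLoopSketch {

    // Reads from the source, upper-cases each item, and collects one chunk per "transaction".
    // Returns the chunks in the order they would have been handed to the writer.
    static List<List<String>> run(List<String> source, int commitInterval) {
        Iterator<String> reader = source.iterator();
        List<List<String>> writtenChunks = new ArrayList<>();
        boolean exhausted = false;

        while (!exhausted) {
            // A transaction would begin here.
            List<String> chunk = new ArrayList<>();
            for (int i = 0; i < commitInterval; i++) {
                if (!reader.hasNext()) {      // a real ItemReader signals this by returning null
                    exhausted = true;
                    break;
                }
                String item = reader.next();
                chunk.add(item.toUpperCase()); // the "process" step
            }
            if (!chunk.isEmpty()) {
                writtenChunks.add(chunk);      // one write per chunk, then the commit
            }
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        // Seven items with a commit interval of 3 give two full chunks and one partial chunk.
        System.out.println(run(Arrays.asList("a", "b", "c", "d", "e", "f", "g"), 3));
        // prints [[A, B, C], [D, E, F], [G]]
    }
}
```

Note that the last chunk is smaller than the commit interval: the final transaction simply commits whatever was read before the input ran out.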

References: 

  • http://docs.spring.io/spring-batch/reference/html/configureStep.html
  • https://docs.spring.io/spring-batch/docs/current/reference/html/step.html#chunkOrientedProcessing

Understanding the concepts

The commit-interval defines how many items are processed within a single chunk. That many items are read, processed, and then written within the scope of a single transaction.

commit-interval=10 means that 10 items are processed within each transaction.

The page-size attribute on the paging ItemReader implementations (JdbcPagingItemReader for example) defines how many records are fetched per read of the underlying resource. So in the JDBC example, it’s how many records are requested with a single hit to the DB.

There is no direct correlation between the two attributes; they are two independent knobs you can turn to tune the performance of your application. Even so, it’s typically considered a good idea to make them match.

If you set the page-size to the same value as the commit-interval, there is a single commit for each page.
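
As a rough way to compare the two knobs, the arithmetic can be sketched in plain Java. This is only a back-of-the-envelope model under my own assumptions: PageVsCommitSketch is a hypothetical name, and the counts cover only pages and chunks that actually contain data (a real reader or step may do one extra round trip to discover that the input is finished).

```java
// Hypothetical helper, not part of Spring Batch: estimates DB round trips and commits.
class PageVsCommitSketch {

    // Returns {pageFetches, commits} for the given row count.
    // Both are ceiling divisions, so a partial last page or chunk still counts as one.
    static int[] count(int totalRows, int pageSize, int commitInterval) {
        int pageFetches = (totalRows + pageSize - 1) / pageSize;
        int commits = (totalRows + commitInterval - 1) / commitInterval;
        return new int[] { pageFetches, commits };
    }

    public static void main(String[] args) {
        int[] mismatched = count(100, 25, 10);
        System.out.println(mismatched[0] + " page fetches, " + mismatched[1] + " commits");
        // prints 4 page fetches, 10 commits

        int[] matched = count(100, 10, 10);
        System.out.println(matched[0] + " page fetches, " + matched[1] + " commits");
        // prints 10 page fetches, 10 commits
    }
}
```

With page-size 25 and commit-interval 10, the chunks do not line up with the pages; with both set to 10, every page fetched corresponds to exactly one commit.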

Before learning more, read about these two important classes:

  1. JdbcPagingItemReader http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/database/JdbcPagingItemReader.html

Summary: ItemReader for reading database records using JDBC in a paging fashion.
The query is executed using paged requests of a size specified in AbstractPagingItemReader.setPageSize(int).

  2. PagingQueryProvider http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/database/PagingQueryProvider.html

Summary: Interface defining the functionality to be provided for generating paging queries for use with Paging Item Readers.
One of its methods is generateFirstPageQuery():
java.lang.String generateFirstPageQuery(int pageSize)
Generates the query that will provide the first page, limited by the page size.
Parameters: pageSize – number of rows to read for each page
Returns: the generated query
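
To get a feel for what "the first page, limited by the page size" might look like as SQL, here is a toy sketch. It is not the real PagingQueryProvider (which has more methods and database-specific implementations); the class FirstPageQuerySketch, its parameters, and the LIMIT syntax (MySQL/PostgreSQL style) are all assumptions made for illustration.

```java
// Hypothetical, simplified illustration; it does NOT implement Spring Batch's PagingQueryProvider.
class FirstPageQuerySketch {

    // Builds a first-page query: the rows are sorted by a key so that pages are stable,
    // and the result is capped at pageSize rows. LIMIT syntax is assumed here.
    static String generateFirstPageQuery(String selectClause, String fromClause,
                                         String sortKey, int pageSize) {
        return "SELECT " + selectClause
                + " FROM " + fromClause
                + " ORDER BY " + sortKey + " ASC"
                + " LIMIT " + pageSize;
    }

    public static void main(String[] args) {
        // Table and column names below are made up.
        System.out.println(generateFirstPageQuery("id, name", "customer", "id", 10));
        // prints SELECT id, name FROM customer ORDER BY id ASC LIMIT 10
    }
}
```

The sort key matters: without a stable ordering, the database could return overlapping or missing rows from one page to the next.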

For learning Spring Batch, the best tutorial is the Spring documentation itself!

https://docs.spring.io/spring-batch/reference/html/configureJob.html

https://docs.spring.io/spring-batch/docs/current/reference/html/job.html

Configuring a Step:

A Step is a domain object that encapsulates an independent, sequential phase of a batch job and contains all of the information necessary to define and control the actual batch processing.

The contents of any given Step are at the discretion of the developer writing a Job. A Step can be as simple or complex as the developer desires.

https://docs.spring.io/spring-batch/docs/current/reference/html/step.html#configureStep

https://docs.spring.io/spring-batch/docs/current/reference/html/images/step.png

 

The Commit Interval

A step reads in and writes out items, periodically committing using the supplied PlatformTransactionManager. With a commit-interval of 1, it will commit after writing each individual item. This is less than ideal in many situations, since beginning and committing a transaction is expensive. Ideally, it is preferable to process as many items as possible in each transaction, which is completely dependent upon the type of data being processed and the resources with which the step is interacting. For this reason, the number of items that are processed within a commit can be configured.

<job id="sampleJob">
    <step id="step1">
        <tasklet>
            <chunk reader="itemReader" writer="itemWriter" commit-interval="10"/>
        </tasklet>
    </step>
</job>

In the example above, 10 items will be processed within each transaction. At the beginning of processing, a transaction is begun, and each time read is called on the ItemReader, a counter is incremented. When it reaches 10, the list of aggregated items is passed to the ItemWriter, and the transaction is committed.

 

References:

http://docs.spring.io/spring-batch/reference/html/configureStep.html

http://docs.spring.io/spring-batch/reference/html/configureJob.html