In software development, we often need something that can:

  • Process a large amount of data
  • Process CSV files in batches
  • Send notifications every day after processing some data
  • Produce reports from various sources

Today we are going to discuss a framework that makes our lives as developers a lot easier. So let’s start our introduction to Spring Batch:

Overview: Spring Batch

Spring Batch is a framework for batch processing. Batch processing is what we use for long-running, heavy tasks that can run in the background.

So Spring Batch is:

  • A lightweight framework for bulk processing
  • Suited to processes that are frequently long-running
  • A robust way to do bulk-oriented computation
  • Built upon the POJO-based development approach

When we say it's a POJO-based development approach, we mean that we pass POJOs (plain objects) between the ItemReader, ItemProcessor, and ItemWriter. Spring Batch is widely used in the industry. Here we are going to discuss the domain-specific language of Spring Batch and its database schema.
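To make the read, process, write idea concrete, here is a minimal plain-Java sketch. It does not use Spring Batch itself: the `Reader`, `Processor`, and `Writer` interfaces only imitate the shape of ItemReader, ItemProcessor, and ItemWriter, and the `Customer` POJO, the age filter, and the chunk size of 2 are assumptions made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java sketch of a chunk-oriented step: read -> process -> write.
// These interfaces are hypothetical stand-ins, NOT Spring Batch's own types.
final class ChunkSketch {

    interface Reader<T> { T read(); }                 // returns null when input is exhausted
    interface Processor<I, O> { O process(I item); }  // returns null to filter an item out
    interface Writer<T> { void write(List<T> chunk); }

    // Hypothetical POJO used only for this illustration.
    static final class Customer {
        final String name;
        final int age;
        Customer(String name, int age) { this.name = name; this.age = age; }
    }

    // Reads up to chunkSize items, processes them, and writes the survivors together.
    static <I, O> void runStep(Reader<I> reader, Processor<I, O> processor,
                               Writer<O> writer, int chunkSize) {
        boolean exhausted = false;
        while (!exhausted) {
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize) {
                I item = reader.read();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                items.add(item);
            }
            List<O> output = new ArrayList<>();
            for (I item : items) {
                O processed = processor.process(item);
                if (processed != null) {
                    output.add(processed);
                }
            }
            if (!output.isEmpty()) {
                writer.write(output);
            }
        }
    }

    // Runs the sketch over a fixed in-memory list and returns the written chunks.
    static List<List<String>> demo() {
        Iterator<Customer> source = List.of(
                new Customer("ann", 34), new Customer("bob", 15),
                new Customer("cid", 52), new Customer("dee", 41)).iterator();
        List<List<String>> written = new ArrayList<>();
        ChunkSketch.<Customer, String>runStep(
                () -> source.hasNext() ? source.next() : null,
                c -> c.age >= 18 ? c.name.toUpperCase() : null, // drop minors, keep the name
                chunk -> written.add(chunk),
                2);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the reader hands over one POJO per call, the processor either transforms it or filters it out by returning null, and the writer receives a whole chunk at once. The real framework adds transactions, restart, and error handling around the same basic loop.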

Domain-specific Language:

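  • Job
    • A job is an entity that encapsulates an entire batch process.
    • id: the identifier of the job; specifying a value for this attribute is mandatory.
    • restartable: an attribute that controls whether the job can be restarted.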
  • Job Launcher
    • As the name suggests, it is used to launch the job
  • Job Repository
    • The JobRepository is configured with a relational database and handles all the statistical data generated at runtime
  • Job Execution Context 
    • Say we want to save something for later access across various steps; then we can use the Job Execution Context. It is available for the lifetime of the job
  • Job Instance
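    • A Job Instance is one logical run of a job, identified by the job and its identifying parameters. Each attempt to run that instance is a Job Execution.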
  • Job Parameter
    • We can pass parameters at the start of the job, say a unique ID or a job name, and access them throughout the job lifecycle.
  • Step
    • Jobs can be further divided into Steps. A job can have multiple steps. 
  • Step Execution Context
    • Similar to the job, a step also has its own execution context, where we can store required data and access it throughout the lifecycle of the step (that is, in the ItemReader, ItemProcessor, and ItemWriter)
  • Item
    • The POJO or any other object that we are dealing with in a batch job.
  • Item Reader
    • ItemReader is an interface with a read method. It is the starting point of the step, where we read data from a queue, a REST API, or a database.
    • Each call to the read method returns a single item, and it returns null once the input is exhausted
  • Item Processor 
    • Here we can transform our object/POJO as needed, say by mapping fields, changing the format, or skipping some records based on some logic (filtering).
  • Item Writer
    • In the end, we can write our records to a queue or a DB, or use a REST API to save them.
  • Scheduler
    • Quartz
    • Spring @Scheduled

For both, you need a basic understanding of cron expressions.

Further reading: What Is CRON and How Can It Help Schedule Cloud Workflow Jobs? (DZone Integration)
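As a quick orientation on cron expressions, here is a small plain-Java sketch that labels the fields of a Spring-style expression. Spring's cron format has six space-separated fields (second, minute, hour, day of month, month, day of week). The `CronFieldsSketch` class and its `describe` method are hypothetical helpers written for this post; they only split and name the fields and do not validate or schedule anything.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that names the fields of a six-field, Spring-style cron expression.
// It only splits the string; it performs no validation of the individual values.
final class CronFieldsSketch {

    private static final String[] FIELD_NAMES = {
        "second", "minute", "hour", "dayOfMonth", "month", "dayOfWeek"
    };

    static Map<String, String> describe(String expression) {
        String[] parts = expression.trim().split("\\s+");
        if (parts.length != FIELD_NAMES.length) {
            throw new IllegalArgumentException(
                    "expected " + FIELD_NAMES.length + " fields but got " + parts.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            fields.put(FIELD_NAMES[i], parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // "0 0 2 * * *" means second 0, minute 0, hour 2, on every day: daily at 02:00.
        System.out.println(describe("0 0 2 * * *"));
    }
}
```

An expression like `0 0 2 * * *` would suit the "send notifications every day" case from the introduction. Quartz uses a similar layout, but it expects `?` in either the day-of-month or the day-of-week field (for example `0 0 2 * * ?`) and accepts an optional seventh field for the year.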

Relationship:

  • A Job has a one-to-many relation with Steps
  • A Step has a one-to-one relation with its ItemReader, ItemProcessor, and ItemWriter

Flow

As we can see in the image below, the flow starts when a scheduler calls the JobLauncher. The JobLauncher invokes the JobRepository, which stores statistical data in the database. It also starts the Job.
A Job initiates a Step. A Step can be further divided into an ItemReader, ItemProcessor, and ItemWriter, which can communicate with a DB, a file, a REST API, or a queue.
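The same flow can be modeled roughly in plain Java. This is only a sketch of the idea, not Spring Batch code: `Launcher`, `Repository`, `Job`, `Step`, and `Execution` are hypothetical stand-ins, the repository is an in-memory list rather than a database, and the job name, parameter, and step logic are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the flow: launcher -> repository -> job -> steps.
// Every type here is a hypothetical stand-in, not a Spring Batch class.
final class FlowSketch {

    // A step receives the job-level context so that steps can share data.
    interface Step { void execute(Map<String, Object> jobContext); }

    static final class Job {
        final String name;
        final List<Step> steps;
        Job(String name, List<Step> steps) { this.name = name; this.steps = steps; }
    }

    // What the repository records about one run of a job.
    static final class Execution {
        final String jobName;
        final Map<String, String> parameters;
        final Map<String, Object> context = new HashMap<>();
        String status = "STARTED";
        Execution(String jobName, Map<String, String> parameters) {
            this.jobName = jobName;
            this.parameters = parameters;
        }
    }

    // In-memory stand-in for the database-backed repository.
    static final class Repository {
        final List<Execution> executions = new ArrayList<>();
        Execution create(String jobName, Map<String, String> parameters) {
            Execution execution = new Execution(jobName, parameters);
            executions.add(execution);
            return execution;
        }
    }

    static final class Launcher {
        private final Repository repository;
        Launcher(Repository repository) { this.repository = repository; }

        // Records the run, executes the steps in order, then records the outcome.
        Execution run(Job job, Map<String, String> parameters) {
            Execution execution = repository.create(job.name, parameters);
            try {
                for (Step step : job.steps) {
                    step.execute(execution.context);
                }
                execution.status = "COMPLETED";
            } catch (RuntimeException e) {
                execution.status = "FAILED";
            }
            return execution;
        }
    }

    // One job with two steps; the second step reads what the first one stored.
    static Execution demo() {
        Step load = ctx -> ctx.put("rowCount", 3);
        Step summarize = ctx -> ctx.put("summary", "rows=" + ctx.get("rowCount"));
        Job job = new Job("dailyReport", List.of(load, summarize));
        return new Launcher(new Repository()).run(job, Map.of("runDate", "2024-01-01"));
    }

    public static void main(String[] args) {
        Execution execution = demo();
        System.out.println(execution.status + " " + execution.context.get("summary"));
    }
}
```

The sketch shows three ideas from the sections above: one job owns several steps, the job-level context lets a later step read what an earlier step stored, and the repository keeps a status record for each run.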
 


Database schema:

Below is the database schema of Spring Batch. As we discussed earlier, it gives us statistical information about jobs and steps, both while they are running and after they complete. We can check all the metadata about steps and jobs in the tables below:



Need:

  • Bulk processing
  • Batch processing
  • Partial processing
  • How: read, process, write
  • Time-based events, periodic applications

Benefits:

  • Highly scalable
  • Easy to use
  • Customizable
  • Automatic retry after failure
  • Execution statistics while running and after completion

 

