Saturday, April 11, 2015

ThreadPoolTaskExecutor and HTTPConnectionManager

I was working on a project that required reading tasks from a queue and processing them. Let's say the queue is named fileSendQueue. Multiple threads take tasks from the queue and process them, and this is where Spring's ThreadPoolTaskExecutor comes into the picture. I have 10 threads in the thread pool. Each thread takes a task from the queue and processes it; after the task completes, the thread is returned to the pool. My thread pool configuration is shown below.

    <bean id="fileSenderTaskExecutor"
        class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
        <property name="corePoolSize" value="10" />
        <property name="maxPoolSize" value="10" />
        <property name="threadGroup" value="fileSenderThreadGroup" />
    </bean>

Here the core pool size is 10 and the max pool size is 10.

The job of each thread is to send a file to a remote server with httpclient, as a multipart POST request. Soon after deploying the application, I noticed that the task queue (fileSendQueue) was empty. It looked like all the tasks in the queue had completed successfully. So far so good, pretty cool: as expected, my thread pool was emptying the queue and processing the tasks. But soon after I packed my bag for a short leave, I got a call saying that the remote servers on the other side were still waiting for files from this server.

What happened? The size of fileSendQueue was 0, which suggested that the tasks had been processed. But after adding more logging and analyzing the output, I found that the tasks had been removed from the queue but not processed; they were stuck somewhere in between. Where? The logs showed that the active thread count was equal to the core pool size, that is, 10, and that the tasks were sitting in the ThreadPoolTaskExecutor's own queue. They had simply left one queue and entered another. Why? Because the number of active threads was equal to the core pool size, the ThreadPoolTaskExecutor put new tasks in its internal queue, which is unbounded unless queueCapacity is set. This is the documented behavior of ThreadPoolTaskExecutor. At first it seemed that ThreadPoolTaskExecutor was not working properly and was not releasing threads back to the pool after tasks completed, which would explain the active thread count of 10. But my assumption was wrong.
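The queuing behavior described above can be reproduced without Spring. The following is a minimal sketch, not the project's code: it uses the JDK's java.util.concurrent.ThreadPoolExecutor (the class that ThreadPoolTaskExecutor wraps) with 10 core threads, 10 max threads and an unbounded queue. The class name ExecutorBacklogDemo, the task count of 25 and the latch that stands in for a slow upload are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of the original application.
class ExecutorBacklogDemo {

    /** Returns {activeThreads, tasksWaitingInExecutorQueue} once all workers are busy. */
    static int[] submitBlockingTasks(int poolSize, int taskCount) {
        // Same shape as corePoolSize=10 / maxPoolSize=10 with an unbounded queue.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, taskCount));
        CountDownLatch release = new CountDownLatch(1); // stands in for a slow upload
        try {
            for (int i = 0; i < taskCount; i++) {
                executor.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // block the worker like a long file transfer
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await(5, TimeUnit.SECONDS); // wait until every worker is occupied
            return new int[] { executor.getActiveCount(), executor.getQueue().size() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the blocked tasks finish
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] result = submitBlockingTasks(10, 25);
        // prints "active=10, queued=15"
        System.out.println("active=" + result[0] + ", queued=" + result[1]);
    }
}
```

The source queue would look empty here as well: all 25 tasks have been handed over, but 15 of them are only waiting inside the executor.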

The bottleneck was not the thread pool. It was on the HTTP side: httpclient was taking too long to send a file to the remote server. So my next suspect was the HTTPConnectionManager. Perhaps it was not timing out bad connections as configured. The socket timeout (SO_TIMEOUT) was set to 30 seconds; that is the maximum period of inactivity between two consecutive data packets. Perhaps the manager was not obeying the configuration and there was some issue with it.

After analyzing a little more, I found that the connection manager was working as expected. So what was happening? The real issue was the file size combined with poor connectivity in both directions. Assume there are 100 tasks in the queue initially and more arrive at run time. Each file to send is 150 MB or more, and sending one takes a very long time. Sometimes a thread was blocked for an hour or more sending a single file. And because of the connectivity problems, sometimes at the very end we got a socket timeout exception:

 java.net.SocketTimeoutException: Read timed out

So in this case the thread was blocked for more than an hour and in the end accomplished nothing useful, because the exception occurred at the last moment due to a connectivity issue. This was happening with all the tasks: the files were huge and the connection was not stable.
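Why can a thread stay blocked for an hour when SO_TIMEOUT is 30 seconds? Because the timeout limits the gap between two reads, not the total transfer time. Here is a small sketch of that behavior using plain JDK sockets on the loopback interface instead of httpclient. The class name SoTimeoutDemo, the byte counts and the millisecond values are assumptions chosen only to keep the demo short.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; it only illustrates SO_TIMEOUT semantics.
class SoTimeoutDemo {

    /**
     * Reads byteCount bytes from a local peer that sends one byte every gapMillis.
     * Returns the elapsed milliseconds, or -1 if a single read exceeded soTimeoutMillis.
     */
    static long readSlowStream(int byteCount, long gapMillis, int soTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread sender = new Thread(() -> {
                try (Socket peer = server.accept(); OutputStream out = peer.getOutputStream()) {
                    for (int i = 0; i < byteCount; i++) {
                        Thread.sleep(gapMillis); // slow link: one byte per gap
                        out.write(i);
                        out.flush();
                    }
                } catch (IOException | InterruptedException e) {
                    // The reading side reports the outcome; nothing to do here.
                }
            });
            sender.setDaemon(true);
            sender.start();

            long start = System.nanoTime();
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(soTimeoutMillis); // applies to each blocking read
                InputStream in = client.getInputStream();
                for (int i = 0; i < byteCount; i++) {
                    if (in.read() < 0) {
                        throw new IOException("peer closed the connection early");
                    }
                }
            } catch (SocketTimeoutException e) {
                return -1; // "Read timed out"
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Total time is well above the 1000 ms timeout, yet no read ever waits that long.
        System.out.println("slow but steady, elapsed ms: " + readSlowStream(30, 50, 1000));
        // A single gap longer than the timeout fails; -1 is printed.
        System.out.println("one long gap: " + readSlowStream(1, 1000, 100));
    }
}
```

A slow but steady connection never trips the timeout, so the worker thread stays occupied for as long as the transfer lasts.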

Solution to this problem: make the files small enough to be sent even over a bad connection, and make the thread's task asynchronous, so that a thread from the ThreadPoolTaskExecutor does not block until the task completes. This does not add load to the system, since sooner or later the asynchronous task completes and the threads are released.
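One way to make the files small is to send a large file in fixed-size parts, so that a failure costs one part and not the whole hour. The sketch below only computes the parts; it assumes the remote server is able to accept and reassemble them, which this post does not confirm. FileChunkPlanner, Chunk and the 8 MB part size are hypothetical names and values used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the part size and the remote reassembly are assumptions.
class FileChunkPlanner {

    /** One part of a file: bytes [offset, offset + length). */
    static final class Chunk {
        final long offset;
        final long length;

        Chunk(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Splits a file of fileSize bytes into consecutive parts of at most chunkSize bytes. */
    static List<Chunk> plan(long fileSize, long chunkSize) {
        if (fileSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and chunkSize > 0");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += chunkSize) {
            // The last part may be shorter than chunkSize.
            chunks.add(new Chunk(offset, Math.min(chunkSize, fileSize - offset)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Chunk> chunks = plan(150 * mb, 8 * mb);
        Chunk last = chunks.get(chunks.size() - 1);
        // prints "parts=19, last part MB=6"
        System.out.println("parts=" + chunks.size() + ", last part MB=" + last.length / mb);
    }
}
```

Each part can then be queued as its own small task, so a worker thread is held only for the duration of one part.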

Thursday, October 23, 2014

Spring Batch- A case study

In our company, we had a requirement to process a set of mobile numbers: send each of them an SMS and, after a configurable delay, place a call giving information about a product. Here I would like to describe how we implemented this use case with Spring Batch. Before describing the use case, I would like to briefly explain what Spring Batch is.

 Usage of Spring Batch:

A batch application reads a huge number of records from a source (generally a database or file system), processes them in some required pattern, and writes them back to some destination (which might be on a different network or source).

 Use cases of Spring Batch:
  •  We have a large amount of data and want to read, process and write it in batches or chunks, committing once per batch or chunk.
  •  We have a job in which some tasks should run in parallel within the batch environment.
  •  We want to restart a job manually or with the help of a scheduler. This might be a fresh restart from the beginning or a resume from where we left off.
  •  We need to execute step1 and choose the next action based on its result: step2 is executed if step1 succeeds, and step3 is executed if step1 fails.
  •  We need to skip records on purpose during processing, based on some condition.
  •  A combination of all the above.
  
 Spring Batch Architecture:

      The Spring Batch framework consists of three layers.
  • The application layer represents the business logic we write using Spring Batch.
  • The core layer represents the components that are necessary to control a batch job. It consists of classes such as JobLauncher, Job and Step.
  • The infrastructure layer represents ItemReader, ItemWriter and the classes that handle things like job recovery and job restart.
   
 Spring Batch Terminology:

  • Job: A batch job is a combination of steps executed in a predefined order as part of a task. It is at the top of the batch hierarchy.
  • JobParameters: A set of parameters used to start a batch job. Suppose we have a job that interacts with our customers by sending SMS or email. If the job is scheduled with the parameter sms, it sends an SMS to the specific customers in the base. If it is scheduled with the parameter email, it sends an email to those customers. Here "sms" and "email" are different job parameters.
  • JobInstance: A run of a job with a unique set of parameters is called a JobInstance of that job. If the same job is running with the parameter sms and with the parameter email at the same time, we say that two instances of the same job are running.
  • Step: An entity that encapsulates an independent, sequential phase of a batch job.
  • ExecutionContext: A store for key/value pairs (analogous to a map) that a step or job can use and persist during execution.
  • JobRepository: The storage mechanism for all the details of job and step executions. When a Job is first launched, a JobExecution is obtained from the repository.
  • JobLauncher: The mechanism used to launch a job with a given set of job parameters.
  • Item Reader: A mechanism that retrieves the input for a Step, one record at a time.
  • Item Processor: A mechanism that processes one record at a time and decides whether the record is valid. If the record is invalid, the processor can drop it so that it never reaches the writer.
  • Item Writer: A mechanism that writes the processed records of one batch or chunk at a time.
  • Listener: A listener waits for an event to occur and intercepts it to apply some custom behavior. A batch job likewise allows listeners to do additional work when an event fires. We can use listeners in a batch job at two levels: job level and step level.
  • Job level listeners: If we want to send an email/SMS at the start or end of the job, a job level listener is the right candidate. Job level listeners are
    1. JobExecutionListener
  • Step level listeners: If we want to do some customized task inside a step, we can do it with step level listeners. Step level listeners are
    1. StepExecutionListener
    2. ChunkListener
    3. ItemReadListener
    4. ItemWriteListener
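To make the reader, processor and writer roles concrete, here is a plain-Java sketch of chunk-oriented processing. It is not the Spring Batch API: ChunkLoopSketch and its run method are hypothetical, the JDK's Iterator, Function and Consumer stand in for the reader, processor and writer, and the loop is a simplification. The rule that a null result drops the record mirrors the Spring Batch convention that an ItemProcessor returning null filters the item out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of the chunk pattern; not Spring Batch code.
class ChunkLoopSketch {

    /**
     * Reads up to commitInterval items per chunk, processes them one at a time,
     * and passes the surviving items of each chunk to the writer in one call.
     * Returns the number of writer calls.
     */
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int commitInterval) {
        int writes = 0;
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>();
            for (int read = 0; read < commitInterval && reader.hasNext(); read++) {
                O processed = processor.apply(reader.next());
                if (processed != null) { // null means: drop this record
                    chunk.add(processed);
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk); // one write per chunk
                writes++;
            }
        }
        return writes;
    }

    public static void main(String[] args) {
        List<String> numbers = Arrays.asList(
                "9000000001", "bad", "9000000002", "9000000003", "", "9000000004");
        List<String> sent = new ArrayList<>();
        // Keep only 10-digit numbers; everything else is dropped by the processor.
        Function<String, String> processor = n -> n.matches("\\d{10}") ? "SMS to " + n : null;
        Consumer<List<String>> writer = sent::addAll;
        int writes = ChunkLoopSketch.<String, String>run(numbers.iterator(), processor, writer, 2);
        // prints "writes=3, sent=4"
        System.out.println("writes=" + writes + ", sent=" + sent.size());
    }
}
```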
 Configuration for the job:

 We can configure the job in different ways, such as programmatically or in XML. Here we describe the XML configuration. In our current scenario we will use the following tags to define our job in XML.
  1.  job: The parent element of the job configuration. The sub-elements we will use are step and split.
  2.  step: A stage in a batch job. A batch job may have many stages. A step requires either a chunk definition or a tasklet reference. The sub-elements we will use are tasklet and next.
  3.  split: Declares that the job should split into two or more subflows.
  4.  tasklet: The Tasklet strategy can be implemented directly by configuring a reference to an implementation of the Tasklet interface, or by configuring a chunk. The sub-element we will use is chunk.
  5.  chunk: Declares that its owner, that is, the step containing the chunk, will perform chunk-oriented processing. The sub-elements we will use are reader, processor, writer and listeners.

 To be continued...