Service instance sharing across jobs

You can share a service instance (SI) among multiple similar MapReduce jobs so that they reuse data and state information cached in memory, which improves performance.

Within the MapReduce framework, a service instance is an executing instance of the MapReduce service. The MapReduce service is transient if its service instances restart once they leave the current job, and persistent if its service instances stay running and serve multiple jobs. Sharing keeps a service instance running to serve multiple tasks of the same or similar MapReduce jobs, which reduces the number of times the service instance is restarted. MapReduce jobs are considered similar if the checksums of their associated user files are identical.

When the service environment is similar for multiple jobs, the service instance is kept running for jobs belonging to an application by using the SessionLeave method in the application profile. With the SessionLeave method set to keepAlive, when a MapReduce job finishes and another job is submitted, IBM® Spectrum Symphony compares the previous job's checksum, which is based on its associated user files, with that of the incoming job. If the checksum values are the same, the service instance is kept running for use by the application associated with the service; if they differ, the service instance is restarted.
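
The comparison described above can be illustrated with a minimal Python sketch. This is not IBM Spectrum Symphony code: the function names, the use of SHA-256, and the way file names and contents are combined are all assumptions made for illustration, because this topic does not specify how the product computes the checksum.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional


def user_files_checksum(paths: Iterable[Path]) -> str:
    """Hypothetical checksum over a job's user files.

    Assumption: SHA-256 over each file's name and contents, visited in
    sorted order so the result does not depend on the order of `paths`.
    """
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")  # separate the file name from the file contents
        digest.update(path.read_bytes())
    return digest.hexdigest()


def action_on_session_leave(
    previous_checksum: Optional[str],
    incoming_checksum: str,
    keep_alive: bool = True,
) -> str:
    """Illustrative decision: reuse the service instance or restart it.

    The instance is reused only when keep-alive behavior is enabled and
    the incoming job's checksum matches that of the previous job.
    """
    if keep_alive and previous_checksum == incoming_checksum:
        return "reuse"
    return "restart"
```

In this sketch, two jobs submitted with identical user files produce the same checksum, so the second job reuses the running service instance; changing any user file changes the checksum and leads to a restart.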

By default, the service instance is set to stay running after it leaves a job.