Limit the number of batch queries

About this task

In large clusters, the volume of job queries can grow quickly. If your site sees heavy job query traffic, you can tune LSF to limit the number of job queries that mbatchd handles at the same time. This helps decrease the load on the management host.

If a job information query is sent after the limit has been reached, an error message ("Batch system concurrent query limit exceeded") is displayed and mbatchd keeps retrying at one-second intervals. If the number of job queries later drops below the limit, mbatchd handles the query.

Procedure

  1. Define the maximum number of concurrent job queries to be handled by mbatchd in the parameter MAX_CONCURRENT_QUERY in lsb.params:
    • If mbatchd is not using multithreading, the value of MAX_CONCURRENT_QUERY is always the maximum number of job queries in the cluster.
    • If mbatchd is using multithreading (defined by the parameter LSB_QUERY_PORT in lsf.conf), the number of job queries in the cluster can temporarily become higher than the number specified by MAX_CONCURRENT_QUERY.

      This increase is possible because MAX_CONCURRENT_QUERY actually sets the maximum number of queries that each child mbatchd forked by mbatchd can handle. When a new child mbatchd starts, it handles new queries, but the old child mbatchd continues to run until all of its old queries are finished. As a result, the total number of job queries can be as high as MAX_CONCURRENT_QUERY multiplied by the number of child daemons forked by mbatchd.

  2. To limit all batch queries (in addition to job queries), specify LSB_QUERY_ENH=Y in lsf.conf.

    Enabling this parameter extends multithreaded query support to all batch query requests and extends the MAX_CONCURRENT_QUERY parameter to limit all batch queries in addition to job queries.
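
Example

The two steps above can be sketched as the following configuration fragments. This is a minimal sketch, not a recommended setting: the limit of 100 and the port number 6891 are illustrative assumptions that do not come from this topic. Check the valid range for MAX_CONCURRENT_QUERY and choose an unused port for your own cluster.

```
# lsb.params
# Illustrative value: limit each mbatchd to 100 concurrent queries
Begin Parameters
MAX_CONCURRENT_QUERY = 100
End Parameters
```

```
# lsf.conf
# Illustrative port: enables multithreaded mbatchd query handling
LSB_QUERY_PORT=6891
# Apply the limit to all batch queries, not only job queries
LSB_QUERY_ENH=Y
```

With these assumed values and multithreading enabled, the total number of queries could temporarily reach 100 multiplied by the number of child mbatchd daemons that are still running, for example 200 while two child daemons are active. Changes to these files typically take effect only after mbatchd is reconfigured or restarted (for example, with badmin reconfig or badmin mbdrestart); confirm the required command in the reference for your LSF version.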