IBM Support

Mustgather: Back up and Compaction of LQE database fail or take a long time to run

Troubleshooting


Problem

Attempts to back up or compact the Lifecycle Query Engine (LQE) in IBM Jazz Reporting Service (JRS) fail without displaying an error, or the backup or compaction takes a long time to run.

Symptom

When you manually run or schedule a backup or compaction of the LQE index, you do not get a completion notice, or the operation fails with an error in the GUI or the LQE log files.

Cause

Several issues can cause these processes to fail or run slowly.
Back up:
  • Large Index
  • Lack of Disk space for the back up
  • Lack of Disk space to compress the back up file
  • Back up fails to connect to the LQE relational database
  • Closed Resultset
  • Back up Schedule
Compaction:
  • Disk Space Requirements
  • Java Heap Space
  • CRLQE1288E Compaction failed to rename the existing index directory
  • Scheduling the Compaction
  • Compaction Takes a long time to run

Diagnosing The Problem

  • Check the space available on the backup disk.
    When you configure the backup, you specify the folder where the backup is placed. Check how much space is available on that disk.
     
  • Check the size of the LQE Repository
    The LQE repository is stored in <LQE Install Folder>\server\conf\lqe. It consists of the following folders:
    historyTdb, historyText, indexTdb, shapeTdb, shapeText, textIndex, versionTdb.
     
  • Configure debug logging for the backup and compaction process:
    1. For ELM versions 7.0.2 Service Release 1 (SR1) and 7.0.1 Service Release 1 (SR1), which adopted log4j v2, edit the conf/lqe/log4j2.xml file and add the following:

      <!-- LQE Backup and compaction debug logging -->
      <Logger name="com.ibm.team.integration.lqe.lib.backup.BackupScheduler" level="DEBUG" additivity="false">
        <AppenderRef ref="mainLog"/>
      </Logger>
      <Logger name="com.ibm.team.integration.lqe.lib.backup.impl.BackupTask" level="DEBUG" additivity="false">
        <AppenderRef ref="mainLog"/>
      </Logger>
      <Logger name="com.ibm.team.jis.lqe.compaction.CompactionScheduler" level="DEBUG" additivity="false">
        <AppenderRef ref="mainLog"/>
      </Logger>
      <Logger name="com.ibm.team.jis.lqe.compaction.CompactionTask" level="DEBUG" additivity="false">
        <AppenderRef ref="mainLog"/>
      </Logger>
      <Logger name="com.ibm.team.jis.lqe.compaction.CompactionUtils" level="TRACE" additivity="false">
        <AppenderRef ref="mainLog"/>
      </Logger>

       
    2. For ELM versions earlier than 7.0.2 Service Release 1 (SR1) and 7.0.1 Service Release 1 (SR1), edit the conf/lqe/log4j.properties file and add the following lines:
      1. For backup issues: 
        log4j.logger.com.ibm.team.integration.lqe.lib.backup.BackupScheduler=debug
      2. For compaction issues:
        log4j.logger.com.ibm.team.jis.lqe.compaction.CompactionScheduler=debug
        log4j.logger.com.ibm.team.jis.lqe.compaction.CompactionTask=debug
        log4j.logger.com.ibm.team.jis.lqe.compaction.CompactionUtils=trace
    3. Reload the log4j configuration from the LQE UI.
      On the LQE Administration page, under Configuration, click Advanced Properties.
      In the Reload Log Properties section, click Reload.
    4. On WAS Liberty servers, review console.log to confirm that the changes were formatted correctly and did not cause an error. Search for log4j2.xml and review any errors that are reported.
       
  • Run a manual compaction or backup.
    Back up: Backing up and restoring Lifecycle Query Engine and Link Index Provider (LDX)
    Compaction: Improving the Lifecycle Query Engine performance by compacting indexed data
     
  • Collect the logs when the back up or compaction is finished.
    Running IBM Support Assistant Data Collector
     
  • Take some Java™ cores.
    When you have a long-running back up or compaction, or you are not sure whether it is still processing, you can take a series of Java™ cores for investigation. Be aware that the compaction process starts a new Java™ process. Therefore, to investigate a compaction you need to identify that process and take the Java™ cores from it. On Windows, this process is called java.exe.

    Take 4 or 5 Java™ cores, one every 5 minutes. See:
    Generating Javacores and Userdumps Manually For Performance, Hang or High CPU Issues on Windows
    How to generate javacores/thread dumps, heapdumps and system cores for the WebSphere Application Server Liberty profile
    IBM Runtime Diagnostic Code Injection for the Java Platform (Java Surgery)
     
  • Location of the LQE index files.
    You can confirm the location of the Index files for LQE at https://<server>:<port>/lqe/web/health/stats.
    The location of the files is shown under 'Node Statistics'> 'Dataset Statistics'. Change the 'Partition' to see the Location and size of each partition.
    The LQE index location can differ from the default location in two ways:
    JVM system property: Check the server.startup.bat file for the value of the -Dlqe.config.location parameter in JAVA_OPTS
    lqe.data.root property: Look in the .../server/conf/lqe/lqe.properties file for the value of lqe.data.root
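
The two size checks above (free space on the backup disk and the size of the LQE repository) can be scripted. The following is a minimal sketch, not an IBM-provided tool. It assumes the repository folders are the ones named in this document and that you pass in the real index location that you confirmed on the health statistics page. The function names are illustrative.

```python
import os
import shutil

# Folder names listed in this document for the LQE repository.
LQE_FOLDERS = ["historyTdb", "historyText", "indexTdb", "shapeTdb",
               "shapeText", "textIndex", "versionTdb"]


def folder_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def repository_sizes(lqe_root):
    """Map each LQE folder that exists under lqe_root to its size in bytes."""
    sizes = {}
    for name in LQE_FOLDERS:
        path = os.path.join(lqe_root, name)
        if os.path.isdir(path):
            sizes[name] = folder_size(path)
    return sizes


def free_bytes(path):
    """Return the free space in bytes on the disk that holds path."""
    return shutil.disk_usage(path).free
```

Call repository_sizes with the index location and free_bytes with the backup folder, then compare the results against the disk space requirements described under Resolving The Problem.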

Resolving The Problem

Analyze the logs for error messages and take the appropriate corrective actions.
Backup:
  • Large Index
    When the index is large, the back-up takes a long time. Consider running a compaction on the LQE repository first to reduce its size as much as possible.
     
  • Lack of Disk space for the back-up
    The free disk space requirement for the back-up folder is (twice the size of each Tdb) + 300k (100k per Tdb).
    Get the current size of the index and calculate how much space is required for the back-up. 
    Increase the size of the back-up disk, remove any unnecessary files from it, or both.
    An LQE back-up is no longer useful when it is older than the re-base period of QM or RM's TRS feeds, which is at most 30 days. So there's no need to keep LQE back-ups that are more than a month old.
    When this problem occurs, you might see these errors in the log files:
    • [lqe.BackupScheduler0-task-thread-0] TRACE bm.team.integration.lqe.lib.backup.impl.BackupTask - needed: 36976774418 - available: 27755905024
      This message indicates that the back-up needed around 37 GB of free space to run, but only around 28 GB were available. The solution in this case is to free up at least 10 GB of disk space.
      To see this message, you need to enable trace logging instead of debug.
    • com.ibm.team.integration.lqe.lib.backup.impl.BackupException: CRLQE1278E Backup cannot start because there is not enough available disk space. Backup needs at least 439 GB of available space. Get more available disk space and try again.
      Again, this error indicates that you need more disk space for the back-up files.
  • Lack of Disk space to compress the back-up file
    When the compress back-up option is selected, a second size check is done after the back-up folder is created. That second check requires free disk space equal to the size of the back-up folder.
  • Back-up fails to connect to the LQE relational database

    [lqe.BackupScheduler0-task-thread-0] [ERROR] bm.team.integration.lqe.lib.backup.impl.BackupTask - CRLQE0513E An I/O error occurred during backup. The database server may be unavailable.
    This error indicates that the database server was not available. LQE retries a number of times to see whether it becomes available before it gives up. You need to check with the Database Administrator to see why the server was unavailable.

  • Closed Resultset
    [lqe.BackupScheduler0-task-thread-0] [ERROR] bm.team.integration.lqe.lib.backup.impl.BackupTask - CRLQE0475E A fatal error occurred during backup.

    com.ibm.team.integration.lqe.lib.backup.impl.BackupException: java.io.IOException: java.sql.SQLException: Closed Resultset

    The error indicates that the JDBC Connection was closed while it was reading data from the DB Server.

    The LQE back-up encounters the "Closed Resultset" error while it reads data from two tables in the DB server (in blocks of 1,000,000 per SQL statement) and writes that information to disk. The JDBC connection is most likely being closed by the DB server while the write is happening. When the disk write speed on the LQE server is too slow, the JDBC connection can sit idle for too long, and the DB server closes it while LQE is still using it. You might try the following steps:

    • Check the settings on the DB server: Have your database administrator check the DB settings to determine when a connection is automatically closed by the server. This value might need to be increased.

    • Remove the idle limit in LQE: In the conf/lqe/dbconnection.properties file, add a line to set "maxIdle=-1" for no limit.
      This change requires a restart of the LQE application.
    • Reduce the block size for the back-up: You can change the block size in the JVM system properties for WebSphere Application Server: "-Dlqe.backup.table.blocksize=250000" or even smaller. This reduces the data read in each transaction, so each block is quicker to write to disk and returns within the DB timeout period. The change takes effect only after a restart of the LQE server.
      Note: Reducing the value of this parameter can result in longer back-up times. This issue is discussed in: PH33370: BACKUP OF LQE TAKES TOO LONG FOR LARGE DATABASES IN JAZZ REPORTING SERVICE (524085)

  • Scheduling the back-up
    When you schedule the back-up, set it to run after a compaction completes. Give the compaction enough time to complete so that the two processes do not run at the same time. Running them concurrently can compromise the validity of the back-up files.
    While the back-up is running, you cannot query the index, and this includes the Metamodel Refresh. The refresh is scheduled to run at 6:00 AM and 6:00 PM server time each day. Try to ensure that the back-up does not run at the same time.
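
To check a planned back-up window against the Metamodel Refresh times, a small calculation like the following can help. This is a sketch under stated assumptions: the refresh times are the 6:00 AM and 6:00 PM server times given above, the expected duration is your own estimate from earlier runs, and the function name overlaps_refresh is hypothetical.

```python
from datetime import datetime, time, timedelta

# Metamodel Refresh times stated in this document (server time).
REFRESH_TIMES = (time(6, 0), time(18, 0))


def overlaps_refresh(start, duration, refresh_times=REFRESH_TIMES):
    """Return the refresh datetimes that fall within [start, start + duration]."""
    end = start + duration
    hits = []
    day = start.date()
    # Walk every calendar day that the window touches.
    while day <= end.date():
        for refresh in refresh_times:
            moment = datetime.combine(day, refresh)
            if start <= moment <= end:
                hits.append(moment)
        day += timedelta(days=1)
    return hits
```

An empty result means the window avoids both refresh times. The same check can be applied to a planned compaction window.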
Compaction:
  • Disk Space Requirements
    The free disk space requirement for the compaction is the total size of each Tdb.
    This size check is done against the location of the index folders.
  • Java Heap Space
    When you experience problems with the Java Heap space, you might see this error:
    [lqe.CompactionScheduler0-task-thread-0] [ERROR] com.ibm.team.jis.lqe.compaction.CompactionTask - CRLQE0809E A fatal error occurred during compaction. com.ibm.team.jis.lqe.compaction.CompactionException: JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2020/06/09 00:39:02 - please wait.
    Here is some detail from the Javacore:
    WARNING : OutOfMemoryError possibly caused by 704648512 bytes requested for object of class 0000000000948F00 from memory space 'Generational' id=00007FF5F9978740
    Cause of thread dump : Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Java heap space" received


    This error indicates that the compaction is running out of memory while trying to process large string literals in the Tdb. You need to increase the compaction heap from the default 4 GB to 16 GB.

    The compaction process runs in its own JVM and manages its own heap space. The compaction heap does not come out of the LQE JVM heap. The setting can be changed in LQE under Advanced Properties > Compaction > Maximum Heap. The compaction heap is only used during compaction and is released back to the OS when compaction completes. Make sure that enough physical memory is available on the server for the compaction process to use as heap space. 'Compaction Maximum Heap' is a dynamic parameter. It does NOT require a restart of the LQE server.
  • CRLQE1288E Compaction failed to rename the existing index directory

    [lqe.CompactionScheduler0-task-thread-0] [ERROR] com.ibm.team.jis.lqe.compaction.CompactionTask   - CRLQE0809E A fatal error occurred during compaction.
    com.ibm.team.jis.lqe.compaction.CompactionException: CRLQE1288E Compaction failed to rename the existing index directory D:\IBM\Server\indices\lqe\indexTdb. Compaction makes a temporary backup of the existing data in case the compaction fails. Make sure that the index directory is accessible and Lifecycle Query Engine has permission to rename it, then try again. 


    The compaction process retries the rename a number of times before it either succeeds or fails with this error.

  • Scheduling the Compaction
    Schedule the compaction to run when the server is not busy and before the back up is scheduled.
    When the compaction does not reduce the size of the index significantly, there is no benefit in scheduling it frequently. When the total index size changes by more than 5% between compactions, you might perform the compaction more frequently. Monitor the size of the index to make sure it does not grow rapidly.
    While the compaction is running, you cannot query the index, and this includes the Metamodel Refresh. The refresh is scheduled to run at 6:00 AM and 6:00 PM server time each day. Ensure that the compaction does not run at the same time.
     

  • Compaction takes a long time to run
    When compaction takes a long time to run, you can reduce the time by increasing the heap space as discussed in Java Heap Space.
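
The disk space requirements given in this section for back-up (twice the size of each Tdb plus 100k per Tdb) and for compaction (the total size of each Tdb) can be worked through as a short calculation. This is a sketch only: it assumes "100k" means 100 KiB (102,400 bytes), and the function names are illustrative and not part of LQE.

```python
# "100k per Tdb" from this document; assuming k means KiB (1024 bytes).
OVERHEAD_PER_TDB = 100 * 1024


def backup_space_needed(tdb_sizes):
    """Free bytes needed in the back-up folder: 2 x each Tdb plus 100k per Tdb."""
    return sum(2 * size + OVERHEAD_PER_TDB for size in tdb_sizes)


def compaction_space_needed(tdb_sizes):
    """Free bytes needed at the index location: the total size of the Tdbs."""
    return sum(tdb_sizes)


def shortfall(needed, available):
    """Bytes that must be freed before the operation can start (0 if none)."""
    return max(0, needed - available)
```

With the values from the trace message shown earlier, shortfall(36976774418, 27755905024) returns 9220869394 bytes, roughly 9.2 GB, which is in line with the advice to free up at least 10 GB.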

When the logs with the additional debug entries do not indicate a clear cause, contact IBM Support to open a case.

Document Location

Worldwide


Document Information

Modified date:
23 April 2024

UID

ibm10874356