IBM Support

How to use the 'connection_polling_timeout' FAP mechanism, to reduce the likelihood of Data Marts going to 'Error' when the TM1 cube (TM1 server) is temporarily unavailable (for example TM1 server is rebooting)

Troubleshooting


Problem

The customer finds that the status of their Data Marts changes to 'Error'. This occurs when the TM1 server has a problem (for example the TM1 cube is being restarted).

Example: During a maintenance period, the customer reboots all the application servers. The FAP service is small/lightweight and therefore starts quickly. The TM1 server consumes more memory and can therefore take longer to start. This means that the FAP service may start while the TM1 server is still unavailable.

Is there a way to allow Controller's FAP service to survive a temporary loss of connection to the TM1 cube (TM1 server)?

Symptom

Scenario #1 (main/most-relevant)
This is the most important scenario (which the 'connection_polling_timeout' parameter was mostly designed for).

Initially the TM1 server/cube is not running. The customer then starts the Windows service 'IBM Cognos FAP Service'. The user opens the FAP client, opens 'Data Mart' and tries to start the initial publish. The status changes to 'Error'.

  • Even if the customer now starts their TM1 server/cube, the Data Mart status will not change from 'Error'.


Scenario #2 (less relevant to the 'connection_polling_timeout' feature)
Initially the TM1 server, and FAP service are both running OK. The Data Mart status is set to 'running'. Suddenly the TM1 server/cube is stopped.
  • User opens the FAP client, and opens 'Data Mart'. User sees that the Data Mart status has changed to 'Error'.

Cause

There are other potential causes for similar symptoms.

  • TIP: For example, see separate IBM Technote #2012745 for a similar scenario.

This Technote relates to the scenario where the cause is a limitation of older versions of Controller (10.3.1.60 or earlier, and also 10.3.1100.156).

  • Specifically, in older versions of Controller, if the TM1 server is not started (not available) when the Windows service 'IBM Cognos FAP Service' is started, all Data Marts go into an 'Error' state. Those Data Marts cannot then resume without a full initial publish.
  • In later versions of Controller (for example 10.3.1100.159 and later) the new 'connection_polling_timeout' FAP mechanism allows the Data Mart to resume even if the TM1 server was not started (not available) when the Windows service 'IBM Cognos FAP Service' was started.

More Information:

After enabling the 'connection_polling_timeout', the Data Mart can survive a FAP service restart (even if the TM1 server is unavailable). Specifically, what happens (in the scenarios described above in 'Symptom' section) is:

  • Scenario #1

When the data mart is started (with the TM1 server unavailable), the data mart will initially go into the status "Waiting for TM1 connection".

Assuming that the 'connection_polling_timeout' has not been exceeded, then when the TM1 server becomes available (for example it has finished rebooting) the data mart status will change to 'Initial Publish' and (eventually) 'Running'.

  • Scenario #2

When the TM1 server is stopped, the data mart status (inside the 'FAP client') will change to "Publish new children".

Soon afterwards, it will change to "Running".

From then onwards it will alternate between these two values ("Publish new children" and "Running").

When the TM1 server is finally up and running again (assuming that the 'connection_polling_timeout' has not been exceeded), the data mart status will change to 'Running' and stay there.
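
The behaviour described above (keep retrying the TM1 connection until it succeeds or the timeout is exceeded) can be illustrated with a generic polling loop. The following Python sketch is only an illustration of that idea and is not the actual FAP service implementation. The function name `wait_for_tm1`, the `is_available` callback, the poll interval and the returned strings are all assumptions made for this example.

```python
import time
from typing import Callable


def wait_for_tm1(
    is_available: Callable[[], bool],
    timeout_ms: int,
    poll_interval_ms: int = 5000,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll is_available() until it returns True or timeout_ms has elapsed.

    Returns "connected" if the server became available in time,
    otherwise "error". All names here are hypothetical.
    """
    deadline = clock() + timeout_ms / 1000.0
    while True:
        if is_available():
            # Server is reachable: publishing could continue from here.
            return "connected"
        if clock() >= deadline:
            # Timeout exceeded: corresponds to the 'Error' state.
            return "error"
        # Still within the timeout: wait, then try again.
        sleep(poll_interval_ms / 1000.0)
```

The `clock` and `sleep` parameters exist only so that the loop can be exercised without real waiting. With a timeout of 3600000 ms the loop would tolerate an outage of up to 60 minutes.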

Resolving The Problem

Fix:

To make the Controller FAP service more resilient (less likely to go to 'Error' if the TM1 server is temporarily unavailable), perform both of the following:


    (1) Make sure you are using a modern version of Controller, which includes the new 'connection_polling_timeout' feature:
      • Controller 10.3.0 FP1 IF6 (10.3.1.64) or later 10.3.0.x version.
      • Controller 10.3.1 IF1 (10.3.1100.159) or later.
    (2) Add the parameter 'connection_polling_timeout' to the file 'FAPService.properties'.

Steps to configure 'connection_polling_timeout':

1. Log on to the Controller application server

2. Browse to the FAP folder

  • TIP: By default, this is here: C:\Program Files\ibm\cognos\ccr_64\server\FAP

3. As a precaution, create a backup copy of the file: FAPService.properties

4. Open the file FAPService.properties in Notepad

5. Add a line similar to: connection_polling_timeout=3600000

  • TIP: The value controls the amount of time (in milliseconds) that the connection waits for the TM1 server to come up. If this timeout value is exceeded, the FAP Data Mart will go to the 'Error' state.
  • The above value (3600000) is 60 minutes (60 x 60 x 1000 milliseconds), which should allow enough time for most situations.

6. Save changes

7. Restart the Windows service 'IBM Cognos FAP Service'.
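
Steps 3 to 6 can also be scripted. The following Python sketch is not an IBM-supplied tool. It assumes that 'FAPService.properties' consists of plain key=value lines, and the function names `minutes_to_ms` and `set_polling_timeout` and the '.bak' backup suffix are hypothetical choices made for this example. It creates a backup copy, then adds the parameter or updates an existing uncommented entry.

```python
import shutil
from pathlib import Path

KEY = "connection_polling_timeout"


def minutes_to_ms(minutes: int) -> int:
    """Convert minutes to the millisecond value the parameter expects."""
    return minutes * 60 * 1000


def set_polling_timeout(properties_path: str, minutes: int) -> str:
    """Back up the properties file, then add or update the timeout line."""
    path = Path(properties_path)
    # Step 3: keep a backup copy (the '.bak' suffix is an assumption).
    shutil.copy2(path, path.with_name(path.name + ".bak"))

    new_line = f"{KEY}={minutes_to_ms(minutes)}"
    # latin-1 can decode any byte, so other content survives the round trip.
    lines = path.read_text(encoding="latin-1").splitlines()

    replaced = False
    for i, line in enumerate(lines):
        # Only an uncommented 'key=value' line with the exact key is updated.
        if line.split("=", 1)[0].strip() == KEY:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)

    path.write_text("\n".join(lines) + "\n", encoding="latin-1")
    return new_line
```

For example, `set_polling_timeout(r"C:\Program Files\ibm\cognos\ccr_64\server\FAP\FAPService.properties", 60)` would write the same line as in step 5. The Windows service 'IBM Cognos FAP Service' still has to be restarted afterwards (step 7).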

----------------------------------------------------------

NOTE: An example of this file is supplied inside a file 'FAPService.properties.new' which is copied during an installation of the relevant versions of Controller


----------------------------------------------------------

Workaround:

1. Ensure that the TM1 server is available (working OK) before continuing

2. Restart the Windows service 'IBM Cognos FAP Service'

3. Perform an Initial Publish.
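
For step 1 of the workaround, a rough first check is whether the TM1 server accepts TCP connections at all. The Python sketch below is only a hedged illustration: the function name `tm1_port_reachable` is hypothetical, the host and port must be taken from your own TM1 configuration, and a successful TCP connection does not prove that the TM1 cube has finished loading.

```python
import socket


def tm1_port_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable).
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

If this returns False, the TM1 server is not yet reachable and the FAP service restart in step 2 should wait.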


Document Information

Modified date:
15 June 2018

UID

swg22012681