Distributed Server Architecture
IBM Endpoint Manager (IEM) Distributed Server Architecture (DSA) provides a built-in ability to install multiple servers that replicate information from each other for disaster recovery. When DSA is configured and one server fails, the other servers automatically take over as fully functional servers: they receive data from relays and clients and accept console connections. When the failed server is restored, it automatically receives the updated information.
The health of a DSA deployment depends on the health and efficiency of the database replication process performed by the FillDB service. If actions are successfully propagated (in the Console) and the database has successfully replicated (see the Replication tab in the Endpoint Manager Admin tool), the actions run as expected on all child endpoints.
However, if the Primary DSA server becomes unavailable after an action is propagated but before the database has replicated, the Secondary will NOT have the newly propagated action. The children of the Secondary are not given the version of the actionsite that contains the unreplicated changes, so neither the Secondary nor its children receive the new actions or the downloads associated with them. In this case, the actions must be taken again from the Secondary.
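The timing rule above can be illustrated with a small, hypothetical check: an action is at risk when it was propagated on the Primary after the last successful replication to the Secondary. The function name and its inputs are illustrative assumptions (you would supply the times yourself, for example from the Replication tab and your own action records); this is a sketch, not an Endpoint Manager API.

```python
from datetime import datetime

def actions_missing_on_secondary(propagated, last_replication):
    """Return the IDs of actions the Secondary is likely missing.

    propagated: dict mapping a hypothetical action ID to the time the action
                was propagated on the Primary.
    last_replication: time of the last successful Primary-to-Secondary replication.
    """
    # Anything propagated after the last replication never reached the Secondary.
    return sorted(aid for aid, when in propagated.items() if when > last_replication)

propagated = {
    101: datetime(2014, 5, 1, 9, 0),
    102: datetime(2014, 5, 1, 9, 20),
    103: datetime(2014, 5, 1, 9, 45),
}
last_replication = datetime(2014, 5, 1, 9, 30)
print(actions_missing_on_secondary(propagated, last_replication))  # [103]
```

Actions returned by a check like this are the ones that would need to be taken again from the Secondary.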
In all cases, clients continue to be provided an actionsite (containing open actions) for gathering, and you can continue to manage the deployment and take new actions from the Secondary server. Note: Action completion time might appear slower. For more information, see Completion time of action taken from Replication Server.
The term 'high availability' refers to disaster or event recovery that is immediate, and in that sense a secondary DSA server is always immediately available for managing the deployment. However, the actionsite and content data replicated between the Primary and Secondary servers should not be considered 'highly available', because it depends on a periodic database replication process rather than on real-time, load-balanced, concurrent data processing.
A DSA deployment has the following requirements:
- You must choose one authentication mechanism (either NT Authenticated Domain Users/Groups or SQL Authentication). All servers must use the same authentication mechanism.
- The DSA servers should have roughly similar performance characteristics in terms of CPU, memory, disk, and overall system performance; otherwise, performance will suffer after a failover.
- The DSA servers must all have the same version of SQL Server installed (either SQL Server 2000 or SQL Server 2005).
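The requirements above can be checked before installation with a short sketch like the following. The `DsaServer` record, its fields, and `check_dsa_servers` are hypothetical (you would fill them in by hand or from your own inventory), and `min_ratio=0.5` is an arbitrary stand-in for "roughly similar" that the text does not define.

```python
from dataclasses import dataclass

@dataclass
class DsaServer:
    name: str
    auth_mode: str     # "NT" or "SQL"
    sql_version: str   # e.g. "SQL Server 2005"
    cpu_cores: int
    memory_gb: int

def check_dsa_servers(servers, min_ratio=0.5):
    """Return a list of problems; an empty list means none were detected.

    min_ratio is an arbitrary threshold: the weakest server must have at
    least this fraction of the strongest server's CPU cores and memory.
    """
    problems = []
    if not servers:
        return problems
    if len({s.auth_mode for s in servers}) > 1:
        problems.append("servers use different authentication mechanisms")
    if len({s.sql_version for s in servers}) > 1:
        problems.append("servers run different SQL Server versions")
    for attr in ("cpu_cores", "memory_gb"):
        values = [getattr(s, attr) for s in servers]
        if min(values) < min_ratio * max(values):
            problems.append(f"{attr} differs widely ({min(values)} vs {max(values)})")
    return problems

servers = [
    DsaServer("primary", "NT", "SQL Server 2005", 16, 64),
    DsaServer("secondary", "NT", "SQL Server 2005", 4, 16),
]
print(check_dsa_servers(servers))
# ['cpu_cores differs widely (4 vs 16)', 'memory_gb differs widely (16 vs 64)']
```

Disk and overall system performance are left out because they do not reduce to a single comparable number; extend the record if you have a metric you trust.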
Verify and Manage DSA Replication
The Replication tab of the TEM Admin tool is the only way to properly verify successful replication between servers. The tool reports important information such as Server, Distance, Expected Latency, Last Replication Time, and Last Error Message, all of which can help you troubleshoot replication issues.
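If you record the Replication tab values over time, a sketch like this one can flag servers that need attention. The `ReplicationRow` record and `servers_needing_attention` are hypothetical; the Admin tool does not export data in this form, and the `slack=2.0` margin (how far past Expected Latency counts as overdue) is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ReplicationRow:
    server: str
    distance: int                  # recorded for reference; not used in the check
    expected_latency: timedelta
    last_replication: datetime
    last_error: str = ""

def servers_needing_attention(rows, now, slack=2.0):
    """Flag rows that report an error or whose last replication is overdue.

    A row is overdue when more than slack * expected_latency has passed
    since its last replication.
    """
    flagged = []
    for row in rows:
        overdue = now - row.last_replication > slack * row.expected_latency
        if overdue or row.last_error:
            flagged.append(row.server)
    return flagged

now = datetime(2014, 5, 1, 12, 0)
rows = [
    ReplicationRow("server-a", 1, timedelta(minutes=5), datetime(2014, 5, 1, 11, 58)),
    ReplicationRow("server-b", 1, timedelta(minutes=5), datetime(2014, 5, 1, 11, 30)),
    ReplicationRow("server-c", 2, timedelta(minutes=5), datetime(2014, 5, 1, 11, 59),
                   "Replication was interrupted to process server database insertions."),
]
print(servers_needing_attention(rows, now))  # ['server-b', 'server-c']
```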
If you suspect a replication error, you can troubleshoot further by reviewing the Filldb.log file, which is located by default in the following folder: C:\Program Files\BigFix Enterprise\BES Server\FillDBData
Note: Initial replication can take a significant amount of time, depending on the size of the database and the latency between servers.
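A quick way to review the log is to pull out the most recent lines that mention an error. This is a minimal sketch that assumes only that the log is a text file; it does not rely on any particular log format, and `find_log_lines`, the `needle` default, and the UTF-8 encoding are assumptions rather than documented behavior.

```python
from collections import deque
from pathlib import Path

# Default location and file name as given in the text above.
DEFAULT_FILLDB_LOG = Path(r"C:\Program Files\BigFix Enterprise\BES Server\FillDBData") / "Filldb.log"

def find_log_lines(log_path=DEFAULT_FILLDB_LOG, needle="error", limit=20):
    """Return the last `limit` lines containing `needle` (case-insensitive)."""
    recent = deque(maxlen=limit)  # keeps only the newest matches
    # errors="replace" keeps the scan going if the log has unexpected bytes.
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            if needle.lower() in line.lower():
                recent.append(line.rstrip("\r\n"))
    return list(recent)
```

On the server, `find_log_lines()` scans the default log; pass `needle="Replication was interrupted"` to look for the specific message discussed in the next topic.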
Increasing the replication interval for DSA
If you are using Distributed Server Architecture (DSA) and replication fails with the error message 'Replication was interrupted to process server database insertions.' in the IBM Endpoint Manager Administration tool, raise the maximum amount of time spent on replication on the TEM Server that is failing.
To increase the maximum replication time, set the following registry value on the TEM Server:
UnInterruptableReplicationSeconds (DWORD): <number of seconds>
Note: You must restart the FillDB service for changes to take effect.
The error occurs because the TEM Server cannot complete replication within the default time limit. Raising the value lets the TEM Server spend more time on replication each time it replicates, as scheduled by the replication interval.
For larger deployments of TEM, try a value of 60-120 seconds. If you are installing a new TEM Server, you might raise the value to 300-600 seconds during the initial replication period to reduce the amount of time spent initializing the new TEM Server.
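The guidance above can be captured in a small helper, together with a sketch of writing the value with Python's standard `winreg` module. Both function names are hypothetical. The registry key path is deliberately a caller-supplied parameter because the text does not give it; confirm the correct path for your installation before using this, and restart the FillDB service afterwards as noted above.

```python
VALUE_NAME = "UnInterruptableReplicationSeconds"

def suggested_range(large_deployment, initial_replication):
    """Return a (low, high) range in seconds from the guidance in the text,
    or None when the text suggests no change."""
    if initial_replication:
        return (300, 600)  # temporary, while a new TEM Server initializes
    if large_deployment:
        return (60, 120)
    return None

def set_uninterruptable_seconds(key_path, seconds):
    """Write the DWORD value under HKEY_LOCAL_MACHINE\\<key_path>.

    key_path is an assumption supplied by the caller. Requires Windows and
    administrator rights; FillDB must be restarted for the change to apply.
    """
    if not isinstance(seconds, int) or not 0 < seconds <= 0xFFFFFFFF:
        raise ValueError("seconds must be a positive integer that fits in a DWORD")
    import winreg  # standard library, available on Windows only
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, seconds)
```

The validation runs before `winreg` is imported, so a bad value is rejected on any platform before anything touches the registry.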
Caution: Increasing UnInterruptableReplicationSeconds gives the FillDB process more time for DSA replication, but it takes time away from inserting client reports into the database. If UnInterruptableReplicationSeconds is set too high for too long, client reports can back up on the relays and servers and cause a cascading reporting failure across the deployment. If this happens, clients will appear offline and reports from the endpoints will be delayed.
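The trade-off in the caution can be seen in a deliberately simplified toy model. It assumes FillDB works in fixed cycles, spends a fixed number of seconds per cycle on replication, and inserts reports at a constant rate for the rest of the cycle. That is not how FillDB actually schedules work; all parameter names and numbers are illustrative. Once replication time leaves too little insertion capacity, the report backlog grows every cycle.

```python
def report_backlog(cycles, cycle_seconds, replication_seconds,
                   reports_per_cycle, inserts_per_second):
    """Return the unprocessed report backlog after each cycle (toy model)."""
    insert_time = max(cycle_seconds - replication_seconds, 0)
    capacity = insert_time * inserts_per_second  # reports insertable per cycle
    backlog = 0
    history = []
    for _ in range(cycles):
        # Unprocessed reports carry over to the next cycle.
        backlog = max(backlog + reports_per_cycle - capacity, 0)
        history.append(backlog)
    return history

print(report_backlog(3, 300, 60, 2000, 10))   # [0, 0, 0]
print(report_backlog(3, 300, 150, 2000, 10))  # [500, 1000, 1500]
```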
Completion time of action taken from Replication Server
The completion time of an iOS action might appear slower if the device is connected to a replication server instead of the master server.