Distributed Server Architecture
IBM Knowledge Center Documentation: 9.5
IBM BigFix Distributed Server Architecture (DSA) lets you install multiple servers that replicate information from each other for disaster recovery. When DSA is configured and one server fails, the other servers automatically take over as fully functional servers: they receive data from the relays and clients and accept console connections. When the failed server is restored, it automatically receives the updated information.
The health of a DSA deployment depends on the health and efficiency of the database replication process, which is facilitated by the FillDB service. If actions are successfully propagated (in the Console) and the database has successfully replicated (see the Replication tab in the BigFix Admin tool), actions run appropriately on all child endpoints.
However, if the Primary DSA server is disabled after action propagation but before successful database replication, the Secondary server will NOT have the newly propagated actions. In this case, children of the Secondary server are not provided the version of the actionsite that contains the changes made before replication completed, and the Secondary does not receive the new actions or the downloads associated with those actions. The desired actions must then be taken again from the Secondary.
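The replication gap described above can be pictured as a simple set difference. The following is only an illustrative sketch: the function name and the integer action IDs are hypothetical, and BigFix does not expose an interface like this. It shows which actions would have to be taken again from the Secondary if the Primary failed before replication finished.

```python
def actions_to_retake(propagated_on_primary, replicated_to_secondary):
    """Return the action IDs the Secondary never received, in propagation order.

    Both arguments are plain sequences of hypothetical action IDs; this is a
    conceptual model, not a BigFix API.
    """
    replicated = set(replicated_to_secondary)
    return [action for action in propagated_on_primary if action not in replicated]


# Actions 101 and 102 replicated before the Primary failed; 103 did not.
missing = actions_to_retake([101, 102, 103], [101, 102])
print(missing)  # prints [103]
```

Any ID returned here corresponds to an action that was propagated from the Primary but is unknown to the Secondary and its children.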
In all cases, clients will continue to be provided an actionsite (containing open actions) for gathering and you can continue to manage the deployment and take new actions from the Secondary server.
- Note: Action completion time might look slower. For more information, see Completion time of action taken from Replication Server.
The term 'high availability' refers to immediate recovery from a disaster or other event, and in that sense a secondary DSA server is immediately available for deployment management. However, the actionsite and content data replicated between the Primary and Secondary servers should not be considered 'high availability', because it depends on a database replication process rather than on a real-time, load-balanced, concurrent data process.
- The BigFix services on the DSA servers must all use the same database authentication method (for example, all NT Authenticated Domain Users/Groups or all SQL Authentication).
- The DSA servers must all run on the same OS server platform, type, and version (for example, all Windows 2012 or all RHEL 7.2).
- The DSA servers should have similar hardware configurations and performance characteristics in terms of CPU, memory, disk, and overall system performance. Otherwise, performance will suffer in the event of a failure, and replication may not work optimally or as expected.
- The DSA servers must all have the same type and version of the SQL Server or DB2 database installed (for example, all SQL Server 2014 or all DB2 10.5).
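The three "must match" requirements above can be checked mechanically before installing an additional server. The sketch below assumes you have recorded each server's properties by hand; the `DsaServer` class, its field names, and the server names are hypothetical and are not part of BigFix. Hardware similarity is a recommendation rather than a strict rule, so it is left out of the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DsaServer:
    """Hand-recorded properties of one DSA server (hypothetical structure)."""
    name: str
    db_auth: str      # for example "NT" or "SQL"
    os_platform: str  # for example "Windows 2012" or "RHEL 7.2"
    database: str     # for example "SQL Server 2014" or "DB2 10.5"


def find_mismatches(servers):
    """Return the names of the attributes that differ across the given servers."""
    mismatches = []
    for attr in ("db_auth", "os_platform", "database"):
        if len({getattr(server, attr) for server in servers}) > 1:
            mismatches.append(attr)
    return mismatches


primary = DsaServer("primary", "SQL", "Windows 2012", "SQL Server 2014")
secondary = DsaServer("secondary", "SQL", "Windows 2012", "DB2 10.5")
print(find_mismatches([primary, secondary]))  # prints ['database']
```

An empty result means the recorded values agree; it does not verify the servers themselves.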
Click the following links to go to the topic you are interested in:
- Authenticating Additional Servers (DSA)
- Installing Additional Windows Servers
- Installing Additional Linux Servers (DSA)
- Uninstalling a Windows replication server
- Uninstalling a Linux replication server
- Managing Replication (DSA) on Windows systems
- Managing Replication (DSA) on Linux systems
- Configuration for Relay Failover
- Message Level Encryption and DSA
Verify and Manage DSA Replication
The Replication tab of the BigFix Admin tool is the only user interface in BigFix that verifies whether replication is proceeding successfully between the Primary and Secondary servers. The tool reports Server, Distance, Expected Latency, Last Replication Time, and Last Error Message, each of which can be used to troubleshoot potential issues. Additional messages can be found in the FillDB.log files located in the ...\FillDBData folder (Windows: \Program Files (x86)\BigFix Enterprise\BES Server; RHEL: /var/opt/BESServer).
- Note: Be patient. The initial replication takes time, depending on the size of the database and the network latency between the DSA servers.
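When reviewing FillDB.log alongside the Replication tab, it can help to pull out only the lines that mention replication. The sketch below makes no assumption about the log's line format: it does a plain case-insensitive substring match, and the function name is hypothetical. The log path is supplied by the caller because the exact location depends on your installation.

```python
def replication_lines(lines, keyword="replication"):
    """Return the lines that contain the keyword, ignoring case.

    `lines` is any iterable of strings, such as an open text file.
    """
    needle = keyword.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


# Usage with a real file (path supplied by you):
#     with open(log_path, encoding="utf-8", errors="replace") as log_file:
#         for line in replication_lines(log_file):
#             print(line)

# Made-up sample lines, for illustration only:
sample = ["service started\n", "Replication was interrupted\n", "done\n"]
print(replication_lines(sample))  # prints ['Replication was interrupted']
```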
The DSA Replication Interval
If you are using Distributed Server Architecture (DSA) and replication is failing with the error message 'Replication was interrupted to process server database insertions.' in the BigFix Admin tool, raise the maximum amount of time spent on replication on the BigFix Server that is failing.
To increase the maximum replication time, set the following server setting on the BigFix DSA servers.
UnInterruptableReplicationSeconds (DWORD): Seconds
Note: You must restart the FillDB service for changes to take effect.
Raising the value lets the BigFix Server spend more time on replication each time it attempts it, as determined by the replication interval. The error occurs because the BigFix Server cannot complete replication within the default value.
For larger deployments of BigFix, try a value of 60-120 seconds. If you are installing a new BigFix Server, you might raise the value to 300-600 seconds during the initial replication period to reduce the time spent initializing the new server.
- Caution: Increasing UnInterruptableReplicationSeconds extends the time given to the FillDB process for DSA replication. However, it takes time away from FillDB for inserting client reports into the database. If UnInterruptableReplicationSeconds is set too high for too long, client reports can back up on the relays and servers and cause a cascading reporting failure in the deployment. If this happens, clients appear offline and reports from the endpoints are delayed.
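The sizing guidance above can be condensed into a small lookup. This is only a sketch of the ranges quoted in this section: the function name and its two flags are hypothetical, and it does not read or write the actual server setting.

```python
def suggested_replication_seconds(large_deployment=False, initial_replication=False):
    """Return a (low, high) range in seconds for UnInterruptableReplicationSeconds.

    Returns None when neither case applies, meaning the default can stay.
    The higher initial-replication range is meant to be temporary, because
    values that stay too high delay client report processing.
    """
    if initial_replication:
        return (300, 600)
    if large_deployment:
        return (60, 120)
    return None


print(suggested_replication_seconds(large_deployment=True))      # prints (60, 120)
print(suggested_replication_seconds(initial_replication=True))   # prints (300, 600)
```

After the initial replication completes, lower the value again and restart the FillDB service so that the change takes effect.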
It might seem that the completion time of an iOS action is slower if the device is connected to a replication server instead of the master server.