Estimating internal storage space

If you plan to store data locally in your IBM® API Connect analytics deployment, estimate disk space requirements.

This information applies only to data that is stored in API Connect analytics and not to data that is offloaded to a third-party system. The formula and guidelines are based on known information about how the analytics service stores data, and cannot be directly applied to any other storage system.

Use the following guidelines to calculate a rough estimate of the amount of disk space you need for storing stateful data in the API Connect analytics microservices.

Data storage: calculate how much data you want to store

The amount of disk space needed for storing analytics data is determined by the following factors:

Number of copies of the analytics data
Each copy of the analytics data must reside on a separate node. The number of nodes in your deployment (based on the profile you chose in Planning the analytics profile, storage class, and storage type) determines the number of copies of analytics data that are stored. With the three replica deployment profile, analytics data is stored on three nodes (one primary copy and two replicas). With the one replica profile, only one copy of the data is stored.

Number of copies of data:

Copies of data = [1 | 3]
Number of days that data is retained
By default, analytics data is retained for 90 days, but you can modify the retention setting as needed. Make sure you know how long you want to store the data before attempting to calculate required disk space.

Number of days that data is retained:

Data retention = [90 days | preferred length]
Amount of each type of data stored
The required storage space for each type of logging is highly dependent on your APIs and usage. For each API, you can configure activity logging for the API activity, header, and payload. You can also customize the data to add, redact, or remove fields, which also impacts the amount of data that you store.

To estimate storage needs, calculate the average size of each type of log. When calculating your estimates, remember that the header logging size is the sum of activity logging size and the average size of your headers. The payload logging size is the sum of the header logging size and the average size of your payloads. Typically the average size of an activity logged event is 600-1000 bytes depending on the uniqueness and complexity of your analytics data. This number is highly dependent on your APIs and your implementations. For a rough estimate, you can use an average of 800 bytes per activity logged event.

If you choose to add fields, calculate the average size of the new fields as well, and add that number to all types of log policies. If you choose to remove fields, you should not subtract this size from the log policies unless you are also removing headers and/or payloads.

Amount of each type of data that is stored:

Activity log bytes per call = [600-1000 bytes]
Header log bytes per call = Activity log bytes per call + Average size of headers
Payload log bytes per call = Header log bytes per call + Average size of payloads
Percentage of each type of data
Estimate the percentage of each type of log (activity, header, payload) for all API calls.

If you follow the best practice of using only activity logging for production environments and only payload logging for test environments, this number is easy to determine. If you use different log policies per API, and they depend on "success" or "error" factors, the percentage is more difficult to determine. Typically, if you do not use an all-or-nothing method for logging, error rates (and therefore the share of calls that trigger payload logging) range from 3% to 25% in test and production environments. However, this is entirely dependent on your use case and your APIs.

Percentage of each type of data that is stored:

% of Activity log = [0, 100 or other estimate]
% of Header log = [0, 100 or other estimate]
% of Payload log = [0, 100 or other estimate]
Estimated number of API calls per month
When planning your API Connect deployment, this number is helpful. When estimating analytics storage, this number is vital because it is directly correlated to how much storage you need for your deployment.

If you do not know the number of calls per month, but you do know the number of calls per second, use the following formula to convert it to calls per month:

Calls per month = Calls per second * 86400 seconds per day * 30 days per month

Number of API calls per month:

Calls per month = [any estimate]
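
The input conversions described above can be sketched in Python. This is a minimal illustration, not part of the product: the function names are hypothetical, and the header and payload sizes in the usage lines are placeholder values that you would replace with measurements from your own APIs.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # the 30-day month used by the conversion formula


def calls_per_month(calls_per_second: float) -> float:
    """Convert a sustained calls-per-second rate to calls per month."""
    return calls_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH


def log_bytes_per_call(activity_bytes: float,
                       avg_header_bytes: float,
                       avg_payload_bytes: float) -> dict:
    """Return the per-call size of each log type.

    Header logging includes the activity data, and payload logging
    includes the header data, so the sizes are cumulative.
    """
    header = activity_bytes + avg_header_bytes
    payload = header + avg_payload_bytes
    return {"activity": activity_bytes, "header": header, "payload": payload}


# Placeholder inputs (assumptions for illustration only).
monthly_calls = calls_per_month(25)
sizes = log_bytes_per_call(activity_bytes=800,
                           avg_header_bytes=2_000,
                           avg_payload_bytes=10_000)
print(monthly_calls, sizes)
```

If you added custom fields, add their average size to the activity value before calling the helper, so that it carries through to all three log types.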

Calculating your disk space requirement

Estimate the disk space requirement for each storage node by completing the following calculations.

Formula
Bytes per call = (% of Activity Log * Activity Log bytes per call) + (% of Header Log * Header Log bytes per call) + (% of Payload Log * Payload Log bytes per call)
Calls per retention period = Calls per month * (Number of days retained / 30 days per month)
Storage for API calls = Calls per retention period * Bytes per call * Copies of data
Total storage = Storage for API calls + Buffer
Storage per node = Total storage / Nodes
Details
  1. Bytes per call:

    Estimate the number of bytes that are logged for a single copy of each API call. This can be calculated from the prerequisites of the percentage of each data type and the amount of each data type stored.

    Bytes per call = (% of Activity Log * Activity Log bytes per call) + (% of Header Log * Header Log bytes per call) + (% of Payload Log * Payload Log bytes per call)
    
  2. Calls per retention period:

    Calculate the anticipated number of API calls per retention period. This can be calculated from the prerequisites of the estimated calls per month and your desired data retention.

    Calls per retention period = Calls per month * (Number of days retained / 30 days per month)
    
  3. Storage for API Calls:

    Calculate the storage needed for all copies of your API calls for your retention period. This can be calculated from steps 1 and 2, as well as the prerequisite of your total copies of your data.

    Storage for API calls = Calls per retention period * Bytes per call * Copies of data
    
  4. Total Storage:

    Calculate the total storage you need by adding buffer space to the value from step 3. The analytics service requires some overhead space to complete its operations; for example, to contain system data and temporary debug header or payload logging. In addition, allowing extra space provides a buffer in case you underestimated the number of calls, or experience an unexpected increase.

    There is no specific value for the buffer because it depends on your own situation. One approach is to round the Storage for API calls result from step 3 up to the next round number. Make sure the rounding leaves you with a comfortable amount of additional space. For example, if the result from step 3 is 380 GB, then adding 20 GB to reach 400 GB is probably not sufficient and you should consider rounding to a larger value such as 500 GB.

    Total storage = Storage for API calls + Buffer
    
  5. Storage per node:

    Calculate the total storage amount required per storage node. The number of storage nodes is dependent on your deployment profile. For the one replica profile, use 1. For the three replica profile, it defaults to 3. If you manually scaled the storage microservices to be greater than 3, use your actual values.

    Storage per node = Total storage / Nodes
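
The five steps can be combined into a short Python sketch. The class and function names are hypothetical, percentages are passed as fractions between 0 and 1, and the buffer is an explicit input because the guidance above leaves its size to your judgment. The usage lines at the end use the deployment values from the Example in this topic.

```python
from dataclasses import dataclass


@dataclass
class LoggingProfile:
    """Per-call log sizes in bytes and the share of calls using each log type."""
    activity_bytes: float
    header_bytes: float
    payload_bytes: float
    activity_share: float  # fraction of calls, 0.0 to 1.0
    header_share: float
    payload_share: float

    def bytes_per_call(self) -> float:
        # Step 1: weighted average size of one logged call.
        return (self.activity_share * self.activity_bytes
                + self.header_share * self.header_bytes
                + self.payload_share * self.payload_bytes)


def calls_per_retention_period(calls_per_month: float, days_retained: float) -> float:
    # Step 2: scale monthly calls to the retention period (30-day months).
    return calls_per_month * (days_retained / 30)


def storage_for_api_calls(profile: LoggingProfile, calls_per_month: float,
                          days_retained: float, copies: int) -> float:
    # Step 3: bytes needed for all copies of the data.
    calls = calls_per_retention_period(calls_per_month, days_retained)
    return calls * profile.bytes_per_call() * copies


def storage_per_node(storage_bytes: float, buffer_bytes: float, nodes: int) -> float:
    # Steps 4 and 5: add the buffer, then divide across the storage nodes.
    return (storage_bytes + buffer_bytes) / nodes


# Usage with the values from the Example in this topic.
profile = LoggingProfile(850, 15_000, 30_500, 1.0, 0.0, 0.0)
needed = storage_for_api_calls(profile, 64_000_000, 90, copies=3)
per_node = storage_per_node(needed, buffer_bytes=110.4e9, nodes=3)
print(needed / 1e9, "GB for API calls;", per_node / 1e9, "GB per node")
```

The sketch works in bytes and converts to decimal gigabytes (10^9 bytes) only for display, which matches the arithmetic in the Example.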
    
Remember: This result is only an estimate. You should monitor the use of space over time and adjust storage as needed.

Save this information for when you are updating the analytics CR for your installation. For more information, see Kubernetes - Creating the analytics CR and OpenShift - Analytics CR settings.

Example
Deployment information:
Copies of data = 3
Data retention = 90 days
Activity log bytes per call = 850 bytes
Header log bytes per call = 15k bytes
Payload log bytes per call = 30.5k bytes
% of Activity log = 100%
% of Header log = 0%
% of Payload log = 0%
Calls per month = 64 million

Formula:

Bytes per call = (% of Activity Log * Activity Log bytes per call) + (% of Header Log * Header Log bytes per call) + (% of Payload Log * Payload Log bytes per call)
Calls per retention period = Calls per month * (Number of days retained / 30 days per month)
Storage for API calls = Calls per retention period * Bytes per call * Copies of data
Total storage = Storage for API calls + Buffer
Storage per node = Total storage / Nodes

Details:

  1. Bytes per call = 850 bytes
    (100% of Activity Log * 850 bytes) + (0% of Header Log * 15k bytes) + (0% of Payload Log * 30.5k bytes)
    
  2. Calls per retention period = 192 million calls
    64 million calls per month * (90 days retained / 30 days per month)
    
  3. Storage for API calls = 489.6 GB
    192 million calls per period * 850 bytes per call * 3 copies of data
    
  4. Total storage = 600 GB
    489.6 GB storage + 110.4 GB buffer

    Since rounding to 500 GB only provides 10.4 GB of extra space, consider rounding to 600 GB instead.

  5. Storage per node = 200 GB
    600 GB total storage / 3 nodes
    

In this example, the estimated disk space needed on each node for the storage microservices is 200 GB.
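
The rounding approach used for the buffer in step 4 can also be expressed as a small Python helper. This is only one possible rule, not a product recommendation: the 100 GB step and the 10% minimum headroom are assumptions chosen here because they reproduce the two rounding decisions described in this topic (380 GB to 500 GB, and 489.6 GB to 600 GB). Choose values that suit your own growth expectations.

```python
import math


def total_with_buffer(storage_gb: float,
                      step_gb: float = 100.0,          # assumed rounding step
                      min_headroom: float = 0.10) -> float:  # assumed minimum buffer share
    """Round storage up to a multiple of step_gb, leaving at least
    min_headroom (as a fraction of storage_gb) of extra space."""
    total = math.ceil(storage_gb / step_gb) * step_gb
    # If the first round number leaves too little room, move to the next one.
    while total - storage_gb < min_headroom * storage_gb:
        total += step_gb
    return total


print(total_with_buffer(380.0))   # 400 leaves only 20 GB, so the helper moves to 500
print(total_with_buffer(489.6))   # 500 leaves only 10.4 GB, so the helper moves to 600
```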