SevOne Cluster Sizing Methodology
In this guide, any reference to master, including in a CLI command or its output, means leader.
Likewise, any reference to slave means follower.
What is sizing and how are clusters sized?
- How do I size a new cluster?
- What are common ways to add additional capacity to my cluster?
- How do specific use-cases impact my available cluster capacity?
This guide is divided into two parts.
- Part# 1 - Determine IPS Requirements for the Cluster - provides a methodology for estimating the required Indicators per Second (IPS) and Flows per Second (FPS) of a cluster.
- Part# 2 - Distribute IPS and FPS across SevOne Appliances - provides a methodology for distributing the required IPS and FPS across SevOne appliances.
Basic Concepts
The SevOne platform architecture is a cluster of n-distributed peer appliances,
with one peer appliance elected as the Cluster Leader. The cluster leader is responsible
for federating the cluster data and insights to the presentation layer appliance for reporting,
analytics, alerting, and administration.
Example: SevOne Cluster
- vPAS - collects, stores, and analyzes raw and aggregate time-series metrics.
- vDNC - collects, stores, and analyzes large numbers of raw and aggregate flow records.
- vSDI - is an advanced presentation, analytics, and workflow appliance; vSDI sizing is out of scope for this guide.
- The Cluster Leader capacity must be equal to or larger than the largest capacity vPAS in the cluster. In large clusters, it is typically recommended that the Cluster Leader refrain from discovery or polling functions.
- A cluster may have a mix of peer types and sizes, but all peers must be on the same SevOne release.
- There is no hard limit to the number of peers within a cluster, allowing SevOne to scale horizontally as needed. Generally, only one Data Insight instance is required per cluster, regardless of the cluster's size, and a single instance may serve multiple independent clusters of vPAS and vDNC.
Part# 1 - Determine IPS Requirements for the Cluster
Clusters are sized according to the number of IPS polled and the number of FPS received. A cluster's total IPS and FPS capacity is the cumulative IPS and FPS of all its peers (Hot Standby Appliances, or HSAs, excluded). When sizing a cluster, it is important to know the cluster's IPS and FPS requirements and how those requirements can be distributed among one or more peers in the cluster.
IPS = ((Number of Objects) * (Average Number of Indicators per Object)) / (Polling Rate)
Determining IPS is not an exact science. Typical telecommunications or large enterprise environments are highly eclectic, with multiple domains and device types, resulting in a broad range of object types and associated indicators. This guide aims to serve as a methodology for sizing rather than an exact calculator.
For these reasons, sizing the initial deployment of a cluster in a previously unmonitored network is best done in conjunction with an experienced SevOne technical specialist. They have deep knowledge and experience of specific network vendors, devices, and configurations. That said, some of the largest contributors to the number of monitored objects and indicators in a cluster include:
- Routers come in various capacities and services. A large MPLS router may consume upwards of 25,000 objects (though far fewer in practice using rules-based collection filtering), resulting in 100,000 indicators or more, while a small office router may have only 100 objects, resulting in 1,000 indicators or fewer. After the total number of interfaces, MPLS segments (LSPs) are the most common major contributor to the number of monitored objects and indicators, as they can be monitored individually.
- Switches, like routers, vary in capacity and services, but unlike routers, they generally have a greater number of interfaces. Each interface maps to an object type with 20 or more indicators, depending on the vendor and switch configuration. Additional services, such as QoS, can greatly increase the number of objects and resulting indicators depending on the number of classmaps, policies, and QoS-enabled interfaces. A large switch may have 1,000 or more objects, resulting in 20,000 or more indicators.
- Hosts include physical hosts and virtual machines. In the most common cases, they range from 50 to 200 objects, resulting in 1,000 or more indicators.
Example
Using the IPS formula above and assuming 20,000,000 indicators with a default polling period of 5 minutes (300 seconds), a cluster’s overall IPS requirement can be calculated.
IPS = 20,000,000 indicators / 300 seconds = 66,667 IPS
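The formula and the example above can be expressed as a short Python sketch. The function names `total_indicators` and `required_ips` are illustrative only and are not part of any SevOne tooling.

```python
def total_indicators(objects: int, avg_indicators_per_object: float) -> float:
    """Total indicators = number of objects * average indicators per object."""
    return objects * avg_indicators_per_object


def required_ips(indicators: float, polling_period_s: int = 300) -> float:
    """Indicators per second for a given indicator count and polling period."""
    if polling_period_s <= 0:
        raise ValueError("polling period must be positive")
    return indicators / polling_period_s


# The example from the text: 20,000,000 indicators polled every 300 seconds.
print(round(required_ips(20_000_000)))  # 66667
```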
Estimating the number of objects for a device in existing clusters is usually easier as there are often representative devices that can be used to extrapolate to the total number of objects and indicators.
In all cases, it is easiest to iterate to a final answer by using a small percentage of representative devices of each device type to be monitored to extrapolate an estimate of the total number of objects and indicators.
Every network is unique and the average number of indicators per object and the number of objects may vary greatly across environments. It is normal to be uncertain about the number of objects (e.g., interfaces, MPLS paths, QoS queues, CPUs) and IPS, and it is best to iterate towards a final result. A good practice is to first record the current number of objects and IPS for the cluster, then add a small share (say 10-20%) of the devices of a representative type and note the increase in the number of objects and IPS after several successful polls have elapsed. The total number of objects and IPS can then be extrapolated using the total number of devices of that type.
The cluster's object count and IPS can be monitored on a per-peer basis by navigating to Administration > Cluster Manager > Peers tab. For example, if you have 10 core MPLS routers (of similar make and configuration) and adding 2 of them to the cluster consumes 20,000 objects and 1,334 IPS, then a good estimate for all 10 core MPLS routers in such an environment is approximately 100,000 objects and 6,667 IPS.
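The extrapolation described above is a simple linear scaling, sketched below. The function name `extrapolate` is hypothetical, and the sketch assumes that the remaining devices resemble the sampled ones.

```python
def extrapolate(sample_objects: float, sample_ips: float,
                sample_devices: int, total_devices: int) -> tuple[float, float]:
    """Scale the object count and IPS observed for a sample of devices
    up to the full population of that device type."""
    if sample_devices <= 0:
        raise ValueError("sample must contain at least one device")
    scale = total_devices / sample_devices
    return sample_objects * scale, sample_ips * scale


# 2 of 10 core MPLS routers consumed 20,000 objects and 1,334 IPS.
objects, ips = extrapolate(20_000, 1_334, sample_devices=2, total_devices=10)
print(objects, ips)
```

With these inputs the sketch yields about 100,000 objects and about 6,670 IPS. The small difference from the 6,667 IPS quoted above is presumably due to rounding of the sample figure.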
Determine FPS Requirements for the Cluster
The number of flows generated and exported to the vDNC by a flow-capable device is highly dependent on the volume of traffic routed by that device and the specific settings of its flow-enabled interfaces, such as the sampling rate. This results in large variations of flow volume across networks, which can make estimating the total number of flows difficult.
That said, the most common way to calculate the total number of flows is also the most accurate: retrieve the flow rate and the number of flow interfaces directly from the device itself. To calculate the rate of flows exported, read the device's count of exported flows at two points in time. The table below contains the commands to retrieve the current flow count for the most common network vendors and models.
Manufacturer | Operating System | Command for Instantaneous Flow Count |
---|---|---|
Cisco | IOS | show ip flow export |
Cisco | IOS XR | show flow exporter fem1 location 0/0/CPU0 |
Cisco | NX-OS | show flow export |
Juniper | EX, MX | show services accounting flow |
Arista, Brocade, Foundry | Arista/Network EOS | show flow |
Nokia | SR OS | show cflowd collector |
To calculate the average flow rate over a period of time T1 - T0 seconds,
- Execute the command at T0 and take note of the current flow count, F0.
- Execute the command at T1 and take note of the current flow count, F1.
- The average flow rate for that period can be calculated as,
FPS = (F1 - F0) / (T1 - T0)
The sum of the rate of all flows across all flow-capable devices configured to export to the vDNC is the maximum possible total FPS that a vDNC would be required to process (i.e., the worst-case scenario). The choice of where in the network to collect exported flow records is dependent on flow export and aggregation. Please contact IBM SevOne Support Team for any specific questions regarding flow collection.
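The two-sample rate calculation and the worst-case sum described above can be sketched as follows. The device names and counter values are made up for illustration, and the sketch does not handle a counter that resets between the two readings.

```python
def average_fps(f0: int, t0: float, f1: int, t1: float) -> float:
    """Average flows per second between two readings of the flow counter."""
    if t1 <= t0:
        raise ValueError("T1 must be later than T0")
    return (f1 - f0) / (t1 - t0)


# Hypothetical readings: (F0, T0, F1, T1), times in seconds.
readings = {
    "router-a": (1_200_000, 0.0, 1_500_000, 300.0),
    "router-b": (400_000, 0.0, 460_000, 300.0),
}

per_device = {name: average_fps(*r) for name, r in readings.items()}

# Worst case: every exported flow is processed by the vDNC.
worst_case_fps = sum(per_device.values())
print(per_device, worst_case_fps)
```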
In practice, the number of flows (FPS) ultimately processed by the vDNC is nearly always refined
by the policy settings provided by the vDNC, such as allowing or disallowing the processing of
specific flow-enabled interfaces, as seen below.
Number of Flow Interfaces and Flows per Second on a device and interface level; Granular policy for flow-enabled interface
Part# 2 - Distribute IPS and FPS across SevOne Appliances
Determine the number of appliances for a cluster
There are two ways to add capacity to a cluster.
- horizontally (typical and most straightforward)
- vertically (less common, more involved)
Peers are delivered as virtual appliances containing a single Virtual Machine. They are available in several discrete sizes, each with its associated resource requirements.
The following table lists the tested maximum IPS and FPS for each appliance type.
Appliance Type | vCPU Cores | RAM (GB) | Hard Drives | Flow Limit (FPS) | Max Indicators per Second (IPS) |
---|---|---|---|---|---|
vPAS5k | 2 | 8 | 150GB | - | 333 |
vPAS20k | 8 | 24 | 600GB | - | 1,333 |
vPAS60k | 8 | 44 | 150GB/1.3TB | - | 4,000 |
vPAS100k | 8 | 96 (higher demands, for example xStats, may require more memory) | 500GB/2TB | - | 6,666 |
vPAS200k | 16 | 220 | 600GB/4TB | - | 13,333 |
vDNC100 | 8 | 16 | 150GB/400GB | 30,000 | - |
vDNC300 | 16 | 48 | 150GB/800GB | 80,000 | - |
vDNC1000 | 24 | 96 | 150GB/1500GB | 80,000 | - |
vDNC1500 | 24 | 128 | 150GB/3000GB | 80,000 | - |
Determine required number of vPAS for a cluster
IPS is impacted by the number of polled objects, the number of indicators monitored on that
object, and the polling rate. SevOne calculates the maximum IPS of an appliance in the following
manner.
IPS = ((Number of Objects) * (Average Number of Indicators per Object)) / (Polling Rate)
Based on the formula above, for a fixed maximum IPS the appliance's object capacity is
directly proportional to the polling period (the Polling Rate in the formula, in seconds) and
inversely proportional to the average number of indicators per object. For example, halving the
polling period will halve the appliance's object capacity. Doubling the average number of
indicators per object will also halve the appliance's object capacity.
To determine the maximum object capacity of an appliance, SevOne assumes an average of 20 indicators per object and a default polling rate of 300 seconds (5 minutes). Because each appliance has a maximum IPS, the maximum number of objects it can monitor follows from these assumptions. For example, a vPAS100K, with a maximum IPS of 6,667, supports a maximum of approximately 100,000 objects.
Example: How to calculate max object count for appliances based on the appliance's maximum IPS.
Number of Objects (max) = (IPS * Polling Rate) / (Average Number of Indicators per Object)
Max Objects for vPAS100K: (6,667 IPS * 300) / 20 ≈ 100,000 Objects (max)
Max Objects for vPAS200K: (13,334 IPS * 300) / 20 ≈ 200,000 Objects (max)
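As a rough illustration of the relationship above, the sketch below computes object capacity from a maximum IPS and shows the effect of changing the polling period or the indicators per object. The function name `max_objects` is illustrative. The unrounded results differ slightly from the round object counts because the IPS figures used in the text are themselves rounded.

```python
def max_objects(max_ips: float, polling_period_s: int = 300,
                avg_indicators_per_object: float = 20) -> float:
    """Maximum object count supported at a given maximum IPS."""
    return max_ips * polling_period_s / avg_indicators_per_object


# Defaults: 20 indicators per object, 300-second polling period.
print(max_objects(6_667))    # 100005.0, i.e. about 100,000 objects
print(max_objects(13_334))   # 200010.0, i.e. about 200,000 objects

# Halving the polling period halves the object capacity.
print(max_objects(6_667, polling_period_s=150))           # 50002.5

# Doubling the indicators per object also halves it.
print(max_objects(6_667, avg_indicators_per_object=40))   # 50002.5
```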
Sections Determine IPS Requirements for the Cluster
and Determine FPS Requirements for the Cluster
describe a methodology for estimating a cluster's capacity requirements.
The appliance sizing above can now be used to distribute the IPS from Part# 1 across SevOne appliances.
In Part# 1, the example cluster was estimated at 20,000,000 indicators. Assume that,
upon investigation using the methodology in Part# 1, the average number of
indicators per object is found to be 40, and the network team has decided to use the default of
300 seconds for the polling period, which gives the 66,667 IPS calculated in Part# 1.
(66,667 IPS / 13,334 IPS per vPAS200K) = 5x vPAS200K
There are some cases where a smaller appliance is required. In such cases, 10 vPAS100K have the same capacity as 5 vPAS200K.
(66,667 IPS / 6,667 IPS per vPAS100K) = 10x vPAS100K
Most commonly, there is a mix of appliance sizes. For example, 3x vPAS200K and 4x vPAS100K would also satisfy the requirement.
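The appliance counts above come from dividing the required IPS by the per-appliance IPS and rounding up. The sketch below uses the rounded per-appliance figures from the calculations in this section (6,667 and 13,334). The function names are hypothetical.

```python
import math

# Per-appliance maximum IPS as used in the calculations above.
IPS_BY_TYPE = {"vPAS100K": 6_667, "vPAS200K": 13_334}


def appliances_needed(required_ips: float, ips_per_appliance: float) -> int:
    """Smallest whole number of identical appliances covering the required IPS."""
    return math.ceil(required_ips / ips_per_appliance)


def mix_capacity(mix: dict[str, int]) -> int:
    """Total IPS of a mix such as {"vPAS200K": 3, "vPAS100K": 4}."""
    return sum(count * IPS_BY_TYPE[name] for name, count in mix.items())


required = 66_667
print(appliances_needed(required, IPS_BY_TYPE["vPAS200K"]))  # 5
print(appliances_needed(required, IPS_BY_TYPE["vPAS100K"]))  # 10

# A mixed deployment: 3x vPAS200K and 4x vPAS100K.
print(mix_capacity({"vPAS200K": 3, "vPAS100K": 4}) >= required)  # True
```

Rounding up matters here: a requirement that is even slightly above a multiple of the per-appliance IPS calls for one more appliance.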
Architectural and administrative decisions may impact the choice of vPAS or vDNC sizes in your cluster. For example, there may be an administrative or architectural benefit to group polled devices by region, by tenant, by business unit, etc. SevOne's distributed platform maximizes the available deployment architecture options.
Non-standard polling rates or skewed object-to-indicator ratios
- Sizing Example #1: Non-standard polling rate
Assume the user has a vPAS100K with 60,000 objects polled at the standard 5-minute interval. There are a number of critical objects the user would like to poll more frequently to better observe microbursts of traffic. To do this, the user will poll 5,000 objects at a 1-minute interval while continuing to poll the remaining 55,000 objects at the standard interval. Will this appliance have enough capacity?
- (55K Objects * 20 Indicators per Object) / 300 seconds = (55,000 * 20) / 300 = 3,667 IPS (or check Cluster Master for actuals)
- (5K Objects * 20 Indicators per Object) / 60 seconds = (5,000 * 20) / 60 = 1,667 IPS
- 3,667 IPS + 1,667 IPS = 5,334 IPS total
Note: The user's vPAS100K, while only monitoring 60K objects out of a maximum of 100K, would be at 80% capacity (5,334 of 6,667 IPS).
- Sizing Example #2: Monitored object type has more than 20 indicators per object
There are many situations in which an object type has more than 20 indicators: RAN Cell monitoring, customized object types, synthetic indicators, custom adaptors, etc. Assume a vPAS200K with 100K monitored objects polled at the standard 5-minute interval. An administrator wants to add 55K objects, all with the same object type, for which the object type has been customized to include approximately 60 polled indicators. Will there be enough capacity for 55K objects on the vPAS200K?
- The maximum acceptable IPS for vPAS200K is 13,333.
- The user is currently using (100,000 * 20) / 300 = 6,667 IPS (or check Cluster Master for actuals).
- The user is adding (55,000 * 60) / 300 = 11,000 IPS.
- The total required IPS is 17,667, which is greater than the available 13,333. An additional 4,334 IPS is required, which calls for an additional vPAS appliance. The appropriate vPAS size depends on a combination of future polling requirements and resource availability.
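Both sizing examples follow the same pattern: compute the IPS of each group of objects separately, then add the results. A minimal sketch, with illustrative function names, is shown below. It sums unrounded values, so totals can differ by about 1 IPS from figures obtained by adding rounded per-group values.

```python
def group_ips(objects: int, indicators_per_object: float,
              polling_period_s: int) -> float:
    """IPS contributed by one group of objects sharing a polling period."""
    return objects * indicators_per_object / polling_period_s


def total_ips(groups: list[tuple[int, float, int]]) -> float:
    """Sum the IPS of (objects, indicators per object, period) groups."""
    return sum(group_ips(*g) for g in groups)


# Example #1: 55,000 objects at 300 s plus 5,000 objects at 60 s.
ips_1 = total_ips([(55_000, 20, 300), (5_000, 20, 60)])
utilization_1 = ips_1 / 6_667  # fraction of the vPAS100K maximum IPS
print(round(ips_1), round(utilization_1, 2))  # 5333 0.8

# Example #2: 100,000 existing objects plus 55,000 objects with 60 indicators.
ips_2 = total_ips([(100_000, 20, 300), (55_000, 60, 300)])
print(round(ips_2))  # 17667
```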
Determine required number of vDNC for a cluster
Assuming that the estimate from Part# 1 is approximately 800,000 flows per second generated across
11,000 flow-enabled interfaces, the number of vDNC appliances required can be determined.
From the table in section Determine the number of appliances for a cluster
above, notice that vDNC300, vDNC1000, and vDNC1500 all have FPS limits of 80,000. The 300, 1000, and 1500 denote
the maximum number of flow-enabled interfaces that can be processed by a vDNC.
With this knowledge, the simplest case would be to deploy and manage the fewest vDNCs required. Since
all three appliances handle the same FPS (80,000), the required number of vDNCs comes down to the
number of flow-enabled interfaces.
Appliance Type | Count | Max FPS | Max Interfaces |
---|---|---|---|
vDNC300 | 10 | 800,000 | 3,000 |
vDNC1000 | 10 | 800,000 | 10,000 |
vDNC1000 | 15 | 1,200,000 | 15,000 |
vDNC1500 | 10 | 800,000 | 15,000 |
The table above shows that 10 vDNC1500s have the capacity for 800,000 FPS and 15,000 flow-enabled interfaces, which covers the assumed 11,000 interfaces, whereas 10 vDNC300s or 10 vDNC1000s do not. However, 15 vDNC1000s can also be used for 15,000 interfaces, resulting in an additional 400,000 FPS (1,200,000 FPS total).
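The reasoning above treats each vDNC as having two limits, FPS and flow-enabled interfaces, and the appliance count must satisfy both. The sketch below computes the lower bound implied by those two limits alone. The function name is hypothetical, and other architectural considerations may justify deploying more appliances.

```python
import math

# (FPS limit, maximum flow-enabled interfaces) per appliance type,
# taken from the appliance table and the naming convention described above.
VDNC_LIMITS = {
    "vDNC300": (80_000, 300),
    "vDNC1000": (80_000, 1_000),
    "vDNC1500": (80_000, 1_500),
}


def vdncs_needed(required_fps: int, interfaces: int, appliance: str) -> int:
    """Fewest identical vDNCs that cover both the FPS and interface counts."""
    fps_limit, max_interfaces = VDNC_LIMITS[appliance]
    by_fps = math.ceil(required_fps / fps_limit)
    by_interfaces = math.ceil(interfaces / max_interfaces)
    return max(by_fps, by_interfaces)


for name in VDNC_LIMITS:
    print(name, vdncs_needed(800_000, 11_000, name))
# vDNC300 37
# vDNC1000 11
# vDNC1500 10
```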
SevOne NPM Data Retention
SevOne allows users to adjust the data retention for polled data on the Cluster Manager page > Cluster Settings tab > subsection Storage. This is a cluster-wide setting and is applied to all peers. When you adjust this setting, the following warning appears.
Please contact Expert Labs for sizing guidance before modifying data retention settings.
If you answer Yes to the warning without obtaining guidance from Expert Labs, you proceed at your own risk.
The max allowed retention is 730 days (2 years).
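Adjust Objects and IPS - one option is to reduce the number of objects and the maximum IPS per appliance to account for increased data retention.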
NMS Size | 12 Months (default): Objects | 12 Months (default): Max IPS | 18 Months: Objects | 18 Months: Max IPS | 24 Months: Objects | 24 Months: Max IPS |
---|---|---|---|---|---|---|
vPAS5k | 5,000 | 333 | 3,750 | 250 | 2,500 | 166 |
PAS10k | 10,000 | 666 | 7,500 | 500 | 5,000 | 333 |
PAS20k / vPAS20k | 20,000 | 1,333 | 15,000 | 1,000 | 10,000 | 666 |
PAS40k | 40,000 | 2,666 | 30,000 | 2,000 | 20,000 | 1,333 |
PAS60k / vPAS60k | 60,000 | 4,000 | 45,000 | 3,000 | 30,000 | 2,000 |
vPAS100k | 100,000 | 6,666 | 75,000 | 5,000 | 50,000 | 3,333 |
PAS200k / vPAS200k | 200,000 | 13,333 | 150,000 | 10,000 | 100,000 | 6,666 |
PAS300k | 300,000 | 20,000 | 225,000 | 15,000 | 150,000 | 10,000 |
where, IPS = Indicators per Second
Adjust Storage - another option is to increase storage to account for increased data retention.
NMS Size | Storage Size (12 Months, default) | Storage Size (18 Months) | Storage Size (24 Months) |
---|---|---|---|
vPAS5k | 150GB | 225GB | 300GB |
PAS20k / vPAS20k | 600GB | 900GB | 1.2TB |
PAS60k / vPAS60k | 1.3TB | 2TB | 2.6TB |
vPAS100k | 2TB | 3TB | 4TB |
PAS200k / vPAS200k | 4TB | 6TB | 8TB |
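The storage figures in the table above grow roughly in proportion to the retention period (for example, 600GB at 12 months becomes 900GB at 18 months). The sketch below captures that pattern. It is an observation about the table, not a documented SevOne formula, and the function name is hypothetical. Actual changes should follow Expert Labs guidance.

```python
def scaled_storage_gb(base_storage_gb: float, retention_months: int,
                      base_months: int = 12) -> float:
    """Estimate storage for a longer retention period by linear scaling
    from the default 12-month figure (assumption based on the table)."""
    if not 12 <= retention_months <= 24:
        raise ValueError("retention is expected to be between 12 and 24 months")
    return base_storage_gb * retention_months / base_months


print(scaled_storage_gb(600, 18))   # 900.0
print(scaled_storage_gb(150, 24))   # 300.0
```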