Performance graphs

Performance graphs are available for devices and vaults. You can view the system over six default time ranges or enter a custom time range.

To access the device graphs:

  1. Click the Monitor tab.
  2. Click Devices in the navigation panel.
  3. Select the Devices tab that you want. The performance graphs are generated.

To view graphs for an individual vault:

  1. Click Monitor on the Summary section of the vault that you want. To get to the page, click Vaults in the navigation panel and select the vault.
  2. Expand or hide the performance graphs as needed. Graphs are hidden by default.
    • Click the plus icon to expand the graph.
    • Click the minus icon on the graph's title to hide the graph.

Standard computer performance parameters are calculated by the device or collected from its system board and system-specific information (through SNMP, lm-sensors, and so on) for the Accesser® and Slicestor® devices. Axes for all graphs are scaled automatically depending on the time range that is selected. By default, six-hour graphs are shown. The time range can span from 5 minutes to one year. Select the time frame from the drop-down list under the Performance action bar, or change the To and From dates to an acceptable range.

Several basic capabilities exist to support further inspection and troubleshooting:

  • Select the icon in the upper right corner of the graph to expand it.
  • Click Export to download the data in .csv format. This data can be analyzed further with other tools.
  • Click Share to get a link to the current page and time range.
  • Click the device or vault links to go directly to their pages. These links are available only in some graphs, such as Scanned Sources and Outgoing/Incoming Rebuild.
  • Click Hide All or Show All to hide or show all lines.
  • Click the line in the legend to hide or show individual lines.
  • Click buttons that appear on hover to pan left or right through the data.
  • Click and drag on the graph to zoom in to the time range that is covered by the highlighted section.
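
As an illustration of the further analysis that the Export option makes possible, the following Python sketch summarizes each series in an exported file. The layout of the exported .csv file is not described here, so the sketch assumes a header row, a hypothetical `timestamp` column, and one numeric column per graph line. Adjust the column names to match the actual export.

```python
import csv
import io
from statistics import mean


def summarize_export(csv_text, time_column="timestamp"):
    """Return {series_name: (min, mean, max)} for each numeric column.

    The column layout is an assumption, not the documented export format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {}
    for row in reader:
        for name, raw in row.items():
            if name == time_column or raw in (None, ""):
                continue  # skip the time axis and empty cells (gaps)
            try:
                columns.setdefault(name, []).append(float(raw))
            except (TypeError, ValueError):
                continue  # ignore cells that are not numeric
    return {name: (min(v), mean(v), max(v)) for name, v in columns.items() if v}


# Hypothetical sample that stands in for an exported file.
sample = """timestamp,read,write
1700000000,100,50
1700000300,300,
1700000600,200,150
"""

summary = summarize_export(sample)
for series, (low, avg, high) in summary.items():
    print(f"{series}: min={low} mean={avg} max={high}")
```

Empty cells are skipped rather than treated as zero, so gaps in the data do not pull the averages down.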

Vertical crosshairs are synchronized across all graphs and can be set or cleared by clicking any graph.

A zoom in/out widget is available in the upper right corner of each graph to allow for quicker investigation into surrounding data points or diving into a cluster to get a better resolution.

When the graph time range is changed in any way, the date range picker and all graphs are synchronized to the new range, so that every graph displays the same time window. Click Go to refresh the data in the graphs for the current time reference.

Note: In some circumstances, particularly when a device is down, statistics are not reported correctly. This results in gaps or erroneous values in the performance graph for that device. When the device is back online, statistics are reported correctly.

In any graph, a gap can appear with different sizes in the last week, month, and year views because of differences in granularity. To assess a gap accurately, use a view of less than 6 hours.
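
When working with exported data rather than the graphs, gaps can also be located programmatically. The following Python sketch flags any pair of consecutive samples that are further apart than expected. The sampling interval of each view is not stated here, so the 300-second interval and the tolerance factor are assumptions chosen only for illustration.

```python
def find_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (start, end) pairs where consecutive samples are too far apart.

    expected_interval and tolerance are assumed values; the real sampling
    interval depends on the view and is not documented here.
    """
    ordered = sorted(timestamps)
    limit = expected_interval * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


# Hypothetical sample times in seconds; data is missing between 600 and 1800.
samples = [0, 300, 600, 1800, 2100]
gaps = find_gaps(samples, expected_interval=300)
print(gaps)
```

With coarser data, such as a yearly view, the same outage spans fewer and wider samples, which is why a short view gives a more accurate picture of the gap.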

Table 1. Available graphs by component

| Graph Type | Storage Pool | Vault | Accesser Device | Slicestor Device | Manager |
|---|---|---|---|---|---|
| Storage Pool Capacity and Usage | X |   |   |   |   |
| Raw Space Used |   | X |   |   |   |
| Total Number of Slices per Device |   | X |   |   |   |
| Aggregate Client-Accesser Throughput |   | X |   |   |   |
| Aggregate Accesser-Slicestor Throughput |   | X |   |   |   |
| Client-Accesser Throughput |   | X |   |   |   |
| Accesser-Slicestor Throughput |   | X |   |   |   |
| Scanning Rate (sources/sec) |   |   |   | X |   |
| Estimated High Priority Data Sources to Rebuild (sources) |   |   |   | X |   |
| Rebuild Queue Overflow Rate (sources/sec) |   |   |   | X |   |
| Rebuild Slices Sent |   |   |   | X |   |
| Rebuild Deletes Sent |   |   |   | X |   |
| Rebuild Bytes Sent |   |   |   | X |   |
| Rebuild Slices Received |   |   |   | X |   |
| Rebuild Deletes Received |   |   |   | X |   |
| Rebuild Bytes Received |   |   |   | X |   |
| Disk Usage |   |   | X | X | X |
| Device Load |   |   | X | X | X |
| CPU Usage |   |   | X | X | X |
| Network Usage |   |   | X | X | X |
| Accesser Requests |   |   | X |   |   |
| Message Acknowledge Time |   |   | X | X |   |
| CPU Temp |   |   | X | X | X |
| Fan Speed |   |   | X | X | X |
| Hard Drive Temp |   |   | X | X | X |
| Resident Memory Set Size | X |   | X | X | X |
| On Heap Resource Permits | X |   | X | X | X |
| Off Heap Resource Permits | X |   | X | X | X |

Table 2. Performance Graph Summary
Metric Units Description

Storage Pool Capacity and Usage

bytes

Displays the overall raw capacity and storage pool usage over time.

Raw Space Used

bytes

Raw space usage.

Note: When the manager is down, gaps appear in this graph.

Total Number of Slices per Device

slices / device or slices / pillar

The amount of storage in a device, measured in slices.

Aggregate Client to Accesser Device Throughput

bytes / second

Aggregate rate at which data is traveling (reads and writes) between the external client and the Accesser device. This graph is not defined if an Accesser device is not deployed for this vault.

Aggregate Accesser to Slicestor Device Throughput

bytes / second

Aggregate rate at which data is traveling (reads and writes) between the Accesser device and the Slicestor devices. This graph is not defined if an Accesser device is not deployed for this vault.

Client to Accesser Device Throughput

bytes / second

The rate at which data is traveling (reads and writes) between the external client and the Accesser device. This graph is not defined if an Accesser device is not deployed for this vault.

Accesser to Slicestor Device Throughput

bytes / second

The rate at which data is traveling (reads and writes) between the Accesser device and the Slicestor devices. This graph is not defined if an Accesser device is not deployed for this vault.

Scanning Rate

sources / second

The rate at which the scanning agents of one or more devices scan the storage for missing writes and deletes.

Estimated High Priority Data Sources to Rebuild

sources

This graph represents the current amount of prioritized rebuild work per set on the system. Prioritized rebuild work is rebuild work for sources that the system has identified as important. The trend of this graph relates to overall system health. A downward trend means that rebuild work is progressing and important rebuild work is being completed. An upward trend indicates that the system is finding prioritized rebuild work faster than that work is being completed. A flat line means that incoming work is keeping pace with the rate at which prioritized rebuild work is being completed.

Rebuild Slices Sent

slices / second

The rate at which the rebuilding agents of one or more Slicestor nodes reconstruct missing writes and send them to a destination Slicestor node.

Rebuild Deletes Sent

slices / second

The rate at which the rebuilding agents of one or more Slicestor nodes discover missing deletes and send them to a destination Slicestor node.

Rebuild Bytes Sent

bytes / second

The rate, in bytes, at which the rebuilding agents of one or more Slicestor nodes reconstruct missing writes and send them to a destination Slicestor node.

Rebuild Slices Received

slices / second

The rate at which missing writes have been recovered.

Rebuild Deletes Received

slices / second

The rate at which missing deletes have been recovered.

Rebuild Bytes Received

bytes / second

The rate at which missing writes have been recovered.

Disk Usage

bytes / second

Provides a consolidated view of disk read/write speeds for all drives in the device. The graph provides the following options:

  • Read Show: Display all read lines for all drives in the graph.
  • Read Hide: Hide all read lines for all drives in the graph.
  • Write Show: Display all write lines for all drives in the graph.
  • Write Hide: Hide all write lines for all drives in the graph.
  • Aggregate Read: Sum all toggled-on read lines and display a single summed line.
  • Aggregate Write: Sum all toggled-on write lines and display a single summed line.

The legend items can be sorted in three different ways:

  • bay number (increasing)
  • read speed (decreasing)
  • write speed (decreasing)

Click the headers in the legend to trigger the sorting. The legend is sorted by bay number when the graph is initially loaded.

VM devices display block device names instead of bay numbers, and the legend is sorted in lexicographically increasing order.

When aggregate lines are displayed instead of individual lines, the aggregate lines adjust as individual lines are toggled by using the color box in the legend.

For export, only toggled-on lines are included in the .csv output file.

Device Load

Processes

Average number of processes that are in either a runnable or an uninterruptible state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in an uninterruptible state is waiting for I/O access, for example, waiting for disk. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means that a single-CPU system is loaded all the time, whereas a 4-CPU system with the same load average is idle 75% of the time. For a lightly loaded device, the vertical axis might show m for milli-percent.

CPU Usage

%

A measure of how much time the CPUs spend on user applications and operating system (OS) functions. Even when the CPU usage is 0%, the CPUs are still performing basic system tasks. The graph shows cumulative CPU utilization by the device. Dual-core and quad-core CPUs might report two, four, or even eight CPUs, depending on the specific device. Therefore, loads of greater than 100% are typical. CPU Wait, also shown, represents the percentage of time that the CPU is waiting on I/O, which includes disk, network, and memory delays. If the device's CPUs use hyperthreading, the percentage can exceed the total number of CPUs × 100%.

Network Usage

bytes / second

Inbound and outbound network traffic.

Accesser Requests

requests / second

HTTP requests to the Accesser device.

Message Acknowledge Time

ms

Measured from the time an Accesser device is ready to send a data packet until the Slicestor device has written to the operating system and responded back. Long message acknowledge times can indicate slow Slicestor devices. If an Accesser device does not have a connection to Slicestor devices, no data is displayed in the graph during this period.

CPU Temperature

°C

A critical alarm is sent to the Monitor Event Console if the temperature of the CPU (central processing unit, that is, the server microprocessor) exceeds 90°C. Check the ambient temperature of the device immediately.

Fan Speed

rpm or %

A critical alarm is sent to the Monitor Event Console if the speed exceeds the threshold; the threshold is specific to the device and is set automatically by the system. Excessive fan speed generally indicates an ambient cooling problem or poor air flow around the device. Each fan is shown as a separate line on the graph.

Fan series names have the form '<chassis-id> <fan-name>'.

Hard Drive Temp

°C

Depending on the number of hard drives, more than one graph might be shown. Each graph indicates in its title and legend which drives it displays.

A critical alarm is sent to the Monitor Event Console if the temperature of any disk exceeds 60°C. Check the ambient temperature of the device immediately. Each disk is shown as a separate line on the graph.

Resident Memory Set Size

bytes

The portion of memory occupied by a process that is held in the main memory (RAM).

On Heap Resource Permits

bytes

The on-heap memory usage. Resource permits are a construct used by the system to approximate the on-heap memory used by the core process. Permits are acquired when memory is used and released when memory is freed.

Off Heap Resource Permits

bytes

The off-heap memory usage. Resource permits are a construct used by the system to approximate the off-heap memory used by the core process. Permits are acquired when memory is used and released when memory is freed.

Note: In some graphs, such as the Storage Pool Capacity and Usage and Raw Space Used graphs, a temporary drop might be observed, particularly during an upgrade. After device upgrades are complete, the value returns to normal.
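
The Device Load entry in Table 2 states that load averages are not normalized for the number of CPUs. The following Python sketch works through that arithmetic. The function names are illustrative only and are not part of the product. The calculation is the simple approximation implied by the description, not a statement of how the device computes its values.

```python
def per_cpu_load(load_average, cpu_count):
    """Normalize a load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def idle_fraction(load_average, cpu_count):
    """Approximate idle share of the system, floored at 0 when overloaded."""
    return max(0.0, 1.0 - per_cpu_load(load_average, cpu_count))


# A load average of 1 on a single-CPU system: fully loaded.
print(idle_fraction(1.0, 1))   # 0.0
# The same load average on a 4-CPU system: idle 75% of the time.
print(idle_fraction(1.0, 4))   # 0.75
```

When comparing the Device Load graphs of devices with different CPU counts, dividing by the CPU count first makes the values comparable.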

If no vaults are created on a storage pool, the scanning and rebuilder graphs do not appear on the Monitor Device page. In addition, a Toggle Aggregate button appears on these graphs, which aggregates all lines into a single line, representing the overall performance.
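
The effect of aggregating lines can be reproduced on exported data. The following Python sketch sums only the toggled-on series point by point, as described for Aggregate Read and Aggregate Write in the Disk Usage entry. Whether the Toggle Aggregate button uses the same summation is an assumption, and the series names and values are hypothetical.

```python
def aggregate(series, enabled):
    """Sum the enabled series point by point into a single line.

    series maps a line name to its list of values; enabled is the set of
    line names that are toggled on. If the lists differ in length, zip
    stops at the shortest one.
    """
    selected = [series[name] for name in series if name in enabled]
    if not selected:
        return []
    return [sum(points) for points in zip(*selected)]


# Hypothetical read speeds for three drive bays.
reads = {
    "bay1": [10, 20, 30],
    "bay2": [1, 2, 3],
    "bay3": [100, 100, 100],
}

# bay3 is toggled off, so it is left out of the aggregate line.
print(aggregate(reads, {"bay1", "bay2"}))
```

Toggling a line on or off changes the enabled set, which is why the aggregate line adjusts as individual lines are toggled.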