Time Series Analytics for IBM Virtual Private Cloud (VPC) Flows Using Grafana

Time series analytics with IBM SQL Query and visualization in Grafana dashboards are now available for data stored on IBM Cloud Object Storage.

IBM Cloud VPC is a public cloud offering that lets you establish your own private cloud-like computing environment on shared public cloud infrastructure. VPC gives an enterprise the ability to define and control a virtual network that is logically isolated from all other public cloud tenants, creating a private, secure place on the cloud.

VPC’s security features give you granular control over which IP addresses or application ports can be used for connections to resources like virtual machines or Kubernetes worker nodes. You can think of this as being analogous to the friends-only or public/private controls on social media accounts that are used to restrict who can or cannot see your otherwise public posts.

Metadata about the flow of network packets within a VPC can be captured to IBM Cloud Object Storage (COS) for further analytics. This is pretty useful for connectivity troubleshooting, traffic analysis and intrusion detection.

Due to the serverless nature of object storage in IBM Cloud, it is well suited for exponential data growth and cloud-native workloads, such as network flow analysis.

Analyzing VPC flow logs

IBM Cloud supports direct queries on Cloud Object Storage with SQL through the IBM Cloud SQL Query service, which is an interactive query service for data in IBM Cloud Object Storage. This enables you to query your data directly where it is stored. There is no ETL, no databases and no infrastructure to manage.

As we have shared in a previous blog, SQL Query makes it very easy to perform advanced SQL-based analytics on network flows in a VPC.

Taking analysis to the next level

Flow logs are essentially timestamped event data; therefore, the time of occurrence plays a major role in answering many analytical questions.

Grafana is a popular tool to analyze and visualize log data and timestamped data, in general. Unlike other solutions that require data to be stored in specific persistency (such as Elasticsearch), Grafana is persistency agnostic. Thus, you can query, visualize, alert on and understand your metrics from multiple supported data sources.

Integrating flow log data with Grafana makes visualization and query of flow logs even easier and allows for single-pane-of-glass dashboards.

To facilitate this integration, we are happy to share our cloudsql-cos-ts plugin for Grafana with you in this post. The plugin combines the strength of Grafana as a powerful visualization tool with the strength of SQL Query as a scalable and serverless way to query timestamped data with IBM’s time series analytic library.

The plugin enables Grafana to connect to SQL Query to query, visualize, alert on and understand your flow logs and other metrics data stored in Cloud Object Storage.

The architecture of the integration comprises front-end and back-end components:

Figure 1. Architectural diagram.

The front-end, which implements browser-based extensions, is depicted as number 1 in the figure. The extensions are included in Grafana’s plugin directory, and they extend and augment Grafana’s browser user interface.

The back-end is depicted as numbers 2, 3 and 4 in the figure. It is deployed on a container platform, such as IBM Cloud Code Engine or IBM Cloud Kubernetes Service, and exposes the underlying APIs of Cloud Object Storage and SQL Query in a form that the front-end can use.

Setting up Grafana 

Download Grafana V7.4+:

wget https://dl.grafana.com/oss/release/grafana-7.4.1.linux-amd64.tar.gz
tar -zxvf grafana-7.4.1.linux-amd64.tar.gz
ln -s grafana-7.4.1 grafana

You can launch Grafana in either development mode or production mode.

Front-end production mode

The front-end part of the plugin is hosted here.

You don’t have to build the front-end part of the plugin. Copy the compiled plugin (build folder) to the right location:

mkdir -p $GRAFANA_DIR/plugins-bundled
cp -R <your-path-to>/cloud_sqlquery/grafana-plugins/cloudsql-cos-ts/build/. $GRAFANA_DIR/plugins-bundled/cloudsql-cos-ts

Launch Grafana:

$GRAFANA_DIR/bin/grafana-server

Front-end development mode 

Build the plugin. Yarn and Node.js are required:

cd <your-path-to>/cloud_sqlquery/grafana-plugins/cloudsql-cos-ts
yarn install
yarn dev

The compiled plugin is stored in the dist folder.

Download the source to a folder:

export GRAFANA_DIR=$HOME/go/src/github.com/grafana/grafana

Edit the following file — go/src/github.com/grafana/grafana/conf/custom.ini — and update app_mode and plugins:

# choose 'development' or 'production'
app_mode = development
 
# choose the right location
plugins = <your-path-to>/cloud_sqlquery/grafana-plugins

Back-end deployed on a single machine (local or remote)

The back-end code can be found on IBM public GitHub. Users interact with the plugin only via the front-end component. The back-end works as a middleman connecting the front-end with SQL Query and is responsible for the following tasks:

  1. Handling the workload, including parsing the SQL statements and translating macros into SQL Query-compliant statements.
  2. Merging results from multiple queries.
  3. Caching queries so that the same query is issued only once. Caching avoids the delay of issuing a real query job to the SQL Query service because the result is kept in the back-end system.
  4. Providing extension points: the back-end is extensible and customizable for specific business use cases where certain knowledge about the underlying data is available.
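
As an illustration of task 3 above, a minimal query-result cache can be sketched as follows (a conceptual sketch only, not the plugin's actual code; `run_sql_query_job` is a hypothetical stand-in for submitting a job to the SQL Query service):

```python
# Minimal query-result cache sketch: identical SQL text is executed only once.
# run_sql_query_job is a hypothetical stand-in for submitting a SQL Query job.
class QueryCache:
    def __init__(self, run_sql_query_job):
        self._run = run_sql_query_job
        self._cache = {}
        self.misses = 0  # number of real jobs issued

    def query(self, sql: str):
        if sql not in self._cache:      # first time: issue the real job
            self.misses += 1
            self._cache[sql] = self._run(sql)
        return self._cache[sql]         # repeats: served from the cache
```

With a cache like this in place, submitting the same SQL text twice triggers only one real query job; the second call is served from memory. The real back-end would also have to account for the selected time range and data source when building the cache key.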

To run the back-end, create a Conda environment containing all of the required packages. Suppose that the name of the environment is cloudsql_cos_ts:

conda env create -f environment.yml
conda activate cloudsql_cos_ts

Then, run the app directly:

python app.py

By default, the app runs on your designated machine, listening on port 18081.

Back-end deployed as a Docker image 

The app is also deployable as a Docker image. A pre-built image is available at DockerHub:

docker pull tmhoangt/ibm_cloudsqlquery_grafana_backend:1.0

We also provide a mechanism to build the back-end as a Docker image based on Alpine using the following script:

DOCKER_BUILDKIT=1 docker build . -t dev_image -f Dockerfile.stages --target builder
DOCKER_BUILDKIT=1 docker build . -t grafana_cloud_cos_ts -f Dockerfile.stages

Using Alpine as the base image allows for a much smaller Docker image than a full-fledged distribution. Running the build for the first time can take a while because all stages of the multi-stage build need to be executed. Since some packages must be compiled from source, the first build stage takes the most time. Subsequent build stages consume the packages built in earlier stages, so later builds that only change the back-end code are far quicker.

While you can run the resulting image locally or deploy it to Kubernetes, you can also deploy on IBM Cloud Code Engine to avoid the need to manage any infrastructure. The image must be pushed into a supported Docker image registry like GitHub registry or IBM Cloud Container Registry.

The following text explains how to deploy the back-end on Code Engine, which can deploy the image from a public GitHub repository or a private IBM Cloud Container Registry. 

Option 1: The quick and easy way is to follow the Web UI form.

In the form, we can select the image:

  • docker.io/tmhoangt/ibm_cloudsqlquery_grafana_backend
  • us.icr.io/<your-namespace>/<your-image>

If the second one is selected, we need to follow the instructions to create a Container Registry namespace. Suppose the region is us-south and the namespace is sqlquery:

docker tag ibm_cloudsqlquery_grafana_backend us.icr.io/sqlquery/grafana-backend
ibmcloud cr login
ibmcloud cr region-set us-south
docker push us.icr.io/sqlquery/grafana-backend

Option 2: If we want to deploy using the command line, we need a Cloud API key to access and download the Docker image hosted on Container Registry:

ibmcloud iam api-key-create cliapikey -d "My CLI APIkey" --file key_api.txt

Suppose the API key used to log in to your account on Container Registry is stored in key_api.txt. The next step is to create a secret for the given API key so that Code Engine can download the image from Container Registry:

ibmcloud ce registry create --name <registry-name> --server us.icr.io --username <your-email-to-login> --password-from-file </path/to>/key_api.txt

Then, assuming you have an account on Code Engine, run the following script:

ibmcloud ce application create --name <your-app-name> --image us.icr.io/sqlquery/grafana-backend --port 18081 --concurrency 100 --cpu 1 --min-scale 1 --max-scale 250 --request-timeout 600

The script returns the URL of the app running on Code Engine, which we can access on port 18081.

Your first query

At this point, you should have access to the Grafana server that we spun up earlier using the browser. By default, it runs on port 3000. For example, http://localhost:3000.

Adding IBM Cloud Object Storage as a Grafana data source 

Figure 2. Create the data source instance and configure the URL of the back-end web app.

Once you log into Grafana, create a data source instance by following the guidelines in Figure 2. The data source type is ‘cloudsql-cos-ts,’ which is the Grafana plugin. On the configuration page, provide the information needed for the plugin instance, following the steps below:

  1. Define the name of the data source instance, which should be unique. Here, we name it ‘FlowLog.’
  2. Define the IP address and port of the back-end engine, which handles the parsing of the query and combines multiple query results before returning them to Grafana. If you run the back-end with default settings on a machine with IP ip-name, the back-end is accessible at http://ip-name:18081.
  3. Add the information for the SQL Query service so that the back-end engine can interact with it. An explanation of each item appears when you hover over the information icon (the letter ‘i’ inside a circle).

To complete the creation of the data source instance, click Save and Test; the result 'OK' should be returned.

Running your first query

You can issue the query at two places in Grafana — in the Explore section and in the individual panel of a Dashboard.

  • The Explore section in Grafana enables you to try out and test any given SQL Query statement.
  • The Dashboard section enables you to create a dashboard that contains one or more panels. Each panel can contain one or more SQL query statements, whose query results are rendered as a flat table or a graph. Panels can be rescaled and moved freely in the dashboard. Each panel is tied to a data source instance; thus, you can visualize data from multiple data sources simultaneously.

Before you run a query, select the correct data source instance at the top; it is named FlowLog in Figure 3.

A query string can contain the following information:

  • Macros (plugin-specific)
  • SELECT-only statements
  • SQL Query-compatible functions: Apart from ANSI SQL functions, SQL Query supports an extensive set of functions for creating and manipulating time series data before returning it to Grafana for visualization.

A query returns one or more columns, and the result can be interpreted by Grafana in either table format or time series format. You select the format in the Format dropdown box. A result in table format can only be displayed by Grafana as a flat table; any graph-based visualization needs data returned in time series format.

Grafana has a brief introduction to the time series concept. A time series comprises a timestamp and a measurement in each row. The plugin supports returning multiple time series from a single query; you specify this via the metrics column.

When the time series format is selected, you can specify which column contains the date and time information in the Time Column text box. Values in the Time Column that are of string type are converted accordingly by the back-end engine before being returned to Grafana.
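
As a rough sketch of what such a conversion involves (illustrative only, not the plugin's actual code; `rows_to_series` and the ISO 8601 string format are assumptions for the example), flat query rows can be split into one series per metrics value while string timestamps are parsed into epoch milliseconds, the representation Grafana works with:

```python
from datetime import datetime, timezone

def rows_to_series(rows, time_col, value_col, metric_col):
    """Split flat query rows into one time series per metric value.

    String timestamps (ISO 8601 here, for illustration) are converted
    to epoch milliseconds, the form Grafana expects in its data frames.
    """
    series = {}
    for row in rows:
        ts = row[time_col]
        if isinstance(ts, str):  # string time columns get parsed
            ts = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
        epoch_ms = int(ts.timestamp() * 1000)
        series.setdefault(row[metric_col], []).append((epoch_ms, row[value_col]))
    return series
```

For flow log data, the metric column could be action, yielding one series for accepted and one for rejected connections.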

You can issue multiple queries in a single panel, and each query can make use of a different data source:

Figure 3. Explore editor UI.

Figure 4. A query editor accepts certain macros to simplify the query. The ‘Show Help’ option shows such information to users.

Macros to be used in a query

You can simplify your query by using macros, and multiple queries can be chained together with the supported macros. Some of them are listed below, and an example is given in Figure 4. In addition, Grafana supports disabling a query, inspecting its result via the Query Inspector and saving the result to a file:

  • $__source: The data source as provided in the settings for the selected data source.
  • $__dest[(format [,suffix])]: The location to store data as provided in the settings for the selected data source, with the option to specify the format and an appended path. For example: $__dest, $__dest(), $__dest(csv), $__dest(CSV) and $__dest(parquet, a/b/c).
  • $__source_prev(Id): This macro references the output of a previous query in the same dashboard or panel. Each query has an associated ID, which is a single character by default. This enables users to chain multiple queries. For example: $__source_prev(A).
  • $__timeFilter([type]): With this macro, the query dynamically returns data from the column provided via the Time Column text box, based on the time range selected at the dashboard level. As the time can be stored in different types (e.g., string or timestamp), the macro may need the type as an argument so that the date and time can be parsed properly by the engine. For example: $__timeFilter and $__timeFilter(string).
  • $__timeFilterColumn(col_name, [type]): With this macro, the query dynamically returns data from the given column, based on the time range selected at the dashboard level. The value for type can be string or timestamp.
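
To make the idea of macro expansion concrete, the sketch below rewrites a macro-bearing query into plain SQL before submission (illustrative only; `expand_macros` is a hypothetical helper that handles just the bare $__source and $__timeFilter forms, not the full macro syntax above):

```python
def expand_macros(sql, source_uri, time_col, time_from, time_to):
    """Illustrative macro expansion: replace plugin macros with concrete SQL.

    Only the bare forms of $__source and $__timeFilter are handled here;
    the real back-end also parses arguments such as $__timeFilter(string).
    """
    sql = sql.replace("$__source", source_uri)
    sql = sql.replace(
        "$__timeFilter",
        f"{time_col} BETWEEN '{time_from}' AND '{time_to}'",
    )
    return sql
```

The time range substituted for $__timeFilter is whatever range is currently selected at the dashboard level, which is how the same saved query tracks the time picker.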

Flow Log use case

IBM Cloud Flow Logs for VPC captures the IP traffic into and out of the network interfaces of customer-generated Virtual Server Instances (VSIs) in a Virtual Private Cloud (VPC) and saves it in an IBM Cloud Object Storage (COS) bucket. The following is a walk-through example of how the information stored in Cloud Object Storage can be analyzed by using the plugin.

The use case is based on the data stored in FLOW_FLAT view, which can be created by injecting the query (as shared in our previous blog) into one of the queries in the Explore section or a dedicated dashboard’s panel.

The view contains the following columns, any of which can serve as the ‘time’ column:

  • capture_start_time: timestamp
  • connection_start_time: timestamp
  • end_time: string
  • start_time: string
  • capture_end_time: string

Grafana provides the option to select the time range from which data should be extracted in a given dashboard, as shown in Figure 5. This can be an absolute time range or a relative time range with respect to the current local time. Such a time range can be embedded into the query using the $__timeFilter macro:

Figure 5. A time range can be selected from Grafana.

Next, we describe how the following graph can be generated using the dynamically selected time range:

Figure 6. A dashboard with six panels: (1) the schema of the data to provide the information for making queries, (2) the total rejected and accepted connections over the selected time range, (3) the dynamics of accepted and rejected connections at every one-hour interval, (4) the top-25 rejected source IPs, (5) the top destination IPs of VSIs that rejected incoming connections, (6) the bad actor as narrowed down from (4).

To enable the dashboard to be updated with new information quickly, you can use variables. The plugin supports injecting a variable into the query by using ${variable_name}. If the value is a string, enclose it in double quotes (e.g., “${variable_name}”). Query variables are also supported: the values of the variable are the result of a SQL query that returns either one or two columns. When two columns are returned, the first column is treated as the label and the second as the value. In this example, we create two variables: target_ip and target_ip_private. Each has a single value in the beginning.
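
The ${variable_name} substitution can be pictured with this small sketch (illustrative only; `inject_variables` is a hypothetical helper, and in practice the interpolation is performed by Grafana and the plugin before the query reaches SQL Query):

```python
import re

def inject_variables(sql, variables):
    """Replace ${name} placeholders in a query with each variable's value."""
    def sub(match):
        return str(variables[match.group(1)])
    return re.sub(r"\$\{(\w+)\}", sub, sql)
```

Selecting a new value for target_ip in the dashboard then re-runs every query that mentions ${target_ip} with the new value substituted in.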

Combining the dynamic time range and variables, you can quickly see new results without modifying the individual queries.

Panel 1 

  • Title: Data schema
  • Description: List the column names and their types
  • Query:
    [query image]
  • Output (displayed as a flat table):
    [output image]

Panel 2

  • Title: Reject/Accept ratio to ${target_ip}. Because the title uses the ${target_ip} variable, the whole plot is updated whenever a new value is selected for that variable.
  • Description: Shows the total number of rejected and accepted connections to the specified ${target_ip}.
  • Query: We use the variable and the dynamic time range via the $__timeFilter macro. Note that the time range selection cannot be applied to a derived column name; you must use an outer SELECT statement:
    [query image]
  • Output: Displayed as a Gauge plot:
    [output image]

Panel 3

  • Title: Monitor rejected and accepted connections to ${target_ip}
  • Description: Show accepted and rejected connections over time
  • Query: We use the variable and the dynamic time range via the $__timeFilter macro, and we chain two queries together. The first query serves as the intermediate input for the second, so we turn off its ‘Get Queried Result’ flag. The second query references the first query's result, without needing to know its exact location, through the $__source_prev(A) macro. The hourly statistics (the number of accepted and rejected connections) are calculated by applying the time series function TS_SEG_COUNT to every segment. Each segment represents one hour of data, created with the TS_SEGMENT_BY_TIME function from the multiple time series produced by the TIME_SERIES_FORMAT function:
    TS_SEGMENT_BY_TIME(time-series-object, segment-length, offset)
    TIME_SERIES_FORMAT accepts, at minimum, three pieces of information: the timetick column (the column with time data), the value column (the column with measurement data) and the key column (the metrics column, each of whose distinct values identifies a single time series):
    [query image]
  • Output: As time series data, we can choose ‘Graph’ as the plot. By specifying action as the metrics column, the query returns multiple time series, each containing the segment-start-time and action-count-during-segment for a given initiator_ip value:
    [output image]
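
Conceptually, the one-hour segmentation and per-segment counting in this panel behave like the following pure-Python sketch (a mental model only, not the SQL Query time series library; `hourly_counts` is a hypothetical helper):

```python
def hourly_counts(events):
    """Count events per (hour segment, action), mimicking one-hour
    TS_SEGMENT_BY_TIME segments followed by TS_SEG_COUNT per segment.

    events: iterable of (epoch_seconds, action) pairs.
    Returns {(segment_start_epoch, action): count}.
    """
    counts = {}
    for ts, action in events:
        segment_start = ts - ts % 3600   # floor to the start of the hour
        key = (segment_start, action)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Each resulting (segment start, action) pair corresponds to one point in one of the series plotted in the Graph panel.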

Panel 4

  • Title: Top source IPs rejected by ${target_ip}
  • Description: Select the top-25 source IPs with the most connections rejected by ${target_ip}
  • Query: We use the variable and the dynamic time range via the $__timeFilter macro:
    [query image]
  • Output: Displayed as a ‘Bar gauge’ plot:
    [output image]

Panel 5

  • Title: Top rejected connections in VPC
  • Description: Show, at most, the top-10 IPs of VSIs that rejected connections
  • Query: We use the variable and the dynamic time range via the $__timeFilter macro:
    [query image]
  • Output: Displayed as a ‘Bar gauge’ plot:
    [output image]

Panel 6

  • Title: Bad actor detection
  • Description: Show which IPs are rejected the most across the VPC
  • Query: We use the variable and the dynamic time range via the $__timeFilter macro:
    [query image]
  • Output: Displayed as a ‘Gauge’ plot:
    [output image]

Conclusion

IBM Cloud Flow Logs for VPC provides detailed traffic logs and metadata about the flow of network packets within a VPC that can be captured to IBM Cloud Object Storage for further analysis by IBM Cloud SQL Query.

Flow logs are essentially timestamped event data; therefore, the time of occurrence plays a major role in answering many analytical questions. Grafana is a popular tool to analyze and visualize log data and timestamped data in general. This tutorial introduced a new plugin that enables the integration of flow log data with Grafana, which makes visualization and querying of flow logs even easier and allows for single-pane-of-glass dashboards. The plugin is open-sourced and hosted on IBM Cloud's public GitHub.
