Use IBM Cloud SQL Query to Analyze VPC Network Traffic from IBM Cloud Flow Logs for VPC


Use Flow Logs for VPC to monitor network traffic and troubleshoot configuration and connectivity.

Virtual Private Cloud (VPC) Gen 2 is a critical service for deploying compute in a logically isolated virtual network that you define, including selecting your own IP address range and creating subnets and security groups.

IBM Cloud Flow Logs for VPC captures the IP traffic flowing into and out of the network interfaces of customer-created virtual server instances (VSIs) in a VPC and persists it into an IBM Cloud Object Storage (COS) bucket. You can use flow logs to diagnose connectivity issues or monitor traffic that enters and leaves the network interfaces of the VPC instances. This allows you to answer questions like the following:

  • Are unexpected TCP ports being accessed on my VSIs?
  • Is SSH traffic reaching the VPC but getting rejected?
  • Are bad actors trying to access my network?

COS provides an excellent landing place for high-volume, continuously growing data. It is also possible to ingest this data from the COS bucket into other analysis tools.

In this blog post, you'll learn how to use IBM Cloud SQL Query to analyze flow logs in a serverless way based on the scenario that my colleague Powell Quiring outlined in his excellent post, "Use IBM Log Analysis with LogDNA to Analyze VPC Network Traffic from IBM Cloud Flow Logs for VPC."

With IBM Cloud SQL Query, you only pay for what you use; there are no standing costs or setup steps required. You pay a fee only for how much data is accessed to execute your query. This means no pre-processing or transformation is necessary, and you can start analyzing your flow logs in minutes.

There are various ways to process flow logs, each with its own advantages for specific use cases.

Deploying the sample code

A simple way to run these shell scripts is in the IBM Cloud Shell. Open cloud.ibm.com in a browser, log in and click the shell icon in the upper right.

The source code implementing these flows is available in GitHub. It comes with scripts to set up Cloud Object Storage. Detailed instructions can be found in the README, but simply start in the Cloud Shell and type the following:

git clone https://github.com/IBM-Cloud/vpc-flowlogs-sqlquery
cd vpc-flowlogs-sqlquery

Once you have configured your shell environment with the local.env file, you can start running the scripts. You will need the IBM Cloud CLI, the Cloud Object Storage plugin, the Schematics plugin and the jq command line utility, which are already installed in Cloud Shell:

cp template.local.env local.env
edit local.env
source local.env
# verify PREFIX and TF_VAR_ssh_key_name are in the environment by using the env command

Execute 000-prereqs.sh, which performs basic checks of the target resource group, the target region, the required IBM Cloud plugins and external tools.

Execute 010-create-services.sh to create a Cloud Object Storage service instance, a bucket and an SQL Query service instance.

Call 030-create-vpc.sh to create the VPC, subnets, instances and flow log collector. Terraform in the IBM Cloud Schematics service is used to create all of the resources except the flow log collector, which is created using the ibmcloud CLI. After the script completes, check out the flow log collector configuration in the IBM Cloud Console.

VPC Infrastructure observed by the flow log collector:

Flow log collector:

In a few minutes, the COS bucket will start to have flow log objects. A few minutes after that, you can start querying your flow logs in SQL Query.

The final few lines displayed by 030-create-vpc.sh will look something like this. Copy these and keep them handy:

>>> to exercise the vpc
curl 52.116.136.250:3000; # get hello world string
curl 52.116.136.250:3000/info; # get the private IP address
curl 52.116.136.250:3000/remote; # get the remote private IP address

SQL Query analysis

Getting started

The curl 52.116.136.250:3000/info command displays the private IP address of vsi1, in this case 10.240.0.4. The curl 52.116.136.250:3000/remote command curls the private instance from within the public instance, displaying the private instance's private IP address. Attempt to ssh to the public IP address; this will likely hang. We will investigate this soon:

$ ssh root@52.116.136.250

Setting up SQL Query

Navigate to your SQL Query instance and then click the Launch SQL Query UI.

Important: Replace these variables in the following steps:

  • bucket: The bucket where your flow logs are stored.
  • region: The region alias of the bucket that holds your flow logs.

First, define a table for flow logs:

-- Table definition for flow logs 
CREATE TABLE FLOW( 
version string, 
collector_crn string, 
attached_endpoint_type string, 
network_interface_id string, 
instance_crn string, 
capture_start_time timestamp, 
capture_end_time timestamp, 
number_of_flow_logs int, 
flow_logs array<struct< 
    start_time: string, 
    end_time: string, 
    connection_start_time: string, 
    direction: string,
    action: string, 
    initiator_ip: string, 
    target_ip: string, 
    initiator_port: int, 
    target_port: int, 
    transport_protocol: int, 
    ether_type: string, 
    was_initiated: boolean, 
    was_terminated: boolean, 
    bytes_from_initiator: long, 
    packets_from_initiator: long, 
    bytes_from_target: long, 
    packets_from_target: long, 
    cumulative_bytes_from_initiator: long, 
    cumulative_packets_from_initiator: long, 
    cumulative_bytes_from_target: long, 
    cumulative_packets_from_target: long 
>>, 
account string, 
region string, 
`vpc-id` string, 
`subnet-id` string, 
`endpoint-type` string, 
`instance-id` string, 
`vnic-id` string, 
`record-type` string, 
year int, 
month int, 
day int, 
hour int, 
`stream-id` string 
) USING JSON LOCATION cos://<region>/<bucket>/ibm_vpc_flowlogs_v1/

Then, define a view that will give us a nice flattened view on flow logs and make it easier to work with the data:

CREATE VIEW FLOW_FLAT AS 
WITH EXPLODED_FLOW as ( 
    SELECT 
        version,
        collector_crn,
        attached_endpoint_type,
        network_interface_id, 
        instance_crn, 
        capture_start_time, 
        capture_end_time, 
        `vnic-id`, 
        `record-type`, 
        year, 
        month, 
        day,
        hour, 
        `stream-id`, 
         explode(flow_logs) as flow 
    FROM FLOW) 
SELECT 
    version, 
    collector_crn, 
    attached_endpoint_type, 
    network_interface_id, 
    instance_crn, 
    capture_start_time, 
    capture_end_time, 
    `vnic-id`, 
    `record-type`, 
    year, 
    month, 
    day, 
    hour, 
    `stream-id`, 
    flow.* 
FROM 
    EXPLODED_FLOW
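
To see what explode does to the nested flow_logs array, here is a small sketch using jq (already installed in Cloud Shell). The object below is a hypothetical, trimmed-down flow log record with made-up values; real records contain the full set of fields from the table definition:

```shell
# Create a hypothetical, trimmed-down flow log object (made-up values)
cat > sample_flowlog.json <<'EOF'
{"network_interface_id":"nic-1",
 "flow_logs":[
  {"initiator_ip":"192.0.2.1","target_ip":"10.240.0.4","target_port":3000,"action":"accepted"},
  {"initiator_ip":"192.0.2.2","target_ip":"10.240.0.4","target_port":22,"action":"rejected"}]}
EOF
# Like the FLOW_FLAT view, turn each element of flow_logs into its own
# flat record while keeping the parent fields alongside it
jq -c '{network_interface_id} + .flow_logs[]' sample_flowlog.json
```

Each array element becomes one flat row, which is the shape the FLOW_FLAT queries rely on.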

Looking for bad actors

Let's look for flow logs that show all connections to vsi1 that have been rejected:

SELECT
    connection_start_time,
    initiator_ip,
    initiator_port,
    target_port,
    transport_protocol 
FROM FLOW_FLAT 
WHERE target_ip = "10.240.0.4"  AND action= "rejected"

As you can see, there are quite a few records. Let's group all rejected connections by initiator_ip to get an idea of where the connections originated:

SELECT
    first(connection_start_time) as first_occurence,
    initiator_ip,
    count(*) as amount
FROM FLOW_FLAT 
WHERE target_ip = "10.240.0.4"  AND action= "rejected" 
GROUP BY (initiator_ip) ORDER BY amount DESC LIMIT 10

As you can see, most of the rejected connections originated from 92.64.196.13. To understand whether someone ran a port scan on vsi1, we include the target_port of each rejected connection:

SELECT
    count(*) as connections,
    first(connection_start_time) as first_attempt,
    initiator_ip,
    size(collect_set(target_port)) as distinct_ports,
    concat_ws(",", collect_set(target_port)) as target_ports          
FROM FLOW_FLAT where target_ip = "10.240.0.4"  AND action= "rejected" GROUP BY (initiator_ip) ORDER BY connections DESC

Over 700 different ports were scanned from that single IP, which strongly suggests an attempted port scan.
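
The same distinct-port aggregation can also be sketched outside of SQL Query, for example with jq over a handful of hypothetical flattened records (the IPs and ports are made up), mirroring size(collect_set(target_port)):

```shell
# Hypothetical flattened, rejected-connection records (made-up values)
cat > rejected.json <<'EOF'
[{"initiator_ip":"192.0.2.7","target_port":22},
 {"initiator_ip":"192.0.2.7","target_port":23},
 {"initiator_ip":"192.0.2.7","target_port":22}]
EOF
# Count distinct target ports per initiator, like size(collect_set(target_port))
jq -c 'group_by(.initiator_ip)[]
       | {initiator_ip: .[0].initiator_ip,
          distinct_ports: ([.[].target_port] | unique | length)}' rejected.json
```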

Let's pivot the data by the targeted port instead of the initiator_ip to see if that reveals new insights:

SELECT
    count(*) as connections,
    first(connection_start_time) as first_attempt,
    target_port,
    size( collect_set(initiator_ip)) as individual_initiator_ips
FROM FLOW_FLAT where target_ip = "10.240.0.4" AND action= "rejected" GROUP BY (target_port) ORDER BY connections DESC

As you can see, over 1,000 individual IPs tried to establish a connection on port 445, which is used for direct TCP/IP Microsoft networking access. It is very likely that attackers were scanning the VSI for potential vulnerabilities.

Looking for expected traffic

A few minutes after the curl commands above complete, there should be some accepted traffic:

SELECT
    connection_start_time,
    initiator_ip,
    initiator_port,
    target_port
FROM FLOW_FLAT where target_ip = "10.240.0.4" AND action= "accepted"

Notice the target_port is 3000, associated with our curl.

The private instance has received no packets. Let's validate this:

SELECT
    connection_start_time,
    initiator_ip,
    initiator_port,
    target_port
FROM FLOW_FLAT where target_ip = "10.240.64.4"  AND action= "accepted"

Try looking for connections initiated by the private instance — there are quite a few packets: 

SELECT
    connection_start_time,
    initiator_port,
    target_ip,
    target_port
FROM FLOW_FLAT where initiator_ip = "10.240.64.4"

What does this mean? Looking at one of the records, I noticed target_port=67, which is used by the BOOTP protocol. This seems okay, so I'll filter it out, along with DNS (53) and NTP (123), and keep looking. Continuing with this process, I noticed the following:

SELECT
    connection_start_time,
    initiator_port,
    target_ip,
    target_port
FROM FLOW_FLAT where initiator_ip = "10.240.64.4" AND target_port NOT IN (53,123,67)

The ports observed map to well-known services:
  • Port 67: BOOTP
  • Port 123: NTP (Network Time Protocol)
  • Port 53: DNS
  • Port 443: HTTPS, used for installing software
  • Port 80: HTTP, used for installing software

It might be interesting to look at the individual target_ip addresses for the 443 and 80 ports and verify they are the software providers that my company has approved:

SELECT
    target_ip,
    concat_ws(",", collect_set(target_port)) as target_ports_string,
    count(*) as connections
FROM FLOW_FLAT where initiator_ip = "10.240.64.4" AND target_port NOT IN (53,123,67) 
GROUP BY (target_ip) ORDER BY connections

Maybe I should change my security groups or network ACLs to be more constrained.

Cannot SSH

On my laptop, I obtained my own IP address using curl ifconfig.me:

$ curl ifconfig.me
88.152.185.82

Search the flow logs for ssh (port 22) traffic coming from that address:

SELECT  
    connection_start_time,
    initiator_ip,
    initiator_port,
    target_port,
    transport_protocol,
    action
FROM FLOW_FLAT where initiator_ip = "88.152.185.82" AND target_port = 22

This is good news. The network path from my laptop to the VSI is viable. But why is it being rejected?

This is likely due to Security Group Rules or Network ACLs. In the IBM Cloud Console, navigate to the VPC instances, click on the vsi1 instance and examine the Security Groups attached to the Network Interface. If you click on the Security Groups, you will notice that the install-software group is for installing software and the sg1 is for external access, but only to port 3000. That is the port used in the curl commands. There is no port 22 access, so this is likely the cause of the rejection.

On the x-sg1 security group page, in the inbound rules section, click the New rule button and then add a new inbound rule for Protocol: TCP, Port range: 22, 22. Try the ssh again to verify the solution. Look for action=accepted in the flow log by re-running the previous query.

More investigation

The more I look into the flow logs using IBM SQL Query, the more questions I have. I need to have a solid understanding of the communication paths required for my application and carefully constrain packet flows that could cause harm.  

In SQL Query, I can analyze flows in various ways using the full power of SQL. Create tables and views as shown above to reuse and share them across the different analytics that you do.

Because SQL Query offers a JDBC driver, I can easily connect this data back into existing reporting infrastructure to augment existing reports.

Since SQL Query can work with any rectangular data stored on object storage and supports the full breadth of SQL, I can perform elaborate analytics, including joining data from various systems, such as custom inventory data, Cloud Internet Services, Activity Tracker and LogDNA.

This blog post is just the beginning of exploring how SQL Query can help gain insight from different data sources and build a single-pane-of-glass view of my logging data lake.

One example is to combine flow logs with Cloud Internet Services logs to get a single pane of glass over network connections as they travel through the cloud and to perform additional in-depth analytics.

I'll explore this option in depth in a future blog post and introduce some of the analytics that you can do. 

Cleaning up

To remove all resources created as part of this blog post, execute 040-cleanup.sh.

Conclusion

IBM Cloud Flow Logs for VPC provides detailed traffic logs, and IBM Cloud SQL Query is a great way to analyze, troubleshoot and understand network traffic. The full breadth of SQL allows for augmenting and enriching network flows so that network access can be audited in a comprehensive way.

Find out more about VPC in the Solution tutorials — just open the Virtual Private Cloud section on the left.  Check out IBM Cloud Flow Logs for VPC documentation for a more detailed description of how to use IBM Cloud SQL Query and some best practices. 
