How to use Flow Logs for VPC to monitor network traffic and troubleshoot connectivity using IBM Log Analysis

IBM Cloud Flow Logs for VPC captures the IP traffic flowing into and out of the network interfaces of customer-created virtual server instances (VSIs) in a VPC and persists it in an IBM Cloud Object Storage (COS) bucket. You can use flow logs to diagnose connectivity issues or monitor the traffic that enters and leaves the network interfaces of your VPC instances. This allows you to answer questions like the following:

  • Are unexpected TCP ports being accessed on my VSIs?
  • Is SSH traffic reaching the VPC but getting rejected?
  • Are bad actors trying to access my network?

COS provides an excellent landing place for high-volume, continuously growing log data. It is also possible to ingest this data from the COS bucket into other analysis tools; in this blog post, IBM Log Analysis will be the target.
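Each flow log object stored in COS is a JSON document containing a flow_logs array, whose records carry fields such as initiator_ip, target_ip, target_port and action — the same fields searched in the dashboard later in this post. As a minimal local sketch (the sample record below is illustrative, not real traffic), jq can pull the rejected flows out of a downloaded object:

```shell
# Sketch: extract rejected connections from a downloaded flow log object.
# The field names (initiator_ip, target_ip, target_port, action) follow the
# VPC flow log format; the sample data here is made up for illustration.
cat > sample-flowlog.json <<'EOF'
{"flow_logs":[
  {"initiator_ip":"92.63.197.61","target_ip":"10.240.0.4","target_port":443,"action":"rejected"},
  {"initiator_ip":"10.240.64.4","target_ip":"161.26.0.10","target_port":53,"action":"accepted"}
]}
EOF

# List rejected flows as "initiator -> target:port"
jq -r '.flow_logs[] | select(.action=="rejected")
       | "\(.initiator_ip) -> \(.target_ip):\(.target_port)"' sample-flowlog.json
# -> 92.63.197.61 -> 10.240.0.4:443
```

The same kind of jq expression is handy for spot-checking objects in the bucket before they are forwarded to Log Analysis.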

Note: This is an updated version of a popular blog post: “Use IBM Log Analysis with LogDNA to Analyze VPC Network Traffic.” There are a number of changes to the automation scripts, including the use of IBM Cloud Code Engine.

Pushing VPC flow logs to IBM Log Analysis

Before running the code below, it can be helpful to initialize platform logs in the target region. The invocations of the Code Engine job deployed in the next section are visible in the platform logs.

Deploying the sample code

A simple way to run these shell scripts is in the IBM Cloud Shell. Open cloud.ibm.com in a browser, log in, and click the shell icon in the upper right:

The source code implementing these flows is available in GitHub. It comes with scripts to create a Code Engine job and project. Detailed instructions can be found in the README, but simply start in the Cloud Shell and type the following:

git clone https://github.com/IBM-Cloud/vpc-flowlogs-logdna
cd vpc-flowlogs-logdna

Once you have configured your shell environment with the demo.env file, you can start running the scripts. You will need the IBM Cloud CLI with the Code Engine plugin and the jq command line utility, which are already installed in Cloud Shell:

cp template.demo.env demo.env
edit demo.env
source demo.env

If you are in the Cloud Shell, execute the command tfswitch to get the latest Terraform version.

./000-demo-prerequisites.sh prints out the Terraform version to verify that Terraform is installed.

./100-demo-vpc-and-flowlog-bucket-create.sh creates a Cloud Object Storage service, a bucket, a VPC configured with flow logs and the Log Analysis service. Finally, it creates a code_engine_config.sh file used in the next step. The final few lines will look something like the output below; copy them and keep them handy. Yours will have different IP addresses, but this is what mine looked like:

>>> to exercise the vpc
curl 52.116.136.250:3000; # get hello world string
curl 52.116.136.250:3000/info; # get the private IP address
curl 52.116.136.250:3000/remote; # get the remote private IP address

Print the contents of code_engine_config.sh:

cat code_engine_config.sh
BASENAME="flowlogdna05"
RESOURCE_GROUP_NAME="default"
COS_BUCKET_CRN="crn:v1:bluemix:public:cloud-object-storage:global:a/7101234567897a53135fe6793c37cc74:29a8bfe2-a2a7-4166-b87d-cca7dc9c1649:bucket:flowlogdna05-cefl-001"
COS_ENDPOINT="s3.direct.us-south.cloud-object-storage.appdomain.cloud"
LOGDNA_REGION="us-south"
LOGDNA_INGESTION_KEY="722a90123456789207f65d4a6e9d0a95"

Check out the bucket in the COS instance found in the Resource list:

The bucket should already have flow logs:
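You can also list the objects from the Cloud Shell with the COS plugin. A sketch, assuming the bucket name shown in code_engine_config.sh (yours will differ):

```shell
# Sketch: list the flow log objects the collector has written so far.
# The bucket name is a placeholder; substitute the COS_BUCKET_CRN bucket
# from your own code_engine_config.sh.
ibmcloud cos objects --bucket flowlogdna05-cefl-001
```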

Find your IBM Log Analysis service: 

Open the dashboard; it should be empty:

Let's get some flow logs written

./150-ce-prerequisites.sh verifies that the ibmcloud CLI and its Code Engine and logging plugins are installed.

./200-create-ce-project-logging-and-keys.sh creates the Code Engine job and all the prerequisites. The job is subscribed to changes in the COS bucket.
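The 200 script creates this subscription for you; for reference, a COS subscription can also be created by hand with the Code Engine CLI. A sketch with placeholder names (the subscription, job and bucket names below stand in for the ones the scripts generate):

```shell
# Sketch: subscribe a Code Engine job to object writes in a COS bucket.
# "flowlog-sub", "flowlog-job" and the bucket name are placeholders for
# the resources created by the scripts in your account.
ibmcloud ce subscription cos create \
  --name flowlog-sub \
  --destination flowlog-job \
  --destination-type job \
  --bucket flowlogdna05-cefl-001 \
  --event-type write
```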

Open the Code Engine Project. When you click on the Project and then click on the Jobs, you should see the newly created job:

Click on the Job to see the details:

As a reminder, this is the VPC infrastructure that is being captured by the Flow Log Collector:

Open the Flow logs for VPC to see the collector that was created:

In a few minutes, the COS bucket will start to receive flow log objects. A few minutes after that, job runs will be visible. A few minutes later, flow logs will be available in the IBM Log Analysis dashboard.

Using IBM Log Analysis

Getting started

Use the curl commands from the 100 script. Mine were as follows:

  • curl 52.116.136.250:3000/info will show the output of the private IP address of vsi1 (e.g., 10.240.0.4)
  • curl 52.116.136.250:3000/remote will curl the private instance from within the public instance, displaying the private IP address of the private instance (which is not reachable directly from the internet)

Attempt to ssh to the public IP address — this will likely hang. We will investigate this soon:

$ ssh root@52.116.136.250

Looking for bad actors

Navigate to the IBM Log Analysis dashboard.

Let's look for flow logs in the Everything view for the private IP that was rejected. Mine was target_ip:10.240.0.4 action:rejected:

As you can see, there are quite a few records. I narrowed the search further to target_port:443 and found a record with an initiator_ip address of 92.63.197.61.

A quick Google search indicated that this IP address is unexpected and has been reported for abuse, so it is very likely a bad actor.

Looking for expected traffic

A few minutes after the curl commands above complete, there should be some accepted traffic. 

Search for target_ip:10.240.0.4 action:accepted. Notice the target_port is 3000 associated with our curl.

The private instance has a lot less traffic. Try target_ip:10.240.64.4 and you might only see a few packets. Try initiator_ip:10.240.64.4 and there are quite a few packets. What does this mean?

Looking at one of the records, I noticed target_port:67, which is the BOOTP/DHCP protocol. This seems okay, so I'll filter it out and look further (notice the minus sign): initiator_ip:10.240.64.4 -target_port:67. I continued with this process and arrived at the following: initiator_ip:10.240.64.4 -target_port:67 -target_port:123 -target_port:53:

  • Port 67: BOOTP (DHCP)
  • Port 123: NTP (Network Time Protocol)
  • Port 53: DNS
  • Port 443: HTTPS for installing software
  • Port 80: HTTP for installing software
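The same narrowing done in the dashboard with -target_port filters can be sketched locally with jq. Assuming a downloaded flow log object with illustrative sample records, this tallies the flows from the private VSI that remain after excluding the expected infrastructure ports:

```shell
# Sketch: tally outbound flows from the private VSI by target port,
# excluding the expected ports 67 (BOOTP), 123 (NTP) and 53 (DNS).
# The records are made-up samples; real objects come from the COS bucket.
cat > private-vsi-flows.json <<'EOF'
{"flow_logs":[
  {"initiator_ip":"10.240.64.4","target_port":67,"action":"accepted"},
  {"initiator_ip":"10.240.64.4","target_port":53,"action":"accepted"},
  {"initiator_ip":"10.240.64.4","target_port":443,"action":"accepted"},
  {"initiator_ip":"10.240.64.4","target_port":443,"action":"accepted"}
]}
EOF

# Group the remaining flows by port and count them
jq -r '[.flow_logs[] | select(.target_port as $p | [67,123,53] | index($p) | not)]
       | group_by(.target_port)[] | "\(.[0].target_port): \(length)"' private-vsi-flows.json
# -> 443: 2
```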

It might be interesting to look at the target_ip addresses for the 443 and 80 ports and verify they are the software providers that my company has approved and set an alarm if they are not. Maybe I should change my security groups or network ACLs to be more constrained.

Cannot SSH

On my laptop, I obtained my own IP address using curl ifconfig.me:

$ curl ifconfig.me
24.22.68.94

In IBM Log Analysis, search for target_ip:10.240.0.4 target_port:22 initiator_ip:24.22.68.94.

A packet is found, with the field action:rejected.

This is good news. The network path from my laptop to the VSI is viable. But why is it being rejected?

This is likely due to Security Group Rules or Network ACLs. In the IBM Cloud Console, navigate to the VPC instances, click on the vsi1 instance and examine the Security Groups attached to the Network Interface. If you click on the Security Groups, you will notice that the install-software group is for installing software and the sg1 is for external access, but only to port 3000. That is the port used in the curl commands. There is no port 22 access, so this is likely the cause of the rejection.

On the x-sg1 security group page, in the inbound rules section, click the New rule button and add a new inbound rule with Protocol: TCP, Port range: 22-22 and the IP address returned from curl ifconfig.me. Try the SSH again to verify the solution. Look for action:accepted in the flow log.
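If you prefer the CLI, the equivalent rule can be added with the VPC plugin. A sketch, assuming the security group is named sg1 and using the IP address from my laptop (substitute your group name and the address returned by curl ifconfig.me):

```shell
# Sketch: add an inbound TCP rule for port 22, restricted to one source IP.
# "sg1" and the address are placeholders; use your security group name (or ID)
# and your own public IP.
ibmcloud is security-group-rule-add sg1 inbound tcp \
  --port-min 22 --port-max 22 \
  --remote 24.22.68.94
```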

More investigation

The more I look into the flow logs using IBM Log Analysis, the more questions I have. I need to have a solid understanding of the communication paths required for my application and carefully constrain packet flows that could cause harm.  

In Log Analysis, you can create dashboards and alarms. For example, you may want to be notified via Slack whenever an SSH access is successful to an instance in your VPC. Narrow the search terms in the search box to find the records of interest. In the upper-right corner, click the Unsaved View drop-down and select save as new view/alert. In the pop-up, choose Alert > View-specific alert and click on Slack or one of the other notification mechanisms. Follow the provided instructions.

Clean up

Run the following scripts:

  • ./800-cleanup-code-engine.sh will destroy the Code Engine-related resources created by the 200 script
  • ./900-cleanup-demo.sh will destroy the VPC, COS and Log Analysis services created by the 100 script

Conclusion

IBM Cloud Flow Logs for VPC provides detailed traffic logs, and IBM Log Analysis is a great way to interactively search and understand network traffic. Real-time alert notifications allow network access to be audited.

Find out more about VPC in the Solution tutorials — just open the Virtual Private Cloud section on the left. Check out Auditing corporate policies for Identity and Access Management and Key Protect for a more detailed description of Slack alerts.
