This post will demonstrate a Squid NFV using VPC routing.

A virtual private cloud (VPC) gives an enterprise the ability to define and control a virtual network that is logically isolated from all other public cloud tenants, creating a private, secure place on the public cloud.  

VPC routing allows more control over network flow and can be used to support Network Functions Virtualization (NFV) for advanced networking services, such as third-party routing, firewalls, local/global load balancing, web application firewalls and more.

This post will demonstrate a Squid NFV. Other off-the-shelf firewall instances like those from Palo Alto and F5 can be similarly configured. To quote the Squid site: “Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently-requested web pages.”


The host instance is going to read from internet websites. Internet-bound traffic from the host subnet will be sent to the proxy instance by the routing table and routes. The Squid NFV on the proxy instance will connect to the website and act as a middle man between the host and the website.

In the diagram above, the website is neverssl.com; the Squid proxy will impersonate (AKA spoof) neverssl.com. The proxy will be an undetectable intermediary in the conversation so existing applications on the host do not require code changes to benefit from Squid functionality.

You could click through the console to create the VPC, subnets, route table, route table route, instances, etc. This post will use Terraform, so it will be up and running in just a few minutes.

Tooling prerequisites

The provisioning steps are run from the CLI, which will allow you to move them into a CI/CD pipeline or IBM Cloud Schematics over time.

You can skip these prerequisites by using the IBM Cloud Shell, where the tools are preinstalled. Otherwise, use your workstation and verify that the following tools are installed. See the “Getting started with solution tutorials” guide for help on installing them:

  • Git
  • IBM Cloud CLI
  • Terraform
  • jq
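
As a quick check on a workstation, the sketch below reports which of the listed tools cannot be found on the PATH. It is not the repository's 000-prereq.sh script; the function name missing_tools is made up for illustration, and the command names (git, ibmcloud, terraform, jq) are the usual executable names for the tools above.

```shell
# Print the name of every listed tool that is not found on PATH.
# missing_tools is a hypothetical helper, not part of the repository.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# No output means every tool was found.
missing_tools git ibmcloud terraform jq
```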

IAM prerequisites

You will need permissions to create VPC resources. Even if you are the account owner, an additional IAM policy is required to create instances with network interfaces that allow spoofing. See “About IP spoofing checks.”

I am the account administrator, so I executed this command line in the Cloud Shell using my email address:

ibmcloud iam user-policy-create YOUR_USER_EMAIL_ADDRESS --roles "IP Spoofing Operator" --service-name is

Alternatively, you can add this policy in the IBM Cloud Console IAM section starting at Users:

  • Click the User
  • Click Access policies
  • Click Assign access
  • Click IAM services
  • Choose VPC Infrastructure Services from the drop down
  • Click on the IP Spoofing Operator


Create and test

Clone the source code repository and execute the tooling prerequisite check:

git clone https://github.com/IBM-Cloud/vpc-nfv-squid
cd vpc-nfv-squid
cp local.env.template local.env
edit local.env
source local.env
./000-prereq.sh

Create the resources. Take a look at the script; it is pretty simple:

cat ./010-create.sh
#!/bin/bash
 
terraform init
terraform apply -auto-approve

If Terraform produces the following error message instead of provisioning the proxy instance, make sure you have correctly configured your account with the IP Spoofing Operator permission as mentioned above:

Error: the provided token is not authorized to  the specified instance (ID:NEWRESOURCE) in this account

The Terraform heavy lifting is defined in main.tf. Even if you are not familiar with Terraform, take a look. You will find a self-documenting blueprint of the architecture. Once Terraform completes successfully, open the VPC layout in the IBM Cloud console and select all of the subnets. I configured a basename of squid in local.env:


Run the test script to verify that everything is working as expected. You will need to accept the ssh host keys for the new IP addresses when prompted:

$ ./030-test.sh
 
>>> verify it is possible to ssh to the host and execute the true command
ssh -J root@52.116.133.164 root@10.0.0.4 true
 
>>> verify proxy connectivity using ping
ssh -J root@52.116.133.164 root@10.0.0.4 ping 10.0.1.4 -c 2
PING 10.0.1.4 (10.0.1.4) 56(84) bytes of data.
64 bytes from 10.0.1.4: icmp_seq=1 ttl=64 time=0.540 ms
64 bytes from 10.0.1.4: icmp_seq=2 ttl=64 time=0.422 ms
 
--- 10.0.1.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1013ms
rtt min/avg/max/mdev = 0.422/0.481/0.540/0.059 ms
 
>>> verify explicy specifying the squid proxy server ip works. Testing the network path - not testing the router
ssh -J root@52.116.133.164 root@10.0.0.4 set -o pipefail; curl neverssl.com -s --proxy 10.0.1.4:8080 | grep poorly-behaved > /dev/null
 
>>> veriy direct access to neverssl.com, end to end, through the route table
ssh -J root@52.116.133.164 root@10.0.0.4 set -o pipefail; curl neverssl.com -s | grep poorly-behaved > /dev/null
 
>>> verify implicit access to a denied host fails
ssh -J root@52.116.133.164 root@10.0.0.4 curl virus.com -s | grep squid > /dev/null
>>> success
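
Several of the test commands above run set -o pipefail before a curl | grep pipeline. The sketch below is not taken from the test script; it only illustrates what that option changes, using false and cat as stand-ins for a failing first command and a succeeding last one. It assumes bash is installed, since pipefail is a bash option.

```shell
# Exit status of a pipeline whose first command fails, without pipefail:
# the status of the last command (cat) wins.
status_without=$(bash -c 'false | cat; echo $?')

# Same pipeline with pipefail enabled: the failure of the first command is reported.
status_with=$(bash -c 'set -o pipefail; false | cat; echo $?')

echo "without pipefail: $status_without, with pipefail: $status_with"
# prints "without pipefail: 0, with pipefail: 1"
```

With pipefail set, a curl failure makes the whole check fail even if the last command in the pipeline would have succeeded.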

Using the tests as a guide, let’s dive deeper into the system that has been created.

Jump instance


The only instance that can be reached directly via ssh is the jump (bastion) instance. Check out the security group ssg_ssl in main.tf; the concepts are detailed in “Securely access remote instances with a bastion host.” The Terraform output has a copy/paste string you can use to ssh to the host through the jump. The rest of the testing is done through the jump host, so you can verify the test results yourself. For me, it looked like this:

$ terraform output host 
[
  {
    "ip_host" = "10.0.0.4"
    "sshhost" = "ssh -J root@52.116.137.7 root@10.0.0.4"
  },
]
$ ssh -J root@52.116.137.7 root@10.0.0.4
...
root@squid-us-south-1-host:~
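
If you would rather script this step than copy and paste, the sketch below pulls the sshhost string out of the output with jq. It assumes the host output keeps the list-of-objects shape shown above; the function name ssh_command_from_output is made up for illustration.

```shell
# Extract the ready-made ssh command from the "host" output.
# Reads the JSON produced by `terraform output -json host` on stdin.
# ssh_command_from_output is a hypothetical helper name.
ssh_command_from_output() {
  jq -r '.[0].sshhost'
}

# Usage (inside the cloned repository, after 010-create.sh has finished):
#   terraform output -json host | ssh_command_from_output
```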

Host to proxy access

In the last step you ssh’d to host. Let’s reproduce some of the tests. Is the proxy reachable?


root@squid-us-south-1-host:~# ping 10.0.1.4 -c 2
PING 10.0.1.4 (10.0.1.4) 56(84) bytes of data.
64 bytes from 10.0.1.4: icmp_seq=1 ttl=64 time=0.532 ms
64 bytes from 10.0.1.4: icmp_seq=2 ttl=64 time=0.428 ms
 
--- 10.0.1.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1020ms
rtt min/avg/max/mdev = 0.428/0.480/0.532/0.052 ms

Next, verify that the Squid service is running on the proxy and that Squid is able to reach the internet. Squid is listening on port 8080, so the following curl should work:

root@squid-us-south-1-host:~# curl neverssl.com -s --proxy 10.0.1.4:8080
<html>
    <head>
        <title>NeverSSL - helping you get online</title>
...

Host access via Virtual Network Function

Finally, the beauty of VPC routing and NFV can be seen by opening the routing tables, selecting the VPC and clicking on the route table:



10.0.0.0 is the CIDR for the VPC. The 166.8.0.0 and 161.26.0.0 CIDRs are service endpoints in the IBM Cloud for software repository mirrors, time servers, DNS servers, etc. (see the available endpoints). These are all delegated to the default routing table (see “create route”).

The interesting CIDR, 0.0.0.0/0, matches everything else. The next hop, 10.0.1.4, is the proxy. When the host connects to neverssl.com at IP address 54.230.125.14, the packet matches this route and is delivered to the proxy instance.
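
To see why 0.0.0.0/0 matches any destination, here is a small shell sketch of a CIDR membership test. It is only an illustration of the arithmetic, not how the VPC route table is implemented. The function names ip_to_int and ip_in_cidr are made up, and the 10.0.0.0/16 prefix in the usage note is an assumed value for contrast, since the prefix lengths are not shown in the text above.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if the address ($1) falls inside the CIDR block ($2).
ip_in_cidr() {
  ip=$1
  net=${2%/*}          # network part, e.g. 0.0.0.0
  prefix=${2#*/}       # prefix length, e.g. 0
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# A prefix length of 0 produces an all-zero mask, so every address matches.
ip_in_cidr 54.230.125.14 0.0.0.0/0 && echo "default route applies"
```

By contrast, ip_in_cidr 54.230.125.14 10.0.0.0/16 fails, because the first sixteen bits of the two addresses differ.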

The Squid service and Linux iptables are configured in the proxy_user_data.sh file. Notice the command:

iptables -t nat -I PREROUTING 1 -s $host_ipv4_cidr_block -p tcp --dport 80 -j REDIRECT --to-port 3129

The above iptables command, executed on the proxy, adds a rule to the kernel’s network address translation (NAT) table that redirects some of the incoming packets to the intercept port of the Squid application. Let’s break it down:

  • -t nat -I PREROUTING 1: Insert the rule as entry #1 of the PREROUTING chain in the network address translation table.
  • -s $host_ipv4_cidr_block: Only consider packets from the host CIDR block.
  • -p tcp: Only consider the TCP protocol.
  • --dport 80: Only consider packets to port 80 (HTTP).
  • -j REDIRECT: Redirect the matching packets.
  • --to-port 3129: Change the destination port from 80 to 3129.
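
To see how the variable is expanded into the final rule, the dry-run sketch below only prints the command for a given CIDR block and changes nothing on the system. The function name redirect_rule and the 10.0.0.0/24 value are assumptions for illustration; the real value comes from the Terraform configuration.

```shell
# Print (do not execute) the redirect rule for a given host CIDR block.
# redirect_rule is a hypothetical helper; 10.0.0.0/24 below is an assumed value.
redirect_rule() {
  host_ipv4_cidr_block=$1
  echo "iptables -t nat -I PREROUTING 1 -s $host_ipv4_cidr_block -p tcp --dport 80 -j REDIRECT --to-port 3129"
}

redirect_rule 10.0.0.0/24
```

On the proxy itself, running iptables -t nat -L PREROUTING -n --line-numbers as root lists the installed rules so you can confirm that the redirect is entry #1.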

The Squid configuration does the rest:

cat > /etc/squid/squid.conf <<EOF
visible_hostname squid
 
#Handling HTTP requests
http_port 3129 intercept
http_port 8080
acl allowed_http_sites dstdomain .neverssl.com
acl allowed_http_sites dstdomain .test.com
acl allowed_http_sites dstdomain .ubuntu.com
http_access allow allowed_http_sites
EOF

Squid will intercept the packets at port 3129 and serve as the middle man. Only a few sites are allowed: neverssl.com, test.com and ubuntu.com. All other sites will be rejected by Squid. Now we can make sense of this portion of the diagram:
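
The dstdomain entries in the configuration start with a dot, which in Squid matches the domain itself and any of its subdomains. The sketch below imitates that matching in plain shell to make the allow list concrete. It is a simplified imitation, not Squid’s implementation, and the function name is_allowed is made up.

```shell
# Succeed if the host name is one of the allowed domains or a subdomain of one.
# is_allowed is a hypothetical helper that imitates the dstdomain allow list.
is_allowed() {
  host=$1
  for domain in neverssl.com test.com ubuntu.com; do
    case "$host" in
      "$domain" | *."$domain") return 0 ;;
    esac
  done
  return 1
}

for site in neverssl.com archive.ubuntu.com virus.com; do
  if is_allowed "$site"; then echo "$site: allowed"; else echo "$site: denied"; fi
done
```

The loop reports neverssl.com and archive.ubuntu.com as allowed and virus.com as denied.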


Continue testing:

root@squid-us-south-1-host:~# curl -s neverssl.com
<html>
    <head>
        <title>NeverSSL - helping you get online</title>
...

The configuration is centralized in the route table and applies to instances on configured subnets. The host instance requires no configuration to access the internet via Squid. If you could eavesdrop on the end-to-end conversation, you would see the following:


The source and destination IP addresses of the TCP packets are as you would expect, except for the ones explicitly noted:

  1. The request is addressed to 54.230.125.14. The route table next hop route delivers it to the proxy at 10.0.1.4.
  2. At the proxy, the Linux iptables rule redirects the packets to the Squid process. The Squid process establishes a connection to neverssl.com. The source IP address of that connection is provided by the public gateway.
  3. The response is returned to the proxy/Squid over the public gateway.
  4. Squid impersonates neverssl.com, spoofing the IP address 54.230.125.14. The curl command running on host is none the wiser.

Using tcpdump, it is possible to see some of the traffic. Bring up another ssh session to the proxy; the ssh command can be found in the Terraform output. On the proxy, run tcpdump port 80. On the host, run the same command in the background: tcpdump port 80 &. Then, on the host in the foreground, run the curl command again. The text below has been edited for readability.

Proxy:

root@squid-proxy:~# tcpdump port 80
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
22:27:38.344635 IP 10.0.0.4.59434 > server-54-230-125-14.dfw50.r.cloudfront.net.http: Flags [S],
22:27:38.344774 IP server-54-230-125-14.dfw50.r.cloudfront.net.http > 10.0.0.4.59434: Flags [S

Host:

root@squid-us-south-1-host:~# tcpdump port 80 &
root@squid-us-south-1-host:~# curl -s neverssl.com >/dev/null
22:27:38.228500 IP squid-us-south-1-host.59434 > server-54-230-125-14.dfw50.r.cloudfront.net.http: Flags [S]
22:27:38.229147 IP server-54-230-125-14.dfw50.r.cloudfront.net.http > squid-us-south-1-host.59434: Flags [S

Verify that Squid denies access to virus.com. Notice the Squid error message:

root@squid-us-south-1-host:~# curl -s virus.com
...
<p>Generated Mon, 15 Mar 2021 22:41:20 GMT by squid (squid/3.5.27)</p>
<!-- ERR_ACCESS_DENIED -->
</div>
</body></html>
root@squid-us-south-1-host:~#

Clean up

When you are done investigating, clean up all the resources using ./040-cleanup.sh. Take a look at the script; it simply runs terraform destroy.

Conclusion

Configuring the VPC routing tables for Network Functions Virtualization (NFV) can be a great way to transparently insert functionality into a VPC network. More generally, route tables and routes can be used to both isolate and extend network connectivity.

Get more experience with NFV by configuring Squid to work with HTTPS traffic: SslBump Peek and Splice.

You can also find these NFVs in the IBM Cloud Catalog:

 
