Securing a Raspberry Pi embedded in your IoT device

Writing scripts to define and enforce usage patterns to secure your IoT device

Internet of Things (IoT) devices are often exposed to sensitive information, such as whether you're home, or they control important things, such as whether a baby who stirs at 3 AM hears soothing music or a blaring siren. Developers must learn how to secure IoT devices from intrusions.

A Raspberry Pi is, in many respects, an excellent system to use as the processing core of IoT devices. It runs a general-purpose operating system (usually Linux, although a Windows version exists). As a developer, you have full access to all the services and functions that an operating system provides. However, this flexibility comes with security risks.

By default, a general-purpose operating system offers many potentially risky services or functions that are not needed for an IoT device. For example, rarely does a smart doorbell need to run a web browser, act as an FTP server, or use SSH to connect to random servers.

In this article, you learn how to identify exactly what the Raspberry Pi needs to do while embedded in an IoT device, and then how to prevent the Pi from doing anything else. To secure the Pi, we run a script to identify the usage pattern, which defines what the Pi is doing during a period of time. We store this information in a file. Later, we run another script that reads that file and uses it to enforce the usage pattern and prevent anything else from happening on the Pi.

This technique is too specific to use on a computer that runs a general-purpose operating system. However, IoT devices tend to have much more rigid usage patterns, so this technique works well for a Raspberry Pi that is embedded in an IoT device.

What you need to build your application

  • A Raspberry Pi with the Raspbian OS installed. The techniques and scripts in this article have all been verified on such a system. They might work on other UNIX-based systems, but this is not guaranteed.
  • Basic knowledge of JavaScript and Node.js. I chose to use Node.js because I assume that more readers know JavaScript than one of the traditional scripting languages, such as Python.
  • Node.js packages:
    • npm (to install other packages)
    • ps-man (to get the list of processes)
    • node-netstat (to get the list of open sockets)
  • Basic knowledge of Linux system administration. Identifying a usage pattern and enforcing it are both system administration tasks. You need to have some experience with running these Linux system administration commands: ps, kill, netstat, and tcpdump.

Configuring your Raspberry Pi

From a clean installation of Raspbian, which is the default Raspberry Pi operating system that you can download from the Raspberry Pi site, run these commands on the Raspberry Pi to install the required Node.js packages and a few other tools that are not installed by default.

sudo apt-get install tcpdump
sudo apt-get install npm
npm install ps-man
npm install node-netstat

Identifying usage patterns

The usage pattern that we track in this article consists of two types of data:

  1. Processes. Processes identify what the Pi does.
  2. Network connections. Network connections identify how the Pi communicates with all the other things. The connections can be divided further into two types:
    1. Listening sockets. Listening sockets are used by other systems to connect to the Pi as a server.
    2. Active connections. Active connections can enable the Pi to act as a client by connecting to other devices as servers.

Processes and listening sockets can be identified by polling the Pi periodically to see what is currently running and which ports are in listen mode, waiting for connections.

Active connections, however, can be very short lived. To identify them, you need to have an active tcpdump process.

You can use the get_pattern.js script that is available in my GitHub repo (Securing_Raspberry_Pi_in_Your_Device) to identify the usage pattern. You can put this script anywhere on the file system of your Raspberry Pi, and run the script by using Node.js:

node get_pattern.js

Processes

To build the processes part of the usage pattern, which represents what the Pi does, we need to repeatedly look at the list of processes and see what is running.

You can use the ps-man package to get the list of processes, which is similar to the ps command. You can use this code to call the ps-man package to get a list of all the process commands:

var ps = require("ps-man");

// Print the command of every process that is currently running
ps.list({}, function(err, result) {
	for(var i=0; i<result.length; i++)
		console.log(result[i].command);
});

This code shows a snapshot of the current commands. To merge the snapshots from different points in time, we use a hash table with the command name as the key. For example, you can use this code to create such a hash table and update it every second:

var processHistory = {};

var getProcesses = function() {
	ps.list({}, function(err, result) {
		for(var i=0; i<result.length; i++)
			processHistory[result[i].command] = true;
		});
};

setInterval(getProcesses, 1000);
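
To see how the merge behaves without a Pi at hand, here is a minimal sketch that feeds two hand-written snapshots into the same kind of hash table. The mergeSnapshot helper and the sample commands are made up for illustration; only the command property comes from the code above.

```javascript
// Merge one process snapshot into the history table.
// `snapshot` is assumed to look like the result that the ps.list callback
// receives above: an array of objects that each have a `command` property.
function mergeSnapshot(history, snapshot) {
  snapshot.forEach(function (proc) {
    history[proc.command] = true;
  });
  return history;
}

// Two made-up snapshots, taken one polling interval apart
var seen = {};
mergeSnapshot(seen, [{ command: "/usr/sbin/sshd" }, { command: "node" }]);
mergeSnapshot(seen, [{ command: "node" }, { command: "/usr/sbin/cron" }]);

// Each command appears once, no matter how many snapshots contained it
console.log(Object.keys(seen));
```

Because the command is the key, a process that shows up in every snapshot still takes only one entry, and a process that ran only briefly stays in the table after it exits.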

Listening sockets

When the Raspberry Pi is a server, it listens on a port until some other entity connects to it. You can use the following call to the node-netstat package to get only the sockets that are waiting for connections:

netstat({filter: {state: "LISTEN"} }, function(data) { … });

This filter only catches TCP and TCP for IPv6 sockets. UDP sockets don't have a listen state, and node-netstat does not return them anyway. Later in this article, I outline how you can work with UDP sockets.

The callback function is called with each socket that matches the filter, and gets the socket information in a structure. A listening socket can be identified by two features: the protocol (TCP, UDP, or the IPv6 versions of those protocols) and the port number. The following callback function gets that data and stores it in a hash table similar to the one that we used for processes earlier.

function(data) {
	listenSockets[data["local"]["port"] + "/" + data["protocol"]] = true;
}
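
The following sketch shows what the resulting table looks like. The socketKey helper and the sample records are hypothetical; the records contain only the fields that the callback above reads, plus a state field for readability.

```javascript
// Build the hash table key for one socket record: "<port>/<protocol>".
function socketKey(data) {
  return data.local.port + "/" + data.protocol;
}

// Made-up records in the shape that the callback above expects
var sampleSockets = [
  { protocol: "tcp",  local: { port: 22 }, state: "LISTEN" },
  { protocol: "tcp6", local: { port: 22 }, state: "LISTEN" },
  { protocol: "tcp",  local: { port: 80 }, state: "LISTEN" },
  { protocol: "tcp",  local: { port: 22 }, state: "LISTEN" }  // seen again on a later poll
];

var listenTable = {};
sampleSockets.forEach(function (data) {
  listenTable[socketKey(data)] = true;
});

// Port 22 on TCP and port 22 on TCP for IPv6 are separate entries
console.log(Object.keys(listenTable));
```

Keeping the protocol in the key matters because the same port number can be open on one protocol and closed on another.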

Building a usage pattern from periodic polling

To poll processes and network sockets that are listening for connections, you need to determine two parameters:

  • Polling frequency. How often should we look at the running processes and network sockets?
  • Cycle length. How long should the program track the device to get a complete picture? For example, if the device uploads to a server once an hour, this period needs to be at least an hour. If the program contacts a server to look for updates once a day, this period has to be at least a day. During that time period, you need to provide the device with any input conditions that it might experience in real life so that their consequences will be part of the detected usage pattern.

The following code shows the definition of these parameters:

// Polling frequency, in seconds. This is how often we look at the
// process and socket lists.
var pollingFreq = 1;

// Time for a full cycle of the device, in seconds. This means that
// anything that happens on this device, we expect to happen at least
// once during that time.
var cycleLength = 60;

To start polling, we use the setInterval function. Its return value is an identifier that can be used later to stop the polling.

// Set up the polling
var processInterval = setInterval(getProcesses, pollingFreq *1000);
var socketInterval = setInterval(getSockets, pollingFreq *1000);


var pollingDone = false;

// End polling
var endPolling = function() {
	clearInterval(processInterval);
	clearInterval(socketInterval);

	pollingDone = true;
};

Finally, we can use the setTimeout function to end the polling after a full cycle. If we want, the same callback can also report the results:

// Set up end of polling
setTimeout(function() {
		endPolling();
		console.log(JSON.stringify({
			processes: processHistory,
			sockets: listenSockets		
		}));
	}, cycleLength*1000);

Active connections

Finding information about connections where the device is the client is harder. These connections can be very short lived, so polling periodically is not going to see them. The solution in this case is to run the tcpdump command, which is a network sniffer that sees everything that passes through the interface. (You can read more about this tcpdump command in its man page.)

In this article, we are only interested in the first packet of the connection (which is always client to server), and only if that first packet is outgoing, from our IoT device to the network.

In the case of TCP packets, the first packet of the connection is the only one to have the SYN flag without an ACK flag. (You can read about TCP connection initiation in the Wikipedia entry for TCP.)

The following command (which should be typed all on one line) shows one line for each TCP connection where the IoT device is a client and keeps a copy of the line in a file called tcpAsClient:

sudo tcpdump --direction=out -n 'tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0 and tcp' | tee tcpAsClient

Run this command on the Pi's command line, and then open a web browser on the Pi and browse the web to see what you are accessing. Because of the tee command, the output of tcpdump goes both to the command line window and the tcpAsClient file.
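
If the flag test in the tcpdump filter looks cryptic, this sketch expresses the same check in JavaScript. The function name is made up; the bit values are the standard positions of the SYN and ACK bits in the TCP flags byte.

```javascript
// Standard bit positions in the TCP flags byte
var TCP_SYN = 0x02;
var TCP_ACK = 0x10;

// True only for the first packet of a connection: SYN is set and ACK is not.
// This mirrors "tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0".
function isConnectionStart(flags) {
  return (flags & TCP_SYN) !== 0 && (flags & TCP_ACK) === 0;
}

console.log(isConnectionStart(0x02)); // SYN only: the client's opening packet
console.log(isConnectionStart(0x12)); // SYN+ACK: the server's reply
console.log(isConnectionStart(0x10)); // ACK only: an ordinary data packet
```

Only the first call returns true, which is why the filter shows one line per connection instead of one line per packet.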

The output in tcpAsClient includes the IP address and the port number of the destination. However, IP addresses are less useful than you might think because devices typically try to access host names, not IP addresses. Large websites, such as www.ibm.com, often have multiple IP addresses. You can see this by resolving the host name from the command line with the ping www.ibm.com command and then with Google's tools (https://toolbox.googleapps.com/apps/dig/#A/www.ibm.com); the IP addresses are likely to be different.

To make matters more complicated, reverse DNS lookup often does not resolve to that host name. To solve this problem, we can run a separate tcpdump command to capture the DNS requests and their answers:

sudo tcpdump -n 'port 53' | tee dns

The parseTcp.js script that is available in my GitHub repo (Securing_Raspberry_Pi_in_Your_Device) illustrates how get_pattern.js parses the output from the tcpdump commands, and then combines them. I'm going to explain only the more interesting parts from this script.

The easiest way to read a file is to use the fs.readFile() function. The fragment below reads a file (in this case the file name is "dns") and then calls the callback function. The file's contents are in a buffer, in the data parameter.

var fs = require("fs");
fs.readFile("dns", function(err, data) {…});

First, you create an array of lines.

var lines = data.toString().split("\n");

The fields in the output of tcpdump are separated by spaces. In the case of DNS, the sixth field ([5] in an array because arrays are zero-based) is an identifier. In the case of a request, a plus sign is appended to the number. In the case of a response, the eighth field is the response type followed by a value, then another response type and a value, and so on. IP addresses have a type A and usually appear on their own. The following code is an example of a request and its response:

19:09:34.158059 IP 172.16.1.1.45897 > 172.16.0.1.53: 35156+ A? d29usylhdk1xyu.cloudfront.net. (47)

19:09:34.214358 IP 172.16.0.1.53 > 172.16.1.1.45897: 35156 8/0/0 A 52.85.202.13, A 52.85.202.254, A 52.85.202.106, A 52.85.202.237, A 52.85.202.137, A 52.85.202.204, A 52.85.202.68, A 52.85.202.248 (175)

Requests are tracked in one hash table, with the identifier as a key. Returned DNS entries are tracked in another hash table, with the IP address as the key and the host name as the value.
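
As a rough illustration of the two hash tables, here is a minimal sketch that parses the sample request and response shown above. It is not the parseDNS function from the repo; the function name and the decision to handle only A records are my own assumptions, and real tcpdump output has variations (extra flags after the identifier, for example) that this sketch ignores.

```javascript
// Map each IP address in tcpdump DNS output to the host name that was asked for.
function parseDnsLines(lines) {
  var requests = {};  // identifier -> requested host name
  var ipToHost = {};  // IP address -> host name

  lines.forEach(function (line) {
    var fields = line.trim().split(/\s+/);
    if (fields.length < 8) return;   // too short to be a DNS line
    var id = fields[5];              // sixth field: the identifier

    if (id.endsWith("+")) {
      // Request, for example "35156+ A? host.name."
      if (fields[6] === "A?")
        requests[id.slice(0, -1)] = fields[7].replace(/\.$/, "");
      return;
    }

    // Response: "type value" pairs start at the eighth field
    var host = requests[id];
    if (host === undefined) return;  // no matching request was seen
    for (var i = 7; i + 1 < fields.length; i += 2) {
      if (fields[i] === "A")
        ipToHost[fields[i + 1].replace(/,$/, "")] = host;
    }
  });

  return ipToHost;
}

var dnsSample = [
  "19:09:34.158059 IP 172.16.1.1.45897 > 172.16.0.1.53: 35156+ A? d29usylhdk1xyu.cloudfront.net. (47)",
  "19:09:34.214358 IP 172.16.0.1.53 > 172.16.1.1.45897: 35156 8/0/0 A 52.85.202.13, A 52.85.202.254, A 52.85.202.106, A 52.85.202.237, A 52.85.202.137, A 52.85.202.204, A 52.85.202.68, A 52.85.202.248 (175)"
];
console.log(parseDnsLines(dnsSample));
```

All eight addresses in the response end up pointing at the one host name from the request, which is the lookup we need when a later TCP packet mentions only the IP address.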

Parsing the tcpAsClient file is easier because we are only interested in the IP address (to get the host name) and port number on the remote side, which is always the destination in the packets that we track. This information is in the fifth entry in the line, the 4 bytes of the IP address followed by a dot and then the port number:

9:13:16.961446 IP 172.16.1.1.49494 > 80.70.128.24.80: Flags [S], seq 3551986630, win 29200, options [mss 1460,sackOK,TS val 69478387 ecr 0,nop,wscale 7], length 0

The result is a hash table with port numbers and the hosts to which the device connected on that port. The result is arranged in such a way to make it easier to use the information to produce firewall rules.
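
Here is one way that parsing could look. This is a sketch, not the parsePorts function from the repo: the function names, the nested table layout (port, then host), and the example host name are assumptions that I chose for illustration.

```javascript
// Extract the remote IP address and port from one tcpdump line.
// The destination is the fifth field, for example "80.70.128.24.80:".
function parseDestination(line) {
  var fields = line.trim().split(/\s+/);
  if (fields.length < 5) return null;
  var dest = fields[4].replace(/:$/, "");
  var lastDot = dest.lastIndexOf(".");
  if (lastDot < 0) return null;
  return { ip: dest.slice(0, lastDot), port: dest.slice(lastDot + 1) };
}

// Record the destination under its port, using the host name when the
// DNS table knows one and the bare IP address otherwise.
function addDestination(table, line, ipToHost) {
  var dest = parseDestination(line);
  if (dest === null) return table;
  var host = ipToHost[dest.ip] || dest.ip;
  if (!table[dest.port]) table[dest.port] = {};
  table[dest.port][host] = true;
  return table;
}

var synLine = "9:13:16.961446 IP 172.16.1.1.49494 > 80.70.128.24.80: Flags [S], " +
  "seq 3551986630, win 29200, options [mss 1460,sackOK,TS val 69478387 ecr 0,nop,wscale 7], length 0";

// "host.example" is a placeholder, not a name that appears in the capture
console.log(addDestination({}, synLine, { "80.70.128.24": "host.example" }));
```

Splitting at the last dot works because the -n flag makes tcpdump print the port as a number after the four parts of the IPv4 address.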

In the case of UDP, which does not have connections, it is difficult to distinguish between requests and responses. However, we can get all outgoing UDP packets by running this command:

sudo tcpdump --direction=out -n udp | tee udpPackets

Parsing this file is similar to parsing the TCP file.

Adding the tcpdump command to the get_pattern.js script

Although you can run the tcpdump command manually as we did in the previous section, it is easier to have Node run tcpdump before it starts polling for information. While Node is polling for process and socket information, tcpdump can run in separate processes to gather the active connection information. Then, after a full cycle of the device has passed, Node can stop the tcpdump processes, parse the results, and save them to a file for future use.

The Node child_process library starts processes and communicates with them. In general, this is how you run a process:

var child_process = require("child_process");
var process = child_process.spawn("cp", ["/etc/passwd", "."]);

In this case, the command we call is sudo because tcpdump must run with root privileges.

The tcpdump command is a parameter on the sudo command, and so are its parameters. The parameters must be separate strings in the list. Normally, the shell takes care of that, but this mechanism bypasses the shell.

The direction parameter is a bit complicated because for two of the tcpdump commands we only want to see packets going out. But for the one tcpdump command for DNS, we want to see all packets, regardless of direction.

var process = child_process.spawn("sudo",
	["tcpdump", "--direction=" + (outOnly ? "out" : "inout"), "-n", "-l", filter]);
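
To make the point about separate strings concrete, this sketch builds the argument list in a small helper so that you can inspect it before anything is spawned. The helper name is made up; the flags are the ones used with tcpdump in this article.

```javascript
// Build the argument list that follows "sudo" when spawning tcpdump.
// Each array element reaches tcpdump as exactly one argument, so the filter
// is passed as a single string with no shell quoting around it.
function buildTcpdumpArgs(filter, outOnly) {
  return [
    "tcpdump",
    "--direction=" + (outOnly ? "out" : "inout"),
    "-n",   // do not resolve addresses or ports to names
    "-l",   // line-buffered output, so data arrives as packets are seen
    filter
  ];
}

console.log(buildTcpdumpArgs("udp", true));
console.log(buildTcpdumpArgs("port 53", false));
```

Note that "port 53" stays one array element even though it contains a space. On the command line you need quotes for that, but here there is no shell to split the string.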

The process has an on method that registers event handlers as do the output streams (stdout and stderr). We use these methods to get the output of tcpdump and send any errors back to the user.

var output = "";
process.stdout.on("data", function(data) {
	output += data;
});

// Show stderr
process.stderr.on("data", function(data) {
	console.log("stderr on tcpdump:" + data);
});

// Display errors
process.on("error",function(err) {
	console.log("Error:" + err);
});

Finally, at some point, we need to kill the tcpdump process. However, we can't send it a signal as we would a normal process that we spawned. Because this process is running as root, it can only be killed by root, so we spawn sudo kill instead. A space is appended to the output before it is passed to the callback because otherwise, if tcpdump did not detect any packets, the output would be an empty string, and the program would assume that tcpdump hasn't terminated yet and wait indefinitely.

setTimeout(function() {
	child_process.spawn("sudo", ["kill", process.pid]);
	callback(output + " ");
}, time);

We need to start, and later kill, a tcpdump process three times (for TCP, UDP, and DNS), so we wrap this code in a function.

var startTcpdump = function(filter, time, callback, outOnly) {
…
};

In the get_pattern.js script that is available in my GitHub repo (Securing_Raspberry_Pi_in_Your_Device), starting at line 220, you can see three calls to startTcpdump with a filter, and the calls put the result in a variable.

Getting the usage pattern file

The usage pattern file can only be produced after the polling ends (otherwise we ignore everything that happens after it is produced), the three tcpdump processes terminate, and the tcpdump results are parsed.

To ensure the correct timeline, get_pattern.js uses the whenDo function. This function takes two function parameters. The first is a condition. The second is the callback to be called when the condition becomes true. If the condition is false, the function waits a few seconds and tries again (by using setTimeout).

var whenDo = function(when, todo) {
	if (when())
		todo();
	else
		setTimeout(function() {whenDo(when, todo);}, 5000);
};

The first call to this function is in parseDumps, where it ensures that the output is parsed only after it is available:

var parseDumps = function() {
	whenDo(function() {
			return dnsString != "" &&
				tcpAsClient != "" &&
				udpPackets != "";
		},
		function() {
			parseDNS(dnsString);
			parsePorts(tcpAsClient, tcpClients);
			parsePorts(udpPackets, udpClients);

			parsingDone = true;
		}
	);
};

The other is in saveResults, where it only writes the results to a file (behaviorPattern.json) after the polling and parsing are both done:

var saveResults = function() {
	whenDo(function() {
		return pollingDone && parsingDone;
	}, function() {
		var result = {
			processes: processHistory,
			listen: listenSockets,
			tcp: tcpClients,
			udp: udpClients
		};
		// Current Node.js versions require a callback for fs.writeFile
		fs.writeFile("behaviorPattern.json",
			JSON.stringify(result) + "\n",
			function(err) {
				if (err)
					console.log("Error writing file: " + err);
			});
	});
};

Those two functions are only called after a full cycle. However, we do not know in what order they would be called or whether the tcpdump processes will be terminated before the results are to be parsed, so whenDo ensures that everything will run in the correct order.

Enforcing usage patterns

Defining the usage pattern without actually enforcing it is an exercise in futility. In the case of processes, the easiest way to enforce the usage pattern is to kill the processes if they should not be running. Everything else in the usage patterns that we detected goes through the network. This means that it can be enforced through firewall rules.

Two programs in my GitHub repo (Securing_Raspberry_Pi_in_Your_Device) enforce the usage pattern. The first program is enforce_pattern.js, which directly enforces the part of the usage pattern that can be enforced by periodic polling. It kills any process that is not in the established usage pattern. The second program is firewall_rules.js, which creates the firewall rules that prevent network accesses that deviate from the usage pattern.

Killing unknown processes

The enforce_pattern.js program reads the behaviorPattern.json file, which is the output of the get_pattern.js program. It then runs the enforceProcesses function ten times a second. When you choose the frequency to run enforceProcesses on your system, you need to consider the tradeoffs between two factors:

  1. The damage that a process can do in a short amount of time
  2. The performance requirements of your device

The enforceProcesses function iterates over all the processes, in the same way that getProcesses does. However, instead of registering the processes, it looks to see whether the process is in the existing usage pattern. If not, it issues a sudo kill -9 command to kill that process. The allowed function checks for these exceptions:

  • Don't kill the enforcer itself. While we were recording the usage pattern, node was running with a different parameter, get_pattern.js. The processes that ps-man reports include the command-line parameters, so node enforce_pattern.js counts as a different process from node get_pattern.js and does not appear in the usage pattern.
  • Don't kill processes whose command line contains sudo kill -9. These processes are most likely to be spawned by our own process.
  • Don't kill kernel worker processes. Or rather, don't attempt to kill them. Being part of the kernel, they don't respond to signals, which makes them unkillable.
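
The exceptions in the list above can be sketched as a single predicate. This is not the allowed function from the repo. The signature is my own, and I am assuming that kernel threads show up the way ps normally displays them, as a name in square brackets.

```javascript
// Decide whether a running command may stay alive.
// `pattern` is assumed to be the parsed behaviorPattern.json content, with a
// `processes` table keyed by command line.
function allowed(command, pattern) {
  // Part of the recorded usage pattern
  if (pattern.processes[command]) return true;

  // The enforcer itself, which was not running while the pattern was recorded
  if (command.indexOf("enforce_pattern.js") !== -1) return true;

  // Kill commands, most likely spawned by the enforcer
  if (command.indexOf("sudo kill -9") !== -1) return true;

  // Kernel threads (assumption: displayed as "[name]") cannot be killed anyway
  if (/^\[.*\]$/.test(command)) return true;

  return false;
}

var samplePattern = { processes: { "/usr/sbin/sshd -D": true } };
console.log(allowed("/usr/sbin/sshd -D", samplePattern));
console.log(allowed("nc -l 4444", samplePattern));
```

The first call returns true because the command is in the pattern. The second returns false, so the enforcer would go on to kill that process.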

Creating firewall rules

Most of the behavior pattern in the behaviorPattern.json file is devoted to network traffic. To restrict network traffic, the firewall_rules.js program in my GitHub repo (Securing_Raspberry_Pi_in_Your_Device) creates a script (fwCommands.sh) that specifies firewall rules. You run this program once, and then either run fwCommands.sh manually (as root) or add it to the startup script as explained below.

Although the JavaScript in these scripts so far is pretty simple, you might not be familiar with the Linux firewall, iptables. I will explain the fwCommands.sh file from my system to show what it does and how to control it.

The first part of fwCommands.sh specifies general rules. First, we flush any existing rules in the input and output chains. The third chain, forward, is used when the device forwards packets from one interface to another. But for this use case, it is safe to ignore it. People are unlikely to use a Raspberry Pi as a network router.

#! /bin/bash
#
iptables -F INPUT
iptables -F OUTPUT

These two lines set the default action to accept (meaning, the packet is passed on). Usually, for firewalls, it is best to disallow whatever is not explicitly allowed, but here we are only securing the TCP traffic so anything else needs to be allowed.

iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT

The loopback interface is used for communication between different processes on the same device, so it is considered safe. If any packets are going through it, they can be passed on.

iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

Packets that belong to established connections or a related connection (for example, the FTP data channel of an established FTP control channel) are also allowed.

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

The second part of the output derives from the list of local ports that accept connections. It allows input to those ports from anywhere because we do not know whether any client locations are illegitimate. In this case, we only accept input to ports 80 (HTTP) and 22 (SSH).

iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT

This code is the last part that handles the input policy, so we add a rule to reject any other TCP packets (those that are not part of an established or related connection, and not going to ports we explicitly allowed). Then, we add a rule to accept anything else.

iptables -A INPUT -p tcp -j REJECT
iptables -A INPUT -j ACCEPT

The third and final part of the file derives from the tcpdump that views TCP connections. It allows new output connections, but only to a port and host that appear in the behavior pattern. In this example, we are making security decisions based on DNS resolution, which can be spoofed in various ways. As usual, there is a tradeoff between security and usability. A more sensitive IoT device might have a policy that is based not on observed behavior, but on the actual design of the device and what it needs to allow.

iptables -A OUTPUT -p tcp -m state --state NEW -m tcp --dport 80 -d www.ibm.com -j ACCEPT
iptables -A OUTPUT -p tcp -m state --state NEW -m tcp --dport 80 -d www.google.com -j ACCEPT
iptables -A OUTPUT -p tcp -m state --state NEW -m tcp --dport 22 -d 10.20.30.40 -j ACCEPT

As with the input policy, any other TCP packet is dropped or rejected and any non-TCP packet is accepted.

iptables -A OUTPUT -p tcp -j DROP
iptables -A OUTPUT -j ACCEPT
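
To show how rules like these can be derived from the behavior pattern, here is a minimal sketch. It is not the firewall_rules.js program from the repo. The function names are made up, and I am assuming that listening sockets are keyed as port/protocol (as in the polling code earlier) and that outgoing TCP destinations are stored as a table of ports that each contain a table of hosts.

```javascript
// Turn listening-socket keys such as "80/tcp" into INPUT rules.
// Only plain TCP entries produce rules in this sketch.
function inputRules(listen) {
  return Object.keys(listen)
    .map(function (key) { return key.split("/"); })
    .filter(function (parts) { return parts[1] === "tcp"; })
    .map(function (parts) {
      return "iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport " +
        parts[0] + " -j ACCEPT";
    });
}

// Turn { port: { host: true } } into OUTPUT rules, one per port and host.
function outputRules(tcp) {
  var rules = [];
  Object.keys(tcp).forEach(function (port) {
    Object.keys(tcp[port]).forEach(function (host) {
      rules.push("iptables -A OUTPUT -p tcp -m state --state NEW -m tcp --dport " +
        port + " -d " + host + " -j ACCEPT");
    });
  });
  return rules;
}

var rules = inputRules({ "80/tcp": true, "22/tcp": true, "80/tcp6": true })
  .concat(outputRules({ "80": { "www.ibm.com": true, "www.google.com": true } }));
console.log(rules.join("\n"));
```

The generated lines match the format of the port-specific rules shown above. A complete script would still need the fixed rules around them: the flush, the default policies, loopback, established connections, and the final reject and drop rules.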

From prototype to production

This article presented some scripts that you can use in a prototype of an IoT solution that uses a Raspberry Pi embedded in an IoT device. Obviously, some features were omitted by design. The following list includes some of the missing features and some suggestions on how to implement them.

  • Add a backdoor.
    It is sometimes useful to remotely manage a device. But if you tighten the security, you might not be able to use ssh anymore. Or, maybe you will be able to access the device, but whenever you try to run a command, it will be killed because it isn't in the usage pattern.
    The traditional solution to this problem is to leave a backdoor, or a way into the device. However, that can be dangerous because it allows access to anybody who finds out about it. One solution is to have the backdoor authenticate you by using a secure mechanism, such as one-time passwords. To make it even stronger, you can use one-time passwords that are not stored on the device, by using the S/Key standard, for example.
  • Start the protection automatically.
    You obviously need the protection to start automatically after a reboot by running both node enforce_pattern.js and fwCommands.sh. The script to modify is /etc/rc.local.
    However, a potential problem exists. Some processes run only during initialization (fwCommands.sh, to identify an obvious example). To solve this problem, you can have the initialization script run the get_pattern.js command to get that part of the usage pattern.
  • Verify the identity of executable files.
    Right now, the behaviorPattern.json file identifies commands by the name that was used to run them, which can be either an absolute path or an executable file that is found in the path. Unfortunately, if an attacker manages to replace an executable file, our program will not be able to stop it.
    The solution is to identify the executable files with a cryptographic checksum, by using the crypto-js package, for example. (You can read about this package on the npm site.) This solution can identify cases where the file's name is the same but the file itself changes.
  • Only allow process killers that our process spawns.
    Right now, any sudo kill -9 process is safe from the rules in the enforce_pattern.js program. This rule is excessively lenient. We create our own kill processes, so we can record their PIDs. Instead of allowing every process that looks like one of them, we can exempt only those that we created.
  • Add UDP and IPv6 support.
    Currently, the firewall rules just allow all UDP and IPv6 traffic. IPv6 is not a problem because it is disabled by default. But to allow UDP requires some work because UDP is connectionless, which means that server ports are not in a listen state.
    To get such ports, it is necessary to change the filter to one that looks for ports without a foreign address. It is also necessary to change the default command that node-netstat issues because the default only returns TCP. Then, you can change the firewall_rules.js script to give the correct commands to filter UDP.
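
As an illustration of the checksum suggestion in the list above, here is a minimal sketch. Note that it uses the crypto module that is built into Node.js rather than the crypto-js package, and that the function names and the choice of SHA-256 are my own assumptions.

```javascript
var crypto = require("crypto");
var fs = require("fs");

// Return the SHA-256 checksum of a file as a hexadecimal string.
function fileChecksum(path) {
  return crypto.createHash("sha256")
    .update(fs.readFileSync(path))
    .digest("hex");
}

// Compare a file against the checksum that was recorded earlier.
function isUnchanged(path, expectedChecksum) {
  return fileChecksum(path) === expectedChecksum;
}

// Example: fingerprint the node executable that is running this script
var nodeChecksum = fileChecksum(process.execPath);
console.log(process.execPath + " " + nodeChecksum);
console.log(isUnchanged(process.execPath, nodeChecksum));
```

If you store a checksum like this next to each command while recording the usage pattern, the enforcer can compare it against the file on disk before it decides that a process is allowed.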

Conclusion

Because a Raspberry Pi runs a general-purpose operating system, it can leave itself open to security issues that seem to plague IoT devices today. By using Linux system administration skills to identify a usage pattern, you can reduce your security exposures by restricting access to only those processes and connections that make sense for your IoT device.


Published: May 17, 2017