Network address data types in PostgreSQL
PostgreSQL provides developers with numerous data types with specialized functions. In this post, we focus on the network address data type and show you how it works, when to use it, and how it can help you when storing IP addresses in your database.
IP addresses are the lingua franca of the Internet, giving everything a place, even if it isn’t a specific place. By recording IP addresses, you can, for example, determine patterns of behavior which may be malicious. Or you may want to efficiently store your network’s layout. Whatever the goal, PostgreSQL has the data types you need.
We’re going to take a look at network address data types in PostgreSQL, namely the INET (Internet address) and CIDR (Classless Inter-Domain Routing) data types, which store IPv4 and IPv6 addresses. We’ll cover the differences between how PostgreSQL stores INET and CIDR addresses, the input error checking that comes out of the box, and some of the functions that are available for these data types.
Without further ado, let’s take a look at the INET and CIDR data types.
INET and CIDR
CIDR (Classless Inter-Domain Routing) and INET are the two data types that store IP addresses in PostgreSQL. Each comes with its own input error checking as well as its own operators and functions. It can be confusing to figure out when to use CIDR or INET in your PostgreSQL tables, so we’ll go over some of the differences between them and when you should use one over the other.
If we’re storing IPv4 or IPv6 host addresses, PostgreSQL recommends using the INET data type with an optional netmask. While it’s possible to store addresses that represent a network using INET, like 192.10/14, PostgreSQL recommends using CIDR for those, which we’ll discuss further below. For now, let’s take a look at how INET data is stored and some of the problems we might run into.
To demonstrate what the INET data type does, we’ll start off by creating a table called inet_test with address taking the INET data type:
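A minimal sketch of that table definition, assuming a single column and nothing else (the post names only the table and the address column):

```sql
-- One column named "address", typed as inet, as described in the text.
CREATE TABLE inet_test (
    address inet
);
```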
Now, let’s insert some address values. The first value contains an IPv4 host address with a netmask of 24, the second is just an IPv4 host address representing a single host, and the last is a host address with an arbitrary netmask value:
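The inserts might look like the sketch below. It uses the three addresses this post refers to; the exact statements are an assumption, and the CREATE TABLE IF NOT EXISTS line is only there so the snippet runs on its own.

```sql
-- Only needed if the table does not exist yet.
CREATE TABLE IF NOT EXISTS inet_test (address inet);

INSERT INTO inet_test (address) VALUES
    ('188.8.131.52/24'),  -- host address with a netmask of 24
    ('184.108.40.206'),   -- single host, no netmask given
    ('198.10/8');         -- shortened address with a netmask
```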
Using a SELECT statement, our table will give us the three values we’ve entered:
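As a sketch, assuming inet_test holds exactly the three addresses this post refers to (188.8.131.52/24, 184.108.40.206, and 198.10/8), the query and its stored values look like this. Row order is not guaranteed without ORDER BY, though a small table usually returns rows in insertion order.

```sql
SELECT address FROM inet_test;

--      address
-- -----------------
--  188.8.131.52/24
--  184.108.40.206
--  198.10.0.0/8
```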
Notice that the last address we inserted, 198.10/8, has zeros added to the address. The INET data type will add the necessary zeros to the IPv4 host address to complete it when we append a netmask value. Without appending a netmask, we’d receive an error telling us that the address is not valid because it’s incomplete:
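The behavior can be reproduced without a table by casting a literal. This is a sketch; the exact error wording may differ slightly between PostgreSQL versions.

```sql
-- A shortened address with no netmask is rejected by inet.
SELECT '198.10'::inet;
-- ERROR:  invalid input syntax for type inet: "198.10"
```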
Another instance where we can get an error is if we add a netmask value that covers more bits than the shortened IPv4 host address supplies. For example, if we enter another address, 198.24/24, we’ll get the following error:
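Again as a sketch using a literal cast rather than an INSERT: only two octets are given, but a netmask of 24 needs three, so the value is rejected.

```sql
-- Two octets supplied, but /24 covers three octets.
SELECT '198.24/24'::inet;
-- ERROR:  invalid input syntax for type inet: "198.24/24"
```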
The last two examples, however, are cases where using the INET data type is not recommended. If we’re storing IPv4 or IPv6 addresses representing a network, PostgreSQL recommends using the CIDR data type because it follows its own conventions and checks for errors a little differently. So, let’s see what CIDR is all about.
PostgreSQL recommends that the CIDR data type should be used when storing addresses that represent a network. Unlike INET, the CIDR data type checks whether there are nonzero bits to the right of the netmask on insertion. If there are, then it will give us an error and no values will be inserted.
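To make the nonzero-bits check concrete, here is a small sketch comparing the two types on the same literal. The address 192.168.0.1/24 is an illustrative value, not one taken from this post.

```sql
-- 192.168.0.1/24 has host bits set to the right of the /24 mask.
SELECT '192.168.0.1/24'::cidr;
-- ERROR:  invalid cidr value: "192.168.0.1/24"
-- DETAIL:  Value has bits set to right of mask.

-- The same literal is accepted as inet, which keeps the host bits.
SELECT '192.168.0.1/24'::inet;
-- 192.168.0.1/24
```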
Let’s first look at an example of how the CIDR data type works. First, we’ll create a table cidr_test with the address as the CIDR data type:
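A minimal sketch of that definition, again assuming a single column:

```sql
-- One column named "address", typed as cidr, as described in the text.
CREATE TABLE cidr_test (
    address cidr
);
```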
We’ll then insert some sample network addresses with and without a netmask. If we don’t enter the netmask bits, then the CIDR data type will revert to a classful network numbering system, which is an older numbering system that is not recommended to be used. However, for the sake of understanding the differences, we’ve put together addresses in pairs:
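The original pairs are not reproduced here, so the sketch below uses illustrative ones. Only the 192.168.10 pair is mentioned in this post; the other two pairs are assumptions chosen to show the classful fallback.

```sql
-- Only needed if the table does not exist yet.
CREATE TABLE IF NOT EXISTS cidr_test (address cidr);

INSERT INTO cidr_test (address) VALUES
    ('192.168/25'),     -- explicit netmask
    ('192.168'),        -- no netmask: classful rules apply
    ('10/16'),          -- explicit netmask
    ('10'),             -- no netmask: classful rules apply
    ('192.168.10/24'),  -- explicit netmask
    ('192.168.10');     -- no netmask: classful rules apply
```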
Selecting all the values that we’ve inserted into the database shows us some slight differences in how PostgreSQL has stored them:
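Assuming cidr_test holds the illustrative pairs 192.168/25 and 192.168, 10/16 and 10, and 192.168.10/24 and 192.168.10 (only the last pair comes from this post), the stored values would look like this:

```sql
SELECT address FROM cidr_test;

--      address
-- -----------------
--  192.168.0.0/25
--  192.168.0.0/24
--  10.0.0.0/16
--  10.0.0.0/8
--  192.168.10.0/24
--  192.168.10.0/24
```

Without a netmask, 192.168 is treated as a class C network (/24) and 10 as a class A network (/8), which is why the first two pairs differ.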
The stored CIDR values are interesting, but they don’t really tell us much about how the network and broadcast addresses are derived. To determine these addresses, we can turn to PostgreSQL’s specialized functions for network address data types.
Network address functions
PostgreSQL has several network address functions that are available for INET and CIDR data types. Of particular interest for us are the broadcast, host, netmask, and network functions since they provide the broadcast address, the IP address, the network netmask, and the network address of the INET and CIDR addresses we inserted. If PostgreSQL didn’t have these functions, we’d have to work these values out by hand or depend on an online resource that does it for us. Let’s take a look at what these functions provide by running them over our existing datasets.
SELECT address, host(address), broadcast(address), netmask(address), network(address) FROM inet_test;
This gives us:
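As a sketch that runs without the table, the same functions can be applied to the three addresses this post refers to by selecting from a VALUES list. The commented result is what these inputs produce.

```sql
SELECT address, host(address), broadcast(address), netmask(address), network(address)
FROM (VALUES (inet '188.8.131.52/24'),
             (inet '184.108.40.206'),
             (inet '198.10/8')) AS t(address);

--      address     |      host      |     broadcast     |     netmask     |      network
-- -----------------+----------------+-------------------+-----------------+-------------------
--  188.8.131.52/24 | 188.8.131.52   | 188.8.131.255/24  | 255.255.255.0   | 188.8.131.0/24
--  184.108.40.206  | 184.108.40.206 | 184.108.40.206    | 255.255.255.255 | 184.108.40.206/32
--  198.10.0.0/8    | 198.10.0.0     | 198.255.255.255/8 | 255.0.0.0       | 198.0.0.0/8
```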
From this dataset, the interesting values come from the first two addresses. For the first address, 188.8.131.52/24, the broadcast, netmask, and network addresses are different from those for the second address, 184.108.40.206. The second address represents a single host, which is shown via the network function since it appends a netmask of 32 to the address in the network column. Now let’s run the same functions over cidr_test:
SELECT address, host(address), broadcast(address), netmask(address), network(address) FROM cidr_test;
This gives us:
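A sketch with illustrative pairs, selected from a VALUES list so it runs without the table. Only the 192.168.10 pair comes from this post; the others are assumptions. The host and network columns are left out to keep the output narrow.

```sql
SELECT address, broadcast(address), netmask(address)
FROM (VALUES (cidr '192.168/25'),    (cidr '192.168'),
             (cidr '10/16'),         (cidr '10'),
             (cidr '192.168.10/24'), (cidr '192.168.10')) AS t(address);

--      address     |     broadcast     |     netmask
-- -----------------+-------------------+-----------------
--  192.168.0.0/25  | 192.168.0.127/25  | 255.255.255.128
--  192.168.0.0/24  | 192.168.0.255/24  | 255.255.255.0
--  10.0.0.0/16     | 10.0.255.255/16   | 255.255.0.0
--  10.0.0.0/8      | 10.255.255.255/8  | 255.0.0.0
--  192.168.10.0/24 | 192.168.10.255/24 | 255.255.255.0
--  192.168.10.0/24 | 192.168.10.255/24 | 255.255.255.0
```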
Always make sure to include the netmask when storing network addresses as CIDR. Examining the output of our query above, we can see that the broadcast and netmask fields change significantly depending on whether the netmask was supplied. The exception is the third pair of addresses, 192.168.10, where we seem to have gotten lucky since the addresses are translated the same using CIDR and the older classful system.
Indexing network addresses
So, what about query performance? By default, an index on an INET or CIDR column is a B-tree index. But to increase performance, we can set up GIN and GiST indexes on INET and CIDR data using built-in operator classes.
To set up a GiST index on INET or CIDR data, we’d write:
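A sketch of the statement, assuming the inet_test table from earlier and a hypothetical index name. The inet_ops operator class has to be named explicitly because it is not the default GiST class for these types; it is available in PostgreSQL 9.4 and later.

```sql
-- Only needed if the table does not exist yet.
CREATE TABLE IF NOT EXISTS inet_test (address inet);

-- inet_test_gist_idx is a made-up name; inet_ops must be spelled out.
CREATE INDEX inet_test_gist_idx ON inet_test USING gist (address inet_ops);
```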
This operator class works for both INET and CIDR columns and supports most operators, such as the comparison and containment operators, but not the bitwise, addition, or subtraction operators.
If we insert 5000 random addresses with random netmasks, we can then test the difference between the B-tree and GiST indexes. We are inserting this number of addresses so that PostgreSQL doesn’t choose a sequential scan over an index scan when we use the EXPLAIN command. Using a B-tree index on the following query:
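The post does not show how the rows were generated or which query was explained, so the following is only a sketch under assumptions: the random-address expression, the index name inet_test_btree_idx, and the containment query are all hypothetical. Plans and timings will vary, so no output is shown.

```sql
-- Only needed if the table does not exist yet.
CREATE TABLE IF NOT EXISTS inet_test (address inet);

-- 5000 random IPv4 addresses, each with a random netmask between 8 and 32.
INSERT INTO inet_test (address)
SELECT set_masklen('0.0.0.0'::inet + (random() * 4294967295)::bigint,
                   (8 + floor(random() * 25))::int)
FROM generate_series(1, 5000);

-- A plain CREATE INDEX builds a B-tree index.
CREATE INDEX inet_test_btree_idx ON inet_test (address);
ANALYZE inet_test;

-- Hypothetical query: addresses contained in, or equal to, 10.0.0.0/8.
EXPLAIN ANALYZE
SELECT * FROM inet_test WHERE address <<= inet '10.0.0.0/8';
```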
We will get the following result:
If we drop the Btree index and use the GiST index we’ve created on the same query, we’d get something like:
While the difference in query time is not significant here, on larger datasets the GiST index could potentially save us seconds over the B-tree index.
Creating a GIN index is a little trickier in that you can only set up the index on an array of values, which have to be either all CIDR or all INET addresses. Nonetheless, to set up these indexes we’d write:
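A sketch under assumptions: the table inet_array_test, its addresses column, and the index name are hypothetical. GIN’s default array operator class handles inet[] and cidr[] columns, so no operator class needs to be named.

```sql
-- Hypothetical table with an array-of-inet column.
CREATE TABLE inet_array_test (
    addresses inet[]
);

CREATE INDEX inet_array_test_gin_idx ON inet_array_test USING gin (addresses);

-- The index can serve array containment queries such as this one.
SELECT * FROM inet_array_test WHERE addresses @> ARRAY[inet '10.1.2.3'];
```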
It’s been suggested that a better-performing solution for indexing network addresses is the IP4R project, which extends the native PostgreSQL network address data types. It provides additional network address types as well as additional support for GiST indexing.
Addressing what we did
Understanding the different data types that PostgreSQL provides and how they work gives us tools to expand how our data is represented and manipulated inside a database. With a new understanding of how the network address data types work, we can use INET and CIDR effectively and understand the various ways network addresses are interpreted and addressed in PostgreSQL.