IBM Cloud Functions as Middleware


With the help of the CloudEvents specification, you can create your own eventing middleware.

When people think about a serverless platform—such as IBM Cloud Functions—I'm pretty sure they tend to think of it in terms of its ability to quickly deploy and scale some business function based on demand. And that's great! It certainly does do that very well. However, there's no reason to limit a platform like Cloud Functions in that way.

Let's explore another use by seeing how the recently released CloudEvents specification can turn it into middleware.

CloudEvents: A brief overview

For those of you not familiar with CloudEvents, it's a specification produced by the Cloud Native Computing Foundation that aims to help with the processing of events between distributed systems. The term "processing" can mean the event's ultimate destination actually performing some action as a result of the event, or it can simply mean finding the correct destination for the event in the first place.

Either way, the CloudEvents specification helps by standardizing the definition of some common event metadata and specifying where that metadata can be found in the message carrying the event. Unlike previous eventing specifications that tried to define "one event format" to rule them all, CloudEvents took a different approach. It doesn't try to force existing event formats to change at all. Instead, in the simplest case, it just adds metadata to the messages.

CloudEvents example

Let's see what this means with a very simple example. In this example, we have the following HTTP message/event:

POST /processor HTTP/1.0
Host: example.com
Content-Type: application/json

{
  "action": "newItem",
  "itemID": "93"
}

You'll see that it looks like a pretty normal event—all of the business data is in the HTTP body.

Now, let's add a few more HTTP headers:

POST /processor HTTP/1.0
Host: example.com
Content-Type: application/json
ce-specversion: 1.0
ce-type: com.bigco.newItem
ce-source: http://bigco.com/repo
ce-id: 610b6dd4-c85d-417b-b58f-3771e532

{
  "action": "newItem",
  "itemID": "93"
}

Let's discuss what these new headers are:

  • ce-specversion simply specifies the version of the CloudEvents specification to which the message adheres.
  • ce-type indicates the semantic meaning of the event; in this case, a new item was created in some system. Since the phrase newItem could be used by many different systems, it's prefixed with com.bigco to indicate that this is BigCo.com's definition of newItem.
  • ce-source contains a unique identifier of the resource to which the event occurrence is related. In this case, the event relates to something called repo, where the new item was created.
  • ce-id is a unique value for this event, so receivers can perform duplicate detection if needed.
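As a rough sketch of what a CloudEvents-aware receiver does with these four headers, the function below pulls them out of a header map and reports whether the message qualifies as a CloudEvent. It assumes the headers arrive as a plain JavaScript object; because HTTP header names are case-insensitive, it lowercases them first. The name readCloudEventAttributes is purely illustrative, not part of any SDK.

```javascript
// The four attributes CloudEvents requires, carried as ce-* HTTP headers
const REQUIRED = ["specversion", "type", "source", "id"];

// Hypothetical helper: returns { specversion, type, source, id } when all
// required ce-* headers are present, or null when the message is not a CloudEvent.
function readCloudEventAttributes(headers) {
  // HTTP header names are case-insensitive, so normalize them first
  const lower = {};
  for (const [name, value] of Object.entries(headers || {})) {
    lower[name.toLowerCase()] = value;
  }

  const attrs = {};
  for (const attr of REQUIRED) {
    const value = lower["ce-" + attr];
    if (value === undefined || value === "") return null; // missing: not a CloudEvent
    attrs[attr] = value;
  }
  return attrs;
}
```

Note that the body never enters into it: whether a message is a CloudEvent is decided entirely from the headers.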

There are other metadata properties that the specification defines, but these four are the only required ones, and they are all you need to convert an existing message into a CloudEvent. Hopefully, you'll agree that, aside from ce-specversion, the others are not only trivial but already common to just about all events.
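Because the conversion only adds headers, it can be sketched as a function that copies an existing header map and adds the four required attributes, leaving the body untouched. The function name addCloudEventHeaders and its parameters are hypothetical; when no ID is given, it falls back to a random UUID from Node's built-in crypto module.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical helper: returns a new header map with the required CloudEvents
// attributes added. The original header object (and the body) are not modified.
function addCloudEventHeaders(headers, { type, source, id = randomUUID() }) {
  if (!type || !source) {
    throw new Error("CloudEvents requires both a type and a source");
  }
  return {
    ...headers,
    "ce-specversion": "1.0",
    "ce-type": type,
    "ce-source": source,
    "ce-id": id,
  };
}

// Turning the plain event from the first example into a CloudEvent
const original = { "Content-Type": "application/json" };
const ceHeaders = addCloudEventHeaders(original, {
  type: "com.bigco.newItem",
  source: "http://bigco.com/repo",
});
// The body {"action": "newItem", "itemID": "93"} is sent exactly as before.
```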

CloudEvents and IBM Cloud Functions

Given that the CloudEvents specification is "just" standardizing event metadata, your IBM Cloud Functions Actions will support CloudEvents right out of the box. 

For example, let's suppose an IoT Gateway is publishing event data from water treatment plants. A given zone of the plant has key measurements like temperature, pH, and conductivity. The CloudEvent itself may look like the following:

POST /event HTTP/1.0
Host: example.com
Content-Type: application/json
ce-specversion: 1.0
ce-type: water.region.telemetry
ce-source: plant-123-zone-5
ce-id: 610b6dd4-c85d-417b-b58f-3771e532

{
  "temp": 24.5,
  "pH": 7.7,
  "cond": 1.2 
}

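Your action can access the temperature, pH, and conductivity fields directly from the parameter object passed in. When the action is invoked as a web action, the HTTP headers are in the __ow_headers field:

function main(event) {
    const headers = event.__ow_headers || {};
    if (headers["ce-type"] === "water.region.telemetry") {
       // trivial example of saving the event; `db` stands in for
       // whatever data store client your action actually uses
       db.save({
         plant: headers["ce-source"],  // e.g. "plant-123-zone-5"
         temp: event.temp,
         pH: event.pH,
         cond: event.cond
       });
    }
    // web actions can set the HTTP status code of the response
    return { statusCode: 204 };
}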

Eventing middleware

Standardizing an event's metadata may not seem like a significant improvement, but its simplicity and non-intrusiveness are part of its appeal. Let's explore why.

As mentioned above, one of CloudEvents' goals was to help in the delivery of events to their intended destination. In order to route events properly, middleware will need to make those routing decisions based on information about (and in) the events. Where the event came from, what was the occurrence that resulted in the event, and even the data format of the event are all examples of things that could influence these routing decisions. If this information is embedded within the business data of the event, this would mean that the middleware would need to know how to parse and understand the various fields of each and every event that it might see. It could be quite expensive to maintain such a system as the variety of events grows over time.

This is where CloudEvents comes in. As the example above shows, once a few extra HTTP headers are added, it becomes trivial to base these routing decisions on those bits of data. In fact, not only does the middleware not need to fully understand the shape and semantics of the event itself, it doesn't even need to look at the HTTP body at all: it can treat the body as a binary blob and never parse it.

The other thing to note about this is that because the extra CloudEvents properties are "additive" to the existing message, it would still work even if this message were sent to a receiver that didn't know anything at all about CloudEvents. This means that turning a message into a CloudEvent in this way is backwards compatible with all existing systems, and, over time, newer (CloudEvents-aware) components can be developed and seamlessly dropped into the existing workflows with no impact or additional changes needed.

Leveraging IBM Cloud Functions

You might be wondering how all of this relates to a serverless platform like IBM Cloud Functions. Well, if you think about how this eventing infrastructure works, at its core, it's not much different than a proxy receiving and forwarding messages. As the number of messages increases, it'll need to scale up and then back down when the load decreases. Sound a lot like what serverless platforms were designed for, no?

Let's see what a trivial example might look like:

function main(event) {
    // Map each incoming ce-type value to the destination that handles it
    const mapper = {
      "com.ibm.cloud.cos.document.create": "create.myfunctions.com",
      "com.ibm.cloud.cos.document.delete": "delete.myfunctions.com",
      "com.ibm.cloud.cos.document.update": "update.myfunctions.com",
      "com.amazonaws.s3.PutObject"       : "update.myfunctions.com",
      "com.amazonaws.s3.DeleteObject"    : "delete.myfunctions.com"
    };

    // Only the ce-type header is consulted; the HTTP body is never parsed
    let url = "";
    if (event.__ow_headers != null) {
      const type = event.__ow_headers["ce-type"];
      if (type != null) url = mapper[type];
    }
    // Unknown or missing types fall back to a default destination
    if (url == null || url === "") url = "default.myfunctions.com";

    // request.post( url, ...   (forwarding is left out of this example)
    return { destination: url };
}

Obviously, this code isn't complete (I left out things like forwarding the event to its destination with a POST), but that isn't the key point. Notice that this looks very similar to a "normal" function that you might see. Since IBM Cloud Functions supports CloudEvents, this code really only needs two things to become "middleware":

  1. It needs to know where to look in the incoming CloudEvent for the data of interest. In this case, we're just looking at the ce-type HTTP header (the semantic classifier of the event) and using that to determine where to send the message. See the line that assigns type from event.__ow_headers["ce-type"].
  2. It needs to have some kind of mapping from the incoming ce-type values to the various destinations. See the mapper variable.

While I took the really easy way out in this example (just to make the point) by putting this mapping into a hashtable called mapper, you could easily make this more "real" by reading this data from some external configuration that could change without requiring a redeployment of the code.
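One way to move the mapping out of the code, sketched under assumptions: IBM Cloud Functions lets you bind default parameters to an action, so the table could arrive as a parameter (here a hypothetical routes object) and be changed with a parameter update instead of a code change. The parameter name, the JSON-string handling, and the fallback destination are illustrative choices, not part of any platform API.

```javascript
// Built-in fallback used when no route matches
const DEFAULT_DESTINATION = "default.myfunctions.com";

// Hypothetical: `params.routes` is a default parameter bound to the action,
// holding a { ceType: destination } table, either as an object or a JSON string.
function loadRoutes(params) {
  const routes = params.routes;
  if (typeof routes === "string") return JSON.parse(routes);
  return routes || {};
}

function main(params) {
  const routes = loadRoutes(params);
  const headers = params.__ow_headers || {};
  const type = headers["ce-type"];
  // hasOwnProperty guards against inherited keys such as "toString"
  const url = type && Object.prototype.hasOwnProperty.call(routes, type)
    ? routes[type]
    : DEFAULT_DESTINATION;
  return { destination: url };
}
```

With this shape, adding a new event type means updating the bound parameter (for example with the CLI's --param option on an action update) rather than editing and redeploying the function body.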

But, even if you didn't do that, since IBM Cloud Functions allows you to quickly update your running functions with no downtime, there's really no technical reason you couldn't keep it as I did above, if you really wanted. You can even make the logic smarter by supporting things like regular expressions, or by using multiple pieces of data from the incoming messages instead of just the one I used.
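To make the matching smarter, the lookup table can become an ordered list of rules, each with a regular expression on ce-type and, optionally, one on ce-source. The rule shape, the first-match-wins ordering, and the test-repo source value are illustrative assumptions; the point is that every decision still comes from headers alone.

```javascript
// Ordered routing rules: the first rule whose patterns all match wins.
// Rules without a `source` pattern match events from any source.
const rules = [
  { type: /^com\.ibm\.cloud\.cos\.document\.delete$/, dest: "delete.myfunctions.com" },
  { type: /^com\.amazonaws\.s3\.DeleteObject$/, dest: "delete.myfunctions.com" },
  // Hypothetical: BigCo events from its test repo go to a sandbox
  { type: /^com\.bigco\./, source: /\/test-repo$/, dest: "sandbox.myfunctions.com" },
  { type: /^com\.bigco\./, dest: "bigco.myfunctions.com" },
];

function route(headers) {
  const type = headers["ce-type"] || "";
  const source = headers["ce-source"] || "";
  for (const rule of rules) {
    const typeOk = rule.type.test(type);
    const sourceOk = !rule.source || rule.source.test(source);
    if (typeOk && sourceOk) return rule.dest;
  }
  return "default.myfunctions.com"; // nothing matched
}
```

Ordering the rules from most to least specific keeps the behavior predictable when several patterns could match the same event.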

But what's not in there? There's no logic at all to parse or understand the potentially complicated event itself in the HTTP body. This not only makes the code easier to write and maintain, but also faster to run. It also means that as new events are added to the environment, all we need to do is add one more entry to our mapping table.
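The forwarding step the example leaves out can be sketched without touching the body either. This version builds the outgoing request from the incoming headers and the raw body, then hands it to fetch (a global in Node.js 18 and later). It assumes the destination accepts HTTPS POSTs at its root path and that the caller already has the unparsed body, for example from a web action configured for raw HTTP handling; buildForwardRequest and forward are hypothetical names.

```javascript
// Carries over only the CloudEvents attributes plus the body's content type,
// so the receiver can still interpret the payload. Other headers (Host, etc.)
// are dropped and left for fetch to set.
function buildForwardRequest(destination, headers, rawBody) {
  const out = {};
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (lower.startsWith("ce-") || lower === "content-type") out[lower] = value;
  }
  return {
    url: "https://" + destination + "/", // assumption: HTTPS, root path
    init: { method: "POST", headers: out, body: rawBody }, // body passed through as-is
  };
}

// Sends the event on; the payload is never parsed or re-serialized.
async function forward(destination, headers, rawBody) {
  const { url, init } = buildForwardRequest(destination, headers, rawBody);
  const response = await fetch(url, init);
  return { statusCode: response.status };
}
```

Keeping request construction in a separate pure function makes it easy to test the forwarding logic without any network access.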

There's something else worth noting: did you notice that the mapper hashtable has entries for two different groupings of events? One is from com.ibm.cloud.cos.* and one is from com.amazonaws.s3.*. Just to reinforce what I've been saying, this means that this code can handle events from multiple event producers, each with their own event classification definitions and event schemas.

All we need to do to enable support for them is know what values this one field (ce-type) will contain, and nothing else. Compare that to what it would take for someone unfamiliar with either of those event producers to track down the exact shape of each and every event that might be generated and figure out which field in each payload to use for this logic. And what if their schemas changed over time, so that you had to support multiple versions? That's a lot of technical debt to manage, and CloudEvents reduces all of it to one common field in a well-known location.

As I said above, standardized metadata may not look like much, but that simplicity and non-intrusiveness are exactly what make it a significant improvement.

How far can we take this?

While the example I showed here was pretty simple, that's part of the point. With CloudEvents and hosting platforms that support it, such as IBM Cloud Functions, you could easily create your own "middleware" by reusing the same infrastructure that you're using for your normal business applications. To the runtime, it's all just code, and it'll manage it just like anything else.

You also don't need to limit this to just "routing" decisions. If you look at what projects like Knative are doing, they've designed their entire eventing infrastructure (almost an entire workflow-like eventing engine) around CloudEvents. This means that once an event is delivered into the Knative eventing infrastructure, it'll be converted into a CloudEvent (if it's not already one), and from there, the message can be passed into Brokers and workflow components (for fan-out, filtering, conditional routing, and sequences of event-processing calls), all without ever having to understand the syntax or semantics of the events themselves. It does it all by looking only at the CloudEvents metadata.
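Filtering and fan-out of this kind can also be expressed purely over the metadata. The sketch below is loosely in the spirit of an attribute filter on a Knative Trigger (an exact match on every listed attribute, with an empty filter matching everything), but it is not Knative's code or API; the subscriber names are made up, and the event values reuse the water treatment example.

```javascript
// Each subscriber declares the CloudEvents attributes it cares about.
// An empty filter matches every event.
const subscribers = [
  { name: "audit-log", filter: {} },
  { name: "telemetry-store", filter: { type: "water.region.telemetry" } },
  { name: "zone-5-alerts", filter: { type: "water.region.telemetry", source: "plant-123-zone-5" } },
];

// Returns the names of subscribers whose filter attributes all equal the event's.
function fanOut(attrs) {
  return subscribers
    .filter(({ filter }) =>
      Object.entries(filter).every(([key, value]) => attrs[key] === value))
    .map(({ name }) => name);
}
```

A telemetry event from plant-123-zone-5 would reach all three subscribers, while an unrelated event type would only reach audit-log, and none of that requires reading the event body.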

I don't expect CloudEvents to get a lot of hype. To me, it's one of those relatively low-key bits of technology that will prove its worth through adoption based on actual value to developers and not because it's the next hot topic in the press. We're already seeing it being picked up by open source projects and industry leaders alike, and if it becomes one of those technologies that is "just there" but would be greatly missed if it were to suddenly vanish, then that means it's doing its job.

Keep an eye out for some other work being done by the CNCF's Serverless Working Group. After completing CloudEvents, they decided to tackle some additional pain points being seen in the community, so watch this space.
