How to see performance details for cloud functions (FaaS)
In a serverless application platform, functions as a service (FaaS) run opaquely for the most part. In general terms, when a serverless function is invoked, the platform accepts the request and provisions resources before it executes the function. Let’s refer to the duration between accepting the request and the start of the function execution as the “system time”. In contrast, “user time” is the time taken in executing the function. System time and user time equal the total time it takes to service a single function request.
Serverless application development platforms provide almost total visibility into system and user times and can aggregate the information systematically. Tools that consume the information help with analyzing app performance and improving app efficiencies. For example, IBM Cloud Functions(1) records system and user times for each function activation, which is stored as part of the activation record.
Improve Performance with IBM Cloud Shell
The IBM Cloud shell aggregates and visualizes the information to make it readily possible to measure the mean execution time of individual functions and their variance. In addition, when functions are composed into sequences or used to orchestrate a more general control flow, the shell provides a detailed trace view to quickly identify performance bottlenecks and properly account for system versus user time.
Reports from the Field
Below are performance visualization taken from an application I run regularly to alert Apache OpenWhisk committers when their pull requests break the build.
I’ll describe how I composed the application’s functions in a separate article. Briefly, here’s how my application works. When When a Travis CI build completes, it triggers the application (travis2slack.app) to fetch the build logs, analyze them for failures, and alert the contributor on Slack with a build summary.
Example Slack notifications for a failed and passing Travis CI build, here shown for two different pull requests.
Looking at Session Flow
What the application does is evident in looking at the visual representation of the application as generated by the shell below.
The application control flow is shown on the right. It should be evident there are several steps in the execution, including some error handling logic (e.g., a try-catch) and a mixture of cloud and inline functions. Since cloud functions can be described in a many languages, it is easy to mix and match Node.js and Python functions for example, where appropriate.
(The shell is a desktop application. You can install it with “npm -g install @ibm-functions/shell”. From your terminal, you launch it with “fsh shell”. The figure on the left is generated by typing “app list” in the shell prompt and then selecting an application from the list of deployed apps. The figure on the right is an example session. You can see your sessions with the command “session list”, and clicking on one from the list shown.)
The figure on the left below represents a single execution of the application. Green rectangles (function did not error) and red rectangles (function existed with an error) show the dynamic trace of the execution. Grey rectangles shows a path not taken.
The session flow is generally useful for debugging application execution. It lends itself well to finding specific function failures, and replaying only those executions, possibly using a debugger to identify the root cause of a failure.
Looking at a Trace View
The Trace View is another way of visualizing a session. You can cycle through the views by selecting the appropriate tab in the bottom panel. In addition to Trace and Session Flow, you can also inspect the result of the activation result and the raw activation record.
The trace is rendered as a table (below right), with each row representing a step in the execution as it unfolds from top to bottom. Various metrics are shown including the execution duration of the function and the system overhead. In the figure above, the latter is visible as queueing delays.
Along with the Summary view, the Trace view is useful for understanding the performance and where bottlenecks may be.
The shell complements the Session and Trace views with performance summaries for individual functions.
Performance of individual functions–see below left–show the mean and a 25%–75% execution time spread.
This view is accessible by typing the command “summary” on the shell prompt.
The summary view is useful when identifying performance bottlenecks, and honing in on functions where the performance varies significantly from one invocation to the next. The shell attempts to explain this performance for outliers, based on the information gleaned from the activation records.
Below left is an example for the function analyze.log.
Clicking on a specific outlier brings up its corresponding activation record, which permits further inspection of the input, the result, even the function code. It’s also possible to replay a specific activation and attach a debugger, or edit the code in place and redeploy . For the curious, analyze.log fetches the Travis CI logs from S3 and occasionally this is a slow operation. It then parses the log file for failed tests. I will make the code available on GitHub when I describe the composition in more detail. Once in the Summary view, you can change between Summary, Grid and Timeline views.
In the Grid view (left), a heat map of activations allows one to quickly find slow or failed activations. You can type the command “grid” on the shell prompt to bring up this view. The grid may be aggregated over thousands of activations. It provides a visual representation of the execution variance, while also making it readily possible to find failed executions.
Lastly, the Timeline (above right) shows executions unfolding over time, with an estimated cost per million activations (ignoring the IBM Cloud free tier of service). The timeline view is another way of looking at a set of activations over time. The shell command for this view is “timeline”; or select from the bottom panel from relevant views.
The Timeline highlights the non-uniform nature of my travis2slack.appactivations. When developers and contributors are active, there are many pull requests. It so happens there are contributions to Apache OpenWhisk from other continents, explaining the spurt of activity at 4AM in this timeline. (I’m using IBM Cloud Functions in the US region.)
While it is possible to generate some of these views in other data visualization stacks, one of the advantages of the shell is its fully immersive, fast, and interactive nature. One can quickly switch from development, to monitoring, to debugging, without having to leave the shell environment, which runs as a native application on your desktop.
The shell is a plugin based application that is rapidly evolving and with features added every day. It is available on GitHub and we welcome contributions. For more information about the composer, visit the getting started guide and stay tuned for more articles on the topic.
You can reach us directly by opening issues on GitHub.
1 IBM Cloud Functions is a managed implementation of the Apache OpenWhisk project, an open source serverless application programming platform as a service, which my team founded and to which we actively contribute.