Logging and Error Handling in the Cloud

The importance of logging in the cloud

When an application in the cloud has a glitch or the customer sees the dreaded “Page not available” message, people want to know what happened and why. Management demands a root cause analysis (RCA) report as soon as the problem is fixed, and sometimes does not even have the patience to wait for the fix to be in. If developers have not done a good job of logging and handling errors, RCA becomes a much tougher job. Cloud-native application developers tend to forget error-logging 101 because they think redeploying the application fixes everything. Error logging and RCA are two sides of the same coin, and that is no different in the cloud.

Why logging?

A basic tenet of software development, logging is useful to developers, operations, security teams, and end users. Remember watching Star Trek and hearing, “Captain’s Log, Stardate 42073.1. There has been an outbreak of an unclassified plasma plague in the Rachelis system. We’re on an emergency run to collect specimens.”? As viewers, we knew where Captain Picard and the Starship Enterprise were heading and why.

Developers need the ability to find problems in their applications both during and after deployment. Debugging tools help, but log messages are what localize the problem. If regular logs are not enough, many applications can also produce debug logs, enabled through switches, which offer more detail about what the application is doing.
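As a rough illustration of such a switch, here is a minimal Python sketch that reads the log level from an environment variable at startup. The variable name APP_LOG_LEVEL, the logger name "orders", and the function configure_logging are our own assumptions for the example, not part of any standard.

```python
import logging
import os


def configure_logging(default_level: str = "INFO") -> logging.Logger:
    """Configure a logger whose verbosity is controlled by an environment switch."""
    # APP_LOG_LEVEL is a hypothetical variable name chosen for this sketch.
    level_name = os.environ.get("APP_LOG_LEVEL", default_level).upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        # Unknown value: fall back to INFO rather than failing at startup.
        level = logging.INFO

    logger = logging.getLogger("orders")
    logger.setLevel(level)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = configure_logging()
logger.info("order service started")
# Emitted only when the switch is set to DEBUG.
logger.debug("cache warm-up details: %d entries", 128)
```

Running the service with APP_LOG_LEVEL=DEBUG turns on the extra detail; leaving the variable unset keeps the regular INFO output.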

Operations oversee many systems, both on- and off-premises. Logs and dashboards provide the visual cues that let them know how systems and applications are running.

IT security teams constantly view logs and analyze them for different reasons, including audit and compliance purposes. HIPAA, PCI, GDPR, and other compliance regulations place huge requirements around how logs are gathered, stored, and managed.

Finally, it is good programming practice to let end users of applications know what happened and why in the case of a glitch. When these glitches are happening in the cloud, there can be a sense of helplessness.

What to log?

According to the 12-factor application guidelines, logs are streams of aggregated, time-ordered events. That raises the question of what should be logged (keeping in mind the mantra of logging neither too much nor too little). The primary objective of logging is troubleshooting, but logs are also generated for alerting and business intelligence.

At a minimum, each log entry needs to include enough information for the intended monitoring and analysis. It could be full content data, but it is more likely to be an extract or summary of properties. The application logs must record “when, where, who, and what” for each event. The properties for these will be different depending on the architecture, class of application, and host system/device, but often include what is shown in the table:

When
  • Log date and time (international format)
  • Event date and time (the event time stamp may be different to the time of logging, e.g., server logging where the client application is hosted on a remote device that is only periodically or intermittently online)
Where
  • Application identifier (e.g., name and version)
  • Application address (e.g., cluster/host name or server IPv4 or IPv6 address and port number, workstation identity, local device identifier)
  • Service (e.g., name and protocol)
  • Geolocation
  • Window/form/page (e.g., entry point URL and HTTP method for a Web application, dialog box name)
  • Code location (e.g., script name, module name)
Who (human or system)
  • Source address (e.g., user’s device/machine identifier, user’s IP address, cell/RF tower ID, mobile telephone number)
  • User identity (if authenticated or otherwise known, e.g., user database table primary key value, user name, license number)
What
  • Type of event
  • Severity of event
  • Security relevant event flag (if the logs contain non-security event data too)
  • Description
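
To make the “when, where, who, and what” idea concrete, here is a minimal Python sketch that renders each log entry as a single JSON object with those four groups. The class name JsonEventFormatter, the field names, and the sample application name, version, address, and user ID are all illustrative assumptions; a real application would pick the properties that suit its architecture.

```python
import json
import logging
import socket
from datetime import datetime, timezone


class JsonEventFormatter(logging.Formatter):
    """Render each record as one JSON object grouped into when/where/who/what."""

    def __init__(self, app_name: str, app_version: str) -> None:
        super().__init__()
        self.app = f"{app_name}/{app_version}"
        self.host = socket.gethostname()

    def format(self, record: logging.LogRecord) -> str:
        log_time = datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat()
        entry = {
            "when": {
                "log_time": log_time,
                # Callers may pass event_time when it differs from the log time.
                "event_time": getattr(record, "event_time", log_time),
            },
            "where": {
                "application": self.app,
                "host": self.host,
                "code": f"{record.module}.{record.funcName}:{record.lineno}",
            },
            "who": {
                "source_address": getattr(record, "source_address", None),
                "user_id": getattr(record, "user_id", None),
            },
            "what": {
                "type": getattr(record, "event_type", "application"),
                "severity": record.levelname,
                "security_relevant": getattr(record, "security_relevant", False),
                "description": record.getMessage(),
            },
        }
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonEventFormatter("billing-api", "1.4.2"))
logger = logging.getLogger("billing")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

# The "who" and event-type fields are supplied per call through `extra`.
logger.warning(
    "login failed",
    extra={
        "event_type": "authentication",
        "security_relevant": True,
        "source_address": "203.0.113.7",
        "user_id": "u-1042",
    },
)
```

One JSON object per line keeps every entry in a single format that is easy to parse. This sketch leaves out exception tracebacks and geolocation for brevity.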

Logging best practices

Here are some guidelines we follow when advising clients who are developing or moving applications to the cloud:

  • Don’t use different formats in the same file; stick to one common format.
  • Don’t log everything at the same log level. Logging libraries offer several log levels.
  • Allow logging to be changed dynamically and remotely. For example, switch a running service from the INFO log level to DEBUG while investigating a problem, without redeploying it.
  • Add standardized timestamps while keeping them as granular as possible, and include time zone information.
  • Add transaction identifiers that will help track distributed transactions across composite applications, middleware, and microservices.
  • Keep multi-line events to a minimum.
  • Add the exact method name and line number to the error message. This makes it much easier to find the issue in code.
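
Several of these guidelines can be sketched together in a few lines of Python: one common format, UTC timestamps with millisecond granularity, a transaction identifier on every line, the function name and line number, and a log level that can be changed while the application runs. The names transaction_id, build_logger, set_level, and place_order are hypothetical, and how the level change is triggered remotely (an admin endpoint, a configuration watcher) is left out.

```python
import contextvars
import logging
import sys
import time
import uuid

# Transaction ID for the current request; "-" when none has been set.
transaction_id = contextvars.ContextVar("transaction_id", default="-")


class TransactionFilter(logging.Filter):
    """Attach the current transaction ID to every record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.txn = transaction_id.get()
        return True


class UtcFormatter(logging.Formatter):
    converter = time.gmtime  # render timestamps in UTC


# One common format: timestamp, level, transaction, code location, message.
FORMAT = (
    "%(asctime)s.%(msecs)03dZ %(levelname)s txn=%(txn)s "
    "%(module)s.%(funcName)s:%(lineno)d %(message)s"
)


def build_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(UtcFormatter(FORMAT, datefmt="%Y-%m-%dT%H:%M:%S"))
    handler.addFilter(TransactionFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def set_level(logger: logging.Logger, level_name: str) -> None:
    """Change verbosity at runtime, with no restart or redeploy."""
    logger.setLevel(level_name.upper())


log = build_logger("checkout")


def place_order(order_id: str) -> None:
    transaction_id.set(uuid.uuid4().hex)
    log.info("order %s received", order_id)
    log.debug("order %s payload validated", order_id)


place_order("A-100")     # the DEBUG line is suppressed at INFO
set_level(log, "debug")  # in practice, triggered remotely
place_order("A-101")     # the DEBUG line is now emitted
```

In a distributed system the transaction identifier would normally arrive with the request (for example, in a header) and be passed on to downstream services rather than generated locally as it is here.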

Lastly, log files should be copied and moved to permanent storage. They should be kept safe and confidential, even when backed up.

We would like to know what practices other cloud application development teams follow.

Executive IT Specialist, WW Hybrid Cloud Services

Ashok Iyengar

Executive Cloud Architect - Cloud Adoption and Transformation
