Mutexes are often a source of contention that can result in performance issues or deadlocks. It’s no different in Go, and locating the root cause is often challenging. Only in obvious deadlock situations, where all goroutines are blocked, can the runtime detect the problem and abort with a fatal error; in general, the problems manifest themselves at the application logic level.

Let’s look at this simple example:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	lock := &sync.Mutex{}

	// goroutine1
	go func() {
		lock.Lock()

		// hold the lock to make goroutine2 wait
		time.Sleep(500 * time.Millisecond)

		fmt.Printf("%v: goroutine1 releasing...\n", time.Now().UnixNano())
		lock.Unlock()
	}()

	// goroutine2
	go func() {
		fmt.Printf("%v: goroutine2 acquiring...\n", time.Now().UnixNano())
		lock.Lock()
		fmt.Printf("%v: goroutine2 done\n", time.Now().UnixNano())
	}()

	time.Sleep(1 * time.Second)
}

The lock is obtained in the first goroutine, and the second goroutine has to wait for it.

Problems like this will most likely not be detected in the development phase, when there is no concurrent use of the application, and they will surface as a performance issue only in the production environment. As a side note, it’s always a good idea to have automated performance regression testing in place that simulates concurrent live traffic.
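As an illustration, a parallel Go benchmark is one lightweight way to simulate concurrent traffic against a shared lock; the lockedCounter type below is a hypothetical stand-in for the contended code path. Placed in a _test.go file, it can be run with go test -bench . and its results compared across releases:

package contention

import (
	"sync"
	"testing"
)

// lockedCounter is a hypothetical stand-in for any code path guarded by a mutex.
type lockedCounter struct {
	mu sync.Mutex
	n  int
}

func (c *lockedCounter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

func BenchmarkCounterParallel(b *testing.B) {
	c := &lockedCounter{}
	// RunParallel spreads b.N iterations across GOMAXPROCS goroutines,
	// so lock contention shows up directly in the ns/op figure.
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			c.Inc()
		}
	})
}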

Utilizing pprof

Go has a built-in block profiling and tracing toolset for such situations: pprof. Basically, an application has to expose the profiling endpoints over HTTP by importing the net/http/pprof package and serving them; the block profile additionally has to be enabled with runtime.SetBlockProfileRate. Afterward, different profiles can be requested by running go tool pprof http://localhost:6060/debug/pprof/block.
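A minimal sketch of wiring this up might look as follows; the port and the block-profile rate are illustrative choices, not requirements:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
	"runtime"
)

func main() {
	// Record every blocking event; a higher rate (in nanoseconds) reduces overhead in production.
	runtime.SetBlockProfileRate(1)

	// Serve the profiling endpoints on a dedicated port.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of the application ...
}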

Although pprof’s block profiler or tracer can be extremely helpful in identifying contention issues, there are a few obstacles to using pprof in a production environment:

  • The profiler’s HTTP handlers, which accept profiling requests, must be attached to the application’s HTTP server or served by a dedicated one, which means extra security measures should be taken to protect the listening port.
  • Locating and accessing the application node’s host to run the go tool pprof against may be tricky in container environments like Kubernetes.
  • If the application is deadlocked or otherwise cannot respond to pprof requests, no profiling or tracing is possible. Profiles recorded before the problem occurred would be very helpful in cases like this.

For both production and development environments, the IBM Instana platform provides automatic blocking-call profiling. It regularly reports profiles to the dashboard, where they are accessible in the Hot spots/Time section.
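For a Go service, this is typically enabled through the Instana Go sensor (github.com/instana/go-sensor). The sketch below is based on that package’s documented options; the EnableAutoProfile flag and the service name shown here are assumptions, so consult the current go-sensor documentation for the exact initialization API:

package main

import (
	instana "github.com/instana/go-sensor"
)

func main() {
	// Initialize the Instana sensor with automatic profiling turned on.
	// EnableAutoProfile and the service name are illustrative; see the
	// go-sensor documentation for the current initialization options.
	instana.InitSensor(&instana.Options{
		Service:           "my-go-service",
		EnableAutoProfile: true,
	})

	// ... the rest of the application ...
}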

Getting started with the IBM Instana platform

The IBM Instana platform is a powerful tool that can help developers quickly and efficiently detect lock contention issues in Go applications. With its automated profiling capabilities, the platform can easily identify the sections of code where multiple goroutines are competing for the same lock, allowing developers to pinpoint and resolve contention issues.

Using the platform, developers can help ensure that their applications perform optimally and deliver the best possible user experience. So, if you’re developing Go applications and looking for an effective way to detect lock contention, consider using the IBM Instana platform to streamline your debugging process and improve your application’s overall performance.

If you’re not already an IBM Instana user, you can sign up for a free two-week trial.
