Tuning a Go Application with GC Issues in a Few Steps

Lessons Learned in Performance Tuning and Garbage Collection

Figure: taken from this article

Motivation

I am working on a project that polls messages from Kafka and writes them to Google Cloud Storage. When I compared it to the existing solution, the Confluent Google Cloud Sink Connector, in the same environment (a topic with millions of messages, 50M+), it did not perform well 😔: it finished one hour after the connector did. While investigating where that latency came from, I ran into some GC issues in our app, and I wanted to share my investigation and implementation. As a result of the journey, we no longer fall behind; we finished 10 minutes ahead.

Figure: Before improvement (-1h)
Figure: After improvement (+10min)

During this journey, writing benchmark tests and reading pprof results is important, because premature optimization is the root of all evil. If you don't know pprof well, you can take a look at my pprof article. A minimal benchmark sketch of the kind we relied on follows.
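To make the "measure first" point concrete, here is a minimal sketch of such a benchmark. The Message struct, field names, and values are illustrative placeholders, not the project's real payload; the file would live next to the code as something like message_bench_test.go.

package main

import (
    "encoding/json"
    "testing"
)

// Message is a small illustrative payload, not the real Kafka record schema.
type Message struct {
    Key   string `json:"key"`
    Value string `json:"value"`
}

// BenchmarkMarshal measures one hot path (JSON serialization) and reports
// allocations per operation so heap pressure is visible next to the timings.
func BenchmarkMarshal(b *testing.B) {
    msg := Message{Key: "order-42", Value: "payload"}
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        if _, err := json.Marshal(&msg); err != nil {
            b.Fatal(err)
        }
    }
}

Running it with go test -bench=. -benchmem -cpuprofile=cpu.out and then go tool pprof cpu.out gives both allocation counts and a CPU profile to compare before and after each change.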

A Little Bit About Stack & Heap Allocations

In Go, memory allocation happens in one of two places: the stack or the heap. Each goroutine has its own stack, which is a contiguous block of memory. In contrast, the heap is a large shared memory area accessible by all goroutines. The diagram below illustrates this concept.

Figure: Three goroutines, each with its own stack, sharing one heap

The stack is self-managing and used exclusively by a single goroutine. In contrast, the heap relies on the garbage collector (GC) for cleanup. Increased heap allocations put more pressure on the GC. When the GC runs, it consumes 25% of the available CPU resources and can introduce “stop-the-world” latency, during which the application is momentarily paused for several milliseconds.

In general, the cost of garbage collection is directly proportional to the volume of heap allocations made by your program.
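A quick way to see where a value ends up is the compiler's escape analysis output. The sketch below uses made-up names; building it with go build -gcflags="-m" reports, among other things, that p in leak is moved to the heap because its address outlives the call.

package main

type point struct{ x, y int }

// sum keeps p on the stack: the value never outlives the call.
func sum(a, b int) int {
    p := point{a, b}
    return p.x + p.y
}

// leak forces p to escape to the heap, because a pointer to it survives the
// call; go build -gcflags="-m" prints something like "moved to heap: p".
func leak(a, b int) *point {
    p := point{a, b}
    return &p
}

func main() {
    _ = sum(1, 2)
    _ = leak(3, 4)
}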

Based on our experience, here is our list for reducing heap allocations (a short sketch combining a few of these items follows the list) 👇

  • Using sync.Pool (for reusing objects instead of initializing them every time). Be careful about the implementation; a misimplementation causes more harm than benefit.
  • Prefer strings.Builder over + concatenation
  • Try to preallocate slices and maps if their size can be known
  • Reduce pointer usage
  • Try to avoid large local variables in a function
  • Observe compiler decisions via go build -gcflags="-m" ./... output. Take a look at inlining optimisations (related: the -l flag).
  • Struct data alignment technique, use fieldalignment linter
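Here is a short sketch (illustrative code, not the project's actual implementation) that combines the sync.Pool, strings.Builder, and preallocation items above:

package main

import (
    "bytes"
    "strings"
    "sync"
)

// bufPool reuses bytes.Buffer values instead of allocating a fresh one per message.
// Reset before use and always Put the buffer back; otherwise the pool hurts more than it helps.
var bufPool = sync.Pool{
    New: func() any { return new(bytes.Buffer) },
}

// render borrows a buffer from the pool, fills it, and returns it to the pool.
func render(parts []string) string {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer bufPool.Put(buf)

    for _, p := range parts {
        buf.WriteString(p)
    }
    return buf.String()
}

// join prefers strings.Builder (grown once up front) over + concatenation.
func join(parts []string, sep string) string {
    size := 0
    for _, p := range parts {
        size += len(p) + len(sep)
    }
    var sb strings.Builder
    sb.Grow(size) // preallocate once instead of growing repeatedly
    for i, p := range parts {
        if i > 0 {
            sb.WriteString(sep)
        }
        sb.WriteString(p)
    }
    return sb.String()
}

// collect preallocates the slice because the final length is known in advance.
func collect(n int) []int {
    out := make([]int, 0, n)
    for i := 0; i < n; i++ {
        out = append(out, i)
    }
    return out
}

func main() {
    _ = render([]string{"a", "b"})
    _ = join([]string{"a", "b", "c"}, ",")
    _ = collect(10)
}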

You can also take a look at the great resources below. 👇

Changing the JSON library

  • We used bytedance/sonic instead of the encoding/json library for serialization and deserialization of objects of ~1.50KB. In a local load test (100k Kafka messages), the time spent in JSON handling in the pprof results dropped from 2.55s to 0.68s (a small sketch of the swap follows the figure) 👇
Figure: Sonic performs very well
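Because sonic exposes Marshal and Unmarshal with the same shape as encoding/json, the swap itself is mostly an import change. The sketch below is illustrative (the Message struct and values are placeholders), and any such change deserves its own benchmarks:

package main

import (
    "fmt"

    "github.com/bytedance/sonic"
)

// Message is an illustrative payload, not the project's real schema.
type Message struct {
    Key   string `json:"key"`
    Value string `json:"value"`
}

func main() {
    msg := Message{Key: "order-42", Value: "payload"}

    // Previously: data, err := json.Marshal(&msg)
    data, err := sonic.Marshal(&msg)
    if err != nil {
        panic(err)
    }

    var decoded Message
    // Previously: err := json.Unmarshal(data, &decoded)
    if err := sonic.Unmarshal(data, &decoded); err != nil {
        panic(err)
    }
    fmt.Println(decoded.Key)
}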

By changing your JSON library you gain speed and performance but sacrifice some stability; that's the tradeoff. encoding/json remains the more mature and battle-tested option in comparison to the others.

GOGC and GOMEMLIMIT Tuning

  • GOGC (Garbage Collection target percentage) is an environment variable in Go that controls the garbage collector’s aggressiveness by specifying the target heap growth rate. For example, GOGC=100 means the heap can grow by 100% before triggering garbage collection, while lower values result in more frequent collections but smaller memory usage.
  • GOMEMLIMIT is an environment variable introduced in Go 1.19 that sets a soft limit on memory usage for the Go runtime. When the total memory used by the program approaches this limit, the garbage collector becomes more aggressive in staying within the defined budget, helping manage memory in memory-constrained environments.

You can take a look at this great presentation and the GC guide article for more detail.

We used GOGC=off and GOMEMLIMIT=90% of cgroup’s memory limit by using the automemlimit library. Setting GOGC=off disables regular garbage collection based on heap growth, and GC only runs when memory usage approaches the GOMEMLIMIT, which acts as a soft limit for the heap.
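For illustration, here is a minimal sketch of those knobs in code. The blank automemlimit import is one way to do the GOMEMLIMIT wiring (as far as I know it defaults to 90% of the cgroup limit; check the library's README for the exact behavior), and the runtime/debug calls are the programmatic equivalents of GOGC=off and an explicit GOMEMLIMIT:

package main

import (
    "runtime/debug"

    // Blank import: at startup it derives GOMEMLIMIT from the cgroup memory
    // limit (90% of it by default, per the library's documentation).
    _ "github.com/KimMachineGun/automemlimit"
)

func main() {
    // Programmatic equivalent of GOGC=off: heap growth alone no longer triggers GC.
    debug.SetGCPercent(-1)

    // Programmatic equivalent of an explicit GOMEMLIMIT, here roughly 90% of a
    // 4Gi limit; normally unnecessary when automemlimit is imported above.
    debug.SetMemoryLimit(3686 << 20)

    // ... application code ...
}

In practice, the environment variables GOGC=off and GOMEMLIMIT (or the automemlimit import) achieve the same thing without touching application code.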

Our app’s memory and CPU resources are shown below (ref):

resources:
  limits:
    cpu: '2'
    memory: 4Gi
  requests:
    cpu: '1'
    memory: 2Gi

Results

  • GC pause frequency decreased from 350 to 30 calls per minute.
Figure: GC Pause Frequency (calls per minute of stop-the-world garbage collection)
  • GC pause time decreased from 40ms to 400µs at peak.
Figure: GC Pause Time (milliseconds spent in stop-the-world garbage collection)
  • GC time per minute decreased from 60µs to 2.5µs.
Figure: GC Time Per Min
  • CPU utilization increased from 3000% to 5000%.
Figure: CPU utilization & usage in different network bandwidths
  • Our total memory usage increased (we allowed that so the GC would be under less pressure).
Figure: Memory Usage

Yes, we allocated more memory but reduced overall GC & CPU time. 🚀
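If you want to sanity-check similar numbers directly from a running process rather than from dashboards, the runtime exposes the raw GC counters. This is a small illustrative sketch, not our production code:

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    fmt.Printf("GC cycles: %d\n", m.NumGC)
    fmt.Printf("total stop-the-world pause: %s\n", time.Duration(m.PauseTotalNs))
    if m.NumGC > 0 {
        // PauseNs is a circular buffer of the last 256 pause durations;
        // the most recent one sits at index (NumGC+255)%256.
        fmt.Printf("last pause: %s\n", time.Duration(m.PauseNs[(m.NumGC+255)%256]))
    }
    fmt.Printf("heap in use: %d MiB\n", m.HeapInuse>>20)
}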

Thank you for reading so far 💛. All feedback is welcome 🙏

We are still trying to optimize our app; this is not the end, but the beginning 😃 💃

I also thank Emre Odabas and Mehmet Sezer for supporting me on this journey. 💪

Further Reading
