Go Performance Guide
Go Internals

1000 Goroutines vs Maps: GC Pressure Experiment

What happens when 1000 goroutines create, read, and delete maps concurrently? We measure wall time, GC pauses, and memory across three concurrency patterns.

We ran three modes on Go 1.25.6 (GOMAXPROCS=12) with 1000 goroutines, each handling 1000 map entries:

  • Isolated: each goroutine creates its own map[int]int
  • Shared Mutex: all goroutines write to one shared map protected by sync.Mutex
  • sync.Map: all goroutines use sync.Map (lock-free reads)

Wall Time Comparison

The isolated pattern is 37x faster than shared mutex. Each goroutine works on its own map with zero contention, allowing all 12 CPU cores to work in parallel. The mutex serializes all writes through a single lock, turning a parallel workload into a sequential one.

GC Behavior

Memory Allocation Profile

sync.Map allocates 3.2x more memory than the other approaches. Every Store() call boxes the key and value into interface{}, generating ~3.1 million malloc/free pairs versus ~8K for the isolated pattern.

GC Pause vs Throughput Tradeoff

Scaling: Map Size vs Goroutine Count

We also tested how the isolated pattern scales across different goroutine counts and map sizes.

Architecture: How Each Mode Works

Full Results Table

When to Use Each Pattern

GC Trace Deep Dive

The GODEBUG=gctrace=1 output for the isolated pattern shows rapid GC cycling as goroutines allocate and discard maps. In each line, the three clock times are sweep termination, concurrent mark, and mark termination, and a triplet like 4→4→1 MB is the heap size at GC start, at GC end, and the live heap afterward:

gc 2 @0.001s 7%: 0.15+0.66+0.16ms clock, 1.8+0.22/0.27/0+1.9ms cpu, 4→4→1 MB
gc 3 @0.004s 8%: 0.20+0.77+0.06ms clock, 2.4+1.1/0.87/0+0.7ms cpu, 7→8→2 MB
gc 5 @0.006s 11%: 0.16+0.76+0.03ms clock, 1.9+1.0/1.1/0+0.3ms cpu, 5→7→2 MB
gc 8 @0.010s 18%: 0.20+1.00+0.05ms clock, 2.4+1.7/1.7/0+0.6ms cpu, 6→10→5 MB

Key observations:

  • 9 GC cycles in 13ms — the collector runs every ~1.5ms
  • Heap oscillates 1-10 MB — maps are created and discarded faster than GC can collect
  • CPU overhead peaks at 18% — significant but manageable on 12 cores
  • STW pauses are sub-millisecond — individual pauses are 0.05-0.20ms

Running This Experiment

# work/mapbench/cmd/goroutines1k/main.go
go run ./cmd/goroutines1k -n 1000 -size 1000 -mode isolated
go run ./cmd/goroutines1k -n 1000 -size 1000 -mode shared-mutex
go run ./cmd/goroutines1k -n 1000 -size 1000 -mode shared-syncmap

# With runtime/trace for the go tool trace viewer
go run ./cmd/goroutines1k -n 1000 -size 1000 -trace report.trace
go tool trace -http=:9091 report.trace

Key Takeaways

  1. Isolated maps dominate when goroutines don't need shared state — 37x faster than mutex, no contention
  2. sync.Map costs 375x more mallocs due to interface{} boxing — only worth it for read-heavy workloads
  3. GC pressure scales linearly with goroutine count x map size — 10K goroutines trigger 65 GC cycles
  4. Mutex serializes everything — 662ms for work that takes 18ms in parallel
  5. GC pauses stay sub-millisecond on Go 1.25 even under heavy allocation pressure
