Go Performance Guide
Go Internals

1000 Goroutines vs Maps: GC Pressure Experiment

What happens when 1000 goroutines create, read, and delete maps concurrently? We measure wall time, GC pauses, and memory across three concurrency patterns.

We ran three modes on Go 1.25.6 (GOMAXPROCS=12) with 1000 goroutines, each handling 1000 map entries:

  • Isolated: each goroutine creates its own map[int]int
  • Shared Mutex: all goroutines write to one shared map protected by sync.Mutex
  • sync.Map: all goroutines use sync.Map (lock-free reads)

Wall Time Comparison

The isolated pattern is 37x faster than shared mutex. Each goroutine works on its own map with zero contention, allowing all 12 CPU cores to work in parallel. The mutex serializes all writes through a single lock, turning a parallel workload into a sequential one.

GC Behavior

Memory Allocation Profile

sync.Map allocates 3.2x more memory than the other approaches. Every Store() call boxes the key and value into interface{}, generating ~3.1 million malloc/free pairs versus ~8K for the isolated pattern.

GC Pause vs Throughput Tradeoff

Scaling: Map Size vs Goroutine Count

We also tested how the isolated pattern scales across different goroutine counts and map sizes.

Architecture: How Each Mode Works

Full Results Table

When to Use Each Pattern

GC Trace Deep Dive

The GODEBUG=gctrace=1 output for the isolated pattern shows rapid GC cycling as goroutines allocate and discard maps. In each line, the three clock times are sweep termination, concurrent mark, and mark termination, and a triplet like 4→4→1 MB is the heap size at GC start, at GC end, and the live heap afterward:

gc 2 @0.001s 7%: 0.15+0.66+0.16ms clock, 1.8+0.22/0.27/0+1.9ms cpu, 4→4→1 MB
gc 3 @0.004s 8%: 0.20+0.77+0.06ms clock, 2.4+1.1/0.87/0+0.7ms cpu, 7→8→2 MB
gc 5 @0.006s 11%: 0.16+0.76+0.03ms clock, 1.9+1.0/1.1/0+0.3ms cpu, 5→7→2 MB
gc 8 @0.010s 18%: 0.20+1.00+0.05ms clock, 2.4+1.7/1.7/0+0.6ms cpu, 6→10→5 MB

Key observations:

  • 9 GC cycles in 13ms — the collector runs every ~1.5ms
  • Heap oscillates 1-10 MB — maps are created and discarded faster than GC can collect
  • CPU overhead peaks at 18% — significant but manageable on 12 cores
  • STW pauses are sub-millisecond — individual pauses are 0.05-0.20ms

Running This Experiment

# work/mapbench/cmd/goroutines1k/main.go
go run ./cmd/goroutines1k -n 1000 -size 1000 -mode isolated
go run ./cmd/goroutines1k -n 1000 -size 1000 -mode shared-mutex
go run ./cmd/goroutines1k -n 1000 -size 1000 -mode shared-syncmap

# With runtime/trace for the go tool trace viewer
go run ./cmd/goroutines1k -n 1000 -size 1000 -trace report.trace
go tool trace -http=:9091 report.trace

Key Takeaways

  1. Isolated maps dominate when goroutines don't need shared state — 37x faster than mutex, no contention
  2. sync.Map costs 375x more mallocs due to interface{} boxing — only worth it for read-heavy workloads
  3. GC pressure scales linearly with goroutine count x map size — 10K goroutines trigger 65 GC cycles
  4. Mutex serializes everything — 662ms for work that takes 18ms in parallel
  5. GC pauses stay sub-millisecond on Go 1.25 even under heavy allocation pressure
