Object Pooling with sync.Pool
Use sync.Pool to reduce garbage collection pressure by reusing temporary objects efficiently. Deep dive into internal architecture, victim cache mechanics, GC pressure benchmarks, and advanced pooling patterns.
Object Pooling with sync.Pool: Advanced Architecture and Performance Analysis
sync.Pool is a powerful tool for reducing garbage collection pressure in high-throughput applications. It maintains a cache of unused objects that are available for reuse, eliminating the need to allocate new objects repeatedly. When used correctly, sync.Pool can dramatically reduce allocation overhead and GC pause times in latency-sensitive code. This comprehensive guide covers internal architecture, victim cache mechanics, GC interactions, and real-world production patterns.
What is sync.Pool?
sync.Pool is a thread-safe pool of objects sharded per P (logical processor) and optimized for high-concurrency scenarios. Unlike traditional pools with bounded capacity, sync.Pool is unbounded and automatically drained across garbage collection cycles, making it ideal for temporary object reuse. When you call Get(), the pool first checks the current P's private slot and shared queue (mostly lock-free). If those are empty, it steals from other Ps' shared queues, then checks the victim cache; as a last resort it calls New to allocate a fresh object (or returns nil if New is unset). When you're done with an object, you call Put() to return it to the pool.
Basic Usage
package main
import (
"sync"
)
type Buffer struct {
data []byte
}
var bufferPool = sync.Pool{
New: func() interface{} {
return &Buffer{data: make([]byte, 0, 4096)}
},
}
func main() {
// Get buffer from pool
buf := bufferPool.Get().(*Buffer)
defer bufferPool.Put(buf)
// Use buffer
buf.data = buf.data[:0] // Reset to empty
// ... write to buf ...
}
sync.Pool Internal Architecture (Go 1.13+)
Understanding sync.Pool's internal design is critical for predicting its behavior under various load conditions, especially during garbage collection cycles.
Runtime Structures
The Go standard library implements sync.Pool using the following core structures (simplified from sync/pool.go):
// Pool represents a managed pool of objects (simplified).
type Pool struct {
noCopy noCopy
local unsafe.Pointer // local fixed-size per-P pool, actual type is [P]poolLocal
localSize uintptr // size of the local array
victim unsafe.Pointer // local array from the previous GC cycle
victimSize uintptr // size of the victim array
New func() interface{}
}
// poolLocal holds a per-P value for the pool.
type poolLocal struct {
poolLocalInternal
pad [128 - unsafe.Sizeof(poolLocalInternal{})%128]byte
}
// poolLocalInternal is the internal pool state per P.
type poolLocalInternal struct {
private interface{} // Can be used only by the respective P.
shared poolChain // Local P can pushHead/popHead; any P can popTail.
}
// poolChain is a LIFO stack of poolDequeue.
type poolChain struct {
head *poolDequeue
tail *poolDequeue
}
The Victim Cache Mechanism (Go 1.13+)
The victim cache is one of the most important (and often misunderstood) features of sync.Pool. Here's exactly how it works:
Lifecycle Overview:
- Before GC: objects live in pool.local (per-P private and shared pools)
- During GC: pool.local → pool.victim (promoted); the old pool.victim is freed
- After GC (first access): objects are still available from the victim cache, no allocation needed
- Before the next GC: the pool fills up again from user code Put() calls
- At the next GC: the victim cache is discarded and the new local becomes the victim
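The promotion step above is performed by the runtime's poolCleanup hook, which runs with the world stopped at the start of each GC. A self-contained sketch of the two-generation hand-off, using string slices in place of the runtime's unsafe per-P arrays (names mirror sync/pool.go but this is an illustration, not the real code):

```go
package main

import "fmt"

// pool stands in for sync.Pool's local/victim generations.
type pool struct {
	local  []string
	victim []string
}

var (
	allPools []*pool // pools with a non-empty local generation
	oldPools []*pool // pools whose victim still holds objects
)

// poolCleanup: at each GC start, drop the old victim generation and
// promote the current local generation to victim.
func poolCleanup() {
	for _, p := range oldPools { // second GC: objects finally freed
		p.victim = nil
	}
	for _, p := range allPools { // first GC: promote local → victim
		p.victim, p.local = p.local, nil
	}
	oldPools, allPools = allPools, nil
}

func main() {
	p := &pool{local: []string{"buf1"}}
	allPools = []*pool{p}

	poolCleanup()         // GC #1
	fmt.Println(p.victim) // [buf1]

	poolCleanup()         // GC #2
	fmt.Println(p.victim) // []
}
```

The two loops make the "survives exactly one GC" rule visible: an object needs two consecutive cleanups with no intervening Put to actually become unreachable.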
Critical Implications:
- Objects survive exactly one GC cycle in the victim cache
- A Get() after GC may still succeed from victim cache, delaying allocation
- Pool effectiveness depends on GC frequency and workload pattern
- High GOGC values (less frequent GC) = longer object reuse window
- Low GOGC values (more frequent GC) = shorter victim cache lifespan
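The one-cycle survival window can be observed directly: an object Put into a pool survives exactly one runtime.GC() and disappears after a second, provided nothing re-Puts it in between. A minimal demonstration (the value is Put twice so one copy lands in the shared queue, which any P can reach via the victim cache):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// demo observes a pooled value surviving one GC via the victim cache.
func demo() (afterOneGC, afterTwoGCs interface{}) {
	var pool sync.Pool // no New: Get returns nil on a complete miss

	pool.Put("hello")
	pool.Put("hello") // second copy goes to the shared queue
	runtime.GC()      // local → victim

	afterOneGC = pool.Get() // served from the victim cache

	runtime.GC() // remaining victim objects are discarded
	runtime.GC() // anything promoted meanwhile is discarded too

	afterTwoGCs = pool.Get() // pool fully drained
	return
}

func main() {
	a, b := demo()
	fmt.Println(a) // hello
	fmt.Println(b) // <nil>
}
```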
Example Timeline at GOGC=100 (default):
Time T0: Put() obj1 → local pool
local: [obj1]
victim: []
Time T1 (GC triggered, heap 2x initial):
local → victim, old victim freed
local: []
victim: [obj1]
Time T2: Get() returns obj1 from victim
obj1 reused (no allocation)
local: []
victim: [obj1]
Time T3: Put() obj2 → local pool
local: [obj2]
victim: [obj1]
Time T4 (GC triggered again):
local → victim, obj1 freed
local: []
victim: [obj2]
Time T5: obj1 permanently freed, obj2 available from victim
Per-P Private Pool Design
Each P (logical processor) gets its own poolLocal structure to minimize contention:
// High-level Get() logic (simplified from sync/pool.go)
func (p *Pool) Get() interface{} {
l, pid := p.pin() // pin goroutine to its P, disable preemption
x := l.private
l.private = nil
if x == nil {
// Private slot empty: check this P's shared queue
x, _ = l.shared.popHead()
if x == nil {
x = p.getSlow(pid)
}
}
runtime_procUnpin()
if x == nil && p.New != nil {
x = p.New()
}
return x
}
// getSlow steals from other Ps, then falls back to the victim cache
func (p *Pool) getSlow(pid int) interface{} {
// Steal from the tail of other Ps' shared queues
for i := 0; i < runtime_GOMAXPROCS(0); i++ {
l := indexLocal(p.local, (pid+i+1)%runtime_GOMAXPROCS(0))
if x, _ := l.shared.popTail(); x != nil {
return x
}
}
// Check the victim cache from the previous GC cycle
if p.victim != nil {
l := indexLocal(p.victim, pid)
if x := l.private; x != nil {
l.private = nil
return x
}
if x, _ := l.shared.popTail(); x != nil {
return x
}
}
return nil // caller invokes p.New if it is set
}
The poolLocal structure is padded to 128 bytes to prevent false sharing between Ps' cache lines, ensuring CPU-level parallelism is not compromised.
The Lifecycle in Detail
1. Get operation (most common case - lock-free):
- Pin current P (prevent migration to another P)
- Check private pool (single pointer read, extremely fast)
- If available, return immediately (typical case)
- Unpin P
2. Get operation (fallback):
- Check current P's shared queue (requires synchronization)
- Steal from neighboring Ps' shared queues
- Check victim cache (objects from previous GC cycle)
- Call New() to allocate a fresh object
3. Put operation (always lock-free):
- Pin current P
- Store in private pool if empty
- Otherwise push to shared queue
- Unpin P
4. Garbage Collection cycle:
- Before GC: objects live in pool.local
- GC start (world stopped): poolCleanup runs
- Promote current local pools → victim cache
- Free old victim cache
- Reset local pool to empty
- After GC: objects remain reachable via the victim cache for one more cycle
Pool Performance Under GC Pressure
Understanding how garbage collection affects pool performance is critical for predicting behavior in production.
GC Interaction: GOGC Impact
The Go garbage collector trigger is controlled by GOGC (default 100), which means GC runs when heap size doubles. This directly affects pool lifecycle:
High GOGC (200+): Less frequent GC
- Victim cache objects live longer
- More time for pool to accumulate objects
- Lower GC pause frequency
- Risk: unbounded pool growth if Put() enforces no upper size limit
Low GOGC (25-50): Frequent GC
- Victim cache objects cleared quickly
- Pool reset frequently
- More allocations required after each GC
- Better for memory-constrained environments
Benchmark: Pool behavior under different GOGC values
package main
import (
"bytes"
"runtime"
"runtime/debug"
"sync"
"testing"
)
var testPool = sync.Pool{
New: func() interface{} {
return &bytes.Buffer{}
},
}
// BenchmarkPoolGetPutDefaultGC measures throughput with GOGC=100
func BenchmarkPoolGetPutDefaultGC(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
buf := testPool.Get().(*bytes.Buffer)
buf.Reset()
buf.WriteString("test data")
testPool.Put(buf)
}
}
// BenchmarkPoolGetPutHighGC measures throughput with GOGC=500
func BenchmarkPoolGetPutHighGC(b *testing.B) {
old := debug.SetGCPercent(500) // runtime equivalent of GOGC=500
defer debug.SetGCPercent(old)
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
buf := testPool.Get().(*bytes.Buffer)
buf.Reset()
buf.WriteString("test data")
testPool.Put(buf)
}
}
// BenchmarkPoolAfterGC measures allocation rate immediately after GC
func BenchmarkPoolAfterGC(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
b.StopTimer()
runtime.GC() // Force GC
b.StartTimer()
// First Get() after GC may allocate
buf := testPool.Get().(*bytes.Buffer)
buf.Reset()
testPool.Put(buf)
}
}
// Typical results on multi-core system:
// BenchmarkPoolGetPutDefaultGC-8 10000000 100 ns/op 0 B/op 0 allocs/op
// BenchmarkPoolGetPutHighGC-8 15000000 80 ns/op 0 B/op 0 allocs/op
// BenchmarkPoolAfterGC-8 100000 10000 ns/op 256 B/op 1 allocs/op
Key observations:
- High GOGC slightly faster due to victim cache reuse
- Immediately after GC, first Get() may allocate new object
- Pool refills quickly from subsequent Puts
- Overall throughput can improve severalfold over no pooling, depending on object size and allocation rate
What Happens at GC Time
When the garbage collector runs, it transitions the pool state:
Before GC:
P0.private: buf1
P0.shared: [buf2, buf3]
P1.private: buf4
P1.shared: [buf5]
victim: [] (from previous cycle)
GC runs...
After GC:
P0.private: nil
P0.shared: [] (cleared)
P1.private: nil
P1.shared: [] (cleared)
victim: [buf1, buf2, buf3, buf4, buf5] (promoted)
First Get() on P0 → victim → returns buf1 (no allocation)
First Get() on P1 → victim → returns buf4 (no allocation)
After victim exhausted → allocation required
Allocation Cost After GC
The first Get() call after GC allocates only if:
- the victim cache is empty, or
- every victim object has already been claimed by other Ps
// Measure allocation cost immediately after GC
func TestAllocationAfterGC(t *testing.T) {
pool := sync.Pool{
New: func() interface{} { return make([]byte, 1024) },
}
// Pre-populate pool
for i := 0; i < 100; i++ {
pool.Put(make([]byte, 1024))
}
// Force GC - moves objects to victim cache
runtime.GC()
// First Get() succeeds from victim (no allocation)
obj1 := pool.Get().([]byte)
if obj1 == nil {
t.Fatal("unexpected nil from victim cache")
}
// After 1 GC cycle, victim is consumed, next allocation required
}
Correct Usage Patterns
Pattern 1: Buffer Pooling for I/O
The most common use case—reusing buffers for I/O operations:
type BufferPool struct {
pool sync.Pool
}
func NewBufferPool() *BufferPool {
return &BufferPool{
pool: sync.Pool{
New: func() interface{} {
return &bytes.Buffer{}
},
},
}
}
func (bp *BufferPool) Get() *bytes.Buffer {
buf := bp.pool.Get().(*bytes.Buffer)
buf.Reset() // Important: clear state
return buf
}
func (bp *BufferPool) Put(buf *bytes.Buffer) {
// Optional: limit buffer size to prevent memory bloat
if buf.Cap() < 64*1024 {
bp.pool.Put(buf)
}
}
func (bp *BufferPool) ReadFile(filename string) ([]byte, error) {
buf := bp.Get()
defer bp.Put(buf)
f, err := os.Open(filename)
if err != nil {
return nil, err
}
defer f.Close()
_, err = io.Copy(buf, f)
if err != nil {
return nil, err
}
// Copy the bytes out: buf returns to the pool when this function
// exits, so the caller must not alias the pooled buffer's memory
return append([]byte(nil), buf.Bytes()...), nil
}
Pattern 2: Slice Pooling
Reusing byte slices with fixed capacity:
type SlicePool struct {
pool sync.Pool
size int
}
func NewSlicePool(size int) *SlicePool {
return &SlicePool{
size: size,
pool: sync.Pool{
New: func() interface{} {
return make([]byte, 0, size)
},
},
}
}
func (sp *SlicePool) Get() []byte {
s := sp.pool.Get().([]byte)
return s[:0] // Reset length to 0, keep capacity
}
func (sp *SlicePool) Put(s []byte) {
if cap(s) >= sp.size {
sp.pool.Put(s[:cap(s)])
}
}
// Usage in HTTP handler: the pool must be shared across requests;
// creating a pool per request would defeat reuse entirely
var uploadBufPool = NewSlicePool(1024 * 1024) // 1MB buffers
func HandleUpload(w http.ResponseWriter, r *http.Request) {
buf := uploadBufPool.Get()
defer uploadBufPool.Put(buf)
buf = buf[:cap(buf)] // io.CopyBuffer panics on an empty (len 0) buffer
if _, err := io.CopyBuffer(io.Discard, r.Body, buf); err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
}
}
Pattern 3: JSON Encoding Buffers
Pooling buffers used with JSON encoders:
type JSONEncodePool struct {
pool sync.Pool
}
func NewJSONEncodePool() *JSONEncodePool {
return &JSONEncodePool{
pool: sync.Pool{
New: func() interface{} {
return &bytes.Buffer{}
},
},
}
}
func (p *JSONEncodePool) Encode(v interface{}) ([]byte, error) {
buf := p.pool.Get().(*bytes.Buffer)
buf.Reset()
defer p.pool.Put(buf)
encoder := json.NewEncoder(buf)
if err := encoder.Encode(v); err != nil {
return nil, err
}
// Copy the result: the buffer goes back to the pool and may be
// reused (and overwritten) by another goroutine immediately
return append([]byte(nil), buf.Bytes()...), nil
}
Pattern 4: Adaptive Pool with Size Constraints
For objects that can grow unbounded, enforce maximum size to prevent memory bloat:
type AdaptiveBufferPool struct {
pool sync.Pool
maxCapacity int
}
func NewAdaptiveBufferPool(maxCap int) *AdaptiveBufferPool {
return &AdaptiveBufferPool{
maxCapacity: maxCap,
pool: sync.Pool{
New: func() interface{} {
return &bytes.Buffer{}
},
},
}
}
func (p *AdaptiveBufferPool) Get() *bytes.Buffer {
b := p.pool.Get().(*bytes.Buffer)
b.Reset()
return b
}
func (p *AdaptiveBufferPool) Put(b *bytes.Buffer) {
// Limit buffer capacity to prevent memory bloat
if b.Cap() > p.maxCapacity {
return // Discard oversized buffers
}
p.pool.Put(b)
}
// Benchmark: adaptive pool prevents runaway memory
func BenchmarkAdaptivePool(b *testing.B) {
pool := NewAdaptiveBufferPool(64 * 1024) // 64KB max
b.ReportAllocs()
for i := 0; i < b.N; i++ {
buf := pool.Get()
// Simulate varying data sizes
buf.Grow((i % 100) * 1024) // 0-99KB
pool.Put(buf)
}
}
Pattern 5: Size-Tiered Pool Pattern
Different pools for different object sizes to balance performance and memory:
type SizeDistribution struct {
smallPool sync.Pool // <= 4KB
mediumPool sync.Pool // <= 64KB
largePool sync.Pool // > 64KB
}
func NewSizeDistribution() *SizeDistribution {
return &SizeDistribution{
smallPool: sync.Pool{
New: func() interface{} { return make([]byte, 0, 4*1024) },
},
mediumPool: sync.Pool{
New: func() interface{} { return make([]byte, 0, 64*1024) },
},
largePool: sync.Pool{
New: func() interface{} { return make([]byte, 0, 512*1024) },
},
}
}
func (sd *SizeDistribution) GetBuffer(requiredSize int) []byte {
switch {
case requiredSize <= 4*1024:
return sd.smallPool.Get().([]byte)[:0]
case requiredSize <= 64*1024:
return sd.mediumPool.Get().([]byte)[:0]
case requiredSize <= 512*1024:
return sd.largePool.Get().([]byte)[:0]
default:
// Larger than any tier: allocate directly, bypassing the pools
return make([]byte, 0, requiredSize)
}
}
func (sd *SizeDistribution) PutBuffer(b []byte) {
c := cap(b) // avoid shadowing the builtin cap
b = b[:c]   // restore to full capacity
switch {
case c <= 4*1024:
sd.smallPool.Put(b)
case c <= 64*1024:
sd.mediumPool.Put(b)
case c <= 512*1024:
sd.largePool.Put(b)
}
// Oversized buffers (allocated outside the tiers) are simply dropped
}
// Benchmark: size-tiered pool distribution
func BenchmarkSizeDistribution(b *testing.B) {
pools := NewSizeDistribution()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
// Simulate realistic size distribution
size := (i % 10) * 10 * 1024 // 0-90KB
buf := pools.GetBuffer(size)
buf = buf[:size]
_ = buf
pools.PutBuffer(buf)
}
}
Pattern 6: Channel-Based Bounded Pool
For cases where you need bounded capacity and deterministic pool size:
type BoundedPool struct {
items chan interface{}
new func() interface{}
}
func NewBoundedPool(maxSize int, newFunc func() interface{}) *BoundedPool {
return &BoundedPool{
items: make(chan interface{}, maxSize),
new: newFunc,
}
}
func (p *BoundedPool) Get(ctx context.Context) (interface{}, error) {
select {
case item := <-p.items:
return item, nil
case <-ctx.Done():
return nil, ctx.Err()
default:
// Pool empty: allocate a new object instead of blocking.
// Note: with this default branch present, the ctx case only
// fires if the context was already cancelled on entry.
return p.new(), nil
}
}
func (p *BoundedPool) Put(item interface{}) {
select {
case p.items <- item:
// Successfully returned to pool
default:
// Pool at capacity, discard
}
}
// Benchmark comparison: sync.Pool vs channel pool
func BenchmarkChannelPool(b *testing.B) {
pool := NewBoundedPool(100, func() interface{} {
return &bytes.Buffer{}
})
ctx := context.Background()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
buf, _ := pool.Get(ctx)
buf.(*bytes.Buffer).Reset()
pool.Put(buf)
}
}
// Typical results:
// BenchmarkChannelPool-8 5000000 250 ns/op 0 B/op 0 allocs/op
// vs sync.Pool at ~100 ns/op
sync.Pool vs Channel-Based Pool: Detailed Comparison
Both approaches have distinct trade-offs for different scenarios:
Throughput Benchmark: sync.Pool vs Channel Pool
package pooling
import (
"bytes"
"context"
"sync"
"testing"
)
var syncTestPool = sync.Pool{
New: func() interface{} {
return &bytes.Buffer{}
},
}
// Simulate sync.Pool usage
func BenchmarkSyncPoolThroughput(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
buf := syncTestPool.Get().(*bytes.Buffer)
buf.Reset()
buf.WriteString("data")
syncTestPool.Put(buf)
}
})
}
// Simulate channel pool usage
func BenchmarkChannelPoolThroughput(b *testing.B) {
chanPool := NewBoundedPool(128, func() interface{} {
return &bytes.Buffer{}
})
ctx := context.Background()
b.ReportAllocs()
b.ResetTimer()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
buf, _ := chanPool.Get(ctx)
buf.(*bytes.Buffer).Reset()
chanPool.Put(buf)
}
})
}
// Typical results on 8-core system:
// BenchmarkSyncPoolThroughput-8 100000000 15 ns/op 0 B/op 0 allocs/op
// BenchmarkChannelPoolThroughput-8 30000000 40 ns/op 0 B/op 0 allocs/op
Key Differences:
| Aspect | sync.Pool | Channel Pool |
|---|---|---|
| Throughput | 2-3x faster (GC-optimized) | Slower due to channel overhead |
| Max Size | Unbounded | Bounded by channel buffer |
| GC Interaction | Drained over two GC cycles (victim cache) | GC-safe, objects retained |
| Memory Growth | Can grow unbounded without Put limits | Predictable memory usage |
| CPU Efficiency | Lock-free per-P design | Mutex/channel synchronization |
| Ideal Use | High-frequency temp objects | Bounded resources, backpressure |
When to Use Each
Use sync.Pool when:
- Objects are short-lived (under 1 GC cycle)
- Throughput is critical
- Memory is plentiful
- High concurrency (thousands of goroutines)
- Allocation patterns are unpredictable
Use Channel Pool when:
- You need bounded capacity
- Memory is constrained
- You want GC-safety without victim cache concerns
- Backpressure is beneficial
- Simple, predictable behavior matters
Object Reset Patterns and Benchmarks
Correct reset is critical. Different strategies have different performance characteristics:
Reset Strategies for bytes.Buffer
import (
"bytes"
"testing"
)
func BenchmarkBufferReset(b *testing.B) {
var buf bytes.Buffer
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
buf.WriteString("test data here")
// Strategy 1: Full Reset()
buf.Reset()
}
}
func BenchmarkBufferTruncate(b *testing.B) {
var buf bytes.Buffer
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
buf.WriteString("test data here")
// Strategy 2: Truncate to 0
buf.Truncate(0)
}
}
func BenchmarkSliceReslice(b *testing.B) {
data := make([]byte, 0, 64)
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
data = append(data, "test data here"...)
// Strategy 3: reslice to zero length, keeping capacity
// (works for a raw []byte; bytes.Buffer's internal slice
// is not reachable from outside the package)
data = data[:0]
}
}
// Results: all strategies run in a few ns/op; Reset() is safest for complex state
Struct Field Reset Patterns
type Request struct {
Headers map[string]string
Cookies []*http.Cookie
Body []byte
Trailer http.Header
params map[string][]string
}
// Strategy 1: Zero assignment (slowest, safest)
func (r *Request) ResetZero() {
*r = Request{}
}
// Strategy 2: Field-by-field deletion (moderate)
func (r *Request) ResetFields() {
for k := range r.Headers {
delete(r.Headers, k)
}
r.Cookies = r.Cookies[:0]
r.Body = r.Body[:0]
for k := range r.Trailer {
delete(r.Trailer, k)
}
for k := range r.params {
delete(r.params, k)
}
}
// Strategy 3: Partial reset (fastest, risk of stale data)
func (r *Request) ResetPartial() {
r.Body = r.Body[:0]
r.Cookies = r.Cookies[:0]
// Note: Headers, Trailer, params NOT reset - potential leak!
}
// Benchmark comparison
func BenchmarkResetStrategies(b *testing.B) {
tests := []struct {
name string
fn func(*Request)
}{
{"Zero", (*Request).ResetZero},
{"Fields", (*Request).ResetFields},
{"Partial", (*Request).ResetPartial},
}
for _, tt := range tests {
b.Run(tt.name, func(b *testing.B) {
b.ReportAllocs()
req := &Request{
Headers: make(map[string]string),
Trailer: make(http.Header),
params: make(map[string][]string),
}
b.ResetTimer()
for i := 0; i < b.N; i++ {
req.Headers["X-Test"] = "value"
req.Trailer.Set("X-Trailer", "value")
req.params["key"] = []string{"val"}
tt.fn(req)
}
})
}
}
// Typical results:
// BenchmarkResetStrategies/Zero-8 30000000 35 ns/op 0 B/op 0 allocs/op
// BenchmarkResetStrategies/Fields-8 20000000 50 ns/op 0 B/op 0 allocs/op
// BenchmarkResetStrategies/Partial-8 50000000 20 ns/op 0 B/op 0 allocs/op
Best Practice: Use full Reset() or field-by-field deletion, never partial reset, to avoid state leakage.
Common Mistakes and Anti-Patterns
Mistake 1: Pooling Small Objects (Detailed Analysis)
The overhead of pooling exceeds benefits for small objects:
// BAD: Pooling tiny objects (< 256 bytes)
type Point struct {
X, Y float64 // 16 bytes
}
var pointPool = sync.Pool{
New: func() interface{} {
return &Point{}
},
}
// Benchmark shows pooling overhead > allocation cost
func BenchmarkSmallObjectPooling(b *testing.B) {
b.ReportAllocs()
b.Run("NoPool", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
_ = &Point{X: 1.0, Y: 2.0}
}
})
b.Run("WithPool", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
p := pointPool.Get().(*Point)
p.X, p.Y = 1.0, 2.0
pointPool.Put(p)
}
})
}
// Results (8 cores):
// NoPool: 20 ns/op (tiny cost; escape analysis may even keep Point on the stack)
// WithPool: 80 ns/op (4x slower - pool bookkeeping dwarfs the allocation!)
// GOOD: Only pool objects > 4KB
var bufferPool = sync.Pool{
New: func() interface{} {
return make([]byte, 0, 4096) // 4KB minimum
},
}
Rule of thumb: pooling tends to pay off only when allocation plus initialization is expensive - roughly hundreds of nanoseconds, typically objects of a few KB and up.
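Before reaching for a pool, measure whether the allocation is even visible to the heap. testing.AllocsPerRun works outside benchmarks and makes the decision concrete (sink is a package-level variable introduced here to force the value to escape):

```go
package main

import (
	"fmt"
	"testing"
)

type Point struct{ X, Y float64 }

var sink *Point // taking its address and storing here forces a heap allocation

func main() {
	// Escaping: one heap allocation per call
	escaping := testing.AllocsPerRun(1000, func() {
		sink = &Point{X: 1, Y: 2}
	})
	// Non-escaping: the compiler keeps the Point on the stack
	stack := testing.AllocsPerRun(1000, func() {
		p := Point{X: 1, Y: 2}
		_ = p
	})
	fmt.Println(escaping, stack) // 1 0
}
```

If AllocsPerRun already reports 0, escape analysis has made the pool pointless for that object.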
Mistake 2: Not Resetting State (Detailed Examples)
State leakage is a subtle but dangerous bug:
type Request struct {
Headers map[string]string
Body []byte
Secret string // Sensitive data!
}
var reqPool = sync.Pool{
New: func() interface{} {
return &Request{
Headers: make(map[string]string),
Body: make([]byte, 0, 1024),
}
},
}
// BUGGY: Doesn't reset Secret field
func GetRequestBuggy() *Request {
req := reqPool.Get().(*Request)
// Clear common fields but miss Secret
for k := range req.Headers {
delete(req.Headers, k)
}
req.Body = req.Body[:0]
// BUG: Secret field still contains previous value!
return req
}
// SAFE: Comprehensive reset
func GetRequestSafe() *Request {
req := reqPool.Get().(*Request)
for k := range req.Headers {
delete(req.Headers, k)
}
req.Body = req.Body[:0]
req.Secret = "" // Must explicitly clear sensitive fields
return req
}
// Test to catch state leakage
func TestStateLeakage(t *testing.T) {
// First request with secrets
req1 := GetRequestSafe()
req1.Secret = "SENSITIVE_TOKEN"
req1.Headers["Authorization"] = "Bearer token"
reqPool.Put(req1)
// Second request should not see previous secrets
req2 := GetRequestSafe()
if req2.Secret != "" {
t.Fatal("Secret leaked from previous request!")
}
if auth, ok := req2.Headers["Authorization"]; ok {
t.Fatalf("Authorization header leaked: %s", auth)
}
}
Mistake 3: Holding References After Put
Using a pooled object after returning it to the pool:
// BUGGY: Reference escapes Put()
var globalBuf *bytes.Buffer
func ProcessData(data []byte) string {
buf := bufferPool.Get().(*bytes.Buffer)
buf.Reset()
buf.Write(data)
globalBuf = buf // BUG: Stores reference
bufferPool.Put(buf)
return globalBuf.String()
}
// Later, globalBuf might be reused by another goroutine
func ProcessMore() {
buf := bufferPool.Get().(*bytes.Buffer)
// BUG: buf might be the same as globalBuf!
// Corrupts globalBuf's data
buf.Write([]byte("new data"))
bufferPool.Put(buf)
}
// SAFE: Scope reference to function
func ProcessDataSafe(data []byte) string {
buf := bufferPool.Get().(*bytes.Buffer)
defer bufferPool.Put(buf)
buf.Reset()
buf.Write(data)
return buf.String() // Local copy
}
Mistake 4: Unbounded Object Growth
Objects can grow without limits, consuming memory:
// BUGGY: Buffer grows indefinitely
var buf *bytes.Buffer
func AppendData(data []byte) {
if buf == nil {
buf = bufferPool.Get().(*bytes.Buffer)
}
buf.Write(data) // Keeps growing
// Never reset, never returned to pool!
}
// BUGGY: Pool accepts oversized objects
func BuggyPut(buf *bytes.Buffer) {
bufferPool.Put(buf) // No capacity check!
// 100MB buffer returned to pool, stays there forever
}
// SAFE: Enforce capacity limits
func SafePut(buf *bytes.Buffer) {
if buf.Cap() > 64*1024 { // Max 64KB
return // Discard oversized buffers
}
bufferPool.Put(buf)
}
// SAFE: Regular reset
func SafeAppendData(data []byte) string {
buf := bufferPool.Get().(*bytes.Buffer)
defer SafePut(buf)
buf.Reset() // Critical: reset before use
buf.Write(data)
return buf.String()
}
Real-World stdlib Usage Patterns
Understanding how the Go standard library uses sync.Pool reveals best practices:
fmt Package: Printf Optimization
The fmt package pools the internal state printer (pp) to avoid allocations:
// Simplified from src/fmt/print.go
var ppFree = sync.Pool{
New: func() interface{} {
return new(pp)
},
}
type pp struct {
buf buffer
arg interface{}
// ... other fields
}
func (p *pp) free() {
p.buf.Reset()
ppFree.Put(p)
}
// Every Printf call reuses a pp from the pool
func Sprintf(format string, a ...interface{}) string {
p := ppFree.Get().(*pp)
defer p.free()
p.doPrintf(format, a)
s := p.buf.String()
return s
}
// Impact: the Printf family sits on extremely hot paths in real applications
// Without pooling: 1 allocation per Printf
// With pooling: 0 allocations (after warm-up)
encoding/json: encodeState Pool
The JSON encoder pools its internal state machine:
// Simplified from src/encoding/json/encode.go
var encodeStatePool sync.Pool
type encodeState struct {
bytes.Buffer // accumulated JSON output
scratch [64]byte // scratch space for formatting numbers
}
func newEncodeState() *encodeState {
if v := encodeStatePool.Get(); v != nil {
e := v.(*encodeState)
e.Reset() // embedded bytes.Buffer reset
return e
}
return new(encodeState)
}
// json.Marshal → encodeState pool
// Critical for servers handling thousands of JSON responses/sec
net/http: bufio Reader/Writer Pools
HTTP connection handling pools read/write buffers:
// Simplified from src/net/http/server.go
var bufioReaderPool sync.Pool
var bufioWriterPool sync.Pool
func newBufioReader(r io.Reader) *bufio.Reader {
if v := bufioReaderPool.Get(); v != nil {
br := v.(*bufio.Reader)
br.Reset(r)
return br
}
return bufio.NewReader(r)
}
func putBufioReader(br *bufio.Reader) {
br.Reset(nil)
bufioReaderPool.Put(br)
}
// Each HTTP request/response cycle:
// 1. Get reader from pool
// 2. Parse request
// 3. Return reader to pool
// Without pooling: creates a new buffer per request (4KB default bufio buffer)
// With pooling: reuses the same buffer across thousands of requests
Pool Sizing, Monitoring, and Hit Rate Measurement
Measuring Pool Hit Rate
Implement monitoring to verify pool effectiveness:
type MonitoredPool struct {
pool sync.Pool
gets uint64 // total Get calls
allocations uint64 // pool misses (New invoked)
}
func NewMonitoredPool(newFunc func() interface{}) *MonitoredPool {
mp := &MonitoredPool{}
// Count misses inside New: when the pool has a New function,
// Get never returns nil, so misses cannot be detected at the
// Get call site
mp.pool.New = func() interface{} {
atomic.AddUint64(&mp.allocations, 1)
return newFunc()
}
return mp
}
func (mp *MonitoredPool) Get() interface{} {
atomic.AddUint64(&mp.gets, 1)
return mp.pool.Get()
}
func (mp *MonitoredPool) Put(obj interface{}) {
mp.pool.Put(obj)
}
func (mp *MonitoredPool) Stats() (hitRate float64, totalGets uint64) {
gets := atomic.LoadUint64(&mp.gets)
allocs := atomic.LoadUint64(&mp.allocations)
if gets == 0 {
return 0, 0
}
return float64(gets-allocs) / float64(gets), gets
}
// Usage and monitoring
func ExampleMonitoring() {
pool := NewMonitoredPool(func() interface{} {
return &bytes.Buffer{}
})
// Warm up pool
for i := 0; i < 10; i++ {
obj := pool.Get()
pool.Put(obj)
}
// Run workload
for i := 0; i < 1000; i++ {
obj := pool.Get().(*bytes.Buffer)
obj.Reset()
obj.WriteString("test")
pool.Put(obj)
}
hitRate, totalGets := pool.Stats()
fmt.Printf("Hit rate: %.1f%% over %d gets\n", hitRate*100, totalGets)
// Example output (workload-dependent): Hit rate: 99.0% over 1010 gets
}
Prometheus Metrics Integration
For production monitoring:
import (
"github.com/prometheus/client_golang/prometheus"
)
var (
poolHitRate = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "pool_hit_rate",
Help: "Ratio of successful pool reuses to total Gets",
},
[]string{"pool_name"},
)
poolAllocations = prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "pool_allocations_total",
Help: "Total number of allocations due to pool misses",
},
[]string{"pool_name"},
)
)
type PrometheusPool struct {
pool sync.Pool
name string
misses uint64 // pool misses (New invoked)
total uint64 // total Get calls
lastMisses uint64 // last value exported to the counter
}
func NewPrometheusPool(name string, newFunc func() interface{}) *PrometheusPool {
pp := &PrometheusPool{name: name}
// As above, misses are counted inside New because Get never
// returns nil when New is set
pp.pool.New = func() interface{} {
atomic.AddUint64(&pp.misses, 1)
return newFunc()
}
return pp
}
func (pp *PrometheusPool) Get() interface{} {
atomic.AddUint64(&pp.total, 1)
return pp.pool.Get()
}
func (pp *PrometheusPool) Put(obj interface{}) {
pp.pool.Put(obj)
}
func (pp *PrometheusPool) RecordMetrics() {
total := atomic.LoadUint64(&pp.total)
misses := atomic.LoadUint64(&pp.misses)
if total == 0 {
return
}
poolHitRate.WithLabelValues(pp.name).Set(1 - float64(misses)/float64(total))
// Counters must only receive the delta since the last report
poolAllocations.WithLabelValues(pp.name).Add(float64(misses - pp.lastMisses))
pp.lastMisses = misses
}
// Periodic recording
func RecordPoolMetrics(pools []*PrometheusPool) {
ticker := time.NewTicker(30 * time.Second)
for range ticker.C {
for _, p := range pools {
p.RecordMetrics()
}
}
}
When Pool Is Not Helping
Pool overhead outweighs benefits in these scenarios:
// Pattern 1: Objects too small
type SmallID struct {
id uint64
}
// Allocation cost < 100ns, pool overhead ~50ns - not worth it
// Pattern 2: Allocation rate too low
// If you allocate once per second, pool never warms up
// GC clears it, zero benefit
// Pattern 3: Long-lived objects
var longlived *bytes.Buffer
func Init() {
longlived = bufferPool.Get().(*bytes.Buffer)
// Never returned to the pool for hours/days: defeats reuse
// and pins the memory regardless of pooling
}
// Pattern 4: Highly variable sizes
buf := bufferPool.Get().(*bytes.Buffer)
buf.Grow(100) // First use: 100 bytes
// ... later, after Put and Get again ...
buf.Grow(10 * 1024 * 1024) // Second use: 10MB
// Pool now retains a 10MB buffer - inefficient reuse
// Pattern 5: Complex reset cost
type ComplexObject struct {
trees map[string]*Tree
cache map[string]interface{}
locks map[string]*sync.Mutex
}
func (co *ComplexObject) Reset() {
// Clearing 3 large maps slower than allocating fresh object
}
Production Example: High-Throughput HTTP Server
Complete, production-ready pooling pattern:
package server
import (
"bytes"
"encoding/json"
"net/http"
"sync"
"sync/atomic"
"time"
)
type Request struct {
ID string
Method string
Path string
Headers map[string][]string
Body []byte
Query map[string][]string
}
type Response struct {
StatusCode int
Headers http.Header
Body []byte
}
type PooledServer struct {
reqBufPool sync.Pool
respBufPool sync.Pool
requestPool sync.Pool
responsePool sync.Pool
requestCount uint64
allocCount uint64 // requestPool misses, counted inside New
poolHitCount uint64
}
func NewPooledServer() *PooledServer {
ps := &PooledServer{}
ps.reqBufPool = sync.Pool{
New: func() interface{} {
return make([]byte, 0, 4*1024) // 4KB request buffer
},
}
ps.respBufPool = sync.Pool{
New: func() interface{} {
return &bytes.Buffer{}
},
}
ps.requestPool = sync.Pool{
New: func() interface{} {
// Pool miss: with New set, Get never returns nil, so
// allocations must be counted here, not at the Get site
atomic.AddUint64(&ps.allocCount, 1)
return &Request{
Headers: make(map[string][]string),
Query: make(map[string][]string),
}
},
}
ps.responsePool = sync.Pool{
New: func() interface{} {
return &Response{
Headers: make(http.Header),
}
},
}
return ps
}
func (ps *PooledServer) getRequest() *Request {
req := ps.requestPool.Get().(*Request)
// Clear previous state
req.ID = ""
req.Method = ""
req.Path = ""
req.Body = req.Body[:0]
for k := range req.Headers {
delete(req.Headers, k)
}
for k := range req.Query {
delete(req.Query, k)
}
return req
}
func (ps *PooledServer) putRequest(req *Request) {
// Only return if not too large
if cap(req.Body) <= 64*1024 {
ps.requestPool.Put(req)
}
}
func (ps *PooledServer) getResponse() *Response {
resp := ps.responsePool.Get().(*Response)
resp.StatusCode = 200
resp.Body = resp.Body[:0]
for k := range resp.Headers {
delete(resp.Headers, k)
}
return resp
}
func (ps *PooledServer) putResponse(resp *Response) {
if cap(resp.Body) <= 64*1024 {
ps.responsePool.Put(resp)
}
}
func (ps *PooledServer) HandleRequest(w http.ResponseWriter, r *http.Request) {
atomic.AddUint64(&ps.requestCount, 1)
// Get pooled request/response
req := ps.getRequest()
defer ps.putRequest(req)
resp := ps.getResponse()
defer ps.putResponse(resp)
// Parse request (simplified)
req.ID = r.Header.Get("X-Request-ID")
req.Method = r.Method
req.Path = r.URL.Path
for k, v := range r.Header {
req.Headers[k] = v
}
// Process request
ps.processRequest(req, resp)
// Send response: headers must be set before WriteHeader,
// otherwise they are silently dropped
for k, v := range resp.Headers {
for _, val := range v {
w.Header().Add(k, val)
}
}
w.WriteHeader(resp.StatusCode)
w.Write(resp.Body)
}
func (ps *PooledServer) processRequest(req *Request, resp *Response) {
// Simulate processing
buf := ps.respBufPool.Get().(*bytes.Buffer)
defer ps.respBufPool.Put(buf)
buf.Reset()
result := map[string]interface{}{
"id": req.ID,
"method": req.Method,
"path": req.Path,
"time": time.Now().Unix(),
}
json.NewEncoder(buf).Encode(result)
resp.StatusCode = http.StatusOK
resp.Headers.Set("Content-Type", "application/json")
resp.Body = append(resp.Body[:0], buf.Bytes()...)
}
func (ps *PooledServer) PrintStats() {
count := atomic.LoadUint64(&ps.requestCount)
hits := atomic.LoadUint64(&ps.poolHitCount)
allocs := atomic.LoadUint64(&ps.allocCount)
total := hits + allocs
if total == 0 {
return // no traffic yet; avoid division by zero
}
hitRate := float64(hits) / float64(total) * 100
fmt.Printf("Requests: %d Hit rate: %.1f%%\n", count, hitRate) // requires "fmt" in the import list
}
func main() {
server := NewPooledServer()
http.HandleFunc("/", server.HandleRequest)
// Print stats every 10 seconds
ticker := time.NewTicker(10 * time.Second)
go func() {
for range ticker.C {
server.PrintStats()
}
}()
http.ListenAndServe(":8080", nil)
}
Benchmark: Comprehensive Comparison
package benchmarks
import (
"bytes"
"runtime"
"sync"
"testing"
)
var bufferPool = sync.Pool{
New: func() interface{} {
return &bytes.Buffer{}
},
}
func BenchmarkNoPool(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
buf := &bytes.Buffer{}
buf.WriteString("Hello, World!")
_ = buf.Bytes() // Bytes returns the underlying slice without allocating; String would copy and hide the pool's zero-alloc behavior
}
}
func BenchmarkWithPool(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
buf := bufferPool.Get().(*bytes.Buffer)
buf.Reset()
buf.WriteString("Hello, World!")
_ = buf.Bytes()
bufferPool.Put(buf)
}
}
func BenchmarkWithPoolParallel(b *testing.B) {
b.ReportAllocs()
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
buf := bufferPool.Get().(*bytes.Buffer)
buf.Reset()
buf.WriteString("Hello, World!")
_ = buf.Bytes()
bufferPool.Put(buf)
}
})
}
func BenchmarkPoolAfterGC(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
b.StopTimer()
runtime.GC()
b.StartTimer()
buf := bufferPool.Get().(*bytes.Buffer)
buf.Reset()
buf.WriteString("after gc")
bufferPool.Put(buf)
}
}
Expected results on 8-core system:
BenchmarkNoPool-8 1000000 980 ns/op 256 B/op 1 allocs/op
BenchmarkWithPool-8 5000000 190 ns/op 0 B/op 0 allocs/op
BenchmarkWithPoolParallel-8 50000000 18 ns/op 0 B/op 0 allocs/op
BenchmarkPoolAfterGC-8 100000 10200 ns/op 256 B/op 1 allocs/op
With pooling, allocation cost drops by 80-95%, and the parallel case scales further because each P serves most Gets from its own lock-free private cache.
Arena Allocators: Alternative to Pooling (Go 1.20+)
Go 1.20 introduced experimental arena allocators as an alternative to manual pooling:
//go:build goexperiment.arenas
// +build goexperiment.arenas
package arena_example
import (
"arena"
)
// Arena allocates memory for a set of objects
func ProcessWithArena() {
a := arena.NewArena()
defer a.Free()
// Allocate structs in arena
type Data struct {
values []int
name string
}
d1 := arena.New[Data](a)
d1.values = arena.MakeSlice[int](a, 100, 100) // allocate the slice in the arena too; plain make would go to the GC heap
d1.name = "test"
// All allocations freed together with arena
// No need for individual reset/pooling
// Benefits over sync.Pool:
// - All objects freed together (simpler cleanup)
// - No state leakage possible
// - No GC pressure (arena freed before GC needed)
// - Better cache locality
}
// Comparison: sync.Pool vs Arena
// sync.Pool: Complex state reset, victim cache, per-P overhead
// Arena: Automatic cleanup, simple model, better memory locality
// When arena is better:
// - Batch processing where all objects die together
// - Complex object graphs with many inter-references
// - Predictable allocation patterns
// - Minimal cleanup required
// When sync.Pool is better:
// - Continuous stream of requests
// - Objects live varying durations
// - Unbounded number of simultaneous objects needed
// - GC-safe automatic cleanup
Note: Arena allocators are still experimental. Enable with GOEXPERIMENT=arenas go build.
When to Use sync.Pool
Ideal Candidates
- Request/response processing: HTTP handlers, gRPC services
- Buffer reuse: I/O operations, JSON marshaling, protocol buffers
- High-frequency allocations: Millions+ operations per second
- Latency-sensitive: Need to reduce allocation stalls
- Memory-bound objects: 4KB+ allocation size
- Temporary objects: Don't outlive single request/operation
- Continuous workloads: Stream processing, servers
Not Recommended For
- Complex object graphs: Difficult/expensive reset
- Small objects: Under ~256 bytes (pool overhead often exceeds the benefit)
- Infrequent allocation: Under ~100k ops/sec (the pool sits cold between GCs, so most Gets allocate anyway)
- Long-lived objects: Objects retained across GC cycles
- Hard-to-reset state: Cryptographic keys, file handles, locks
- Batch processing: Use arena allocators instead
- One-shot allocations: Tool startup, initial setup
Performance Profiling and Validation
Use pprof and benchmarks to validate pooling effectiveness:
# Benchmark with allocation reporting
go test -bench=. -benchmem ./...
# Profile CPU to verify allocation reduction
go test -cpuprofile=cpu.prof -bench=. ./...
go tool pprof -http=:8080 cpu.prof
# Heap profile before/after pooling
GODEBUG=gctrace=1 go test -bench=. ./...
# gctrace prints one line per GC cycle; compare GC counts and heap sizes before/after pooling
Before pooling (hypothetical benchmark):
BenchmarkHTTPHandler-8 100000 12000 ns/op 4500 B/op 35 allocs/op
gctrace: 150 GC cycles over the run
After pooling:
BenchmarkHTTPHandler-8 500000 2500 ns/op 200 B/op 1 allocs/op
gctrace: 45 GC cycles over the run
This represents a ~95% reduction in allocated bytes per request and a 70% reduction in GC frequency.
Summary: sync.Pool Architecture and Usage
Key Architectural Features
- Per-P private pools: Lock-free fast path for single P
- Shared pool per P: Synchronous access for pool stealing
- Victim cache (Go 1.13+): Objects survive one extra GC cycle, so pools drain gradually over two cycles instead of emptying all at once
- Automatic cleanup: Each GC moves pool contents to the victim cache and frees the previous victim, so idle pools shrink to zero without manual management
- Unbounded capacity: Pool grows to match workload needs
Critical Usage Rules
- Always reset state: Comprehensive reset before reuse, never partial
- Size limits: Cap capacity to prevent unbounded growth
- Short lifetimes: Objects should be temporary, not long-lived
- Object size: Larger objects (multi-KB buffers) benefit most; very small objects may not repay the Get/Put and reset overhead
- High frequency: Most effective for millions of allocations per second
Expected Performance Improvements
- Allocation rate: 80-95% reduction in allocations
- GC pause time: 50-80% reduction (fewer objects to scan)
- Throughput: 2-10x improvement (fewer allocation stalls)
- Latency p99: 50-70% improvement (fewer and shorter GC pauses)
Decision Tree
Do you allocate millions of times per second?
├─ No → Pooling unlikely to help, use arena or direct allocation
└─ Yes → Continue
Is each allocation large (> 4KB)?
├─ No → Overhead exceeds benefit
└─ Yes → Continue
Are objects temporary (lifetime bounded by a single request/operation)?
├─ No → Use arena allocator instead
└─ Yes → Continue
Can you easily reset all object state?
├─ No → Risk of state leakage, use arena or improve design
└─ Yes → GOOD FIT for sync.Pool
Do you need bounded capacity?
├─ Yes → Use channel-based pool instead
└─ No → sync.Pool is ideal
Use sync.Pool judiciously in performance-critical code paths where profiling confirms allocation is a bottleneck. Measure before and after to verify 5-10x allocation reduction and corresponding latency improvements. Never use pooling as a substitute for fixing algorithmic inefficiencies.
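For the bounded-capacity branch, a channel-based pool caps the number of idle objects; a minimal sketch (BoundedPool is an illustrative type, not part of the standard library):

```go
package main

import (
	"bytes"
	"fmt"
)

// BoundedPool keeps at most cap(ch) idle buffers. Extra Puts are
// dropped for the GC to collect, and Gets on an empty pool allocate
// fresh. Unlike sync.Pool, it is never cleared by garbage collection.
type BoundedPool struct {
	ch chan *bytes.Buffer
}

func NewBoundedPool(size int) *BoundedPool {
	return &BoundedPool{ch: make(chan *bytes.Buffer, size)}
}

func (p *BoundedPool) Get() *bytes.Buffer {
	select {
	case b := <-p.ch:
		b.Reset() // clear state on the way out
		return b
	default:
		return new(bytes.Buffer) // pool empty: allocate
	}
}

func (p *BoundedPool) Put(b *bytes.Buffer) {
	select {
	case p.ch <- b:
	default: // pool full: drop the buffer
	}
}

func main() {
	pool := NewBoundedPool(2)
	b1 := pool.Get()
	b1.WriteString("scratch")
	pool.Put(b1)
	b2 := pool.Get() // Reset on the way out, same backing buffer
	fmt.Println("reused:", b1 == b2) // prints: reused: true
}
```

The trade-off: the bound gives predictable memory but turns the fast path into a channel operation, which is slower than sync.Pool's per-P lock-free cache under high contention.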