Go Performance Guide
Ecosystem & Production

Container and Cloud Performance

Optimizing Go applications for containers and cloud environments — GOMAXPROCS in cgroups, Kubernetes resource tuning, Docker image optimization, serverless cold starts, and memory limit awareness.

Go applications running in containers and cloud environments face constraints fundamentally different from bare metal. Container resource limits, cgroup CPU throttling, and memory pressure create new performance considerations that require explicit configuration and awareness. This article explores how to optimize Go for containerized deployments, from Docker image size to Kubernetes resource allocation to serverless cold starts.

GOMAXPROCS in Containers: The CPU Bottleneck

One of the most common Go performance issues in containers stems from a simple mismatch: Go detects CPU count from the host, but runs inside a cgroup that may limit available CPUs.

The Problem: Host CPU Count vs Container Limit

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // On a 16-core host, even in a 1-core container:
    fmt.Println(runtime.NumCPU()) // Prints: 16
    fmt.Println(runtime.GOMAXPROCS(-1)) // Prints: 16
}

The Go runtime creates 16 scheduler contexts (P structures), but the cgroup only allows 1 core's worth of CPU time per period. Result: massive contention, context switches, and throttling.

CPU Throttling with CFS Quota

Linux cgroups v1 uses the Completely Fair Scheduler (CFS) quota mechanism:

# cgroup v1 cpu limits
cat /sys/fs/cgroup/cpu/docker/container-id/cpu.cfs_quota_us    # 100000 (100ms of CPU time per period)
cat /sys/fs/cgroup/cpu/docker/container-id/cpu.cfs_period_us   # 100000 (100ms period)

# Effective CPU limit: quota / period = 100000 / 100000 = 1 core

With GOMAXPROCS=16 but only 1 core of quota, the kernel throttles goroutines aggressively:

Time 0-6ms:      All 16 Ps run in parallel, burning the 100ms quota in ~6ms
Time 6-100ms:    Quota exhausted, ALL goroutines throttled
Time 100ms:      Quota refreshes, another ~6ms burst
Result:          Bursty, unpredictable latency
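The duty cycle follows from simple arithmetic: with N fully busy threads sharing one core's quota, the quota burns N times faster. A sketch (assuming CPU-bound threads and no kernel overhead):

```go
package main

import "fmt"

// busyMSPerPeriod returns how many milliseconds of wall time N fully busy
// threads can run before exhausting a CFS quota of quotaMS per period.
func busyMSPerPeriod(quotaMS, threads int) int {
	return quotaMS / threads
}

func main() {
	fmt.Println(busyMSPerPeriod(100, 16)) // 16 threads exhaust a 100ms quota in ~6ms
	fmt.Println(busyMSPerPeriod(100, 1))  // 1 thread runs the full 100ms
}
```

The remaining 100 - quota/N milliseconds of each period are spent throttled, which is exactly the bursty pattern above.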

uber-go/automaxprocs: Automatic Detection

The automaxprocs library solves this by reading cgroup limits and adjusting GOMAXPROCS:

import _ "go.uber.org/automaxprocs"

That's it. In your main.go, blank-import the package:

package main

import (
    _ "go.uber.org/automaxprocs"
    "fmt"
    "runtime"
)

func main() {
    // In a 1-core container:
    fmt.Println(runtime.GOMAXPROCS(-1)) // Prints: 1 (adjusted!)
    // Runs smoothly without contention
}

Internally, automaxprocs reads cgroup files:

// Simplified sketch of the automaxprocs logic (readCgroupV1/readCgroupV2
// are illustrative helpers, not the library's real API)
func detectCgroupLimit() int {
    // cgroup v2: cpu.max contains "<quota> <period>" (or "max" for unlimited)
    if quota, period, ok := readCgroupV2("cpu.max"); ok {
        if limit := quota / period; limit > 0 {
            return limit
        }
    }
    // cgroup v1
    if quota, ok := readCgroupV1("cpu/cpu.cfs_quota_us"); ok {
        if period, ok := readCgroupV1("cpu/cpu.cfs_period_us"); ok {
            if limit := quota / period; limit > 0 {
                return limit
            }
        }
    }
    return runtime.NumCPU() // Fallback: no cgroup limit detected
}

Go 1.25: Built-in Cgroup Awareness

Starting with Go 1.25, the runtime automatically detects cgroup CPU limits without external libraries:

// Go 1.25+: GOMAXPROCS automatically respects cgroup limits
package main

import "runtime"

func main() {
    // No extra import needed; the runtime sets GOMAXPROCS from the
    // cgroup CPU quota (cpu.max on cgroup v2, cpu.cfs_quota_us on v1)
}

Migration path:

  • Go 1.24 and earlier: Use uber-go/automaxprocs
  • Go 1.25+: Automatic, but automaxprocs still works (no harm)

Impact: Benchmarks

A CPU-bound workload in a 1-core container with GOMAXPROCS=16 vs adjusted GOMAXPROCS=1:

Scenario: Process 10,000 items, compute-heavy
Baseline (GOMAXPROCS=16):
    Throughput:  4,200 items/sec (throttled, many context switches)
    Latency p99: 850ms
    CPU usage:   ~140% (exceeds 100% due to kernel accounting)

With automaxprocs (GOMAXPROCS=1):
    Throughput:  8,500 items/sec
    Latency p99: 120ms
    CPU usage:   ~98%

2x improvement in throughput, 7x improvement in latency.

GOMEMLIMIT: Soft Memory Limits (Go 1.19+)

Go 1.19 introduced GOMEMLIMIT, a soft limit on heap memory that influences GC behavior.

How GOMEMLIMIT Affects GC

package main

import (
    "runtime/debug"
)

func main() {
    // Set the soft memory limit to 512MB
    // (equivalent to running with GOMEMLIMIT=512MiB)
    debug.SetMemoryLimit(512 << 20)

    // GOMEMLIMIT acts as a soft target:
    // the GC runs more frequently as the heap approaches the limit
}

Without GOMEMLIMIT:

  • GC runs based on GOGC percentage (default 100)
  • Heap can grow unbounded
  • Under high allocation load, GC pauses become massive

With GOMEMLIMIT:

  • GC targets staying under the limit
  • Trades more frequent GC pauses for bounded heap
  • Prevents OOM kills in container environments

Setting GOMEMLIMIT Relative to Container Memory

In Kubernetes with 512MB container limit:

# Set GOMEMLIMIT to ~80% of container memory (suffixes like MiB are supported)
export GOMEMLIMIT=400MiB
export GOGC=100

# Or use the automemlimit library

The remaining 20% is for:

  • Go runtime internals
  • Stack space for goroutines
  • Buffer pools and caches
  • OS page cache

Interaction with GOGC

# Aggressive: More frequent GC, lower peak heap
GOMEMLIMIT=400MiB GOGC=50   # GC when heap reaches ~1.5x live heap

# Balanced: Default behavior
GOMEMLIMIT=400MiB GOGC=100  # GC when heap reaches ~2x live heap

# Relaxed: Fewer GC pauses, higher peak
GOMEMLIMIT=400MiB GOGC=200  # GC when heap reaches ~3x live heap

Preventing OOM Kills

The dangerous pattern: no limits.

# BAD: No memory limit
docker run myapp

# With limits but no GOMEMLIMIT:
# Container memory limit: 512MB
# Go heap grows freely until... OOM kill

# GOOD: Aligned limits
docker run -m 512m -e GOMEMLIMIT=400MiB myapp
# OOM becomes unlikely; GC manages memory proactively

automemlimit Library

import _ "github.com/KimMachineGun/automemlimit"

automemlimit reads cgroup memory limits and sets GOMEMLIMIT automatically:

package main

import (
    "fmt"
    "runtime/debug"

    _ "github.com/KimMachineGun/automemlimit"
)

func main() {
    // In a 512MB container, automemlimit sets GOMEMLIMIT to ~90% of the
    // cgroup limit by default. A negative argument queries the current
    // value without changing it:
    limit := debug.SetMemoryLimit(-1)
    fmt.Printf("GOMEMLIMIT: %d bytes\n", limit)
}

Kubernetes Resource Tuning for Go

Requests vs Limits

In Kubernetes:

apiVersion: v1
kind: Pod
metadata:
  name: go-app
spec:
  containers:
  - name: app
    image: myapp:latest
    resources:
      requests:
        cpu: "500m"      # Scheduling guarantee
        memory: "256Mi"  # Scheduling guarantee
      limits:
        cpu: "1000m"     # Hard cap
        memory: "512Mi"  # Hard cap

  • requests: Used by the scheduler for placement; size these so the app performs well at that level
  • limits: Enforced by the kernel; exceeding the CPU limit triggers throttling, exceeding the memory limit triggers an OOM kill

CPU Limits Cause Throttling

This is counterintuitive but critical:

# This config causes throttling even with idle cores!
resources:
  limits:
    cpu: "1000m"  # Hard cap at 1 core

With GOMAXPROCS=4 (4-core host), the runtime runs 4 workers in parallel, so they exhaust the 1-core quota in a quarter of each period and then stall for the rest of it.

Timeline:
Time 0-25ms:     All 4 Ps run in parallel, burning the 100ms quota in ~25ms
Time 25-100ms:   Throttled: all Ps wait for the next period
Time 100ms:      Quota refreshes, another ~25ms burst

Solution: Align GOMAXPROCS to the limit:

# Option 1: Use automaxprocs (respects limits)
# Option 2: Set explicitly
export GOMAXPROCS=1  # Match the 1-core limit

# Option 3: Go 1.25+: Automatic

Memory Limits vs GOMEMLIMIT Alignment

resources:
  limits:
    memory: "512Mi"

# In deployment:
env:
- name: GOMEMLIMIT
  value: "400MiB"  # ~80% of the 512MiB limit

The gap (512MiB - 400MiB = 112MiB) protects against:

  • Allocations between GC checks
  • Runtime overhead
  • Sudden spikes

Pod QoS Classes Impact on Scheduling

Kubernetes assigns QoS classes based on request/limit ratios:

# Guaranteed: requests == limits (highest priority)
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"

# Burstable: requests < limits (medium priority)
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

# BestEffort: no requests/limits (lowest priority, evicted first)

For Go services, prefer Guaranteed QoS. During node pressure, Guaranteed pods are never evicted.

Readiness/Liveness Probes: Zero-Allocation Health Checks

Health check endpoints should be fast and non-allocating:

package main

import (
    "net/http"
)

var (
    readyResponse = []byte("OK\n")
    aliveResponse = []byte("OK\n")
)

func main() {
    // Pre-allocate response bodies
    http.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/plain")
        w.WriteHeader(http.StatusOK)
        w.Write(readyResponse) // No allocation
    })

    http.HandleFunc("/live", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/plain")
        w.WriteHeader(http.StatusOK)
        w.Write(aliveResponse) // No allocation
    })

    http.ListenAndServe(":8080", nil)
}

The matching Kubernetes probe configuration:

apiVersion: v1
kind: Pod
metadata:
  name: go-app
spec:
  containers:
  - name: app
    image: myapp:latest
    livenessProbe:
      httpGet:
        path: /live
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10

    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5

Vertical Pod Autoscaler and Go Memory Patterns

VPA observes actual memory usage and recommends resource adjustments:

kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: go-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: go-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        memory: "128Mi"
      maxAllowed:
        memory: "2Gi"
EOF

VPA works well with Go because Go's GC stabilizes memory at predictable levels. Once VPA observes a few days of traffic, it recommends accurate limits.

Docker Image Optimization

Multi-stage Builds: Builder vs Runtime

A naive Dockerfile includes build tooling in the final image:

# BAD: Bloated image
FROM golang:1.22
WORKDIR /app
COPY . .
RUN go build -o app .
EXPOSE 8080
ENTRYPOINT ["/app"]

This produces a 1.5GB image (Go toolchain included).

Better: Multi-stage build

# Stage 1: Builder
FROM golang:1.22 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags="-w -s" \
    -o /tmp/app .

# Stage 2: Runtime (minimal)
FROM scratch
COPY --from=builder /tmp/app /app
EXPOSE 8080
ENTRYPOINT ["/app"]

Final image size: ~8MB — just the static binary; the 1.5GB builder stage is discarded from the final image.

Explanation:

  • CGO_ENABLED=0: Disable C dependencies (don't need libc in scratch)
  • -ldflags="-w -s": Remove debug symbols (saves ~30%)
  • FROM scratch: No base OS, just the binary

Base Image Choices

# scratch: Absolute minimum (just the binary)
FROM scratch
COPY --from=builder /tmp/app /app
# Size: ~8MB binary
# Pros: Tiny, minimal attack surface
# Cons: No shell, no tools, can't exec into the container

# distroless: Security-focused minimal (includes CA certs, timezone data)
FROM gcr.io/distroless/base-debian12
COPY --from=builder /tmp/app /app
# Size: ~20MB (includes libc, CA certs)
# Pros: Small, has the essentials, safer defaults
# Cons: Hard to debug (no shell)

# alpine: Small with a package manager (musl libc)
FROM alpine:3.19
RUN apk add --no-cache ca-certificates tzdata
COPY --from=builder /tmp/app /app
# Size: ~30MB
# Pros: Package manager, shell, tools
# Cons: musl vs glibc differences (rare issues)

Recommendation: Use distroless for production (security + small), alpine for development (easier debugging).

Static Compilation

Ensure the binary runs on any system:

CGO_ENABLED=0 go build -o app .

# Verify statically linked
ldd ./app
    # Should print: "not a dynamic executable" (good!)
    # NOT: "linux-vdso.so.1", "libc.so.6" (bad, dynamic linking)

Dynamic linking in containers causes:

  • Base image mismatches (alpine's musl vs ubuntu's glibc)
  • Dependency version conflicts
  • Runtime errors in production

Always build with CGO_ENABLED=0 for container deployments.

UPX Compression: Size vs Startup Tradeoff

UPX compresses the binary executable:

# Build normally
CGO_ENABLED=0 go build -o app .
ls -lh app  # 8MB

# Compress with UPX
upx -9 app
ls -lh app  # 2.5MB (68% smaller!)

Tradeoff:

Uncompressed:
    Container image: 15MB
    Startup time: 5ms
    Memory: 50MB (loaded into RAM)

UPX compressed:
    Container image: 10MB
    Startup time: 25ms (decompression overhead)
    Memory: 51MB (decompress into memory)

UPX helps with image pull time and registry storage, but adds startup latency. For serverless, avoid UPX. For long-running services, it's neutral.

Layer Caching: Ordering Dockerfile Instructions

Docker builds are layer-cached. If you modify source code, all subsequent layers rebuild:

# BAD: Copy source early
FROM golang:1.22 AS builder
WORKDIR /app
COPY . .                        # Layer cache: busted on any file change
RUN go mod download
RUN go build -o app .

On every code change, go mod download re-runs (unnecessary).

# GOOD: Copy dependencies first, source later
FROM golang:1.22 AS builder
WORKDIR /app
COPY go.mod go.sum ./          # Layer cache: busted only on dependency changes
RUN go mod download

COPY . .                        # Layer cache: busted on code changes
RUN go build -o app .

With this order, changing only source code skips the RUN go mod download cache layer.

In CI, this saves 10-30 seconds per build.

.dockerignore Best Practices

# .dockerignore
.git
.gitignore
README.md
.github/
*.test
*.tmp
# vendor/   (uncomment if you vendor dependencies)
.DS_Store

Excluding unnecessary files speeds up COPY . . and reduces build context.

Serverless / Lambda Performance

Cold Start Optimization

A Lambda cold start includes:

  1. Container initialization
  2. Runtime initialization
  3. Application startup
  4. First request execution

# Cold start timeline (milliseconds)
Container overhead:           100ms
Go runtime startup:           50ms
Application init:             100ms
First request handling:       200ms
Total cold start:             450ms

Optimization strategies:

1. Binary Size Reduction

# Normal build
go build -o bootstrap .
ls -lh bootstrap     # 12MB

# Optimized
CGO_ENABLED=0 go build \
    -ldflags="-w -s -X main.Version=prod" \
    -trimpath \
    -o bootstrap .
ls -lh bootstrap     # 8.5MB (smaller = faster pull/decompress)

2. Fast Initialization

// BAD: Initialization in the request handler
func handleRequest(ctx context.Context, event any) (string, error) {
    db := connectToDatabase()   // Every request pays for connection setup
    cache := initCache()
    defer db.Close()
    _ = cache

    return "ok", nil
}

// GOOD: Initialize at cold start
var (
    db *sql.DB
    cache *Cache
)

func init() {
    db = connectToDatabase()
    cache = initCache()
}

func handleRequest(ctx context.Context, event any) (string, error) {
    // Reuse existing connections
    return "ok", nil
}

Keep-Alive Strategies

Lambda provisioned concurrency keeps containers warm between requests:

# AWS SAM template
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  GoFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: bootstrap
      Runtime: provided.al2
      CodeUri: .
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5
      # Keeps 5 containers warm, no cold starts

This is expensive but eliminates cold start latency.

Connection Pooling in Serverless

Wrong: Create connection per request

func handleRequest(ctx context.Context, event any) (string, error) {
    conn, err := net.Dial("tcp", "db.internal:5432")
    if err != nil {
        return "", err
    }
    defer conn.Close() // Closed after every request!
    // Expensive: TCP handshake, TLS, query parsing
    return "ok", nil
}

Right: Reuse connection across invocations

var dbConn *sql.DB

func init() {
    // Initialized once at cold start
    dbConn, _ = sql.Open("postgres", "...")
    dbConn.SetMaxOpenConns(2)  // Serverless: small pool
    dbConn.SetMaxIdleConns(1)
}

func handleRequest(ctx context.Context, event any) (string, error) {
    row := dbConn.QueryRowContext(ctx, "SELECT ...")
    // Reuses an existing pooled connection: fast!
    _ = row
    return "ok", nil
}

AWS Lambda Go Runtime vs provided.al2

# Build in a Go image; run on provided.al2 (recommended for Go)
FROM golang:1.22 AS builder
WORKDIR /build

COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
    -ldflags="-w -s" \
    -o bootstrap main.go

FROM public.ecr.aws/lambda/provided:al2
COPY --from=builder /build/bootstrap ${LAMBDA_TASK_ROOT}/
CMD [ "bootstrap" ]

The provided.al2 runtime is lightweight and purpose-built for custom Go handlers. No bloat, just the Go binary.

Memory Sizing

In Lambda, more memory = more CPU = faster execution:

Memory    Approx vCPU share
128MB     ~0.07
256MB     ~0.14
512MB     ~0.29
1024MB    ~0.58

(CPU scales linearly with memory; a full vCPU arrives at roughly 1,769MB.)

For a 100ms execution time:

1024MB:   100ms execution  →  0.1 GB-s  →  ~$0.0000017 per invocation
512MB:    200ms execution  →  0.1 GB-s  →  ~$0.0000017 per invocation
256MB:    400ms execution  →  0.1 GB-s  →  ~$0.0000017 per invocation

Same cost, but 1024MB finishes 2x faster (better UX). Size for P99 latency.
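The flat-cost arithmetic can be checked directly. The per-GB-second price below is an assumption (the commonly quoted x86 rate), and request fees are ignored:

```go
package main

import "fmt"

// costPerInvocation computes Lambda compute cost as memory (GB) x
// duration (s) x price per GB-second.
func costPerInvocation(memMB int, durationMS float64) float64 {
	const pricePerGBSecond = 0.0000166667 // assumed rate
	return float64(memMB) / 1024.0 * (durationMS / 1000.0) * pricePerGBSecond
}

func main() {
	// Doubling memory halves duration for CPU-bound work, so cost stays flat:
	fmt.Printf("1024MB/100ms: $%.9f\n", costPerInvocation(1024, 100))
	fmt.Printf("512MB/200ms:  $%.9f\n", costPerInvocation(512, 200))
	fmt.Printf("256MB/400ms:  $%.9f\n", costPerInvocation(256, 400))
}
```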

Resource-Aware Go Applications

Reading Cgroup Limits Programmatically

import (
    "os"
    "strconv"
    "strings"
)

func readCgroupMemoryLimit() (int64, error) {
    // cgroup v2
    if data, err := os.ReadFile("/sys/fs/cgroup/memory.max"); err == nil {
        s := strings.TrimSpace(string(data))
        if s == "max" {
            return 0, nil // no limit configured
        }
        return strconv.ParseInt(s, 10, 64)
    }

    // Fallback to cgroup v1
    data, err := os.ReadFile("/sys/fs/cgroup/memory/memory.limit_in_bytes")
    if err != nil {
        return 0, err
    }
    return strconv.ParseInt(strings.TrimSpace(string(data)), 10, 64)
}

Alternatively, use a library:

import "github.com/containerd/cgroups/v3"

func getMemoryLimit(ctx context.Context) (int64, error) {
    stats, err := cgroups.Load(cgroups.V1, cgroups.PidPath(os.Getpid()))
    if err != nil {
        return 0, err
    }
    return stats.Memory.MemoryLimit, nil
}

Adaptive Worker Pools Based on Resources

package main

import (
    "runtime"
    "sync"
)

// Task is any unit of work the pool can run.
type Task interface {
    Execute()
}

type WorkerPool struct {
    workers int
    tasks   chan Task
}

func NewWorkerPool() *WorkerPool {
    numWorkers := runtime.GOMAXPROCS(-1)
    // Scale based on detected CPU
    if numWorkers < 1 {
        numWorkers = 1
    }
    if numWorkers > 128 {
        numWorkers = 128 // Cap at reasonable limit
    }

    wp := &WorkerPool{
        workers: numWorkers,
        tasks:   make(chan Task, numWorkers*2),
    }
    wp.start()
    return wp
}

func (wp *WorkerPool) start() {
    for i := 0; i < wp.workers; i++ {
        go wp.worker()
    }
}

func (wp *WorkerPool) worker() {
    for task := range wp.tasks {
        task.Execute()
    }
}
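A self-contained variant of the same idea: size the fan-out from the effective GOMAXPROCS, which reflects the container's CPU limit once it is configured as described earlier (the squaring step stands in for real work):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// process fans work out across one goroutine per available P and
// returns the sum of the squared inputs.
func process(items []int) int {
	workers := runtime.GOMAXPROCS(-1)
	in := make(chan int)
	results := make(chan int, len(items))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range in {
				results <- n * n // stand-in for real work
			}
		}()
	}

	for _, n := range items {
		in <- n
	}
	close(in)
	wg.Wait()
	close(results)

	sum := 0
	for n := range results {
		sum += n
	}
	return sum
}

func main() {
	fmt.Println(process([]int{1, 2, 3, 4})) // 1+4+9+16 = 30
}
```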

Graceful Shutdown

Ensure in-flight requests complete before exit:

package main

import (
    "context"
    "net/http"
    "os"
    "os/signal"
    "sync"
    "syscall"
    "time"
)

func main() {
    server := &http.Server{Addr: ":8080"}

    var wg sync.WaitGroup

    // Start server in goroutine
    wg.Add(1)
    go func() {
        defer wg.Done()
        server.ListenAndServe()
    }()

    // Wait for shutdown signal
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGTERM, syscall.SIGINT)
    <-sigChan

    // Graceful shutdown: 30 second timeout
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    server.Shutdown(ctx) // Stops accepting new connections
    wg.Wait()            // Wait for existing requests to finish
}

This ensures Kubernetes can drain pods cleanly during rolling updates.

Health Check Endpoints

type HealthChecker struct {
    mu sync.RWMutex
    up bool
}

func (h *HealthChecker) Ready(w http.ResponseWriter, r *http.Request) {
    h.mu.RLock()
    ready := h.up
    h.mu.RUnlock()

    if !ready {
        w.WriteHeader(http.StatusServiceUnavailable)
        w.Write([]byte("not ready"))
        return
    }

    w.WriteHeader(http.StatusOK)
    w.Write([]byte("ok"))
}

Set up=true only after initialization completes. Kubernetes waits for readiness probe to pass before routing traffic.

Monitoring in Containers

Runtime Metrics

Export Go runtime metrics for observability:

import (
    "fmt"
    "net/http"
    _ "net/http/pprof"
    "runtime"
)

func metricsHandler(w http.ResponseWriter, r *http.Request) {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    w.Header().Set("Content-Type", "text/plain")
    fmt.Fprintf(w, "go_goroutines %d\n", runtime.NumGoroutine())
    fmt.Fprintf(w, "go_gc_runs %d\n", m.NumGC)
    fmt.Fprintf(w, "go_memory_heap_bytes %d\n", m.Alloc)
    fmt.Fprintf(w, "go_memory_total_alloc_bytes %d\n", m.TotalAlloc) // cumulative, not a peak
}

Prometheus /metrics Endpoint

import "github.com/prometheus/client_golang/prometheus/promhttp"

func main() {
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}

Prometheus scrapes this endpoint and provides Go metrics in dashboards.

pprof in Containers

Enable safely with authentication:

import (
    "net"
    "net/http"
    _ "net/http/pprof"
)

func init() {
    // Bind pprof to localhost only (not exposed outside the pod)
    go func() {
        listener, err := net.Listen("tcp", "127.0.0.1:6060")
        if err != nil {
            return
        }
        http.Serve(listener, nil)
    }()
}

In production, port-forward to access:

kubectl port-forward deployment/go-app 6060:6060
# Then locally: go tool pprof http://localhost:6060/debug/pprof/heap

Practical Configuration Examples

Dockerfile (Optimized)

# Multi-stage build for minimal size and startup
FROM golang:1.22 AS builder
WORKDIR /build

# Download dependencies
COPY go.mod go.sum ./
RUN go mod download

# Build application
COPY . .
ARG VERSION=dev
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
    -ldflags="-w -s -X main.Version=${VERSION}" \
    -trimpath \
    -o /tmp/app ./cmd/main.go
# Version is passed as a build arg (git metadata isn't in the build context
# when .git is excluded via .dockerignore)

# Runtime stage: distroless base
FROM gcr.io/distroless/base-debian12:nonroot
COPY --from=builder /tmp/app /usr/local/bin/app

# Health check
HEALTHCHECK --interval=10s --timeout=3s --start-period=5s \
    CMD ["/usr/local/bin/app", "-health-check"]

ENTRYPOINT ["/usr/local/bin/app"]

Kubernetes Deployment (Resource-Aware)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: go-api
  template:
    metadata:
      labels:
        app: go-api
    spec:
      containers:
      - name: app
        image: myregistry.azurecr.io/go-api:v1.2.3
        imagePullPolicy: IfNotPresent

        # Resource allocation
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

        # Environment: Auto-tuned for container limits
        env:
        - name: GOMAXPROCS
          valueFrom:
            resourceFieldRef:
              containerName: app
              resource: limits.cpu
              divisor: "1"  # integer cores; 500m rounds up to 1
        - name: GOMEMLIMIT
          value: "400MiB"
        - name: GOGC
          value: "100"

        # Probes
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10

        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5

        # Graceful shutdown
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 5"]

      terminationGracePeriodSeconds: 30

Lambda Handler (Go)

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
)

var (
    // Initialized once at cold start
    initialized = false
)

func init() {
    // Long-running initialization happens here
    // Database connections, cache initialization, etc.
    fmt.Println("Initializing...")
    time.Sleep(100 * time.Millisecond) // Simulate work
    initialized = true
}

func handler(ctx context.Context, event events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
    return events.APIGatewayProxyResponse{
        StatusCode: 200,
        Body:       "Hello from Lambda",
    }, nil
}

func main() {
    lambda.Start(handler)
}

Build and deploy:

GOOS=linux GOARCH=amd64 go build -o bootstrap main.go
zip function.zip bootstrap
aws lambda update-function-code --function-name myFunction --zip-file fileb://function.zip

Production Checklist

  • Use Go 1.25+ for automatic cgroup CPU detection, else use automaxprocs
  • Set GOMEMLIMIT to 80% of container memory limit
  • Use multi-stage Docker builds with distroless or scratch base
  • Build with CGO_ENABLED=0 for maximum portability
  • Configure Kubernetes requests/limits with Guaranteed QoS
  • Implement health check endpoints (zero-allocation)
  • Add graceful shutdown with terminationGracePeriodSeconds
  • Enable runtime metrics for observability
  • Test locally with container resource limits (docker run --cpus, --memory)
  • Monitor GC behavior; adjust GOGC if needed

Conclusion

Optimizing Go for containers requires understanding resource constraints and configuring the runtime to match them. From automatic GOMAXPROCS detection to GOMEMLIMIT alignment, from multi-stage Docker builds to Kubernetes resource tuning, each optimization compounds. A properly tuned containerized Go application performs comparably to bare metal while gaining the operational benefits of containers and cloud platforms.

The key insight: Go has historically not adapted to container constraints on its own, and even with Go 1.25's CPU awareness, memory limits, image size, and resource requests still require explicit configuration. With the patterns in this article, you'll build robust, efficient, cloud-native Go services.
