Container and Cloud Performance
Optimizing Go applications for containers and cloud environments — GOMAXPROCS in cgroups, Kubernetes resource tuning, Docker image optimization, serverless cold starts, and memory limit awareness.
Go applications running in containers and cloud environments face constraints fundamentally different from bare metal. Container resource limits, cgroup CPU throttling, and memory pressure create new performance considerations that require explicit configuration and awareness. This article explores how to optimize Go for containerized deployments, from Docker image size to Kubernetes resource allocation to serverless cold starts.
GOMAXPROCS in Containers: The CPU Bottleneck
One of the most common Go performance issues in containers stems from a simple mismatch: Go detects CPU count from the host, but runs inside a cgroup that may limit available CPUs.
The Problem: Host CPU Count vs Container Limit
package main
import (
"fmt"
"runtime"
)
func main() {
// On a 16-core host, even in a 1-core container:
fmt.Println(runtime.NumCPU()) // Prints: 16
fmt.Println(runtime.GOMAXPROCS(-1)) // Prints: 16
}

The Go runtime creates 16 Ps (logical processors that schedule goroutines), but the cgroup allows only one core's worth of CPU time. The result: heavy contention, context switching, and throttling.
CPU Throttling with CFS Quota
Linux cgroups v1 uses the Completely Fair Scheduler (CFS) quota mechanism:
# cgroup v1 cpu limits
cat /sys/fs/cgroup/cpu/docker/container-id/cpu.cfs_quota_us # 100000 (0.1s per 100ms)
cat /sys/fs/cgroup/cpu/docker/container-id/cpu.cfs_period_us # 100000 (100ms period)
# Effective CPU limit: quota / period = 100000 / 100000 = 1 core

With GOMAXPROCS=16 but only one core of quota, the kernel throttles the process aggressively:
Time 0-100ms: Run ~6 goroutines (1 core capacity)
Time 100ms: Cgroup quota exhausted, ALL goroutines blocked
Time 200ms: Quota refreshes, resume for ~100ms
Result: Bursty, unpredictable latency

uber-go/automaxprocs: Automatic Detection
The automaxprocs library solves this by reading cgroup limits and adjusting GOMAXPROCS. In your main.go, blank-import the package:

import _ "go.uber.org/automaxprocs"

That's it. A full example:
package main
import (
_ "go.uber.org/automaxprocs"
"fmt"
"runtime"
)
func main() {
// In a 1-core container:
fmt.Println(runtime.GOMAXPROCS(-1)) // Prints: 1 (adjusted!)
// Runs smoothly without contention
}

Internally, automaxprocs reads cgroup files:
// Simplified sketch of automaxprocs-style detection
// (illustrative; readCgroupV2CPUMax and readCgroupV1 are hypothetical helpers)
func detectCgroupLimit() int {
	// cgroup v2: /sys/fs/cgroup/cpu.max holds "<quota> <period>" or "max <period>"
	if quota, period, ok := readCgroupV2CPUMax(); ok && period > 0 {
		if limit := int(quota / period); limit > 0 {
			return limit
		}
	}
	// cgroup v1: quota and period live in separate files
	if quota, ok := readCgroupV1("cpu/cpu.cfs_quota_us"); ok && quota > 0 {
		if period, ok := readCgroupV1("cpu/cpu.cfs_period_us"); ok && period > 0 {
			if limit := int(quota / period); limit > 0 {
				return limit
			}
		}
	}
	return runtime.NumCPU() // No limit detected: fall back to host CPU count
}

Go 1.25: Built-in Cgroup Awareness
Starting with Go 1.25, the runtime automatically detects cgroup CPU limits without external libraries:
// Go 1.25+: GOMAXPROCS automatically respects cgroup limits
package main
import "runtime"
func main() {
// No import needed; runtime.GOMAXPROCS already adjusted
// by the scheduler based on cgroup cpu.cfs_quota_us
}

Migration path:
- Go 1.24 and earlier: use uber-go/automaxprocs
- Go 1.25+: automatic; automaxprocs still works (no harm)
Impact: Benchmarks
A CPU-bound workload in a 1-core container with GOMAXPROCS=16 vs adjusted GOMAXPROCS=1:
Scenario: Process 10,000 items, compute-heavy
Baseline (GOMAXPROCS=16):
Throughput: 4,200 items/sec (throttled, many context switches)
Latency p99: 850ms
CPU usage: ~140% (exceeds 100% due to kernel accounting)
With automaxprocs (GOMAXPROCS=1):
Throughput: 8,500 items/sec
Latency p99: 120ms
CPU usage: ~98%

2x improvement in throughput, 7x improvement in latency.
GOMEMLIMIT: Soft Memory Limits (Go 1.19+)
Go 1.19 introduced GOMEMLIMIT, a soft limit on heap memory that influences GC behavior.
How GOMEMLIMIT Affects GC
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Set a soft memory limit of 512MB (equivalent to GOMEMLIMIT=512MiB)
	prev := debug.SetMemoryLimit(512 << 20)
	fmt.Printf("previous limit: %d\n", prev)

	// GOMEMLIMIT acts as a soft target:
	// the GC runs more frequently as the heap approaches the limit
}

Without GOMEMLIMIT:
- GC runs based on GOGC percentage (default 100)
- Heap can grow unbounded
- Under high allocation load, GC pauses become massive
With GOMEMLIMIT:
- GC targets staying under the limit
- Trades more frequent GC pauses for bounded heap
- Prevents OOM kills in container environments
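How the limit interacts with the GC's growth target can be sketched as a simplified heap-goal calculation. This is illustrative only; the real runtime also accounts for non-heap memory such as goroutine stacks, and the helper name is hypothetical:

```go
package main

import "fmt"

// heapGoal sketches (in simplified form) how the GC picks its next
// target: grow the live set by GOGC percent, but never past the
// soft memory limit.
func heapGoal(liveBytes, memLimit int64, gogc int) int64 {
	goal := liveBytes + liveBytes*int64(gogc)/100
	if goal > memLimit {
		return memLimit
	}
	return goal
}

func main() {
	// 100MB live heap, 400MB limit, GOGC=100: next GC around 200MB
	fmt.Println(heapGoal(100<<20, 400<<20, 100))
	// 300MB live heap: the limit caps the goal at 400MB
	fmt.Println(heapGoal(300<<20, 400<<20, 100))
}
```

With a small live set the limit is irrelevant and GOGC dominates; as the live set approaches the limit, the GC runs more and more often to stay under it.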
Setting GOMEMLIMIT Relative to Container Memory
In Kubernetes with 512MB container limit:
# Set GOMEMLIMIT to 80% of container memory
export GOMEMLIMIT=410MiB   # ~80% of the 512MiB container limit
export GOGC=100
# Or use the automemlimit library

The remaining ~20% of headroom covers:
- Go runtime internals
- Stack space for goroutines
- Buffer pools and caches
- OS page cache
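The same 80% rule can be applied in-process with debug.SetMemoryLimit. A minimal sketch, assuming the container limit has already been read from the cgroup files:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// softLimit returns 80% of the container memory limit, leaving
// headroom for runtime internals, goroutine stacks, and buffers.
func softLimit(containerBytes int64) int64 {
	return containerBytes * 8 / 10
}

func main() {
	containerLimit := int64(512 << 20) // 512MiB, e.g. read from /sys/fs/cgroup/memory.max
	debug.SetMemoryLimit(softLimit(containerLimit))
	fmt.Printf("soft limit: %d MiB\n", softLimit(containerLimit)>>20)
}
```

Setting the limit programmatically has the same effect as the GOMEMLIMIT environment variable, but lets you derive it from the detected cgroup limit at startup.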
Interaction with GOGC
# Aggressive: more frequent GC, lower peak heap
GOMEMLIMIT=400MiB GOGC=50    # GC when heap grows 50% over the live set
# Balanced: default behavior
GOMEMLIMIT=400MiB GOGC=100   # GC when heap grows 100% over the live set
# Relaxed: fewer GC cycles, higher peak heap
GOMEMLIMIT=400MiB GOGC=200   # GC when heap grows 200% over the live set

Preventing OOM Kills
The dangerous pattern: no limits.
# BAD: No memory limit
docker run myapp
# With limits but no GOMEMLIMIT:
# Container memory limit: 512MB
# Go heap grows freely until... OOM kill
# GOOD: Aligned limits
docker run -m 512m -e GOMEMLIMIT=400M myapp
# OOM becomes unlikely; the GC manages memory proactively

automemlimit Library

The automemlimit library reads the cgroup memory limit and sets GOMEMLIMIT automatically:

import _ "github.com/KimMachineGun/automemlimit"
package main
import (
_ "github.com/KimMachineGun/automemlimit"
"fmt"
"runtime/debug"
)
func main() {
	// In a 512MB container, automemlimit sets GOMEMLIMIT
	// to ~90% of the limit during package initialization.
	// SetMemoryLimit(-1) reads the current limit without changing it.
	limit := debug.SetMemoryLimit(-1)
	fmt.Printf("GOMEMLIMIT: %d bytes\n", limit)
}

Kubernetes Resource Tuning for Go
Requests vs Limits
In Kubernetes:
apiVersion: v1
kind: Pod
metadata:
name: go-app
spec:
containers:
- name: app
image: myapp:latest
resources:
requests:
cpu: "500m" # Scheduling guarantee
memory: "256Mi" # Scheduling guarantee
limits:
cpu: "1000m" # Hard cap
        memory: "512Mi" # Hard cap

requests: Used by the scheduler; Go should perform well at this level. limits: Enforced by the kernel; exceeding them triggers throttling or an OOM kill.
CPU Limits Cause Throttling
This is counterintuitive but critical:
# This config causes throttling even with idle cores!
resources:
limits:
    cpu: "1000m" # Hard cap at 1 core

With GOMAXPROCS=4 (4-core host), the runtime schedules 4 goroutine workers, but the cgroup throttles to 1 core. The other 3 cores sit idle.
Timeline:
Time 0-25ms: One P gets scheduled, uses ~1 core
Time 25-50ms: Throttle active, all Ps waiting
Time 50-75ms: Scheduling, one P goes again

Solution: Align GOMAXPROCS to the limit:
# Option 1: Use automaxprocs (respects limits)
# Option 2: Set explicitly
export GOMAXPROCS=1 # Match the 1-core limit
# Option 3: Go 1.25+: automatic

Memory Limits vs GOMEMLIMIT Alignment
resources:
limits:
memory: "512Mi"
# In deployment:
env:
- name: GOMEMLIMIT
    value: "410MiB" # ~80% of the 512Mi limit

The gap (512MiB − 410MiB ≈ 100MiB) protects against:
- Allocations between GC checks
- Runtime overhead
- Sudden spikes
Pod QoS Classes Impact on Scheduling
Kubernetes assigns QoS classes based on request/limit ratios:
# Guaranteed: requests == limits (highest priority)
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "512Mi"
cpu: "500m"
# Burstable: requests < limits (medium priority)
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
# BestEffort: no requests/limits (lowest priority, evicted first)

For Go services, prefer Guaranteed QoS. During node pressure, Guaranteed pods are evicted last.
Readiness/Liveness Probes: Zero-Allocation Health Checks
Health check endpoints should be fast and non-allocating:
package main
import (
"net/http"
)
var (
readyResponse = []byte("OK\n")
aliveResponse = []byte("OK\n")
)
func main() {
// Pre-allocate response bodies
http.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "text/plain")
w.WriteHeader(http.StatusOK)
w.Write(readyResponse) // No allocation
})
http.HandleFunc("/live", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "text/plain")
w.WriteHeader(http.StatusOK)
w.Write(aliveResponse) // No allocation
})
http.ListenAndServe(":8080", nil)
}

The matching probe configuration:

apiVersion: v1
kind: Pod
metadata:
name: go-app
spec:
containers:
- name: app
image: myapp:latest
livenessProbe:
httpGet:
path: /live
port: 8080
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
      periodSeconds: 5

Vertical Pod Autoscaler and Go Memory Patterns
VPA observes actual memory usage and recommends resource adjustments:
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: go-app-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: go-app
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: app
minAllowed:
memory: "128Mi"
maxAllowed:
memory: "2Gi"
EOF

VPA works well with Go because Go's GC stabilizes memory at predictable levels. Once VPA has observed a few days of traffic, it recommends accurate limits.
Docker Image Optimization
Multi-stage Builds: Builder vs Runtime
A naive Dockerfile includes build tooling in the final image:
# BAD: Bloated image
FROM golang:1.22
WORKDIR /app
COPY . .
RUN go build -o app .
EXPOSE 8080
ENTRYPOINT ["/app"]

This produces a 1.5GB image (Go toolchain included).
Better: Multi-stage build
# Stage 1: Builder
FROM golang:1.22 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build \
-ldflags="-w -s" \
-o /tmp/app .
# Stage 2: Runtime (minimal)
FROM scratch
COPY --from=builder /tmp/app /app
EXPOSE 8080
ENTRYPOINT ["/app"]

Final image size: ~15MB — the 1.5GB builder stage is discarded; only the static binary ships.
Explanation:
- CGO_ENABLED=0: disable C dependencies (no libc needed in scratch)
- -ldflags="-w -s": strip the symbol table and debug info (saves ~30%)
- FROM scratch: no base OS, just the binary
Base Image Choices
# scratch: Absolute minimum (just binary)
FROM scratch
COPY --from=builder /tmp/app /app
# Size: ~8MB binary
# Pros: Tiny, attack surface minimal
# Cons: No shell, no tools, can't exec into container

# distroless: security-focused minimal (includes CA certs, timezone data)
FROM gcr.io/distroless/base-debian12
COPY --from=builder /tmp/app /app
# Size: ~20MB (includes libc, CA certs)
# Pros: Small, has essentials, safer
# Cons: Can't debug (no shell)

# alpine: small with a package manager (musl libc)
FROM alpine:3.19
RUN apk add --no-cache ca-certificates tzdata
COPY --from=builder /tmp/app /app
# Size: ~30MB
# Pros: Package manager, shell, tools
# Cons: musl vs glibc differences (rare issues)

Recommendation: Use distroless for production (small and secure), alpine for development (easier debugging).
Static Compilation
Ensure the binary runs on any system:
CGO_ENABLED=0 go build -o app .
# Verify statically linked
ldd ./app
# Should print: "not a dynamic executable" (good!)
# NOT: "linux-vdso.so.1", "libc.so.6" (bad: dynamic linking)

Dynamic linking in containers causes:
- Base image mismatches (alpine's musl vs ubuntu's glibc)
- Dependency version conflicts
- Runtime errors in production
Always build with CGO_ENABLED=0 for container deployments.
UPX Compression: Size vs Startup Tradeoff
UPX compresses the binary executable:
# Build normally
CGO_ENABLED=0 go build -o app .
ls -lh app # 8MB
# Compress with UPX
upx -9 app
ls -lh app # 2.5MB (~68% smaller)

Tradeoff:
Uncompressed:
Container image: 15MB
Startup time: 5ms
Memory: 50MB (loaded into RAM)
UPX compressed:
Container image: 10MB
Startup time: 25ms (decompression overhead)
Memory: 51MB (decompressed into memory)

UPX helps with image pull time and registry storage but adds startup latency. For serverless, avoid UPX. For long-running services, it's neutral.
Layer Caching: Ordering Dockerfile Instructions
Docker builds are layer-cached. If you modify source code, all subsequent layers rebuild:
# BAD: Copy source early
FROM golang:1.22 AS builder
WORKDIR /app
COPY . . # Layer cache: busted on any file change
RUN go mod download
RUN go build -o app .

On every code change, go mod download re-runs unnecessarily.
# GOOD: Copy dependencies first, source later
FROM golang:1.22 AS builder
WORKDIR /app
COPY go.mod go.sum ./ # Layer cache: busted only on dependency changes
RUN go mod download
COPY . . # Layer cache: busted on code changes
RUN go build -o app .

With this order, changing only source code reuses the cached RUN go mod download layer.
In CI, this saves 10-30 seconds per build.
.dockerignore Best Practices
# .dockerignore
.git
.gitignore
README.md
.github/
*.test
*.tmp
# vendor/ (uncomment if you vendor dependencies)
.DS_Store

Excluding unnecessary files speeds up COPY . . and reduces the build context.
Serverless / Lambda Performance
Cold Start Optimization
A Lambda cold start includes:
- Container initialization
- Runtime initialization
- Application startup
- First request execution
# Cold start timeline (milliseconds)
Container overhead: 100ms
Go runtime startup: 50ms
Application init: 100ms
First request handling: 200ms
Total cold start: 450ms

Optimization strategies:
1. Binary Size Reduction
# Normal build
go build -o bootstrap .
ls -lh bootstrap # 12MB
# Optimized
CGO_ENABLED=0 go build \
-ldflags="-w -s -X main.Version=prod" \
-trimpath \
-o bootstrap .
ls -lh bootstrap # 8.5MB (smaller = faster pull/decompress)

2. Fast Initialization
// BAD: Initialization inside the request handler
func handleRequest(ctx context.Context, event any) (string, error) {
	db := connectToDatabase() // every request: dial, auth, parsing
	defer db.Close()
	cache := initCache()
	_ = cache // illustrative; both are rebuilt on every invocation
	return "ok", nil
}
// GOOD: Initialize at cold start
var (
db *sql.DB
cache *Cache
)
func init() {
db = connectToDatabase()
cache = initCache()
}
func handleRequest(ctx context.Context, event any) (string, error) {
// Reuse existing connections
return "ok", nil
}

Keep-Alive Strategies
Lambda provisioned concurrency keeps containers warm between requests:
# AWS SAM template
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
GoFunction:
Type: AWS::Serverless::Function
Properties:
Handler: bootstrap
Runtime: provided.al2
CodeUri: .
ProvisionedConcurrencyConfig:
ProvisionedConcurrentExecutions: 5
# Keeps 5 containers warm, no cold starts

This is more expensive but eliminates cold start latency.
Connection Pooling in Serverless
Wrong: Create connection per request
func handleRequest(ctx context.Context, event any) (string, error) {
	conn, err := net.Dial("tcp", "db.internal:5432")
	if err != nil {
		return "", err
	}
	defer conn.Close() // closed after every request!
	// Expensive: TCP handshake, TLS, query parsing on each invocation
	return "ok", nil
}

Right: Reuse the connection across invocations
var dbConn *sql.DB
func init() {
// Initialized once at cold start
dbConn, _ = sql.Open("postgres", "...")
dbConn.SetMaxOpenConns(2) // Serverless: small pool
dbConn.SetMaxIdleConns(1)
}
func handleRequest(ctx context.Context, event any) (string, error) {
row := dbConn.QueryRowContext(ctx, "SELECT ...")
// Reuses existing connection: fast!
}

AWS Lambda Go Runtime vs provided.al2
# Using provided.al2 (recommended for Go)
# Build stage: use the Go toolchain image (provided.al2 ships no compiler)
FROM golang:1.22 AS builder
WORKDIR /build
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
    -ldflags="-w -s" \
    -o bootstrap main.go

# Runtime stage: provided.al2 expects an executable named "bootstrap"
FROM public.ecr.aws/lambda/provided:al2
COPY --from=builder /build/bootstrap ${LAMBDA_TASK_ROOT}/
CMD [ "bootstrap" ]

The provided.al2 runtime is lightweight and purpose-built for custom handlers: no language runtime, just your Go binary.
Memory Sizing
In Lambda, more memory = more CPU = faster execution:
Memory    vCPU     Cost/month
128MB     0.013    $2.08
256MB     0.026    $4.17
512MB     0.053    $8.34
1024MB    0.106    $16.67

For a 100ms execution time:

1024MB: 100ms execution → $0.0000167 per invocation
512MB:  200ms execution → $0.0000167 per invocation
256MB:  400ms execution → $0.0000167 per invocation

Same cost, but the 1024MB configuration finishes fastest (better UX). Size for p99 latency.
Resource-Aware Go Applications
Reading Cgroup Limits Programmatically
import (
	"os"
	"strconv"
	"strings"
)

func readCgroupLimit() (int64, error) {
	// cgroup v2 memory limit ("max" means unlimited)
	data, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		// Fallback to cgroup v1
		data, err = os.ReadFile("/sys/fs/cgroup/memory/memory.limit_in_bytes")
		if err != nil {
			return 0, err
		}
	}
	s := strings.TrimSpace(string(data))
	if s == "max" {
		return 0, nil // no limit set
	}
	return strconv.ParseInt(s, 10, 64)
}

Alternatively, use a library:
import (
	"os"

	"github.com/containerd/cgroups"
)

// Sketch using the containerd cgroups library (v1 API shown;
// types and field names may differ across library versions).
func getMemoryLimit() (uint64, error) {
	control, err := cgroups.Load(cgroups.V1, cgroups.PidPath(os.Getpid()))
	if err != nil {
		return 0, err
	}
	stats, err := control.Stat(cgroups.IgnoreNotExist)
	if err != nil {
		return 0, err
	}
	return stats.Memory.Usage.Limit, nil
}

Adaptive Worker Pools Based on Resources
package main

import (
	"runtime"
)

// Task is anything the pool can execute.
type Task interface {
	Execute()
}

type WorkerPool struct {
	workers int
	tasks   chan Task
}
func NewWorkerPool() *WorkerPool {
numWorkers := runtime.GOMAXPROCS(-1)
// Scale based on detected CPU
if numWorkers < 1 {
numWorkers = 1
}
if numWorkers > 128 {
numWorkers = 128 // Cap at reasonable limit
}
wp := &WorkerPool{
workers: numWorkers,
tasks: make(chan Task, numWorkers*2),
}
wp.start()
return wp
}
func (wp *WorkerPool) start() {
for i := 0; i < wp.workers; i++ {
go wp.worker()
}
}
func (wp *WorkerPool) worker() {
for task := range wp.tasks {
task.Execute()
}
}Graceful Shutdown
Ensure in-flight requests complete before exit:
package main
import (
"context"
"net/http"
"os"
"os/signal"
"sync"
"syscall"
"time"
)
func main() {
server := &http.Server{Addr: ":8080"}
var wg sync.WaitGroup
// Start server in goroutine
wg.Add(1)
go func() {
defer wg.Done()
server.ListenAndServe()
}()
// Wait for shutdown signal
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGTERM, syscall.SIGINT)
<-sigChan
// Graceful shutdown: 30 second timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
server.Shutdown(ctx) // Stops accepting new connections
wg.Wait() // Wait for existing requests to finish
}

This lets Kubernetes drain pods cleanly during rolling updates.
Health Check Endpoints
type HealthChecker struct {
mu sync.RWMutex
up bool
}
func (h *HealthChecker) Ready(w http.ResponseWriter, r *http.Request) {
h.mu.RLock()
ready := h.up
h.mu.RUnlock()
if !ready {
w.WriteHeader(http.StatusServiceUnavailable)
w.Write([]byte("not ready"))
return
}
w.WriteHeader(http.StatusOK)
w.Write([]byte("ok"))
}

Set up=true only after initialization completes. Kubernetes waits for the readiness probe to pass before routing traffic.
Monitoring in Containers
Runtime Metrics
Export Go runtime metrics for observability:
import (
	"fmt"
	"net/http"
	_ "net/http/pprof"
	"runtime"
)

func metricsHandler(w http.ResponseWriter, r *http.Request) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	w.Header().Set("Content-Type", "text/plain")
	fmt.Fprintf(w, "go_goroutines %d\n", runtime.NumGoroutine())
	fmt.Fprintf(w, "go_gc_runs %d\n", m.NumGC)
	fmt.Fprintf(w, "go_memory_heap_bytes %d\n", m.Alloc)
	fmt.Fprintf(w, "go_memory_total_alloc_bytes %d\n", m.TotalAlloc) // cumulative, not current
}

Prometheus /metrics Endpoint
import "github.com/prometheus/client_golang/prometheus/promhttp"
func main() {
http.Handle("/metrics", promhttp.Handler())
http.ListenAndServe(":8080", nil)
}

Prometheus scrapes this endpoint and exposes the standard Go runtime metrics in dashboards.
pprof in Containers
Enable pprof safely by binding it to localhost only:
import (
"net"
"net/http"
_ "net/http/pprof"
)
func init() {
// Bind pprof to localhost only (not exposed)
go func() {
listener, _ := net.Listen("tcp", "127.0.0.1:6060")
http.Serve(listener, nil)
}()
}

In production, port-forward to access it:
kubectl port-forward deployment/go-app 6060:6060
# Then locally: go tool pprof http://localhost:6060/debug/pprof/heap

Practical Configuration Examples
Dockerfile (Optimized)
# Multi-stage build for minimal size and startup
FROM golang:1.22 AS builder
WORKDIR /build
# Download dependencies
COPY go.mod go.sum ./
RUN go mod download
# Build application
COPY . .
# Version passed as a build arg (.dockerignore excludes .git, so git describe is unavailable here)
ARG VERSION=dev
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
    -ldflags="-w -s -X main.Version=${VERSION}" \
    -trimpath \
    -o /tmp/app ./cmd/main.go
# Runtime stage: distroless base
FROM gcr.io/distroless/base-debian12:nonroot
COPY --from=builder /tmp/app /usr/local/bin/app
# Health check
HEALTHCHECK --interval=10s --timeout=3s --start-period=5s \
CMD ["/usr/local/bin/app", "-health-check"]
ENTRYPOINT ["/usr/local/bin/app"]

Kubernetes Deployment (Resource-Aware)
apiVersion: apps/v1
kind: Deployment
metadata:
name: go-api
spec:
replicas: 3
selector:
matchLabels:
app: go-api
template:
metadata:
labels:
app: go-api
spec:
containers:
- name: app
image: myregistry.azurecr.io/go-api:v1.2.3
imagePullPolicy: IfNotPresent
# Resource allocation
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
# Environment: Auto-tuned for container limits
env:
        - name: GOMAXPROCS
          valueFrom:
            resourceFieldRef:
              containerName: app
              resource: limits.cpu
              divisor: "1" # whole cores, rounded up ("1m" would set GOMAXPROCS to millicores)
- name: GOMEMLIMIT
value: "400MiB"
- name: GOGC
value: "100"
# Probes
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 15
periodSeconds: 10
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
# Graceful shutdown
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 5"]
      terminationGracePeriodSeconds: 30

Lambda Handler (Go)
package main
import (
"context"
"fmt"
"github.com/aws/aws-lambda-go/events"
"github.com/aws/aws-lambda-go/lambda"
)
var (
// Initialized once at cold start
initialized = false
)
func init() {
// Long-running initialization happens here
// Database connections, cache initialization, etc.
fmt.Println("Initializing...")
time.Sleep(100 * time.Millisecond) // Simulate work
initialized = true
}
func handler(ctx context.Context, event events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
return events.APIGatewayProxyResponse{
StatusCode: 200,
Body: "Hello from Lambda",
}, nil
}
func main() {
lambda.Start(handler)
}

Build and deploy:
GOOS=linux GOARCH=amd64 go build -o bootstrap main.go
zip function.zip bootstrap
aws lambda update-function-code --function-name myFunction --zip-file fileb://function.zip

Production Checklist
- Use Go 1.25+ for automatic cgroup CPU detection, else use automaxprocs
- Set GOMEMLIMIT to 80% of container memory limit
- Use multi-stage Docker builds with distroless or scratch base
- Build with CGO_ENABLED=0 for maximum portability
- Configure Kubernetes requests/limits with Guaranteed QoS
- Implement health check endpoints (zero-allocation)
- Add graceful shutdown with terminationGracePeriodSeconds
- Enable runtime metrics for observability
- Test locally with container resource limits (docker run --cpus, --memory)
- Monitor GC behavior; adjust GOGC if needed
Conclusion
Optimizing Go for containers requires understanding resource constraints and configuring the runtime to match them. From automatic GOMAXPROCS detection to GOMEMLIMIT alignment, from multi-stage Docker builds to Kubernetes resource tuning, each optimization compounds. A properly tuned containerized Go application performs comparably to bare metal while gaining the operational benefits of containers and cloud platforms.
The key insight: Go doesn't automatically adapt to container constraints; you must configure it explicitly. With the patterns in this article, you'll build robust, efficient, cloud-native Go services.