Binary Size and Build Optimization

Reducing Go binary size with ldflags, trimpath, and UPX, optimizing build times with caching and parallelism, cross-compilation strategies, and build tag patterns.

Introduction

Binary size and build optimization are often overlooked in Go development, yet they significantly impact deployment cost, cold start times, and operational efficiency. Whether you're deploying to serverless platforms, containerized environments, or edge computing nodes, every megabyte and millisecond counts. This guide covers practical techniques from basic ldflags to advanced profiling strategies, with benchmarks and real-world examples.

Why Binary Size Matters

Container Images and Distribution

Go's static compilation model creates self-contained binaries, but without optimization, even simple programs weigh 1.8MB or more. In containerized deployments:

  • A 100MB binary on a ~10MB minimal base image makes the final image roughly 10x larger than the base alone
  • Multi-region deployments: 100MB × 10 regions × 50 deploys/month = 50GB transferred
  • Container registry storage costs accumulate
  • Pull latency directly impacts pod scheduling time
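
The transfer estimate in the list above is plain multiplication, so it is easy to plug in your own numbers. The sketch below is illustrative only: the function name is made up for this example, it assumes decimal units (1 GB = 1000 MB), and it assumes every deploy pulls the full binary once per region with no layer caching.

```go
package main

import "fmt"

// monthlyTransferGB estimates how much data is pulled per month.
// Assumptions: decimal units (1 GB = 1000 MB) and one full pull of the
// binary per region per deploy (no layer caching).
func monthlyTransferGB(binaryMB, regions, deploysPerMonth int) float64 {
	return float64(binaryMB*regions*deploysPerMonth) / 1000.0
}

func main() {
	// The figures from the list above: 100MB x 10 regions x 50 deploys/month.
	fmt.Printf("%.0f GB/month\n", monthlyTransferGB(100, 10, 50))
}
```

Halving the binary size halves the result, which is why stripping pays off most in pipelines that deploy often.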

Cold Start and Serverless

On serverless platforms, initialization time adds directly to cold-start latency and may also be billed. The platform has to fetch and load your code artifact before execution starts. A smaller binary means:

  • Faster code loading
  • Reduced memory footprint
  • Lower cold start penalties (critical for bursty traffic)

Edge Computing

Edge locations have limited bandwidth and storage. Content delivery networks and edge workers benefit enormously from optimized binaries.

Memory Footprint

Large binaries loaded into memory increase the working set, reducing CPU cache efficiency and increasing paging pressure. This particularly hurts:

  • Embedded systems
  • Containerized environments with memory limits
  • High-density cloud deployments

Baseline: What Makes Go Binaries Large

A minimal Go program:

package main

func main() {
    println("Hello, World!")
}

Compiles to approximately 1.8MB on Linux x86_64. Why?

Static Linking

Go binaries are statically linked by default when cgo is not involved. The runtime, the parts of the standard library your program uses, and your own code are all bundled into one file; Go does not link libc unless cgo is in use. This means:

  • No runtime dependency on system libraries
  • Complete portability within the same OS and architecture (copy and run)
  • But also: every binary carries its own copy of the runtime and all reachable standard-library code

Debug Information (DWARF)

The default binary includes DWARF debugging information, which lets debuggers map machine code back to source. A "Hello World" binary contains approximately:

  • Symbol table (~200KB)
  • DWARF debug info (~600KB)
  • Type metadata (~300KB; needed at runtime, so stripping does not remove it)

Runtime and GC Code

The Go runtime (~400KB) includes:

  • Memory allocator
  • Garbage collector implementation
  • Goroutine scheduler
  • defer/panic handling

Type Metadata for Reflection

Even if you don't use reflection, type information is embedded to support dynamic type assertions and interface values.

Stripping Symbols and Debug Info with ldflags

The most effective immediate optimization: remove symbols and debug information.

Understanding ldflags

The -ldflags flag passes options to the linker. Key flags:

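  • -s: Omit the symbol table and debug information (in recent Go releases, -s also implies -w); the symbol table is the smaller share of the savings
  • -w: Omit DWARF debug information (600-800KB saved)
  • -X main.Version=v1.2.3: Set a string variable at link time, embedding version information without editing source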

Before and After Comparison

# Default build
$ go build -o myapp
$ ls -lh myapp
-rwxr-xr-x  1 user  staff  1.8M Jan 10 12:00 myapp

# Stripped build
$ go build -ldflags="-s -w" -o myapp
$ ls -lh myapp
-rwxr-xr-x  1 user  staff  1.2M Jan 10 12:00 myapp

# Size reduction: 33%

Embedding Version Information

go build \
  -ldflags="-s -w -X main.Version=v1.2.3 -X main.BuildDate=$(date -u +'%Y-%m-%dT%H:%M:%SZ')" \
  -o myapp

In your code:

package main

var (
    Version = "dev"
    BuildDate = "unknown"
)

func main() {
    println("Version:", Version)
    println("Built:", BuildDate)
}

Trimpath: Removing Filesystem Paths

The -trimpath flag removes absolute filesystem paths from your binary.

Why This Matters

By default, Go embeds the full path to source files:

$ go build -o myapp
$ strings myapp | grep /home/
/home/developer/project/main.go
/home/developer/project/pkg/service.go
/home/developer/.../vendor/...

This leaks your directory structure and can reveal sensitive information about your development environment.

Using -trimpath

go build -trimpath -o myapp

This replaces absolute paths with module names:

$ strings myapp | grep main.go
github.com/myorg/myproject/main.go
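
One way to see the effect from inside a program is to print the source path the runtime has recorded for a function. This is a small sketch with a hypothetical helper name; build it once with and once without -trimpath and compare the output.

```go
package main

import (
	"fmt"
	"runtime"
)

// callerFile returns the source path recorded in the binary for the
// function that called it.
func callerFile() string {
	_, file, _, ok := runtime.Caller(1)
	if !ok {
		return ""
	}
	return file
}

func main() {
	// Without -trimpath this is an absolute path on the build machine.
	// With -trimpath it is rooted at the module path instead.
	fmt.Println(callerFile())
}
```

The same paths appear in panic stack traces, so -trimpath also keeps build-machine directories out of production logs.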

Reproducible Builds

-trimpath is a prerequisite for reproducible builds: without it, the same source built in two different directories produces different binaries. Essential for:

  • Security audits
  • Supply chain verification
  • Binary integrity checks

Pure Go with CGO_ENABLED=0

Go supports C libraries via Cgo, but linking C code increases binary size.

Impact

# With CGO (default on most systems)
$ CGO_ENABLED=1 go build -o myapp
$ ls -lh myapp
-rwxr-xr-x  1 user  staff  2.1M

# Pure Go
$ CGO_ENABLED=0 go build -o myapp
$ ls -lh myapp
-rwxr-xr-x  1 user  staff  1.8M

When to Disable CGO

Disable when:

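  • Building fully static binaries for minimal containers (scratch, distroless, Alpine/musl), avoiding any glibc dependency
  • The pure-Go implementations of net (DNS resolution) and os/user, which Go selects automatically when cgo is disabled, are sufficient
  • Deploying across multiple OS versions (cgo-built binaries link dynamically against libc and can depend on the build host's libc version)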

When You Need CGO

Keep enabled for:

  • Native performance-critical code (e.g., BLAKE3, specialized crypto)
  • System-level functionality exposed only through C libraries (plain system calls such as ioctl work without cgo via golang.org/x/sys/unix)
  • Hardware interaction

UPX Compression: Trading Size for Startup

UPX (Ultimate Packer for eXecutables) compresses executables with a self-extracting decompression stub.

How UPX Works

Original Binary
      ↓
[Compressed Data] + [Decompression Stub]
      ↓
Smaller File on Disk
      ↓
On Execution: Stub Decompresses → Runs Original

Installation and Usage

# Install UPX
brew install upx  # macOS
apt-get install upx  # Debian/Ubuntu

# Compress with default settings
upx myapp
ls -lh myapp  # Usually 40-50% smaller

# Maximum compression
upx --best myapp
ls -lh myapp  # 60-70% smaller, slower compression

# Aggressive compression
upx --brute myapp  # Most compression, very slow

Benchmark: Size vs Startup Time

Binary: 5MB Go service
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Method          Size    Startup (ms)   Tradeoff
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Original        5.0MB   15ms          Baseline
ldflags -s -w   3.3MB   15ms          Good (no penalty)
UPX --best      1.8MB   68ms          4.5x slower startup
UPX --brute     1.5MB   92ms          6x slower startup
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

When NOT to Use UPX

Avoid UPX for:

  • Latency-sensitive services: 50-200ms overhead per startup is significant
  • Container orchestration with frequent restarts: Kubernetes rolling updates, auto-scaling
  • Stateless microservices: Cold starts cost more than the bandwidth saved
  • Real-time systems: Unpredictable decompression time breaks latency guarantees

When UPX is Valuable

Use UPX for:

  • Distribution-heavy scenarios: Embedded systems, IoT devices, poor connectivity
  • Batch processing: Lambda functions that start rarely
  • Serverless with generous timeout windows: Startup penalty acceptable if called infrequently
  • Storage-constrained environments: Very limited disk space

Dead Code Elimination

The Go linker automatically removes unused code through reachability analysis, but several patterns can prevent optimization.

How the Linker Works

package main

func main() {
    usedFunc()
}

func usedFunc() {
    println("called")
}

func unusedFunc() {
    println("never called")  // Linker removes this
}

When building, the linker traces from main, including only reachable functions and their dependencies.

Patterns That Keep Dead Code Alive

1. The //go:linkname Directive

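import _ "unsafe" // Required for //go:linkname

// somepackage is a hypothetical import path.
// Body-less declaration; the package also needs an empty .s file to compile.
//go:linkname externalFunc somepackage.exportedFunc
func externalFunc()

func main() {
    externalFunc()  // Keeps exportedFunc and everything it reaches alive
}

If you must use //go:linkname, confine it to one small file so the code it pulls in is easy to audit.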

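2. Blank (Side-Effect) Imports

import (
    _ "net/http/pprof"  // Registers HTTP profiling handlers globally; links net/http and pprof
    _ "unused/package"  // Hypothetical: its init functions run, so everything they reach is linked
)

The compiler rejects ordinary imports that are never referenced, so a truly unused package cannot slip in. Blank imports are different: the package's init functions run, and everything they reach is linked. Keep blank imports minimal and document each one with a comment.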

3. Reflection-Based Code

import "encoding/json"

type Config struct {
    Name string `json:"name"`
}

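func main() {
    // json.Marshal inspects Config via reflection, which keeps reflect and the type's metadata in the binary
    json.Marshal(Config{})
}

Reflection requires type information, so the linker cannot strip it. Serialization based on generated code (some protobuf and msgpack libraries offer this) can reduce reflection-driven metadata; measure the result, since these libraries add code of their own.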
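
For very simple types, encoding by hand avoids the reflection path entirely. The sketch below illustrates the idea and is not a general JSON encoder: the AppendJSON method is a made-up name, and it assumes Name holds plain printable ASCII, because strconv.AppendQuote follows Go string-literal escaping, which matches JSON for such input but not in general.

```go
package main

import (
	"os"
	"strconv"
)

// Config mirrors the struct from the example above.
type Config struct {
	Name string
}

// AppendJSON appends c to dst as a JSON object without using reflection.
// Assumption: Name is plain printable ASCII (see the note above).
func (c Config) AppendJSON(dst []byte) []byte {
	dst = append(dst, `{"name":`...)
	dst = strconv.AppendQuote(dst, c.Name)
	return append(dst, '}')
}

func main() {
	out := Config{Name: "myapp"}.AppendJSON(nil)
	// os.Stdout.Write is used instead of fmt, since fmt itself relies on reflect.
	os.Stdout.Write(append(out, '\n'))
}
```

Any saving only shows up if nothing else in the program imports encoding/json or fmt, which is rare outside very small tools.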

Build Tags and Conditional Compilation

Build tags allow you to exclude code at compile time, enabling minimal production builds.

Syntax and Patterns

//go:build production && !debug
// +build production,!debug

package main

const DebugMode = false
const ProfileEndpointEnabled = false

func debugLog(msg string) {
    // Optimized away if DebugMode is false
    if DebugMode {
        println(msg)
    }
}

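Because DebugMode is a compile-time constant, the compiler removes the dead branch. That leaves debugLog as an empty function whose calls can be inlined away, and string literals used only at those call sites may then be dropped as well.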

Minimal Production Build

Create a "prod" build configuration:

// debug.go
//go:build !production
// +build !production

package main

import "net/http"
import _ "net/http/pprof"

func init() {
    // Register debug handlers
    http.HandleFunc("/debug/stats", handleDebugStats)
}

func handleDebugStats(w http.ResponseWriter, r *http.Request) {
    // Debug implementation
}

// debug_stub.go
//go:build production
// +build production

package main

func init() {
    // Debug handlers not registered
}

Build with:

go build -tags=production -ldflags="-s -w" -o myapp-prod

Binary size comparison:

  • Default: 2.1MB
  • With -tags=production: 1.9MB (10% savings)
  • With -tags=production -ldflags="-s -w": 1.3MB (38% savings)

Profile-Guided Optimization (PGO)

Go 1.20 introduced PGO as a preview and Go 1.21 made it generally available: the compiler makes inlining and devirtualization decisions based on real execution profiles.

How PGO Works

  1. Collect a production profile: Run your service, capturing a CPU profile
  2. Place the profile in the main package directory as default.pgo
  3. Rebuild: since Go 1.21, go build detects default.pgo automatically (-pgo=auto is the default)

Collecting a Profile

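# Fetch a 30-second CPU profile from a running service and explore it in the browser
go tool pprof -http=:8080 'http://localhost:6060/debug/pprof/profile?seconds=30'

# Save the raw profile for PGO (quote the URL so the shell does not interpret "?")
curl -o cpu.prof 'http://localhost:6060/debug/pprof/profile?seconds=30'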

Using the Profile

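# Place the profile next to the main package
cp cpu.prof default.pgo

# Build with PGO (automatic detection in Go 1.21+)
go build -o myapp

# Or name the profile explicitly (required on Go 1.20)
go build -pgo=cpu.prof -o myapp

The Go compiler now optimizes based on actual usage patterns.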

Performance Impact

Benchmark: JSON parsing with PGO
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Without PGO: 1000000 ops/sec
With PGO:    1043000 ops/sec (+4.3%)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Benchmark: HTTP routing
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Without PGO: 500000 ops/sec
With PGO:    535000 ops/sec (+7%)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Typical improvements: 2-7% depending on workload.

Build Caching and Speed

Understanding Go's Build Cache

Go caches build artifacts in $GOCACHE (default: ~/.cache/go-build on Linux, ~/Library/Caches/go-build on macOS):

# View cache location
go env GOCACHE
# Output: /home/user/.cache/go-build

# Inspect cache usage
du -sh "$(go env GOCACHE)"
# 2.5G typical for active projects

Seeing What's Rebuilt

go build -x -o myapp 2>&1 | head -20
# Shows all compile/link commands, highlighting what's rebuilt

Parallel Builds with -p

# -p sets how many build commands run in parallel
# Default: GOMAXPROCS, normally the number of available CPUs
go build -p 8 -o myapp

# Lower it on slower systems or CI runners with restricted resources
go build -p 2 -o myapp

go build vs go install

# go build: writes the binary to the current directory (or the -o path)
go build -o myapp

# go install: writes the binary to $GOBIN (default: $GOPATH/bin); both commands share the same build cache
go install ./cmd/myapp

What Triggers Recompilation

Go rebuilds when:

  • Source code changes (content hash)
  • Dependencies update (version changes)
  • Build flags change (-tags, -ldflags)
  • Go version changes
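
The triggers listed above all follow from content-addressed caching: the cache key is a hash of every input, so changing any input yields a new key. The sketch below is a deliberately simplified model of that idea, not the go command's actual implementation; cacheKey and its parameters are invented for illustration.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// cacheKey hashes the toolchain version, build flags, and source contents
// into one key. Simplified model only; the real cache tracks more inputs.
func cacheKey(goVersion string, flags []string, sources map[string][]byte) string {
	h := sha256.New()
	fmt.Fprintf(h, "go %s\n", goVersion)
	for _, f := range flags {
		fmt.Fprintf(h, "flag %s\n", f)
	}
	// Sort file names so map iteration order cannot change the key.
	names := make([]string, 0, len(sources))
	for name := range sources {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Fprintf(h, "file %s %x\n", name, sha256.Sum256(sources[name]))
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	src := map[string][]byte{"main.go": []byte("package main\n")}
	base := cacheKey("go1.21", []string{"-trimpath"}, src)
	tagged := cacheKey("go1.21", []string{"-trimpath", "-tags=production"}, src)
	fmt.Println("same inputs, same key:  ", base == cacheKey("go1.21", []string{"-trimpath"}, src))
	fmt.Println("new flag, different key:", base != tagged)
}
```

Note that in this model even a comment-only edit changes the file hash and therefore the key.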

Example caching across builds:

# First build: 2.1s (compiles everything)
$ time go build -o myapp

# Change a comment, rebuild: 1.8s (reuses stdlib cache)
$ time go build -o myapp

# Change code, rebuild: 0.5s (incremental compilation of one package)
$ time go build -o myapp

CI/CD Cache Strategies

Docker layer caching:

FROM golang:1.21-alpine AS builder

WORKDIR /build

# Cache go.mod and go.sum separately
COPY go.mod go.sum ./
RUN go mod download

# Copy the remaining source; the module download layer above stays cached until go.mod or go.sum changes
COPY . .

RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o app .

FROM alpine:latest
COPY --from=builder /build/app /app
CMD ["/app"]

GitHub Actions cache:

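- uses: actions/setup-go@v4
  with:
    go-version: '1.21'
    cache: true  # Caches the module cache and the build cache, keyed on go.sum

# Only needed if you manage the build cache yourself (for example, with cache: false above).
# It must run before the build so the cache is restored first.
- uses: actions/cache@v3
  with:
    path: ~/.cache/go-build
    key: go-build-${{ runner.os }}-${{ hashFiles('**/go.sum') }}

- run: go build -o myapp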

Cross-Compilation

GOOS and GOARCH Matrix

Go makes cross-compilation trivial:

# Build for multiple platforms
GOOS=linux GOARCH=amd64 go build -o myapp-linux-amd64
GOOS=darwin GOARCH=arm64 go build -o myapp-macos-arm64
GOOS=windows GOARCH=amd64 go build -o myapp-windows-amd64.exe

# All combinations:
for OS in linux darwin windows; do
  for ARCH in amd64 arm64; do
    GOOS=$OS GOARCH=$ARCH go build -o myapp-$OS-$ARCH
  done
done
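
The shell loop above names the Windows binaries without the .exe suffix that the earlier explicit command used. A small generator, sketched here with hypothetical names, keeps the naming rule in one place and prints the commands to run.

```go
package main

import "fmt"

// target is one GOOS/GOARCH pair from the build matrix.
type target struct {
	GOOS, GOARCH string
}

// outputName builds the artifact name, adding .exe for Windows targets.
func outputName(app string, t target) string {
	name := fmt.Sprintf("%s-%s-%s", app, t.GOOS, t.GOARCH)
	if t.GOOS == "windows" {
		name += ".exe"
	}
	return name
}

func main() {
	for _, goos := range []string{"linux", "darwin", "windows"} {
		for _, goarch := range []string{"amd64", "arm64"} {
			t := target{GOOS: goos, GOARCH: goarch}
			fmt.Printf("GOOS=%s GOARCH=%s go build -o %s\n",
				t.GOOS, t.GOARCH, outputName("myapp", t))
		}
	}
}
```

Running go tool dist list shows every GOOS/GOARCH pair the installed toolchain supports.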

ARM Variants

ARM has multiple sub-versions:

# ARMv6 (Raspberry Pi Zero, original Pi)
GOOS=linux GOARCH=arm GOARM=6 go build -o myapp-armv6

# ARMv7 (Raspberry Pi 2, or a 32-bit OS on Raspberry Pi 3/4)
GOOS=linux GOARCH=arm GOARM=7 go build -o myapp-armv7

# ARM64 (a 64-bit OS on Raspberry Pi 3/4/5 and newer boards)
GOOS=linux GOARCH=arm64 go build -o myapp-arm64

Cross-Compiling with CGO

Pure Go code cross-compiles seamlessly, but CGO requires a C compiler for the target platform.

Using zig cc as a cross-compiler (recommended):

# Install zig
brew install zig

# Cross-compile CGO code to Linux from macOS
CGO_ENABLED=1 CC="zig cc -target x86_64-linux-gnu" GOOS=linux GOARCH=amd64 go build -o myapp

Alternative: Docker-based cross-compilation:

# Build inside a container for the target architecture, so its native C toolchain is used
# (runs under emulation when the target differs from the host)
docker run --rm --platform linux/arm64 -v "$PWD":/build -w /build golang:1.21 \
  sh -c "CGO_ENABLED=1 go build -o myapp"

Multi-Architecture Docker Images

Modern Docker supports building for multiple architectures:

# One-time buildx setup (run in a shell, not in the Dockerfile):
#   docker buildx create --name multiarch
#   docker buildx use multiarch

FROM --platform=$BUILDPLATFORM golang:1.21 AS builder
ARG TARGETOS TARGETARCH

WORKDIR /build
COPY go.mod go.sum ./
RUN go mod download
COPY . .

RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH \
    go build -ldflags="-s -w" -o app .

FROM alpine:latest
COPY --from=builder /build/app /app
CMD ["/app"]

Build and push:

docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t myregistry/myapp:latest \
  --push .

Module Optimization

Analyzing Dependency Size

# Check dependency graph
go mod graph | head -20

# Analyze module sizes (with go-mod-stats tool)
go install github.com/muesli/go-mod-stats@latest
go-mod-stats

# Output:
# Module                          Size    Deps
# github.com/your/app             245KB   45
# vendor/github.com/lib/pq        892KB   3
# vendor/github.com/aws/sdk       5.2MB   120

Removing Unused Dependencies

# Tidy removes unused modules
go mod tidy

# Report functions that are unreachable from main
go run golang.org/x/tools/cmd/deadcode@latest ./...

Lighter Alternatives

Replace heavy dependencies:

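Dependency      Size     Alternative         Alt. Size     Savings
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
encoding/json   stdlib   sonic or easyjson   +200KB code   -300KB at runtime
testing         stdlib   testify             +100KB        -50KB net
logrus          2.1MB    slog (stdlib)       0KB           -2.1MB
gorm            3.5MB    sqlc (codegen)      +150KB        -3.3MB
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━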
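
As an illustration of the logrus row, structured logging with the standard library's log/slog (Go 1.21+) needs no third-party module. This is a minimal sketch; newLogger and renderJSON are hypothetical helpers, and the field names are examples only.

```go
package main

import (
	"bytes"
	"io"
	"log/slog"
	"os"
)

// newLogger returns a JSON logger that writes to w.
func newLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil))
}

// renderJSON logs one record into a buffer and returns the encoded line,
// which is handy for tests.
func renderJSON(msg string, args ...any) string {
	var buf bytes.Buffer
	newLogger(&buf).Info(msg, args...)
	return buf.String()
}

func main() {
	logger := newLogger(os.Stdout)
	logger.Info("server started", "port", 8080)

	// The same record rendered into a string.
	os.Stdout.WriteString(renderJSON("server started", "port", 8080))
}
```

Each line is a JSON object with time, level, and msg keys followed by the supplied attributes.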

Vendoring for Reproducibility

go mod vendor  # Creates vendor/ directory

# Builds use vendored dependencies (offline, reproducible)
go build -mod=vendor

Trade-offs:

  • Vendoring: +15MB repo size, guaranteed reproducibility
  • go.mod only: Smaller repo, depends on Go module proxy uptime

Reproducible Builds

Reproducible builds produce identical binaries from identical source, enabling:

  • Security audits (third-party verification)
  • Supply chain verification (compare against published checksums)
  • Regression testing (reproduce old versions exactly)

Building Reproducibly

go build \
  -trimpath \
  -ldflags="-s -w" \
  -buildvcs=false \
  -o myapp

Flags explained:

  • -trimpath: Remove filesystem paths
  • -ldflags="-s -w": Strip symbols/debug info (consistent output)
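  • -buildvcs=false: Don't embed VCS metadata (commit hash, commit time, and dirty-tree flag)

Anything injected with -X that changes per build, such as a BuildDate timestamp, also makes the output differ, so leave it out of builds you intend to compare.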

Verification

# Build the same commit twice on different machines, using the same Go version
# Machine 1
go build -trimpath -ldflags="-s -w" -buildvcs=false -o myapp1

# Machine 2
go build -trimpath -ldflags="-s -w" -buildvcs=false -o myapp2

# Compare checksums
sha256sum myapp1 myapp2
# Must be identical

# If not identical, investigate:
go version -m myapp1  # Check Go version
strings myapp1 | grep -E "^/|\.go$"  # Check for embedded paths

Build Optimization Makefile

A comprehensive Makefile automating all optimization targets:

.PHONY: build build-prod build-stripped build-upx build-lean cross-compile benchmark clean test lint help

APP_NAME := myapp
VERSION := $(shell git describe --tags --always)
BUILD_TIME := $(shell date -u +'%Y-%m-%dT%H:%M:%SZ')
GIT_COMMIT := $(shell git rev-parse --short HEAD)

LDFLAGS := -ldflags="-X main.Version=$(VERSION) -X main.BuildDate=$(BUILD_TIME) -X main.GitCommit=$(GIT_COMMIT)"
LDFLAGS_PROD := -ldflags="-s -w -X main.Version=$(VERSION) -X main.BuildDate=$(BUILD_TIME)"

help:
	@echo "Available targets:"
	@echo "  make build          - Standard build with debug symbols"
	@echo "  make build-prod     - Production build (stripped, trimpath)"
	@echo "  make build-stripped - Strip symbols and debug info only"
	@echo "  make build-upx      - UPX-compressed binary"
	@echo "  make build-lean     - Minimal binary with build tags"
	@echo "  make cross-compile  - Build for multiple platforms"
	@echo "  make clean          - Remove build artifacts"
	@echo "  make benchmark      - Measure binary size reduction"

# Standard build with symbols for local development
build:
	go build $(LDFLAGS) -o bin/$(APP_NAME) ./cmd/$(APP_NAME)

# Production build: trimpath + ldflags -s -w
build-prod:
	CGO_ENABLED=0 go build -trimpath $(LDFLAGS_PROD) -o bin/$(APP_NAME)-prod ./cmd/$(APP_NAME)

# Strip symbols and debug info only (no trimpath or version flags)
build-stripped:
	go build -ldflags="-s -w" -o bin/$(APP_NAME)-stripped ./cmd/$(APP_NAME)

# UPX compression (optional dependency)
build-upx: build-prod
	rm -f bin/$(APP_NAME)-upx
	upx --best -o bin/$(APP_NAME)-upx bin/$(APP_NAME)-prod

# Minimal build with production tags
build-lean:
	CGO_ENABLED=0 go build -tags=production -trimpath $(LDFLAGS_PROD) -o bin/$(APP_NAME)-lean ./cmd/$(APP_NAME)

# Cross-compilation matrix
cross-compile: clean
	mkdir -p dist
	GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -trimpath $(LDFLAGS_PROD) -o dist/$(APP_NAME)-linux-amd64 ./cmd/$(APP_NAME)
	GOOS=linux GOARCH=arm64 CGO_ENABLED=0 go build -trimpath $(LDFLAGS_PROD) -o dist/$(APP_NAME)-linux-arm64 ./cmd/$(APP_NAME)
	GOOS=darwin GOARCH=amd64 CGO_ENABLED=0 go build -trimpath $(LDFLAGS_PROD) -o dist/$(APP_NAME)-darwin-amd64 ./cmd/$(APP_NAME)
	GOOS=darwin GOARCH=arm64 CGO_ENABLED=0 go build -trimpath $(LDFLAGS_PROD) -o dist/$(APP_NAME)-darwin-arm64 ./cmd/$(APP_NAME)
	GOOS=windows GOARCH=amd64 CGO_ENABLED=0 go build -trimpath $(LDFLAGS_PROD) -o dist/$(APP_NAME)-windows-amd64.exe ./cmd/$(APP_NAME)
	@echo "Cross-compiled binaries in dist/"

# Benchmark size reductions
benchmark: clean
	@echo "Build size comparison:"
	@echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
	go build -o bin/$(APP_NAME)-default ./cmd/$(APP_NAME)
	@ls -lh bin/$(APP_NAME)-default | awk '{print "Default:           " $$5}'
	go build -ldflags="-s -w" -o bin/$(APP_NAME)-stripped ./cmd/$(APP_NAME)
	@ls -lh bin/$(APP_NAME)-stripped | awk '{print "Stripped (-s -w):   " $$5}'
	CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o bin/$(APP_NAME)-prod ./cmd/$(APP_NAME)
	@ls -lh bin/$(APP_NAME)-prod | awk '{print "Production:        " $$5}'
	@echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

clean:
	rm -rf bin/ dist/
	go clean

test:
	go test -v ./...

lint:
	golangci-lint run ./...

Usage:

make build                  # Fast dev build
make build-prod             # Production-ready
make benchmark              # See size savings
make cross-compile          # Multi-platform
make help                   # Show all targets

Summary

Binary optimization in Go progresses through layers:

  1. Always apply: -ldflags="-s -w" (about 30% savings, no runtime overhead; debuggers lose symbol information)
  2. Usually apply: -trimpath (security, reproducibility)
  3. Consider: CGO_ENABLED=0 (if not using CGO)
  4. For specific scenarios: UPX (size-constrained distribution where extra startup latency is acceptable)
  5. Advanced: Build tags, PGO, dead code elimination

The Makefile automates these steps, and its benchmark target shows the exact savings for your binary. Most Go services benefit from a 20-40% reduction with zero runtime cost, and size-constrained deployments can achieve 60%+ with UPX at the cost of startup latency.
