Profiling with pprof
Master CPU, memory, and goroutine profiling using Go's pprof tool to identify performance bottlenecks
Go's pprof is one of the most powerful tools for identifying performance bottlenecks. Whether you're investigating high CPU usage, memory leaks, or goroutine explosions, pprof provides detailed insights into your application's runtime behavior.
Understanding pprof Basics
The pprof tool is built into Go and available through two primary interfaces: the runtime/pprof package for programmatic profiling and net/http/pprof for HTTP-based profiling in running services.
runtime/pprof Package
The runtime/pprof package provides low-level profiling capabilities for standalone applications:
```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

func main() {
	// Start CPU profiling
	cpuFile, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	defer cpuFile.Close()

	if err := pprof.StartCPUProfile(cpuFile); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	// Your application code here
	expensiveOperation()
	fmt.Println("CPU profile written to cpu.prof")
}

func expensiveOperation() {
	sum := 0
	for i := 0; i < 100000000; i++ {
		sum += i
	}
}
```

Run with `go run main.go` and analyze the profile with `go tool pprof cpu.prof`.
net/http/pprof Integration
For HTTP services, net/http/pprof registers low-overhead profiling endpoints as a side effect of being imported:
```go
package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof handlers on DefaultServeMux
	"time"
)

func handler(w http.ResponseWriter, r *http.Request) {
	time.Sleep(100 * time.Millisecond)
	fmt.Fprintf(w, "Hello World\n")
}

func main() {
	http.HandleFunc("/api/hello", handler)
	// pprof endpoints automatically registered at:
	//   /debug/pprof/
	//   /debug/pprof/profile   (CPU)
	//   /debug/pprof/heap      (memory)
	//   /debug/pprof/goroutine
	//   /debug/pprof/mutex
	//   /debug/pprof/block
	if err := http.ListenAndServe(":6060", nil); err != nil {
		panic(err)
	}
}
```

Visit http://localhost:6060/debug/pprof/ to explore available profiles.
CPU Profiling
CPU profiling identifies which functions consume the most processing time. It uses statistical sampling to determine where your program spends CPU cycles.
Collecting CPU Profiles
```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

func fibonacci(n int) int {
	if n <= 1 {
		return n
	}
	return fibonacci(n-1) + fibonacci(n-2)
}

func main() {
	cpuFile, _ := os.Create("cpu.prof") // error handling omitted for brevity
	defer cpuFile.Close()

	pprof.StartCPUProfile(cpuFile)
	defer pprof.StopCPUProfile()

	// Run intensive computation
	for i := 0; i < 100; i++ {
		fibonacci(30)
	}
	fmt.Println("Done")
}
```

Analyzing CPU Profiles
```
go tool pprof cpu.prof
```

Interactive pprof commands:
- top: Show top 10 functions by CPU time
- list <function>: View source code with CPU time per line
- web: Generate graph visualization (requires graphviz)
- peek <function>: Brief info about a function
- pdf: Export as PDF
Example session:
```
$ go tool pprof cpu.prof
File: main
Type: cpu
Time: Jan 10 2025 at 10:00am (1s total)
Entering interactive mode (type "help" for commands)
(pprof) top
Showing nodes accounting for 900ms, 90% of 1000ms total
Showing top 10 nodes out of 15
      flat  flat%   sum%        cum   cum%
     500ms 50.0%  50.0%      500ms 50.0%  main.fibonacci
     300ms 30.0%  80.0%      800ms 80.0%  main.expensiveCompute
     100ms 10.0%  90.0%      100ms 10.0%  runtime.gcAssistAlloc
(pprof) list fibonacci
Total: 1000ms
ROUTINE ======================== main.fibonacci in /app/main.go
     500ms      800ms (flat, cum) 80.0% of Total
         .          .      3:func fibonacci(n int) int {
     300ms      300ms      4:    if n <= 1 {
     200ms      200ms      5:        return n
         .          .      6:    }
         .      300ms      7:    return fibonacci(n-1) + fibonacci(n-2)
```

Tip: For live HTTP services, capture CPU profiles without stopping the service:

```
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
```
Memory Profiling
Memory profiling tracks heap allocations, helping identify memory leaks and excessive allocation patterns.
Heap Allocation Profiling
```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

func leakyFunction(iterations int) {
	for i := 0; i < iterations; i++ {
		// Each slice becomes garbage immediately; the GC reclaims it,
		// but the churn shows up in the profile as allocation pressure
		_ = make([]byte, 1024*1024) // 1MB allocation
	}
}

func efficientFunction(iterations int) {
	buf := make([]byte, 0, 10)
	for i := 0; i < iterations; i++ {
		buf = buf[:0] // Reuse buffer
		buf = append(buf, byte(i))
	}
}

func main() {
	memFile, _ := os.Create("mem.prof")
	defer memFile.Close()

	// Run operations
	leakyFunction(1000)
	efficientFunction(1000)

	// Force a GC so the heap profile reflects live objects accurately
	runtime.GC()
	if err := pprof.WriteHeapProfile(memFile); err != nil {
		panic(err)
	}
	fmt.Println("Heap profile written")
}
```

Profile Types: alloc_space vs inuse_space
```
$ go tool pprof -alloc_space mem.prof  # Total allocated memory
$ go tool pprof -inuse_space mem.prof  # Currently allocated memory
```

The critical difference:
- alloc_space: Total memory ever allocated (includes freed memory)
- inuse_space: Active allocations right now (true memory usage)
- alloc_objects: Number of allocations made
- inuse_objects: Current number of live objects
```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

var sink byte // package-level sink prevents the allocations being optimized away

func demonstrateAllocation() {
	// These allocations become garbage on every iteration
	for i := 0; i < 10000; i++ {
		data := make([]byte, 1024)
		sink = data[0]
	}
	// Total alloc_space is high, but inuse_space stays low
}

func main() {
	f, _ := os.Create("heap.prof")
	defer f.Close()

	runtime.GC()
	demonstrateAllocation()
	runtime.GC() // collect the garbage before snapshotting the heap
	pprof.WriteHeapProfile(f)
	fmt.Println("Heap profile created")
}
```

Goroutine Profiling
Identify goroutine leaks and contention:
```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

func leakyWorker(done <-chan struct{}) {
	for {
		select {
		case <-done:
			return
		default:
			time.Sleep(1 * time.Millisecond)
		}
	}
}

func main() {
	// Start goroutines but never signal them to stop
	for i := 0; i < 100; i++ {
		go leakyWorker(make(chan struct{})) // channel is never closed!
	}
	time.Sleep(1 * time.Second)

	// Write goroutine profile
	f, _ := os.Create("goroutine.prof")
	defer f.Close()
	pprof.Lookup("goroutine").WriteTo(f, 0)
	fmt.Printf("Active goroutines: %d\n", runtime.NumGoroutine())
}
```

Analyze with:

```
go tool pprof goroutine.prof
(pprof) top
(pprof) list leakyWorker
```

Block and Mutex Profiling
Block Profiling
Block profiling identifies where goroutines block waiting on channel operations and lock acquisition:
```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

func main() {
	// Sample every blocking event (rate 1); use a higher rate in production
	runtime.SetBlockProfileRate(1)

	var wg sync.WaitGroup
	ch := make(chan int, 1)
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ch <- 42 // most sends block until the receiver drains the channel
		}()
	}
	// Drain so the senders eventually complete; their blocked time is recorded
	for i := 0; i < 100; i++ {
		<-ch
	}
	wg.Wait()

	f, _ := os.Create("block.prof")
	defer f.Close()
	pprof.Lookup("block").WriteTo(f, 0)
}
```

Mutex Profiling
Track lock contention:
```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

func main() {
	// Record every mutex contention event
	runtime.SetMutexProfileFraction(1)

	var mu sync.Mutex
	var wg sync.WaitGroup
	var counter int
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++ // shared state the goroutines contend over
			mu.Unlock()
		}()
	}
	wg.Wait()

	f, _ := os.Create("mutex.prof")
	defer f.Close()
	pprof.Lookup("mutex").WriteTo(f, 0)
}
```

Reading Flame Graphs
Flame graphs visualize call stacks over time:
```
go tool pprof -http=:8080 cpu.prof
```

This opens a web UI where you can:
- View "Flame Graph" tab for interactive visualization
- Click sections to zoom into specific functions
- Hover to see function names and percentages
- Use "View Options" to change visualization style
The x-axis represents the share of total CPU time (width is proportional to samples, not chronology); height represents call depth.
pprof Web UI
When analyzing HTTP profiles, access the built-in web UI:
```
go tool pprof http://localhost:6060/debug/pprof/heap
```

Available views:
- Graph: Call graph with percentages
- Flame Graph: Interactive stack visualization
- Top: Function list by metric
- Source: Annotated source code
- Disasm: Assembly code with profiling data
Production Profiling
Continuously profile live systems safely:
```go
package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof"
	"os"
	"runtime/pprof"
	"time"
)

func init() {
	// Write profiles periodically
	go func() {
		ticker := time.NewTicker(1 * time.Hour)
		defer ticker.Stop()
		for range ticker.C {
			writeProfiles()
		}
	}()
}

func writeProfiles() {
	timestamp := time.Now().Format("2006-01-02-15-04-05")
	os.MkdirAll("profiles", 0o755)

	// Memory profile
	memFile, _ := os.Create(fmt.Sprintf("profiles/heap-%s.prof", timestamp))
	pprof.WriteHeapProfile(memFile)
	memFile.Close()

	// Goroutine profile
	gorFile, _ := os.Create(fmt.Sprintf("profiles/goroutine-%s.prof", timestamp))
	pprof.Lookup("goroutine").WriteTo(gorFile, 0)
	gorFile.Close()
}

func main() {
	go http.ListenAndServe(":6060", nil)
	// Application logic
	select {}
}
```

Practical Workflow: Investigating a Slow Endpoint
Step 1: Identify the Problem
```go
package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof"
)

func slowEndpoint(w http.ResponseWriter, r *http.Request) {
	// Simulate slow operation
	sum := 0
	for i := 0; i < 500000000; i++ {
		sum += i
	}
	fmt.Fprintf(w, "Result: %d\n", sum)
}

func main() {
	http.HandleFunc("/slow", slowEndpoint)
	http.ListenAndServe(":8080", nil)
}
```

Step 2: Capture CPU Profile
```
# Collect a 30-second profile
go tool pprof http://localhost:8080/debug/pprof/profile?seconds=30

# Wait for requests to complete, then analyze
(pprof) top10
(pprof) list slowEndpoint
```

Step 3: Identify Bottleneck
The list command shows CPU time per line, revealing exact hotspots in your code.
Step 4: Optimize and Verify
```go
func slowEndpoint(w http.ResponseWriter, r *http.Request) {
	// Use the closed-form sum of 0..n-1 instead of a loop: n*(n-1)/2
	n := 500000000
	sum := n * (n - 1) / 2
	fmt.Fprintf(w, "Result: %d\n", sum)
}
```

Capture a new profile to confirm the improvement.
Common pprof Patterns
Find memory leaks by diffing against a baseline:

```
go tool pprof -base=baseline.prof current.prof
```

Compare profiles in the web UI:

```
go tool pprof -http=:8080 -diff_base=old.prof new.prof
```

Export data:

```
go tool pprof -svg cpu.prof > cpu.svg
go tool pprof -pdf cpu.prof > cpu.pdf
```

Filter results:

```
(pprof) focus handler
(pprof) ignore runtime
```

Key Takeaways
- Use net/http/pprof in production for low-overhead profiling
- CPU profiling identifies compute bottlenecks
- Heap profiling distinguishes between alloc_space (total) and inuse_space (current)
- Goroutine profiling detects leaks
- Block/mutex profiling reveals synchronization issues
- Flame graphs provide intuitive visualization
- Always profile with real workloads before optimizing