Go Performance Guide
Profiling & Benchmarking

Execution Tracing

Use Go's execution tracer to visualize goroutine scheduling, GC events, and I/O operations

While pprof shows where CPU time goes, execution tracing reveals when and why things happen. It captures goroutine scheduling, garbage collection events, network I/O, and synchronization points—essential for understanding concurrent performance issues.

Runtime/trace Fundamentals

The runtime/trace package provides low-level tracing capabilities:

package main

import (
    "fmt"
    "os"
    "runtime/trace"
)

func main() {
    // Start tracing
    f, err := os.Create("trace.out")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    if err := trace.Start(f); err != nil {
        panic(err)
    }
    defer trace.Stop()

    // Application runs here with tracing active
    doWork()

    fmt.Println("Trace written to trace.out")
}

func doWork() {
    // Simulate some concurrent work
    done := make(chan struct{})
    for i := 0; i < 100; i++ {
        go func() {
            defer func() { done <- struct{}{} }()
            // Simulate work
            sum := 0
            for j := 0; j < 1000000; j++ {
                sum += j
            }
        }()
    }

    // Wait for completion
    for i := 0; i < 100; i++ {
        <-done
    }
}

Run and analyze:

go run main.go
go tool trace trace.out

This opens a web-based interactive viewer in your browser.

Integration with net/http/pprof

For HTTP services, add tracing endpoints:

package main

import (
    "net/http"
    _ "net/http/pprof"
)

func main() {
    // Tracing is available at /debug/pprof/trace
    // Capture 5-second trace:
    // curl http://localhost:6060/debug/pprof/trace?seconds=5 > trace.out
    // go tool trace trace.out

    http.HandleFunc("/api/data", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("OK"))
    })

    http.ListenAndServe(":6060", nil)
}

Capture traces from running services:

curl http://localhost:6060/debug/pprof/trace?seconds=10 > trace.out
go tool trace trace.out

Understanding the Trace Viewer

The trace viewer UI displays several critical views:

Goroutine Timeline

Shows each goroutine's execution over time:

  • Green: Running on CPU
  • Orange: Blocked (network, locks, channels)
  • Gray: Runnable but waiting for CPU

Network Blocking Profile

Displays I/O operations and their duration:

package main

import (
    "io"
    "net"
    "os"
    "runtime/trace"
)

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()

    // Network I/O appears in the trace
    conn, err := net.Dial("tcp", "example.com:80")
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    // Send a minimal HTTP/1.0 request so the server responds and
    // closes the connection; otherwise the read below blocks forever
    conn.Write([]byte("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n"))
    io.Copy(io.Discard, conn)
}

In the trace viewer, you'll see network blocking times annotated.

GC Events

The trace captures every garbage collection:

package main

import (
    "fmt"
    "os"
    "runtime"
    "runtime/trace"
)

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()

    // Trigger allocations to see GC in trace
    for i := 0; i < 10; i++ {
        data := make([]byte, 10000000)
        _ = data
    }

    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("GC runs: %d\n", m.NumGC)
}

Trace shows:

  • Mark phase duration
  • Sweep phase duration
  • GC pause times
  • Goroutines blocked by GC

Programmatic Task and Region Tracking

Use tasks and regions to annotate your trace:

package main

import (
    "context"
    "fmt"
    "os"
    "runtime/trace"
)

func processRequest(ctx context.Context, id int) {
    // Create a task for this logical unit of work. NewTask returns a
    // derived context that links regions created from it to the task.
    ctx, task := trace.NewTask(ctx, fmt.Sprintf("request-%d", id))
    defer task.End()

    // Create regions within the task
    trace.WithRegion(ctx, "parseInput", func() {
        // Simulate input parsing
        sum := 0
        for i := 0; i < 100000; i++ {
            sum += i
        }
    })

    trace.WithRegion(ctx, "processData", func() {
        // Simulate processing
        sum := 0
        for i := 0; i < 1000000; i++ {
            sum += i
        }
    })

    trace.WithRegion(ctx, "formatOutput", func() {
        // Simulate formatting
        _ = fmt.Sprintf("Result: %d", 42)
    })
}

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()

    ctx := context.Background()

    // Process multiple requests to see concurrent execution
    done := make(chan struct{})
    for i := 0; i < 50; i++ {
        go func(id int) {
            defer func() { done <- struct{}{} }()
            processRequest(ctx, id)
        }(i)
    }

    // Wait for all requests to finish. A bare select {} would deadlock
    // once the goroutines complete, and the deferred trace.Stop would
    // never run, leaving the trace file truncated.
    for i := 0; i < 50; i++ {
        <-done
    }
}

In the trace viewer's front page, open the "User-defined tasks" and "User-defined regions" views to see per-task latency and region timings; in the "View trace" timeline, your custom regions appear inline on each goroutine.

Analyzing Goroutine Scheduling

Identify scheduling issues and imbalance:

package main

import (
    "os"
    "runtime/trace"
    "sync"
)

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()

    var wg sync.WaitGroup
    ch := make(chan int, 1) // Tiny buffer forces senders and receivers to contend

    for i := 0; i < 100; i++ {
        wg.Add(2)

        go func(id int) {
            defer wg.Done()
            // Sender goroutines
            for j := 0; j < 10000; j++ {
                ch <- j
            }
        }(i)

        go func(id int) {
            defer wg.Done()
            // Receiver goroutines
            for j := 0; j < 10000; j++ {
                <-ch
            }
        }(i)
    }

    wg.Wait()
}

The trace reveals:

  • Which goroutines block waiting for channel operations
  • CPU utilization across cores
  • Synchronization bottlenecks

Detecting Goroutine Contention

Find lock contention patterns:

package main

import (
    "context"
    "os"
    "runtime/trace"
    "sync"
    "time"
)

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()

    var mu sync.Mutex
    var wg sync.WaitGroup

    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            _, task := trace.NewTask(context.Background(), "criticalSection")
            defer task.End()

            mu.Lock()
            defer mu.Unlock()

            // Critical section
            time.Sleep(1 * time.Millisecond)
        }(i)
    }

    wg.Wait()
}

In the trace:

  • Orange sections show lock blocking
  • Concentration of orange indicates high contention
  • Look for periods where most goroutines are blocked

GC Impact Analysis

Understand how garbage collection affects performance:

package main

import (
    "os"
    "runtime"
    "runtime/trace"
)

func allocateMemory(iterations int) {
    for i := 0; i < iterations; i++ {
        // Small allocations trigger frequent GC
        data := make([]byte, 1024)
        _ = data
    }
}

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()

    // Show GC impact on concurrent work
    done := make(chan struct{})

    go func() {
        defer func() { done <- struct{}{} }()
        allocateMemory(100000)
    }()

    go func() {
        defer func() { done <- struct{}{} }()
        // Count operations while GC runs
        count := 0
        for i := 0; i < 1000000000; i++ {
            count++
        }
    }()

    <-done
    <-done

    var m runtime.MemStats
    runtime.ReadMemStats(&m)
}

The trace shows:

  • GC pause frequency and duration
  • Impact on other goroutines during GC
  • Relationship between allocation rate and GC frequency

Practical Example: Tracing a Request Pipeline

package main

import (
    "context"
    "fmt"
    "net/http"
    "os"
    "runtime/trace"
    "sync"
    "time"
)

func processRequest(ctx context.Context, id int) string {
    ctx, task := trace.NewTask(ctx, fmt.Sprintf("request-%d", id))
    defer task.End()

    var result string

    // Stage 1: Fetch
    trace.WithRegion(ctx, "fetch", func() {
        time.Sleep(10 * time.Millisecond)
        result = fmt.Sprintf("data-%d", id)
    })

    // Stage 2: Transform
    trace.WithRegion(ctx, "transform", func() {
        time.Sleep(5 * time.Millisecond)
        result = result + "-processed"
    })

    // Stage 3: Enrich (parallel sub-tasks)
    trace.WithRegion(ctx, "enrich", func() {
        var wg sync.WaitGroup
        enrichments := make([]string, 3)

        for i := 0; i < 3; i++ {
            wg.Add(1)
            idx := i
            go func() {
                defer wg.Done()
                _, subTask := trace.NewTask(ctx, fmt.Sprintf("enrich-%d", idx))
                defer subTask.End()

                time.Sleep(3 * time.Millisecond)
                enrichments[idx] = fmt.Sprintf("e%d", idx)
            }()
        }

        wg.Wait()
        for _, e := range enrichments {
            result = result + "-" + e
        }
    })

    // Stage 4: Serialize
    trace.WithRegion(ctx, "serialize", func() {
        time.Sleep(2 * time.Millisecond)
        result = fmt.Sprintf(`{"data": "%s"}`, result)
    })

    return result
}

func main() {
    f, _ := os.Create("trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()

    ctx := context.Background()

    // Simulate concurrent requests
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            _ = processRequest(ctx, id)
        }(i)
    }

    wg.Wait()

    fmt.Println("Trace complete")
}

Run and analyze:

go run main.go
go tool trace trace.out

In the viewer:

  • Click on request tasks to see execution timeline
  • Identify which stages are bottlenecks
  • See parallelism during enrichment stage
  • Observe scheduling of concurrent requests

Sampling Strategies for Production

Full tracing has overhead. Use sampling for production:

package main

import (
    "math/rand"
    "net/http"
    _ "net/http/pprof"
    "os"
    "runtime/trace"
    "sync/atomic"
    "time"
)

var traceEnabled atomic.Bool

func enableTracing() {
    // CompareAndSwap ensures only one goroutine starts a trace even
    // when several requests sample at the same time; a separate
    // Load-then-Store check would race
    if !traceEnabled.CompareAndSwap(false, true) {
        return
    }

    f, err := os.Create("prod-trace.out")
    if err != nil {
        traceEnabled.Store(false)
        return
    }
    if err := trace.Start(f); err != nil {
        f.Close()
        traceEnabled.Store(false)
        return
    }

    // Stop after 10 seconds
    time.AfterFunc(10*time.Second, func() {
        trace.Stop()
        f.Close()
        traceEnabled.Store(false)
    })
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
    // Sample 1% of requests for tracing
    if rand.Float32() < 0.01 && !traceEnabled.Load() {
        enableTracing()
    }

    w.Write([]byte("OK"))
}

func main() {
    http.HandleFunc("/api", handleRequest)
    http.ListenAndServe(":8080", nil)
}

Correlating Traces with pprof Data

Combine tracing and profiling for a complete picture:

# Capture both simultaneously
go test -bench=BenchmarkWork \
    -trace=trace.out \
    -cpuprofile=cpu.prof \
    -benchmem

# Analyze trace for scheduling issues
go tool trace trace.out

# Analyze pprof for hotspots
go tool pprof cpu.prof

Use trace viewer to identify when hotspots occur, then use pprof to understand why they're expensive.

Trace Viewer Navigation Tips

  • View Options: Toggle goroutines, processors, network blocking
  • Zoom: Select time range to focus on specific events
  • Search: Find specific goroutine IDs or task names
  • Statistics: View execution stats per goroutine
  • Synchronization: See channel, lock, and semaphore events
  • Execution Timeline: Drag to pan, scroll to zoom on time axis

Common Trace Patterns

Runnable Goroutine Spike: Many goroutines queued for CPU

  • Solution: Reduce concurrency or add backpressure

Blocking During GC: Goroutines suspended by garbage collection

  • Solution: Reduce allocation rate or tune GC with GOGC

Channel Contention: Orange blocking on channel operations

  • Solution: Buffer channels or redesign communication pattern

Lock Contention: Goroutines waiting for mutexes

  • Solution: Reduce critical section size or use lock-free algorithms

Key Takeaways

  • Use runtime/trace to capture detailed execution flow
  • Tasks and regions annotate traces with application semantics
  • Trace viewer reveals scheduling inefficiencies invisible to pprof
  • GC impact becomes visible in traces
  • Sampling reduces production overhead
  • Correlate traces with pprof for complete performance picture
  • Goroutine timeline shows when (not just how much) work happens
  • Identify contention on channels, locks, and I/O operations
