Runtime Bootstrap — What Happens Before main()
The complete Go program startup sequence — from OS process creation through runtime initialization, scheduler setup, and the journey to your main() function.
Introduction
Before your main() function executes, the Go runtime has accomplished extraordinary work: initialized the garbage collector, started the scheduler, created goroutines, detected CPU features, and prepared the entire execution environment. Understanding this bootstrap sequence reveals where startup overhead comes from and how to optimize it.
The journey from ./program execution to your first line of Go code involves assembly code, C functions, and runtime initialization. Most of this is transparent, but understanding the sequence helps you:
- Diagnose slow startup times
- Avoid expensive operations in init() functions
- Understand goroutine lifecycle
- Appreciate the sophisticated machinery running behind the scenes
This article traces the complete startup sequence, examining key data structures and initialization steps.
The Bootstrap Journey: High-Level Overview
OS executes program binary
↓
_rt0_GOARCH_GOOS (assembly entry point)
↓
runtime.rt0_go (assembly + runtime setup)
↓
[Set up g0 and m0]
↓
[Detect CPU features]
↓
[Initialize command-line args]
↓
[OS-specific initialization]
↓
[Scheduler initialization]
↓
[Create main goroutine]
↓
[Start event loop (schedule)]
↓
runtime.main (real main, in scheduler)
↓
[Start system monitor goroutine]
↓
[Initialize runtime locks]
↓
[Run runtime.init()]
↓
[Start GC workers]
↓
[Run all package init() functions]
↓
[Call user main()]
↓
[Exit]

Let's walk through each stage in detail.
Phase 1: Entry Point Assembly
_rt0_GOARCH_GOOS
When the OS executes your Go binary, it jumps to a platform-specific entry point. The Go runtime provides assembly functions for each architecture/OS combination:
- _rt0_amd64_linux on 64-bit x86 Linux
- _rt0_amd64_darwin on macOS (Intel)
- _rt0_arm64_linux on 64-bit ARM Linux
These are defined in runtime/asm_* files in the Go source. For example, on amd64/Linux:
// runtime/asm_amd64.s (simplified)
TEXT ·_rt0_amd64_linux(SB),NOSPLIT|NOFRAME,$0
MOVQ 0(SP), DI // argc into DI
MOVQ 8(SP), SI // argv into SI
JMP ·_rt0_amd64(SB) // jump to architecture-specific entry

This assembly saves command-line arguments and jumps to _rt0_amd64, which is defined in asm_amd64.s and performs early setup before entering runtime.rt0_go.
The entry point function receives:
- DI (x86-64 register): argc (argument count)
- SI (x86-64 register): argv (argument array pointer)
- Stack: already set up by the OS
No Go code runs at this point—we're in pure assembly, before the Go runtime exists.
Transition to rt0_go
The architecture-specific entry points eventually call runtime.rt0_go, a function defined in asm_* files that bridges assembly and Go code:
// runtime/asm_amd64.s (simplified)
TEXT runtime·rt0_go(SB),NOSPLIT|NOFRAME,$0
	// DI = argc, SI = argv
	// ... stack, g0/m0, and TLS setup ...
	CALL runtime·args(SB)
	CALL runtime·osinit(SB)
	CALL runtime·schedinit(SB)
	// create the main goroutine, then start scheduling; never returns
The assembly rt0_go function performs critical setup:
1. Initialize m0 (machine/OS thread): allocate and initialize the bootstrap machine, the OS thread running the program.

2. Initialize g0 (bootstrap goroutine): create the bootstrap goroutine; g0 runs the scheduler itself.

3. Set up stack and TLS (Thread-Local Storage): set up the initial stack for the bootstrap goroutine, and initialize TLS so runtime.getg() returns g0.

4. Call into Go code:

// After setup, jump to Go runtime functions
CALL runtime·args(SB)
CALL runtime·osinit(SB)
CALL runtime·schedinit(SB)
// ... etc
Phase 2: Runtime Initialization in rt0_go
After assembly setup, the Go runtime initializes in a specific sequence. This happens in runtime/asm_*.s and runtime/proc.go.
g0 and m0: The Bootstrap Goroutine and Machine
The Go runtime uses two key structures:
g0 (bootstrap goroutine):
- A special goroutine that runs the scheduler itself
- Stack allocated by the runtime (not the user)
- Runs runtime.schedule(), never returns to user code
- One g0 per OS thread (but usually just m0's g0 matters)
m0 (bootstrap OS thread):
- The OS thread that executed the program
- Represents the actual kernel thread
- Every goroutine ultimately executes on some m
- The first m0 is special—it's bootstrapped by assembly
Simplified structure:
// runtime/runtime2.go
type m struct {
g0 *g // goroutine with scheduling stack
curg *g // current goroutine running on this thread
// ... many other fields ...
id int64 // machine ID
// ... condition variables, locks, etc ...
}
type g struct {
m *m // which machine is running this g?
// ... stack pointers, status, etc ...
sched gobuf // execution state
// ... many other fields ...
}

The bootstrap process:
OS thread created by OS
↓
_rt0 entry point
↓
Assembly code allocates m0 in data segment
↓
Assembly code allocates g0
↓
Assembly code sets TLS: currentG = g0
↓
Assembly code sets m0.curg = g0
↓
Assembly code sets up g0.m = m0
↓
Now runtime.getg() returns g0
↓
Assembly jumps to Go code (runtime.args, etc)

m0 and g0 are allocated in the runtime's data segment, not dynamically. This is necessary because dynamic allocation depends on the memory allocator, which hasn't been initialized yet.
CPU Feature Detection: runtime.cpuinit
The cpuinit function (in runtime/cpu_*.go) detects CPU capabilities:
// runtime/cpu_x86.go (simplified)
func cpuinit() {
// Detect SSE, SSE2, AVX, AVX2, etc.
// Set flags: cpu.hasSSE, cpu.hasAVX, etc.
// Use CPUID instruction to query CPU
eax, ebx, ecx, edx := cpuid(1) // CPUID leaf 1
// Check feature bits
if edx&(1<<25) != 0 { // SSE support
cpu.hasSSE = true
}
if ecx&(1<<28) != 0 { // AVX support
cpu.hasAVX = true
}
// ... more checks ...
}

These flags are used by:
- Crypto routines (SHA-NI, AES-NI)
- String operations (SSE optimized)
- Floating-point code
- Runtime itself
The CPU feature detection adds minimal startup overhead (microseconds).
Processing Arguments: runtime.args
The args function saves command-line arguments and environment variables:
// runtime/runtime.go
func args(c int, v **byte) {
argc = c
argv = v
// Scan environment to find GODEBUG, GOGC, etc.
for i := uintptr(0); i < argc; i++ {
// ... process environment ...
}
}

Arguments and environment variables are parsed at this stage for runtime configuration:
- GODEBUG: debugging flags
- GOGC: GC trigger percentage
- GOMAXPROCS: max CPU cores to use (overridable later)
Environment variable parsing adds negligible overhead for typical programs.
OS-Specific Initialization: runtime.osinit
The osinit function performs operating system-specific setup:
// runtime/os_linux.go (simplified)
func osinit() {
// Detect number of CPUs
ncpu := int32(sysconf(_SC_NPROCESSORS_ONLN))
// Detect page size
// pagesize = sysconf(_SC_PAGE_SIZE)
// Set up signal handlers (for preemption, etc.)
// Install SIGCHLD, SIGURG, etc.
// Initialize timers
}

On Linux, this calls several sysconf syscalls:
- _SC_NPROCESSORS_ONLN: number of online CPUs
- _SC_PAGE_SIZE: memory page size (usually 4KB)
These syscalls add minimal overhead (microseconds each).
Signal handlers are installed here to enable:
- Goroutine preemption (SIGURG)
- Profiling interrupts (SIGPROF)
- Stack trace signals (SIGQUIT)
Phase 3: Scheduler Initialization: runtime.schedinit
This is the heavyweight initialization. schedinit sets up the entire scheduler:
// runtime/proc.go (simplified)
func schedinit() {
// Determine GOMAXPROCS
procs := runtime.GOMAXPROCS(-1) // current value
if env := sys.Getenv("GOMAXPROCS"); env != "" {
// Override with environment variable
}
// Initialize memory allocator
mallocinit()
// Initialize garbage collector
gcInit()
// Create P's (processors)
// One P per allowed CPU core
allp = make([]*p, 0, procs)
for i := 0; i < procs; i++ {
pp := allocP()
allp = append(allp, pp)
}
// Associate m0 with its P
m0.p = allp[0]
// Other initialization...
}

Key initialization:
Memory Allocator: mallocinit()
The Go memory allocator is initialized. This is significant because:
- Heap is divided into chunks
- Free lists are established
- Span allocator is set up
- Cache per P is initialized
func mallocinit() {
// Allocate heap metadata
// Initialize mspan allocator
// Set up per-P caches (mcache)
// For each P, create an mcache (allocation cache)
for _, pp := range allp {
pp.mcache = allocMCache()
}
// Initialize arena metadata
}

This happens once, early in startup. The allocator is lightweight but non-trivial.
Garbage Collector: gcInit()
GC metadata is initialized:
func gcInit() {
// Initialize write barrier state
// Set up GC worker pools
// Initialize mark queue
// Read GOGC environment variable
if gogc := sys.Getenv("GOGC"); gogc != "" {
// Parse GOGC percentage
}
}

The GC starts in an "off" state. It is enabled later, in runtime.main, just before user package init() functions run (see below).
Processor (P) Creation
Each P represents a logical processor:
type p struct {
id int32 // processor ID (0, 1, 2, ...)
status uint32 // Pidle, Prunnable, Prunning, Psyscall, ...
m *m // running machine
mcache *mcache // allocation cache
runq [256]guintptr // local queue of runnable goroutines
// ... many other fields ...
}One P is created per allowed CPU core:
GOMAXPROCS=4
↓
allp = [P0, P1, P2, P3]
↓
Each P owns:
- mcache (allocation cache)
- runq (runnable goroutine queue)
- defers (deferred function stack)

The number of P's can be changed later with runtime.GOMAXPROCS(), but the initial set is created here during bootstrap.
Phase 4: Creating the Main Goroutine
After scheduler initialization, the runtime creates a goroutine for runtime.main:
// In runtime/asm_*.s
CALL runtime·newproc(SB) // newproc(funcPC(runtime.main))

This is handled by runtime.newproc:
// runtime/proc.go
func newproc(fn *funcval) {
// Create a new g
gp := malg(stackMin) // allocate new goroutine
gp.startpc = fn.fn // entry point: runtime.main
// Create task to run this goroutine
systemstack(func() {
// Queue the goroutine on scheduler
runqput(gp) // add to runnable queue
})
}

The main goroutine (let's call it g_main) is created but doesn't run yet. It's queued in the scheduler.
Phase 5: Start the Event Loop: runtime.mstart
After creating the main goroutine, the bootstrap machine (m0) enters the scheduling loop:
// runtime/asm_*.s
CALL runtime·mstart(SB)

This transfers control to Go's scheduler:
// runtime/proc.go
func mstart() {
gp := getg() // current g (g0)
gp.m.mstartfn = nil
// Enter scheduling loop (never returns)
schedule()
}
func schedule() {
gp := getg().m.curg
for {
// Find next runnable goroutine
gp = findRunnable() // might return g_main
// Execute it
execute(gp) // switch to gp
}
}

The schedule() function loops forever:
- Finds a runnable goroutine
- Switches to it (context switch)
- That goroutine runs until it blocks or yields
- Control returns to schedule()
When schedule() first runs, it finds the main goroutine and switches to it.
Phase 6: runtime.main — Inside the Scheduler
Now the main goroutine is running. This executes runtime.main (not your main.main):
// runtime/proc.go (simplified)
func main() {
	// The main goroutine is now running inside the scheduler

	// 1. Start the system monitor on a dedicated thread
	systemstack(func() {
		newm(sysmon, nil, -1)
	})

	// 2. Run the runtime package's own init() functions
	doInit(runtime_inittasks)

	// 3. Enable the garbage collector (background workers)
	gcenable()

	// 4. Run all package init() functions, in dependency order
	doInit(main_inittasks)

	// 5. Finally, run user main()
	main_main()

	// 6. Handle exit
	exit(0)
}

Let's examine each step:
Start the System Monitor: sysmon
The system monitor runs on a dedicated OS thread, outside the normal goroutine scheduler (it needs no P):
// runtime/proc.go
func sysmon() {
// Runs continuously in the background
for {
// Check for long-running goroutines (preemption)
// Mark goroutines that have run too long as preemptible
// Check for deadlocks
// If no progress has been made, trigger panic
// Wake up for GC if needed
// Force GC if too much time has passed
// Service the network poller
// Process async network I/O
// Adjust sleep/wake scheduling
usleep(delay) // delay ranges from ~20µs up to ~10ms, backing off when idle
}
}

sysmon is a daemon (it runs for the life of the program), and it's responsible for:
1. Preemption: after ~10ms, mark long-running goroutines as preemptible
   - This prevents tight loops from starving other goroutines
   - Uses signal-based preemption (SIGURG) since Go 1.14

2. Deadlock detection: if no goroutine can make progress (everything is blocked and nothing is running), the runtime panics
   - Catches programmer errors ("all goroutines are asleep - deadlock!")

3. GC forcing: if no collection has run for a long time (about two minutes), trigger one
   - Prevents memory from sitting unreclaimed in idle programs

4. Network polling: service async I/O operations
   - Wake up goroutines waiting for network events

5. Timer adjustment: reschedule timers based on elapsed time
sysmon adds minimal overhead (it sleeps between checks, backing off from 20µs up to 10ms when the program is idle).
Runtime Package Initialization: runtime_init
After sysmon starts, the runtime initializes its own package:
func runtime_init() {
// Initialize runtime-internal goroutines
// Set up background workers
// Initialize OS/architecture-specific subsystems
// For example, on Linux:
// - Initialize epoll for network I/O
// - Set up timer wheels
// - Initialize cgroup support
}

This is lightweight for most programs.
GC Enablement: gcenable()

After runtime.init(), the garbage collector is switched on:

// runtime/mgc.go (simplified)
func gcenable() {
	// Start the background sweeper and scavenger goroutines
	go bgsweep()
	go bgscavenge()
	// Mark the GC as ready to run
	memstats.enablegc = true
}

Dedicated GC worker goroutines are created at the start of each GC cycle; they sleep until a collection phase wakes them.
main.init(): Package Initialization Functions
All init() functions in your program run here:
main.init() calls:
↓
init all imported packages
↓
For each package (in dependency order):
- Call init() functions (in file declaration order)
- Call const/var initializers
↓
Call main package's init() functions
↓
Return to runtime.main

The order is topological: if package A imports package B, package B's init() runs first.
Multiple init() functions per package run in declaration order:
package main
func init() { fmt.Println("init 1") }
func init() { fmt.Println("init 2") }
// Output: init 1, then init 2

init() functions are synchronous: main() does not start until all of them complete.
This is where startup overhead often accumulates:
// BAD: expensive init()
func init() {
// Database connection pools
db = initDatabase() // blocks startup
// Load large config files
config = loadLargeConfig()
// Warm up caches
cache.WarmUp() // delays main() start
}
// GOOD: lazy, race-free initialization with sync.Once
var (
	db     *sql.DB
	dbOnce sync.Once
)
func getDB() *sql.DB {
	dbOnce.Do(func() {
		db = initDatabase() // runs exactly once, on first call
	})
	return db
}
func init() {
	// Minimal setup
	// Defer expensive operations to first use
}

Call main.main()
Finally, your actual main function runs:
// Your code
func main() {
fmt.Println("Hello, World!")
}

At this point:
- Scheduler is running
- GC is active
- All goroutines and channels work
- System monitor is running
- All packages are initialized
Normal Go execution begins.
Phase 7: Program Termination
When main.main() returns, execution continues in runtime.main:
func main() {
// ... your code ...
// main.main() returns here
exit(0)
}
func exit(code int) {
// Clean up
// Flush output
// Call deferred functions? NO! They don't run
// Exit OS process
exitProcess(int32(code))
}

Important: once main.main() returns, the process exits immediately; deferred functions in still-running goroutines never execute. Only defers in main() itself run (they execute as main returns):
func main() {
	go helper() // still running when main returns
	defer fmt.Println("This runs")
	defer fmt.Println("This also runs")
	// implicit return at end of main()
}
func helper() {
	defer fmt.Println("This DOES NOT run")
	select {} // still blocked here when the process exits
}
// Output:
// This also runs
// This runs

But os.Exit() prevents even main's defers from running:
func main() {
defer fmt.Println("This DOES NOT run with os.Exit")
os.Exit(0) // immediate termination
}

Environment Variables and Runtime Configuration
During bootstrap, several environment variables are processed:
GODEBUG
Controls debugging output and behavior:
GODEBUG=gctrace=1 ./program

Common values:
- gctrace=1: print GC statistics after each collection
- madvdontneed=1: release memory to the OS with MADV_DONTNEED (Linux)
- asyncpreemptoff=1: disable async preemption
- schedtrace=1000: print scheduler state every 1000ms
- netdns=go / netdns=cgo: select the DNS resolver
GOGC
Controls garbage collection trigger:
GOGC=75 ./program

Meaning: trigger GC when the heap grows 75% from the last collection.
- GOGC=100 (default): trigger GC at 100% heap growth
- GOGC=50: trigger at 50% growth (more frequent collections, lower peak memory)
- GOGC=200: trigger at 200% growth (less frequent, better throughput)
- GOGC=off: disable GC (dangerous, but useful for benchmarking)
GOMAXPROCS
Maximum number of CPU cores to use:
GOMAXPROCS=4 ./program

Can also be set in code:

runtime.GOMAXPROCS(4)

The environment value is read during bootstrap, but the setting can be changed at any time.
Other Variables
- GOTRACEBACK: crash traceback detail (read by the runtime at startup)
- GOROOT: path to the Go installation
- GOPATH: workspace path (affects module loading)
- GOFLAGS: default flags for the go tool
- GOHOSTOS, GOHOSTARCH: build-system info (reported by go env, not read at runtime)
Most of these don't affect runtime bootstrap. GODEBUG and GOGC are processed during initialization.
Startup Performance Analysis
Measuring Startup Time
To measure startup overhead:
# Measure total startup time
time ./program

# Measure to first user code
time ./program_with_measurement_in_main

# Profile startup (add runtime/pprof instrumentation to the
# program itself; `go run` has no -cpuprofile flag)
go tool pprof cpu.prof

Reducing Startup Time
- Keep init() functions lightweight
// BAD
func init() {
// 1MB database loaded
data = loadDatabase()
}
// GOOD
var data []byte
func getData() []byte {
if data == nil {
data = loadDatabase()
}
return data
}

- Reduce import chains
Every imported package's init() runs. Large dependency chains slow startup:
# Check dependencies
go list -deps ./... | wc -l
# Minimize imports
# Move heavy dependencies to optional/lazy loading

- Use build tags for optional features
//go:build full
// +build full
func init() {
// This only compiles/runs if built with `-tags full`
loadAllData()
}

- Defer goroutine startup
// BAD: blocks startup
func init() {
runBackgroundWorker() // runs to completion, blocking startup
}
// GOOD: start background work later
func main() {
go runBackgroundWorker() // non-blocking
}

- Avoid global allocations
// BAD: heap allocation in startup
var cache = initLargeCache()
// GOOD: lazy allocation
var cache *Cache
func getCache() *Cache {
if cache == nil {
cache = initLargeCache()
}
return cache
}

Startup Overhead Breakdown (Typical Program)
OS startup + runtime setup: ~1 ms
cpu.init(): ~0.1 ms
os.init(): ~0.5 ms
Memory allocator init: ~0.5 ms
GC init: ~0.2 ms
P/M initialization: ~0.1 ms
Package init() functions: ~5-50 ms (depends on imports)
GC workers startup: ~0.1 ms
sysmon startup: ~0.01 ms
--------
Total: ~8-60 ms
(Varies widely based on imports and init() functions)

Programs with minimal imports and lightweight init() can start in under 5ms. Programs with heavy dependencies (ORMs, HTTP clients, etc.) commonly take 50-200ms.
Advanced: Visualizing Bootstrap
Go provides tools to measure bootstrap phases:
GODEBUG=gctrace
Shows GC activity:
GODEBUG=gctrace=1 ./program 2>&1 | head

Output shows when GC starts, how long it runs, memory stats.
pprof Startup Profiling
Profile the startup sequence:
package main

import (
	"os"
	"runtime/pprof"
)

var profFile *os.File

func init() {
	// Start profiling as early as possible (this init runs after
	// imported packages' init functions, so those are not captured)
	profFile, _ = os.Create("startup.prof")
	pprof.StartCPUProfile(profFile)
}

func main() {
	// Stop at the top of main to capture only startup work
	pprof.StopCPUProfile()
	profFile.Close()
	// Your code
}

Then analyze:
go tool pprof startup.prof

Tracing
Go's execution tracer shows goroutine scheduling:
# Generate a trace from a test binary, then view it
go test -trace=trace.out .
go tool trace trace.out

Visualizes when goroutines start, stop, and yield control.
Complete Bootstrap Timeline
Here's a concrete timeline for a simple program:
T+0ms: OS executes binary
T+0.1ms: _rt0_amd64_linux entry point
T+0.2ms: rt0_go: setup g0, m0, stack
T+0.3ms: CPU feature detection (cpuinit)
T+0.4ms: Process arguments (args)
T+0.5ms: OS detection (osinit)
T+1.0ms: Scheduler init (schedinit)
T+1.1ms: - malloc init
T+1.2ms: - GC init
T+1.3ms: - Create P's
T+2.0ms: Create main goroutine
T+2.1ms: Enter schedule() loop
T+2.2ms: Start sysmon goroutine
T+2.3ms: runtime.init() (minimal)
T+2.4ms: main.init() (package initializers)
T+2.5ms: (example) Load config file
T+7.5ms: (example) Connect to database
T+8.0ms: Call main.main()
T+8.1ms: User code executes
T+8.2ms+: Normal program execution

The exact timeline depends on what your program does in init().
Key Insights for Performance
1. init() functions block the entire program
   - Defer expensive operations
   - Use lazy initialization

2. Imports affect startup time
   - Fewer imports = faster startup
   - Each imported package's init() runs

3. The scheduler is sophisticated
   - Its bootstrap cost is small, but present
   - Minimal time is spent reaching main()

4. Memory allocation is initialized early
   - No heap allocation happens before mallocinit() completes
   - After that, allocation works normally

5. GC is ready but inactive initially
   - The first GC happens when the heap exceeds the GOGC threshold
   - Avoid excessive allocation in init()

6. Goroutines are fully functional in main()
   - Create goroutines, channels, etc. freely
   - The scheduler handles them correctly
Summary
The Go runtime's bootstrap sequence is a marvel of engineering:
Assembly entry point
↓
Platform-specific setup (g0, m0, stack)
↓
Runtime detection and initialization
↓
Scheduler and memory allocator setup
↓
Main goroutine creation
↓
Enter event loop
↓
System monitor activation
↓
Package initialization in dependency order
↓
User main() execution

Understanding this sequence helps you write efficient code:
- Keep init() functions lightweight
- Defer expensive operations
- Minimize import chains
- Use lazy initialization
The runtime's sophisticated machinery is transparent, but understanding it reveals where optimization opportunities lie and why certain patterns matter for startup performance.