Runtime Bootstrap — What Happens Before main()
The complete Go program startup sequence — from OS process creation through runtime initialization, scheduler setup, and the journey to your main() function.
Introduction
Before your main() function executes, the Go runtime has accomplished extraordinary work: initialized the garbage collector, started the scheduler, created goroutines, detected CPU features, and prepared the entire execution environment. Understanding this bootstrap sequence reveals where startup overhead comes from and how to optimize it.
The journey from ./program execution to your first line of Go code involves assembly code, C functions, and runtime initialization. Most of this is transparent, but understanding the sequence helps you:
- Diagnose slow startup times
- Avoid expensive operations in init() functions
- Understand goroutine lifecycle
- Appreciate the sophisticated machinery running behind the scenes
This article traces the complete startup sequence, examining key data structures and initialization steps.
The Bootstrap Journey: High-Level Overview
OS executes program binary
↓
_rt0_GOARCH_GOOS (assembly entry point)
↓
runtime.rt0_go (assembly + runtime setup)
↓
[Set up g0 and m0]
↓
[Detect CPU features]
↓
[Initialize command-line args]
↓
[OS-specific initialization]
↓
[Scheduler initialization]
↓
[Create main goroutine]
↓
[Start event loop (schedule)]
↓
runtime.main (real main, in scheduler)
↓
[Start system monitor goroutine]
↓
[Initialize runtime locks]
↓
[Run runtime.init()]
↓
[Start GC workers]
↓
[Run all package init() functions]
↓
[Call user main()]
↓
[Exit]

Let's walk through each stage in detail.
Phase 1: Entry Point Assembly
_rt0_GOARCH_GOOS
When the OS executes your Go binary, it jumps to a platform-specific entry point. The Go runtime provides assembly functions for each architecture/OS combination:
- _rt0_amd64_linux on 64-bit x86 Linux
- _rt0_amd64_darwin on macOS (Intel)
- _rt0_arm64_linux on 64-bit ARM Linux
These are defined in runtime/asm_* files in the Go source. For example, on amd64/Linux:
// runtime/asm_amd64.s (simplified)
TEXT ·_rt0_amd64_linux(SB),NOSPLIT|NOFRAME,$0
MOVQ 0(SP), DI // argc into DI
MOVQ 8(SP), SI // argv into SI
JMP ·_rt0_amd64(SB) // jump to architecture-specific entry

This assembly saves command-line arguments and jumps to _rt0_amd64, which is defined in asm_amd64.s and performs early setup before entering runtime.rt0_go.
The entry point function receives:
- DI (x86-64 register): argc (argument count)
- SI (x86-64 register): argv (argument array pointer)
- Stack: already set up by the OS
No Go code runs at this point—we're in pure assembly, before the Go runtime exists.
Transition to rt0_go
The architecture-specific entry points eventually call runtime.rt0_go, a function defined in asm_* files that bridges assembly and Go code:
// runtime/asm_amd64.s (simplified)
TEXT runtime·rt0_go(SB),NOSPLIT|NOFRAME,$0
	// DI = argc, SI = argv
	// ... stack, g0/m0, and TLS setup ...
	CALL runtime·args(SB)
	CALL runtime·osinit(SB)
	CALL runtime·schedinit(SB)
	// create the main goroutine, then start scheduling; never returns
The assembly rt0_go function performs critical setup:
1. Initialize m0 (machine/OS thread): allocate and initialize the bootstrap machine, the OS thread running the program.

2. Initialize g0 (bootstrap goroutine): create the bootstrap goroutine; g0 runs the scheduler itself.

3. Set up stack and TLS (Thread-Local Storage): set up the initial stack for the bootstrap goroutine, and initialize TLS so runtime.getg() returns g0.

4. Call into Go code:

// After setup, jump to Go runtime functions
CALL runtime·args(SB)
CALL runtime·osinit(SB)
CALL runtime·schedinit(SB)
// ... etc
Phase 2: Runtime Initialization in rt0_go
After assembly setup, the Go runtime initializes in a specific sequence. This happens in runtime/asm_*.s and runtime/proc.go.
g0 and m0: The Bootstrap Goroutine and Machine
The Go runtime uses two key structures:
g0 (bootstrap goroutine):
- A special goroutine that runs the scheduler itself
- Stack allocated by the runtime (not the user)
- Runs runtime.schedule(), never returns to user code
- One g0 per OS thread (but usually just m0's g0 matters)
m0 (bootstrap OS thread):
- The OS thread that executed the program
- Represents the actual kernel thread
- Every goroutine ultimately executes on some m
- The first m0 is special—it's bootstrapped by assembly
Simplified structure:
// runtime/runtime2.go
type m struct {
g0 *g // goroutine with scheduling stack
curg *g // current goroutine running on this thread
// ... many other fields ...
id int64 // machine ID
// ... condition variables, locks, etc ...
}
type g struct {
m *m // which machine is running this g?
// ... stack pointers, status, etc ...
sched gobuf // execution state
// ... many other fields ...
}

The bootstrap process:
OS thread created by OS
↓
_rt0 entry point
↓
Assembly code allocates m0 in data segment
↓
Assembly code allocates g0
↓
Assembly code sets TLS: currentG = g0
↓
Assembly code sets m0.curg = g0
↓
Assembly code sets up g0.m = m0
↓
Now runtime.getg() returns g0
↓
Assembly jumps to Go code (runtime.args, etc)

m0 and g0 are allocated in the runtime's data segment, not dynamically. This is necessary because dynamic allocation depends on the memory allocator, which hasn't been initialized yet.
CPU Feature Detection: runtime.cpuinit
The cpuinit function (in runtime/cpu_*.go) detects CPU capabilities:
// runtime/cpu_x86.go (simplified)
func cpuinit() {
// Detect SSE, SSE2, AVX, AVX2, etc.
// Set flags: cpu.hasSSE, cpu.hasAVX, etc.
// Use CPUID instruction to query CPU
eax, ebx, ecx, edx := cpuid(1) // CPUID leaf 1
// Check feature bits
if edx&(1<<25) != 0 { // SSE support
cpu.hasSSE = true
}
if ecx&(1<<28) != 0 { // AVX support
cpu.hasAVX = true
}
// ... more checks ...
}

These flags are used by:
- Crypto routines (SHA-NI, AES-NI)
- String operations (SSE optimized)
- Floating-point code
- Runtime itself
The CPU feature detection adds minimal startup overhead (microseconds).
Processing Arguments: runtime.args
The args function saves command-line arguments and environment variables:
// runtime/runtime.go
func args(c int, v **byte) {
argc = c
argv = v
// Scan environment to find GODEBUG, GOGC, etc.
for i := uintptr(0); i < argc; i++ {
// ... process environment ...
}
}

Arguments and environment variables are parsed at this stage for runtime configuration:
- GODEBUG: debugging flags
- GOGC: GC trigger percentage
- GOMAXPROCS: max CPU cores to use (overridable later)
Environment variable parsing adds negligible overhead for typical programs.
OS-Specific Initialization: runtime.osinit
The osinit function performs operating system-specific setup:
// runtime/os_linux.go (simplified)
func osinit() {
// Detect number of CPUs
ncpu := int32(sysconf(_SC_NPROCESSORS_ONLN))
// Detect page size
// pagesize = sysconf(_SC_PAGE_SIZE)
// Set up signal handlers (for preemption, etc.)
// Install SIGCHLD, SIGURG, etc.
// Initialize timers
}

On Linux, this calls several sysconf syscalls:
- _SC_NPROCESSORS_ONLN: number of online CPUs
- _SC_PAGE_SIZE: memory page size (usually 4KB)
These syscalls add minimal overhead (microseconds each).
Signal handlers are installed here to enable:
- Goroutine preemption (SIGURG)
- Profiling interrupts (SIGPROF)
- Stack trace signals (SIGQUIT)
Phase 3: Scheduler Initialization: runtime.schedinit
This is the heavyweight initialization. schedinit sets up the entire scheduler:
// runtime/proc.go (simplified)
func schedinit() {
// Determine GOMAXPROCS
procs := runtime.GOMAXPROCS(-1) // current value
if env := sys.Getenv("GOMAXPROCS"); env != "" {
// Override with environment variable
}
// Initialize memory allocator
mallocinit()
// Initialize garbage collector
gcInit()
// Create P's (processors)
// One P per allowed CPU core
allp = make([]*p, 0, procs)
for i := 0; i < procs; i++ {
pp := allocP()
allp = append(allp, pp)
}
// Associate m0 with its P
m0.p = allp[0]
// Other initialization...
}

Key initialization:
Memory Allocator: mallocinit()
The Go memory allocator is initialized. This is significant because:
- Heap is divided into chunks
- Free lists are established
- Span allocator is set up
- Cache per P is initialized
func mallocinit() {
// Allocate heap metadata
// Initialize mspan allocator
// Set up per-P caches (mcache)
// For each P, create an mcache (allocation cache)
for _, pp := range allp {
pp.mcache = allocMCache()
}
// Initialize arena metadata
}

This happens once, early in startup. The allocator is lightweight but non-trivial.
Garbage Collector: gcInit()
GC metadata is initialized:
func gcInit() {
// Initialize write barrier state
// Set up GC worker pools
// Initialize mark queue
// Read GOGC environment variable
if gogc := sys.Getenv("GOGC"); gogc != "" {
// Parse GOGC percentage
}
}

The GC starts in an "off" state. It is enabled later, in runtime.main, just before user package init() functions run (see below).
Processor (P) Creation
Each P represents a logical processor:
type p struct {
id int32 // processor ID (0, 1, 2, ...)
status uint32 // Pidle, Prunnable, Prunning, Psyscall, ...
m *m // running machine
mcache *mcache // allocation cache
runq [256]guintptr // local queue of runnable goroutines
// ... many other fields ...
}One P is created per allowed CPU core:
GOMAXPROCS=4
↓
allp = [P0, P1, P2, P3]
↓
Each P owns:
- mcache (allocation cache)
- runq (runnable goroutine queue)
- defers (deferred function stack)

The number of P's can be changed later with runtime.GOMAXPROCS(), but the initial set is created here during bootstrap.
Phase 4: Creating the Main Goroutine
After scheduler initialization, the runtime creates a goroutine for runtime.main:
// In runtime/asm_*.s
CALL runtime·newproc(SB) // newproc(funcPC(runtime.main))

This is handled by runtime.newproc:
// runtime/proc.go
func newproc(fn *funcval) {
// Create a new g
gp := malg(stackMin) // allocate new goroutine
gp.startpc = fn.fn // entry point: runtime.main
// Create task to run this goroutine
systemstack(func() {
// Queue the goroutine on scheduler
runqput(gp) // add to runnable queue
})
}

The main goroutine (let's call it g_main) is created but doesn't run yet. It's queued in the scheduler.
Phase 5: Start the Event Loop: runtime.mstart
After creating the main goroutine, the bootstrap machine (m0) enters the scheduling loop:
// runtime/asm_*.s
CALL runtime·mstart(SB)

This transfers control to Go's scheduler:
// runtime/proc.go
func mstart() {
gp := getg() // current g (g0)
gp.m.mstartfn = nil
// Enter scheduling loop (never returns)
schedule()
}
func schedule() {
gp := getg().m.curg
for {
// Find next runnable goroutine
gp = findRunnable() // might return g_main
// Execute it
execute(gp) // switch to gp
}
}

The schedule() function loops forever:
- Finds a runnable goroutine
- Switches to it (context switch)
- That goroutine runs until it blocks or yields
- Control returns to schedule()
When schedule() first runs, it finds the main goroutine and switches to it.
Phase 6: runtime.main — Inside the Scheduler
Now the main goroutine is running. This executes runtime.main (not your main.main):
// runtime/proc.go (simplified)
func main() {
	// The main goroutine is now running inside the scheduler

	// 1. Start the system monitor on a dedicated thread
	systemstack(func() {
		newm(sysmon, nil, -1)
	})

	// 2. Run the runtime package's own init() functions
	doInit(runtime_inittasks)

	// 3. Enable the garbage collector (background workers)
	gcenable()

	// 4. Run all package init() functions, in dependency order
	doInit(main_inittasks)

	// 5. Finally, run user main()
	main_main()

	// 6. Handle exit
	exit(0)
}

Let's examine each step:
Start the System Monitor: sysmon
The system monitor runs on a dedicated OS thread, outside the normal goroutine scheduler (it needs no P):
// runtime/proc.go
func sysmon() {
// Runs continuously in the background
for {
// Check for long-running goroutines (preemption)
// Mark goroutines that have run too long as preemptible
// Check for deadlocks
// If no progress has been made, trigger panic
// Wake up for GC if needed
// Force GC if too much time has passed
// Service the network poller
// Process async network I/O
// Adjust sleep/wake scheduling
usleep(delay) // delay ranges from ~20µs up to ~10ms, backing off when idle
}
}

sysmon is a daemon (it runs for the life of the program), and it's responsible for:
1. Preemption: after ~10ms, mark long-running goroutines as preemptible
   - This prevents tight loops from starving other goroutines
   - Uses signal-based preemption (SIGURG) since Go 1.14

2. Deadlock detection: if no goroutine can make progress (everything is blocked and nothing is running), the runtime panics
   - Catches programmer errors ("all goroutines are asleep - deadlock!")

3. GC forcing: if no collection has run for a long time (about two minutes), trigger one
   - Prevents memory from sitting unreclaimed in idle programs

4. Network polling: service async I/O operations
   - Wake up goroutines waiting for network events

5. Timer adjustment: reschedule timers based on elapsed time
sysmon adds minimal overhead (it sleeps between checks, backing off from 20µs up to 10ms when the program is idle).
Runtime Package Initialization: runtime_init
After sysmon starts, the runtime initializes its own package:
func runtime_init() {
// Initialize runtime-internal goroutines
// Set up background workers
// Initialize OS/architecture-specific subsystems
// For example, on Linux:
// - Initialize epoll for network I/O
// - Set up timer wheels
// - Initialize cgroup support
}

This is lightweight for most programs.
GC Enablement: gcenable()

After runtime.init(), the garbage collector is switched on:

// runtime/mgc.go (simplified)
func gcenable() {
	// Start the background sweeper and scavenger goroutines
	go bgsweep()
	go bgscavenge()
	// Mark the GC as ready to run
	memstats.enablegc = true
}

Dedicated GC worker goroutines are created at the start of each GC cycle; they sleep until a collection phase wakes them.
main.init(): Package Initialization Functions
All init() functions in your program run here:
main.init() calls:
↓
init all imported packages
↓
For each package (in dependency order):
- Call init() functions (in file declaration order)
- Call const/var initializers
↓
Call main package's init() functions
↓
Return to runtime.main

The order is topological: if package A imports package B, package B's init() runs first.
Multiple init() functions per package run in declaration order:
package main
func init() { fmt.Println("init 1") }
func init() { fmt.Println("init 2") }
// Output: init 1, then init 2

init() functions are synchronous: main() does not start until all of them complete.
This is where startup overhead often accumulates:
// BAD: expensive init()
func init() {
// Database connection pools
db = initDatabase() // blocks startup
// Load large config files
config = loadLargeConfig()
// Warm up caches
cache.WarmUp() // delays main() start
}
// GOOD: lazy, race-free initialization with sync.Once
var (
	db     *sql.DB
	dbOnce sync.Once
)
func getDB() *sql.DB {
	dbOnce.Do(func() {
		db = initDatabase() // runs exactly once, on first call
	})
	return db
}
func init() {
	// Minimal setup
	// Defer expensive operations to first use
}

Call main.main()
Finally, your actual main function runs:
// Your code
func main() {
fmt.Println("Hello, World!")
}

At this point:
- Scheduler is running
- GC is active
- All goroutines and channels work
- System monitor is running
- All packages are initialized
Normal Go execution begins.
Phase 7: Program Termination
When main.main() returns, execution continues in runtime.main:
func main() {
// ... your code ...
// main.main() returns here
exit(0)
}
func exit(code int) {
// Clean up
// Flush output
// Call deferred functions? NO! They don't run
// Exit OS process
exitProcess(int32(code))
}

Important: once main.main() returns, the process exits immediately; deferred functions in still-running goroutines never execute. Only defers in main() itself run (they execute as main returns):
func main() {
	go helper() // still running when main returns
	defer fmt.Println("This runs")
	defer fmt.Println("This also runs")
	// implicit return at end of main()
}
func helper() {
	defer fmt.Println("This DOES NOT run")
	select {} // still blocked here when the process exits
}
// Output:
// This also runs
// This runs

But os.Exit() prevents even main's defers from running:
func main() {
defer fmt.Println("This DOES NOT run with os.Exit")
os.Exit(0) // immediate termination
}

Environment Variables and Runtime Configuration
During bootstrap, several environment variables are processed:
GODEBUG
Controls debugging output and behavior:
GODEBUG=gctrace=1 ./program

Common values:
- gctrace=1: print GC statistics after each collection
- madvdontneed=1: release memory to the OS with MADV_DONTNEED (Linux)
- asyncpreemptoff=1: disable async preemption
- schedtrace=1000: print scheduler state every 1000ms
- netdns=go / netdns=cgo: select the DNS resolver
GOGC
Controls garbage collection trigger:
GOGC=75 ./program

Meaning: trigger GC when the heap grows 75% from the last collection.
- GOGC=100 (default): trigger GC at 100% heap growth
- GOGC=50: trigger at 50% growth (more frequent collections, lower peak memory)
- GOGC=200: trigger at 200% growth (less frequent, better throughput)
- GOGC=off: disable GC (dangerous, but useful for benchmarking)
GOMAXPROCS
Maximum number of CPU cores to use:
GOMAXPROCS=4 ./program

Can also be set in code:

runtime.GOMAXPROCS(4)

The environment value is read during bootstrap, but the setting can be changed at any time.
Other Variables
- GOTRACEBACK: crash traceback detail (read by the runtime at startup)
- GOROOT: path to the Go installation
- GOPATH: workspace path (affects module loading)
- GOFLAGS: default flags for the go tool
- GOHOSTOS, GOHOSTARCH: build-system info (reported by go env, not read at runtime)
Most of these don't affect runtime bootstrap. GODEBUG and GOGC are processed during initialization.
Startup Performance Analysis
Measuring Startup Time
To measure startup overhead:
# Measure total startup time
time ./program

# Measure to first user code
time ./program_with_measurement_in_main

# Profile startup (add runtime/pprof instrumentation to the
# program itself; `go run` has no -cpuprofile flag)
go tool pprof cpu.prof

Reducing Startup Time
- Keep init() functions lightweight
// BAD
func init() {
// 1MB database loaded
data = loadDatabase()
}
// GOOD
var data []byte
func getData() []byte {
if data == nil {
data = loadDatabase()
}
return data
}

- Reduce import chains
Every imported package's init() runs. Large dependency chains slow startup:
# Check dependencies
go list -deps ./... | wc -l
# Minimize imports
# Move heavy dependencies to optional/lazy loading

- Use build tags for optional features
//go:build full
// +build full
func init() {
// This only compiles/runs if built with `-tags full`
loadAllData()
}

- Defer goroutine startup
// BAD: blocks startup
func init() {
runBackgroundWorker() // runs to completion, blocking startup
}
// GOOD: start background work later
func main() {
go runBackgroundWorker() // non-blocking
}

- Avoid global allocations
// BAD: heap allocation in startup
var cache = initLargeCache()
// GOOD: lazy allocation
var cache *Cache
func getCache() *Cache {
if cache == nil {
cache = initLargeCache()
}
return cache
}

Startup Overhead Breakdown (Typical Program)
OS startup + runtime setup: ~1 ms
cpu.init(): ~0.1 ms
os.init(): ~0.5 ms
Memory allocator init: ~0.5 ms
GC init: ~0.2 ms
P/M initialization: ~0.1 ms
Package init() functions: ~5-50 ms (depends on imports)
GC workers startup: ~0.1 ms
sysmon startup: ~0.01 ms
--------
Total: ~8-60 ms
(Varies widely based on imports and init() functions)

Programs with minimal imports and lightweight init() can start in under 5ms. Programs with heavy dependencies (ORMs, HTTP clients, etc.) commonly take 50-200ms.
Advanced: Visualizing Bootstrap
Go provides tools to measure bootstrap phases:
GODEBUG=gctrace
Shows GC activity:
GODEBUG=gctrace=1 ./program 2>&1 | head

Output shows when GC starts, how long it runs, memory stats.
pprof Startup Profiling
Profile the startup sequence:
package main

import (
	"os"
	"runtime/pprof"
)

var profFile *os.File

func init() {
	// Start profiling as early as possible (this init runs after
	// imported packages' init functions, so those are not captured)
	profFile, _ = os.Create("startup.prof")
	pprof.StartCPUProfile(profFile)
}

func main() {
	// Stop at the top of main to capture only startup work
	pprof.StopCPUProfile()
	profFile.Close()
	// Your code
}

Then analyze:
go tool pprof startup.prof

Tracing
Go's execution tracer shows goroutine scheduling:
# Generate a trace from a test binary, then view it
go test -trace=trace.out .
go tool trace trace.out

Visualizes when goroutines start, stop, and yield control.
Complete Bootstrap Timeline
Here's a concrete timeline for a simple program:
T+0ms: OS executes binary
T+0.1ms: _rt0_amd64_linux entry point
T+0.2ms: rt0_go: setup g0, m0, stack
T+0.3ms: CPU feature detection (cpuinit)
T+0.4ms: Process arguments (args)
T+0.5ms: OS detection (osinit)
T+1.0ms: Scheduler init (schedinit)
T+1.1ms: - malloc init
T+1.2ms: - GC init
T+1.3ms: - Create P's
T+2.0ms: Create main goroutine
T+2.1ms: Enter schedule() loop
T+2.2ms: Start sysmon goroutine
T+2.3ms: runtime.init() (minimal)
T+2.4ms: main.init() (package initializers)
T+2.5ms: (example) Load config file
T+7.5ms: (example) Connect to database
T+8.0ms: Call main.main()
T+8.1ms: User code executes
T+8.2ms+: Normal program execution

The exact timeline depends on what your program does in init().
Key Insights for Performance
1. init() functions block the entire program
   - Defer expensive operations
   - Use lazy initialization

2. Imports affect startup time
   - Fewer imports = faster startup
   - Each imported package's init() runs

3. The scheduler is sophisticated
   - Its bootstrap cost is small, but present
   - Minimal time is spent reaching main()

4. Memory allocation is initialized early
   - No heap allocation happens before mallocinit() completes
   - After that, allocation works normally

5. GC is ready but inactive initially
   - The first GC happens when the heap exceeds the GOGC threshold
   - Avoid excessive allocation in init()

6. Goroutines are fully functional in main()
   - Create goroutines, channels, etc. freely
   - The scheduler handles them correctly
Summary
The Go runtime's bootstrap sequence is a marvel of engineering:
Assembly entry point
↓
Platform-specific setup (g0, m0, stack)
↓
Runtime detection and initialization
↓
Scheduler and memory allocator setup
↓
Main goroutine creation
↓
Enter event loop
↓
System monitor activation
↓
Package initialization in dependency order
↓
User main() execution

Understanding this sequence helps you write efficient code:
- Keep init() functions lightweight
- Defer expensive operations
- Minimize import chains
- Use lazy initialization
The runtime's sophisticated machinery is transparent, but understanding it reveals where optimization opportunities lie and why certain patterns matter for startup performance.