# Value Copy Costs
Understanding Go's value model: direct parts, indirect parts, type sizes, and the performance impact of value copying in various scenarios.
## Introduction
To reason about copy costs, you need to understand Go's value model: the distinction between a value's direct and indirect parts. When you copy a value in Go, only the direct part is copied. For types like slices and maps, this means the underlying data remains shared. For others, like large arrays and structs, copying the whole value can be expensive. This article explores how to identify copy costs and reduce them.
## Direct Parts vs Indirect Parts
Every Go type falls into one of two categories:
**Direct-only types**: the entire value is stored inline. Copying copies everything.

- Primitive types: `bool`, `int`, `uint`, `float64`, etc.
- Arrays: `[N]T` (the entire array is the direct part)
- Structs: all fields are stored inline

**Header + backing types**: the direct part is just a header pointing to backing data stored elsewhere (usually the heap).

- `string`: pointer (8 bytes) + length (8 bytes) = 16 bytes total
- `[]T` slice: pointer (8 bytes) + length (8 bytes) + capacity (8 bytes) = 24 bytes total
- `map[K]V`: just a pointer (8 bytes) on 64-bit systems
- `interface{}`: type pointer (8 bytes) + data pointer (8 bytes) = 16 bytes total
- `chan T`: just a pointer (8 bytes)
- Function values: just a pointer (8 bytes)
When you copy a slice, you copy only the 24-byte header. The underlying array remains in the same location. When you copy an array, you copy all the bytes.
## Type Sizes Reference Table
Understanding the size of types helps predict copy costs:
| Type | Size (64-bit) | Category | Notes |
|---|---|---|---|
| `bool` | 1 byte | Direct-only | |
| `int8` / `uint8` | 1 byte | Direct-only | |
| `int16` / `uint16` | 2 bytes | Direct-only | |
| `int32` / `uint32` | 4 bytes | Direct-only | |
| `int64` / `uint64` | 8 bytes | Direct-only | |
| `int` / `uint` | 8 bytes | Direct-only | On 64-bit systems |
| `float32` | 4 bytes | Direct-only | |
| `float64` | 8 bytes | Direct-only | |
| `uintptr` | 8 bytes | Direct-only | |
| `*T` | 8 bytes | Direct-only | Any pointer type |
| `string` | 16 bytes | Header only | Pointer (8) + length (8) |
| `[]T` slice | 24 bytes | Header only | Pointer (8) + length (8) + cap (8) |
| `interface{}` | 16 bytes | Header only | Type pointer (8) + data pointer (8) |
| `map[K]V` | 8 bytes | Header only | Just a pointer |
| `chan T` | 8 bytes | Header only | Just a pointer |
## The Cost of Copying Large Values
Copying is cheap for small types and expensive for large ones. Let's benchmark:
```go
// copycost_test.go — run with: go test -bench=. -benchmem
package main

import "testing"

// Small struct: 4 fields (32 bytes on 64-bit)
type SmallStruct struct {
	A, B, C, D int64
}

// Large struct: 5 fields (40 bytes on 64-bit)
type LargeStruct struct {
	A, B, C, D, E int64
}

// Medium array: 100 integers (800 bytes)
type MediumArray [100]int64

// Large array: 10,000 integers (80,000 bytes)
type LargeArray [10000]int64

// Package-level sinks keep the compiler from eliminating the copies.
var (
	sinkSmall  SmallStruct
	sinkLarge  LargeStruct
	sinkMedium MediumArray
	sinkArray  LargeArray
	sinkSlice  []int64
)

func BenchmarkSmallStructCopy(b *testing.B) {
	s := SmallStruct{1, 2, 3, 4}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sinkSmall = s // a real 32-byte copy
	}
}

func BenchmarkLargeStructCopy(b *testing.B) {
	s := LargeStruct{1, 2, 3, 4, 5}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sinkLarge = s // 40-byte copy
	}
}

func BenchmarkMediumArrayCopy(b *testing.B) {
	arr := MediumArray{}
	for i := range arr {
		arr[i] = int64(i)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sinkMedium = arr // 800-byte copy
	}
}

func BenchmarkLargeArrayCopy(b *testing.B) {
	arr := LargeArray{}
	for i := range arr {
		arr[i] = int64(i)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sinkArray = arr // 80,000-byte copy
	}
}

func BenchmarkSliceHeaderCopy(b *testing.B) {
	s := make([]int64, 10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sinkSlice = s // only the 24-byte header is copied
	}
}
```

Sample results (on a typical modern CPU; illustrative, your numbers will vary):
- SmallStructCopy: ~0.5 ns/op
- LargeStructCopy: ~1.0 ns/op (5 fields vs 4 fields causes measurable difference)
- MediumArrayCopy: ~20 ns/op (800 bytes)
- LargeArrayCopy: ~2000 ns/op (80,000 bytes)
- SliceHeaderCopy: ~0.2 ns/op (only 24 bytes copied)
Notice the dramatic difference: copying a slice header is orders of magnitude faster than copying a large array, even though the slice refers to the same amount of data, because only the 24-byte header is copied.
## The Compiler's Optimization Threshold
The Go compiler can often keep small values in registers rather than copying them through memory. As a rule of thumb, values of four machine words or fewer (32 bytes on 64-bit) are good candidates. Crossing this threshold can trigger a performance cliff:
```go
// 4 pointer/integer fields (32 bytes) — the compiler may optimize heavily
type Person struct {
	ID        int64
	Name      *string
	Email     *string
	CreatedAt int64
}

// 5 fields (40 bytes with padding) — copies may spill to memory operations
type PersonPlus struct {
	ID        int64
	Name      *string
	Email     *string
	CreatedAt int64
	Active    bool
}
```

## Common Value Copy Scenarios
### Scenario 1: Function Parameters (Pass by Value)
Every time you call a function with a value parameter, Go copies the argument:
```go
func ProcessData(data [1000]int) {
	// Copies 8,000 bytes into the function!
}

// Better for large types:
func ProcessDataPtr(data *[1000]int) {
	// Only 8 bytes copied (the pointer)
}
```

### Scenario 2: Range Loop with Value Copy
One of the most common performance pitfalls:
```go
type Event struct {
	ID        int64
	Timestamp int64
	Message   string
	Data      [256]byte
}

// Expensive: copies Event (including the [256]byte array) on each iteration
for _, event := range events {
	process(event) // Event copied here!
}

// Better: use index-only range
for i := range events {
	process(&events[i]) // Only a pointer is copied
}

// Or: range over a slice of pointers
for _, event := range ptrEvents {
	process(event) // Only a pointer is copied
}
```

Let's benchmark this:
```go
type LargeEvent struct {
	ID    int64
	Value int64
	Blob  [512]byte
}

var sinkSum int64 // package-level sink to defeat dead-code elimination

func BenchmarkRangeWithValue(b *testing.B) {
	events := make([]LargeEvent, 1000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sum := int64(0)
		for _, e := range events { // copies each 528-byte LargeEvent
			sum += e.ID + e.Value
		}
		sinkSum = sum
	}
}

func BenchmarkRangeWithIndex(b *testing.B) {
	events := make([]LargeEvent, 1000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sum := int64(0)
		for idx := range events { // no element copy
			sum += events[idx].ID + events[idx].Value
		}
		sinkSum = sum
	}
}

func BenchmarkRangePointers(b *testing.B) {
	events := make([]*LargeEvent, 1000)
	for i := range events {
		events[i] = &LargeEvent{}
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sum := int64(0)
		for _, e := range events { // copies only 8-byte pointers
			sum += e.ID + e.Value
		}
		sinkSum = sum
	}
}
```

Results: `RangeWithValue` is roughly 4x slower than `RangeWithIndex` because each iteration copies the 512-byte blob.
### Scenario 3: Channel Send and Receive
Channels copy values:
```go
type Packet struct {
	Header  [64]byte
	Payload [4096]byte
}

// Expensive: sends a copy of the entire Packet
ch := make(chan Packet)
ch <- packet

// Better: send a pointer
chPtr := make(chan *Packet)
chPtr <- &packet
```

### Scenario 4: Map Insertion
Maps copy their value types:
```go
type Config struct {
	Setting1 string
	Setting2 int
	Setting3 [1000]byte
}

var cfg Config

configs := make(map[string]Config)
configs["key"] = cfg // copies the entire Config, including the [1000]byte array

// Better approach: store pointers
configPtrs := make(map[string]*Config)
configPtrs["key"] = &cfg
```

### Scenario 5: interface{} Boxing
Assigning a value to an `interface{}` requires boxing: the value is copied into heap-allocated storage.
```go
var data [1000]int
var i interface{} = data // the array is copied during boxing

var ptr *[1000]int = &data
i = ptr // only the pointer (8 bytes) is boxed
```

## Value vs Pointer: Decision Tree
Use value parameters when:
- The type is ≤ 4 machine words (32 bytes on 64-bit)
- The function doesn't modify the value
- You need the safety of value semantics (copy-on-assignment)
Use pointer parameters when:
- The type is > 4 machine words
- The function modifies the value and you want those changes visible to the caller
- The value is expensive to copy
```go
func ProcessSmall(s SmallStruct) {
	// Value copy is cheap, semantics are clear
}

func ProcessLarge(l *LargeStruct) {
	// Pointer is much faster for large values
}

func Modify(s *SmallStruct) {
	// We want to modify the original, so a pointer is correct
	s.A = 42
}
```

## Optimization Patterns
### Pattern 1: Use Pointers in Hot Paths
```go
// Hot path that processes millions of events
type Event struct {
	ID    int64
	Type  int
	Value int64
	Data  [128]byte
}

// Slower: copies Event on each iteration
for _, e := range events {
	total += processEvent(e)
}

// Faster: use a slice of pointers
var sinkSum int64 // sink to defeat dead-code elimination

func BenchmarkEventProcessing(b *testing.B) {
	events := make([]*Event, 10000)
	for i := range events {
		events[i] = &Event{ID: int64(i)}
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sum := int64(0)
		for _, e := range events {
			sum += e.ID
		}
		sinkSum = sum
	}
}
```

### Pattern 2: Pre-allocate Large Values
```go
// Don't create large values inside loops
for i := 0; i < 1000; i++ {
	data := [1000]int{} // zeroed on every iteration
	process(data)       // and copied again at the call (pass by value)
}

// Better: allocate once, reuse
data := [1000]int{}
for i := 0; i < 1000; i++ {
	clear(data[:]) // clear between uses if needed (Go 1.21+)
	process(data)
}
```

### Pattern 3: Return Pointers from Factory Functions
// If constructing a large value, return a pointer
func NewConfig() *Config {
return &Config{...}
}
// Not:
func NewConfig() Config {
return Config{...}
}Memory Layout and Alignment
The compiler aligns struct fields to optimize memory access. This can affect actual size:
```go
type Aligned struct {
	A bool  // 1 byte, then 7 bytes of padding
	B int64 // 8 bytes (needs 8-byte alignment)
	C int32 // 4 bytes, then 4 bytes of padding
	D int64 // 8 bytes
}
// Total: 32 bytes (not 21)

type Better struct {
	A int64
	B int64
	C int32
	D bool
}
// Total: 24 bytes (fields ordered largest-first)
```

Optimizing alignment reduces struct size, which reduces copy costs.
## Summary and Recommendations
1. **Understand your type sizes**: small values (32 bytes or less) are cheap to copy; larger types should be passed by pointer.
2. **Watch the range-loop pitfall**: avoid copying large values in `for _, v := range slice` loops. Use an index-only range or a slice of pointers instead.
3. **Profile before optimizing**: use `go test -bench` and memory profilers to identify actual copy costs in your code.
4. **Use pointers in hot paths**: if a code path executes millions of times, even small per-iteration savings matter.
5. **Consider the compiler threshold**: structs with 4 fields often compile more efficiently than those with 5 or more.
6. **Document your choice**: when you use pointers for large values, make it clear in comments why you're not using value semantics.

The most impactful optimization is often as simple as changing `for _, e := range events` to `for i := range events` when each element is large. This small change can improve performance by 50-80% in some cases.