Go Performance Guide
Memory Management

Slice and Array Performance Tricks

Advanced slice operations, growth algorithms, and optimization techniques for efficient array and slice manipulation in Go.

Introduction

Slices are one of Go's most powerful features, combining dynamic growth with efficient memory management. However, naive slice usage can lead to unnecessary allocations, redundant copying, and poor performance. This article covers advanced techniques for slice manipulation that Go developers should know.

How Append Works: Growth Algorithm

When you append to a slice beyond its capacity, Go must allocate a new backing array. The growth strategy is:

  • Below a threshold: double the capacity
  • Above it: grow by roughly 25% each time

The exact numbers are version-dependent: before Go 1.18 the threshold was 1024, with exact doubling below it and capacity + capacity/4 above; since Go 1.18 the threshold is 256, and the growth factor transitions smoothly from 2x toward 1.25x. Either way, geometric growth keeps the number of allocations logarithmic in the final length while bounding unused space.

func CapacityGrowth() {
	s := make([]int, 0, 1)

	for cap(s) <= 2000 {
		oldCap := cap(s)
		s = append(s, 0)
		if cap(s) != oldCap {
			fmt.Printf("len=%d, cap=%d\n", len(s), cap(s))
		}
	}
	// Output with a pre-1.18 toolchain (later versions diverge past cap=256):
	// len=2, cap=2
	// len=3, cap=4
	// len=5, cap=8
	// len=9, cap=16
	// ... (doubling continues)
	// len=1025, cap=1280  (1024 + 1024/4)
	// len=1281, cap=1600  (1280 + 1280/4)
}

Avoiding Wasted Capacity

When you know the final size, allocate it upfront:

// Wasteful: causes multiple allocations
var results []Result
for _, item := range items {
	results = append(results, processItem(item))
}

// Better: pre-allocate
results := make([]Result, 0, len(items))
for _, item := range items {
	results = append(results, processItem(item))
}

Benchmark:

func BenchmarkAppendWithoutCapacity(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var s []int
		for j := 0; j < 10000; j++ {
			s = append(s, j)
		}
	}
}

func BenchmarkAppendWithCapacity(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		s := make([]int, 0, 10000)
		for j := 0; j < 10000; j++ {
			s = append(s, j)
		}
	}
}

Results: WithCapacity allocates once; WithoutCapacity allocates ~15 times.

Advanced Append Techniques

Clipping the Backing Array

When appending to a slice, the new data might overwrite existing data in the original backing array:

original := make([]int, 10)
for i := range original {
	original[i] = i
}

sliced := original[2:5] // Elements 2, 3, 4
// Capacity of sliced is 10-2 = 8 (from index 2 to the end)

// Appending to sliced can overwrite original!
sliced = append(sliced, 999)
// original[5] is now 999

// To prevent this, clip the capacity:
clipped := original[2:5:5] // len=3, cap=3
clipped = append(clipped, 999) // Must allocate new array

This is crucial when you want independent copies:

func SplitData(data []int) ([]int, []int) {
	mid := len(data) / 2

	// WRONG: both halves share the backing array
	// left := data[:mid]
	// right := data[mid:]
	// appending to left would overwrite right's first element!

	// CORRECT: clip to prevent accidental sharing
	left := data[:mid:mid]
	right := data[mid:]
	return left, right
}
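A minimal, runnable sketch of this failure mode and its fix (the data values here are illustrative):

```go
package main

import "fmt"

func main() {
	// Unclipped: left's capacity extends over right's elements.
	data := []int{0, 1, 2, 3, 4, 5}
	left := data[:3]        // len=3, cap=6
	right := data[3:]       // shares the backing array
	left = append(left, 99) // writes into data[3]
	fmt.Println(right[0])   // 99: right was silently corrupted

	// Clipped: the full slice expression caps left's capacity at 3.
	data = []int{0, 1, 2, 3, 4, 5}
	left = data[:3:3] // len=3, cap=3
	right = data[3:]
	left = append(left, 99) // capacity exceeded: new backing array
	fmt.Println(right[0])   // 3: right is untouched
}
```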

Growing in One Step

Instead of appending elements one by one (causing multiple allocations if capacity is exceeded), grow in a single operation:

// Method 1: append with make
s := []int{1, 2, 3}
s = append(s, make([]int, 1000)...)
// s now has length 1003, no per-element allocation cost

// Method 2: explicit pre-allocation
s := make([]int, 0, initialSize)
s = append(s, elements...)

// Method 3: slices.Grow (Go 1.21+)
s = slices.Grow(s, n) // Ensure capacity for at least n more elements

Benchmark:

func BenchmarkGrowPiecemeal(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		var s []int
		for j := 0; j < 1000; j++ {
			s = append(s, j)
		}
	}
}

func BenchmarkGrowInBatch(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		s := make([]int, 0, 1000)
		for j := 0; j < 1000; j++ {
			s = append(s, j)
		}
	}
}

Efficient Slice Operations

Clone vs Copy vs Append

Go provides multiple ways to copy a slice. Each has different performance characteristics:

original := []int{1, 2, 3, 4, 5}

// Method 1: slices.Clone (Go 1.21+, most efficient)
copy1 := slices.Clone(original)

// Method 2: make + copy
copy2 := make([]int, len(original))
copy(copy2, original)

// Method 3: append to nil
copy3 := append([]int(nil), original...)

// Method 4: append to empty slice with capacity
copy4 := make([]int, 0, len(original))
copy4 = append(copy4, original...)

Benchmark:

func BenchmarkCloneNative(b *testing.B) {
	original := make([]int, 10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = slices.Clone(original)
	}
}

func BenchmarkCloneMakeCopy(b *testing.B) {
	original := make([]int, 10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		dest := make([]int, len(original))
		copy(dest, original)
	}
}

func BenchmarkCloneAppendNil(b *testing.B) {
	original := make([]int, 10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = append([]int(nil), original...)
	}
}

Results: slices.Clone is usually fastest; append approaches are slower but still reasonable.

Merging Slices

Merging two slices is straightforward, but merging three or more requires careful allocation:

// Two slices: simple append
a := []int{1, 2}
b := []int{3, 4}
merged := append(append([]int(nil), a...), b...)

// Multiple slices: use slices.Concat (Go 1.22+)
c := []int{5, 6}
merged = slices.Concat(a, b, c)
// This pre-calculates total capacity and allocates once

// If slices.Concat unavailable, pre-allocate manually:
totalLen := len(a) + len(b) + len(c)
result := make([]int, 0, totalLen)
result = append(result, a...)
result = append(result, b...)
result = append(result, c...)
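On Go versions before 1.22, the manual pattern generalizes into a small helper (the name concatAll is illustrative, not standard library):

```go
package main

import "fmt"

// concatAll mirrors slices.Concat for older Go versions:
// it sums the lengths, allocates once, then appends each input.
func concatAll[T any](ss ...[]T) []T {
	total := 0
	for _, s := range ss {
		total += len(s)
	}
	out := make([]T, 0, total)
	for _, s := range ss {
		out = append(out, s...)
	}
	return out
}

func main() {
	merged := concatAll([]int{1, 2}, []int{3, 4}, []int{5, 6})
	fmt.Println(merged, cap(merged)) // [1 2 3 4 5 6] 6
}
```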

Insert Operation

Use slices.Insert for efficient insertion:

s := []int{1, 2, 4, 5}
s = slices.Insert(s, 2, 3) // Insert 3 at index 2
// Result: [1, 2, 3, 4, 5]

// Insert multiple elements
s = slices.Insert(s, 2, 10, 11, 12)
// Result: [1, 2, 10, 11, 12, 3, 4, 5]

slices.Insert reallocates only when capacity is exceeded and shifts elements with bulk copies rather than one element at a time.

Slice to Array Pointer Conversion

Go 1.17 introduced conversion from slice to array pointer. This is faster than copying:

slice := []byte{1, 2, 3, 4, 5}

// Convert to array pointer (zero-copy)
arrPtr := (*[5]byte)(slice)

// Now arrPtr points directly to slice's backing array
// No allocation, no copy
arrPtr[0] = 99
// slice[0] is now 99

// This is useful when you need to pass to functions expecting arrays
func ProcessArray(arr *[256]byte) {
	// ...
}

// Convert slice to array pointer if length is known
if len(sliceData) >= 256 {
	ProcessArray((*[256]byte)(sliceData))
}

WARNING: The conversion panics if the slice is shorter than the array length. The resulting pointer aliases the slice's current backing array; if the slice is later reallocated (for example, by an append that exceeds capacity), the pointer still refers to the old array and the two silently diverge.
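A sketch of that hazard: after append reallocates, the array pointer still sees the old backing array:

```go
package main

import "fmt"

func main() {
	slice := []byte{1, 2, 3, 4} // len=4, cap=4
	arr := (*[4]byte)(slice)    // aliases the current backing array
	// (*[5]byte)(slice) would panic: the slice is too short

	slice = append(slice, 5) // capacity exceeded: new backing array
	slice[0] = 99            // writes to the new array only

	fmt.Println(slice[0], arr[0]) // 99 1: the two have diverged
}
```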

Benchmark:

func BenchmarkSliceToArrayCopy(b *testing.B) {
	slice := make([]byte, 256)
	for i := range slice {
		slice[i] = byte(i)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		arr := [256]byte{}
		copy(arr[:], slice)
	}
}

func BenchmarkSliceToArrayPointer(b *testing.B) {
	slice := make([]byte, 256)
	for i := range slice {
		slice[i] = byte(i)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		arr := (*[256]byte)(slice)
		_ = arr
	}
}

Results: ArrayPointer is orders of magnitude faster (zero-copy).

Clearing Values Efficiently

The clear() Builtin

Go 1.21 introduced clear() for efficiently zeroing slices:

s := []int{1, 2, 3, 4, 5}
clear(s) // All elements become 0
// Equivalent to: for i := range s { s[i] = 0 }

// Works for maps too (clear deletes every entry)
m := map[string]int{"a": 1, "b": 2}
clear(m) // Empties the map

The clear() builtin compiles to a bulk memory clear (memclr) for slices. Note that the compiler also recognizes the manual for i := range s { s[i] = 0 } idiom and applies the same optimization, so measure before assuming a large win:

func BenchmarkClearManual(b *testing.B) {
	s := make([]int, 10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for j := range s {
			s[j] = 0
		}
	}
}

func BenchmarkClearBuiltin(b *testing.B) {
	s := make([]int, 10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		clear(s)
	}
}

Results: for simple element types the two are often comparable, because the compiler turns the manual idiom into memclr as well; clear() wins when the loop does not match the recognized pattern, and it is the more readable choice either way.

Capacity Management with Subslicing

When subslicing, you can control the capacity:

s := make([]int, 100)

// Full capacity subslice (capacity extends to original end)
sub1 := s[10:20]    // len=10, cap=90 (from index 10 to 100)

// Limited capacity subslice (capacity ends at index 30)
sub2 := s[10:20:30] // len=10, cap=20

// Zero-capacity subslice
sub3 := s[10:10:10] // len=0, cap=0

// Appending to sub1 can overwrite original
sub1 = append(sub1, 999)
// Original s[20] is now 999

// Appending to sub3 forces allocation
sub3 = append(sub3, 999)
// Doesn't touch original

This technique is essential when working with overlapping slices:

func ProcessChunks(data []byte, chunkSize int) {
	for i := 0; i < len(data); i += chunkSize {
		end := i + chunkSize
		if end > len(data) {
			end = len(data)
		}
		// Clip capacity to prevent accidental overlap
		chunk := data[i:end:end]
		processChunk(chunk)
	}
}
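A runnable sketch of the chunking pattern, printing each clipped chunk (the input and chunk size are illustrative):

```go
package main

import "fmt"

func main() {
	data := []byte("abcdefgh")
	chunkSize := 3

	for i := 0; i < len(data); i += chunkSize {
		end := i + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[i:end:end] // clipped: cap(chunk) == len(chunk)
		fmt.Printf("%s len=%d cap=%d\n", chunk, len(chunk), cap(chunk))
	}
	// Output:
	// abc len=3 cap=3
	// def len=3 cap=3
	// gh len=2 cap=2
}
```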

Using Index Tables for Performance

Instead of if-chains or switch statements, lookup tables are extremely fast:

// Traditional approach: chain of if statements
func IsWhitespace(b byte) bool {
	return b == ' ' || b == '\t' || b == '\n' || b == '\r'
}

// Optimized approach: lookup table
var whitespaceTable = [256]bool{
	' ':  true,
	'\t': true,
	'\n': true,
	'\r': true,
}

func IsWhitespaceTable(b byte) bool {
	return whitespaceTable[b]
}

// Even better: bit table (256 bytes -> 32 bytes)
var whitespaceSet = [32]uint8{
	1: 0b0010_0110, // bits for '\t' (9), '\n' (10), '\r' (13)
	4: 0b0000_0001, // bit for ' ' (32)
}

func IsWhitespaceBitTable(b byte) bool {
	return whitespaceSet[b>>3]&(1<<(b&7)) != 0
}
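Before trusting a hand-packed bit table, verify it against the plain predicate over the full byte range; with only 256 values, an exhaustive check is cheap. A self-contained sketch (identifiers local to this example):

```go
package main

import "fmt"

var bitSet = [32]uint8{
	1: 0b0010_0110, // bits 1, 2, 5 within byte 1: values 9, 10, 13
	4: 0b0000_0001, // bit 0 within byte 4: value 32
}

func isWhitespaceBit(b byte) bool {
	return bitSet[b>>3]&(1<<(b&7)) != 0
}

func main() {
	mismatches := 0
	for i := 0; i < 256; i++ {
		b := byte(i)
		want := b == ' ' || b == '\t' || b == '\n' || b == '\r'
		if isWhitespaceBit(b) != want {
			mismatches++
		}
	}
	fmt.Println("mismatches:", mismatches) // mismatches: 0
}
```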

Benchmark:

var input = []byte("The quick brown fox jumps over the lazy dog")

func BenchmarkIsWhitespaceChain(b *testing.B) {
	count := 0
	for i := 0; i < b.N; i++ {
		for _, ch := range input {
			if IsWhitespace(ch) {
				count++
			}
		}
	}
}

func BenchmarkIsWhitespaceTable(b *testing.B) {
	count := 0
	for i := 0; i < b.N; i++ {
		for _, ch := range input {
			if whitespaceTable[ch] {
				count++
			}
		}
	}
}

Results: Table lookup is typically several times faster: a single indexed load replaces a chain of compares and branches, and a 256-byte table stays resident in L1 cache.

Preallocating Buffers

For code that processes many items, preallocate and reuse buffers:

// Poor: new buffer per item
func ProcessItems(items []Item) []Result {
	var results []Result
	for _, item := range items {
		// Buffer allocated here, discarded after
		buf := make([]byte, 1024)
		result := processItem(item, buf)
		results = append(results, result)
	}
	return results
}

// Better: single reusable buffer
func ProcessItemsOptimized(items []Item) []Result {
	results := make([]Result, 0, len(items))
	buf := make([]byte, 1024)
	for _, item := range items {
		// Reuse same buffer
		clear(buf)
		result := processItem(item, buf)
		results = append(results, result)
	}
	return results
}

Avoiding Large Array Literals as Operands

Array literals used as comparison operands may be constructed at each evaluation, which is costly when the arrays are large:

// Inefficient: array literal created, copied for comparison
func IsValidCode(code [4]byte) bool {
	return code == [4]byte{'A', 'B', 'C', 'D'}
}

// Better: package-level variable, initialized once
// (arrays cannot be constants in Go)
var validCode = [4]byte{'A', 'B', 'C', 'D'}

func IsValidCodeVar(code [4]byte) bool {
	return code == validCode
}

// Or: use string comparison
func IsValidCodeStr(code string) bool {
	return code == "ABCD"
}

Common Pitfall: Range Loop with Large Elements

Remember from the previous article: don't use the second iteration variable for large elements:

type Event struct {
	ID    int64
	Data  [512]byte
}

events := make([]Event, 1000)

// Slow: copies Event (520 bytes) 1000 times
for _, e := range events {
	processEvent(&e)
}

// Fast: only pointer copies
for i := range events {
	processEvent(&events[i])
}

Summary and Key Takeaways

  1. Pre-allocate slices: Know the final size? Use make([]T, 0, capacity).

  2. Use slices.Clone, Concat, Insert: Modern Go provides optimized functions for common operations.

  3. Clip capacity when needed: Use the third index in slicing to prevent unintended data sharing.

  4. Use clear() for zeroing: Faster and more idiomatic than manual loops.

  5. Use lookup tables: For byte classification or small domain mapping, arrays are orders of magnitude faster than if-chains.

  6. Reuse buffers: Don't allocate new buffers in tight loops; allocate once and clear.

  7. Use array pointers: When converting slice to array pointer is safe, zero-copy conversion is extremely fast.

  8. Profile and measure: Slice operations seem simple, but they hide complex allocation patterns. Use benchmarks to verify improvements.

The most impactful optimization is often pre-allocating slices with correct capacity. A single line—results := make([]Result, 0, len(items))—can reduce allocations from O(log n) to O(1) and improve performance by 50-80%.
