
Channel Internals

Explore the internal structure of Go channels, including the hchan struct, ring buffer implementation, goroutine scheduling, and performance characteristics.

Introduction

Channels are Go's primary tool for communication between goroutines. While their surface API is simple—send, receive, close—the implementation is sophisticated. This article explores the internal hchan structure, the ring buffer algorithm, goroutine parking and unparking, and performance implications.

The hchan Structure

The core channel data structure is hchan, defined in the runtime:

// runtime/chan.go (simplified)
type hchan struct {
    qcount   uint           // number of elements queued in buffer
    dataqsiz uint           // size of the circular buffer
    buf      unsafe.Pointer // points to the circular buffer array
    elemsize uint16         // size of each element
    closed   uint32         // 0 = open, 1 = closed
    elemtype *_type         // type of elements
    sendx    uint           // send index in the circular buffer
    recvx    uint           // receive index in the circular buffer
    recvq    waitq          // queue of goroutines waiting to receive
    sendq    waitq          // queue of goroutines waiting to send
    lock     mutex          // protects the above fields
}

Breaking this down:

  • qcount: current number of elements in the buffer
  • dataqsiz: capacity of the buffer (0 for unbuffered channels)
  • buf: pointer to the allocated circular buffer
  • elemsize, elemtype: metadata about the element type
  • closed: channel state flag
  • sendx, recvx: read/write pointers for the circular buffer
  • recvq, sendq: doubly-linked queues of waiting goroutines
  • lock: mutex protecting all above fields

Ring Buffer Implementation

Buffered channels use a circular (ring) buffer. The sendx and recvx pointers track the write and read positions.

Visualization: Ring Buffer

Channel with capacity 4, 2 elements:

            sendx = 2
            |
            v
  +---+---+---+---+
  | X | X | . | . |
  +---+---+---+---+
    ^
    |
    recvx = 0

qcount = 2, dataqsiz = 4

As elements are sent and received:

  1. Sender writes to buf[sendx] and increments sendx (wrapping to 0 at dataqsiz)
  2. Receiver reads from buf[recvx] and increments recvx (wrapping the same way)
  3. When sendx == recvx, the buffer is either full (qcount == dataqsiz) or empty (qcount == 0)

Code Example: Manual Ring Buffer

package main

import (
    "fmt"
    "sync/atomic"
)

// Simplified circular buffer
type SimpleRingBuffer struct {
    buf      []int
    sendx    int
    recvx    int
    qcount   int
    capacity int
    lock     int32 // simplified lock
}

func (rb *SimpleRingBuffer) Send(val int) bool {
    for !atomic.CompareAndSwapInt32(&rb.lock, 0, 1) {
        // Spinlock
    }
    defer atomic.StoreInt32(&rb.lock, 0)

    if rb.qcount >= rb.capacity {
        return false // Buffer full
    }

    rb.buf[rb.sendx] = val
    rb.sendx = (rb.sendx + 1) % rb.capacity
    rb.qcount++
    return true
}

func (rb *SimpleRingBuffer) Receive() (int, bool) {
    for !atomic.CompareAndSwapInt32(&rb.lock, 0, 1) {
        // Spinlock
    }
    defer atomic.StoreInt32(&rb.lock, 0)

    if rb.qcount == 0 {
        return 0, false // Buffer empty
    }

    val := rb.buf[rb.recvx]
    rb.recvx = (rb.recvx + 1) % rb.capacity
    rb.qcount--
    return val, true
}

func main() {
    rb := &SimpleRingBuffer{
        buf:      make([]int, 4),
        capacity: 4,
    }

    rb.Send(10)
    rb.Send(20)
    rb.Send(30)

    v1, _ := rb.Receive()
    v2, _ := rb.Receive()
    v3, _ := rb.Receive()

    fmt.Printf("Received: %d, %d, %d\n", v1, v2, v3)
}

Output:

Received: 10, 20, 30

The sudog Structure: Goroutine Waiting

When a goroutine blocks on a channel, it's represented by a sudog (a "pseudo-g"), a record of a goroutine waiting in a queue:

// runtime/runtime2.go (simplified)
type sudog struct {
    g *g                  // the goroutine
    next *sudog           // next in list
    prev *sudog           // prev in list
    waitlink *sudog       // for select
    c *hchan              // the channel
    elem unsafe.Pointer   // pointer to the data being sent/received
    releasetime int64     // when should this sudog be released?
}

A sudog is acquired from a per-P cache backed by a central pool (acquireSudog/releaseSudog), so blocking operations avoid a fresh heap allocation on the hot path.

Send Operation Path

A send operation ch <- value follows this path:

Step 1: Lock Acquisition

lock(ch.lock)

Step 2: Check for Waiting Receivers

If ch.recvq is not empty (a receiver is waiting):

// Direct copy from sender's stack to receiver's stack
// No buffer involved!
memmove(receiver.elem, sender.elem, elemsize)
goready(receiver.g)  // Wake receiver
unlock(ch.lock)
return

This is the direct send optimization—one less memory copy.

Step 3: Check for Buffer Space

If the buffer has space:

// Copy to buffer
memmove(buf[sendx], sender.elem, elemsize)
sendx = (sendx + 1) % dataqsiz
qcount++
unlock(ch.lock)
return

Step 4: Goroutine Parks

If both above fail (no waiting receiver, buffer full):

// Create sudog
sudoG := acquireSudog()
sudoG.elem = addressOfValue
sudoG.g = currentG
sudoG.c = ch
// Add to send queue
ch.sendq.enqueue(sudoG)
// Park this goroutine
gopark(unlock, &ch.lock)
// When woken:
releaseSudog(sudoG)

Receive Operation Path

A receive operation value := <-ch mirrors the send path:

Step 1: Lock Acquisition

lock(ch.lock)

Step 2: Check for Waiting Senders

If ch.sendq is not empty, what happens depends on the buffer:

// Unbuffered: direct copy from the sender's stack
// to the receiver's variable; no buffer involved
memmove(receiver.elem, sender.elem, elemsize)

// Buffered (a queued sender implies the buffer is full):
// the receiver takes buf[recvx], and the waiting sender's
// value goes into the freed slot at buf[sendx]

goready(sender.g)  // Wake sender in either case
unlock(ch.lock)
return

Step 3: Check Buffer

If buffer has data:

// Copy from buffer
memmove(receiver.elem, buf[recvx], elemsize)
recvx = (recvx + 1) % dataqsiz
qcount--
unlock(ch.lock)
return

Step 4: Goroutine Parks

Otherwise, park and wait.

The Direct Send Optimization

This is one of Go's clever optimizations. When a receiver is waiting, the sender copies data directly to the receiver's stack, bypassing the buffer entirely:

// Sender: value := 42
// Receiver: x := <- ch

// Without receiver waiting: 42 -> buffer -> x
// With receiver waiting:   42 -> x (direct!)

This saves a memory copy: the value never passes through the buffer at all.

Demonstration: Unbuffered Rendezvous

package main

import (
    "testing"
    "time"
)

func BenchmarkChannelDirectTransfer(b *testing.B) {
    ch := make(chan int) // Unbuffered: forces direct transfer

    go func() {
        for {
            <-ch
        }
    }()

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ch <- i
    }
}

func BenchmarkChannelWithBuffer(b *testing.B) {
    ch := make(chan int, 1000) // Buffered: may use buffer

    go func() {
        for {
            <-ch
        }
    }()

    time.Sleep(10 * time.Millisecond) // Let goroutine start
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ch <- i
    }
}

Run these with go test -bench=. from a _test.go file; no main function is needed.

Typical results (illustrative; exact numbers vary by hardware and Go version):

BenchmarkChannelDirectTransfer-8     50000000  25 ns/op
BenchmarkChannelWithBuffer-8        100000000  12 ns/op

Unbuffered sends are roughly twice as slow here: every send must rendezvous with a receiver, which frequently parks and wakes goroutines, while buffered sends usually just copy into the buffer and return.

The Mutex Lock: Channel Bottleneck

Every send and receive operation acquires ch.lock, a mutex. This is the primary bottleneck for channel performance:

// Simplified send
func chansend(ch *hchan, ep unsafe.Pointer) {
    lock(&ch.lock)          // <-- Expensive
    // ... perform send ...
    unlock(&ch.lock)        // <-- Expensive
}

This is why channels are slower than alternatives for simple operations (rough uncontended figures; they vary by hardware):

  • Mutex: ~20 ns per lock/unlock pair
  • Atomic: ~5 ns per operation
  • Channel: ~50-100 ns per operation (mutex + memory copy + possible scheduling)

Unbuffered Channels

An unbuffered channel has dataqsiz == 0 and no allocated buffer:

ch := make(chan int) // dataqsiz = 0, buf = nil

// Every send must rendezvous with a receive
// Direct copy from sender to receiver

Unbuffered channels force synchronization—each send blocks until a receiver consumes the value.

The select Statement Internals

A select statement in Go compiles to a call to runtime.selectgo():

select {
case v := <-ch1:
    // ...
case ch2 <- w:
    // ...
case <-ch3:
    // ...
default:
    // ...
}

The selectgo() function:

  1. Randomizes case order to prevent starvation (if multiple cases are ready, one is chosen pseudo-randomly)
  2. Acquires channel locks in a consistent order (sorted by address) to prevent deadlock
  3. Polls the cases to find one that is ready and, if so, executes it immediately
  4. If none is ready and there is no default, enqueues a sudog on every channel and parks the goroutine
  5. On wakeup, dequeues itself from the other channels' wait queues and runs the ready case

Example: select with Multiple Channels

package main

import (
    "fmt"
    "time"
)

func main() {
    ch1 := make(chan string)
    ch2 := make(chan string)

    go func() {
        time.Sleep(100 * time.Millisecond)
        ch1 <- "one"
    }()

    go func() {
        time.Sleep(200 * time.Millisecond)
        ch2 <- "two"
    }()

    for i := 0; i < 2; i++ {
        select {
        case msg1 := <-ch1:
            fmt.Println("Received from ch1:", msg1)
        case msg2 := <-ch2:
            fmt.Println("Received from ch2:", msg2)
        }
    }
}

Output:

Received from ch1: one
Received from ch2: two

Closing a Channel

Closing a channel (close(ch)) sets the closed flag:

func closechan(ch *hchan) {
    lock(&ch.lock)

    if ch.closed != 0 {
        unlock(&ch.lock)
        panic("close of closed channel")
    }

    ch.closed = 1

    // Wake all receivers
    for {
        sg := ch.recvq.dequeue()
        if sg == nil {
            break
        }
        if sg.elem != nil {
            typedmemclr(ch.elemtype, sg.elem)  // Zero value
        }
        goready(sg.g)
    }

    // Waiting senders are also dequeued and woken;
    // they panic ("send on closed channel") when they resume

    unlock(&ch.lock)
}

Key behaviors:

  • Receivers first drain any buffered values; after that, receives return the zero value immediately, with ok == false in the v, ok := <-ch form
  • Senders panic if they send on a closed channel
  • Closing an already-closed channel panics

Nil Channels

Sending or receiving on a nil channel blocks forever:

var ch chan int  // nil

ch <- 1  // Blocks forever
<-ch     // Blocks forever

This is useful in select statements to dynamically disable a case:

func producer(ch chan int) {
    for i := 0; i < 10; i++ {
        ch <- i
    }
    close(ch)
}

func consumer() {
    ch := make(chan int)
    go producer(ch)

    var sendCh chan int // nil: the send case below can never fire
    receiveCh := ch

    for receiveCh != nil {
        select {
        case val, ok := <-receiveCh:
            if !ok {
                receiveCh = nil // closed: disable this case and exit
                continue
            }
            fmt.Println("Received:", val)
        case sendCh <- 42:
            // Never triggers while sendCh is nil
        }
    }
}

Performance Characteristics

Latency: Operation Time

Approximate figures (they vary by hardware, contention, and Go version):

Operation              Latency
Unbuffered send/recv   ~100 ns
Buffered send (space)  ~50 ns
Buffered recv (data)   ~50 ns
Type assertion         ~5 ns
Mutex lock/unlock      ~20 ns
Atomic operation       ~5 ns

Throughput Benchmark

package main

import (
    "testing"
)

func BenchmarkChannelThroughput(b *testing.B) {
    b.Run("Unbuffered", func(b *testing.B) {
        ch := make(chan int)
        go func() {
            for range ch {
            }
        }()

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            ch <- i
        }
        close(ch)
    })

    b.Run("BufferSize1", func(b *testing.B) {
        ch := make(chan int, 1)
        go func() {
            for range ch {
            }
        }()

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            ch <- i
        }
        close(ch)
    })

    b.Run("BufferSize10", func(b *testing.B) {
        ch := make(chan int, 10)
        go func() {
            for range ch {
            }
        }()

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            ch <- i
        }
        close(ch)
    })

    b.Run("BufferSize100", func(b *testing.B) {
        ch := make(chan int, 100)
        go func() {
            for range ch {
            }
        }()

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            ch <- i
        }
        close(ch)
    })

    b.Run("BufferSize1000", func(b *testing.B) {
        ch := make(chan int, 1000)
        go func() {
            for range ch {
            }
        }()

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            ch <- i
        }
        close(ch)
    })
}

Run the benchmarks with go test -bench=. from a _test.go file; no main function is needed.

Typical output (illustrative):

BenchmarkChannelThroughput/Unbuffered-8       50000000  22 ns/op
BenchmarkChannelThroughput/BufferSize1-8     100000000  15 ns/op
BenchmarkChannelThroughput/BufferSize10-8    150000000  12 ns/op
BenchmarkChannelThroughput/BufferSize100-8   200000000  10 ns/op
BenchmarkChannelThroughput/BufferSize1000-8  200000000   9 ns/op

Larger buffers let more sends complete without parking, reducing lock contention and scheduler round-trips and improving throughput.

When to Use Channels vs Alternatives

Use Channels For:

  • Goroutine communication — Channels are the idiomatic way
  • Signaling — Done channels, timeout channels
  • Work distribution — Worker pool patterns
// Good use: communicating between goroutines
func worker(jobs <-chan Job, results chan<- Result) {
    for job := range jobs {
        results <- process(job)
    }
}

Use Mutexes For:

  • Protecting shared state — Concurrent map access, counters
  • Fine-grained locking — Performance-critical sections
// Good use: protecting a map
type Cache struct {
    mu    sync.RWMutex
    items map[string]string
}

func (c *Cache) Set(key, value string) {
    c.mu.Lock()
    c.items[key] = value
    c.mu.Unlock()
}

Use Atomics For:

  • Simple counters — Goroutine-safe increment/decrement
  • Flags — Done signals, state flags
// Good use: atomic counter
var requests int64

func increment() {
    atomic.AddInt64(&requests, 1)
}

Summary

Channels are sophisticated primitives with well-designed internals:

  • hchan structure manages a circular buffer and waiting goroutines
  • Ring buffer with sendx and recvx pointers
  • Direct send optimization copies between sender and receiver stacks
  • Lock protects all operations (bottleneck at high concurrency)
  • select randomizes cases and parks goroutines on multiple channels
  • Nil channels useful for disabling select cases
  • Unbuffered channels force rendezvous; buffered channels decouple sender and receiver
  • Mutex is faster for simple operations; channels are faster for communication patterns

Understanding these internals helps you design more efficient concurrent Go programs.
