Resilient Connection Handling
Master resilience patterns in Go to prevent cascading failures and maximize throughput under adverse conditions
Resilient connection handling is the cornerstone of high-performance distributed systems. When a downstream service degrades, a naive client that continues hammering it with requests destroys both systems. This guide covers proven patterns to protect your Go services: circuit breakers, backpressure, load shedding, bulkheads, and intelligent retries.
Why Resilience is a Performance Concern
Performance optimization often focuses on throughput and latency. But in distributed systems, resilience directly impacts performance. Here's why:
Cascading Failures
When a downstream service becomes unhealthy, clients that keep sending requests waste resources:
- Threads/goroutines pile up waiting for timeouts, consuming memory
- Connection pools exhaust as connections hang, starving other requests
- Upstream services degrade because their clients become slow or unresponsive
- Latency spikes propagate backward through the call chain
A single unhealthy service can collapse an entire cluster.
The Cost of Naive Retry
Retrying without circuit breaking creates a retry storm:
Client 1: GET /api → 500ms timeout → retry → timeout → retry...
Client 2: GET /api → 500ms timeout → retry → timeout → retry...
Client 3: ...

In seconds, a degraded service receives 10x normal traffic, accelerating its death.
Performance Under Failure
Without resilience patterns:
- Throughput under failure: 50 req/sec (most blocked)
- Latency under failure: 5-30 seconds (timeout waiting)
- Recovery time: 5+ minutes (clients have no fast way to detect the service is healthy again)
With proper resilience:
- Throughput under failure: 1000 req/sec (shedding load gracefully)
- Latency under failure: under 100ms (fast-fail)
- Recovery time: under 10 seconds (auto-detect health)
Circuit Breaker Pattern
The circuit breaker prevents cascading failures by stopping request flow when a service is unhealthy.
States and Transitions
A circuit breaker has three states:
- Closed: Normal operation, requests pass through
- Open: Service is failing, requests are rejected immediately
- Half-Open: Testing if service recovered, allowing trial requests
Closed --[failure threshold exceeded]--> Open
   ^                                       |
   | [trial request succeeds]              | [timeout elapsed]
   |                                       v
   +---------------------------------- Half-Open
                                 (allows trial requests)

Using sony/gobreaker
The github.com/sony/gobreaker library provides a robust circuit breaker implementation.
package main
import (
"fmt"
"io"
"net/http"
"time"
"github.com/sony/gobreaker"
)
func main() {
// Create a circuit breaker with custom settings
cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
Name: "API",
MaxRequests: 3, // Allow 3 requests in half-open state
Interval: time.Second * 10, // Clear failure counts every 10 seconds while closed
Timeout: time.Second * 30, // Stay open for 30 seconds before half-open
ReadyToTrip: readyToTrip, // Custom trip logic
OnStateChange: onStateChange, // Observe state changes
})
// Use the circuit breaker
result, err := cb.Execute(func() (interface{}, error) {
return callDownstreamAPI()
})
if err != nil {
handleError(err) // Could be gobreaker.ErrOpenState
}
fmt.Println(result)
}
// Custom logic: trip after 50% failures in last 100 requests
func readyToTrip(counts gobreaker.Counts) bool {
failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
return counts.Requests >= 100 && failureRatio >= 0.5
}
// Monitor state changes for observability
func onStateChange(name string, from gobreaker.State, to gobreaker.State) {
fmt.Printf("Circuit breaker %s: %s -> %s\n", name, from, to)
}
func callDownstreamAPI() (interface{}, error) {
resp, err := http.Get("http://downstream/api")
if err != nil {
return nil, err
}
defer resp.Body.Close()
if resp.StatusCode >= 500 {
return nil, fmt.Errorf("server error: %d", resp.StatusCode)
}
// Read before the deferred Close runs; returning resp.Body directly
// would hand the caller an already-closed reader
body, err := io.ReadAll(resp.Body)
if err != nil {
return nil, err
}
return string(body), nil
}
func handleError(err error) {
if err == gobreaker.ErrOpenState {
fmt.Println("Circuit is open, rejecting request")
} else {
fmt.Println("Request failed:", err)
}
}

Circuit Breaker Configuration
Key settings for your use case:
- MaxRequests: How many requests to allow in half-open state (3-10 typical)
- Interval: Reset counter this often (10-30 seconds)
- Timeout: How long to stay open before trying again (10-60 seconds)
- ReadyToTrip: Customize failure detection (error rate, error count, latency)
Tip: Set Timeout much longer than Interval. If a service is failing, give it time to recover before testing.
Custom Circuit Breaker Implementation
For educational purposes or special requirements, implement a circuit breaker from scratch:
package resilience
import (
"fmt"
"sync"
"time"
)
type State int
const (
StateClosed State = iota
StateOpen
StateHalfOpen
)
type CircuitBreaker struct {
mu sync.RWMutex
state State
lastFailTime time.Time
failureCount int64
successCount int64
maxFailures int64
resetTimeout time.Duration
lastStateChange time.Time
}
func NewCircuitBreaker(maxFailures int64, timeout time.Duration) *CircuitBreaker {
return &CircuitBreaker{
state: StateClosed,
maxFailures: maxFailures,
resetTimeout: timeout,
}
}
func (cb *CircuitBreaker) Call(fn func() error) error {
cb.mu.Lock()
defer cb.mu.Unlock() // Note: holding the lock while fn runs serializes all calls; acceptable for a teaching example
// Check if we should transition to half-open
if cb.state == StateOpen {
if time.Since(cb.lastStateChange) > cb.resetTimeout {
cb.setState(StateHalfOpen)
} else {
return fmt.Errorf("circuit breaker open")
}
}
// Execute the function
err := fn()
if err != nil {
cb.failureCount++
cb.lastFailTime = time.Now()
// Transition to open if thresholds exceeded
if cb.state == StateClosed && cb.failureCount >= cb.maxFailures {
cb.setState(StateOpen)
} else if cb.state == StateHalfOpen {
// One failure while half-open, back to open
cb.setState(StateOpen)
}
return err
}
// Success
cb.successCount++
if cb.state == StateHalfOpen {
// Healthy, close the circuit
cb.setState(StateClosed)
cb.failureCount = 0
cb.successCount = 0
}
return nil
}
func (cb *CircuitBreaker) setState(newState State) {
if cb.state != newState {
fmt.Printf("CB transition: %v -> %v\n", cb.state, newState)
cb.state = newState
cb.lastStateChange = time.Now()
}
}
func (cb *CircuitBreaker) State() State {
cb.mu.RLock()
defer cb.mu.RUnlock()
return cb.state
}

Usage:
cb := NewCircuitBreaker(5, 30*time.Second)
err := cb.Call(func() error {
return callAPI()
})

Backpressure and Rate Limiting
Backpressure protects downstream services by limiting how much work we send them.
golang.org/x/time/rate Limiter
The standard Go rate limiter uses the token bucket algorithm:
package main
import (
"context"
"fmt"
"time"
"golang.org/x/time/rate"
)
func main() {
// Create a limiter: 100 requests per second, burst of 10
limiter := rate.NewLimiter(100, 10)
// Option 1: Blocking (wait until we can take a token)
for i := 0; i < 50; i++ {
if err := limiter.Wait(context.Background()); err != nil {
fmt.Println("Context cancelled:", err)
break
}
go handleRequest(i)
}
// Option 2: Non-blocking (check if token available)
if limiter.Allow() {
fmt.Println("Request allowed")
} else {
fmt.Println("Rate limit exceeded, reject")
}
// Option 3: Reserve tokens for batch operations
reservation := limiter.ReserveN(time.Now(), 5)
if !reservation.OK() {
fmt.Println("Cannot reserve 5 tokens")
return
}
time.Sleep(reservation.Delay()) // Wait if needed
fmt.Println("Processing batch of 5 requests")
}
func handleRequest(id int) {
fmt.Printf("Request %d processed\n", id)
}

Custom Backpressure with Semaphore
Control concurrent work with a weighted semaphore:
package main
import (
"context"
"fmt"
"golang.org/x/sync/semaphore"
"sync"
)
func main() {
// Semaphore limits concurrent requests to downstream service
sem := semaphore.NewWeighted(int64(100))
var wg sync.WaitGroup
for i := 0; i < 1000; i++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
// Acquire a permit
if err := sem.Acquire(context.Background(), 1); err != nil {
fmt.Println("Failed to acquire permit")
return
}
defer sem.Release(1)
// Process the request
fmt.Printf("Processing request %d\n", id)
}(i)
}
wg.Wait()
}

Tip: Semaphores are ideal for connection pool management. Set the weight equal to your connection pool size.
Load Shedding
Under extreme load, shed requests gracefully instead of queuing them indefinitely.
package main
import (
"fmt"
"net/http"
"sync/atomic"
"time"
)
type LoadShedder struct {
maxConcurrent int64
current int64
rejectedCount int64
}
func NewLoadShedder(maxConcurrent int64) *LoadShedder {
return &LoadShedder{
maxConcurrent: maxConcurrent,
}
}
// Middleware that sheds load
func (ls *LoadShedder) Middleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
current := atomic.AddInt64(&ls.current, 1)
defer atomic.AddInt64(&ls.current, -1)
if current > ls.maxConcurrent {
atomic.AddInt64(&ls.rejectedCount, 1)
w.Header().Set("Retry-After", "5")
http.Error(w, "Service overloaded", http.StatusServiceUnavailable)
return
}
next.ServeHTTP(w, r)
})
}
func (ls *LoadShedder) Stats() (int64, int64) {
return atomic.LoadInt64(&ls.current), atomic.LoadInt64(&ls.rejectedCount)
}
func main() {
shedder := NewLoadShedder(100)
mux := http.NewServeMux()
mux.HandleFunc("/api", func(w http.ResponseWriter, r *http.Request) {
time.Sleep(100 * time.Millisecond)
w.WriteHeader(http.StatusOK)
})
http.ListenAndServe(":8080", shedder.Middleware(mux))
}

Key aspects:
- Track concurrent requests atomically
- Return 503 Service Unavailable when limit exceeded
- Include Retry-After header to tell clients when to retry
- Fast rejection wastes no resources, unlike making callers wait out a slow timeout
Retry with Exponential Backoff and Jitter
Simple retries amplify failure. Exponential backoff with jitter prevents retry storms.
Implementation from Scratch
package main
import (
"fmt"
"math"
"math/rand"
"time"
)
type RetryConfig struct {
MaxAttempts int
InitialWait time.Duration
MaxWait time.Duration
Multiplier float64
}
func DefaultRetryConfig() RetryConfig {
return RetryConfig{
MaxAttempts: 5,
InitialWait: 100 * time.Millisecond,
MaxWait: 10 * time.Second,
Multiplier: 2.0,
}
}
func RetryWithBackoff(cfg RetryConfig, fn func() error) error {
var lastErr error
for attempt := 0; attempt < cfg.MaxAttempts; attempt++ {
err := fn()
if err == nil {
return nil // Success
}
lastErr = err
if attempt == cfg.MaxAttempts-1 {
break // Last attempt
}
// Calculate backoff with jitter
wait := time.Duration(math.Min(
float64(cfg.MaxWait),
float64(cfg.InitialWait)*math.Pow(cfg.Multiplier, float64(attempt)),
))
// Add random jitter: up to 50% of the base wait
jitter := time.Duration(rand.Int63n(int64(wait / 2)))
totalWait := wait + jitter
fmt.Printf("Attempt %d failed, retrying after %v\n", attempt+1, totalWait)
time.Sleep(totalWait)
}
return fmt.Errorf("all %d attempts failed: %w", cfg.MaxAttempts, lastErr)
}
func main() {
cfg := DefaultRetryConfig()
cfg.MaxAttempts = 4
err := RetryWithBackoff(cfg, func() error {
// Simulate API call
return fmt.Errorf("temporary failure")
})
fmt.Println("Final result:", err)
}

Using cenkalti/backoff
For production, use a well-tested library:
package main
import (
"fmt"
"time"
"github.com/cenkalti/backoff/v4"
)
func main() {
exp := backoff.NewExponentialBackOff()
exp.InitialInterval = 100 * time.Millisecond
exp.MaxInterval = 10 * time.Second
exp.MaxElapsedTime = 1 * time.Minute
// Retry with timeout
err := backoff.RetryNotify(
func() error {
return callAPI()
},
exp,
func(err error, d time.Duration) {
fmt.Printf("Retry after %v due to: %v\n", d, err)
},
)
if err != nil {
fmt.Println("Failed after retries:", err)
}
}
func callAPI() error {
// Your API call
return nil
}

Critical: Always include jitter. Without it, synchronized retries from multiple clients create thundering herds that re-crash the service.
Bulkhead Pattern
Isolate resources per-service to prevent one failing service from starving others.
package main
import (
"fmt"
"golang.org/x/sync/semaphore"
"context"
"sync"
"time"
)
type BulkheadPool struct {
services map[string]*semaphore.Weighted
mu sync.RWMutex
}
func NewBulkheadPool() *BulkheadPool {
return &BulkheadPool{
services: make(map[string]*semaphore.Weighted),
}
}
// Register a service with its resource limit
func (bp *BulkheadPool) Register(serviceName string, maxConcurrent int64) {
bp.mu.Lock()
defer bp.mu.Unlock()
bp.services[serviceName] = semaphore.NewWeighted(maxConcurrent)
}
// Execute a call with bulkhead isolation
func (bp *BulkheadPool) Execute(ctx context.Context, serviceName string, fn func() error) error {
bp.mu.RLock()
sem, ok := bp.services[serviceName]
bp.mu.RUnlock()
if !ok {
return fmt.Errorf("unknown service: %s", serviceName)
}
// Acquire a slot
if err := sem.Acquire(ctx, 1); err != nil {
return fmt.Errorf("bulkhead rejected request: %w", err)
}
defer sem.Release(1)
return fn()
}
func main() {
pool := NewBulkheadPool()
pool.Register("user-service", 50)
pool.Register("order-service", 100)
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
// These services won't starve each other
err := pool.Execute(ctx, "user-service", func() error {
fmt.Println("Calling user-service")
return nil
})
fmt.Println("Result:", err)
}

Advantages:
- Service A failing doesn't consume all resources
- Service B continues with its allocated pool
- Fair resource distribution across multiple services
Timeout Cascades
Context propagation prevents timeout leaks through your call chain.
Correct Timeout Propagation
package main
import (
"context"
"fmt"
"net/http"
"time"
)
// Handler receives a 5-second timeout
func handleRequest(ctx context.Context, r *http.Request) error {
// Create derived context with shorter timeout for each downstream call
// Reserve time for work at this level
deadline, ok := ctx.Deadline()
if !ok {
return fmt.Errorf("no deadline")
}
remaining := time.Until(deadline)
fmt.Printf("Remaining time: %v\n", remaining)
// Allocate time budget: 1s for this level, rest for downstream
thisLevelBudget := 1 * time.Second
downstreamBudget := remaining - thisLevelBudget
if downstreamBudget <= 0 {
return fmt.Errorf("insufficient time remaining")
}
// Create context for downstream call with proper deadline
downstreamCtx, cancel := context.WithTimeout(ctx, downstreamBudget)
defer cancel()
return callDownstream(downstreamCtx)
}
func callDownstream(ctx context.Context) error {
// This context has proper timeout inherited
select {
case <-time.After(100 * time.Millisecond):
fmt.Println("Downstream completed")
return nil
case <-ctx.Done():
return ctx.Err()
}
}
func main() {
// Entry point: 5-second timeout
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
err := handleRequest(ctx, nil)
fmt.Println("Error:", err)
}

Common Mistakes
// WRONG: Each timeout starts fresh (total could be 5 + 5 + 5 = 15s)
ctx1, _ := context.WithTimeout(context.Background(), 5*time.Second)
callA(ctx1)
ctx2, _ := context.WithTimeout(context.Background(), 5*time.Second)
callB(ctx2)
// CORRECT: Derived contexts share the parent deadline
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
ctxA, cancelA := context.WithTimeout(ctx, 2*time.Second) // Can't exceed 5s
defer cancelA()
callA(ctxA)
ctxB, cancelB := context.WithTimeout(ctx, 2*time.Second) // Still bounded by 5s
defer cancelB()
callB(ctxB)

Health Checks and Readiness Probes
Graceful degradation requires knowing when to shed load.
package main
import (
"context"
"fmt"
"net/http"
"sync"
"sync/atomic"
"time"
)
type HealthChecker struct {
mu sync.RWMutex
healthy bool
lastCheckTime time.Time
failureCount int64
consecutiveFail int64
checkInterval time.Duration
}
func NewHealthChecker(interval time.Duration) *HealthChecker {
hc := &HealthChecker{
healthy: true,
checkInterval: interval,
}
go hc.periodicCheck()
return hc
}
func (hc *HealthChecker) periodicCheck() {
ticker := time.NewTicker(hc.checkInterval)
defer ticker.Stop()
for range ticker.C {
if hc.checkDownstream() {
atomic.StoreInt64(&hc.consecutiveFail, 0)
hc.setHealthy(true)
} else {
count := atomic.AddInt64(&hc.consecutiveFail, 1)
atomic.AddInt64(&hc.failureCount, 1)
// Mark unhealthy after 3 consecutive failures
if count >= 3 {
hc.setHealthy(false)
}
}
}
}
func (hc *HealthChecker) checkDownstream() bool {
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
req, _ := http.NewRequestWithContext(ctx, "GET", "http://downstream/health", nil)
resp, err := http.DefaultClient.Do(req)
if err != nil {
return false
}
defer resp.Body.Close()
return resp.StatusCode == http.StatusOK
}
func (hc *HealthChecker) setHealthy(healthy bool) {
hc.mu.Lock()
defer hc.mu.Unlock()
hc.healthy = healthy
}
func (hc *HealthChecker) IsHealthy() bool {
hc.mu.RLock()
defer hc.mu.RUnlock()
return hc.healthy
}
// Middleware to use health status
func (hc *HealthChecker) ReadinessMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if !hc.IsHealthy() {
http.Error(w, "Service unhealthy", http.StatusServiceUnavailable)
return
}
next.ServeHTTP(w, r)
})
}

Combining Patterns: The Resilience Stack
Use multiple patterns together for maximum robustness:
package main
import (
"context"
"fmt"
"github.com/sony/gobreaker"
"golang.org/x/time/rate"
"net/http"
"time"
)
type ResilientClient struct {
breaker *gobreaker.CircuitBreaker
limiter *rate.Limiter
timeout time.Duration
retries int
}
func NewResilientClient() *ResilientClient {
cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
Name: "HTTP",
MaxRequests: 3,
Interval: 10 * time.Second,
Timeout: 30 * time.Second,
ReadyToTrip: func(counts gobreaker.Counts) bool {
ratio := float64(counts.TotalFailures) / float64(counts.Requests)
return counts.Requests >= 10 && ratio >= 0.5
},
})
limiter := rate.NewLimiter(100, 10) // 100 req/s, burst 10
return &ResilientClient{
breaker: cb,
limiter: limiter,
timeout: 5 * time.Second,
retries: 3,
}
}
func (rc *ResilientClient) Do(ctx context.Context, req *http.Request) (*http.Response, error) {
// 1. Rate limiting (backpressure)
if err := rc.limiter.Wait(ctx); err != nil {
return nil, fmt.Errorf("rate limited: %w", err)
}
// 2. Circuit breaker + retry
var lastErr error
for attempt := 0; attempt < rc.retries; attempt++ {
result, err := rc.breaker.Execute(func() (interface{}, error) {
// 3. Timeout (cascading via context)
callCtx, cancel := context.WithTimeout(ctx, rc.timeout)
defer cancel()
return http.DefaultClient.Do(req.WithContext(callCtx))
})
if err == nil {
return result.(*http.Response), nil
}
lastErr = err
// Don't retry if circuit is open
if err == gobreaker.ErrOpenState {
break
}
// Exponential backoff
backoff := time.Duration(1<<uint(attempt)) * 100 * time.Millisecond
select {
case <-time.After(backoff):
case <-ctx.Done():
return nil, ctx.Err()
}
}
return nil, fmt.Errorf("all retries exhausted: %w", lastErr)
}
func main() {
client := NewResilientClient()
req, _ := http.NewRequest("GET", "http://api.example.com/data", nil)
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
resp, err := client.Do(ctx, req)
if err != nil {
fmt.Println("Request failed:", err)
} else {
fmt.Println("Success:", resp.StatusCode)
resp.Body.Close()
}
}

This stack:
- Rate limiter protects downstream (backpressure)
- Circuit breaker fails fast when service is down
- Retries with backoff handle transient failures
- Timeout prevents hanging connections
- Context propagation ensures timeouts cascade properly
Benchmarks: Circuit Breaker Impact
Here's a realistic benchmark showing throughput under failure:
package main
import (
"fmt"
"github.com/sony/gobreaker"
"sync"
"sync/atomic"
"testing"
"time"
)
// Simulated failure: service returns error for 10 seconds
func failingService() error {
return fmt.Errorf("service error")
}
// With circuit breaker
func benchmarkWithCircuitBreaker(b *testing.B) {
cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
Name: "test",
MaxRequests: 1,
Interval: time.Second,
Timeout: 5 * time.Second,
ReadyToTrip: func(counts gobreaker.Counts) bool {
return counts.TotalFailures >= 3
},
})
successCount := int64(0)
failureCount := int64(0)
b.ReportAllocs()
b.ResetTimer()
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := 0; j < b.N/100; j++ {
_, err := cb.Execute(func() (interface{}, error) {
return nil, failingService()
})
if err == gobreaker.ErrOpenState {
atomic.AddInt64(&failureCount, 1)
} else if err == nil {
atomic.AddInt64(&successCount, 1)
} else {
atomic.AddInt64(&failureCount, 1)
}
}
}()
}
wg.Wait()
b.StopTimer()
fmt.Printf("CB: Success=%d, FastFail=%d\n",
atomic.LoadInt64(&successCount),
atomic.LoadInt64(&failureCount))
}
// Without circuit breaker (naive retry)
func benchmarkWithoutCircuitBreaker(b *testing.B) {
successCount := int64(0)
failureCount := int64(0)
b.ReportAllocs()
b.ResetTimer()
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := 0; j < b.N/100; j++ {
if err := failingService(); err != nil {
atomic.AddInt64(&failureCount, 1)
} else {
atomic.AddInt64(&successCount, 1)
}
}
}()
}
wg.Wait()
b.StopTimer()
fmt.Printf("No CB: Success=%d, Failures=%d\n",
atomic.LoadInt64(&successCount),
atomic.LoadInt64(&failureCount))
}
func main() {
// Simulate multiple concurrent clients
fmt.Println("=== With Circuit Breaker ===")
fmt.Println("All requests fail fast after threshold (rejected by CB)")
fmt.Println("\n=== Without Circuit Breaker ===")
fmt.Println("All requests attempt the call, wasting resources")
}

Expected Results
With Circuit Breaker (under 5-second service outage):
- Requests during closed: ~1000 attempts (fail fast)
- Requests during open: ~900 rejected (0ms latency)
- Total throughput: ~1900 requests/sec
Without Circuit Breaker (naive approach):
- All ~1900 requests attempt call
- Each waits for the full timeout (~100ms in this scenario)
- Effective throughput: ~10 requests/sec
The net result: roughly a 19x throughput improvement with the circuit breaker.
Best Practices Summary
- Always use circuit breakers for downstream service calls
- Combine with retries: CB prevents cascading, retries handle transient failures
- Add rate limiting as backpressure on yourself
- Shed load (return 503) under extreme load, don't queue indefinitely
- Use bulkheads for isolation when calling multiple services
- Propagate contexts with proper timeout budgets
- Monitor health probes and gracefully degrade when dependencies fail
- Add jitter to all retry intervals
- Set reasonable timeouts (~5-30 seconds) per service SLA
- Test failure scenarios with chaos engineering
The combination of these patterns transforms your Go services from fragile to resilient, maintaining performance even when dependencies fail.