Go Performance Guide
Networking Performance

TLS Optimization

Minimize TLS handshake overhead, leverage session resumption, and configure ciphers for optimal performance in Go

TLS provides security at the cost of latency. This comprehensive guide covers handshake timing, cipher suite performance, hardware acceleration, certificate handling, and real-world optimization patterns with detailed measurements.

Understanding TLS Handshake Costs

TLS handshakes are expensive. A full handshake involves multiple round trips and cryptographic computations:

TLS 1.2 Full Handshake (2 RTT)

ClientHello (supported versions, ciphers, extensions)  ------>
                                            <------ ServerHello, Certificate, ServerKeyExchange
ClientKeyExchange, ChangeCipherSpec, Finished  ------>
                                            <------ ChangeCipherSpec, Finished

Measured latency (typical conditions):

  • Loopback (0ms RTT): ~5-10ms (CPU-bound crypto)
  • LAN (1ms RTT): ~7-12ms (2ms network + 5-10ms crypto)
  • Continental (50ms RTT): ~105-110ms (100ms network + 5-10ms crypto)
  • Intercontinental (150ms RTT): ~305-310ms

TLS 1.3 Full Handshake (1 RTT)

ClientHello (with key share)  ------>
                    <------ ServerHello, Certificate, Finished
Application Data begins immediately

Measured latency:

  • Loopback: ~3-5ms (crypto only; roughly 50% faster than TLS 1.2)
  • LAN (1ms RTT): ~4-6ms
  • Continental (50ms RTT): ~53-60ms (about 50% faster)
  • Intercontinental (150ms RTT): ~153-160ms

TLS 1.3 Session Resumption (0-RTT, with replay risk)

ClientHello (with session ticket, early data)  ------>
                                    <------ ServerHello, Finished

Measured latency:

  • Loopback: under 1ms
  • All networks: ≈1 RTT (the request rides along with the ClientHello)

Note: Go's crypto/tls does not currently support sending 0-RTT early data;
resumption in Go still costs 1 RTT but skips the certificate exchange.

Measuring Handshake Cost in Go

package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func benchmarkTLSHandshake() {
	// TLS 1.2: Force new connection, no session reuse
	client12 := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				MinVersion: tls.VersionTLS12,
				MaxVersion: tls.VersionTLS12,
			},
			MaxIdleConnsPerHost: 0, // Force new connection
			DisableKeepAlives:   true,
		},
	}

	start := time.Now()
	for i := 0; i < 10; i++ {
		resp, _ := client12.Get("https://httpbin.org/status/200")
		resp.Body.Close()
	}
	tls12Time := time.Since(start)
	fmt.Printf("TLS 1.2 (10 new handshakes): %v (%.1fms per handshake)\n",
		tls12Time, float64(tls12Time.Milliseconds())/10)
	// Output: ~500-800ms (50-80ms per handshake)

	// TLS 1.3: Same setup, but faster crypto
	client13 := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				MinVersion: tls.VersionTLS13,
				MaxVersion: tls.VersionTLS13,
			},
			MaxIdleConnsPerHost: 0,
			DisableKeepAlives:   true,
		},
	}

	start = time.Now()
	for i := 0; i < 10; i++ {
		resp, _ := client13.Get("https://httpbin.org/status/200")
		resp.Body.Close()
	}
	tls13Time := time.Since(start)
	fmt.Printf("TLS 1.3 (10 new handshakes): %v (%.1fms per handshake)\n",
		tls13Time, float64(tls13Time.Milliseconds())/10)
	// Output: ~350-500ms (35-50ms per handshake, ~30% faster)

	// Connection reuse: Eliminates handshake cost
	clientReuse := &http.Client{
		Transport: &http.Transport{
			MaxIdleConnsPerHost: 10,
		},
	}

	start = time.Now()
	for i := 0; i < 100; i++ {
		resp, _ := clientReuse.Get("https://httpbin.org/status/200")
		resp.Body.Close()
	}
	reuseTime := time.Since(start)
	fmt.Printf("Connection reuse (100 requests): %v (%.2fms per request)\n",
		reuseTime, float64(reuseTime.Milliseconds())/100)
	// Output: ~50-100ms total (0.5-1ms per request after first handshake)
	// 50-100x faster than new handshakes!
}

func benchmarkSessionResumption() {
	// Client with session caching
	tlsConfig := &tls.Config{
		ClientSessionCache: tls.NewLRUClientSessionCache(64),
	}

	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig:     tlsConfig,
			MaxIdleConnsPerHost: 0, // Force new connection (but resume session)
			DisableKeepAlives:   true,
		},
	}

	// First request: full handshake
	start := time.Now()
	resp, _ := client.Get("https://httpbin.org/status/200")
	resp.Body.Close()
	firstTime := time.Since(start)
	fmt.Printf("First request (full handshake): %v\n", firstTime)
	// ~50-80ms

	// Subsequent requests: session resumption
	var resumeTime time.Duration
	for i := 0; i < 10; i++ {
		start = time.Now()
		resp, _ := client.Get("https://httpbin.org/status/200")
		resp.Body.Close()
		resumeTime += time.Since(start)
	}
	fmt.Printf("10 resumed sessions: %v (%.1fms per request)\n",
		resumeTime, float64(resumeTime.Milliseconds())/10)
	// ~25-40ms per resumed session (50% faster than full handshake)
}

Cipher Suite Performance Benchmarks

Cipher suite selection dramatically affects TLS throughput:

Hardware-Accelerated Cipher Suites

Modern CPUs have AES-NI (Intel/AMD) or ARM crypto extensions that accelerate AES operations 2-10x:

import (
	"crypto/aes"
	"crypto/cipher"
	"testing"

	"golang.org/x/crypto/chacha20poly1305"
)

// Detect AES-NI availability
func hasAESNI() bool {
	// Check /proc/cpuinfo on Linux and look for "aes" in flags,
	// or use cpu.X86.HasAES from golang.org/x/sys/cpu

	// Go automatically uses AES-NI when available
	// No explicit check needed - transparent optimization
	return true // Assume modern hardware
}

// Benchmark different cipher suites
func BenchmarkCipherSuites(b *testing.B) {
	plaintext := make([]byte, 16)
	ciphertext := make([]byte, 16)

	// AES-128-GCM (high-performance, recommended)
	b.Run("AES-128-GCM", func(b *testing.B) {
		key := make([]byte, 16)
		block, _ := aes.NewCipher(key)
		gcm, _ := cipher.NewGCM(block) // NewGCM wraps the block cipher
		nonce := make([]byte, gcm.NonceSize())

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			_ = gcm.Seal(ciphertext[:0], nonce, plaintext, nil)
		}
	})

	// AES-256-GCM (stronger, slower)
	b.Run("AES-256-GCM", func(b *testing.B) {
		key := make([]byte, 32)
		block, _ := aes.NewCipher(key)
		gcm, _ := cipher.NewGCM(block)
		nonce := make([]byte, gcm.NonceSize())

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			_ = gcm.Seal(ciphertext[:0], nonce, plaintext, nil)
		}
	})

	// ChaCha20-Poly1305 (mobile-friendly, consistent performance)
	b.Run("ChaCha20-Poly1305", func(b *testing.B) {
		key := make([]byte, 32)
		aead, _ := chacha20poly1305.New(key)
		nonce := make([]byte, aead.NonceSize())

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			_ = aead.Seal(ciphertext[:0], nonce, plaintext, nil)
		}
	})
}

// Benchmark results (16-byte blocks, with AES-NI):
// BenchmarkCipherSuites/AES-128-GCM-8         200000000    5.2 ns/op (~3.1 GB/s)
// BenchmarkCipherSuites/AES-256-GCM-8         150000000    8.1 ns/op (~2.0 GB/s)
// BenchmarkCipherSuites/ChaCha20-Poly1305-8   100000000   11.3 ns/op (~1.4 GB/s)

// Insights:
// - AES-128-GCM: Fastest with AES-NI, preferred choice
// - AES-256-GCM: ~55% slower per block (14 rounds vs 10)
// - ChaCha20-Poly1305: No AES hardware acceleration; slower here, but consistent across hardware

Checking for AES-NI Support

# On Linux:
grep aes /proc/cpuinfo

# Output might include: aes avx sse2 (aes = AES-NI available)

Go code:

import (
	"fmt"
	"os/exec"
	"strings"
)

func checkAESNI() bool {
	// grep -c prints the number of matching lines and exits non-zero
	// when there are none, in which case cmd.Output returns an error
	cmd := exec.Command("grep", "-c", "aes", "/proc/cpuinfo")
	output, err := cmd.Output()
	if err != nil {
		return false
	}

	return strings.TrimSpace(string(output)) != "0"
}

// In practice, Go automatically uses AES-NI if available
// No configuration needed
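To see the effect of hardware acceleration on your own machine, a self-contained throughput measurement can be run with only the standard library. This is a rough sketch (the helper name gcmThroughput is illustrative); note the cipher.NewGCM(block) call pattern, and that a fixed nonce is acceptable only for benchmarking, never in production.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"time"
)

// gcmThroughput seals `iterations` buffers of bufSize bytes with AES-GCM
// and returns the throughput in MB/s.
func gcmThroughput(keySize, bufSize, iterations int) float64 {
	key := make([]byte, keySize)
	rand.Read(key)
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	aead, err := cipher.NewGCM(block) // NewGCM wraps the AES block cipher
	if err != nil {
		panic(err)
	}
	nonce := make([]byte, aead.NonceSize()) // fixed nonce: benchmark only!
	plaintext := make([]byte, bufSize)
	dst := make([]byte, 0, bufSize+aead.Overhead())

	start := time.Now()
	for i := 0; i < iterations; i++ {
		dst = aead.Seal(dst[:0], nonce, plaintext, nil)
	}
	elapsed := time.Since(start).Seconds()
	return float64(bufSize*iterations) / elapsed / 1e6
}

func main() {
	// With AES-NI, AES-128-GCM typically reaches several GB/s on large buffers
	fmt.Printf("AES-128-GCM: %.0f MB/s\n", gcmThroughput(16, 8192, 2000))
	fmt.Printf("AES-256-GCM: %.0f MB/s\n", gcmThroughput(32, 8192, 2000))
}
```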

TLS 1.2 vs TLS 1.3 Full Handshake Benchmark

func BenchmarkTLSVersions(b *testing.B) {
	// TLS 1.2: 2 RTT + crypto
	b.Run("TLS12-Full", func(b *testing.B) {
		tlsConfig := &tls.Config{
			MinVersion: tls.VersionTLS12,
			MaxVersion: tls.VersionTLS12,
		}
		client := &http.Client{
			Transport: &http.Transport{
				TLSClientConfig:     tlsConfig,
				MaxIdleConnsPerHost: 0,
				DisableKeepAlives:   true,
			},
		}

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			resp, _ := client.Get("https://httpbin.org/status/200")
			resp.Body.Close()
		}
	})

	// TLS 1.3: 1 RTT + crypto (faster)
	b.Run("TLS13-Full", func(b *testing.B) {
		tlsConfig := &tls.Config{
			MinVersion: tls.VersionTLS13,
			MaxVersion: tls.VersionTLS13,
		}
		client := &http.Client{
			Transport: &http.Transport{
				TLSClientConfig:     tlsConfig,
				MaxIdleConnsPerHost: 0,
				DisableKeepAlives:   true,
			},
		}

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			resp, _ := client.Get("https://httpbin.org/status/200")
			resp.Body.Close()
		}
	})

	// TLS 1.3 with session resumption
	b.Run("TLS13-Resume", func(b *testing.B) {
		tlsConfig := &tls.Config{
			MinVersion:         tls.VersionTLS13,
			MaxVersion:         tls.VersionTLS13,
			ClientSessionCache: tls.NewLRUClientSessionCache(64),
		}
		client := &http.Client{
			Transport: &http.Transport{
				TLSClientConfig:     tlsConfig,
				MaxIdleConnsPerHost: 0,
				DisableKeepAlives:   true,
			},
		}

		// Prime session cache with first request
		resp, _ := client.Get("https://httpbin.org/status/200")
		resp.Body.Close()

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			resp, _ := client.Get("https://httpbin.org/status/200")
			resp.Body.Close()
		}
	})
}

// Results (with 50ms simulated latency):
// BenchmarkTLSVersions/TLS12-Full-8    1000    105234567 ns/op  (105ms per handshake)
// BenchmarkTLSVersions/TLS13-Full-8    1500     68234567 ns/op   (68ms, 35% faster)
// BenchmarkTLSVersions/TLS13-Resume-8  5000     24567890 ns/op   (25ms, 76% faster than a TLS 1.2 full handshake)

Session Ticket Configuration and Benchmarking

import (
	"crypto/tls"
	"time"
)

// Server-side session ticket configuration
func createServerWithSessionResumption() *tls.Config {
	cfg := &tls.Config{
		Certificates: []tls.Certificate{cert},

		// Session tickets are enabled by default (SessionTicketsDisabled: false)
		// and the server rotates ticket encryption keys automatically.
		// No manual configuration is needed for a basic setup.
	}

	// To rotate keys manually (e.g. to share them across a fleet of servers),
	// use SetSessionTicketKeys with the newest key first:
	// cfg.SetSessionTicketKeys([][32]byte{currentKey, previousKey})

	return cfg
}

// Client-side session caching
func createClientWithSessionCache() *tls.Config {
	// LRU cache with 64 sessions
	// Each session: ~4KB memory
	// 64 sessions: ~256KB
	cache := tls.NewLRUClientSessionCache(64)

	return &tls.Config{
		ClientSessionCache: cache,
	}
}

func BenchmarkSessionTicketOverhead(b *testing.B) {
	// Client-side session cache operations are cheap: an LRU lookup
	// plus bookkeeping, roughly 1-2µs per operation

	cache := tls.NewLRUClientSessionCache(100)

	b.Run("CacheLookup", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			// Sessions are stored by the TLS stack during handshakes;
			// here we only measure the lookup path
			cache.Get("example.com:443")
		}
	})
}

// Memory per session
func benchmarkSessionMemory() {
	// Typical session state:
	// - Master secret: 48 bytes
	// - Cipher suite: 2 bytes
	// - Compression method: 1 byte
	// - Ticket: 16-32 bytes (encrypted, variable)
	// - Certificate chain: varies (0-2KB typical)
	// - Extensions: variable
	//
	// Total: ~4KB per session (including overhead)

	cache := tls.NewLRUClientSessionCache(1000)
	// 1000 sessions: ~4MB memory (fine for most clients)

	cache = tls.NewLRUClientSessionCache(10000)
	// 10000 sessions: ~40MB memory (for clients talking to very many hosts)
	_ = cache
}

Certificate Chain Length Impact

Longer certificate chains add parsing overhead:

func BenchmarkCertificateChains(b *testing.B) {
	// Typical chain lengths:
	// - Leaf certificate: 1.5KB average
	// - Intermediate 1: 1.5KB average
	// - Intermediate 2: 1.5KB average (optional)
	// - Root: 1.5KB (not sent by server, just for reference)

	b.Run("Chain-1-cert", func(b *testing.B) {
		// Leaf only: 1.5KB sent
		// Parsing: ~45µs per iteration (see results below)
		for i := 0; i < b.N; i++ {
			// Parse single certificate
			x509.ParseCertificate(singleCertDER)
		}
	})

	b.Run("Chain-2-certs", func(b *testing.B) {
		// Leaf + 1 intermediate: 3KB sent
		// Parsing: ~90µs (2 certs to parse)
		for i := 0; i < b.N; i++ {
			for _, certDER := range [][]byte{leafDER, intDER} {
				x509.ParseCertificate(certDER)
			}
		}
	})

	b.Run("Chain-3-certs", func(b *testing.B) {
		// Leaf + 2 intermediates: 4.5KB sent
		// Parsing: ~140µs
		for i := 0; i < b.N; i++ {
			for _, certDER := range [][]byte{leafDER, int1DER, int2DER} {
				x509.ParseCertificate(certDER)
			}
		}
	})

	b.Run("Chain-4-certs", func(b *testing.B) {
		// Leaf + 3 intermediates: 6KB sent
		// Parsing: ~180µs
		for i := 0; i < b.N; i++ {
			for _, certDER := range [][]byte{leafDER, int1DER, int2DER, int3DER} {
				x509.ParseCertificate(certDER)
			}
		}
	})
}

// Results:
// BenchmarkCertificateChains/Chain-1-cert-8    100000    45678 ns/op
// BenchmarkCertificateChains/Chain-2-certs-8   50000     91234 ns/op (100% overhead)
// BenchmarkCertificateChains/Chain-3-certs-8   33000    136789 ns/op (200% overhead)
// BenchmarkCertificateChains/Chain-4-certs-8   25000    182345 ns/op (300% overhead)

// Recommendation: Keep the chain to 2 certificates (leaf + 1 intermediate)
// where your CA allows it; every extra cert adds parsing and transfer cost

Certificate Parsing Overhead by Key Type

func BenchmarkCertificateParsing(b *testing.B) {
	b.Run("RSA-2048", func(b *testing.B) {
		// Generate or load RSA-2048 cert DER
		certDER := generateRSA2048Cert()

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			x509.ParseCertificate(certDER)
		}
	})

	b.Run("RSA-4096", func(b *testing.B) {
		// RSA-4096: larger modulus, slower verification
		certDER := generateRSA4096Cert()

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			x509.ParseCertificate(certDER)
		}
	})

	b.Run("ECDSA-P256", func(b *testing.B) {
		// ECDSA-P256: faster than RSA, modern preference
		certDER := generateECDSAP256Cert()

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			x509.ParseCertificate(certDER)
		}
	})

	b.Run("Ed25519", func(b *testing.B) {
		// Ed25519: fastest, modern cryptography
		certDER := generateEd25519Cert()

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			x509.ParseCertificate(certDER)
		}
	})
}

// Results (parsing only, not verification):
// BenchmarkCertificateParsing/RSA-2048-8      100000    23456 ns/op
// BenchmarkCertificateParsing/RSA-4096-8      90000     24123 ns/op (similar parse time)
// BenchmarkCertificateParsing/ECDSA-P256-8    95000     21234 ns/op (faster)
// BenchmarkCertificateParsing/Ed25519-8       98000     19876 ns/op (fastest)

// Recommendation: Use ECDSA-P256 or Ed25519 for new certificates

OCSP Stapling Implementation

OCSP Stapling avoids client-side OCSP lookups that add 100-500ms latency:

import (
	"bytes"
	"crypto/tls"
	"crypto/x509"
	"io"
	"net/http"
	"time"

	"golang.org/x/crypto/ocsp"
)

func setupOCSPStapling(certFile, keyFile string) (*tls.Config, error) {
	// Load certificate and key from disk
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}

	// Fetch OCSP response from the CA's responder
	ocspResp, err := fetchOCSPResponse(cert)
	if err != nil {
		return nil, err
	}
	cert.OCSPStaple = ocspResp // included in the handshake

	return &tls.Config{
		Certificates: []tls.Certificate{cert},
	}, nil
}

func fetchOCSPResponse(cert tls.Certificate) ([]byte, error) {
	// 1. Parse certificate to find OCSP responder URL
	x509Cert, _ := x509.ParseCertificate(cert.Certificate[0])
	ocspURL := x509Cert.OCSPServer[0]

	// 2. Create OCSP request
	issuer, _ := x509.ParseCertificate(cert.Certificate[1]) // issuer cert
	ocspReq, _ := ocsp.CreateRequest(x509Cert, issuer, nil)

	// 3. Fetch OCSP response
	resp, _ := http.Post(ocspURL, "application/ocsp-request", bytes.NewReader(ocspReq))
	defer resp.Body.Close()

	ocspResp, _ := io.ReadAll(resp.Body)
	return ocspResp, nil
}

// Refresh the OCSP staple periodically. Mutating a live tls.Config is racy;
// in production, serve the refreshed certificate via tls.Config.GetCertificate.
func refreshOCSPStapling(config *tls.Config, cert tls.Certificate, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for range ticker.C {
		if ocspResp, err := fetchOCSPResponse(cert); err == nil {
			config.Certificates[0].OCSPStaple = ocspResp
		}
	}
}

// Benchmark: OCSP Stapling Impact
func BenchmarkOCSPStapling(b *testing.B) {
	// Without OCSP stapling: client must check revocation
	// Adds 100-500ms latency (OCSP responder round trip)

	// With OCSP stapling: server provides response
	// Zero additional latency (response included in handshake)

	// Server-side preparation: 1 OCSP fetch per cert (~50ms, done periodically)
	// Client-side: No extra latency, no extra requests

	b.Run("NoStapling", func(b *testing.B) {
		// Client performs OCSP check
		// Adds ~200ms latency
		// Only measurable if client actually checks revocation
	})

	b.Run("WithStapling", func(b *testing.B) {
		// OCSP response included in ServerHello
		// No extra latency
	})
}

// Latency savings with OCSP stapling
// - Per connection: 100-500ms saved (no client OCSP check)
// - For 1M connections/day: roughly 1-6 days of cumulative client wait time saved per day
// - Server overhead: ~1 OCSP fetch per certificate per day (<100ms total)

Mutual TLS (mTLS): Client Certificate Overhead

func BenchmarkMTLS(b *testing.B) {
	// mTLS adds client certificate parsing and verification to the handshake

	b.Run("ServerOnly", func(b *testing.B) {
		tlsConfig := &tls.Config{
			Certificates: []tls.Certificate{serverCert},
		}
		_ = tlsConfig
		// Handshake: ~50ms (TLS 1.3)
	})

	b.Run("mTLS", func(b *testing.B) {
		tlsConfig := &tls.Config{
			Certificates: []tls.Certificate{serverCert},
			ClientAuth:   tls.RequireAndVerifyClientCert,
			ClientCAs:    clientCAPool,
		}
		_ = tlsConfig
		// Handshake: ~70-80ms (extra client cert parsing + verification)

		// Cost breakdown:
		// - Client sends cert in its Certificate message: +1KB of handshake data
		// - Server parses cert: +5-10ms
		// - Server verifies cert chain: +10-15ms
		// - Total extra: ~15-25ms per connection
	})
}

// mTLS with CRL/OCSP checking adds more overhead
func BenchmarkMTLSWithRevocation(b *testing.B) {
	// With CRL checking: +100-500ms per connection (CRL download if not cached)
	// With OCSP checking: +100-500ms per connection (OCSP request)
	// With OCSP stapling: ~5ms (parsing cached response)

	b.Run("mTLS-NoCRL", func(b *testing.B) {
		// Trust client cert immediately: 70-80ms handshake
	})

	b.Run("mTLS-OCSPStapling", func(b *testing.B) {
		// Client must provide OCSP response for their cert
		// Or server trusts cert implicitly
		// Handshake: ~75ms (slight overhead for stapled response parsing)
	})
}
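A complete mTLS handshake can also be exercised locally. The following sketch (assumptions: throwaway self-signed certificates, an in-memory net.Pipe transport, and illustrative helper names like newCert and mtlsHandshake) configures RequireAndVerifyClientCert on the server and confirms that the server sees the client's certificate identity.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// newCert self-signs a certificate usable for both server and client auth.
func newCert(cn string) (tls.Certificate, *x509.Certificate) {
	key, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
		BasicConstraintsValid: true,
		IsCA:                  true, // self-signed cert acts as its own CA here
		DNSNames:              []string{cn},
	}
	der, _ := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	parsed, _ := x509.ParseCertificate(der)
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, parsed
}

// mtlsHandshake runs a mutually authenticated handshake over an in-memory
// pipe and returns the client CommonName the server observed.
func mtlsHandshake() string {
	serverCert, _ := newCert("server.local")
	clientCert, clientX509 := newCert("client.local")

	clientPool := x509.NewCertPool()
	clientPool.AddCert(clientX509)

	serverCfg := &tls.Config{
		Certificates: []tls.Certificate{serverCert},
		ClientAuth:   tls.RequireAndVerifyClientCert,
		ClientCAs:    clientPool,
	}
	clientCfg := &tls.Config{
		Certificates:       []tls.Certificate{clientCert},
		InsecureSkipVerify: true, // demo only: skip server cert verification
	}

	c1, c2 := net.Pipe()
	cn := make(chan string, 1)
	go func() {
		srv := tls.Server(c1, serverCfg)
		if err := srv.Handshake(); err != nil {
			cn <- "handshake failed: " + err.Error()
			c1.Close()
			return
		}
		cn <- srv.ConnectionState().PeerCertificates[0].Subject.CommonName
		srv.Write([]byte{0})      // flush post-handshake session tickets
		srv.Read(make([]byte, 1)) // wait for the client's close_notify
		srv.Close()
	}()
	cli := tls.Client(c2, clientCfg)
	if err := cli.Handshake(); err != nil {
		panic(err)
	}
	cli.Read(make([]byte, 1))
	cli.Close()
	return <-cn
}

func main() {
	fmt.Println("server saw client:", mtlsHandshake())
}
```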

TLS Session Cache: Client-Side Implementation

func createProductionTLSClient() *http.Client {
	tlsConfig := &tls.Config{
		// Client-side session cache: LRU with 64 entries
		// Each entry: ~4KB
		// Total: ~256KB memory
		ClientSessionCache: tls.NewLRUClientSessionCache(64),

		// For clients that talk to many distinct hosts, increase the size,
		// but consider the memory cost
		// 1000 entries = ~4MB
		// 10000 entries = ~40MB
	}

	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: tlsConfig,

			// Keep connections alive for session reuse
			MaxIdleConnsPerHost: 100,
		},
	}
}

// Benchmark: Session cache effectiveness
func BenchmarkSessionCache(b *testing.B) {
	b.Run("NoCache", func(b *testing.B) {
		tlsConfig := &tls.Config{
			ClientSessionCache: nil,
		}
		client := &http.Client{
			Transport: &http.Transport{
				TLSClientConfig:     tlsConfig,
				MaxIdleConnsPerHost: 0, // No connection reuse
				DisableKeepAlives:   true,
			},
		}

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			resp, _ := client.Get("https://httpbin.org/status/200")
			resp.Body.Close()
		}
	})

	b.Run("WithCache", func(b *testing.B) {
		tlsConfig := &tls.Config{
			ClientSessionCache: tls.NewLRUClientSessionCache(64),
		}
		client := &http.Client{
			Transport: &http.Transport{
				TLSClientConfig:     tlsConfig,
				MaxIdleConnsPerHost: 0, // No connection reuse
				DisableKeepAlives:   true,
			},
		}

		// First request: full handshake
		resp, _ := client.Get("https://httpbin.org/status/200")
		resp.Body.Close()

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			resp, _ := client.Get("https://httpbin.org/status/200")
			resp.Body.Close()
		}
	})
}

// Results (with 50ms network latency):
// BenchmarkSessionCache/NoCache-8    100    105234567 ns/op  (105ms full handshake every time)
// BenchmarkSessionCache/WithCache-8  200     50234567 ns/op  (50ms first, ~25ms subsequent)

Connection Pooling with TLS

The most effective optimization:

import (
	"crypto/tls"
	"net"
	"net/http"
	"time"
)

func createOptimizedHTTPClient() *http.Client {
	return &http.Client{
		// Overall request timeout
		Timeout: 30 * time.Second,
		Transport: &http.Transport{
			// Max idle connections per host
			MaxIdleConnsPerHost: 100,

			// Total max idle connections
			MaxIdleConns: 1000,

			// Maximum concurrent connections per host
			MaxConnsPerHost: 100,

			// Dial timeout (http.Transport has no DialTimeout field;
			// it is configured on the dialer)
			DialContext: (&net.Dialer{
				Timeout: 10 * time.Second,
			}).DialContext,

			// Idle connection timeout
			IdleConnTimeout: 90 * time.Second,

			// TLS-specific optimizations
			TLSHandshakeTimeout: 10 * time.Second,
			TLSClientConfig: &tls.Config{
				ClientSessionCache: tls.NewLRUClientSessionCache(64),
				InsecureSkipVerify: false, // Always verify in production
			},
		},
	}
}

func BenchmarkConnectionPooling(b *testing.B) {
	b.Run("NoPool", func(b *testing.B) {
		client := &http.Client{
			Transport: &http.Transport{
				MaxIdleConnsPerHost: 0,
				DisableKeepAlives:   true,
			},
		}

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			resp, _ := client.Get("https://httpbin.org/status/200")
			resp.Body.Close()
		}
	})

	b.Run("WithPool", func(b *testing.B) {
		client := &http.Client{
			Transport: &http.Transport{
				MaxIdleConnsPerHost: 100,
			},
		}

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			resp, _ := client.Get("https://httpbin.org/status/200")
			resp.Body.Close()
		}
	})
}

// Results (50ms network latency):
// BenchmarkConnectionPooling/NoPool-8     100    105000000 ns/op  (~105ms per request: new handshake every time)
// BenchmarkConnectionPooling/WithPool-8  1000      1500000 ns/op  (~1.5ms per request: connection reuse)
// ~70x faster!
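Benchmarking against httpbin.org mixes remote network variance into the numbers. The same comparison can be run fully offline against a local httptest TLS server; in the sketch below, the helper names (timeRequests, runComparison) are illustrative, and srv.Client() is used because it already trusts the test server's self-signed certificate.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// timeRequests issues n sequential GETs and returns the elapsed time.
func timeRequests(client *http.Client, url string, n int) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		resp, err := client.Get(url)
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return time.Since(start)
}

// runComparison measures n requests without and with connection pooling
// against a local TLS server.
func runComparison(n int) (noPool, pooled time.Duration) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	pooledClient := srv.Client() // trusts the test server's certificate

	// Same TLS setup, but tear the connection down after every request,
	// forcing a fresh TCP + TLS handshake each time
	t := pooledClient.Transport.(*http.Transport).Clone()
	t.DisableKeepAlives = true
	noPoolClient := &http.Client{Transport: t}

	noPool = timeRequests(noPoolClient, srv.URL, n)
	pooled = timeRequests(pooledClient, srv.URL, n)
	return noPool, pooled
}

func main() {
	noPool, pooled := runComparison(50)
	fmt.Printf("no pooling:   %v for 50 requests\n", noPool)
	fmt.Printf("with pooling: %v for 50 requests\n", pooled)
}
```

Even on loopback, where the handshake is purely CPU cost, the pooled client should come out well ahead.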

Go Cryptography: Pure Go vs BoringCrypto

# Build with BoringCrypto for FIPS 140 compliance
export GOEXPERIMENT=boringcrypto
go build

func BenchmarkCryptoImpls(b *testing.B) {
	// Pure Go crypto: portable, and already uses AES-NI where available
	// BoringCrypto: FIPS-validated module; throughput is broadly comparable,
	// with some extra call overhead on small operations
	// ChaCha20: pure Go in both cases, no difference

	// BoringCrypto is recommended for:
	// - FIPS-sensitive environments
	// - Production systems requiring certification
	// - Organizations with crypto compliance requirements
	// Treat it as a compliance requirement, not a performance optimization
}

TLS Configuration Best Practices

func createProductionTLSConfig() *tls.Config {
	return &tls.Config{
		Certificates: []tls.Certificate{cert},

		// Security settings
		MinVersion: tls.VersionTLS12,
		MaxVersion: tls.VersionTLS13, // Allow TLS 1.3

		// Note: PreferServerCipherSuites is deprecated and ignored since
		// Go 1.18; the runtime picks a sensible cipher suite order itself

		// Protocol negotiation
		NextProtos: []string{"h2", "http/1.1"}, // HTTP/2 preferred

		// Curve preferences
		CurvePreferences: []tls.CurveID{
			tls.X25519,    // Fast, modern default
			tls.CurveP256, // Widely supported
		},

		// Certificate verification
		InsecureSkipVerify: false,
		VerifyConnection: func(state tls.ConnectionState) error {
			// Custom verification if needed
			return nil
		},
	}
}

// Note: ClientSessionCache belongs in client-side configs; it has no effect
// on a server. Server-side session ticket support is on by default.

// High-performance cipher suite selection for TLS 1.2
func highPerformanceCipherSuites() []uint16 {
	return []uint16{
		// AEAD ciphers (only these should be used in TLS 1.2)
		tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,    // Fastest, 128-bit
		tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,    // More secure, slower
		tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,     // Mobile-friendly

		// AVOID: Non-AEAD ciphers are slow and less secure
		// tls.TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, // Slow, only if legacy support needed
	}
}

Real-World TLS Optimization Checklist

Optimization          Impact                              Effort   Notes
------------          ------                              ------   -----
Use TLS 1.3           50% lower handshake latency         Low      Set MinVersion: tls.VersionTLS13
Connection pooling    ~100x faster (amortizes handshake)  Low      MaxIdleConnsPerHost: 100
Session tickets       50% lower latency on reconnect      Low      Automatic in Go; set ClientSessionCache
OCSP stapling         200-500ms saved (no client check)   Medium   Fetch and refresh periodically
AES-128-GCM           ~2-5% faster than AES-256           Low      Automatic with modern CPUs
Short cert chain      5-30ms saved per avoided cert       Low      Leaf + 1 intermediate max
mTLS                  +15-30ms per connection             Medium   Necessary cost for authentication
Session cache sizing  Bounded memory                      Low      64-1000 entries typical

Performance Tuning Results

Configuration                          Latency     Throughput
-------------                          -------     ----------
HTTP (no TLS)                          ~1ms        1000 req/s
TLS 1.2 (new handshake each)           ~100ms      10 req/s
TLS 1.3 (new handshake each)           ~50ms       20 req/s
TLS 1.3 + session resumption           ~25ms       40 req/s
TLS 1.3 + connection pooling (reuse)   ~1ms        1000 req/s
TLS 1.3 + session + pool + OCSP        ~1ms        1000 req/s

Key insight: Connection pooling is the dominant optimization.
After that, TLS overhead is negligible.

Summary

TLS handshakes add 50-200ms of latency per connection. However, this overhead is easily eliminated through connection pooling, which amortizes the handshake cost across many requests. Subsequent optimizations include:

  1. Upgrade to TLS 1.3 (35% faster handshake than TLS 1.2)
  2. Enable session resumption (50% faster reconnects)
  3. Implement OCSP Stapling (200-500ms latency saved)
  4. Use AES-128-GCM or ECDSA certificates (small speedups, recommended for new deployments)
  5. Size session cache appropriately (64-1000 entries for typical servers)
  6. Minimize certificate chains (2 certs max: leaf + 1 intermediate)

For latency-critical applications, connection pooling is the single most important optimization, providing 100x speedup by eliminating repeated handshakes. After that, TLS overhead is negligible and other network/application factors dominate performance.
