DNS Performance

Optimize DNS resolution in Go, implement caching strategies, and reduce lookup latency in production services

Understanding DNS Resolution Overhead

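DNS lookups convert domain names (such as example.com) into IP addresses. An uncached lookup may need to query one or more nameservers. That adds anywhere from a few milliseconds to 200ms or more, depending on network distance and on whether a nearby resolver already holds the answer. In a service that makes thousands of outbound requests, especially over short-lived connections, this overhead compounds.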

DNS Lookup Latency

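package main

import (
	"context"
	"fmt"
	"net"
	"testing"
	"time"
)

func measureDNSLatency() {
	hosts := []string{
		"example.com",
		"google.com",
		"cloudflare.com",
	}

	resolver := &net.Resolver{}

	for _, host := range hosts {
		start := time.Now()
		ips, err := resolver.LookupIPAddr(context.Background(), host)
		latency := time.Since(start)
		if err != nil {
			fmt.Printf("%s: error %v (%v)\n", host, err, latency)
			continue
		}

		fmt.Printf("%s: %v (%v)\n", host, ips, latency)
	}

	// Illustrative results; actual numbers depend on your network and on
	// whether a local caching resolver already holds the answer:
	// example.com: 25ms
	// google.com: 15ms
	// cloudflare.com: 30ms
	// Roughly 20-50ms per uncached lookup
}

// Benchmark: DNS lookup cost (place in a _test.go file)
func BenchmarkDNSLookup(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if _, err := net.LookupIP("example.com"); err != nil {
			b.Fatal(err)
		}
	}
	// The loop is sequential, so ns/op approximates end-to-end lookup
	// latency. Microseconds per op means a local cache (nscd,
	// systemd-resolved, dnsmasq, ...) answered; an uncached lookup over
	// the network costs tens of milliseconds.
}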

Go's DNS Resolution: Pure Go vs CGO

Go offers two DNS resolver implementations:

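Pure Go Resolver (Default on Most Unix Systems)

The pure Go resolver runs inside your process. It reads /etc/resolv.conf and /etc/hosts and sends queries directly to the configured nameservers over UDP, falling back to TCP for truncated responses.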

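import (
	"net"
)

func usePureGoResolver() {
	// Option 1: set GODEBUG in the environment when the process starts.
	// It is meant to be read at startup, so prefer this to calling
	// os.Setenv from code:
	//   GODEBUG=netdns=go ./server
	//
	// Option 2: force it at build time:
	//   go build -tags netgo
	//
	// Option 3: prefer it for lookups through net.DefaultResolver, which
	// net.Dial, net.LookupIP, and the default HTTP transport use. Do this
	// during startup, before other goroutines perform lookups.
	net.DefaultResolver.PreferGo = true

	// Pure Go resolver characteristics:
	// Pros:
	// - No cgo call overhead
	// - A blocked lookup ties up only a goroutine, not an OS thread
	// - Custom nameservers and timeouts via Resolver.Dial
	//
	// Cons:
	// - No caching: every lookup is sent to the nameserver (after /etc/hosts)
	// - Does not implement every system resolver feature, such as some
	//   nsswitch.conf sources (mdns, LDAP, sssd) or resolv.conf options
}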

CGO Resolver (System)

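The CGO resolver calls the C library's resolver functions (such as getaddrinfo), so it follows the host's resolver configuration exactly.

func useCGOResolver() {
	// Force the cgo resolver via the environment at startup:
	//   GODEBUG=netdns=cgo ./server
	// or at build time (requires cgo, CGO_ENABLED=1):
	//   go build -tags netcgo
	//
	// Note: &net.Resolver{PreferGo: false} does not force cgo; it only
	// leaves the choice to Go's default selection logic.

	// CGO resolver characteristics:
	// Pros:
	// - Uses the full system resolver configuration, including
	//   nsswitch.conf sources and all resolv.conf options
	// - Behaves like other programs on the same host
	//
	// Cons:
	// - cgo call overhead on every lookup
	// - Each in-flight lookup blocks an OS thread, which hurts
	//   goroutine scheduling under high concurrency
	// - Requires cgo at build time
}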

Benchmark: Pure Go vs CGO

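import (
	"context"
	"net"
	"testing"
)

// Compare implementations by running the benchmark twice:
//   GODEBUG=netdns=go  go test -bench=DNSResolvers
//   GODEBUG=netdns=cgo go test -bench=DNSResolvers
// PureGo always uses the Go resolver; Default follows GODEBUG. Results
// mostly reflect your local DNS cache and network. net.Resolver merges
// concurrent lookups of the same name, so use several distinct names
// for a more realistic load.
func BenchmarkDNSResolvers(b *testing.B) {
	ctx := context.Background()

	run := func(b *testing.B, resolver *net.Resolver) {
		// RunParallel issues lookups from many goroutines at once,
		// which is where the two implementations differ most.
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				if _, err := resolver.LookupIPAddr(ctx, "example.com"); err != nil {
					b.Error(err)
					return
				}
			}
		})
	}

	b.Run("PureGo", func(b *testing.B) { run(b, &net.Resolver{PreferGo: true}) })
	b.Run("Default", func(b *testing.B) { run(b, net.DefaultResolver) })
}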

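For high-concurrency services, the pure Go resolver (PreferGo: true) is usually the better fit. It avoids cgo call overhead, and a blocked lookup ties up only a goroutine rather than an OS thread. Prefer the cgo resolver only when you depend on system resolver features that the Go resolver does not implement.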

net.Resolver Configuration

Customize DNS behavior with net.Resolver:

import (
	"context"
	"net"
	"time"
)

func createCustomResolver(dnsServer string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // Dial is only used by the pure Go resolver

		// Dial ignores the nameserver taken from /etc/resolv.conf (address)
		// and connects to dnsServer instead, e.g. "10.0.0.2:53".
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			dialer := &net.Dialer{
				// Per-connection dial timeout. KeepAlive is omitted: it only
				// affects TCP, and the resolver dials a fresh connection
				// for each query.
				Timeout: 5 * time.Second,
			}
			return dialer.DialContext(ctx, network, dnsServer)
		},
	}
}

func resolveWithCustomSettings() ([]net.IPAddr, error) {
	// Hypothetical internal nameserver address
	resolver := createCustomResolver("10.0.0.2:53")
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// The context deadline bounds the whole lookup, including retries.
	return resolver.LookupIPAddr(ctx, "example.com")
}

DNS Caching: Go Doesn't Do It By Default

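Go does not cache DNS answers in-process. Each call performs a fresh lookup through the Go resolver or the system resolver; the only sharing is that net.Resolver merges concurrent lookups for the same name. Many hosts run a local caching resolver (systemd-resolved, nscd, dnsmasq, or node-local DNS in Kubernetes) that absorbs part of this cost. For high-throughput services, especially on hosts without one, application-level caching is worth implementing.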

Simple In-Memory DNS Cache

import (
	"fmt"
	"net"
	"sync"
	"time"
)

type DNSCache struct {
	cache map[string]*DNSEntry
	mu    sync.RWMutex
}

type DNSEntry struct {
	IPs       []net.IP
	ExpiresAt time.Time
}

func (c *DNSCache) Resolve(host string) ([]net.IP, error) {
	c.mu.RLock()
	entry, ok := c.cache[host]
	c.mu.RUnlock()

	// Check if cached entry is still valid
	if ok && entry.ExpiresAt.After(time.Now()) {
		return entry.IPs, nil
	}

	// Miss or expired: perform an actual DNS lookup
	ips, err := net.LookupIP(host)
	if err != nil {
		return nil, err
	}

	// Cache with a fixed 5-minute lifetime: net.LookupIP does not
	// expose the records' real TTLs.
	c.mu.Lock()
	c.cache[host] = &DNSEntry{
		IPs:       ips,
		ExpiresAt: time.Now().Add(5 * time.Minute),
	}
	c.mu.Unlock()

	return ips, nil
}

// Usage (the map must be initialized before the first Resolve)
func main() {
	cache := &DNSCache{cache: make(map[string]*DNSEntry)}

	// First call: real DNS lookup (tens of milliseconds if not cached upstream)
	ips, err := cache.Resolve("example.com")
	fmt.Println(ips, err)

	// Second call: cache hit, typically well under a millisecond
	ips, err = cache.Resolve("example.com")
	fmt.Println(ips, err)
}

TTL-Aware Caching

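DNS answers carry a TTL (time to live) that says how long each record may be cached. Go's net.Resolver does not expose these TTLs, so a TTL-aware cache needs a DNS client library that returns full records, such as github.com/miekg/dns: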

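import (
	"context"
	"net"
	"sync"
	"time"

	"github.com/miekg/dns"
)

type TTLAwareDNSCache struct {
	server string      // nameserver to query, "host:port"
	client *dns.Client // miekg/dns client; exposes full answer records
	cache  map[string]*CachedDNS
	mu     sync.RWMutex
}

type CachedDNS struct {
	IPs      []string
	TTL      time.Duration
	CachedAt time.Time
}

func NewTTLAwareDNSCache(server string) *TTLAwareDNSCache {
	return &TTLAwareDNSCache{
		server: server,
		client: new(dns.Client),
		cache:  make(map[string]*CachedDNS),
	}
}

func (c *TTLAwareDNSCache) LookupIP(ctx context.Context, host string) ([]string, error) {
	c.mu.RLock()
	entry, ok := c.cache[host]
	c.mu.RUnlock()
	if ok && time.Since(entry.CachedAt) < entry.TTL {
		return entry.IPs, nil
	}

	// Query A records directly so the TTLs in the answer are visible.
	// (IPv4 only; AAAA would need a second query.)
	msg := new(dns.Msg)
	msg.SetQuestion(dns.Fqdn(host), dns.TypeA)
	resp, _, err := c.client.ExchangeContext(ctx, msg, c.server)
	if err != nil {
		return nil, err
	}
	if resp.Rcode != dns.RcodeSuccess {
		return nil, &net.DNSError{Err: dns.RcodeToString[resp.Rcode], Name: host}
	}

	var ips []string
	var minTTL uint32
	for i, rr := range resp.Answer {
		// Cache for the smallest TTL in the answer, CNAMEs included.
		if ttl := rr.Header().Ttl; i == 0 || ttl < minTTL {
			minTTL = ttl
		}
		if a, ok := rr.(*dns.A); ok {
			ips = append(ips, a.A.String())
		}
	}

	c.mu.Lock()
	c.cache[host] = &CachedDNS{
		IPs:      ips,
		TTL:      time.Duration(minTTL) * time.Second,
		CachedAt: time.Now(),
	}
	c.mu.Unlock()

	return ips, nil
}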

DNS Prefetching

Resolve DNS before you need it:

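import (
	"context"
	"net"
	"time"
)

func prefetchDNS(hosts []string) map[string][]net.IP {
	resolver := &net.Resolver{PreferGo: true}
	results := make(map[string][]net.IP)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	for _, host := range hosts {
		// LookupIP returns []net.IP (LookupIPAddr would return []net.IPAddr)
		ips, err := resolver.LookupIP(ctx, "ip", host)
		if err != nil {
			continue // skip failures; these hosts get resolved on demand
		}
		results[host] = ips
	}
	return results
}

// prefetchedIPs holds the startup snapshot for request code to use.
// It is never refreshed, so entries go stale as records change.
var prefetchedIPs map[string][]net.IP

func init() {
	// Prefetch critical hosts on startup. This blocks program start for
	// up to the 10-second timeout if DNS is slow.
	criticalHosts := []string{
		"api.example.com",
		"db.example.com",
		"cache.example.com",
	}
	prefetchedIPs = prefetchDNS(criticalHosts)
}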

DNS Caching Libraries

Existing libraries can provide the caching layer instead of a hand-written map:

Using Dgraph's ristretto (a fast, general-purpose cache)

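import (
	"context"
	"net"
	"time"

	"github.com/dgraph-io/ristretto"
)

func setupRistrettoCache() (func(host string) ([]net.IP, error), error) {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e7,     // keys tracked for admission (~10x expected entries)
		MaxCost:     1 << 30, // total cost budget; each entry below costs 100
		BufferItems: 64,
	})
	if err != nil {
		return nil, err
	}

	// Cache DNS lookups
	resolver := &net.Resolver{}
	cachedLookup := func(host string) ([]net.IP, error) {
		if val, found := cache.Get(host); found {
			return val.([]net.IP), nil
		}

		ips, err := resolver.LookupIP(context.Background(), "ip", host)
		if err == nil {
			// ristretto knows nothing about DNS TTLs, so apply a fixed
			// lifetime. Sets are buffered: an immediate Get may still miss.
			cache.SetWithTTL(host, ips, 100, 5*time.Minute) // cost: 100
		}
		return ips, err
	}
	return cachedLookup, nil
}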

Third-party DNS Cache Libraries

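// github.com/miekg/dns: DNS client/server library that returns full
// records, including TTLs; a building block for TTL-aware caches
//
// dnscache: simple DNS caching library
//
// pgx (github.com/jackc/pgx): lets you supply a custom lookup function
// (LookupFunc in its connection config), which can be backed by a cache
//
// CoreDNS: full-featured DNS server with a cache plugin, usually run as
// a separate service (for example a local or node-level caching resolver)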

DNS-over-HTTPS (DoH) Impact

DoH encrypts DNS queries over HTTPS, adding overhead but improving privacy:

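import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net/http"

	"github.com/miekg/dns"
)

// net.Resolver has no built-in DoH support: its Dial hook must return a
// connection speaking plain DNS over UDP or TCP. DoH instead POSTs a
// wire-format DNS message over HTTPS (RFC 8484); reuse one http.Client.
func dohQuery(ctx context.Context, client *http.Client, host string) (*dns.Msg, error) {
	q := new(dns.Msg)
	q.SetQuestion(dns.Fqdn(host), dns.TypeA)
	q.Id = 0 // RFC 8484 recommends ID 0 so responses are HTTP-cacheable
	wire, err := q.Pack()
	if err != nil {
		return nil, err
	}
	// Using Cloudflare DoH
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		"https://cloudflare-dns.com/dns-query", bytes.NewReader(wire))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/dns-message")
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("doh: unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	answer := new(dns.Msg)
	if err := answer.Unpack(body); err != nil {
		return nil, err
	}
	return answer, nil
}

// DoH performance (rough, environment-dependent figures):
// Traditional DNS (UDP): 10-50ms
// DoH: 50-200ms on a cold connection (TCP + TLS setup), closer to one
// HTTP round trip once the connection is reused.
// DoH is slower but hides queries from the ISP, works through
// restrictive firewalls, and benefits from HTTP/2 connection pooling.
// Recommendation: use DoH for privacy-critical services, but cache
// aggressively to avoid latency.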

Measuring DNS Resolution Time

import (
	"fmt"
	"sync"
	"time"
)

type DNSMetrics struct {
	lookups      int64
	totalLatency time.Duration
	maxLatency   time.Duration
	cacheHits    int64
	cacheMisses  int64
	mu           sync.Mutex
}

func (m *DNSMetrics) recordLookup(latency time.Duration, hit bool) {
	m.mu.Lock()
	defer m.mu.Unlock()

	m.lookups++
	m.totalLatency += latency
	if latency > m.maxLatency {
		m.maxLatency = latency
	}

	if hit {
		m.cacheHits++
	} else {
		m.cacheMisses++
	}
}

func (m *DNSMetrics) String() string {
	m.mu.Lock()
	defer m.mu.Unlock()

	avgLatency := time.Duration(0)
	if m.lookups > 0 {
		avgLatency = m.totalLatency / time.Duration(m.lookups)
	}

	hitRate := float64(0)
	if m.lookups > 0 {
		hitRate = float64(m.cacheHits) / float64(m.lookups) * 100
	}

	return fmt.Sprintf(
		"Lookups: %d, Avg: %v, Max: %v, Hit Rate: %.1f%%",
		m.lookups, avgLatency, m.maxLatency, hitRate,
	)
}

Reducing DNS Lookups in High-Throughput Services

Strategy 1: Batch Operations

import (
	"context"
	"net"
	"sync"
	"time"
)

func batchLookups(hosts []string) map[string][]net.IP {
	resolver := &net.Resolver{PreferGo: true}
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	results := make(map[string][]net.IP)
	var mu sync.Mutex // guards results: concurrent map writes would crash
	var wg sync.WaitGroup

	for _, host := range hosts {
		wg.Add(1)
		go func(h string) {
			defer wg.Done()
			// LookupIP returns []net.IP, matching the results map
			if ips, err := resolver.LookupIP(ctx, "ip", h); err == nil {
				mu.Lock()
				results[h] = ips
				mu.Unlock()
			}
		}(host)
	}

	wg.Wait()
	return results
}

Strategy 2: Use IP Addresses Directly

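import (
	"context"
	"net"
	"net/http"
	"time"
)

// Instead of resolving on every new connection:
//   http.Get("https://api.example.com/endpoint")
// map known hosts to fixed IPs in the dialer. 192.0.2.1 is a documentation
// address; hard-coded IPs break silently when the service moves.
func newStaticIPClient() *http.Client {
	dialer := &net.Dialer{Timeout: 5 * time.Second, KeepAlive: 30 * time.Second}
	return &http.Client{
		Transport: &http.Transport{
			// DialContext replaces the deprecated Dial field. The URL still
			// names api.example.com, so the Host header, TLS SNI, and
			// certificate verification are unchanged.
			DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
				// Map known hosts to IPs
				if addr == "api.example.com:443" {
					addr = "192.0.2.1:443"
				}
				return dialer.DialContext(ctx, network, addr)
			},
		},
	}
}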

Strategy 3: Connection Pool Per Endpoint

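import (
	"fmt"
	"net"
)

// EndpointPool dials through a DNSCache, so repeated dials to the same host
// skip the lookup while the entry is fresh. (For HTTP, http.Transport
// already pools reusable keep-alive connections per host.)
type EndpointPool struct {
	cache *DNSCache
}

func (p *EndpointPool) GetConnection(host, port string) (net.Conn, error) {
	ips, err := p.cache.Resolve(host) // cached lookup
	if err != nil {
		return nil, err
	}
	if len(ips) == 0 {
		return nil, fmt.Errorf("no addresses for %s", host)
	}
	// net.Dial needs host:port; JoinHostPort also brackets IPv6 addresses.
	return net.Dial("tcp", net.JoinHostPort(ips[0].String(), port))
}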

Production Configuration Example

import (
	"context"
	"net"
	"net/http"
	"time"
)

func setupProductionDNS() *http.Client {
	resolver := &net.Resolver{
		PreferGo: true, // Pure Go for concurrency
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			dialer := &net.Dialer{
				Timeout:   5 * time.Second,
				KeepAlive: 30 * time.Second,
			}
			return dialer.DialContext(ctx, network, address)
		},
	}

	return &http.Client{
		Timeout: 30 * time.Second,
		Transport: &http.Transport{
			MaxIdleConns:        100,
			MaxIdleConnsPerHost: 100,
			DialContext: (&net.Dialer{
				Timeout:   10 * time.Second,
				KeepAlive: 30 * time.Second,
				Resolver:  resolver,
			}).DialContext,
		},
	}
}

Performance Tuning Checklist

Use Pure Go Resolver: Set PreferGo: true for concurrent services
Implement Caching: Cache DNS responses, honoring record TTLs where visible; a fixed 5-10 minute lifetime is a common fallback but can keep serving old IPs after a failover
Prefetch Critical Hosts: Resolve essential endpoints on startup
Monitor Metrics: Track lookup latency, cache hit rate, slowest hosts
Batch Lookups: Use concurrent resolution for multiple hosts
Set Timeouts: 5-10 second timeout prevents hanging on broken DNS
Connection Pooling: Reuse connections to eliminate repeated lookups

Summary

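An uncached DNS lookup can add anywhere from a few milliseconds to 200ms or more. Go's pure Go resolver is usually the better fit for concurrent services: it avoids cgo overhead, and a blocked lookup ties up a goroutine rather than an OS thread. Because Go does not cache answers in-process, application-level caching pays off in high-throughput services, especially on hosts without a local caching resolver. Honor record TTLs where your DNS client exposes them (otherwise use a conservative fixed lifetime, typically 5-10 minutes), prefetch critical hosts, resolve batches concurrently, and reuse connections. Monitoring DNS metrics helps identify slow resolvers. For services making thousands of requests over short-lived connections, these measures can remove a noticeable share of request latency; measure before and after to see how much.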
