Deep Dive: Goroutines, Threads, and Processes in Go
December 16, 2025
Introduction
Go's concurrency model is one of its most powerful features, built around the concept of goroutines. To truly understand goroutines, we need to explore how they relate to operating system threads and processes. This deep dive examines the architecture, implementation, and practical implications of Go's concurrency primitives.
Understanding the Hierarchy: Process → Thread → Goroutine
Processes: The Foundation
A process is an instance of a program in execution. It represents a complete, independent execution environment with:
- Memory Space: Each process has its own virtual address space
- System Resources: File descriptors, network connections, signal handlers
- Security Context: User ID, group ID, capabilities
- Execution State: Program counter, registers, stack
// Example: Creating a new process in Go
package main
import (
"fmt"
"os"
"os/exec"
)
func main() {
cmd := exec.Command("ls", "-la")
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
// This creates a new OS process
err := cmd.Run()
if err != nil {
fmt.Printf("Error: %v\n", err)
}
}
Key Characteristics:
- Heavyweight: Creating a process involves significant overhead (a new address space plus kernel bookkeeping)
- Isolation: Processes are isolated from each other by default
- Communication: Inter-process communication (IPC) requires special mechanisms (pipes, sockets, shared memory)
Threads: Lightweight Execution Units
Threads are the smallest unit of execution that the OS scheduler manages. Multiple threads within a process share:
- Memory Space: All threads can access the same memory
- File Descriptors: Open files are shared
- Signal Handlers: Process-wide signal handling
But each thread maintains its own:
- Stack
- Program counter
- Register set
- Thread-local storage
// Go doesn't directly expose OS thread creation, but we can observe thread behavior
package main
import (
"fmt"
"runtime"
"sync"
)
func main() {
// Goroutine count; the runtime exposes no direct OS-thread counter
fmt.Printf("Initial goroutines: %d\n", runtime.NumGoroutine())
// Allow up to 4 OS threads to execute Go code simultaneously
runtime.GOMAXPROCS(4)
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
// This might run on different OS threads
fmt.Printf("Goroutine %d on thread\n", id)
}(i)
}
wg.Wait()
}
Goroutines: Go's Innovation
Goroutines are lightweight threads managed by the Go runtime, not the OS. They're multiplexed onto OS threads by Go's scheduler.
package main
import (
"fmt"
"runtime"
"time"
)
func expensiveOperation(id int) {
// Simulate work
time.Sleep(100 * time.Millisecond)
fmt.Printf("Goroutine %d completed\n", id)
}
func main() {
// A goroutine starts with just 2KB of stack space
fmt.Printf("Number of CPUs: %d\n", runtime.NumCPU())
fmt.Printf("GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
// Launch 1000 goroutines - try this with OS threads!
for i := 0; i < 1000; i++ {
go expensiveOperation(i)
}
time.Sleep(2 * time.Second)
fmt.Printf("Number of goroutines: %d\n", runtime.NumGoroutine())
}
The Go Scheduler: M:N Threading Model
Go implements an M:N threading model where M goroutines are multiplexed onto N OS threads.
Core Components
- G (Goroutine): The goroutine itself, containing:
  - Stack (starts at 2KB, can grow/shrink)
  - Instruction pointer
  - Other metadata for scheduling
- M (Machine): An OS thread that executes goroutines
  - Created and managed by the runtime
  - Not capped by GOMAXPROCS; extra Ms appear when threads block in syscalls
- P (Processor): A scheduling context
  - Holds a local run queue of goroutines
  - An M must hold a P to execute Go code
  - Number determined by GOMAXPROCS
package main
import (
"fmt"
"runtime"
"sync"
"time"
)
func monitorRuntime() {
for {
fmt.Printf("Goroutines: %d | CPUs: %d | GOMAXPROCS: %d\n",
runtime.NumGoroutine(),
runtime.NumCPU(),
runtime.GOMAXPROCS(0))
time.Sleep(1 * time.Second)
}
}
func cpuIntensive(wg *sync.WaitGroup, id int) {
defer wg.Done()
// Simulate CPU-intensive work
sum := 0
for i := 0; i < 1000000000; i++ {
sum += i
}
fmt.Printf("Goroutine %d finished with sum %d\n", id, sum)
}
func main() {
runtime.GOMAXPROCS(2) // At most 2 threads execute Go code at once
go monitorRuntime()
var wg sync.WaitGroup
// Launch 10 CPU-intensive goroutines
for i := 0; i < 10; i++ {
wg.Add(1)
go cpuIntensive(&wg, i)
}
wg.Wait()
}
Scheduling Algorithm
Go uses a work-stealing scheduler with the following characteristics:
- Local Run Queue: Each P has its own local run queue (capacity 256)
- Global Run Queue: Overflow and new goroutines may go here
- Work Stealing: When a P's local queue is empty, it can steal from other Ps
package main
import (
"fmt"
"runtime"
"sync"
"time"
)
func demonstrateScheduling() {
runtime.GOMAXPROCS(2) // 2 Ps available
var wg sync.WaitGroup
// Create goroutines that will be distributed across Ps
for i := 0; i < 20; i++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
fmt.Printf("Goroutine %d starting\n", id)
// Do some work
time.Sleep(10 * time.Millisecond)
// Yield to scheduler
runtime.Gosched()
fmt.Printf("Goroutine %d resuming\n", id)
}(i)
}
wg.Wait()
}
func main() {
demonstrateScheduling()
}
Preemption and Cooperation
Cooperative Scheduling (Pre-Go 1.14)
Originally, Go used cooperative scheduling where goroutines yielded control at specific points:
- Channel operations
- System calls
- Function calls (via the stack-growth check in the function prologue)
- Explicit calls to runtime.Gosched()
Asynchronous Preemption (Go 1.14+)
Modern Go implements asynchronous preemption using signals:
package main
import (
"fmt"
"runtime"
"time"
)
func infiniteLoop(id int) {
i := 0
// Pre-1.14: This would block other goroutines on the same P
// Post-1.14: This gets preempted
for {
i++
if i%1000000000 == 0 {
fmt.Printf("Goroutine %d: iteration %d\n", id, i)
}
}
}
func main() {
runtime.GOMAXPROCS(1) // Force all goroutines on one P
// Launch multiple infinite loops
for i := 0; i < 3; i++ {
go infiniteLoop(i)
}
// Monitor that all goroutines get execution time
for i := 0; i < 10; i++ {
time.Sleep(1 * time.Second)
fmt.Printf("Main: %d seconds elapsed, %d goroutines\n",
i+1, runtime.NumGoroutine())
}
}
Memory Model and Synchronization
Stack Management
Goroutines start with a small stack (2KB) that grows and shrinks dynamically:
package main
import (
"fmt"
"runtime"
)
func recursiveFunction(depth int, maxDepth int) {
if depth >= maxDepth {
var m runtime.MemStats
runtime.ReadMemStats(&m)
// StackSys is process-wide stack memory obtained from the OS,
// a rough proxy for how far this goroutine's stack has grown
fmt.Printf("Stack memory at depth %d: ~%d KB\n",
depth, m.StackSys/1024)
return
}
// Local variable to consume stack space
var array [1024]byte
_ = array
recursiveFunction(depth+1, maxDepth)
}
func main() {
// Observe stack growth
recursiveFunction(0, 100)
recursiveFunction(0, 1000)
recursiveFunction(0, 10000)
}
Channel-Based Synchronization
Channels are Go's primary synchronization primitive:
package main
import (
"fmt"
"time"
)
func producer(ch chan<- int) {
for i := 0; i < 10; i++ {
ch <- i
time.Sleep(100 * time.Millisecond)
}
close(ch)
}
func consumer(id int, ch <-chan int, done chan<- bool) {
for val := range ch {
fmt.Printf("Consumer %d received: %d\n", id, val)
}
done <- true
}
func main() {
dataChannel := make(chan int, 3) // Buffered channel
doneChannel := make(chan bool, 2)
// Start producer
go producer(dataChannel)
// Start multiple consumers
go consumer(1, dataChannel, doneChannel)
go consumer(2, dataChannel, doneChannel)
// Wait for consumers to finish
<-doneChannel
<-doneChannel
}
Performance Characteristics
Goroutine vs Thread Creation Benchmark
package main
import (
"fmt"
"sync"
"testing"
"time"
)
// Note: benchmarks only run from a _test.go file via `go test -bench=.`;
// shown inline here for illustration
func BenchmarkGoroutineCreation(b *testing.B) {
for i := 0; i < b.N; i++ {
var wg sync.WaitGroup
wg.Add(1)
go func() {
wg.Done()
}()
wg.Wait()
}
}
func measureGoroutineOverhead() {
start := time.Now()
var wg sync.WaitGroup
numGoroutines := 100000
for i := 0; i < numGoroutines; i++ {
wg.Add(1)
go func(id int) {
// Minimal work
_ = id
wg.Done()
}(i)
}
wg.Wait()
elapsed := time.Since(start)
fmt.Printf("Created %d goroutines in %v\n", numGoroutines, elapsed)
fmt.Printf("Average time per goroutine: %v\n",
elapsed/time.Duration(numGoroutines))
}
func main() {
measureGoroutineOverhead()
}
Context Switching Comparison
package main
import (
"fmt"
"runtime"
"sync"
"time"
)
func contextSwitchTest(numGoroutines int, iterations int) {
runtime.GOMAXPROCS(1) // Force context switching
ch := make(chan struct{})
var wg sync.WaitGroup
start := time.Now()
for i := 0; i < numGoroutines; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for j := 0; j < iterations; j++ {
ch <- struct{}{}
}
}()
}
go func() {
for i := 0; i < numGoroutines*iterations; i++ {
<-ch
}
}()
wg.Wait()
elapsed := time.Since(start)
totalSwitches := numGoroutines * iterations
fmt.Printf("Total context switches: %d\n", totalSwitches)
fmt.Printf("Total time: %v\n", elapsed)
fmt.Printf("Average time per switch: %v\n",
elapsed/time.Duration(totalSwitches))
}
func main() {
contextSwitchTest(100, 1000)
}
Common Patterns and Best Practices
Worker Pool Pattern
package main
import (
"fmt"
"sync"
"time"
)
type Job struct {
ID int
Data string
}
type Result struct {
JobID int
Output string
}
func worker(id int, jobs <-chan Job, results chan<- Result) {
for job := range jobs {
// Simulate processing
time.Sleep(100 * time.Millisecond)
results <- Result{
JobID: job.ID,
Output: fmt.Sprintf("Processed by worker %d: %s", id, job.Data),
}
}
}
func main() {
numWorkers := 5
numJobs := 20
jobs := make(chan Job, numJobs)
results := make(chan Result, numJobs)
// Start workers
var wg sync.WaitGroup
for w := 0; w < numWorkers; w++ {
wg.Add(1)
go func(workerID int) {
defer wg.Done()
worker(workerID, jobs, results)
}(w)
}
// Send jobs
for j := 0; j < numJobs; j++ {
jobs <- Job{ID: j, Data: fmt.Sprintf("data-%d", j)}
}
close(jobs)
// Collect results
go func() {
wg.Wait()
close(results)
}()
for result := range results {
fmt.Printf("Result: %+v\n", result)
}
}
Fan-Out/Fan-In Pattern
package main
import (
"fmt"
"sync"
"time"
)
func producer(id int, ch chan<- int) {
for i := 0; i < 5; i++ {
ch <- id*10 + i
time.Sleep(100 * time.Millisecond)
}
}
func fanOut(in <-chan int, workerCount int) []<-chan int {
channels := make([]<-chan int, workerCount)
for i := 0; i < workerCount; i++ {
ch := make(chan int)
channels[i] = ch
go func(workerCh chan<- int) {
for val := range in {
// Process and forward
workerCh <- val * val
}
close(workerCh)
}(ch)
}
return channels
}
func fanIn(channels ...<-chan int) <-chan int {
out := make(chan int)
var wg sync.WaitGroup
for _, ch := range channels {
wg.Add(1)
go func(c <-chan int) {
defer wg.Done()
for val := range c {
out <- val
}
}(ch)
}
go func() {
wg.Wait()
close(out)
}()
return out
}
func main() {
input := make(chan int)
// Start producer
go func() {
producer(1, input)
close(input)
}()
// Fan out to 3 workers
workers := fanOut(input, 3)
// Fan in results
results := fanIn(workers...)
// Collect results
for result := range results {
fmt.Printf("Result: %d\n", result)
}
}
Debugging and Profiling
Runtime Statistics
package main
import (
"fmt"
"runtime"
"runtime/debug"
"time"
)
func printRuntimeStats() {
var m runtime.MemStats
runtime.ReadMemStats(&m)
fmt.Printf("\n=== Runtime Statistics ===\n")
fmt.Printf("Goroutines: %d\n", runtime.NumGoroutine())
fmt.Printf("GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
fmt.Printf("Memory Allocated: %d MB\n", m.Alloc/1024/1024)
fmt.Printf("Memory System: %d MB\n", m.Sys/1024/1024)
fmt.Printf("GC Runs: %d\n", m.NumGC)
fmt.Printf("Stack in use: %d KB\n", m.StackInuse/1024)
}
func main() {
// Set GC percentage
debug.SetGCPercent(100)
// Monitor runtime stats
ticker := time.NewTicker(2 * time.Second)
defer ticker.Stop()
go func() {
for range ticker.C {
printRuntimeStats()
}
}()
// Create some goroutines
for i := 0; i < 100; i++ {
go func(id int) {
time.Sleep(10 * time.Second)
}(i)
}
time.Sleep(15 * time.Second)
}
Detecting Goroutine Leaks
package main
import (
"fmt"
"runtime"
"time"
)
func leakyFunction() {
ch := make(chan int) // Unbuffered channel
go func() {
val := <-ch // This will block forever
fmt.Printf("Received: %d\n", val)
}()
// Forgot to send to channel or close it
// The goroutine will leak
}
func detectLeak() {
initial := runtime.NumGoroutine()
fmt.Printf("Initial goroutines: %d\n", initial)
for i := 0; i < 10; i++ {
leakyFunction()
}
time.Sleep(1 * time.Second)
final := runtime.NumGoroutine()
fmt.Printf("Final goroutines: %d\n", final)
fmt.Printf("Leaked goroutines: %d\n", final-initial)
}
func fixedFunction() {
ch := make(chan int, 1) // Buffered channel
go func() {
select {
case val := <-ch:
fmt.Printf("Received: %d\n", val)
case <-time.After(1 * time.Second):
fmt.Println("Timeout - no leak")
return
}
}()
}
func main() {
detectLeak()
// Show fixed version
fmt.Println("\n=== Fixed Version ===")
initial := runtime.NumGoroutine()
for i := 0; i < 10; i++ {
fixedFunction()
}
time.Sleep(2 * time.Second)
final := runtime.NumGoroutine()
fmt.Printf("No leak: %d goroutines (started with %d)\n", final, initial)
}
Advanced Topics
System Calls and Blocking
When a goroutine makes a blocking system call, Go's runtime handles it specially:
package main
import (
"fmt"
"net"
"runtime"
"time"
)
func blockingIO() {
// Network I/O goes through the netpoller: the goroutine parks,
// but the OS thread stays free to run other goroutines. A truly
// blocking syscall would instead cause the runtime to hand this
// M's P to another thread.
conn, err := net.Dial("tcp", "example.com:80")
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
defer conn.Close()
// Write to connection
conn.Write([]byte("GET / HTTP/1.0\r\n\r\n"))
// Read response
buffer := make([]byte, 1024)
conn.Read(buffer)
}
func main() {
runtime.GOMAXPROCS(2)
fmt.Printf("GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
// Launch multiple goroutines doing blocking I/O
for i := 0; i < 10; i++ {
go blockingIO()
}
// Monitor thread creation
for i := 0; i < 5; i++ {
time.Sleep(1 * time.Second)
fmt.Printf("Goroutines: %d\n", runtime.NumGoroutine())
}
}
CGO and System Thread Binding
package main
/*
#include <pthread.h>
#include <stdio.h>
void print_thread_id() {
pthread_t tid = pthread_self();
printf("C thread ID: %lu\n", (unsigned long)tid);
}
*/
import "C"
import (
"fmt"
"runtime"
)
func main() {
runtime.LockOSThread() // Bind goroutine to OS thread
defer runtime.UnlockOSThread()
fmt.Println("Goroutine locked to OS thread")
C.print_thread_id()
// This goroutine will always run on the same OS thread
for i := 0; i < 5; i++ {
fmt.Printf("Iteration %d\n", i)
C.print_thread_id()
}
}
Performance Optimization Strategies
1. Proper GOMAXPROCS Setting
package main
import (
"fmt"
"runtime"
"sync"
"time"
)
func cpuBoundWork(id int, wg *sync.WaitGroup) {
defer wg.Done()
// CPU-intensive work
sum := 0
for i := 0; i < 1000000000; i++ {
sum += i
}
}
func benchmark(maxProcs int) time.Duration {
runtime.GOMAXPROCS(maxProcs)
var wg sync.WaitGroup
start := time.Now()
numGoroutines := 100
for i := 0; i < numGoroutines; i++ {
wg.Add(1)
go cpuBoundWork(i, &wg)
}
wg.Wait()
return time.Since(start)
}
func main() {
numCPU := runtime.NumCPU()
fmt.Printf("Number of CPUs: %d\n\n", numCPU)
for i := 1; i <= numCPU*2; i++ {
duration := benchmark(i)
fmt.Printf("GOMAXPROCS=%d: %v\n", i, duration)
}
}
2. Goroutine Pool Management
package main
import (
"fmt"
"sync"
"time"
)
type Pool struct {
work chan func()
wg sync.WaitGroup
}
func NewPool(size int) *Pool {
pool := &Pool{
work: make(chan func(), 100),
}
pool.wg.Add(size)
for i := 0; i < size; i++ {
go pool.worker()
}
return pool
}
func (p *Pool) worker() {
defer p.wg.Done()
for task := range p.work {
task()
}
}
func (p *Pool) Submit(task func()) {
p.work <- task
}
func (p *Pool) Close() {
close(p.work)
p.wg.Wait()
}
func main() {
pool := NewPool(10) // Limit concurrent goroutines
// Submit many tasks
for i := 0; i < 1000; i++ {
id := i
pool.Submit(func() {
time.Sleep(10 * time.Millisecond)
fmt.Printf("Task %d completed\n", id)
})
}
pool.Close()
}
Conclusion
Go's concurrency model, built on goroutines, represents a significant advancement in concurrent programming. By abstracting OS threads and implementing an efficient M:N scheduler, Go enables developers to write highly concurrent programs without the traditional complexity and overhead.
Key takeaways:
- Goroutines are cheap: 2KB initial stack vs a default of roughly 1MB or more for OS threads
- Scheduling is sophisticated: Work-stealing, preemption, and dynamic stack management
- Communication over sharing: Channels provide safe communication between goroutines
- Runtime does heavy lifting: Automatic management of threads and scheduling
The combination of lightweight goroutines, channels for communication, and a sophisticated runtime makes Go particularly well-suited for:
- Network servers handling thousands of connections
- Parallel data processing pipelines
- Microservices and distributed systems
- Real-time systems with predictable latency requirements
Understanding the underlying mechanics of goroutines, threads, and processes allows developers to write more efficient concurrent programs and debug complex synchronization issues effectively.
