CMU Computer Systems: Concurrent Programming

Classical problem classes of concurrent programs

  • Races: outcome depends on arbitrary scheduling decisions elsewhere in the system
  • Deadlock: improper resource allocation prevents forward progress
  • Livelock / Starvation / Fairness: external events and/or system scheduling decisions can prevent sub-task progress
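To make the race class concrete, here is a minimal sketch, assuming POSIX threads, in which two threads increment a shared counter. The names run_counter, incr and use_lock are hypothetical, not from the lecture. The increment cnt++ is a separate load, add and store, so if the threads interleave between those steps an update is lost; holding a mutex around the increment removes the race.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static volatile long cnt;                 /* shared by both threads */
static long niters;                       /* increments per thread */
static bool use_lock;                     /* hypothetical switch for the demo */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *incr(void *arg)
{
    (void)arg;
    for (long i = 0; i < niters; i++) {
        if (use_lock)
            pthread_mutex_lock(&lock);
        cnt++;                            /* load, add, store: not atomic */
        if (use_lock)
            pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs two incrementing threads and returns the final count, which should
 * be 2 * iters. Without the lock, interleaved updates can be lost. */
long run_counter(long iters, bool locked)
{
    pthread_t t1, t2;
    cnt = 0;
    niters = iters;
    use_lock = locked;
    int rc = pthread_create(&t1, NULL, incr, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, incr, NULL);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return cnt;
}
```

On a multicore machine run_counter(1000000, false) typically returns less than 2000000, with a different value from run to run; with locked set to true it should always return exactly 2 * iters.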

Iterative Servers

  • Iterative servers process one request at a time
  • Second Client is Blocked
    • Second client attempts to connect to iterative server
    • Call to connect returns
      • Even though connection not yet accepted
      • Server side TCP manager queues request
      • Feature known as "TCP listen backlog"
    • Call to rio_writen returns
      • Server side TCP manager buffers input data
    • Call to rio_readlineb blocks
      • Server hasn't written anything for it to read
  • Fundamental Flaw of Iterative Servers
    • Client 1 blocks waiting for user to type in data
    • Server blocks waiting for data from Client 1
    • Client 2 blocks waiting to read from server
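The blocking behavior above can be sketched as a plain iterative echo server. This is an illustration under assumptions, not the course's code: echo and serve_iteratively are hypothetical names, and plain read/write stand in for the book's rio_readlineb/rio_writen helpers. Because serve_iteratively does not call accept again until echo returns, a second client's connect and first write succeed (the kernel queues the connection and buffers the data), but its read blocks until the first client disconnects.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Echo bytes back to the client until it closes its end (read returns 0). */
void echo(int connfd)
{
    char buf[512];
    ssize_t n;
    assert(connfd >= 0);
    while ((n = read(connfd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted by a signal: retry */
            return;                       /* real error: drop this client */
        }
        for (ssize_t off = 0; off < n; ) {    /* writes may be short on sockets */
            ssize_t w = write(connfd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return;
            }
            off += w;
        }
    }
}

/* Iterative server: serve one client to completion before accepting the
 * next one. Client 2 waits in the listen backlog the whole time. */
void serve_iteratively(int listenfd)
{
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd < 0)
            continue;                     /* e.g., EINTR: try again */
        echo(connfd);                     /* blocks while client 1 is idle */
        close(connfd);
    }
}
```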

Approaches for Writing Concurrent Servers

  • Process-based
    • Kernel automatically interleaves multiple logical flows
    • Each flow has its own private address space
  • Event-based
    • Programmer manually interleaves multiple logical flows
    • All flows share the same address space
    • Uses technique called I/O multiplexing
  • Thread-based
    • Kernel automatically interleaves multiple logical flows
    • Each flow shares the same address space
    • Hybrid of process-based and event-based

Issues with Process-based Servers

  • Listening server process must reap zombie children
    • to avoid fatal memory leak
  • Parent process must close its copy of connfd
    • Kernel keeps reference count for each socket/open file
    • After fork, refcnt (connfd) = 2
    • Connection will not be closed until refcnt (connfd) = 0
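Both issues can be sketched in a few lines, assuming POSIX signals and sockets; install_reaper, spawn_handler and the simplified echo loop are hypothetical names, not the book's code. The SIGCHLD handler reaps in a loop because pending signals of the same type do not queue, and the parent closes its copy of connfd right after fork so that the connection closes as soon as the child does.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* SIGCHLD handler: reap every child that has exited so none stays a zombie. */
static void sigchld_handler(int sig)
{
    (void)sig;
    int saved_errno = errno;              /* waitpid may clobber errno */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;                                 /* signals don't queue: reap them all */
    errno = saved_errno;
}

/* Install the reaper once, before entering the accept loop. */
void install_reaper(void)
{
    struct sigaction sa;
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;             /* restart interrupted accept/read */
    int rc = sigaction(SIGCHLD, &sa, NULL);
    assert(rc == 0);
}

/* Minimal echo loop run by the child (stand-in for the book's rio code). */
static void echo(int connfd)
{
    char buf[512];
    ssize_t n;
    while ((n = read(connfd, buf, sizeof buf)) > 0)
        if (write(connfd, buf, (size_t)n) != n)
            break;
}

/* Fork a child to serve connfd. After fork both processes hold connfd
 * (refcnt = 2), so the parent closes its copy. Returns the child pid or -1. */
pid_t spawn_handler(int listenfd, int connfd)
{
    pid_t pid = fork();
    if (pid == 0) {                       /* child */
        close(listenfd);                  /* child never accepts */
        echo(connfd);
        close(connfd);
        _exit(0);
    }
    close(connfd);                        /* parent drops its reference */
    return pid;
}
```

A server would call install_reaper() once and then call spawn_handler(listenfd, connfd) for each accepted connection. If the parent skipped close(connfd), the client would never see end-of-file after the child exits, because refcnt (connfd) would still be 1.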

Pros and Cons of Process-based Servers

  • Pros
    • Handle multiple connections concurrently
    • Clean sharing model
      • descriptors (no)
      • file tables (yes)
      • global variables (no)
    • Simple and straightforward
  • Cons
    • Additional overhead for process control
    • Nontrivial to share data between processes
      • Requires IPC (interprocess communication) mechanisms
        • FIFOs (named pipes), System V shared memory and semaphores
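The sharing model can be checked directly. In this sketch (hypothetical function names; POSIX fork and mkstemp assumed) a child first writes a global variable and then, separately, writes through an inherited file descriptor. The parent never sees the global change, but it does see the file offset advance, because after fork both descriptors refer to the same open file table entry.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int global = 0;                    /* each process gets its own copy */

/* Global variables are NOT shared: the child's write changes only its copy. */
int global_after_child_write(void)
{
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {                       /* child */
        global = 42;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return global;                        /* parent still sees 0 */
}

/* File table entries ARE shared: parent and child use one file offset. */
off_t offset_after_child_write(void)
{
    char path[] = "/tmp/shareXXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);                         /* file is deleted once fd is closed */
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0)                         /* child advances the shared offset */
        _exit(write(fd, "abc", 3) == 3 ? 0 : 1);
    waitpid(pid, NULL, 0);
    off_t off = lseek(fd, 0, SEEK_CUR);   /* parent observes offset 3 */
    close(fd);
    return off;
}
```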

Event-based Servers

  • Server maintains a set of active connections
    • Array of connfds
  • Repeat:
    • Determine which descriptors (connfds or listenfd) have pending input
      • e.g., using the select or epoll functions
      • The arrival of pending input is an event
    • If listenfd has input, accept the connection
      • and add the new connfd to the array
    • Service all connfds with pending input
  • Details of a select-based server are in the book (CS:APP)
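The loop described above might look like the following select-based sketch. It is a simplified stand-in, not the book's version: struct pool, init_pool, add_client, event_loop_step and MAXCLIENTS are hypothetical names, each ready connection gets one read and one echo per pass, and a production server would also have to handle slow clients and descriptors at or above FD_SETSIZE (or switch to epoll).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAXCLIENTS 64                     /* hypothetical pool size */

struct pool {                             /* the set of active connections */
    int listenfd;
    int clientfd[MAXCLIENTS];             /* -1 marks a free slot */
};

void init_pool(struct pool *p, int listenfd)
{
    assert(listenfd >= 0);
    p->listenfd = listenfd;
    for (int i = 0; i < MAXCLIENTS; i++)
        p->clientfd[i] = -1;
}

/* Add connfd to the first free slot; returns -1 if the pool is full. */
int add_client(struct pool *p, int connfd)
{
    for (int i = 0; i < MAXCLIENTS; i++)
        if (p->clientfd[i] < 0) {
            p->clientfd[i] = connfd;
            return 0;
        }
    return -1;
}

/* One pass of the event loop. Returns the number of events handled,
 * or -1 if select fails. */
int event_loop_step(struct pool *p)
{
    fd_set readset;
    FD_ZERO(&readset);
    FD_SET(p->listenfd, &readset);
    int maxfd = p->listenfd;
    for (int i = 0; i < MAXCLIENTS; i++)
        if (p->clientfd[i] >= 0) {
            FD_SET(p->clientfd[i], &readset);
            if (p->clientfd[i] > maxfd)
                maxfd = p->clientfd[i];
        }

    /* Block until some descriptor has pending input: that is an event. */
    if (select(maxfd + 1, &readset, NULL, NULL, NULL) < 0)
        return -1;

    int events = 0;
    if (FD_ISSET(p->listenfd, &readset)) {    /* listenfd ready: accept */
        events++;
        int connfd = accept(p->listenfd, NULL, NULL);
        if (connfd >= 0 && add_client(p, connfd) < 0)
            close(connfd);                    /* pool full: reject client */
    }
    for (int i = 0; i < MAXCLIENTS; i++) {
        int fd = p->clientfd[i];
        if (fd < 0 || !FD_ISSET(fd, &readset))
            continue;
        events++;
        char buf[512];
        ssize_t n = read(fd, buf, sizeof buf);  /* input pending: no blocking */
        if (n > 0 && write(fd, buf, (size_t)n) == n)
            continue;                         /* echoed one chunk */
        close(fd);                            /* EOF, error, or short write */
        p->clientfd[i] = -1;
    }
    return events;
}
```

The server would call init_pool(&pool, listenfd) once and then call event_loop_step(&pool) forever. Since one event delivers whatever bytes happen to be available, a real protocol handler has to keep per-connection state between passes, which is the partial-request problem noted below.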

Pros and Cons of Event-based Servers

  • Pros
    • One logical control flow and address space
    • Can single-step with a debugger
    • No process or thread control overhead
      • Design of choice for high-performance Web servers and search engines
  • Cons
    • Significantly more complex to code than process- or thread-based designs
    • Hard to provide fine-grained concurrency
      • E.g., how to deal with partial HTTP request headers
    • Cannot take advantage of multi-core
      • Single thread of control

Process

  • Traditional View
    • Process = process context + code, data, and stack
  • Alternate View of a Process
    • Process = thread + code, data, and kernel context

A Process With Multiple Threads

  • Multiple threads can be associated with a process
    • Each thread has its own logical control flow
    • Each thread shares the same code, data, and kernel context
    • Each thread has its own stack for local variables
      • but not protected from other threads
    • Each thread has its own thread id
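The point that each thread's stack is private but not protected can be shown with a sketch loosely modeled on the book's sharing example; peers_read_main_stack, shared_ptr and seen are hypothetical names. The calling thread keeps an array of strings on its own stack and publishes a pointer to it through a global, and each peer thread then reads that stack through the pointer without any error.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NTHREADS 2

static const char **shared_ptr;           /* global: one copy for all threads */
static const char *seen[NTHREADS];        /* what each peer read */

static void *peer(void *vargp)
{
    long myid = (long)vargp;              /* local: on this peer's own stack */
    /* Reads the caller's stack through a global pointer. Nothing stops
     * this: thread stacks are private by convention, not by protection. */
    seen[myid] = shared_ptr[myid];
    return NULL;
}

/* Returns 1 if every peer read the string stored on the caller's stack. */
int peers_read_main_stack(void)
{
    const char *msgs[NTHREADS] = { "Hello from foo", "Hello from bar" };
    pthread_t tid[NTHREADS];
    shared_ptr = msgs;                    /* publish a pointer into this stack */
    for (long i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, peer, (void *)i);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    int ok = 1;
    for (int i = 0; i < NTHREADS; i++)
        if (strcmp(seen[i], msgs[i]) != 0)
            ok = 0;
    shared_ptr = NULL;                    /* msgs goes out of scope on return */
    return ok;
}
```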

Logical View of Threads

  • Threads associated with process form a pool of peers
    • Unlike processes which form a tree hierarchy
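Because threads form a pool of peers rather than a tree, any thread can join any other joinable thread in the same process, not only the one that created it. In this minimal sketch (hypothetical function names, POSIX threads assumed) the calling thread creates both peers, yet the second peer reaps the first and adds one to its return value.

```c
#include <assert.h>
#include <pthread.h>

/* First peer: finishes and returns a value for whoever joins it. */
static void *producer(void *arg)
{
    (void)arg;
    return (void *)41L;
}

/* Second peer: reaps the first peer, although it did not create it. */
static void *joiner(void *arg)
{
    pthread_t other = *(pthread_t *)arg;
    void *ret;
    int rc = pthread_join(other, &ret);
    assert(rc == 0);
    return (void *)((long)ret + 1);
}

/* Returns 42 if the peer-to-peer join worked. */
long peers_demo(void)
{
    pthread_t t1, t2;
    void *ret;
    int rc = pthread_create(&t1, NULL, producer, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, joiner, &t1);   /* t2 will join t1 */
    assert(rc == 0);
    pthread_join(t2, &ret);                        /* caller only joins t2 */
    return (long)ret;
}
```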

Threads vs. Processes

  • Similar
    • Each has its own logical control flow
    • Each can run concurrently with others (possibly on different cores)
    • Each is context switched
  • Different
    • Threads share all code and data (except local stacks)
      • Processes do not
    • Threads are somewhat less expensive than processes
      • Process control (creating and reaping) is roughly twice as expensive as thread control

Thread-based Server Execution Model

  • Each client is handled by an individual peer thread
  • Threads share all process state except the TID
  • Each thread has a separate stack for local variables

Thread-Based Concurrent Server

  • Run thread in "detached" mode
    • Runs independently of other threads
    • Reaped automatically (by kernel) when it terminates
  • Free the storage allocated to hold connfd
  • Close connfd (important!)
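The steps above fit into a thread-per-client echo server roughly as follows. This sketch assumes POSIX threads and uses hypothetical names (spawn_peer, serve_threaded, and a simplified echo based on read/write rather than the book's rio functions). connfd is copied into its own malloc'd block so that the next accept in the main thread cannot overwrite it before the peer thread reads it, a typical case of unintended sharing.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Simplified echo loop (read/write instead of the book's rio helpers). */
static void echo(int connfd)
{
    char buf[512];
    ssize_t n;
    while ((n = read(connfd, buf, sizeof buf)) > 0)
        if (write(connfd, buf, (size_t)n) != n)
            break;
}

/* Peer thread routine: one thread per client. */
static void *thread(void *vargp)
{
    int connfd = *(int *)vargp;
    pthread_detach(pthread_self());       /* run detached: reaped automatically */
    free(vargp);                          /* free the storage that held connfd */
    echo(connfd);
    close(connfd);                        /* important: otherwise the fd leaks */
    return NULL;
}

/* Hand connfd to a new peer thread through its own heap copy.
 * Returns 0 on success, -1 on failure. */
int spawn_peer(int connfd)
{
    assert(connfd >= 0);
    int *connfdp = malloc(sizeof *connfdp);
    if (connfdp == NULL)
        return -1;
    *connfdp = connfd;
    pthread_t tid;
    if (pthread_create(&tid, NULL, thread, connfdp) != 0) {
        free(connfdp);
        return -1;
    }
    return 0;
}

/* Accept loop: the main thread only accepts; peer threads do the work. */
void serve_threaded(int listenfd)
{
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd >= 0 && spawn_peer(connfd) < 0)
            close(connfd);
    }
}
```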

Issues With Thread-Based Servers

  • Must run "detached" to avoid memory leak
    • At any point in time, a thread is either joinable or detached
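    • Joinable thread can be reaped and killed by other threads; its memory resources (e.g., its stack) are freed only when it is reaped
    • Detached thread cannot be reaped or killed by other threads; its memory resources are freed automatically when it terminates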
    • Default state is joinable
  • Must be careful to avoid unintended sharing
  • All functions called by a thread must be thread-safe

Pros and Cons of Thread-Based Designs

  • Pros
    • Easy to share data structures between threads
    • Threads are more efficient than processes
  • Cons
    • Unintentional sharing can introduce subtle and hard-to-reproduce errors
      • The ease with which data can be shared is both the greatest strength and the greatest weakness of threads
      • Hard to know which data are shared and which are private
      • Hard to detect by testing

Summary: Approaches to Concurrency

  • Process-based
    • Hard to share resources: Easy to avoid unintended sharing
    • High overhead in adding/removing clients
  • Event-based
    • Tedious and low level
    • Total control over scheduling
    • Very low overhead
    • Cannot create as fine-grained a level of concurrency
    • Does not make use of multi-core
  • Thread-based
    • Easy to share resources
    • Medium overhead
    • Not much control over scheduling policies
    • Difficult to debug