Classical problem classes of concurrent programs
- Races: outcome depends on arbitrary scheduling decisions elsewhere in the system (see the sketch after this list)
- Deadlock: improper resource allocation prevents forward progress
- Livelock/Starvation/Fairness: external events and/or system scheduling decisions can prevent sub-task progress
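A minimal sketch of the first class (a data race), not taken from the slides: two POSIX threads increment a shared counter with no synchronization, so the final value depends on how the scheduler interleaves the read-modify-write sequences.

```c
/* Minimal data-race sketch (illustrative): two threads increment a
 * shared counter with no synchronization, so increments can be lost.
 * Compile with: gcc -pthread race.c */
#include <pthread.h>
#include <stdio.h>

#define NITERS 1000000
static volatile long cnt = 0;           /* shared, unprotected */

static void *incr(void *vargp)
{
    for (long i = 0; i < NITERS; i++)
        cnt++;                          /* unsynchronized read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, incr, NULL);
    pthread_create(&t2, NULL, incr, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("cnt = %ld (expected %d)\n", cnt, 2 * NITERS);  /* often less */
    return 0;
}
```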
Iterative Servers
- Iterative servers process one request at a time (see the sketch after this list)
- Second Client is Blocked
- Second client attempts to connect to iterative server
- Call to connect returns
- Even though connection not yet accepted
- Server-side TCP manager queues request
- Feature known as "TCP listen backlog"
- Call to rio_writen returns
- Server-side TCP manager buffers input data
- Call to rio_readlineb blocks
- Server hasn't written anything for it to read
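A sketch of the iterative server loop makes the blocking concrete; it assumes the book's csapp.h helpers (open_listenfd, rio_readinitb, rio_readlineb, rio_writen). While the server sits inside echo for client 1, it never returns to accept, so client 2 waits in the listen backlog.

```c
/* Iterative echo server sketch (assumes the CS:APP csapp.h helpers).
 * One client is serviced to completion before accept is called again. */
#include "csapp.h"

void echo(int connfd)
{
    size_t n;
    char buf[MAXLINE];
    rio_t rio;

    rio_readinitb(&rio, connfd);
    /* Blocks whenever the current client has not sent a line yet */
    while ((n = rio_readlineb(&rio, buf, MAXLINE)) != 0)
        rio_writen(connfd, buf, n);
}

int main(int argc, char **argv)
{
    int listenfd = open_listenfd(argv[1]);   /* argv[1] = port */
    while (1) {
        /* A second client can connect (TCP listen backlog) but is not
         * accepted until the current client disconnects */
        int connfd = accept(listenfd, NULL, NULL);
        echo(connfd);
        close(connfd);
    }
}
```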
- Fundamental Flaw of Iterative Servers
- Client 1 blocks waiting for user to type in data
- Server blocks waiting for data from Client 1
- Client 2 blocks waiting to read from server
Approaches for Writing Concurrent Servers
- Process-based
- Kernel automatically interleaves multiple logical flows
- Each flow has its own private address space
- Event-based
- Programmer manually interleaves multiple logical flows
- All flows share the same address space
- Uses technique called I/O multiplexing
- Thread-based
- Kernel automatically interleaves multiple logical flows
- Each flow shares the same address space
- Hybrid of process-based and event-based
Issues with Process-based Servers
- Listening server process must reap zombie children (see the sketch after this list)
- to avoid a fatal memory leak
- Parent process must close its copy of connfd
- Kernel keeps reference count for each socket/open file
- After fork, refcnt(connfd) = 2
- Connection will not be closed until refcnt(connfd) = 0
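Both issues show up in the standard fork-per-connection pattern; the sketch below (again assuming csapp.h and the echo routine above) reaps zombies in a SIGCHLD handler and has the parent close its copy of connfd so the reference count can reach 0.

```c
/* Process-based concurrent server sketch (assumes csapp.h and the echo
 * routine from the iterative sketch). The parent reaps zombie children
 * and closes its copy of connfd after each fork. */
#include "csapp.h"

void echo(int connfd);                       /* defined as above */

void sigchld_handler(int sig)
{
    /* Reap all available zombies; WNOHANG keeps the handler from blocking */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

int main(int argc, char **argv)
{
    signal(SIGCHLD, sigchld_handler);
    int listenfd = open_listenfd(argv[1]);
    while (1) {
        int connfd = accept(listenfd, NULL, NULL);  /* refcnt(connfd) = 1 */
        if (fork() == 0) {                          /* child: refcnt = 2   */
            close(listenfd);   /* child does not need the listening socket */
            echo(connfd);
            close(connfd);
            exit(0);
        }
        close(connfd);         /* parent must close its copy too */
    }
}
```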
Pros and Cons of Process-based Servers
- Handle multiple connections concurrently
- Clean sharing model
- descriptors (no)
- file tables (yes)
- global variables (no)
- Simple and straightforward
- Additional overhead for process control
- Nontrivial to share data between processes (see the sketch after this list)
- Requires IPC (interprocess communication) mechanisms
- FIFOs (named pipes), System V shared memory and semaphores
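As one illustration of the extra machinery sharing requires, the sketch below shares a counter between parent and child through an anonymous shared mmap region guarded by a process-shared POSIX semaphore (a stand-in for the System V mechanisms named above; names and sizes are illustrative, Linux assumed).

```c
/* Sketch: sharing a counter across processes needs explicit IPC.
 * Here a shared anonymous mmap region holds the counter and a
 * process-shared POSIX semaphore protects it (Linux; compile with -pthread). */
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared { sem_t mutex; long count; };

int main(void)
{
    /* Region remains visible to both parent and child after fork */
    struct shared *sh = mmap(NULL, sizeof(*sh), PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    sem_init(&sh->mutex, 1, 1);          /* pshared = 1: across processes */
    sh->count = 0;

    pid_t pid = fork();
    for (int i = 0; i < 100000; i++) {
        sem_wait(&sh->mutex);
        sh->count++;
        sem_post(&sh->mutex);
    }
    if (pid == 0)
        exit(0);                         /* child done */
    waitpid(pid, NULL, 0);
    printf("count = %ld\n", sh->count);  /* 200000 with the semaphore */
    return 0;
}
```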
Event-based Servers
- Server maintains set of active connections
- Array of connfd's
- Repeat
- Determine which descriptors (connfd's or listenfd) have pending inputs
- e.g., using select or epoll functions
- arrival of pending input is an event
- If listenfd has input, then accept connection
- and add new connfd to array
- Service all connfd's with pending inputs
- Details for select-based server in book
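A compressed sketch of that loop structure (assuming open_listenfd and rio_writen from csapp.h; the book's version additionally keeps a pool of rio buffers):

```c
/* Select-based event-driven server sketch (see the book's echoservers.c
 * for the full version). Each iteration rebuilds the read set, waits for
 * events, accepts on listenfd, and echoes on every ready connfd. */
#include "csapp.h"
#include <sys/select.h>

int main(int argc, char **argv)
{
    int listenfd = open_listenfd(argv[1]);
    int clientfd[FD_SETSIZE];                 /* set of active connfd's */
    for (int i = 0; i < FD_SETSIZE; i++)
        clientfd[i] = -1;

    while (1) {
        /* Build the read set: listenfd plus every active connfd */
        fd_set read_set;
        FD_ZERO(&read_set);
        FD_SET(listenfd, &read_set);
        int maxfd = listenfd;
        for (int i = 0; i < FD_SETSIZE; i++)
            if (clientfd[i] >= 0) {
                FD_SET(clientfd[i], &read_set);
                if (clientfd[i] > maxfd)
                    maxfd = clientfd[i];
            }

        /* Block until at least one descriptor has a pending event */
        select(maxfd + 1, &read_set, NULL, NULL, NULL);

        /* Event on listenfd: accept and add the new connfd to the array */
        if (FD_ISSET(listenfd, &read_set)) {
            int connfd = accept(listenfd, NULL, NULL);
            for (int i = 0; i < FD_SETSIZE; i++)
                if (clientfd[i] < 0) { clientfd[i] = connfd; break; }
        }

        /* Service every connfd with pending input */
        for (int i = 0; i < FD_SETSIZE; i++) {
            int fd = clientfd[i];
            if (fd >= 0 && FD_ISSET(fd, &read_set)) {
                char buf[MAXLINE];
                ssize_t n = read(fd, buf, MAXLINE);
                if (n <= 0) {                  /* EOF or error: drop client */
                    close(fd);
                    clientfd[i] = -1;
                } else {
                    rio_writen(fd, buf, n);    /* echo the bytes back */
                }
            }
        }
    }
}
```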
Pros and Cons of Event-based Servers
- One logical control flow and address space
- Can single-step with a debugger
- No process or thread control overhead
- Design of choice for high-performance Web servers and search engines.
- Significantly more complex to code than process- or thread-based designs
- Hard to provide fine-grained concurrency
- E.g., how to deal with partial HTTP request headers
- Cannot take advantage of multi-core
- Single thread of control
Process
- Traditional View
- Process = process context + code, data, and stack
- Alternate View of a Process
- Process = thread + code, data, and kernel context
A Process With Multiple Threads
- Multiple threads can be associated with a process
- Each thread has its own logical control flow
- Each thread shares the same code, data, and kernel context
- Each thread has its own stack for local variables
- but not protected from other threads
- Each thread has its own thread id
Logical View of Threads
- Threads associated with process form a pool of peers
- Unlike processes which form a tree hierarchy
Threads vs. Processes
- Similar
- Each has its own logical control flow
- Each can run concurrently with others (possibly on different cores)
- Each is context switched
- Different
- Threads share all code and data (except local stacks)
- Processes do not
- Threads are somewhat less expensive than processes
- Process control (creating and reaping) twice as expensive as thread control
Thread-based Server Execution Model
- Each client handled by individual peer thread (see the main-loop sketch after this list)
- Threads share all process state except TID
- Each thread has a separate stack for local variables
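A sketch of the main loop for this model (assuming open_listenfd from csapp.h; the peer thread routine appears after the next list). connfd is heap-allocated so each peer thread gets its own copy rather than a pointer into the main thread's stack.

```c
/* Main loop of a thread-based concurrent server sketch (assumes csapp.h).
 * Each accepted connfd is passed to a new peer thread via malloc'd storage. */
#include "csapp.h"

void *thread(void *vargp);                    /* peer routine, sketched below */

int main(int argc, char **argv)
{
    int listenfd = open_listenfd(argv[1]);
    while (1) {
        int *connfdp = malloc(sizeof(int));   /* private copy for the thread */
        *connfdp = accept(listenfd, NULL, NULL);
        pthread_t tid;
        pthread_create(&tid, NULL, thread, connfdp);
    }
}
```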
Thread-Based Concurrent Server (cont)
- Run thread in "detached" mode (see the routine sketch after this list)
- Runs independently of other threads
- Reaped automatically (by kernel) when it terminates
- Free storage allocated to hold connfd
- Close connfd (important!)
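Continuing that sketch, a peer thread routine that follows the bullets above (detach, free the connfd storage, close connfd) might look like this:

```c
/* Peer thread routine for the main loop above: detach so the thread is
 * reaped automatically, free the heap storage holding connfd, service
 * the client, and close connfd when done. */
void *thread(void *vargp)
{
    int connfd = *((int *)vargp);
    pthread_detach(pthread_self());   /* run detached: no join required */
    free(vargp);                      /* free storage allocated in main */
    echo(connfd);                     /* service the client */
    close(connfd);                    /* important: release the descriptor */
    return NULL;
}
```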
Issues With Thread-Based Servers
- Must run "detached" to avoid memory leak
- At any point in time, a thread is either joinable or detached
- Joinable thread can be reaped and killed by other threads
- Detached thread cannot be reaped or killed by other threads
- Default state is joinable
- Must be careful to avoid unintended sharing (see the racy sketch after this list)
- All functions called by a thread must be thread-safe
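A classic instance of unintended sharing is passing the address of the main thread's connfd to the peer thread; the racy sketch below is the bug that the malloc'd-pointer pattern above avoids.

```c
/* BUGGY sketch (do not use): the peer thread receives a pointer to
 * main's local connfd, which main may overwrite with the next accept
 * before the thread dereferences it. */
int main(int argc, char **argv)
{
    int listenfd = open_listenfd(argv[1]);
    while (1) {
        int connfd = accept(listenfd, NULL, NULL);
        pthread_t tid;
        pthread_create(&tid, NULL, thread, &connfd);  /* race on connfd */
    }
}
```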
Pros and Cons of Thread-Based Designs
- Easy to share data structures between threads
- Threads are more efficient than processes
- Unintentional sharing can introduce subtle and hard-to-reproduce errors
- The ease with which data can be shared is both the greatest strength and the greatest weakness of threads
- Hard to know which data shared & which private
- Hard to detect by testing
Summary: Approaches to Concurrency
- Process-based
- Hard to share resources: Easy to avoid unintended sharing
- High overhead in adding/removing clients
- Event-based
- Tedious and low level
- Total control over scheduling
- Very low overhead
- Cannot create as fine-grained a level of concurrency
- Does not make use of multi-core
- Thread-based
- Easy to share resources
- Medium overhead
- Not much control over scheduling policies
- Difficult to debug