Common Concurrency Problems
Non-Deadlock Bugs
Atomicity-Violation Bugs
The desired serializability among multiple memory accesses is violated (i.e., a code region is intended to be atomic, but the atomicity is not enforced during execution).
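A minimal sketch of the usual fix, assuming a hypothetical shared counter named balance and a workload size ITERATIONS (neither comes from the text above): the read-modify-write is wrapped in a lock so the region really is atomic.

```c
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000   /* hypothetical workload size */

static long balance = 0;    /* shared state touched by both threads */
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

/* The load, add, and store on `balance` are intended to be atomic.
 * Without the lock, two threads can interleave between the load and
 * the store, and some updates are lost. */
static void *deposit_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&balance_lock);
        balance = balance + 1;              /* critical section */
        pthread_mutex_unlock(&balance_lock);
    }
    return NULL;
}

/* Runs two workers to completion and returns the final balance. */
long run_deposits(void) {
    pthread_t t1, t2;
    balance = 0;
    pthread_create(&t1, NULL, deposit_worker, NULL);
    pthread_create(&t2, NULL, deposit_worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return balance;
}
```

With the lock in place, run_deposits() should return exactly 2 * ITERATIONS. Removing the lock and unlock calls would typically make the result smaller and vary from run to run.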
Order-Violation Bugs
The desired order between two (groups of) memory accesses is flipped (i.e., A should always be executed before B, but the order is not enforced during execution).
The fix to this type of bug is generally to enforce ordering. As discussed previously, using condition variables is an easy and robust way to add this style of synchronization into modern code bases.
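A minimal sketch of enforcing order with a condition variable, assuming hypothetical names (initialized, shared_value, producer, consumer) chosen only for illustration: the consumer (B) waits on a state variable until the producer (A) has run.

```c
#include <pthread.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  init_cond = PTHREAD_COND_INITIALIZER;
static int initialized  = 0;   /* state variable, guarded by init_lock */
static int shared_value = 0;   /* must be written (A) before it is read (B) */

/* A: set the value, record that it is ready, then wake any waiter. */
static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&init_lock);
    shared_value = 42;
    initialized = 1;
    pthread_cond_signal(&init_cond);
    pthread_mutex_unlock(&init_lock);
    return NULL;
}

/* B: wait until A has happened, then read the value. */
static void *consumer(void *arg) {
    int *out = arg;
    pthread_mutex_lock(&init_lock);
    while (!initialized)                    /* loop guards against spurious wakeups */
        pthread_cond_wait(&init_cond, &init_lock);
    *out = shared_value;
    pthread_mutex_unlock(&init_lock);
    return NULL;
}

/* Starts the consumer first on purpose; returns the value it observed. */
int run_ordered(void) {
    pthread_t c, p;
    int seen = -1;
    initialized = 0;
    shared_value = 0;
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(c, NULL);
    pthread_join(p, NULL);
    return seen;
}
```

Whichever thread the scheduler runs first, the consumer reads shared_value only after initialized is set, so run_ordered() should return 42.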
Deadlock Bugs
Why Do Deadlocks Occur?
One reason is that in large code bases, complex dependencies arise between components. Another is the nature of encapsulation: as software developers, we are taught to hide implementation details and thus make software easier to build in a modular way. Unfortunately, such modularity does not mesh well with locking, because a caller cannot see which locks a component acquires internally, or in what order.
Conditions for Deadlock
- Mutual exclusion: Threads claim exclusive control of resources that they require (e.g., a thread grabs a lock).
- Hold-and-wait: Threads hold resources allocated to them (e.g., locks that they have already acquired) while waiting for additional resources (e.g., locks that they wish to acquire).
- No preemption: Resources (e.g., locks) cannot be forcibly removed from threads that are holding them.
- Circular wait: There exists a circular chain of threads such that each thread holds one or more resources (e.g., locks) that are being requested by the next thread in the chain.
If any one of these four conditions is not met, deadlock cannot occur.
Things we can do
- Prevention: break one of these conditions
- Avoidance: schedule tasks that could deadlock with each other so that they run in sequence rather than concurrently
- Detect and Recover: reboot the service or even system…
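As one illustration of prevention, the sketch below breaks the circular-wait condition by always acquiring locks in a fixed global order (here, by address). The account_t type and the transfer workload are hypothetical and used only to show the technique.

```c
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    long balance;
} account_t;   /* hypothetical resource protected by its own lock */

/* Acquire both locks in a fixed global order (lower address first),
 * so no two threads can each hold one lock while waiting for the other. */
static void lock_pair(account_t *a, account_t *b) {
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void unlock_pair(account_t *a, account_t *b) {
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

void transfer(account_t *from, account_t *to, long amount) {
    if (from == to)
        return;                 /* avoid locking the same mutex twice */
    lock_pair(from, to);
    from->balance -= amount;
    to->balance += amount;
    unlock_pair(from, to);
}

static account_t acct_x = { .lock = PTHREAD_MUTEX_INITIALIZER, .balance = 1000 };
static account_t acct_y = { .lock = PTHREAD_MUTEX_INITIALIZER, .balance = 1000 };

static void *x_to_y(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++)
        transfer(&acct_x, &acct_y, 1);
    return NULL;
}

static void *y_to_x(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++)
        transfer(&acct_y, &acct_x, 1);
    return NULL;
}

/* Two threads transfer in opposite directions; returns the combined balance. */
long run_transfers(void) {
    pthread_t t1, t2;
    acct_x.balance = 1000;
    acct_y.balance = 1000;
    pthread_create(&t1, NULL, x_to_y, NULL);
    pthread_create(&t2, NULL, y_to_x, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acct_x.balance + acct_y.balance;
}
```

If each thread instead locked its own "from" account first, the two threads could each hold one lock and wait forever for the other. With the address ordering, run_transfers() should finish and return 2000.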
Event-based Concurrency
The approach is quite simple: you simply wait for something (i.e., an 'event') to occur; when it does, you check what type of event it is and do the small amount of work it requires (which may include issuing I/O requests or scheduling other events for future handling, etc.).
What does a canonical event-based server look like? Such applications are based around a simple construct known as the event loop. Pseudocode for an event loop looks like this:
    while (1) {
        events = getEvents();
        for (e in events)
            processEvent(e);
    }
The main loop simply waits for something to do (by calling getEvents() in the code above) and then processes each returned event, one at a time; the code that processes each event is known as an event handler. Importantly, when a handler processes an event, it is the only activity taking place in the system; thus, deciding which event to handle next is equivalent to scheduling. This explicit control over scheduling is one of the fundamental advantages of the event-based approach.
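To make the pseudocode concrete, here is a small sketch in C that uses an in-memory queue as the event source. The event kinds (EV_READ, EV_TIMER), the queue, and the handler logic are all hypothetical. Unlike a real server loop, which blocks waiting for more events, this one returns once the queue is empty.

```c
#include <stddef.h>

typedef enum { EV_READ, EV_TIMER } event_type_t;   /* hypothetical event kinds */
typedef struct { event_type_t type; int payload; } event_t;

#define QUEUE_CAP 16
static event_t queue[QUEUE_CAP];    /* fixed-size ring buffer of pending events */
static size_t q_head = 0, q_len = 0;
static int total = 0;               /* state updated by the handlers */

/* Adds an event to the back of the queue; returns -1 if the queue is full. */
int post_event(event_t e) {
    if (q_len == QUEUE_CAP)
        return -1;
    queue[(q_head + q_len) % QUEUE_CAP] = e;
    q_len++;
    return 0;
}

/* Removes the oldest event; returns 0 if there is nothing to do. */
static int get_event(event_t *out) {
    if (q_len == 0)
        return 0;
    *out = queue[q_head];
    q_head = (q_head + 1) % QUEUE_CAP;
    q_len--;
    return 1;
}

/* Event handler: checks the event type and does a small amount of work. */
static void process_event(const event_t *e) {
    switch (e->type) {
    case EV_READ:
        total += e->payload;
        break;
    case EV_TIMER:
        /* a handler may schedule further events for later handling */
        post_event((event_t){ EV_READ, e->payload });
        break;
    }
}

/* The event loop: one event at a time, in the order the loop picks. */
int run_event_loop(void) {
    event_t e;
    while (get_event(&e))
        process_event(&e);
    return total;
}
```

Because only one handler runs at a time, total needs no lock. Posting an EV_READ with payload 5 and an EV_TIMER with payload 3, then calling run_event_loop(), should return 8: the timer handler schedules a second read that the loop picks up on a later iteration.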
An Important API: select() (or poll())
    int select(int nfds,
               fd_set *restrict readfds,
               fd_set *restrict writefds,
               fd_set *restrict errorfds,
               struct timeval *restrict timeout);
From the select() manual page:
select() examines the I/O descriptor sets whose addresses are passed in readfds, writefds, and errorfds to see if some of their descriptors are ready for reading, are ready for writing, or have an exceptional condition pending, respectively.
The first nfds descriptors are checked in each set; i.e., the descriptors from 0 through nfds-1 in the descriptor sets are examined. (Example: If you have set two file descriptors '4' and '17', nfds should not be '2', but rather '17 + 1' or '18'.)
On return, select() replaces the given descriptor sets with subsets consisting of those descriptors that are ready for the requested operation. select() returns the total number of ready descriptors in all the sets.
The descriptor sets are stored as bit fields in arrays of integers. The following macros are provided for manipulating such descriptor sets:
- FD_ZERO(&fdset) initializes a descriptor set fdset to the null set.
- FD_SET(fd, &fdset) includes a particular descriptor fd in fdset.
- FD_CLR(fd, &fdset) removes fd from fdset.
- FD_ISSET(fd, &fdset) is non-zero if fd is a member of fdset, zero otherwise.
If timeout is not a null pointer, it specifies a maximum interval to wait for the selection to complete. If timeout is a null pointer, the select blocks indefinitely.
To effect a poll, the timeout argument should not be a null pointer, but it should point to a zero-valued timeval structure.
Any of readfds, writefds, and errorfds may be given as null pointers if no descriptors are of interest.
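The sketch below exercises the points above on a pipe: the descriptor-set macros, nfds as the highest descriptor plus one, and a zero-valued timeval to effect a poll. The helper names is_readable_now and poll_pipe_demo are hypothetical, and the code assumes the descriptor is smaller than FD_SETSIZE.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now, 0 if not, -1 on error.
 * The zero-valued timeval makes select() return immediately. */
int is_readable_now(int fd) {
    fd_set readfds;
    struct timeval zero = { 0, 0 };

    FD_ZERO(&readfds);              /* start from the null set */
    FD_SET(fd, &readfds);           /* the one descriptor of interest */

    /* nfds is the highest descriptor in any set, plus one */
    int rc = select(fd + 1, &readfds, NULL, NULL, &zero);
    if (rc < 0)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/* Polls the read end of a pipe before and after writing one byte.
 * Returns 0 on success and stores the two poll results. */
int poll_pipe_demo(int *before, int *after) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    *before = is_readable_now(fds[0]);      /* nothing written yet */

    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    *after = is_readable_now(fds[0]);       /* one byte is now pending */

    close(fds[0]);
    close(fds[1]);
    return 0;
}
```

On a typical POSIX system, the first poll should report 0 (not ready) and the second 1 (ready), since select() rewrites readfds to contain only the descriptors that are ready.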
A simple example:
    #include <sys/select.h>

    int main(void) {
        // open and set up a bunch of sockets (not shown);
        // minFD and maxFD are assumed to be the lowest and
        // highest descriptors in use
        // main loop
        while (1) {
            // initialize the fd_set to all zero
            fd_set readFDs;
            FD_ZERO(&readFDs);

            // now set the bits for the descriptors
            // this server is interested in
            // (for simplicity, all of them from min to max)
            int fd;
            for (fd = minFD; fd <= maxFD; fd++)
                FD_SET(fd, &readFDs);

            // do the select; nfds is the highest descriptor plus one
            int rc = select(maxFD+1, &readFDs, NULL, NULL, NULL);
            if (rc < 0)
                break;  // select failed; leave the loop

            // check which actually have data using FD_ISSET()
            for (fd = minFD; fd <= maxFD; fd++)
                if (FD_ISSET(fd, &readFDs))
                    processFD(fd);
        }
    }
Why Simpler? No Locks Needed
With a single CPU and an event-based application, the problems found in concurrent programs are no longer present. Specifically, because only one event is being handled at a time, there is no need to acquire or release locks; the event-based server cannot be interrupted by another thread because it is decidedly single threaded. Thus, concurrency bugs common in threaded programs do not manifest in the basic event-based approach.
Event-based servers give control of scheduling to the application itself, but do so at some cost in complexity and difficulty of integration with other aspects of modern systems (e.g., paging). Because of these challenges, no single approach has emerged as best; thus, both threads and events are likely to persist as two different approaches to the same concurrency problem for many years to come.