Item 17: Be wary of shared-state parallelism

"Even the most daring forms of sharing are guaranteed safe in Rust." – Aaron Turon

The official documentation describes Rust as enabling "fearless concurrency", but this Item will explore why (sadly) there are still some reasons to be afraid of concurrency, even in Rust.

This Item is specific to shared-state parallelism: where different threads of execution communicate with each other by sharing memory. Sharing state between threads generally comes with two terrible problems, regardless of the language involved:

Data races: These can lead to corrupted data.
Deadlocks: These can lead to your program grinding to a halt.

Both of these problems are terrible ("causing or likely to cause terror") because they can be very hard to debug in practice: the failures occur nondeterministically and are often more likely to happen under load—which means that they don't show up in unit tests, integration tests, or any other sort of test (Item 30), but they do show up in production.

Rust is a giant step forward, because it completely solves one of these two problems. However, the other still remains, as we shall see.

Data Races

Let's start with the good news, by exploring data races and Rust. The precise technical definition of a data race varies from language to language, but we can summarize the key components as follows:

A data race is defined to occur when two distinct threads access the same memory location, under the following conditions:

At least one of them is a write.

There is no synchronization mechanism that enforces an ordering on the accesses.

Data races in C++

The basics of this are best illustrated with an example. Consider a data structure that tracks a bank account:

// C++ code.
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <thread>

class BankAccount {
 public:
  BankAccount() : balance_(0) {}

  int64_t balance() const {
    if (balance_ < 0) {
      std::cerr << "** Oh no, gone overdrawn: " << balance_ << "! **\n";
      std::abort();
    }
    return balance_;
  }
  void deposit(uint32_t amount) {
    balance_ += amount;
  }
  bool withdraw(uint32_t amount) {
    if (balance_ < amount) {
      return false;
    }
    // What if another thread changes `balance_` at this point?
    std::this_thread::sleep_for(std::chrono::milliseconds(500));

    balance_ -= amount;
    return true;
  }

 private:
  int64_t balance_;
};

This example is in C++, not Rust, for reasons that will become clear shortly. However, the same general concepts apply in many other (non-Rust) languages—Java, or Go, or Python, etc.

This class works fine in a single-threaded setting, but consider a multithreaded setting:

BankAccount account;
account.deposit(1000);

// Start a thread that watches for a low balance and tops up the account.
std::thread payer(pay_in, &account);

// Start 3 threads that each try to repeatedly withdraw money.
std::thread taker(take_out, &account);
std::thread taker2(take_out, &account);
std::thread taker3(take_out, &account);

Here several threads are repeatedly trying to withdraw from the account, and there's an additional thread that tops up the account when it runs low:

// Constantly monitor the `account` balance and top it up if low.
void pay_in(BankAccount* account) {
  while (true) {
    if (account->balance() < 200) {
      log("[A] Balance running low, deposit 400");
      account->deposit(400);
    }
    // (The infinite loop with sleeps is just for demonstration/simulation
    // purposes.)
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
  }
}

// Repeatedly try to perform withdrawals from the `account`.
void take_out(BankAccount* account) {
  while (true) {
    if (account->withdraw(100)) {
      log("[B] Withdrew 100, balance now " +
          std::to_string(account->balance()));
    } else {
      log("[B] Failed to withdraw 100");
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
  }
}

Eventually, things will go wrong:

** Oh no, gone overdrawn: -100! **

The problem isn't hard to spot, particularly with the helpful comment in the withdraw() method: when multiple threads are involved, the value of the balance can change between the check and the modification. However, real-world bugs of this sort are much harder to spot—particularly if the compiler is allowed to perform all kinds of tricks and reorderings of code under the covers (as is the case for C++).

The various sleep calls are included in order to artificially raise the chances of this bug being hit and thus detected early; when these problems are encountered in the wild, they're likely to occur rarely and intermittently—making them very hard to debug.

The BankAccount class is thread-compatible, which means that it can be used in a multithreaded environment as long as the users of the class ensure that access to it is governed by some kind of external synchronization mechanism.

The class can be converted to a thread-safe class—meaning that it is safe to use from multiple threads—by adding internal synchronization operations:¹

// C++ code.
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <mutex>

class BankAccount {
 public:
  BankAccount() : balance_(0) {}

  int64_t balance() const {
    // Lock mu_ for all of this scope.
    const std::lock_guard<std::mutex> with_lock(mu_);
    if (balance_ < 0) {
      std::cerr << "** Oh no, gone overdrawn: " << balance_ << "! **\n";
      std::abort();
    }
    return balance_;
  }
  void deposit(uint32_t amount) {
    const std::lock_guard<std::mutex> with_lock(mu_);
    balance_ += amount;
  }
  bool withdraw(uint32_t amount) {
    const std::lock_guard<std::mutex> with_lock(mu_);
    if (balance_ < amount) {
      return false;
    }
    balance_ -= amount;
    return true;
  }

 private:
  mutable std::mutex mu_; // protects balance_
  int64_t balance_;
};

The internal balance_ field is now protected by a mutex mu_: a synchronization object that ensures that only one thread can successfully hold the mutex at a time. A caller can acquire the mutex with a call to std::mutex::lock(); the second and subsequent callers of std::mutex::lock() will block until the original caller invokes std::mutex::unlock(), and then one of the blocked threads will unblock and proceed through std::mutex::lock().

All access to the balance now takes place with the mutex held, ensuring that its value is consistent between check and modification. The std::lock_guard is also worth highlighting: it's an RAII class (see Item 11) that calls lock() on creation and unlock() on destruction. This ensures that the mutex is unlocked when the scope exits, reducing the chances of making a mistake around balancing manual lock() and unlock() calls.

However, the thread safety here is still fragile; all it takes is one erroneous modification to the class:

// Add a new C++ method...
void pay_interest(int32_t percent) {
  // ...but forgot about mu_
  int64_t interest = (balance_ * percent) / 100;
  balance_ += interest;
}

and the thread safety has been destroyed.²

Data races in Rust

For a book about Rust, this Item has covered a lot of C++, so consider a straightforward translation of this class into Rust:

pub struct BankAccount {
    balance: i64,
}

impl BankAccount {
    pub fn new() -> Self {
        BankAccount { balance: 0 }
    }
    pub fn balance(&self) -> i64 {
        if self.balance < 0 {
            panic!("** Oh no, gone overdrawn: {}", self.balance);
        }
        self.balance
    }
    pub fn deposit(&mut self, amount: i64) {
        self.balance += amount
    }
    pub fn withdraw(&mut self, amount: i64) -> bool {
        if self.balance < amount {
            return false;
        }
        self.balance -= amount;
        true
    }
}

along with the functions that try to pay into or withdraw from an account forever:

pub fn pay_in(account: &mut BankAccount) {
    loop {
        if account.balance() < 200 {
            println!("[A] Running low, deposit 400");
            account.deposit(400);
        }
        std::thread::sleep(std::time::Duration::from_millis(5));
    }
}

pub fn take_out(account: &mut BankAccount) {
    loop {
        if account.withdraw(100) {
            println!("[B] Withdrew 100, balance now {}", account.balance());
        } else {
            println!("[B] Failed to withdraw 100");
        }
        std::thread::sleep(std::time::Duration::from_millis(20));
    }
}

This works fine in a single-threaded context—even if that thread is not the main thread:

{
    let mut account = BankAccount::new();
    let _payer = std::thread::spawn(move || pay_in(&mut account));
    // At the end of the scope, the `_payer` thread is detached
    // and is the sole owner of the `BankAccount`.
}

but a naive attempt to use the BankAccount across multiple threads:

{
    let mut account = BankAccount::new();
    let _taker = std::thread::spawn(move || take_out(&mut account));
    let _payer = std::thread::spawn(move || pay_in(&mut account));
}

immediately falls foul of the compiler:

error[E0382]: use of moved value: `account`
   --> src/main.rs:102:41
    |
100 | let mut account = BankAccount::new();
    |     ----------- move occurs because `account` has type
    |                 `broken::BankAccount`, which does not implement the
    |                 `Copy` trait
101 | let _taker = std::thread::spawn(move || take_out(&mut account));
    |                                 -------               ------- variable
    |                                 |                         moved due to
    |                                 |                         use in closure
    |                                 |
    |                                 value moved into closure here
102 | let _payer = std::thread::spawn(move || pay_in(&mut account));
    |                                 ^^^^^^^             ------- use occurs due
    |                                 |                        to use in closure
    |                                 |
    |                                 value used here after move

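The compiler reports this as a use of a moved value, because the first move closure has already taken ownership of account. Borrowing instead of moving would not help, because the underlying problem is one that the borrow checker (Item 15) guards against: both threads need a mutable reference to the same item, one more than is allowed. The rules of the borrow checker are that you can have a single mutable reference to an item, or multiple (immutable) references, but not both at the same time.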

This has a curious resonance with the definition of a data race at the start of this Item: enforcing that there is a single writer, or multiple readers (but never both), means that there can be no data races. By enforcing memory safety, Rust gets thread safety "for free".

As with C++, some kind of synchronization is needed to make this struct thread-safe. The most common mechanism is also called Mutex, but the Rust version "wraps" the protected data rather than being a standalone object (as in C++):

pub struct BankAccount {
    balance: std::sync::Mutex<i64>,
}

The lock() method on this generic Mutex type returns a MutexGuard object with RAII behavior, like C++'s std::lock_guard: the mutex is automatically released at the end of the scope when the guard is dropped. (In contrast to C++, Rust's Mutex does not offer a pair of methods that manually acquire and release the mutex, as they would expose developers to the danger of forgetting to keep these calls exactly in sync.)

To be more precise, lock() actually returns a Result that holds the MutexGuard, to cope with the possibility that the Mutex has been poisoned. Poisoning happens if a thread fails while holding the lock, because this might mean that any mutex-protected invariants can no longer be relied on. In practice, lock poisoning is sufficiently rare (and it's sufficiently desirable that the program terminates when it happens) that it's common to just .unwrap() the Result (despite the advice in Item 18).
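The poisoning behavior described above can be seen in a small experiment. The following is a minimal sketch, not part of the original example: the function name poison_demo and the amounts are made up for illustration. A worker thread panics while holding the lock, after which lock() on the original thread returns an Err whose into_inner() method still gives access to the guard.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Poison a mutex by panicking in another thread while the lock is held.
/// Returns (was_poisoned, value_seen_through_the_recovered_guard).
/// (Hypothetical helper, for illustration only.)
pub fn poison_demo() -> (bool, i64) {
    let balance = Arc::new(Mutex::new(100_i64));

    let balance2 = Arc::clone(&balance);
    let worker = thread::spawn(move || {
        let mut guard = balance2.lock().unwrap();
        *guard -= 30;
        // Panicking while the guard is still alive poisons the mutex.
        panic!("simulated failure while holding the lock");
    });
    // `join()` returns `Err` because the worker thread panicked.
    assert!(worker.join().is_err());

    let (was_poisoned, guard) = match balance.lock() {
        Ok(guard) => (false, guard),
        // `PoisonError::into_inner()` hands back the guard, for code that
        // decides the protected data is still usable.
        Err(poisoned) => (true, poisoned.into_inner()),
    };
    (was_poisoned, *guard)
}
```

Calling poison_demo() prints the worker's panic message to stderr and then reports that the lock was poisoned. Whether recovering with into_inner() is appropriate depends on whether the invariants of the protected data can still be trusted; for most programs the .unwrap() approach remains the simpler choice.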

The MutexGuard object also acts as a proxy for the data that is enclosed by the Mutex, by implementing the Deref and DerefMut traits (Item 8), allowing it to be used both for read operations:

impl BankAccount {
    pub fn balance(&self) -> i64 {
        let balance = *self.balance.lock().unwrap();
        if balance < 0 {
            panic!("** Oh no, gone overdrawn: {}", balance);
        }
        balance
    }
}

and for write operations:

impl BankAccount {
    // Note: no longer needs `&mut self`.
    pub fn deposit(&self, amount: i64) {
        *self.balance.lock().unwrap() += amount
    }
    pub fn withdraw(&self, amount: i64) -> bool {
        let mut balance = self.balance.lock().unwrap();
        if *balance < amount {
            return false;
        }
        *balance -= amount;
        true
    }
}

There's an interesting detail lurking in the signatures of these methods: although they are modifying the balance of the BankAccount, the methods now take &self rather than &mut self. This is inevitable: if multiple threads are going to hold references to the same BankAccount, by the rules of the borrow checker, those references had better not be mutable. It's also another instance of the interior mutability pattern described in Item 8: borrow checks are effectively moved from compile time to runtime but now with cross-thread synchronization behavior. If a mutable reference already exists, an attempt to get a second blocks until the first reference is dropped.
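The runtime nature of this check can be observed without risking a hang by using the non-blocking try_lock() method, which returns an error instead of waiting. This is a small sketch under that assumption; the function name try_lock_demo is hypothetical.

```rust
use std::sync::Mutex;

/// Returns whether `try_lock()` succeeded (a) while another guard was
/// still alive and (b) after that guard had been dropped.
/// (Hypothetical helper, for illustration only.)
pub fn try_lock_demo() -> (bool, bool) {
    let balance = Mutex::new(0_i64);

    let guard = balance.lock().unwrap();
    // A second blocking `lock()` from this thread would never return (it
    // might deadlock or panic), so probe with `try_lock()` instead.
    let while_held = balance.try_lock().is_ok();
    drop(guard);

    // With the first guard gone, the mutex is available again.
    let after_drop = balance.try_lock().is_ok();
    (while_held, after_drop)
}
```

The first probe fails because a guard already exists, and the second succeeds once that guard has been dropped.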

Wrapping up shared state in a Mutex mollifies the borrow checker, but there are still lifetime issues (Item 14) to fix:

{
    let account = BankAccount::new();
    let taker = std::thread::spawn(|| take_out(&account));
    let payer = std::thread::spawn(|| pay_in(&account));
    // At the end of the scope, `account` is dropped but
    // the `taker` and `payer` threads are detached and
    // still hold (immutable) references to `account`.
}

error[E0373]: closure may outlive the current function, but it borrows `account`
              which is owned by the current function
   --> src/main.rs:206:40
    |
206 |     let taker = std::thread::spawn(|| take_out(&account));
    |                                    ^^           ------- `account` is
    |                                    |                     borrowed here
    |                                    |
    |                                    may outlive borrowed value `account`
    |
note: function requires argument type to outlive `'static`
   --> src/main.rs:206:21
    |
206 |     let taker = std::thread::spawn(|| take_out(&account));
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to force the closure to take ownership of `account` (and any other
      referenced variables), use the `move` keyword
    |
206 |     let taker = std::thread::spawn(move || take_out(&account));
    |                                    ++++
error[E0373]: closure may outlive the current function, but it borrows `account`
              which is owned by the current function
   --> src/main.rs:207:40
    |
207 |     let payer = std::thread::spawn(|| pay_in(&account));
    |                                    ^^         ------- `account` is
    |                                    |                  borrowed here
    |                                    |
    |                                    may outlive borrowed value `account`
    |
note: function requires argument type to outlive `'static`
   --> src/main.rs:207:21
    |
207 |     let payer = std::thread::spawn(|| pay_in(&account));
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to force the closure to take ownership of `account` (and any other
      referenced variables), use the `move` keyword
    |
207 |     let payer = std::thread::spawn(move || pay_in(&account));
    |                                    ++++

The error message makes the problem clear: the BankAccount is going to be dropped at the end of the block, but there are two new threads that have a reference to it and that may carry on running afterward. (The compiler's suggestion for how to fix the problem is less helpful—if the BankAccount item is moved into the first closure, it will no longer be available for the second closure to receive a reference to it!)

The standard tool for ensuring that an object remains active until all references to it are gone is a reference-counted pointer, and Rust's variant of this for multithreaded use is std::sync::Arc:

let account = std::sync::Arc::new(BankAccount::new());
account.deposit(1000);

let account2 = account.clone();
let _taker = std::thread::spawn(move || take_out(&account2));

let account3 = account.clone();
let _payer = std::thread::spawn(move || pay_in(&account3));

Each thread gets its own copy of the reference-counting pointer, moved into the closure, and the underlying BankAccount will be dropped only when the refcount drops to zero. This combination of Arc<Mutex<T>> is common in Rust programs that use shared-state parallelism.
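Putting the pieces together gives a version that terminates and can be checked. This is a sketch rather than the original program: the run_withdrawals driver, the thread count, and the amounts are assumptions chosen so that the outcome is deterministic, and the overdraft check in balance() is omitted for brevity.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

pub struct BankAccount {
    balance: Mutex<i64>,
}

impl BankAccount {
    pub fn new() -> Self {
        BankAccount { balance: Mutex::new(0) }
    }
    pub fn balance(&self) -> i64 {
        *self.balance.lock().unwrap()
    }
    pub fn deposit(&self, amount: i64) {
        *self.balance.lock().unwrap() += amount;
    }
    pub fn withdraw(&self, amount: i64) -> bool {
        // The guard is held across both the check and the modification.
        let mut balance = self.balance.lock().unwrap();
        if *balance < amount {
            return false;
        }
        *balance -= amount;
        true
    }
}

/// Hypothetical driver: 4 threads each attempt 50 withdrawals of 10 from
/// an account holding 1000. Returns (successful withdrawals, final balance).
pub fn run_withdrawals() -> (usize, i64) {
    let account = Arc::new(BankAccount::new());
    account.deposit(1000);

    let handles: Vec<_> = (0..4)
        .map(|_| {
            // Each thread owns its own clone of the `Arc`.
            let account = Arc::clone(&account);
            thread::spawn(move || (0..50).filter(|_| account.withdraw(10)).count())
        })
        .collect();

    let successes: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();
    (successes, account.balance())
}
```

The threads collectively ask for 2000 but only 1000 is available, so exactly 100 withdrawals succeed and the balance ends at zero, however the threads happen to interleave.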

Stepping back from the technical details, observe that Rust has entirely avoided the problem of data races that plagues multithreaded programming in other languages. Of course, this good news is restricted to safe Rust—unsafe code (Item 16) and FFI boundaries in particular (Item 34) may not be data-race free—but it's still a remarkable phenomenon.

Standard marker traits

There are two standard traits that affect the use of Rust objects between threads. Both of these traits are marker traits (Item 10) that have no associated methods but have special significance to the compiler in multithreaded scenarios:

The Send trait indicates that items of a type are safe to transfer between threads; ownership of an item of this type can be passed from one thread to another.
The Sync trait indicates that items of a type can be safely accessed by multiple threads, subject to the rules of the borrow checker.

Another way of saying this is to observe that Send means T can be transferred between threads, and Sync means that &T can be transferred between threads.

Both of these traits are auto traits: the compiler automatically derives them for new types, as long as the constituent parts of the type also implement Send/Sync.

The majority of safe types implement Send and Sync, so much so that it's clearer to understand what types don't implement these traits (written in the form impl !Sync for Type).

A type that doesn't implement Send is one that can be used only in a single thread. The canonical example of this is the unsynchronized reference-counting pointer Rc<T> (Item 8). The implementation of this type explicitly assumes single-threaded use (for speed); there is no attempt at synchronizing the internal refcount for multithreaded use. As such, transferring an Rc<T> between threads is not allowed; use Arc<T> (with its additional synchronization overhead) for this case.

A type that doesn't implement Sync is one that's not safe to use from multiple threads via non-mut references (as the borrow checker will ensure there are never multiple mut references). The canonical examples of this are the types that provide interior mutability in an unsynchronized way, such as Cell<T> and RefCell<T>. Use Mutex<T> or RwLock<T> to provide interior mutability in a multithreaded environment.

Raw pointer types like *const T and *mut T also implement neither Send nor Sync; see Item 16 and Item 34.
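One way to check which marker traits a type implements is to ask the compiler, using generic functions whose only purpose is to impose a trait bound. The helper names below (assert_send, assert_sync, check_marker_traits) are hypothetical and not part of the standard library; this is a sketch of the technique rather than an exhaustive survey.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};

// Hypothetical helpers: a call compiles only if `T` satisfies the bound.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

pub fn check_marker_traits() -> bool {
    // The `Arc<Mutex<T>>` combination can be both moved and shared.
    assert_send::<Arc<Mutex<i64>>>();
    assert_sync::<Arc<Mutex<i64>>>();

    // `RefCell<T>` can be moved to another thread, but not shared...
    assert_send::<RefCell<i64>>();
    // ...unless it is wrapped in something that provides synchronization.
    assert_sync::<Mutex<RefCell<i64>>>();

    // Uncommenting either of these lines produces a compile-time error:
    // assert_send::<std::rc::Rc<i64>>();
    // assert_sync::<RefCell<i64>>();
    true
}
```

All of the checking happens at compile time; the function body does nothing at runtime.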

Deadlocks

Now for the bad news. Although Rust has solved the problem of data races (as previously described), it is still susceptible to the second terrible problem for multithreaded code with shared state: deadlocks.

Consider a simplified multiple-player game server, implemented as a multithreaded application to service many players in parallel. Two core data structures might be a collection of players, indexed by username, and a collection of games in progress, indexed by some unique identifier:

use std::collections::HashMap;
use std::sync::Mutex;

struct GameServer {
    // Map player name to player info.
    players: Mutex<HashMap<String, Player>>,
    // Current games, indexed by unique game ID.
    games: Mutex<HashMap<GameId, Game>>,
}

Both of these data structures are Mutex-protected and so are safe from data races. However, code that manipulates both data structures opens up potential problems. A single interaction between the two might work fine:

impl GameServer {
    /// Add a new player and join them into a current game.
    fn add_and_join(&self, username: &str, info: Player) -> Option<GameId> {
        // Add the new player.
        let mut players = self.players.lock().unwrap();
        players.insert(username.to_owned(), info);

        // Find a game with available space for them to join.
        let mut games = self.games.lock().unwrap();
        for (id, game) in games.iter_mut() {
            if game.add_player(username) {
                return Some(id.clone());
            }
        }
        None
    }
}

However, a second interaction between the two independently locked data structures is where problems start:

impl GameServer {
    /// Ban the player identified by `username`, removing them from
    /// any current games.
    fn ban_player(&self, username: &str) {
        // Find all games that the user is in and remove them.
        let mut games = self.games.lock().unwrap();
        games
            .iter_mut()
            .filter(|(_id, g)| g.has_player(username))
            .for_each(|(_id, g)| g.remove_player(username));

        // Wipe them from the user list.
        let mut players = self.players.lock().unwrap();
        players.remove(username);
    }
}

To understand the problem, imagine two separate threads using these two methods, where their execution happens in the order shown in Table 3-1.

Table 3-1. Thread deadlock sequence

Thread 1	Thread 2
Enters `add_and_join()` and immediately acquires the `players` lock.
	Enters `ban_player()` and immediately acquires the `games` lock.
Tries to acquire the `games` lock; this is held by thread 2, so thread 1 blocks.
	Tries to acquire the `players` lock; this is held by thread 1, so thread 2 blocks.

At this point, the program is deadlocked: neither thread will ever progress, nor will any other thread that does anything with either of the two Mutex-protected data structures.

The root cause of this is a lock inversion: one function acquires the locks in the order players then games, whereas the other uses the opposite order (games then players). This is a simple example of a more general problem; the same situation can arise with longer chains of nested locks (thread 1 acquires lock A, then B, then it tries to acquire C; thread 2 acquires C, then tries to acquire A) and across more threads (thread 1 locks A, then B; thread 2 locks B, then C; thread 3 locks C, then A).
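The inversion can be reproduced deterministically, without actually hanging, by forcing the problematic interleaving with a Barrier and probing the second lock with try_lock(). This is an illustrative sketch: the function name and the placeholder Vec contents are assumptions, standing in for the players and games collections.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Each thread takes its first lock, waits until the other thread has done
/// the same, then probes its second lock with `try_lock()`.
/// Returns whether (thread 1, thread 2) would have blocked.
/// (Hypothetical helper, for illustration only.)
pub fn lock_inversion_demo() -> (bool, bool) {
    let players = Arc::new(Mutex::new(Vec::<String>::new()));
    let games = Arc::new(Mutex::new(Vec::<u32>::new()));
    let barrier = Arc::new(Barrier::new(2));

    let (p1, g1) = (Arc::clone(&players), Arc::clone(&games));
    let b1 = Arc::clone(&barrier);
    let t1 = thread::spawn(move || {
        let _players = p1.lock().unwrap(); // order: players, then games
        b1.wait(); // both first locks are now held
        let would_block = g1.try_lock().is_err();
        b1.wait(); // keep `_players` held until both probes are done
        would_block
    });

    let t2 = thread::spawn(move || {
        let _games = games.lock().unwrap(); // order: games, then players
        barrier.wait();
        let would_block = players.try_lock().is_err();
        barrier.wait();
        would_block
    });

    (t1.join().unwrap(), t2.join().unwrap())
}
```

Both probes fail, which means that if each thread had used a blocking lock() for its second acquisition, neither would ever have made progress.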

A simplistic attempt to solve this problem involves reducing the scope of the locks, so there is no point where both locks are held at the same time:

/// Add a new player and join them into a current game.
fn add_and_join(&self, username: &str, info: Player) -> Option<GameId> {
    // Add the new player.
    {
        let mut players = self.players.lock().unwrap();
        players.insert(username.to_owned(), info);
    }

    // Find a game with available space for them to join.
    {
        let mut games = self.games.lock().unwrap();
        for (id, game) in games.iter_mut() {
            if game.add_player(username) {
                return Some(id.clone());
            }
        }
    }
    None
}
/// Ban the player identified by `username`, removing them from
/// any current games.
fn ban_player(&self, username: &str) {
    // Find all games that the user is in and remove them.
    {
        let mut games = self.games.lock().unwrap();
        games
            .iter_mut()
            .filter(|(_id, g)| g.has_player(username))
            .for_each(|(_id, g)| g.remove_player(username));
    }

    // Wipe them from the user list.
    {
        let mut players = self.players.lock().unwrap();
        players.remove(username);
    }
}

(A better version of this would be to encapsulate the manipulation of the players data structure into add_player() and remove_player() helper methods, to reduce the chances of forgetting to close out a scope.)

This solves the deadlock problem but leaves behind a data consistency problem: the players and games data structures can get out of sync with each other, given an execution sequence like the one shown in Table 3-2.

Table 3-2. State inconsistency sequence

Thread 1	Thread 2
Enters `add_and_join("Alice")` and adds Alice to the `players` data structure (then releases the `players` lock).
	Enters `ban_player("Alice")` and removes Alice from all `games` (then releases the `games` lock).
	Removes Alice from the `players` data structure; thread 1 has already released the lock, so this does not block.
Carries on and acquires the `games` lock (already released by thread 2). With the lock held, adds "Alice" to a game in progress.

At this point, there is a game that includes a player that doesn't exist, according to the players data structure!

The heart of the problem is that there are two data structures that need to be kept in sync with each other. The best way to do this is to have a single synchronization primitive that covers both of them:

struct GameState {
    players: HashMap<String, Player>,
    games: HashMap<GameId, Game>,
}

struct GameServer {
    state: Mutex<GameState>,
    // ...
}
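With a single Mutex covering both maps, the two operations from earlier might look like the following sketch. The Player, Game, and GameId definitions here are hypothetical stand-ins (the text leaves them undefined), as are the add_game() and is_consistent() helpers, which exist only to make the example checkable.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins for types that the text leaves undefined.
pub type GameId = u32;
pub struct Player;
pub struct Game {
    players: Vec<String>,
    capacity: usize,
}

impl Game {
    fn add_player(&mut self, username: &str) -> bool {
        if self.players.len() >= self.capacity {
            return false;
        }
        self.players.push(username.to_owned());
        true
    }
    fn remove_player(&mut self, username: &str) {
        self.players.retain(|p| p != username);
    }
}

#[derive(Default)]
struct GameState {
    players: HashMap<String, Player>,
    games: HashMap<GameId, Game>,
}

#[derive(Default)]
pub struct GameServer {
    state: Mutex<GameState>,
}

impl GameServer {
    /// Hypothetical helper: register an empty game.
    pub fn add_game(&self, id: GameId, capacity: usize) {
        let game = Game { players: Vec::new(), capacity };
        self.state.lock().unwrap().games.insert(id, game);
    }
    /// Add a new player and join them into a current game.
    pub fn add_and_join(&self, username: &str, info: Player) -> Option<GameId> {
        // One lock covers both maps for the whole operation.
        let mut state = self.state.lock().unwrap();
        state.players.insert(username.to_owned(), info);
        for (id, game) in state.games.iter_mut() {
            if game.add_player(username) {
                return Some(*id);
            }
        }
        None
    }
    /// Ban the player, removing them from any current games.
    pub fn ban_player(&self, username: &str) {
        let mut state = self.state.lock().unwrap();
        for game in state.games.values_mut() {
            game.remove_player(username);
        }
        state.players.remove(username);
    }
    /// Hypothetical check: every player in a game is a known player.
    pub fn is_consistent(&self) -> bool {
        let state = self.state.lock().unwrap();
        let consistent = state
            .games
            .values()
            .flat_map(|g| g.players.iter())
            .all(|name| state.players.contains_key(name));
        consistent
    }
}
```

Because each method holds the one lock for its entire body, there is no lock ordering to get wrong, and no other thread can observe the two maps in a mismatched state. The cost is reduced parallelism: operations that touch only one of the maps now also wait for the shared lock.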

Advice

The most obvious advice for avoiding the problems that arise with shared-state parallelism is simply to avoid shared-state parallelism. The Rust book quotes from the Go language documentation: "Do not communicate by sharing memory; instead, share memory by communicating".

The Go language has channels that are suitable for this built into the language; for Rust, equivalent functionality is included in the standard library in the std::sync::mpsc module: the channel() function returns a (Sender, Receiver) pair that allows values of a particular type to be communicated between threads.
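As a rough illustration of the channel-based style, the bank account from earlier in this Item could be owned by a single thread that receives requests over a channel, so that no state is shared at all. This is a sketch under assumptions: the Request enum, the run_account_thread function, and the amounts are invented for the example.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical message type for requests to the account-owning thread.
pub enum Request {
    Deposit(i64),
    Withdraw(i64),
}

/// Spawn a thread that is the sole owner of the balance, send it requests
/// from several client threads, and return the final balance.
pub fn run_account_thread() -> i64 {
    let (tx, rx) = mpsc::channel::<Request>();

    let owner = thread::spawn(move || {
        let mut balance = 0_i64;
        // The loop ends once every `Sender` has been dropped.
        for request in rx {
            match request {
                Request::Deposit(amount) => balance += amount,
                Request::Withdraw(amount) if balance >= amount => balance -= amount,
                Request::Withdraw(_) => {} // insufficient funds: ignore
            }
        }
        balance
    });

    tx.send(Request::Deposit(1000)).unwrap();
    let clients: Vec<_> = (0..3)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                for _ in 0..10 {
                    tx.send(Request::Withdraw(10)).unwrap();
                }
            })
        })
        .collect();
    // Drop the original sender so that the owner thread's loop can finish.
    drop(tx);

    for client in clients {
        client.join().unwrap();
    }
    owner.join().unwrap()
}
```

Only the owner thread ever touches the balance, so there is no Mutex and no lock ordering to consider. A fuller design would probably also include a reply channel in each request, so that clients can find out whether a withdrawal succeeded.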

If shared-state concurrency can't be avoided, then there are some ways to reduce the chances of writing deadlock-prone code:

Put data structures that must be kept consistent with each other under a single lock.
Keep lock scopes small and obvious; wherever possible, use helper methods that get and set things under the relevant lock.
Avoid invoking closures with locks held; this puts the code at the mercy of whatever closure gets added to the codebase in the future.
Similarly, avoid returning a MutexGuard to a caller: it's like handing out a loaded gun, from a deadlock perspective.
Include deadlock detection tools in your CI system (Item 32), such as no_deadlocks, ThreadSanitizer, or parking_lot::deadlock.
As a last resort: design, document, test, and police a locking hierarchy that describes what lock orderings are allowed/required. This should be a last resort because any strategy that relies on engineers never making a mistake is likely to be doomed to failure in the long term.

More abstractly, multithreaded code is an ideal place to apply the following general advice: prefer code that's so simple that it is obviously not wrong, rather than code that's so complex that it's not obviously wrong.

¹ A third category of behavior, alongside thread-compatible and thread-safe, is thread-hostile: code that's dangerous in a multithreaded environment even if all access to it is externally synchronized.

² The Clang C++ compiler includes a -Wthread-safety option, sometimes known as annotalysis, that allows data to be annotated with information about which mutexes protect which data, and functions to be annotated with information about the locks they acquire. This gives compile-time errors when these invariants are broken, as in Rust; however, there is nothing to enforce the use of these annotations in the first place (for example, when a thread-compatible library is used in a multithreaded environment for the first time).

Effective Rust