Item 16: Be wary of shared-state parallelism

"Even the most daring forms of sharing are guaranteed safe in Rust." – Aaron Turon

The official documentation describes Rust as enabling "fearless concurrency" but this Item will explore why (sadly) there are still some reasons to be afraid.

This Item is specific to shared-state parallelism: where different threads of execution communicate with each other by sharing memory.

Data Races

Let's start with the good news, by exploring data races and Rust. The definition is roughly:

A data race is defined to occur when two distinct threads access the same memory location, where

  • at least one of them is a write, and
  • there is no synchronization mechanism that enforces an ordering on the accesses.

The basics of this are best illustrated with an example:

class BankAccount {
 public:
  BankAccount() : balance_(0) {}

  int64_t balance() const {
    return balance_;
  }
  void deposit(uint32_t amount) {
    balance_ += amount;
  }
  bool withdraw(uint32_t amount) {
    if (balance_ < amount) {
      return false;
    }
    // What if another thread changes `balance_` at this point?
    std::this_thread::sleep_for(std::chrono::milliseconds(500));

    balance_ -= amount;
    return true;
  }

 private:
  int64_t balance_;
};

This example is in C++, not Rust, for reasons which will become clear shortly. However, the same general concepts apply in lots of other languages (that aren't Rust) – Java, or Go, or Python …

This class works fine in a single-threaded setting, but in a multi-threaded setting:

  std::thread taker(take_out, &account, 100);
  std::thread taker2(take_out, &account, 100);
  std::thread taker3(take_out, &account, 100);
  std::thread payer(pay_in, &account, 300);

where several threads are repeatedly trying to withdraw from the account:

int64_t check_balance(const BankAccount* account) {
    int64_t bal = account->balance();
    if (bal < 0) {
      std::cerr << "** Oh no, gone overdrawn: " << bal << " **!\n";
      abort();
    }
    std::cout << "Balance now " << bal << "\n";
    return bal;
}
void take_out(BankAccount* account, int count) {
  for (int ii = 0; ii < count; ii++) {
    if (account->withdraw(100)) {
      log("Withdrew 100");
    } else {
      log("Failed to withdraw 100");
    }
    check_balance(account);
    std::this_thread::sleep_for(std::chrono::milliseconds(6));
  }
}

then eventually things will go wrong.

** Oh no, gone overdrawn: -100 **!

The problem isn't hard to spot, particularly with the helpful comment in the withdraw() method: when multiple threads are involved, the value of the balance can change between the check and the modification. However, real-world bugs of this sort are much harder to spot – particularly if the compiler is allowed to perform all kinds of tricks and re-orderings of code under the covers (as is the case for C++).

The sleep call also artificially raises the chances of this bug being hit and thus detected early; when these problems are encountered in the wild they're likely to occur rarely and intermittently – making them very hard to debug.

The BankAccount class is thread-compatible, which means that it can be used in a multithreaded environment as long as the users of class ensure that access to it is governed by some kind of external synchronization mechanism.

The class can be converted to a thread-safe class1 – meaning that it is safe to use from multiple threads – by adding internal synchronization operations:

class BankAccount {
 public:
  BankAccount() : balance_(0) {}

  int64_t balance() const {
    const std::lock_guard<std::mutex> with_lock(mu_);
    return balance_;
  }
  void deposit(uint32_t amount) {
    const std::lock_guard<std::mutex> with_lock(mu_);
    balance_ += amount;
  }
  bool withdraw(uint32_t amount) {
    const std::lock_guard<std::mutex> with_lock(mu_);
    if (balance_ < amount) {
      return false;
    }
    // What if another thread changes `balance_` at this point?
    balance_ -= amount;
    return true;
  }

 private:
  mutable std::mutex mu_;
  int64_t balance_;
};

The internal balance_ field is now protected by a mutex mu_: a synchronization object that ensures that only one thread can successfully lock() at a time. The second and subsequent callers will block until unlock() is called – and even then, only one of the blocked threads will unblock and proceed through lock().

All access to the balance now takes place with the mutex held, ensuring that its value is consistent between check and modification. The std::lock_guard is also worth highlighting: it's an RAII class (cf. Item 10) that ensures that the mutex is unlocked when the scope exits, reducing the chances of making a mistake around balancing manual lock() and unlock() calls.

However, the thread-safety here is still fragile; all it takes is one erroneous modification to the class:

  bool transfer(BankAccount* destination, uint32_t amount) {
    // oops, forgot about mu_
    if (balance_ < amount) {
      return false;
    }
    balance_ -= amount;
    destination->balance_ += amount;
    return true;
  }

and the thread-safety has been destroyed2.

For a book about Rust, this Item has covered a lot of C++, so consider a straightforward translation of this class into Rust:

    pub struct BankAccount {
        balance: i64,
    }

    impl BankAccount {
        pub fn new() -> Self {
            BankAccount { balance: 0 }
        }
        pub fn balance(&self) -> i64 {
            self.balance
        }
        pub fn deposit(&mut self, amount: i64) {
            self.balance += amount
        }
        pub fn withdraw(&mut self, amount: i64) -> bool {
            if self.balance < amount {
                return false;
            }
            self.balance -= amount;
            true
        }
    }

This works fine in a single-threaded context – even if that thread is not the main thread:

        let mut account = BankAccount::new();
        let payer = std::thread::spawn(move || pay_in(&mut account, 30));
        // Wait for thread completion
        payer.join().unwrap();

but a naïve attempt to use the BankAccount across multiple threads:

        let mut account = BankAccount::new();
        let taker = std::thread::spawn(move || take_out(&mut account, 100));
        let payer = std::thread::spawn(move || pay_in(&mut account, 300));
        payer.join().unwrap();
        taker.join().unwrap();

immediately falls foul of the compiler:

error[E0382]: use of moved value: `account`
  --> deadlock/src/main.rs:76:40
   |
74 |         let mut account = BankAccount::new();
   |             ----------- move occurs because `account` has type `broken::BankAccount`, which does not implement the `Copy` trait
75 |         let taker = std::thread::spawn(move || take_out(&mut account, 100));
   |                                        -------               ------- variable moved due to use in closure
   |                                        |
   |                                        value moved into closure here
76 |         let payer = std::thread::spawn(move || pay_in(&mut account, 300));
   |                                        ^^^^^^^             ------- use occurs due to use in closure
   |                                        |
   |                                        value used here after move

With experience of the borrow checker (Item 13), the problem sticks out clearly: there are two mutable references to the same item, one more than is allowed. The rules of the borrow checker are that you can have a single mutable reference to an item, or multiple (immutable) references, but not both at the same time.

This has a curious resonance with the definition of a data race at the start of this Item: enforcing that there is a single writer, or multiple readers (but never both) means that there can be no data races. By enforcing memory safety, Rust gets thead safety "for free".

As with C++, some kind of synchronization is needed to make this struct thread-safe. The most common mechanism is also called Mutex, but the Rust version "wraps" the protected data rather than being a standalone object (as in C++):

    pub struct BankAccount {
        balance: std::sync::Mutex<i64>,
    }

The lock() method on this Mutex generic returns a MutexGuard object with an RAII behaviour, like C++'s std::lock_guard: the mutex is automatically released at the end of the scope when the guard is dropped. (Rust's Mutex has no manual lock/unlock methods, as they would expose developers to the danger of forgetting to keep lock() and unlock() calls exactly in sync.)

The MutexGuard object also acts as a proxy for the data that is enclosed by the Mutex, by implementing the Deref and DerefMut traits (Item 8), allowing it to be used for both read operations:

        pub fn balance(&self) -> i64 {
            *self.balance.lock().unwrap()
        }

and write operations:

        // Note: no longer needs `&mut self`.
        pub fn deposit(&self, amount: i64) {
            *self.balance.lock().unwrap() += amount
        }

There's an interesting detail lurking in the signature of this method: although it is modifying the balance of the BankAccount, this method now takes &self rather than &mut self. This is inevitable: if multiple threads are going to hold references to the same BankAccount, by the rules of the borrow checker those references had better not be mutable. It's also another instance of the interior mutability pattern described in Item 8: borrow checks are effectively moved from compile-time to run-time, but now with a specific concern for cross-thread synchronization.

Wrapping up shared state in a Mutex mollifies the borrow checker, but there are still lifetime issues (Item 14) to fix.

        {
            let account = BankAccount::new();
            let taker = std::thread::spawn(|| take_out(&account, 100));
            let payer = std::thread::spawn(|| pay_in(&account, 300));
        }
error[E0373]: closure may outlive the current function, but it borrows `account`, which is owned by the current function
   --> deadlock/src/main.rs:190:44
    |
190 |             let taker = std::thread::spawn(|| take_out(&account, 100));
    |                                            ^^           ------- `account` is borrowed here
    |                                            |
    |                                            may outlive borrowed value `account`
    |
note: function requires argument type to outlive `'static`
   --> deadlock/src/main.rs:190:25
    |
190 |             let taker = std::thread::spawn(|| take_out(&account, 100));
    |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to force the closure to take ownership of `account` (and any other referenced variables), use the `move` keyword
    |
190 |             let taker = std::thread::spawn(move || take_out(&account, 100));
    |                                            ^^^^^^^
error[E0373]: closure may outlive the current function, but it borrows `account`, which is owned by the current function
   --> deadlock/src/main.rs:191:44
    |
191 |             let payer = std::thread::spawn(|| pay_in(&account, 300));
    |                                            ^^         ------- `account` is borrowed here
    |                                            |
    |                                            may outlive borrowed value `account`
    |
note: function requires argument type to outlive `'static`
   --> deadlock/src/main.rs:191:25
    |
191 |             let payer = std::thread::spawn(|| pay_in(&account, 300));
    |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to force the closure to take ownership of `account` (and any other referenced variables), use the `move` keyword
    |
191 |             let payer = std::thread::spawn(move || pay_in(&account, 300));
    |                                            ^^^^^^^

The error message makes the problem clear: the BankAccount is going to be dropped at the end of the block, but there are two new threads which have a reference to it, and which will carry on running afterwards.

The standard tool for ensuring that an object remains active until all references to it are gone is a reference counted pointer, and Rust's variant of this for multi-threaded use is std::sync::Arc:

        let account = std::sync::Arc::new(BankAccount::new());

        let account2 = account.clone();
        let taker = std::thread::spawn(move || take_out(&account2, 100));

        let account3 = account.clone();
        let payer = std::thread::spawn(move || pay_in(&account3, 300));

Each thread gets its own (moved) copy of the reference counting pointer, and the underlying BankAccount will only be dropped when the refcount drops to zero. This combination of Arc<Mutex<T>> is common in Rust programs that use shared state parallelism.

Stepping back from the technical details, observe that Rust has entirely avoided the problem of data races that plagues multi-threaded programming in other languages. Of course, this good news is restricted to safe Rust – unsafe code (Item 15) and FFI boundaries in particular (Item 36) may not be data race free – but it's still a remarkable phenomenom.

Standard Marker Traits

There are two standard traits that affect the use of Rust objects between threads. Both of these traits are marker traits (Item 5) that have no associated methods, but they have special significance to the compiler in multi-threaded scenarios.

  • The Send trait indicates that items of a type are safe to transfer between threads; ownership of an item of this type can be passed from one thread to another.
  • The Sync trait indicates that items of a type can be safely accessed by multiple threads, subject to the rules of the borrow checker.

Both of these traits are automatically derived provided that the constituent parts of a type are also Send/Sync.

The majority of safe types are Send and Sync, so much so that it's clearer to understand what types don't implement these traits (written in the form impl !Sync for Type).

A non-Send type is one that can only be used in a single thread. The canonical example of this is the unsynchronized reference counting pointer Rc<T> (Item 8). The implementation of this type explicitly assumes single-threaded use (for speed); there is no attempt at synchronizing the internal refcount for multi-threaded use. As such, transferring an Rc<T> between threads is not allowed; use Arc<T> (with its additional synchronization overhead) for this case.

A non-Sync type is one that's not safe to use from multiple threads via non-mut references (as the borrow checker will ensure there are never multiple nut references). The canonical examples of this are the types that provide interior mutability in an unsynchronized way, such as Cell<T> and RefCell<T>. Use Mutex<T> or RwLock<T> to provide interior mutability in a multi-threaded environment.

Deadlocks

Now for the bad news. Multi-threaded code comes with two terrible problems:

  • data races, which can lead to corrupted data, and
  • deadlocks, which can lead to your program grinding to a halt.

Both of these problems are terrible because they can be very hard to debug in practice: the failures occur non-deterministically and are often more likely to happen under load – which means that they don't show up in unit tests, integration tests, or any other sort of test (Item 29), but they do show up in production.

Rust has solved the problem of data races (as described above), but the problem of deadlocks applies to Rust as much as it does to any other language that supports multi-threading.

Consider a simplified multiple player game server, implemented as a multithreaded application in order to service many players in parallel. Two core data structures might be a collection of players, indexed by username:

    players: Mutex<HashMap<String, Player>>,

and a collection of games in progress indexed by some unique identifier:

    games: Mutex<HashMap<GameID, Game>>,

Both of these data structures are Mutex-protected and so safe from data races; however, code that manipulates both data structures opens up potential problems. A single interaction between the two might work fine:

    fn add_and_join(&self, username: &str, info: Player) -> Option<GameID> {
        // Add the new player.
        let mut players = self.players.lock().unwrap();
        players.insert(username.to_owned(), info);

        // Find a game with available space for them to join.
        let mut games = self.games.lock().unwrap();
        for (id, game) in games.iter_mut() {
            if game.add_player(username) {
                return Some(*id);
            }
        }
        None
    }

However, a second interaction between the two independently locked data structures is where problems start:

    fn ban_player(&self, username: &str) {
        // Find all games that the user is in and remove them.
        let mut games = self.games.lock().unwrap();
        games
            .iter_mut()
            .filter(|(_id, g)| g.has_player(username))
            .for_each(|(_id, g)| g.remove_player(username));

        // Wipe them from the user list.
        let mut players = self.players.lock().unwrap();
        players.remove(username);
    }

To understand the problem, imagine two separate threads using these two methods:

  • Thread 1 enters add_and_join() and immediately acquires the players lock.
  •    Thread 2 enters ban_player() and immediately acquires the games lock.
  • Thread 1 now tries to acquire the games lock; this is held by thread 2, so thread 1 blocks.
  •    Thread 2 tries to acquire the players lock; this is held by thread 1, so thread 2 blocks.

At this point the program is deadlocked: neither thread will ever progress, and nor will any other thread that does anything with either of the two Mutex-protected data structures.

The root cause of this is a lock inversion: one function acquires the locks in the order players then games, whereas the other uses the opposite order (games then players). This is the simplest example of a more general description of the problem:

  • Build a directed graph where:
    • Each Mutex instance is a vertex.
    • Each edge indicates a situation where one Mutex gets acquired while another Mutex is already held.
  • If there are any cycles in the graph, a deadlock can occur.

A simplistic attempt to solve this problem involves reducing the scope of the locks, so there is no point where both locks are held at the same time:

    fn add_and_join(&self, username: &str, info: Player) -> Option<GameID> {
        // Add the new player.
        {
            let mut players = self.players.lock().unwrap();
            players.insert(username.to_owned(), info);
        }

        // Find a game with available space for them to join.
        {
            let mut games = self.games.lock().unwrap();
            for (id, game) in games.iter_mut() {
                if game.add_player(username) {
                    return Some(*id);
                }
            }
        }
        None
    }
    fn ban_player(&self, username: &str) {
        // Find all games that the user is in and remove them.
        {
            let mut games = self.games.lock().unwrap();
            games
                .iter_mut()
                .filter(|(_id, g)| g.has_player(username))
                .for_each(|(_id, g)| g.remove_player(username));
        }

        // Wipe them from the user list.
        {
            let mut players = self.players.lock().unwrap();
            players.remove(username);
        }
    }

(A better version of this would be to encapsulate the manipulation of the players data structure into add_player() and remove_player() helper methods, to reduce the chances of forgetting to close out a scope.)

This solves the deadlock problem, but leaves behind a data consistency problem: the players and games data structures can get out of sync with each other.

  • Thread 1 enters add_and_join("Alice") and adds Alice to the players data structure (then releases the players lock).
  •    Thread 2 enters ban_player("Alice") and removes Alice from all games (then releases the games lock).
  •    Thread 2 now removes Alice from the players data structure; thread 1 has already released the lock so this does not block.
  • Thread 1 now carries on and acquires the games lock (already released by thread 2). With the lock held, thread 1 adds "Alice" to a game in progress.

At this point, there is a game that includes a player that doesn't exist, according to the players data structure!

The heart of the problem is that there are two data structures that need to be kept in sync with each other; the best way to do this is to have a single sychronization primitive that covers both of them.

    struct GameState {
        players: HashMap<String, Player>,
        games: HashMap<GameID, Game>,
    }

    struct GameServer {
        state: Mutex<GameState>,
        // ...
    }

Advice

The most obvious advice for how to avoid the problems that come with shared-state parallelism is simply to avoid shared-state parallelism. The Rust book quotes from the Go language documentation: "Do not communicate by sharing memory; instead, share memory by communicating".

The Go language has channels that are suitable for this built into the language; for Rust, equivalent functionality is included in the standard library in the std::sync::mpsc module: the channel() function returns a (Sender, Receiver) pair that allows values of a particular type to be communicated between threads.

If shared-state concurrency can't be avoided, then there are some ways to reduce the chances of writing deadlock-prone code:

  • Put data structures that must be kept consistent with each other under a single lock.
  • Keep lock scopes small and obvious; wherever possible, use helper methods that get and set things under the relevant lock.
  • Avoid invoking closures with locks held; this puts the code at the mercy of whatever closure gets added to the codebase in the future.
  • Similarly, avoid returning a MutexGuard to a caller: it's like handing out a loaded gun from a deadlock perspective.
  • Include deadlock detection tools in your CI system (Item 31), such as no_deadlocks, ThreadSanitizer, or parking_lot::deadlock.
  • As a last resort: design, document, test and police a locking hierarchy that describes what lock orderings are allowed/required. This should be a last resort because any strategy that relies on engineers never making a mistake is obviously doomed to failure in the long term.

More abstractly, multi-threaded code is an ideal place to apply the general advice: prefer code that's so simple that it is obviously not wrong, rather than code that's so complex that it's not obviously wrong.


1: The third category of behaviour is thread-hostile: code that's dangerous in a multithreaded environment even if all access to it is externally synchronized.

2: The Clang C++ compiler includes a -Wthread-safety option, sometimes known as annotalysis, that allows data to be annotated with information about which mutexes protect which data, and functions to be annotated with information about the locks they acquire. This gives compile-time errors when these invariants are broken, like Rust; however, there is nothing to enforce the use of these annotations in the first place – for example, when a thread-compatible library is used in a multi-threaded environment for the first time.