Item 35: Prefer bindgen to manual FFI mappings

Item 34 discussed the mechanics of invoking C code from a Rust program, describing how declarations of C structures and functions need to have an equivalent Rust declaration to allow them to be used over FFI. The C and Rust declarations need to be kept in sync, and Item 34 warned that the toolchain wouldn't help with this – mismatches would be silently ignored, hiding problems for later.

Keeping two things perfectly in sync sounds like a good target for automation, and the Rust project provides the right tool for the job: bindgen, which is installed separately from the compiler. The primary function of bindgen is to parse a C header file and emit the corresponding Rust declarations.

Taking some of the example C declarations from Item 34:

/* C data structure definition. */
/* Changes here must be reflected in lib.rs. */
typedef struct {
    uint8_t byte;
    uint32_t integer;
} FfiStruct;

uint32_t add32(uint32_t x, uint32_t y);
int add(int x, int y);

the bindgen tool can be manually invoked (or invoked by a build.rs build script) to create a corresponding Rust file:

% bindgen --no-layout-tests \
          --allowlist-function="add.*" \
          --allowlist-type=FfiStruct \
          -o src/generated.rs \
          ../elsewhere/somelib.h
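For the build-script route, one possible shape is a build.rs that shells out to the same command line. This is only a sketch: it assumes the bindgen executable is already on the PATH, the helper name bindgen_command is made up for illustration, and the paths are simply the ones used in the example above.

```rust
// build.rs: a sketch that runs the bindgen CLI with the flags shown above.
use std::process::Command;

/// Build the bindgen invocation for `header`, writing Rust to `output`.
/// (Hypothetical helper; the flags mirror the manual command line.)
fn bindgen_command(header: &str, output: &str) -> Command {
    let mut cmd = Command::new("bindgen");
    // No shell is involved here, so the pattern is passed without quotes.
    cmd.arg("--no-layout-tests")
        .arg("--allowlist-function=add.*")
        .arg("--allowlist-type=FfiStruct")
        .arg("-o")
        .arg(output)
        .arg(header);
    cmd
}

fn main() {
    let header = "../elsewhere/somelib.h";
    // Ask Cargo to re-run this script only when the header changes.
    println!("cargo:rerun-if-changed={}", header);

    let status = bindgen_command(header, "src/generated.rs")
        .status()
        .expect("could not launch bindgen; is the CLI installed?");
    assert!(status.success(), "bindgen failed: {}", status);
}
```

Cargo build scripts conventionally write their output under the OUT_DIR directory rather than into src/; the sketch keeps src/generated.rs only so that it lines up with the manual invocation.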

The generated Rust is identical to the hand-crafted declarations of Item 34:

/* automatically generated by rust-bindgen 0.59.2 */

extern "C" {
    pub fn add32(x: u32, y: u32) -> u32;
}
extern "C" {
    pub fn add(
        x: ::std::os::raw::c_int,
        y: ::std::os::raw::c_int,
    ) -> ::std::os::raw::c_int;
}
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct FfiStruct {
    pub byte: u8,
    pub integer: u32,
}

and can be pulled into the code with the source-level include! macro:

// Include the auto-generated Rust declarations.
include!("generated.rs");

For anything but the most trivial FFI declarations, use bindgen to generate Rust bindings for C code – this is an area where machine-made, mass-produced code is definitely preferable to hand-crafted artisanal declarations. If a C function definition changes, the C compiler will complain if the C declaration no longer matches the C definition, but nothing will complain that the Rust declaration no longer matches the C declaration; auto-generating the Rust declaration from the C declaration keeps the two in sync, provided the bindings are regenerated whenever the header changes.

This also means that the bindgen step is an ideal candidate to include in a continuous integration system (Item 32); if the generated code is included in source control, the CI system can error out if a freshly-generated file doesn't match the checked-in version.
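A CI check of this kind could be as small as the following sketch. It assumes that an earlier CI step has already run bindgen into a scratch file; the function names and the command-line shape are invented for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Compare two versions of the generated bindings line by line.
/// `str::lines` strips `\n` and `\r\n` alike, so a CRLF checkout (or a
/// missing final newline) does not count as a difference.
fn bindings_match(checked_in: &str, fresh: &str) -> bool {
    checked_in.lines().eq(fresh.lines())
}

/// Read both files and report whether the checked-in copy is current.
fn check_files(checked_in: &Path, fresh: &Path) -> io::Result<bool> {
    let old = fs::read_to_string(checked_in)?;
    let new = fs::read_to_string(fresh)?;
    Ok(bindings_match(&old, &new))
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: check_bindings <checked-in.rs> <fresh.rs>");
        std::process::exit(2);
    }
    match check_files(Path::new(&args[1]), Path::new(&args[2])) {
        Ok(true) => println!("bindings are up to date"),
        Ok(false) => {
            eprintln!("checked-in bindings are stale; regenerate them");
            std::process::exit(1);
        }
        Err(e) => {
            eprintln!("could not read bindings: {}", e);
            std::process::exit(2);
        }
    }
}
```

A non-zero exit status is what makes the CI job fail. The comparison is only meaningful if CI runs the same bindgen version that produced the checked-in file.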

The bindgen tool comes particularly into its own when you're dealing with an existing C codebase that has a large API. Creating Rust equivalents for a big lib_api.h header file by hand is tedious and therefore error-prone – and as noted above, many categories of mismatch error will not be detected by the toolchain. bindgen also has a panoply of options that allow specific subsets of an API to be targeted (such as the --allowlist-function and --allowlist-type options illustrated above; see note 1).

This also allows a layered approach to exposing an existing C library in Rust; a common convention for wrapping some xyzzy library is to have:

  • An xyzzy-sys crate that holds (just) the bindgen-erated code – use of which is necessarily unsafe.
  • An xyzzy crate that encapsulates the unsafe code, and provides safe Rust access to the underlying functionality.

This concentrates the unsafe code in one layer, and allows the rest of the program to follow the advice of Item 16.
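The two layers can be sketched in a single file, with modules standing in for the two crates. To keep the sketch buildable without linking a C library, the raw layer here is a Rust function with the same signature as the generated add32 declaration, and its wrapping behaviour is an assumption about what the C code does.

```rust
/// Stand-in for the `xyzzy-sys` crate: raw bindings, unsafe to call.
mod xyzzy_sys {
    // A real -sys crate would hold the bindgen output instead:
    //     extern "C" { pub fn add32(x: u32, y: u32) -> u32; }
    // This Rust definition only mimics that signature (assumed behaviour:
    // unsigned addition that wraps on overflow).
    pub unsafe extern "C" fn add32(x: u32, y: u32) -> u32 {
        x.wrapping_add(y)
    }
}

/// Stand-in for the `xyzzy` crate: safe wrappers over the raw bindings.
mod xyzzy {
    use super::xyzzy_sys;

    /// Add two 32-bit values using the underlying library.
    pub fn add32(x: u32, y: u32) -> u32 {
        // SAFETY: `add32` takes plain integers by value and has no
        // preconditions, so every pair of arguments is valid.
        unsafe { xyzzy_sys::add32(x, y) }
    }
}

fn main() {
    // Callers of the safe layer never need an `unsafe` block.
    println!("{}", xyzzy::add32(2, 3));
}
```

The only unsafe block lives in the wrapper, next to a comment explaining why the call is sound; code that uses xyzzy::add32 stays entirely in safe Rust.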

Beyond C

The bindgen tool has the ability to handle some C++ constructs, but only a subset and in a limited fashion. For better (but still somewhat limited) integration consider using the cxx crate for C++/Rust interoperation. Instead of generating Rust code from C++ declarations, cxx takes the approach of auto-generating both Rust and C++ code from a common schema, allowing for tighter integration.


1: The example also used the --no-layout-tests option to keep the output simple; by default, the generated code will include #[test] code to check that structures are indeed laid out correctly.
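As a rough idea of what such layout checks verify, the sketch below tests the size, alignment, and field offset of FfiStruct by hand. It is not bindgen's actual output, and the expected numbers assume a typical target where uint32_t is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct FfiStruct {
    pub byte: u8,
    pub integer: u32,
}

/// Byte offset of the `integer` field from the start of the struct.
fn integer_offset() -> usize {
    let s = FfiStruct { byte: 0, integer: 0 };
    let base = &s as *const FfiStruct as usize;
    let field = &s.integer as *const u32 as usize;
    field - base
}

fn main() {
    // Assumed layout: 1 byte, 3 bytes of padding, then the 4-byte integer.
    assert_eq!(size_of::<FfiStruct>(), 8);
    assert_eq!(align_of::<FfiStruct>(), 4);
    assert_eq!(integer_offset(), 4);
    println!("FfiStruct layout matches the assumed C layout");
}
```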