Item 35: Prefer bindgen to manual FFI mappings
Item 34 discussed the mechanics of invoking C code from a Rust program, describing how declarations of C structures and functions need to have an equivalent Rust declaration to allow them to be used over FFI. The C and Rust declarations need to be kept in sync, and Item 34 warned that the toolchain wouldn't help with this – mismatches would be silently ignored, hiding problems for later.
Keeping two things perfectly in sync sounds like a good target for automation, and the Rust project provides the
right tool for the job: bindgen. This is a separately installed tool rather than part of the default toolchain. The
primary function of bindgen is to parse a C header file and emit the corresponding Rust declarations.
Taking some of the example C declarations from Item 34:
/* C data structure definition. */
/* Changes here must be reflected in lib.rs. */
typedef struct {
    uint8_t byte;
    uint32_t integer;
} FfiStruct;
uint32_t add32(uint32_t x, uint32_t y);
int add(int x, int y);
the bindgen tool can be manually invoked (or invoked by a build.rs build script) to create a corresponding Rust file:
% bindgen --no-layout-tests \
    --allowlist-function="add.*" \
    --allowlist-type=FfiStruct \
    -o src/generated.rs \
    ../elsewhere/somelib.h
The generated Rust is identical to the hand-crafted declarations of Item 34:
/* automatically generated by rust-bindgen 0.59.2 */

extern "C" {
    pub fn add32(x: u32, y: u32) -> u32;
}
extern "C" {
    pub fn add(
        x: ::std::os::raw::c_int,
        y: ::std::os::raw::c_int,
    ) -> ::std::os::raw::c_int;
}
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct FfiStruct {
    pub byte: u8,
    pub integer: u32,
}
and can be pulled into the code with the source-level include! macro:
// Include the auto-generated Rust declarations.
include!("generated.rs");
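The build.rs route mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the only way to do it: it shells out to the bindgen command-line tool (assumed to be on the PATH) so that it needs nothing beyond the standard library, whereas the more common arrangement is to use the bindgen crate as a build dependency. The `bindgen_args` helper is a hypothetical name, and the header path is the one from the manual invocation.

```rust
// build.rs: a sketch that regenerates the bindings whenever the header changes.
use std::env;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Header location, taken from the manual invocation shown earlier.
const HEADER: &str = "../elsewhere/somelib.h";

/// Build the same argument list as the manual invocation, but with a
/// caller-chosen output file. (Hypothetical helper, for illustration.)
fn bindgen_args(header: &str, output: &Path) -> Vec<String> {
    vec![
        "--no-layout-tests".to_string(),
        "--allowlist-function=add.*".to_string(),
        "--allowlist-type=FfiStruct".to_string(),
        "-o".to_string(),
        output.display().to_string(),
        header.to_string(),
    ]
}

fn main() {
    // Ask Cargo to re-run this script whenever the header changes.
    println!("cargo:rerun-if-changed={}", HEADER);

    // Cargo sets OUT_DIR for build scripts; generated files belong there.
    let out_dir = PathBuf::from(env::var("OUT_DIR").expect("OUT_DIR not set"));
    let output = out_dir.join("generated.rs");

    // Assumption: the `bindgen` executable is installed and on the PATH.
    let status = Command::new("bindgen")
        .args(bindgen_args(HEADER, &output))
        .status()
        .expect("failed to run bindgen");
    assert!(status.success(), "bindgen exited with {}", status);
}
```

With this arrangement the generated file lives in Cargo's output directory rather than in src/, so the crate would pull it in with `include!(concat!(env!("OUT_DIR"), "/generated.rs"));` instead of the relative path shown above.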
For anything but the most trivial FFI declarations, use bindgen to generate Rust bindings for C code – this
is an area where machine-made, mass-produced code is definitely preferable to hand-crafted artisanal declarations. If a
C function definition changes, the C compiler will complain if the C declaration no longer matches the C definition, but
nothing will complain that the Rust declaration no longer matches the C declaration; auto-generating the Rust
declaration from the C declaration keeps the two in sync, provided the bindings are regenerated whenever the header
changes.
This also means that the bindgen step is an ideal candidate to include in a continuous integration system (Item 32);
if the generated code is included in source control, the CI system can error out if a freshly-generated file doesn't
match the checked-in version.
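The comparison such a CI step performs can be as simple as a line-by-line diff. The following is a minimal sketch, not part of bindgen itself: the `first_mismatch` helper and the two command-line arguments are hypothetical, and it assumes the CI job has already run bindgen to produce a fresh file alongside the checked-in one.

```rust
use std::env;
use std::fs;
use std::process::ExitCode;

/// Return the 1-based number of the first line at which the two texts
/// differ, or `None` if they match line for line. Line endings are
/// ignored because `str::lines` strips them.
fn first_mismatch(checked_in: &str, fresh: &str) -> Option<usize> {
    let mut a = checked_in.lines();
    let mut b = fresh.lines();
    let mut line_no = 1;
    loop {
        match (a.next(), b.next()) {
            (None, None) => return None,
            (x, y) if x == y => line_no += 1,
            _ => return Some(line_no),
        }
    }
}

fn main() -> ExitCode {
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: check-bindings <checked-in.rs> <fresh.rs>");
        return ExitCode::from(2);
    }
    // Read a file, or stop with a distinct exit code if it is unreadable.
    let read = |path: &str| -> String {
        fs::read_to_string(path).unwrap_or_else(|e| {
            eprintln!("cannot read {}: {}", path, e);
            std::process::exit(2)
        })
    };
    match first_mismatch(&read(&args[1]), &read(&args[2])) {
        None => ExitCode::SUCCESS,
        Some(line) => {
            eprintln!("bindings are stale: first difference at line {}", line);
            ExitCode::FAILURE
        }
    }
}
```

A non-zero exit status is what makes the CI job fail. One caveat is that a different bindgen version may produce different output for an unchanged header, so the CI environment would need to use the same version that produced the checked-in file.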
The bindgen tool comes particularly into its own when you're dealing with an existing C codebase that has a large API.
Creating Rust equivalents to a big lib_api.h header file is manual and tedious, therefore error-prone – and as noted
above, many categories of mismatch error will not be detected by the toolchain. bindgen also has a panoply of options
that allow specific subsets of an API to be targeted (such as the --allowlist-function and --allowlist-type options
illustrated above; see footnote 1).
This also allows a layered approach to exposing an existing C library in Rust; a common convention for wrapping some
xyzzy library is to have:
- An xyzzy-sys crate that holds (just) the bindgen-erated code – use of which is necessarily unsafe.
- An xyzzy crate that encapsulates the unsafe code, and provides safe Rust access to the underlying functionality.
This concentrates the unsafe code in one layer, and allows the rest of the program to follow the advice of Item 16.
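The two layers can be sketched in a single file, with modules standing in for the two crates. This is an illustration only: so that it runs without any C code, the `xyzzy_sys` module defines `add32` in Rust as a stand-in, on the assumption that the C function simply adds its arguments with the usual unsigned wraparound. In real bindgen output it would be a bare declaration inside an `extern "C"` block.

```rust
/// Stand-in for the `xyzzy-sys` crate: in a real project this would hold
/// (just) the bindgen output and link against the C library.
mod xyzzy_sys {
    /// Rust stand-in for the C function, assumed to add with wraparound
    /// as `uint32_t` arithmetic does. Marked `unsafe` to mirror the fact
    /// that calling across FFI is unsafe.
    pub unsafe extern "C" fn add32(x: u32, y: u32) -> u32 {
        x.wrapping_add(y)
    }
}

/// Stand-in for the `xyzzy` crate: the only layer that contains `unsafe`.
mod xyzzy {
    use super::xyzzy_sys;

    /// Safe wrapper around the FFI function.
    pub fn add32(x: u32, y: u32) -> u32 {
        // SAFETY: the function takes and returns plain integers by value,
        // so there are no pointer or lifetime preconditions to uphold.
        unsafe { xyzzy_sys::add32(x, y) }
    }
}

fn main() {
    // Callers use the safe layer and never write `unsafe` themselves.
    println!("2 + 3 = {}", xyzzy::add32(2, 3));
}
```

For a function this simple the wrapper adds little, but the same shape applies when the C API deals in raw pointers and lengths: the wrapper accepts references or slices, upholds the C function's preconditions in one place, and exposes a signature that safe code can call.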
Beyond C
The bindgen tool has the ability to handle some C++ constructs, but only a subset and in a limited fashion. For better
(but still somewhat limited) integration, consider using the cxx crate for C++/Rust interoperation. Instead of
generating Rust code from C++ declarations, cxx takes the approach of auto-generating both Rust and C++ code from a
common schema, allowing for tighter integration.
1: The example also used the --no-layout-tests option to keep the output simple; by default, the generated code will
include #[test] code to check that structures are indeed laid out correctly.
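The layout checks that footnote 1 refers to amount to assertions along the following lines. This is a hand-written approximation rather than actual bindgen output, and the exact form of the generated checks varies between bindgen versions. The `integer_offset` helper is hypothetical, and the expected numbers assume a typical platform where uint32_t is 4 bytes wide and 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct FfiStruct {
    pub byte: u8,
    pub integer: u32,
}

/// Byte offset of the `integer` field from the start of the structure.
/// (Hypothetical helper, for illustration.)
fn integer_offset() -> usize {
    let value = FfiStruct { byte: 0, integer: 0 };
    let base = &value as *const FfiStruct as usize;
    let field = &value.integer as *const u32 as usize;
    field - base
}

fn main() {
    // One byte of data, three bytes of padding, then the 4-byte integer.
    assert_eq!(size_of::<FfiStruct>(), 8);
    assert_eq!(align_of::<FfiStruct>(), 4);
    assert_eq!(integer_offset(), 4);
    println!("FfiStruct layout matches the expected C layout");
}
```

The expected values come from the C side, so a failure indicates that the Rust and C views of the structure have drifted apart.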