This is part of a series on writing a hashmap in Rust. See Part 1 for an introduction.
To recap: we ended Part 1 after writing two simple hashmaps.
- first::Map used separate chaining in a linked list.
- second::Map used open addressing and quadratic probing.
We saw a performance improvement for small value sizes (8 bytes) by moving to open addressing, but a regression for larger value sizes (64 bytes).
Safe map with “swiss table” metadata
second::Map is okay, but we can do better.
Our next optimization will be to keep some additional metadata in the map that tells us whether each bucket is empty, full, or a tombstone.
Following the Swiss Tables design, we'll maintain an additional array (the "metadata") that holds one byte of metadata per bucket.
Each metadata entry consists of a control bit (is the bucket full or empty/tombstone?) and a 7-bit hash of the bucket's key (called H2) if the bucket is full.
This scheme gets us a few performance advantages:
- Denser array for lookup: Since our map holds (K, V) directly in storage, larger K or V values can lead to slowdowns as the stride of memory accesses becomes too long. The metadata can be substantially smaller than the size of K or V; in our case, each entry will only be a single byte. Thus, if we can probe on a smaller metadata array, the probing process can be substantially more cache-friendly than probing on the full storage array.
- Enabling unsafe optimizations: We saw in the benchmarks that making a second::Map with capacity for 100,000 items was about 30x slower for our map than std's. This is because the whole storage array must be initialized to Bucket::Empty, and eagerly zeroing out all of that data takes time. A similar performance hit occurs every time we resize the map and have to initialize the new backing storage. Using the metadata, we can determine which buckets are full and leave all the others uninitialized using std::mem::MaybeUninit, at the cost of some unsafe complications.
At the moment, we'll keep things safe (so no MaybeUninit yet) and focus on implementing the metadata.
Here's the basic idea: split the 64-bit hash into two parts: H1, the top 57 bits of the original hash, and H2, the remaining 7 bits.
We'll use H1 in place of the full hash value to determine the index in both the storage and metadata arrays, and use H2 in the metadata entry for each bucket.
Here’s what this looks like in code:
// third.rs
// ...continued
impl<K, V> Map<K, V>
where
K: PartialEq + Eq + Hash,
{
fn bucket_index_and_h2(&self, k: &K) -> (usize, u8) {
let hash = make_hash(&self.hasher, k);
let (h1, h2) = (hash >> 7, (hash & 0x7F) as u8);
let index = fast_rem(h1 as usize, self.n_buckets());
(index, h2)
}
}
// continued...
The metadata is an array of u8, with one byte of metadata for each bucket.
Each metadata entry consists of a control bit and either the H2 hash (if full) or a sentinel value.
If the control bit is 0, then the bucket is full, and the remaining 7 bits are the H2 hash of that bucket's key.
Otherwise, the remaining bits form one of two sentinel values (for "empty" or "tombstone").
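To make the layout concrete, here's a tiny illustration (the sentinel values match the metadata.rs module defined below; the hash value itself is made up):
// Illustration: how one bucket's metadata byte is formed.
let hash: u64 = 0xDEAD_BEEF_CAFE_F00D; // some 64-bit hash of a key
let h2 = (hash & 0x7F) as u8;          // low 7 bits of the hash (H2)
let full: u8 = h2;                     // control bit 0 => bucket is full
let empty: u8 = 0x80;                  // the "empty" sentinel (control bit set)
let tombstone: u8 = 0xFE;              // the "tombstone" sentinel (control bit set)
assert_eq!(full & 0x80, 0);            // a full bucket always has the control bit clear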
To probe for k, we follow the same quadratic probe sequence, but on the metadata array instead of the storage array.
If we find an element whose H2 hash is equal to k's, then we hop over to the storage array to actually compare the keys for equality.
Implementation
Here’s our map type:
// third.rs
//! A Swiss Tables-inspired map with metadata.
#[derive(Debug, Clone)]
pub struct Map<K, V, S: BuildHasher = DefaultHashBuilder> {
hasher: S,
n_items: usize, // Number of live items
n_occupied: usize, // Number of occupied buckets
storage: Box<[Option<(K, V)>]>,
metadata: Box<[Metadata]>,
}
// continued...
It looks similar to second::Map, except we've added a new metadata field.
We also changed the type of storage to hold Option instead of Bucket, since we'll now handle the bucket state in the metadata array.
The Metadata type is just a type alias for u8.
We define it in a new file called metadata.rs, along with some utility functions:
// metadata.rs
//! Metadata for Swiss tables
// Control bit (0x80) set, low bits zero: the bucket is empty.
const EMPTY: u8 = 0x80;
// Control bit set, low bits nonzero: the bucket held an item that was removed.
const TOMBSTONE: u8 = 0xFE;
// Mask selecting the low 7 bits (the H2 hash).
const MASK: u8 = 0x7F;
pub type Metadata = u8;
#[inline]
pub const fn from_h2(h2: u8) -> Metadata {
h2 & MASK
}
#[inline]
pub const fn empty() -> Metadata {
EMPTY
}
#[inline]
pub const fn tombstone() -> Metadata {
TOMBSTONE
}
#[inline]
pub const fn is_empty(m: Metadata) -> bool {
m == EMPTY
}
#[inline]
pub const fn is_full(m: Metadata) -> bool {
(m & 0x80) == 0
}
#[inline]
pub const fn h2(m: Metadata) -> u8 {
m & MASK
}
The main change to our map implementation is probing in self.metadata instead of self.storage, which requires some changes to probe_find.
Nicely though, the main map operations won't change too much (since most of their logic is contained in probe_find).
Here's the new probe_find:
// third.rs
// ...continued
impl<K, V> Map<K, V>
where
K: PartialEq + Eq + Hash,
{
fn probe_find(&self, k: &K) -> ProbeResult {
let (mut current, h2) = self.bucket_index_and_h2(k);
for step in 0..self.n_buckets() {
current = fast_rem(current + step, self.n_buckets());
let meta = self.metadata[current];
if metadata::is_empty(meta) {
return ProbeResult::Empty(current, h2);
} else if metadata::is_full(meta) && metadata::h2(meta) == h2 {
let (kk, _) = self.storage[current].as_ref().unwrap();
if kk == k {
return ProbeResult::Full(current);
}
}
}
// We've seen every element in `storage`!
unreachable!("backing storage is full, we didn't resize correctly")
}
}
(ProbeResult::Empty now holds (index, h2) pairs, to avoid recomputing the hash when updating the metadata.)
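For reference, here's roughly what the ProbeResult type looks like after this change (a sketch based on how it's used above; the exact definition lives in the repo):
// Sketch of the probe result type after this change.
enum ProbeResult {
    // An empty bucket was found at `index`; `h2` is the probe key's H2 hash,
    // kept so the caller can write the metadata entry without rehashing.
    Empty(usize, u8),
    // The key was found in the (full) bucket at `index`.
    Full(usize),
}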
Since the details of probing are abstracted by probe_find, the map operations are almost unchanged from second::Map.
get and get_mut are basically the same, so I won't show them.
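For completeness, here's a sketch of what get looks like in third::Map, assuming the same structure as in second::Map (get_mut is analogous, using as_mut):
// (Inside the same impl<K, V> Map<K, V> block as above.)
// Sketch of `get` for third::Map.
pub fn get(&self, k: &K) -> Option<&V> {
    match self.probe_find(k) {
        ProbeResult::Empty(..) => None,
        ProbeResult::Full(index) => {
            let (_, v) = self.storage[index].as_ref().unwrap();
            Some(v)
        }
    }
}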
_insert and remove need to be tweaked slightly to update the metadata:
// third.rs
// ...continued
impl<K, V> Map<K, V>
where
K: PartialEq + Eq + Hash,
{
pub fn insert(&mut self, k: K, v: V) -> Option<V> {
if self.needs_resize() {
self.resize();
}
self._insert(k, v)
}
fn _insert(&mut self, k: K, v: V) -> Option<V> {
match self.probe_find(&k) {
ProbeResult::Empty(index, h2) => {
self.metadata[index] = metadata::from_h2(h2);
self.storage[index] = Some((k, v));
self.n_items += 1;
self.n_occupied += 1;
None
}
ProbeResult::Full(index) => {
let (_, vv) = self.storage[index].as_mut().unwrap();
Some(std::mem::replace(vv, v))
}
}
}
pub fn remove(&mut self, k: &K) -> Option<V> {
match self.probe_find(k) {
ProbeResult::Empty(..) => None,
ProbeResult::Full(index) => {
let old_bucket = self.storage[index].take();
self.metadata[index] = metadata::tombstone();
self.n_items -= 1;
old_bucket.map(|(_, v)| v)
}
}
}
}
That’s it for the map operations!
Implementing resize is also easy: we just iterate over all of the items in the old storage and reinsert any item that is Some((k, v)) using _insert, which updates the metadata automatically.
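Here's a sketch of what that resize might look like, assuming the same growth policy (start at 16 buckets, then double) that fourth::Map uses later:
// (Inside the same impl<K, V> Map<K, V> block as above.)
// Sketch of `resize` for third::Map.
fn resize(&mut self) {
    // Calculate the new capacity.
    let capacity = match self.n_buckets() {
        0 => 16,
        x => x * 2,
    };
    // Swap in fresh (all-empty) storage and metadata arrays.
    let new_storage = (0..capacity)
        .map(|_| None)
        .collect::<Vec<_>>()
        .into_boxed_slice();
    let old_storage = std::mem::replace(&mut self.storage, new_storage);
    self.metadata = (0..capacity)
        .map(|_| metadata::empty())
        .collect::<Vec<_>>()
        .into_boxed_slice();
    self.n_items = 0;
    self.n_occupied = 0;
    // Reinsert every live item; `_insert` rebuilds the metadata as it goes.
    for bucket in Vec::from(old_storage) {
        if let Some((k, v)) = bucket {
            self._insert(k, v);
        }
    }
}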
Speed comparison with std
As a reminder, the numbers in the table below are the ratio of each map's runtime to that of std's implementation.
For details on the benchmark code, please see Part 1 or the repo.
Benchmark | first | second | third |
---|---|---|---|
new (capacity: 0) | 0.7 | 0.5 | 0.8 |
new (capacity: 10⁵ 16-byte items) | 60 | 27 | 33 |
drop (10⁵ items) | 3.0 | 1.1 | 1.2 |
insert_grow_seq (payload: 8 bytes) | 2.2 | 2.1 | 1.9 |
insert_grow_seq (64 bytes) | 2.4 | 2.6 | 1.9 |
insert_grow_random (8 bytes) | 2.5 | 2.1 | 1.9 |
insert_grow_random (64 bytes) | 2.2 | 2.5 | 2.3 |
insert_reserved_random (8 bytes) | 2.3 | 1.5 | 1.4 |
insert_reserved_random (64 bytes) | 2.0 | 1.6 | 1.4 |
lookup (8 bytes) | 1.2 | 1.2 | 1.3 |
lookup (64 bytes) | 1.2 | 1.5 | 1.4 |
lookup_string (8 bytes) | 1.1 | 1.3 | 0.9 |
lookup_string (64 bytes) | 0.7 | 1.5 | 1.1 |
lookup_miss (8 bytes) | 1.4 | 2.0 | 1.6 |
lookup_miss (64 bytes) | 1.6 | 3.0 | 1.5 |
remove (8 bytes) | 1.7 | 0.7 | 0.8 |
remove (64 bytes) | 2.0 | 0.6 | 0.9 |
Yay!
We see improvements over second::Map on most benchmarks; third::Map is our fastest map yet.
We still lag behind std's map quite a bit in the insert_grow and new benchmarks.
As mentioned earlier, we can leverage the metadata to leave unused storage buckets uninitialized, which should improve both of these benchmarks.
Let’s do it!
Unsafe map using MaybeUninit
MaybeUninit<T> is a standard library utility that lets us create uninitialized memory.
Here's how this might be used, (barely) adapted from std's docs:
use std::mem::MaybeUninit;
// Make an uninitialized variable.
let mut bucket = MaybeUninit::<(u32, u32)>::uninit();
// Set it to an initialized value.
bucket.write((0, 0));
// Read the contents of `bucket`.
// If `bucket` is not initialized, this is UB!
let (k, v) = unsafe { bucket.assume_init() };
The key rule for using MaybeUninit is pretty simple: we must ensure that the memory is actually initialized before we read it!
Breaking this rule leads to instant undefined behavior (UB).
Luckily, we have the metadata to help us. In fact, we can define the following simple invariant:
"storage[i] can be treated as initialized <=> is_full(metadata[i])",
which, if correctly maintained, will keep us from ever reading an uninitialized bucket.
Implementation
Here's the new Map type.
It's the same as third::Map, but with Option replaced by MaybeUninit in the type of storage:
// fourth.rs
//! A Swiss Tables-inspired map with metadata.
//! This is similar to the one in `third`, except using MaybeUninit as an optimization.
pub struct Map<K, V, S: BuildHasher = DefaultHashBuilder> {
hasher: S,
n_items: usize,
n_occupied: usize,
storage: Box<[MaybeUninit<(K, V)>]>, // New!
metadata: Box<[Metadata]>,
}
// continued...
I've been leaving out the with_capacity implementation for the last several maps since it never changes, but here we use something slightly different:
// fourth.rs
// ...continued
impl<K, V> Map<K, V> {
pub fn new() -> Self {
Self::with_capacity(0)
}
pub fn with_capacity(capacity: usize) -> Self {
let capacity = fix_capacity(capacity);
// New!
// Shortcut for creating `storage`.
// This requires a nightly compiler to build.
let storage = Box::new_uninit_slice(capacity);
let metadata = (0..capacity)
            .map(|_| metadata::empty())
.collect::<Vec<_>>()
.into_boxed_slice();
Self {
hasher: DefaultHashBuilder::default(),
n_items: 0,
n_occupied: 0,
storage,
metadata,
}
}
}
// continued...
Above, we use the nightly Box::new_uninit_slice API to create a new boxed slice of uninitialized data.
None of the other code needs to change, except in the places where we directly access storage's data.
Here's what probe_find looks like now:
// fourth.rs
// ...continued
impl<K, V> Map<K, V>
where
K: PartialEq + Eq + Hash,
{
fn probe_find(&self, k: &K) -> ProbeResult {
let (mut current, h2) = self.bucket_index_and_h2(k);
for step in 0..self.n_buckets() {
current = fast_rem(current + step, self.n_buckets());
let meta = self.metadata[current];
if metadata::is_empty(meta) {
return ProbeResult::Empty(current, h2);
} else if metadata::is_full(meta) && metadata::h2(meta) == h2 {
                // SAFETY: we just checked `metadata::is_full(meta)`, so this bucket is initialized.
let (kk, _) = unsafe { self.storage[current].assume_init_ref() };
if kk == k {
return ProbeResult::Full(current);
}
}
}
unreachable!("backing storage is full, we didn't resize correctly")
}
}
Here, we have our first use of unsafe, where we try to read from self.storage using MaybeUninit::assume_init_ref (the non-owning version of MaybeUninit::assume_init).
We also need to use a little unsafe in the main map operations, for example in get and get_mut:
// fourth.rs
// ...continued
impl<K, V> Map<K, V>
where
K: PartialEq + Eq + Hash,
{
pub fn get(&self, k: &K) -> Option<&V> {
match self.probe_find(k) {
ProbeResult::Empty(..) => None,
ProbeResult::Full(index) => {
// SAFETY:
// `self.probe_find` only returns `ProbeResult::Full`
// if `is_full(self.metadata[index])`.
let (_, v) = unsafe { self.storage[index].assume_init_ref() };
Some(v)
}
}
}
pub fn get_mut(&mut self, k: &K) -> Option<&mut V> {
match self.probe_find(k) {
ProbeResult::Empty(..) => None,
ProbeResult::Full(index) => {
// SAFETY:
// `self.probe_find` only returns `ProbeResult::Full`
// if `is_full(self.metadata[index])`.
let (_, v) = unsafe { self.storage[index].assume_init_mut() };
Some(v)
}
}
}
}
_insert and remove are mostly unchanged, except for replacing the old Option methods with their MaybeUninit counterparts.
For example, here's what remove looks like now; a sketch of the corresponding _insert follows it:
// fourth.rs
// ...continued
impl<K, V> Map<K, V>
where
K: PartialEq + Eq + Hash,
{
pub fn remove(&mut self, k: &K) -> Option<V> {
match self.probe_find(k) {
ProbeResult::Empty(..) => None,
ProbeResult::Full(index) => {
let old_bucket = std::mem::replace(
&mut self.storage[index],
MaybeUninit::uninit()
);
// SAFETY:
// `self.probe_find` only returns `ProbeResult::Full`
// if `is_full(self.metadata[index])`.
let (_, vv) = unsafe { old_bucket.assume_init() };
self.metadata[index] = metadata::tombstone();
self.n_items -= 1;
Some(vv)
}
}
}
}
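And here's a sketch of _insert with the MaybeUninit swaps described above; it mirrors third::Map's _insert, with write and assume_init_mut standing in for the old Option handling:
// (Inside the same impl<K, V> Map<K, V> block as above.)
// Sketch of `_insert` for fourth::Map.
fn _insert(&mut self, k: K, v: V) -> Option<V> {
    match self.probe_find(&k) {
        ProbeResult::Empty(index, h2) => {
            self.metadata[index] = metadata::from_h2(h2);
            // Initialize the bucket by writing into the MaybeUninit slot.
            self.storage[index].write((k, v));
            self.n_items += 1;
            self.n_occupied += 1;
            None
        }
        ProbeResult::Full(index) => {
            // SAFETY: `probe_find` only returns `Full` if `is_full(self.metadata[index])`.
            let (_, vv) = unsafe { self.storage[index].assume_init_mut() };
            Some(std::mem::replace(vv, v))
        }
    }
}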
Finally, resize becomes a little more complicated, since we now need to check the old metadata to know which buckets are safe to access:
// fourth.rs
// ...continued
impl<K, V> Map<K, V>
where
K: PartialEq + Eq + Hash,
{
fn resize(&mut self) {
// Calculate the new capacity.
let capacity = match self.n_buckets() {
0 => 16,
x => x * 2,
};
// Set `self.storage` to a new array.
let new_storage = Box::new_uninit_slice(capacity);
let old_storage = std::mem::replace(&mut self.storage, new_storage);
let old_buckets = Vec::from(old_storage).into_iter();
let new_metadata = (0..capacity)
.map(|_| metadata::empty())
.collect::<Vec<_>>()
.into_boxed_slice();
// Here, we need to keep the old metadata, as it's unsafe to blindly access the old storage
// array.
let old_metadata = std::mem::replace(&mut self.metadata, new_metadata);
self.n_items = 0;
self.n_occupied = 0;
// Move nodes from `old_storage` to `self.storage`.
for (&metadata, bucket) in old_metadata.iter().zip(old_buckets) {
if metadata::is_full(metadata) {
// SAFETY: we just checked the invariant above.
let (k, v) = unsafe { bucket.assume_init() };
self._insert(k, v);
}
}
}
}
Testing for UB with miri
Luckily, the unsafe in our current implementation is not too gnarly, since we aren't manipulating raw pointers.
That being said, it's probably still a good idea to test our code with miri, an interpreter that checks for UB and memory leaks.
(And sure enough, the above implementation is missing quite a few important details…1)
First, let’s check our tests using the normal test harness:
$ cargo test fourth
All passing!
To install miri (it's distributed as a rustup component on the nightly toolchain):
$ rustup +nightly component add miri
And to run miri on our test suite,
$ cargo miri test fourth
Unfortunately, running this command makes Miri spit out a long diagnostic message, ending with:
error: the evaluated program leaked memory
note: pass `-Zmiri-ignore-leaks` to disable this check
error: aborting due to previous error; 1 warning emitted
error: test failed, to rerun pass `--lib`
Looks like we have a memory leak!
Fixing leaks: Implementing Drop
The root problem here is pretty simple.
To quote from MaybeUninit<T>'s docs:
"Dropping a MaybeUninit<T> will never call T's drop code. It is your responsibility to make sure T gets dropped if it got initialized."
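Here's a minimal illustration of the problem, using a String directly (standalone example, not code from the map itself):
// Dropping a MaybeUninit never runs the inner destructor,
// so this leaks the String's heap allocation.
use std::mem::MaybeUninit;
let mut slot = MaybeUninit::<String>::uninit();
slot.write(String::from("this allocation is never freed"));
drop(slot); // no destructor runs here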
Oops: we haven’t done anything to ensure our map items get dropped!
That’s fine if we’re holding types like integers, which don’t do anything in their destructor.
But what if we're holding Strings?
Hmm, I think I made a test for this…
#[test]
fn insert_nontrivial_drop() {
let mut map: Map<String, String> = Map::new();
let items = (0..1000).map(|i| (i.to_string(), i.to_string()));
for (k, v) in items {
map.insert(k, v);
}
assert_eq!(map.len(), 1000);
}
Yup—that test is causing the leak.
To fix it, we'll need to manually ensure that all initialized elements are dropped.
That sounds fancy (to me at least), but it's pretty straightforward: we loop through the metadata and, for each full entry, drop the corresponding item in the storage array.
Here's the code:
// fourth.rs
// ...continued
impl<K, V, S> Drop for Map<K, V, S>
where
S: BuildHasher,
{
fn drop(&mut self) {
for (i, &m) in self.metadata.iter().enumerate() {
if metadata::is_full(m) {
unsafe { self.storage[i].assume_init_drop() };
}
}
}
}
// continued...
Now, cargo miri test fourth succeeds with no errors!
Satisfying the drop checker with #[may_dangle]
Unfortunately, we introduced a small problem by implementing Drop ourselves.
Check out this test:
#[test]
fn insert_borrowed_data() {
let items = (0..1000)
.map(|i| (i.to_string(), i.to_string()))
.collect::<Vec<_>>();
let mut map = Map::new();
for (k, v) in &items {
map.insert(k, v);
}
assert_eq!(map.len(), 1000);
}
This compiles fine, and the test passes both under the normal test harness and under Miri.
However, swapping the declarations of items and map now results in a compile error:
#[test]
fn insert_borrowed_data() {
// XXX this doesn't compile
let mut map = Map::new();
let items = (0..1000)
.map(|i| (i.to_string(), i.to_string()))
.collect::<Vec<_>>();
// ... same as before
}
And here's rustc's error message:
error[E0597]: `items` does not live long enough
--> src/lib.rs:96:27
|
92 | let items = (0..1000)
| ----- binding `items` declared here
...
96 | for (k, v) in &items {
| ^^^^^^ borrowed value does not live long enough
...
100 | }
| -
| |
| `items` dropped here while still borrowed
| borrow might be used here, when `map` is dropped and runs the `Drop` code for type `fourth::Map`
|
= note: values in a scope are dropped in the opposite order they are defined
As usual, the compiler's note is very helpful: since items is defined after map, it's dropped first, even though map is still holding references to data in items!
This analysis is carried out by the drop checker (aka dropck), which ensures that no type can observe dangling references in safe Rust (see the Rustonomicon for an in-depth explanation).
Unfortunately, the dropck introduces an ergonomic limitation for our type: any Map that holds references must be declared after the data it references.
Interestingly though, the std version doesn't impose this limitation!
Indeed, there's a way to opt out of the dropck by using the unstable #[may_dangle] attribute from #![feature(dropck_eyepatch)].
This essentially tells the compiler: "We might be holding dangling references when drop is run, but if we are, we promise not to look at them."
As usual in Rust, the fact that we have to "make a promise" means that unsafe will be involved.
That was a lot of words, so here's the code:
// fourth.rs
// ...continued
use core::marker::PhantomData;
// Add a `PhantomData` field to `Map`:
pub struct Map<K, V, S: BuildHasher = DefaultHashBuilder> {
// ...
// same fields as before, plus:
_ph: PhantomData<(K, V)>,
}
// This replaces the previous drop implementation.
unsafe impl<#[may_dangle] K, #[may_dangle] V, S> Drop for Map<K, V, S>
where
S: BuildHasher,
{
fn drop(&mut self) {
// Only touch `storage` if `K` or `V` needs to be dropped.
if std::mem::needs_drop::<(K, V)>() {
for (i, &m) in self.metadata.iter().enumerate() {
if metadata::is_full(m) {
unsafe { self.storage[i].assume_init_drop() };
}
}
}
}
}
// continued...
We added a PhantomData<(K, V)> member to Map, which is required when using #[may_dangle].
We also replaced the old impl Drop block with a new unsafe impl Drop, where K and V are annotated with #[may_dangle].
This asserts to the compiler that we won't touch dangling references in the drop implementation.
Finally, we added a check with std::mem::needs_drop, to ensure that we only drop the individual map items when they actually need dropping (e.g., not when both K and V are references).
With these changes, the new version of insert_borrowed_data (with map declared first) compiles, and all of the tests pass Miri.
I also added a new test that is identical to insert_borrowed_data but uses a Map<String, &str> or a Map<&str, String> in place of the original Map<&str, &str>, in order to test the scenario where one of K or V is a borrowed type and the other is owned.
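Here's a sketch of what such a test might look like (the name and details are illustrative, not the exact test from the repo):
// Sketch of a mixed-ownership test: owned keys, borrowed values.
#[test]
fn insert_owned_keys_borrowed_values() {
    // The map is declared before the data it borrows from,
    // which only compiles thanks to #[may_dangle] on V.
    let mut map: Map<String, &str> = Map::new();
    let values = (0..1000).map(|i| i.to_string()).collect::<Vec<_>>();
    for (i, v) in values.iter().enumerate() {
        map.insert(i.to_string(), v.as_str());
    }
    assert_eq!(map.len(), 1000);
}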
Implementing Clone
We also need to implement Clone manually for the new map, since MaybeUninit<T> only implements Clone if T is Copy.
Let’s first add a test so we can check our work with Miri:
#[test]
fn clone() {
let mut map = Map::new();
for i in 0..1000 {
map.insert(i, i);
}
assert_eq!(map.len(), 1000);
let another_map = map.clone();
for i in 0..1000 {
assert_eq!(map.get(&i), another_map.get(&i));
}
}
Next, we'll implement Clone.
The strategy is pretty simple: we create a new instance of Map by cloning all the fields of self except for storage.
Then, for each full bucket in self.storage, we clone the key and value and write the cloned items into the new storage array.
impl<K, V> Clone for Map<K, V>
where
K: Clone,
V: Clone,
{
fn clone(&self) -> Self {
let mut other = Self {
hasher: DefaultHashBuilder::default(),
n_items: self.n_items,
n_occupied: self.n_occupied,
storage: Box::new_uninit_slice(self.n_buckets()),
metadata: self.metadata.clone(),
_ph: PhantomData,
};
for (i, m) in self.metadata.iter().enumerate() {
if metadata::is_full(*m) {
let (k, v) = unsafe { self.storage[i].assume_init_ref() };
other.storage[i].write((k.clone(), v.clone()));
}
}
other
}
}
Note that this is pretty manual, unlike in resize (which uses _insert).
Manually writing to the new storage is more efficient here because we don't have to rehash the items (unlike during resizing).
Warning: Exception safety
Note that the Clone impl above is not actually sound as it stands, due to a lack of exception safety. (See the section in the Rustonomicon.)
The problem occurs where we call k.clone() and v.clone() above, either of which could potentially panic. If that happens, then Map::drop will cause UB because our invariant is violated: other.metadata is fully copied from self.metadata, but other.storage is not correctly initialized.
Please see Part 3b, which is a short post dedicated to fixing this problem.
Speed comparison with std
Benchmark | first | second | third | fourth |
---|---|---|---|---|
new (capacity: 0) | 0.7 | 0.5 | 0.8 | 0.6 |
new (capacity: 10⁵ 16-byte items) | 60 | 27 | 33 | 1.1 |
drop (10⁵ items) | 3.0 | 1.1 | 1.2 | 1.1 |
insert_grow_seq (payload: 8 bytes) | 2.2 | 2.1 | 1.9 | 1.9 |
insert_grow_seq (64 bytes) | 2.4 | 2.6 | 1.9 | 1.7 |
insert_grow_random (8 bytes) | 2.5 | 2.1 | 1.9 | 1.8 |
insert_grow_random (64 bytes) | 2.2 | 2.5 | 2.3 | 1.7 |
insert_reserved_random (8 bytes) | 2.3 | 1.5 | 1.4 | 1.3 |
insert_reserved_random (64 bytes) | 2.0 | 1.6 | 1.4 | 1.4 |
lookup (8 bytes) | 1.2 | 1.2 | 1.3 | 1.2 |
lookup (64 bytes) | 1.2 | 1.5 | 1.4 | 1.1 |
lookup_string (8 bytes) | 1.1 | 1.3 | 0.9 | 0.9 |
lookup_string (64 bytes) | 0.7 | 1.5 | 1.1 | 1.2 |
lookup_miss (8 bytes) | 1.4 | 2.0 | 1.6 | 1.6 |
lookup_miss (64 bytes) | 1.6 | 3.0 | 1.5 | 1.5 |
remove (8 bytes) | 1.7 | 0.7 | 0.8 | 0.6 |
remove (64 bytes) | 2.0 | 0.6 | 0.9 | 0.6 |
We see some improvements here. Inching toward the performance of std's map!
This concludes Part 2. In Part 3, we'll look at two additional optimizations made by Swiss Tables and hashbrown:
- SIMD probing. SIMD allows us to check 16 metadata entries with only a few instructions.
- A fancier allocation strategy. Putting the metadata and storage in the same allocation allows for better cache performance (especially when the map is small).
A little more unsafe: Removing bounds checks
I also tried using slice::get_unchecked to remove the bounds checks when accessing self.metadata[i] and self.storage[i] inside of probe_find, at the cost of a bit more unsafe.2 This should be fine because we explicitly ensure that our index is in bounds by calling fast_rem(current_index, self.n_buckets()) in the loop.
However, I found that doing this only improved the benchmarks by less than 5%. (This change was not included in the above benchmarks.)
Footnotes
1. You may have noticed, for example, that the current Map will leak a ton of memory when holding a type with nontrivial drop!
2. Using unsafe to eliminate bounds checks is generally frowned upon; see e.g. this section in the Rust Performance Book. A lot of suggestions on how to do this safely can be found in "How to avoid bounds checks in Rust (without unsafe!)".