diff options
author | Tavian Barnes <tavianator@tavianator.com> | 2022-10-26 16:22:50 -0400 |
---|---|---|
committer | Tavian Barnes <tavianator@tavianator.com> | 2022-10-29 11:24:16 -0400 |
commit | c9447e42e04dfb43d9b3ba1e33aa80b8568c3b35 (patch) | |
tree | b4bdc678896b3dc4fe7dc36100b70a9821eca161 | |
parent | 8828eed307da56b129ca56cf9ee3d9d108d81d52 (diff) | |
download | bfs-c9447e42e04dfb43d9b3ba1e33aa80b8568c3b35.tar.xz |
trie: Convert ASCII art to Unicode art
-rw-r--r-- | src/trie.c | 80 |
1 files changed, 49 insertions, 31 deletions
@@ -1,6 +1,6 @@ /**************************************************************************** * bfs * - * Copyright (C) 2019 Tavian Barnes <tavianator@tavianator.com> * + * Copyright (C) 2019-2022 Tavian Barnes <tavianator@tavianator.com> * * * * Permission to use, copy, modify, and/or distribute this software for any * * purpose with or without fee is hereby granted. * @@ -22,29 +22,29 @@ * look like * * A A A A - * *--->*--->*--->*--->$ - * | | | D D - * | | +--->*--->$ - * | | B C D - * | +--->*--->*--->$ - * | D D A A - * +--->*--->*--->*--->$ - * | D D - * +--->*--->$ + * ●───→●───→●───→●───→○ + * │ │ │ D D + * │ │ └───→●───→○ + * │ │ B C D + * │ └───→●───→●───→○ + * │ D D A A + * └───→●───→●───→●───→○ + * │ D D + * └───→●───→○ * * A compressed (PATRICIA) trie collapses internal nodes that have only a single * child, like this: * * A A AA - * *--->*--->*---->$ - * | | | DD - * | | +---->$ - * | | BCD - * | +----->$ - * | DD AA - * +---->*---->$ - * | DD - * +---->$ + * ●───→●───→●────→○ + * │ │ │ DD + * │ │ └────→○ + * │ │ BCD + * │ └─────→○ + * │ DD AA + * └────→●────→○ + * │ DD + * └────→○ * * The nodes can be compressed further by dropping the actual compressed * sequences from the nodes, storing it only in the leaves. This is the @@ -53,21 +53,39 @@ * branch on, need to be stored in each node. * * A A A - * 0--->1--->2--->AAAA - * | | | D - * | | +--->AADD - * | | B - * | +--->ABCD - * | D A - * +--->2--->DDAA - * | D - * +--->DDDD + * 0───→1───→2───→AAAA + * │ │ │ D + * │ │ └───→AADD + * │ │ B + * │ └───→ABCD + * │ D A + * └───→2───→DDAA + * │ D + * └───→DDDD * * Nodes are represented very compactly. Rather than a dense array of children, * a sparse array of only the non-NULL children directly follows the node in - * memory. A bitmap is used to track which children exist; the index of a child - * i is found by counting the number of bits below bit i that are set. A tag - * bit is used to tell pointers to internal nodes apart from pointers to leaves. + * memory. A bitmap is used to track which children exist. + * + * ┌────────────┐ + * │ [4] [3] [2][1][0] ←─ children + * │ ↓ ↓ ↓ ↓ ↓ + * │ 14 10 6 3 0 ←─ sparse index + * │ ↓ ↓ ↓ ↓ ↓ + * │ 0100010001001001 ←─ bitmap + * │ + * │ To convert a sparse index to a dense index, mask off the bits above it, and + * │ count the remaining bits. + * │ + * │ 10 ←─ sparse index + * │ ↓ + * │ 0000001111111111 ←─ mask + * │ & 0100010001001001 ←─ bitmap + * │ ──────────────── + * │ = 0000000001001001 + * │ └──┼──┘ + * │ [3] ←─ dense index + * └───────────────────┘ * * This implementation tests a whole nibble (half byte/hex digit) at every * branch, so the bitmap takes up 16 bits. The remainder of a machine word is |