BIP-0158: allow filters to define values for P and M, reparameterize default filter

commit 1c2ed6dce3
parent 4a85759f02
@@ -65,11 +65,14 @@ For each block, compact filters are derived containing sets of items associated
 with the block (eg. addresses sent to, outpoints spent, etc.). A set of such
 data objects is compressed into a probabilistic structure called a
 ''Golomb-coded set'' (GCS), which matches all items in the set with probability
-1, and matches other items with probability <code>2^(-P)</code> for some integer
-parameter <code>P</code>.
+1, and matches other items with probability <code>2^(-P)</code> for some
+integer parameter <code>P</code>. We also introduce a parameter <code>M</code>
+which allows each filter to tune the range that items are hashed onto
+before compressing. Each defined filter also selects distinct parameters for P
+and M.
 
 At a high level, a GCS is constructed from a set of <code>N</code> items by:
-# hashing all items to 64-bit integers in the range <code>[0, N * 2^P)</code>
+# hashing all items to 64-bit integers in the range <code>[0, N * M)</code>
 # sorting the hashed values in ascending order
 # computing the differences between each value and the previous one
 # writing the differences sequentially, compressed with Golomb-Rice coding
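As an illustration of steps 2 and 3 above, here is a minimal Go sketch that sorts a set of already-hashed values and produces the per-item differences that are later Golomb-Rice coded. The function name and inputs are illustrative only and are not taken from any reference implementation.

<pre>
package main

import (
	"fmt"
	"sort"
)

// deltasFromHashes sorts the hashed set values and returns the difference
// between each value and its predecessor (steps 2 and 3 above). These
// deltas are what is subsequently compressed with Golomb-Rice coding.
func deltasFromHashes(hashedSet []uint64) []uint64 {
	sorted := append([]uint64(nil), hashedSet...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	deltas := make([]uint64, 0, len(sorted))
	var prev uint64
	for _, v := range sorted {
		deltas = append(deltas, v-prev)
		prev = v
	}
	return deltas
}

func main() {
	// Toy values standing in for hashes in [0, N * M).
	fmt.Println(deltasFromHashes([]uint64{42, 7, 100})) // [7 35 58]
}
</pre>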
@@ -80,9 +83,13 @@ The following sections describe each step in greater detail.
 
 The first step in the filter construction is hashing the variable-sized raw
 items in the set to the range <code>[0, F)</code>, where <code>F = N *
-2^P</code>. Set membership queries against the hash outputs will have a false
-positive rate of <code>2^(-P)</code>. To avoid integer overflow, the number of
-items <code>N</code> MUST be <2^32 and <code>P</code> MUST be <=32.
+M</code>. Customarily, <code>M</code> is set to <code>2^P</code>. However, if
+one is able to select both parameters independently, then more optimal values
+can be
+selected<ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
+Set membership queries against the hash outputs will have a false positive rate
+of <code>2^(-P)</code>. To avoid integer overflow, the
+number of items <code>N</code> MUST be <2^32 and <code>M</code> MUST be <2^32.
 
 The items are first passed through the pseudorandom function ''SipHash'', which
 takes a 128-bit key <code>k</code> and a variable-sized byte vector and produces
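The overflow constraint works because <code>N < 2^32</code> and <code>M < 2^32</code> together guarantee <code>N * M < 2^64</code>, so <code>F</code> always fits in a 64-bit unsigned integer. A hedged Go sketch of that validation (the function name is illustrative, not from any reference implementation):

<pre>
package main

import (
	"errors"
	"fmt"
)

// filterRange checks the BIP's bounds and returns F = N * M. Because both
// inputs are below 2^32, their product is below 2^64 and cannot overflow
// a uint64.
func filterRange(N, M uint64) (uint64, error) {
	const limit = uint64(1) << 32
	if N >= limit {
		return 0, errors.New("too many items: N must be < 2^32")
	}
	if M >= limit {
		return 0, errors.New("invalid parameter: M must be < 2^32")
	}
	return N * M, nil
}

func main() {
	F, err := filterRange(5000, 784931)
	fmt.Println(F, err) // 3924655000 <nil>
}
</pre>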
@@ -104,9 +111,9 @@ result.
 hash_to_range(item: []byte, F: uint64, k: [16]byte) -> uint64:
     return (siphash(k, item) * F) >> 64
 
-hashed_set_construct(raw_items: [][]byte, P: uint, k: [16]byte) -> []uint64:
+hashed_set_construct(raw_items: [][]byte, k: [16]byte, M: uint) -> []uint64:
     let N = len(raw_items)
-    let F = N << P
+    let F = N * M
 
     let set_items = []
 
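For comparison, a Go sketch of the two routines above. The 128-bit product in the fast-range reduction is taken with <code>math/bits.Mul64</code>; since SipHash-2-4 is not in the Go standard library, the keyed hash is passed in as a function value rather than assuming any particular package.

<pre>
package gcs

import "math/bits"

// hashToRange maps a raw item onto [0, F) using the fast-range technique:
// (siphash(k, item) * F) >> 64, i.e. the high 64 bits of the 128-bit product.
func hashToRange(item []byte, F uint64, k [16]byte,
	siphash func(k [16]byte, b []byte) uint64) uint64 {

	hi, _ := bits.Mul64(siphash(k, item), F)
	return hi
}

// hashedSetConstruct hashes every raw item into [0, N * M).
func hashedSetConstruct(rawItems [][]byte, k [16]byte, M uint64,
	siphash func(k [16]byte, b []byte) uint64) []uint64 {

	N := uint64(len(rawItems))
	F := N * M

	setItems := make([]uint64, 0, N)
	for _, item := range rawItems {
		setItems = append(setItems, hashToRange(item, F, k, siphash))
	}
	return setItems
}
</pre>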
@@ -197,8 +204,8 @@ with Golomb-Rice coding. Finally, the bit stream is padded with 0's to the
 nearest byte boundary and serialized to the output byte vector.
 
 <pre>
-construct_gcs(L: [][]byte, P: uint, k: [16]byte) -> []byte:
-    let set_items = hashed_set_construct(L, P, k)
+construct_gcs(L: [][]byte, P: uint, k: [16]byte, M: uint) -> []byte:
+    let set_items = hashed_set_construct(L, k, M)
 
     set_items.sort()
 
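Below is a minimal Go sketch of the compression step inside <code>construct_gcs</code>: each sorted delta is written as a unary quotient followed by <code>P</code> remainder bits, and the final byte is implicitly padded with zero bits. The bit writer is hand-rolled for illustration and is not from any reference implementation.

<pre>
package gcs

// bitWriter accumulates bits most-significant-bit first; nbits tracks how
// many free bit positions remain in the last byte.
type bitWriter struct {
	bytes []byte
	nbits uint
}

func (w *bitWriter) writeBit(bit uint64) {
	if w.nbits == 0 {
		w.bytes = append(w.bytes, 0)
		w.nbits = 8
	}
	w.nbits--
	if bit != 0 {
		w.bytes[len(w.bytes)-1] |= 1 << w.nbits
	}
}

func (w *bitWriter) writeBits(v uint64, n uint) {
	for i := int(n) - 1; i >= 0; i-- {
		w.writeBit((v >> uint(i)) & 1)
	}
}

// golombRiceEncode writes each delta as its quotient in unary (q ones then
// a terminating zero) followed by the P low bits of the remainder.
func golombRiceEncode(deltas []uint64, P uint) []byte {
	w := &bitWriter{}
	for _, d := range deltas {
		q := d >> P
		for ; q > 0; q-- {
			w.writeBit(1)
		}
		w.writeBit(0)
		w.writeBits(d&((1<<P)-1), P)
	}
	// Unused bits in the final byte are already zero, giving the required
	// zero padding to the byte boundary.
	return w.bytes
}
</pre>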
@@ -224,8 +231,8 @@ against the reconstructed values. Note that querying does not require the entire
 decompressed set be held in memory at once.
 
 <pre>
-gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint) -> bool:
-    let F = N << P
+gcs_match(key: [16]byte, compressed_set: []byte, target: []byte, P: uint, N: uint, M: uint) -> bool:
+    let F = N * M
     let target_hash = hash_to_range(target, F, k)
 
     stream = new_bit_stream(compressed_set)
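A corresponding Go sketch of the query side, assuming the bit layout produced by the encoder sketch above. The target hash is taken as already mapped onto <code>[0, N * M)</code> (via <code>hash_to_range</code>); values are decoded and accumulated one at a time, so the full set never has to be held in memory.

<pre>
package gcs

// bitReader consumes a compressed set most-significant-bit first.
type bitReader struct {
	data []byte
	pos  uint // absolute bit position
}

func (r *bitReader) readBit() (uint64, bool) {
	if r.pos >= uint(len(r.data))*8 {
		return 0, false
	}
	bit := uint64(r.data[r.pos/8]>>(7-r.pos%8)) & 1
	r.pos++
	return bit, true
}

func (r *bitReader) readBits(n uint) (uint64, bool) {
	var v uint64
	for i := uint(0); i < n; i++ {
		bit, ok := r.readBit()
		if !ok {
			return 0, false
		}
		v = v<<1 | bit
	}
	return v, true
}

// gcsMatch reports whether targetHash could be a member of the compressed
// set of N values, decoding Golomb-Rice deltas lazily.
func gcsMatch(compressedSet []byte, targetHash uint64, P uint, N uint64) bool {
	r := &bitReader{data: compressedSet}
	var lastValue uint64
	for i := uint64(0); i < N; i++ {
		// Unary-coded quotient: count 1 bits up to the terminating 0.
		var q uint64
		for {
			bit, ok := r.readBit()
			if !ok {
				return false
			}
			if bit == 0 {
				break
			}
			q++
		}
		rem, ok := r.readBits(P)
		if !ok {
			return false
		}
		lastValue += q<<P | rem
		switch {
		case lastValue == targetHash:
			return true
		case lastValue > targetHash:
			return false
		}
	}
	return false
}
</pre>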
@@ -260,6 +267,8 @@ against the decompressed GCS contents. See
 
 This BIP defines one initial filter type:
 * Basic (<code>0x00</code>)
+** <code>M = 784931</code>
+** <code>P = 19</code>
 
 ==== Contents ====
 
@@ -271,24 +280,27 @@ items for each transaction in a block:
 
 ==== Construction ====
 
-Both the basic and extended filter types are constructed as Golomb-coded sets
-with the following parameters.
+The basic type is constructed as Golomb-coded sets with the following
+parameters.
 
-The parameter <code>P</code> MUST be set to <code>20</code>. This value was
-chosen as simulations show that it minimizes the bandwidth utilized, considering
-both the expected number of blocks downloaded due to false positives and the
-size of the filters themselves. The code along with a demo used for the
-parameter tuning can be found
-[https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe936dd0c9b99ecb9/gcs_light_client/gentestvectors.go here].
+The parameter <code>P</code> MUST be set to <code>19</code>, and the parameter
+<code>M</code> MUST be set to <code>784931</code>. Analysis has shown that if
+one is able to select <code>P</code> and <code>M</code> independently, then
+setting <code>M=1.497137 * 2^P</code> is close to optimal
+<ref>https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845</ref>.
+
+Empirical analysis also shows that these parameters minimize the
+bandwidth utilized, considering both the expected number of blocks downloaded
+due to false positives and the size of the filters themselves.
 
 The parameter <code>k</code> MUST be set to the first 16 bytes of the hash of
 the block for which the filter is constructed. This ensures the key is
 deterministic while still varying from block to block.
 
 Since the value <code>N</code> is required to decode a GCS, a serialized GCS
-includes it as a prefix, written as a CompactSize. Thus, the complete
-serialization of a filter is:
-* <code>N</code>, encoded as a CompactSize
+includes it as a prefix, written as a <code>CompactSize</code>. Thus, the
+complete serialization of a filter is:
+* <code>N</code>, encoded as a <code>CompactSize</code>
 * The bytes of the compressed filter itself
 
 ==== Signaling ====
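As a worked check of these constants, the Go sketch below verifies that <code>784931</code> is indeed <code>1.497137 * 2^19</code> rounded to the nearest integer, and shows the filter serialization: <code>N</code> as a CompactSize followed by the compressed bytes. The CompactSize encoder is written out by hand here for illustration; a real implementation would typically reuse its wire-encoding library.

<pre>
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
)

const (
	basicP = 19
	basicM = 784931
)

// writeCompactSize encodes v in Bitcoin's CompactSize (variable-length) format.
func writeCompactSize(buf *bytes.Buffer, v uint64) {
	switch {
	case v < 0xfd:
		buf.WriteByte(byte(v))
	case v <= 0xffff:
		buf.WriteByte(0xfd)
		binary.Write(buf, binary.LittleEndian, uint16(v))
	case v <= 0xffffffff:
		buf.WriteByte(0xfe)
		binary.Write(buf, binary.LittleEndian, uint32(v))
	default:
		buf.WriteByte(0xff)
		binary.Write(buf, binary.LittleEndian, v)
	}
}

// serializeFilter prefixes the compressed GCS with N, as the spec requires.
func serializeFilter(n uint64, compressed []byte) []byte {
	var buf bytes.Buffer
	writeCompactSize(&buf, n)
	buf.Write(compressed)
	return buf.Bytes()
}

func main() {
	// M for the Basic filter is the near-optimal value round(1.497137 * 2^P).
	fmt.Println(uint64(math.Round(1.497137*math.Exp2(basicP))) == basicM) // true

	// A filter with N = 300 items: 0xfd + little-endian uint16, then the bytes.
	fmt.Printf("%x\n", serializeFilter(300, []byte{0xde, 0xad})) // fd2c01dead
}
</pre>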
@@ -311,7 +323,8 @@ though it requires implementation of the new filters.
 
 We would like to thank bfd (from the bitcoin-dev mailing list) for bringing the
 basis of this BIP to our attention, Greg Maxwell for pointing us in the
-direction of Golomb-Rice coding and fast range optimization, and Pedro
+direction of Golomb-Rice coding and fast range optimization, Pieter Wuille for
+his analysis of optimal GCS parameters, and Pedro
 Martelletto for writing the initial indexing code for <code>btcd</code>.
 
 We would also like to thank Dave Collins, JJ Jeffrey, and Eric Lombrozo for
@@ -363,8 +376,8 @@ easier to understand.
 === Golomb-Coded Set Multi-Match ===
 
 <pre>
-gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint) -> bool:
-    let F = N << P
+gcs_match_any(key: [16]byte, compressed_set: []byte, targets: [][]byte, P: uint, N: uint, M: uint) -> bool:
+    let F = N * M
 
     // Map targets to the same range as the set hashes.
     let target_hashes = []
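The core of the multi-match is a zipper walk over two sorted sequences. The hedged Go sketch below assumes the set values have already been decoded in ascending order (for example, by the decoder sketch earlier) and the targets already hashed into the same range; it advances whichever side is smaller and stops at the first equality.

<pre>
package gcs

import "sort"

// gcsMatchAny reports whether any target hash appears among the decoded,
// ascending set values, walking both sorted lists once.
func gcsMatchAny(setValues, targetHashes []uint64) bool {
	sorted := append([]uint64(nil), targetHashes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	i, j := 0, 0
	for i < len(setValues) && j < len(sorted) {
		switch {
		case setValues[i] == sorted[j]:
			return true
		case setValues[i] < sorted[j]:
			i++ // set value too small, advance the set side
		default:
			j++ // target too small, advance the target side
		}
	}
	return false
}
</pre>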