Ernestas Poškus

Technical blog

"We must view with profound respect the infinite capacity of the human mind to resist the introduction of useful knowledge." - Thomas R. Lounsbury

| github | goodreads | linkedin | twitter |

ansible 2 / elasticsearch 2 / kernel 2 / leadership 1 / linux 2 / mnemonics 1 / nginx 1 / paper 40 / personal 5 / rust 1 / tools 2 /

Approximating Data with the Count-Min Data Structure

WC 154 / RT 1min


Count-Min Data Structure

Algorithmic problems such as tracking the contents of a set arise frequently in the course of building systems. Given the variety of possible solutions, the choice of appropriate data structures for such tasks is at the heart of building efficient and effective software.

The Count-Min sketch provides a different kind of solution to count tracking. It allocates a fixed amount of space to store count information, which does not vary over time even as more and more counts are updated.

Implementation

With all data structures, it is important to understand the data organization and algorithms for updating the structure, to make clear the relative merits of different choices of structure for a given task. The Count-Min Sketch data structure primarily consists of a fixed array of counters, of width w and depth d. The counters are initialized to all zeros. Each row of counters is associated with a different hash function.