Ernestas Poškus

Technical blog

"We must view with profound respect the infinite capacity of the human mind to resist the introduction of useful knowledge." - Thomas R. Lounsbury

| github | goodreads | linkedin | twitter |

ansible 2 / elasticsearch 2 / kernel 2 / leadership 1 / linux 2 / mnemonics 1 / nginx 1 / paper 40 / personal 5 / rust 1 / tools 2 /

Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution

WC 209 / RT 1min


Replication Under Scalable Hashing

Typical algorithms for decentralized data distribution work best in a system that is fully built before it first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system.

RUSH variants also support weighting, allowing disks of different vintages to be added to a system.

RUSH variants is optimal or near-optimal reorganization. When new disks are added to the system, or old disks are retired, RUSH variants minimize the number of objects that need to be moved in order to bring the system back into balance.

RUSH variants can perform reorganization online without locking the filesystem for a long time to relocate data.

Algorithm

Subcluster in a system managed by RUSH t must have at least as many disks as an object has replicas.

RUSH t is the best algorithms for distributing data over very large clusters of disks.

RUSH r may be the best option for systems which need to remove disks one at a time from the system.

RUSH p may be the best option for smaller systems where storage space is at a premium.

Notes

RUSH t - RUSH tree

RUSH r - RUSH support for removal

PUSH p - RUSH placement using prime numbers