0

Consistent hashing research

Some theories

Consistent hashing

Consistent hashing is a special kind of hashing such that when a hash table is resized and consistent hashing is used, only K/n keys need to be remapped on average, where K is the number of keys, and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped.

Hash function

A hash function is any function that can be used to map digital data of arbitrary size to digital data of fixed size. Hash function have many use: 1.1 Hash tables 1.2 Caches 1.3 Bloom filters 1.4 Finding duplicate records 1.5 Protecting data 1.6 Finding similar records 1.7 Finding similar substrings 1.8 Geometric hashing 1.9 Standard uses of hashing in cryptography

Hash table:

A hash table (hash map) is a data structure used to implement an associative array, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.

Problem and solution

Problem

  • If your DB is too slow then you want to use cache, but one cache server is not enough...so your cache have to use multiple servers (scale horizontally). You will need some ways of map the right server with a particular key. You may think about get a server in N servers by your key % N.
  • It's a good start solution but you will get a problem soon when you change number of servers, most of your cache will become invalid, so consisten hasing will help you to handle these problems: ** with key and list of servers, how do you find the primary, second,... server for this key? ** if you have diffirence size servers, how do you assign works that corresponds to these capacity? ** how do you add capacity to your system without downtime?

Solution

Imagine 64bit-space like this:

And you hash two of hash resources by a good hash function(ex: SHA1) then take 8 bytes.This is how we see:

Finally, about servers, if you have 3 servers, your first server have an IP: IP1, so you name this: IP1-1, your second server have twice as much memories as your first one and have IP2, so you will create two node: IP2-1 and IP2-2, last server have the same memory as server 1, call this by name IP3-1. Now hash all names into 64-bit numbers and stick on the circle:

Now you slove the problem, you can find out which server of resource A by start at resource A and go clockwise until you get a server. If server 2 die, your system sill ok, resource A will shift to server 1 and resource B will shift to server 3. You can manage amount of load to each server by add more node(IP2-1, IP2-2) to system or cut off some nodes.

Conclusion

Consistent hashing is based on mapping each object to a point on the edge of a circle. The system maps each available server to many randomly distributed points on the edge of the same circle. To find where an object should be placed, the system finds the location of that object's key on the edge of the circle; then walks around the circle until falling into the first server it encounters. The result is that each server contains all the resources located between its point and the previous server point. If a server becomes unavailable, then the angles it maps to will be removed. Requests for resources that would have mapped to each of those points now map to the next highest point. Since each server is associated with many randomly distributed points, the resources that were held by that server will now map to many different servers. The items that mapped to the down server must be redistributed among the remaining ones, but values mapping to other servers will still do so and do not need to be moved. A similar process occurs when a server is added. By adding a server point, we make any resources between that and the next smaller angle map to the new bucket. These resources will no longer be associated with the previous server , and any value previously stored there will not be found by the selection method described above. The portion of the keys associated with each server can be altered by altering the number of angles that server maps to.

Refrences

Memcached is a very popular library implement consistent hashing. It scale horizontally very good and be used in many large systems.


All rights reserved

Viblo
Hãy đăng ký một tài khoản Viblo để nhận được nhiều bài viết thú vị hơn.
Đăng kí