Saturday, November 10, 2012

Generating Sequential IDs

A typical problem we see in distributed systems is how to generate sequential IDs efficiently. Here you can find three approaches:

Percolator (Google)

The timestamp oracle is a server that hands out timestamps in strictly increasing order. Since every transaction requires contacting the timestamp oracle twice, this service must scale well. The oracle periodically allocates a range of timestamps by writing the highest allocated timestamp to stable storage; given an allocated range of timestamps, the oracle can satisfy future requests strictly from memory. If the oracle restarts, the timestamps will jump forward to the maximum allocated timestamp (but will never go backwards). To save RPC overhead (at the cost of increasing transaction latency) each Percolator worker batches timestamp requests across transactions by maintaining only one pending RPC to the oracle. As the oracle becomes more loaded, the batching naturally increases to compensate. Batching increases the scalability of the oracle but does not affect the timestamp guarantees. Our oracle serves around 2 million timestamps per second from a single machine.

Ref: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36726.pdf

Snowflake (Twitter)

To generate the roughly-sorted 64 bit ids in an uncoordinated manner, we settled on a composition of: timestamp, worker number and sequence number. Sequence numbers are per-thread and worker numbers are chosen at startup via zookeeper (though that’s overridable via a config file). We encourage you to peruse and play with the code: you’ll find it on github. Please remember, however, that it is currently alpha-quality software that we aren’t yet running in production and is very likely to change.

Ref: http://engineering.twitter.com/2010/06/announcing-snowflake.html

MySQL Ticket Server (Flickr)

A Flickr ticket server is a dedicated database server, with a single database on it, and in that database there are tables like Tickets32 for 32-bit IDs, and Tickets64 for 64-bit IDs.

The Tickets64 schema looks like:
CREATE TABLE `Tickets64` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM
SELECT * from Tickets64 returns a single row that looks something like:
+-------------------+------+
| id                | stub |
+-------------------+------+
| 72157623227190423 |    a |
+-------------------+------+
When I need a new globally unique 64-bit ID I issue the following SQL:
REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
Ref: http://code.flickr.com/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
Post a Comment