Thursday, September 13, 2018

LSM tree, Memtable and SSTable

LSM tree (Log-structured merge-tree) is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM trees, like other search trees, maintain key-value pairs. LSM trees maintain data in two or more separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches [1].
LSM trees are used in data stores such as Bigtable, HBase, LevelDB, RocksDB, ScyllaDB, Apache Cassandra, MongoDB, SQLite4, Tarantool, WiredTiger, InfluxDB, etc.

Memtable is in-memory data-structure  that holds the data before it flushed to the SSTables.
In LevelDB, RocksDB and Cassandra the implementation of Memtable is based on skiplist [2] data-structure [3, 4, 5].

SSTable (Sorted Strings Table) is a key-value based persistent, ordered immutable storage [6].
The SSTable contains a sequence of blocks (typically 64 KB in size). At the end of the SSTable an index block is stored.

References:
[1] Log-structured merge-tree
[2] Skip list
[3] Reviewing LevelDB: Part V, into the MemTables we go
[4] MemTable
[5] Apache Cassandra Github: SASIIndex
[6] Bigtable: A Distributed Storage System for Structured Data

No comments:

Post a Comment