Tuesday, October 16, 2018

SPARROW theorem and RUM Conjecture

I've found a post on the RocksDB blog about the SPAce, Read Or Write (SPARROW) theorem, which states that:
1. Read Amplification (RA) is inversely related to Write Amplification (WA)
2. Write Amplification (WA) is inversely related to Space Amplification (SA)

The RUM Conjecture paper seems to describe the same principles in more detail.

Tuesday, October 9, 2018

Epoch protection

Epoch protection is a technique to avoid expensive synchronization between threads.
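
A minimal sketch of the bookkeeping behind the idea, assuming the common epoch-based reclamation scheme: a thread records the global epoch when it starts working with shared data, retired objects are tagged with the epoch in which they were unlinked, and an object is freed only when no active thread entered at or before that epoch. The class and method names (EpochManager, enter, exit, retire) are hypothetical, and the lock is used only to keep the Python example short; real implementations use per-thread epoch slots and atomic operations instead.

```python
import threading


class EpochManager:
    """Toy epoch-based reclamation; names and structure are illustrative only."""

    def __init__(self):
        # The lock guards bookkeeping only; it stands in for atomics.
        self._lock = threading.Lock()
        self.global_epoch = 0
        self._active = {}    # thread id -> epoch at which the thread entered
        self._retired = []   # (retire epoch, cleanup callback)

    def enter(self):
        """Mark the calling thread as active in the current epoch."""
        with self._lock:
            self._active[threading.get_ident()] = self.global_epoch

    def exit(self):
        """Mark the calling thread as inactive and try to reclaim."""
        with self._lock:
            self._active.pop(threading.get_ident(), None)
        self.reclaim()

    def retire(self, cleanup):
        """Defer cleanup of an object that was just unlinked, then bump the epoch."""
        with self._lock:
            self._retired.append((self.global_epoch, cleanup))
            self.global_epoch += 1

    def reclaim(self):
        """Run cleanups that no active thread can still observe."""
        with self._lock:
            # Oldest epoch still in use; if nobody is active, everything is safe.
            safe = min(self._active.values(), default=self.global_epoch)
            ready = [c for e, c in self._retired if e < safe]
            self._retired = [(e, c) for e, c in self._retired if e >= safe]
        for cleanup in ready:
            cleanup()
```

A reader calls enter() before touching shared data and exit() afterwards; a writer that unlinks an object calls retire() with a callback instead of freeing the object immediately.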

[1] Keir Fraser: Practical lock-freedom
[2] Faster: A Concurrent Key-Value Store with In-Place Updates

Monday, September 17, 2018

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

I've finally finished reading this awesome book by Martin Kleppmann. It is one of the best technical books I have read: a lot of useful information with great explanations and diagrams.
Special thanks for the references after each chapter and, of course, the beautiful maps between chapters :)
This book sparked my interest in databases even more. Thank you, Martin, for such a good read! I can't stop recommending it to everyone!

Thursday, September 13, 2018

LSM tree, Memtable and SSTable

LSM tree (Log-structured merge-tree) is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM trees, like other search trees, maintain key-value pairs. LSM trees maintain data in two or more separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches [1].
LSM trees are used in data stores such as Bigtable, HBase, LevelDB, RocksDB, ScyllaDB, Apache Cassandra, MongoDB, SQLite4, Tarantool, WiredTiger, InfluxDB, etc.

A memtable is an in-memory data structure that holds data before it is flushed to SSTables.
In LevelDB, RocksDB and Cassandra, the memtable implementation is based on the skip list [2] data structure [3, 4, 5].

An SSTable (Sorted Strings Table) is a persistent, ordered, immutable key-value store [6].
An SSTable contains a sequence of blocks (typically 64 KB in size), with an index block stored at the end of the table.
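
The write and read paths described above can be sketched roughly as follows. This is a simplified in-memory illustration, not how LevelDB, RocksDB or Cassandra implement it: a dict stands in for the skip list, blocks are sized by entry count rather than bytes, the index is kept as a separate attribute instead of a trailing block, and the names LSMStore, SSTable, BLOCK_SIZE and memtable_limit are all hypothetical.

```python
import bisect

BLOCK_SIZE = 2  # entries per block; hypothetical, real blocks are sized in bytes


class SSTable:
    """Immutable sorted run split into blocks, plus an index of first keys."""

    def __init__(self, sorted_items):
        self.blocks = tuple(
            tuple(sorted_items[i:i + BLOCK_SIZE])
            for i in range(0, len(sorted_items), BLOCK_SIZE)
        )
        # The index maps each block to its first key.
        self.index = [block[0][0] for block in self.blocks]

    def get(self, key):
        # Find the last block whose first key is <= key.
        i = bisect.bisect_right(self.index, key) - 1
        if i < 0:
            return None
        block = self.blocks[i]
        keys = [k for k, _ in block]
        j = bisect.bisect_left(keys, key)
        if j < len(keys) and keys[j] == key:
            return block[j][1]
        return None


class LSMStore:
    def __init__(self, memtable_limit=4):
        self.memtable = {}      # stand-in for a skip list
        self.memtable_limit = memtable_limit
        self.sstables = []      # oldest first, newest last

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a new sorted, immutable table."""
        if self.memtable:
            self.sstables.append(SSTable(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then tables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            value = table.get(key)
            if value is not None:
                return value
        return None
```

Writes only touch the memtable, which is why inserts are cheap; reads may have to consult several tables, which is where read amplification comes from.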

[1] Log-structured merge-tree
[2] Skip list
[3] Reviewing LevelDB: Part V, into the MemTables we go
[4] MemTable
[5] Apache Cassandra Github: SASIIndex
[6] Bigtable: A Distributed Storage System for Structured Data

Monday, August 20, 2018

PEPFS - Writing file system in CPython

Writing file system in CPython

At the last PiterPy meetup I gave a talk about writing a simple file system, PEPFS, in CPython using FUSE (Filesystem in Userspace).

The PEPFS project is available on GitHub: https://github.com/delimitry/pepfs
PEPFS is a simple read-only file system where files are CPython PEPs.
The file system is built with the fusepy module.
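
To give an idea of what a read-only file system looks like with fusepy, here is a minimal sketch. It is not the PEPFS code: the FILES dict with two made-up entries, the ReadOnlyFS class and the mount() helper are hypothetical, and the fallback branch exists only so the class can be tried out on a machine without FUSE installed.

```python
import errno
import stat

try:
    from fuse import FUSE, FuseOSError, Operations  # from the fusepy package
except (ImportError, OSError):
    # Stand-ins so the class can be exercised without FUSE installed.
    FUSE = None
    Operations = object

    class FuseOSError(OSError):
        pass


# Hypothetical in-memory contents; PEPFS itself serves real PEP texts.
FILES = {
    "/pep-0008.txt": b"PEP 8 -- Style Guide for Python Code\n",
    "/pep-0020.txt": b"PEP 20 -- The Zen of Python\n",
}


class ReadOnlyFS(Operations):
    def getattr(self, path, fh=None):
        """Return stat-like attributes for the root directory or a file."""
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o555, "st_nlink": 2}
        if path in FILES:
            return {
                "st_mode": stat.S_IFREG | 0o444,
                "st_nlink": 1,
                "st_size": len(FILES[path]),
            }
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        """List the single flat directory."""
        return [".", ".."] + [name[1:] for name in FILES]

    def read(self, path, size, offset, fh):
        """Return the requested byte range of a file."""
        return FILES[path][offset:offset + size]


def mount(mountpoint):
    """Mount the file system read-only and block until it is unmounted."""
    FUSE(ReadOnlyFS(), mountpoint, foreground=True, ro=True)
```

Calling mount() with an existing empty directory should make the two files appear there; only getattr, readdir and read are needed because nothing can be written.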

Slides are available here https://speakerdeck.com/delimitry/writing-file-system-in-cpython and here https://www.slideshare.net/delimitry/writing-file-system-in-cpython

Wednesday, July 11, 2018

CPython logo

At the last Python meetup I gave a talk about the CPython logo: its history, authors and meaning.

Slides are available here https://speakerdeck.com/delimitry/cpython-logo and here https://slideshare.net/delimitry/cpython-logo

Please let me know if you find any mistakes or inaccuracies.

Thursday, May 10, 2018

Seven Concurrency Models in Seven Weeks

I've just finished reading the book Seven Concurrency Models in Seven Weeks by Paul Butcher.

This book is the best overview of concurrent/parallel programming models I've ever seen.

I highly recommend it to everyone interested in distributed systems and concurrent/parallel programming paradigms.