Thursday, September 13, 2018

LSM tree, Memtable and SSTable

LSM tree (Log-structured merge-tree) is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM trees, like other search trees, maintain key-value pairs. LSM trees maintain data in two or more separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches [1].
LSM trees are used in data stores such as Bigtable, HBase, LevelDB, RocksDB, ScyllaDB, Apache Cassandra, MongoDB, SQLite4, Tarantool, WiredTiger, InfluxDB, etc.

Memtable is in-memory data-structure  that holds the data before it flushed to the SSTables.
In LevelDB, RocksDB and Cassandra the implementation of Memtable is based on skiplist [2] data-structure [3, 4, 5].

SSTable (Sorted Strings Table) is a key-value based persistent, ordered immutable storage [6].
The SSTable contains a sequence of blocks (typically 64 KB in size). At the end of the SSTable an index block is stored.

[1] Log-structured merge-tree
[2] Skip list
[3] Reviewing LevelDB: Part V, into the MemTables we go
[4] MemTable
[5] Apache Cassandra Github: SASIIndex
[6] Bigtable: A Distributed Storage System for Structured Data

Monday, August 20, 2018

PEPFS - Writing file system in CPython

Writing file system in CPython

On last PiterPy meetup I've made a talk about writing simple file system PEPFS in CPython using FUSE (File System In Userspace).

The PEPFS project is available on github:
PEPFS is a simple read-only file system where files are CPython PEPs.
To build file system a fusepy module was used.

Slides are available here and here

Wednesday, July 11, 2018

CPython logo

On last Python meetup I've made a talk about CPython logo, its history, authors and meaning.

Slides are available here and here

Please let me know if I you find any mistakes and inaccuracies

And here the embedded presentation:

Thursday, May 10, 2018

Seven Concurrency Models in Seven Weeks

Just finished reading Seven Concurrency Models in Seven Weeks book by Paul Butcher.

This book is the best overview of concurrent/parallel programming models I've ever seen.

Highly recommend for everyone interested in distributed systems and concurrent/parallel programming paradigms.

Tuesday, May 1, 2018


Last week my friend asked me about 0x80 bit that sometimes set in Python marshalled objects data in *.pyc file.
So I decided to understand what kind of flag it is and why it is used.

This is FLAG_REF — a special flag that was added in Python 3.4, and used in conjunction with TYPE_REF type code value. For source code look at:
In the current version (Python 3.7) everything is the same.

TYPE_REF = 'r'
FLAG_REF = 0x80

All currently used type code values are < 0x80 (the highest type code is TYPE_DICT = '{', i.e. 0x7b), so it was decided to store this flag right in the highest bit of type code value.

Now let's see how this flag works.

When marshal_load reads *.pyc file with read_object function, it calls r_object to read marshalled binary data object by object.
To understand the type of an object to read, a single byte with code is read.

flag = code & FLAG_REF to understand if FLAG_REF is set.
type = code & ~FLAG_REF to get only 7 bits (as all type code values are < 0x80).

Also a special macro R_REF is defined. This macro reads unmarshalled object into special refs list with r_ref function only if FLAG_REF is set.
NB: on write (for mashalling) the hashtable, instead of the list, is used.

So when type is equal to TYPE_REF, then only the reference index is read, thus eliminating the need to unmarshall the already unmarshalled object again.

Tuesday, April 24, 2018

Compressed Rich Text Format (RTF) aka LZFu compression and decompression in Python

I uploaded my package for compression and decompression compressed RTF (aka LZFu or MELA) files to PyPI. This package is written in pure Python, and supports Python 2.7 and 3.x versions.

Now it can be easily installed using pip command:
pip install compressed_rtf

The package is based on Rich Text Format (RTF) Compression Algorithm

Sources are available on github:

Thursday, April 19, 2018

RuCTF Finals 2018

This year I've decided to go to Yekaterinburg for the final of RuCTF 2018 to visit conference, watch the game (this time I was a spectator, because only students can play), see friends and and spent good time with a team.

We also visited local zoo.

Conference flew by quickly, thanks to guys who made talks and thanks to guys from HWV. You can find some presentations here:

One more event on RuCTF is battles. Two people from different teams solve the task within 15 minutes. I've solved matrix task with my team mates :)

RuCTF finals are held in Yeltsin Center with a huge display with game monitor.

This year out team SiBears finished in the 8th place (out of 26 teams in total, from 6 countries, 2 guest and 1 school team).

1st place: Bushwhackers from Moscow State University
2nd place: c00kies@venice from Ca'Foscari University
3rd place: saarsec from Saarland University

It was really hard game. More than 10 hours of stress. Out team and some other teams had an accidental power outage during the game :(
By the way the game was delayed due to some problems with vulnerable image's disk.

After the game we made team photos and went to the after-party where guys played "Nalivaika" game, where it was required to answer security-related questions several rounds. Top 10 contestants drank Vodka after each round.

In the evening we gathered on the kitchen of our hotel to eat and discuss the game, team captain even has the strength to make some fixes in our exploit farm :)

Thanks RuCTF organizers, Hackerdom and all participants. 
RuCTF is friendship!