Thursday, September 13, 2018

LSM tree, Memtable and SSTable

LSM tree (Log-structured merge-tree) is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM trees, like other search trees, maintain key-value pairs. LSM trees maintain data in two or more separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches [1].
LSM trees are used in data stores such as Bigtable, HBase, LevelDB, RocksDB, ScyllaDB, Apache Cassandra, MongoDB, SQLite4, Tarantool, WiredTiger, InfluxDB, etc.

Memtable is in-memory data-structure  that holds the data before it flushed to the SSTables.
In LevelDB, RocksDB and Cassandra the implementation of Memtable is based on skiplist [2] data-structure [3, 4, 5].

SSTable (Sorted Strings Table) is a key-value based persistent, ordered immutable storage [6].
The SSTable contains a sequence of blocks (typically 64 KB in size). At the end of the SSTable an index block is stored.

[1] Log-structured merge-tree
[2] Skip list
[3] Reviewing LevelDB: Part V, into the MemTables we go
[4] MemTable
[5] Apache Cassandra Github: SASIIndex
[6] Bigtable: A Distributed Storage System for Structured Data

Monday, August 20, 2018

PEPFS - Writing file system in CPython

Writing file system in CPython

On last PiterPy meetup I've made a talk about writing simple file system PEPFS in CPython using FUSE (File System In Userspace).

The PEPFS project is available on github:
PEPFS is a simple read-only file system where files are CPython PEPs.
To build file system a fusepy module was used.

Slides are available here and here

Wednesday, July 11, 2018

CPython logo

On last Python meetup I've made a talk about CPython logo, its history, authors and meaning.

Slides are available here and here

Please let me know if I you find any mistakes and inaccuracies

And here the embedded presentation:

Thursday, May 10, 2018

Seven Concurrency Models in Seven Weeks

Just finished reading Seven Concurrency Models in Seven Weeks book by Paul Butcher.

This book is the best overview of concurrent/parallel programming models I've ever seen.

Highly recommend for everyone interested in distributed systems and concurrent/parallel programming paradigms.

Tuesday, May 1, 2018


Last week my friend asked me about 0x80 bit that sometimes set in Python marshalled objects data in *.pyc file.
So I decided to understand what kind of flag it is and why it is used.

This is FLAG_REF — a special flag that was added in Python 3.4, and used in conjunction with TYPE_REF type code value. For source code look at:
In the current version (Python 3.7) everything is the same.

TYPE_REF = 'r'
FLAG_REF = 0x80

All currently used type code values are < 0x80 (the highest type code is TYPE_DICT = '{', i.e. 0x7b), so it was decided to store this flag right in the highest bit of type code value.

Now let's see how this flag works.

When marshal_load reads *.pyc file with read_object function, it calls r_object to read marshalled binary data object by object.
To understand the type of an object to read, a single byte with code is read.

flag = code & FLAG_REF to understand if FLAG_REF is set.
type = code & ~FLAG_REF to get only 7 bits (as all type code values are < 0x80).

Also a special macro R_REF is defined. This macro reads unmarshalled object into special refs list with r_ref function only if FLAG_REF is set.
NB: on write (for mashalling) the hashtable, instead of the list, is used.

So when type is equal to TYPE_REF, then only the reference index is read, thus eliminating the need to unmarshall the already unmarshalled object again.

Tuesday, April 24, 2018

Compressed Rich Text Format (RTF) aka LZFu compression and decompression in Python

I uploaded my package for compression and decompression compressed RTF (aka LZFu or MELA) files to PyPI. This package is written in pure Python, and supports Python 2.7 and 3.x versions.

Now it can be easily installed using pip command:
pip install compressed_rtf

The package is based on Rich Text Format (RTF) Compression Algorithm

Sources are available on github:

Thursday, April 19, 2018

RuCTF Finals 2018

This year I've decided to go to Yekaterinburg for the final of RuCTF 2018 to visit conference, watch the game (this time I was a spectator, because only students can play), see friends and and spent good time with a team.

We also visited local zoo.

Conference flew by quickly, thanks to guys who made talks and thanks to guys from HWV. You can find some presentations here:

One more event on RuCTF is battles. Two people from different teams solve the task within 15 minutes. I've solved matrix task with my team mates :)

RuCTF finals are held in Yeltsin Center with a huge display with game monitor.

This year out team SiBears finished in the 8th place (out of 26 teams in total, from 6 countries, 2 guest and 1 school team).

1st place: Bushwhackers from Moscow State University
2nd place: c00kies@venice from Ca'Foscari University
3rd place: saarsec from Saarland University

It was really hard game. More than 10 hours of stress. Out team and some other teams had an accidental power outage during the game :(
By the way the game was delayed due to some problems with vulnerable image's disk.

After the game we made team photos and went to the after-party where guys played "Nalivaika" game, where it was required to answer security-related questions several rounds. Top 10 contestants drank Vodka after each round.

In the evening we gathered on the kitchen of our hotel to eat and discuss the game, team captain even has the strength to make some fixes in our exploit farm :)

Thanks RuCTF organizers, Hackerdom and all participants. 
RuCTF is friendship!

Monday, March 19, 2018

Twin paradox and distributed systems

In the future we definetely will travel through the Universe. And the speed of the starships will be close to the speed of light, or even probably beyond it (as you can see in some movies). Therefore according to the twin paradox all the clocks will have different time. Intergalactic distributed computer network will be really difficult, and eventually (I hope so) constistent.

So, I have some questions:
How will the clocks in computers on the starships, stations, and planets be syncronized?
There are special data structure called CRDT (conflict-free replicated data type) but is also uses timestamps.
Probably we need to find some source/invariant of global time (but how can we break the laws of physics (may be quantum physics can help here)), or find a way to make amendments base on straship acceleration/speed/traveled distance if it is possible.

Saturday, March 17, 2018

iCTF 2018

Today the historical event happened. The team SiBears I'm playing with won first place in iCTF 2018 CTF contest.

First time I've played iCTF in 2011. And it was hard but really interesting. And I'm very proud that out team has finally won, and finished at first place, because this is one of the first (if not the first) classic CTF competitions. Thanks to the UCSB iCTF team for such great event.

Friday, March 16, 2018

Surely You're Joking, Mr. Feynman!

Recently I've finished reading the book "Surely You're Joking, Mr. Feynman!". This book was recommended in the book "A Mind For Numbers: How to Excel at Math and Science (Even If You Flunked Algebra)".
really enjoyed reading it. The great and funny stories from the life of the great man.
Recommend this book to everyone, it is really good.

Sunday, February 25, 2018

A Mind For Numbers by Barbara Oakley

I've finished a book "A Mind For Numbers: How to Excel at Math and Science (Even If You Flunked Algebra)"
Good book with good advices and techniques of how to effectively learn, some of them I have used many times :)
I wish I have read this book earlier. Highly recommend this book to students and pupils.

Wednesday, February 7, 2018

Buzzword poem generator algorithm

Conditions of the problem
Given: a list of buzzwords.
Problem: generate poem from these buzzwords.

We also need to know:

1) Result poem lines number.
2) Rhyme scheme.
3) Syllables in each line.

4) Minimum number of words in each line.
5) Somehow know the number of syllables in each buzzword.
6) Somehow get the list of rhyming buzzwords.

1) Knowing the total number of syllables in each line  generate all combinations of sums (compositions) from all available number of syllables.
E.g. we have buzzwords with the number of syllables 1, 2 and 3, and the total number of syllables in line (i.e. sum of syllables) equal 3. Compositions of 3 are:
+ 1 + 1
+ 2
+ 1
If we assume that we don't have buzzwords with number of syllables 2 then only two possible combinations can be: 1 + 1 + 1 and 3.
NB: Each positive integer n has 2 ** (n - 1) distinct compositions.

2) Knowing the number of syllables in each line, generate the base (I call it so) for future poem. The base is generated from random compositions for each number of syllables.
E.g. for two line poem with 3 and 4 syllables, the possible base can be the next:
[2, 1]
[1, 2, 1]
[2, 1]
[2, 2] 
The poem base can be also limited to the minimum number of words per line. 
For the example above with the minimum number of words equal 3, possible base can be the next:
[1, 1, 1]
[1, 1, 2]
or (notice that the first line has only one possible option):
[1, 1, 1]
[1, 1, 1, 1]

3) Knowing the rhyme scheme it is possible to get the last number of syllables for each line from poem base and group these numbers by rhyme scheme letters, thereby simplify the search of rhyme buzzwords.
The last number of syllables for each line are taken because only the last words form a rhyme.
For example the rhyme scheme is ABA, and the poem base is:
[1, 2, 1]
[1, 1, 2]
[3, 2]
Last numbers of syllables for each line are 1, 2 and 2.

1 A
2 B
2 A
Group them by rhyme scheme letters:
A: [1, 2], B: [2]

It is required to find all buzzwords that are rhyme, and have 1 and 2 syllables, then randomly select one
 from the list of found variants for A letter.
And find only one random buzzword with 2 syllables for B letter.
It is important that after using a buzzword, it must be removed from the list of buzzwords to avoid duplication.
If the rhymes was not found it is required to indicate that, for example by returning empty result.

4) To this step we already have poem base, and rhyme scheme letters mapped to the found buzzwords. In case of empty mapping of rhyme scheme letters to buzzwords — go to step 2 and try again.
If the mapping of rhyme scheme letters to buzzwords isn't empty, start filling the poem base with buzzwords according to the number of syllables. The filling of each poem base line must be performed until the penultimate number of syllables, because the last ones were already found in previous step and saved in the mapping of rhyme scheme letters to buzzwords.
As in the previous step, it is important that after using a buzzword, it must be removed from the list of buzzwords to avoid duplication.
If the filling of the poem was not succeed — go to step 2 and try again.
If the number of attempts is too big, stop trying and display the error message.


Support the poem metrical feet, to generate more smooth poems.

This algorithm was used in my project:

Tuesday, February 6, 2018

Buzzword poem generator

Last week I've created a new project on my github — a tool for the generation of poems from the buzzwords. I called it Buzzword poem generator. As usual, the tool is written in pure Python and supports 2.7 and, of course, Python 3.x.

The project was inspired by recent tweet from Lena Hall (@lenadroid on twitter):

My tool generates poems based on the list of buzzwords. It is possible to pass the parameters of the rhyme scheme and number of syllables in each line. For example for the rhyme scheme ABAB, and 7 syllables per each line, the next poem was generated:

$ python -r ABAB -s 7 7 7 7

Memcached Vault Impala Spark
Redshift Chef Celery Swarm
Haskell Rust Zookeeper Splunk
Python Flink Kinesis Storm

Use it with pleasure :)

CPython github repository bots

I've watched a Monty Python and the Holy Grail movie recently. Despite the fact that it was filmed in 1975, the quality of the video and the jokes are great. Recommend to everyone this British classic comedy.

Fun facts: the bots in CPython repository on github, are called after characters from this movie.
For example the next bots:

Have fun!

Saturday, January 27, 2018

Harry Potter and the Methods of Rationality

I've finally finished reading the book Harry Potter and the Methods of Rationality (HPMOR).
What can I say: it is amazing book. One of the best I've ever read.
10/10 Excellent work. I will definitely recommend it to everyone.

I wish the author to live long, and prosper - EXPECTO PATRONUM! :)

Thursday, January 25, 2018

Contribute to CPython

Today I tried to contribute to CPython. My pull request was small and changes minor, I found a mistake in HAMT algorithm description in the comments.

After sending my pull request and my username was checked by the bot "the-knights-who-say-ni", and the bot said the next message with "CLA not signed" badge.

I was really surprised. Despite this my fix was accepted and merged.
But here are the steps needed in order to sign the CLA (from
  1. If you don’t have an account on (aka b.p.o), please register to create one.
  2. Make sure your GitHub username is listed in the “Your Details” section at b.p.o.
  3. Fill out and sign the PSF contributor form. The “ username” requested by the form is the “Login name” field under “Your Details”.
After at least one US business day "Contributor Form Received Yes on " and "Is Committer <...>" status will appear in your profile details on b.p.o site.

Since this time your pull requests on github will be with "CLA signed" badge.
It is also required to provide a link like "" to the issue in PR description and specify a bpo- in PR title.