Wednesday, September 16, 2015

Python ftplib storbinary hanging

Today I spent several hours finding out why a Python program sometimes hangs.
Using gdb, I found that ftplib's storbinary sometimes hangs at the end of a file transfer.

gdb showed that it hangs at:
... in __libc_recv (fd=?, buf=?, n=8192, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33

I added a callback argument to storbinary to print the size of the last data block (the default blocksize is 8192):
def print_last(data):
    # the final block of a file is normally shorter than blocksize
    if len(data) < 8192:
        print len(data)

ftp.storbinary('STOR %s' % fn, f, callback=print_last)
After that I confirmed that the program hangs at the end of the file transfer.
I also found that an EOFError occurs after creating a certain number of FTP sessions (sockets); in my case, after opening more than 46 (or 47) simultaneous FTP connections.

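The solution was simple: pass a timeout argument when creating the FTP object. Without it the socket blocks indefinitely in recv; with it, a stalled call raises socket.timeout instead of hanging.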
ftp = ftplib.FTP(self.host, self.username, self.password, timeout=10)
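As a rough illustration of the fix, the sketch below wraps an upload so that a stalled server produces a failed result instead of a hang. The function name upload_file and its port parameter are hypothetical, added only for this example; ftplib.all_errors is the tuple of exceptions that ftplib documents for connection problems.

```python
import ftplib


def upload_file(host, username, password, local_path, remote_name,
                port=21, timeout=10):
    """Upload one file; return False instead of hanging if the server stalls.

    Hypothetical helper for illustration, not part of ftplib.
    """
    # The timeout applies to connect() and to later blocking socket calls.
    ftp = ftplib.FTP(timeout=timeout)
    try:
        ftp.connect(host, port)
        ftp.login(username, password)
        with open(local_path, 'rb') as f:
            ftp.storbinary('STOR %s' % remote_name, f)
        ftp.quit()
        return True
    except ftplib.all_errors:
        # Covers socket.timeout, EOFError and FTP protocol errors.
        return False
    finally:
        # Safe to call even if the connection was never established.
        ftp.close()
```

Whether to retry after a False result depends on the server; with many simultaneous connections, as in the case above, backing off before retrying is probably the safer choice.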

Tuesday, September 15, 2015

Images resizer in Go

I've published a small tool that I wrote while learning Go (golang). The source code is available on my GitHub: https://github.com/delimitry/images_resizer
The program includes several ways to resize images, one non-concurrent and two concurrent:
the first uses just goroutines and channels, the second uses a pool of workers to limit the number of simultaneously running goroutines. More information is in the README on GitHub.

Thursday, September 3, 2015

Working with text files in Python 2.7

Reading text files in Python 2.7 looks simple at first sight, but if a file is not opened correctly, working with it can lead to unexpected results.
For example, suppose you have a file that looks ASCII-encoded and you read it line by line, recording each line number and its contents. If the file contains hidden non-ASCII characters, you will run into encoding problems. In Python such a problem shows up as a UnicodeDecodeError:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x?? in position ??: ordinal not in range(128)
Moreover, if line endings are not uniform for a single platform (i.e. '\r\n' or '\r' occurs along with '\n'), the line count can be wrong: the line numbers shown when you open the log file in a text editor will not match those reported by your Python script.
Let's prepare the test logfile:
filename = 'logfile.log'
with open(filename, 'w') as f:
    f.write('line 1' + '\n')
    f.write('line 2' + '\n')
    f.write('line 3' + '\r')
    f.write('line 4' + '\r\n')
    f.write('line 5' + '\n')
    f.write('line 6' + '\n')
    f.write('line 7')
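If you open it in a text editor, you will see the following (the empty line 5 is consistent with the file having been written in text mode on Windows, where each '\n' is stored as '\r\n', so 'line 4\r\n' becomes 'line 4\r\r\n'):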
1 line 1
2 line 2
3 line 3
4 line 4
5
6 line 5
7 line 6
8 line 7
Now suppose we need to find the number of the line that starts with "line 5".
filename = 'logfile.log'
with open(filename, 'r') as f:
    for line_num, line in enumerate(f):
        if line.startswith('line 5'):
            print '%s: %s' % (line_num + 1, line.strip())
the output is:
4: line 5
This reports line number 4, which is wrong: in the text editor, the line number is 6.
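The mismatch can be reproduced without touching the file system by counting lines in the raw bytes directly. The sketch below assumes the Windows text-mode behaviour (every '\n' stored as '\r\n'); the names windows_bytes and line_number are hypothetical.

```python
unix_bytes = b'line 1\nline 2\nline 3\rline 4\r\nline 5\nline 6\nline 7'
# Assumed on-disk content after a text-mode write on Windows:
# every '\n' becomes '\r\n', so 'line 4\r\n' turns into 'line 4\r\r\n'.
windows_bytes = unix_bytes.replace(b'\n', b'\r\n')


def line_number(data, prefix, universal):
    """Return the 1-based number of the first line starting with prefix."""
    if universal:
        # splitlines() treats '\r', '\n' and '\r\n' as line endings
        lines = data.splitlines()
    else:
        # split on '\n' only: a lone '\r' does not end a line
        lines = data.split(b'\n')
    for num, line in enumerate(lines, 1):
        if line.startswith(prefix):
            return num
    return None


newline_only = line_number(windows_bytes, b'line 5', universal=False)  # 4
all_endings = line_number(windows_bytes, b'line 5', universal=True)    # 6
```

On the unmodified unix_bytes the second call would return 5, so the exact numbers depend on the platform that wrote the file.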

Fortunately, Python offers several ways to open files, each with different useful arguments:
1) open(name[, mode[, buffering]])
2) codecs.open(filename, mode[, encoding[, errors[, buffering]]])
3) io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True)
We have already used the first one; let's try the second and the third.
import io
import codecs


filename = 'logfile.log'
with codecs.open(filename, 'r') as f:
    for line_num, line in enumerate(f):
        if line.startswith('line 5'):
            print '%s: %s' % (line_num + 1, line.strip())

with io.open(filename, 'r') as f:
    for line_num, line in enumerate(f):
        if line.startswith('line 5'):
            print '%s: %s' % (line_num + 1, line.strip())

and the output is:
4: line 5
6: line 5
Here codecs.open gave the same result as a simple open, but io.open gave the expected result.
Thanks to its newline argument, io.open can enable universal newlines mode and read lines regardless of the line ending format (Windows, Unix, or Mac OS up to version 9), i.e. lines can end with '\r\n', '\n', or '\r'. This mode is enabled by default (newline=None). Lines are therefore split correctly, and the line numbers match those shown in text editors.
In Python 3.x, io.open is the default interface for accessing files and streams (the built-in open is an alias for it).
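The effect of the newline argument is easiest to see by reading the same bytes with different values. This is a small sketch using a temporary file; the helper read_lines and the sample data are assumptions for the illustration.

```python
import io
import os
import tempfile

data = b'line 1\nline 2\nline 3\rline 4\r\nline 5\n'


def read_lines(raw, newline):
    """Write raw bytes to a temp file and read them back as text lines."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(raw)
        with io.open(path, 'r', encoding='ascii', newline=newline) as f:
            return list(f)
    finally:
        os.remove(path)


# newline=None: universal newlines, every ending translated to '\n'
translated = read_lines(data, None)
# newline='': universal newlines, endings returned unchanged
untranslated = read_lines(data, '')
# newline='\n': only '\n' ends a line, so 'line 3\rline 4\r\n' stays together
lf_only = read_lines(data, '\n')
```

Here translated and untranslated both contain five lines, while lf_only contains four, which is the same undercount the plain open produced above.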

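Both codecs.open and io.open have an encoding argument that specifies the encoding to use for the file. They also have an errors argument, an optional string that specifies how encoding and decoding errors are handled. The difference is in how line endings are handled: when an encoding is given, codecs.open opens the underlying file in binary mode, performs no newline translation, and its reader splits lines on '\r', '\n' and '\r\n' itself. Without an encoding it simply returns the result of the built-in open, which is why it produced the same line number as open above.
After adding these arguments: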
with codecs.open(filename, 'r', encoding='utf-8', errors='replace') as f:
    for line_num, line in enumerate(f):
        if line.startswith('line 5'):
            print '%s: %s' % (line_num + 1, line.strip())

with io.open(filename, 'r', encoding='utf-8', errors='replace') as f:
    for line_num, line in enumerate(f):
        if line.startswith('line 5'):
            print '%s: %s' % (line_num + 1, line.strip())
the output is:
6: line 5
6: line 5
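The errors argument only matters when the file contains bytes that are invalid in the chosen encoding. The following sketch shows the three most common values on such input; the helper read_with_errors and the sample bytes are made up for the illustration.

```python
import io
import os
import tempfile

# 0xff can never appear in valid UTF-8.
bad = b'line \xff5\n'


def read_with_errors(raw, errors):
    """Write raw bytes to a temp file and decode them as UTF-8 text."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(raw)
        with io.open(path, 'r', encoding='utf-8', errors=errors) as f:
            return f.read()
    finally:
        os.remove(path)


replaced = read_with_errors(bad, 'replace')  # invalid byte becomes U+FFFD
ignored = read_with_errors(bad, 'ignore')    # invalid byte is dropped
try:
    read_with_errors(bad, 'strict')          # the default: raises
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Note that 'replace' and 'ignore' both lose information silently, so for log analysis 'replace' is usually the safer of the two because the damage stays visible.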