Wednesday, October 29, 2014

Installing tesseract for python on Ubuntu 14.04

Building and installing tesseract for python on Ubuntu 14.04.

root@server:/home/user/tesseract# cat /etc/lsb-release

Install packages
sudo apt-get install python-distutils-extra tesseract-ocr tesseract-ocr-eng libopencv-dev libtesseract-dev libleptonica-dev python-all-dev swig libcv-dev python-opencv python-numpy python-setuptools build-essential subversion
sudo apt-get install autoconf automake libtool
sudo apt-get install libpng12-dev libjpeg62-dev libtiff4-dev zlib1g-dev

For tesseract training, install the following packages:
sudo apt-get install libicu-dev libpango1.0-dev libcairo2-dev

Download leptonica
tar xvf leptonica-1.71.tar.gz

and build it
cd leptonica-1.71
make install

Download tesseract-ocr
tar xvf tesseract-3.03-rc1.tar.gz

and build it
cd tesseract-3.03
sudo make install
sudo ldconfig

Download (checkout) python-tesseract
svn checkout python-tesseract

I've used revision 659.

and build it
cd python-tesseract
python clean
python build
python install

After that, try running your Python example.
If you get an error like this:
Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
AdaptedTemplates != NULL:Error:Assert failed:in file adaptmatch.cpp, line 174
Segmentation fault (core dumped)
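The message means Tesseract looked for tessdata/eng.traineddata under its data directory and didn't find it. TESSDATA_PREFIX must point to the *parent* of the tessdata folder. Here's a small Python sketch that checks which candidate directory would actually work. The function name and the default candidates (the env variable, ./ and /usr/share/tesseract-ocr) are my own assumptions, so adjust them for your system:

```python
import os


def find_tessdata_prefix(lang="eng", candidates=None):
    """Return the first directory whose tessdata/ subfolder holds <lang>.traineddata.

    The returned path is what TESSDATA_PREFIX should be set to (the parent
    of tessdata/). Returns None if no candidate matches.
    """
    if candidates is None:
        # Assumed search order: env variable (if set), current dir, Ubuntu package dir.
        candidates = [os.environ.get("TESSDATA_PREFIX"), "./", "/usr/share/tesseract-ocr"]
    for prefix in candidates:
        if not prefix:
            continue  # skip an unset TESSDATA_PREFIX
        data_file = os.path.join(prefix, "tessdata", lang + ".traineddata")
        if os.path.isfile(data_file):
            return prefix
    return None


if __name__ == "__main__":
    prefix = find_tessdata_prefix("eng")
    if prefix is None:
        print("eng.traineddata not found; check tesseract-ocr-eng or TESSDATA_PREFIX")
    else:
        print("TESSDATA_PREFIX could be set to %s" % prefix)
```

This only tells you where the data files are. It does not change where Tesseract itself looks.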

You can fix this error by patching the "mainblk.cpp" file in the tesseract-3.03/ccutil/ folder as follows.

In "mainblk.cpp", find this code:
  if (argv0 != NULL) {
    datadir = argv0;
  } else {
    if (getenv("TESSDATA_PREFIX")) {
      datadir = getenv("TESSDATA_PREFIX");
    } else {
#define _STR(a) #a
#define _XSTR(a) _STR(a)
    datadir = _XSTR(TESSDATA_PREFIX);
#undef _XSTR
#undef _STR
    }
  }

  // insert code here

  // datadir may still be empty:
  if (datadir.length() == 0) {
    datadir = "./";
  ...

and insert the following code at the "// insert code here" line:
  if (getenv("TESSDATA_PREFIX")) {
    datadir = getenv("TESSDATA_PREFIX");
  } else {
    // check for a directory with tessdata
    struct stat sb;
    if (stat("/usr/share/tesseract-ocr/tessdata", &sb) == 0 && S_ISDIR(sb.st_mode)) {
      datadir = "/usr/share/tesseract-ocr";
    }
  }

Also add this include at the top of the file (stat() and S_ISDIR come from it):
#include <sys/stat.h>

Then rebuild and reinstall tesseract-ocr:
cd tesseract-3.03
sudo make install

After that, the data directory is chosen in this order: if the TESSDATA_PREFIX environment variable is set, it is used; otherwise, if /usr/share/tesseract-ocr/ contains a tessdata folder with the data files, that directory is used; otherwise the current working directory (./, e.g. the folder you run your Python example from) is checked for a tessdata folder.
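To make that lookup order explicit, here's a Python model of the patched logic. It's only an illustration (Tesseract does this in C++), and resolve_datadir, SYSTEM_PREFIX and built_in_prefix are hypothetical names. built_in_prefix stands in for the compile-time TESSDATA_PREFIX macro:

```python
import os

# Directory checked by the patch above.
SYSTEM_PREFIX = "/usr/share/tesseract-ocr"


def resolve_datadir(argv0, environ=None, isdir=os.path.isdir, built_in_prefix=""):
    """Python model of the patched data-directory lookup (a sketch, not Tesseract code)."""
    if environ is None:
        environ = os.environ
    # Original code: an explicit argv0 wins, then the env variable, then the macro.
    if argv0 is not None:
        datadir = argv0
    elif "TESSDATA_PREFIX" in environ:
        datadir = environ["TESSDATA_PREFIX"]
    else:
        datadir = built_in_prefix
    # Inserted code: the env variable overrides argv0; otherwise prefer
    # the system tessdata folder when it exists.
    if "TESSDATA_PREFIX" in environ:
        datadir = environ["TESSDATA_PREFIX"]
    elif isdir(os.path.join(SYSTEM_PREFIX, "tessdata")):
        datadir = SYSTEM_PREFIX
    # Original code: datadir may still be empty, so fall back to "./".
    if len(datadir) == 0:
        datadir = "./"
    return datadir
```

For example, resolve_datadir(".", {}, isdir=lambda p: True) gives "/usr/share/tesseract-ocr", while resolve_datadir(".", {}, isdir=lambda p: False) keeps ".". Passing isdir in as a parameter lets you try the branches without touching the real filesystem.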

P.S. By the way, take a look at the repo: there are already built deb packages:


If you get the following error message when importing the tesseract module in Python:
Traceback (most recent call last):
  File "", line 5, in <module>
    from tesseract_ocr import TesseractOCR
  File "/home/user/ocr/module/", line 9, in <module>
    import tesseract
  File "/usr/local/lib/python2.7/dist-packages/python_tesseract-0.9-py2.7-linux-x86_64.egg/", line 28, in <module>
    _tesseract = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/python_tesseract-0.9-py2.7-linux-x86_64.egg/", line 24, in swig_import_helper
    _mod = imp.load_module('_tesseract', fp, pathname, description)
ImportError: /usr/local/lib/python2.7/dist-packages/python_tesseract-0.9-py2.7-linux-x86_64.egg/ undefined symbol: cvSetData

Check that the OpenCV library is linked into the _tesseract module, because cvSetData is an OpenCV function:
ldd | grep libopencv

If the output is empty, try linking the module manually with this command:

sudo c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/tesseract_wrap.o build/temp.linux-x86_64-2.7/main.o -lstdc++ -ltesseract -llept -lopencv_superres -lopencv_video -lopencv_videostab -lopencv_ml -lopencv_ocl -lopencv_contrib -lopencv_flann -lopencv_calib3d -lopencv_imgproc -lopencv_core -lopencv_legacy -lopencv_stitching -lopencv_features2d -lopencv_photo -lopencv_ts -lopencv_objdetect -lopencv_highgui -lopencv_gpu -o build/lib.linux-x86_64-2.7/

NB: The python build command must be run before the command above; otherwise you will get these errors:
c++: error: build/temp.linux-x86_64-2.7/tesseract_wrap.o: No such file or directory
c++: error: build/temp.linux-x86_64-2.7/main.o: No such file or directory

After a successful build, copy the built library to the python-tesseract directory:
sudo cp ./build/lib.linux-x86_64-2.7/ .

Then check again that the OpenCV library is linked into the _tesseract module:
ldd | grep libopencv
The output should look something like: => /usr/local/lib/ (0x000070d310313370)
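If you have to repeat this check for several builds, a small Python sketch can run ldd and keep only the OpenCV entries, just like piping ldd through grep libopencv. The function names are hypothetical. Pass check_linked the path to your compiled _tesseract extension:

```python
import subprocess


def opencv_deps(ldd_output):
    """Return the library names of all libopencv entries in ldd output text."""
    deps = []
    for line in ldd_output.splitlines():
        line = line.strip()
        if "libopencv" in line:  # same filter as "grep libopencv"
            # ldd lines look like: "libname => /path/to/libname (0x...)"
            deps.append(line.split()[0])
    return deps


def check_linked(shared_object):
    """Run ldd on a shared object and return the OpenCV libraries it links."""
    output = subprocess.check_output(["ldd", shared_object])
    if not isinstance(output, str):  # bytes on Python 3, str on Python 2
        output = output.decode("utf-8", "replace")
    return opencv_deps(output)
```

An empty list from check_linked means the same thing as empty grep output: OpenCV isn't linked, and the manual link command above is needed.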

Now install python-tesseract:
python install

After that, the "undefined symbol: cvSetData" problem should be solved.

If you get the following error:
>>> import tesseract
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/", line 28, in <module>
    _tesseract = swig_import_helper()
  File "/usr/lib/python2.7/dist-packages/", line 24, in swig_import_helper
    _mod = imp.load_module('_tesseract', fp, pathname, description)
ImportError: /usr/lib/python2.7/dist-packages/ undefined symbol: pixGenerateFlateData

Try an older python-tesseract svn revision (e.g. 659 or 660).
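Both import errors above have the form "undefined symbol: NAME". As a rough aid, the sketch below pulls the symbol out of the ImportError message and maps its prefix to a likely library. Only the cvSetData to OpenCV link comes from this post. Treating the "pix" prefix as Leptonica is my assumption, based on Leptonica's function naming. SYMBOL_HINTS, undefined_symbol and diagnose_import are hypothetical names:

```python
import re

# Prefix -> hint. Only "cv" (OpenCV) is stated in the post;
# "pix" -> Leptonica is an assumption based on naming.
SYMBOL_HINTS = [
    ("cv", "OpenCV: check that _tesseract is linked against libopencv"),
    ("pix", "Leptonica (assumed): try an older python-tesseract svn revision"),
]


def undefined_symbol(error_message):
    """Extract NAME from an 'undefined symbol: NAME' ImportError message."""
    match = re.search(r"undefined symbol: (\S+)", error_message)
    return match.group(1) if match else None


def diagnose_import():
    """Try importing tesseract and print a hint for undefined-symbol errors."""
    try:
        import tesseract  # python-tesseract module
    except ImportError as exc:
        symbol = undefined_symbol(str(exc))
        if symbol is None:
            print("Import failed: %s" % exc)
            return
        for prefix, hint in SYMBOL_HINTS:
            if symbol.startswith(prefix):
                print("%s -> %s" % (symbol, hint))
                return
        print("%s: no hint for this symbol" % symbol)
    else:
        print("tesseract imported successfully")


if __name__ == "__main__":
    diagnose_import()
```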

I've added a new tutorial: Installing tesseract for python on Ubuntu 15.10.

Thursday, October 2, 2014

Useful tools for CTF

I've selected useful and must-have tools for CTF games and computer security competitions. Most of these tools are often indispensable during the games (especially task-based/jeopardy CTF games).
I've grouped the tools by category, just like in CTF games: Reverse, Steganography, Networking, Forensics, Cryptography, Scripting.
Most of the tools are cross-platform, but some of them are Windows-only or Linux-only.
Here are the light and dark editions of the cheat sheets/posters with the tools:
Utilities, programs and tools for CTF games
This is the first version of the CTF tools cheat sheets. I'm planning to update them with new useful tools.
Thanks to shr for the good advice to add links to the tools. Here are the links to the tools from the cheat sheets:

Reverse Engineering:
IDA Pro -
Immunity Debugger -
OllyDbg -
radare2 -
Hopper -
nm - unix/linux tool
objdump - linux tool
strace - linux tool
ILSpy -
FFDec -
dex2jar -
uncompyle2 -
Hex editors:
HxD -
Neo -
Bless -
wxHexEditor -
Exe unpackers - Unpacking Kit 2012 -

Networking:
Wireshark, tshark -
OpenVPN -
OpenSSL -
tcpdump -
netcat -
nmap -

Steganography:
OpenStego -
OutGuess -
SilentEye -
Steghide -
StegFS -
pngcheck -
Audacity -
MP3Stego -
ffmpeg (for video analysis) -

Forensics:
dd - unix/linux tool
strings - unix/linux tool
scalpel -
TrID -
binwalk -
foremost -
ExifTool -
Digital Forensics Framework (DFF) -
Computer Aided INvestigative Environment (CAINE) Linux forensics live distribution -
The Sleuth Kit (TSK) -
Volatility -

Scripting / PPC (Professional Programming and Coding):
Text editors:
Sublime Text -
Notepad++ -
vim -
emacs -

Cryptography:
Cryptool -
hashpump -
Sage -
John the Ripper -
xortool -
Online tools:
Modules for python - pycrypto -