Wednesday, October 29, 2014

Installing tesseract for python on Ubuntu 14.04

Building and installing tesseract for python on Ubuntu 14.04.

root@server:/home/user/tesseract# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"

Install packages
sudo apt-get install python-distutils-extra tesseract-ocr tesseract-ocr-eng libopencv-dev libtesseract-dev libleptonica-dev python-all-dev swig libcv-dev python-opencv python-numpy python-setuptools build-essential subversion
sudo apt-get install autoconf automake libtool
sudo apt-get install libpng12-dev libjpeg62-dev libtiff4-dev zlib1g-dev

For Tesseract training, also install the following packages:
sudo apt-get install libicu-dev libpango1.0-dev libcairo2-dev
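
Before moving on to the source builds, it can help to confirm that the build tools are actually on the PATH. Here is a minimal sketch; the list of command names is my assumption about what the packages above provide, and find_on_path and missing_tools are hypothetical helper names, not part of any package.

```python
import os

# Assumed command names provided by the packages above; adjust as needed.
REQUIRED_TOOLS = ["gcc", "make", "autoconf", "automake", "swig", "svn", "tesseract"]


def find_on_path(name, path=None):
    """Return the full path of an executable found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools, path=None):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if find_on_path(tool, path) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All listed tools were found.")
```

If anything is reported as missing, re-run the apt-get commands above before building.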

Download leptonica
wget http://www.leptonica.com/source/leptonica-1.71.tar.gz
tar xvf leptonica-1.71.tar.gz

and build it
cd leptonica-1.71
./configure
make
sudo make install

Download tesseract-ocr
wget https://bitbucket.org/3togo/python-tesseract/downloads/tesseract-3.03-rc1.tar.gz
tar xvf tesseract-3.03-rc1.tar.gz

and build it
cd tesseract-3.03
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

Download (checkout) python-tesseract
svn checkout http://python-tesseract.googlecode.com/svn/trunk/src python-tesseract

I used revision 659; to check out the same one, add -r 659 to the svn checkout command.

and build it
cd python-tesseract
python setup.py clean
python setup.py build
python setup.py install

After that, try running your Python example.
If you get an error like this:
Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
AdaptedTemplates != NULL:Error:Assert failed:in file adaptmatch.cpp, line 174
Segmentation fault (core dumped)
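
To see where the language file actually lives before patching anything, a small check like the one below may help. It is only a sketch: the candidate directories are assumptions taken from this post, and traineddata_path and find_prefix are hypothetical helper names.

```python
import os


def traineddata_path(prefix, lang="eng"):
    """Path where Tesseract looks for a language file under a given prefix."""
    return os.path.join(prefix, "tessdata", lang + ".traineddata")


def find_prefix(prefixes, lang="eng"):
    """Return the first prefix containing tessdata/<lang>.traineddata, or None."""
    for prefix in prefixes:
        if prefix and os.path.isfile(traineddata_path(prefix, lang)):
            return prefix
    return None


if __name__ == "__main__":
    # Assumed candidates: the environment variable, the current directory,
    # and the directory used by the Ubuntu tesseract-ocr package.
    candidates = [os.environ.get("TESSDATA_PREFIX"), ".", "/usr/share/tesseract-ocr"]
    found = find_prefix(candidates)
    if found is None:
        print("eng.traineddata was not found under any candidate prefix")
    else:
        print("Try: export TESSDATA_PREFIX=" + found)
```

If a prefix is found, exporting TESSDATA_PREFIX with that value may be enough, without rebuilding Tesseract.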

You can fix it by patching the mainblk.cpp file in the tesseract-3.03/ccutil/ directory as follows.

Find this code in mainblk.cpp:
  if (argv0 != NULL) {
    datadir = argv0;
  } else {
    if (getenv("TESSDATA_PREFIX")) {
      datadir = getenv("TESSDATA_PREFIX");
    } else {
#ifdef TESSDATA_PREFIX
#define _STR(a) #a
#define _XSTR(a) _STR(a)
    datadir = _XSTR(TESSDATA_PREFIX);
#undef _XSTR
#undef _STR
#endif
    }
  }

  // insert code here

  // datadir may still be empty:
  if (datadir.length() == 0) {
    datadir = "./";

At the place marked "insert code here", add:
  if (getenv("TESSDATA_PREFIX")) {
    datadir = getenv("TESSDATA_PREFIX");
  } else {
    // fall back to the system directory if it contains tessdata
    struct stat sb;
    if (stat("/usr/share/tesseract-ocr/tessdata", &sb) == 0 && S_ISDIR(sb.st_mode)) {
      datadir = "/usr/share/tesseract-ocr";
    }
  }

and add this include at the top of the file:
#include <sys/stat.h>

Then rebuild and reinstall tesseract-ocr:
cd tesseract-3.03
make
sudo make install

After this patch, the lookup works as follows: if the TESSDATA_PREFIX environment variable is set, its value is used; otherwise, if /usr/share/tesseract-ocr/ contains a tessdata folder, that directory is used; otherwise the directory of your Python example (./) is checked for a tessdata folder.
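
The lookup order can be sketched in Python to make it easier to reason about. This only mirrors the patched C++ logic shown above; resolve_datadir is a hypothetical name, and the initial argument stands for whatever datadir held before the inserted code ran.

```python
import os

SYSTEM_PREFIX = "/usr/share/tesseract-ocr"


def resolve_datadir(initial, environ, isdir=os.path.isdir):
    """Mirror the patched lookup order for the parent directory of tessdata."""
    datadir = initial or ""
    if "TESSDATA_PREFIX" in environ:
        # 1. The environment variable wins whenever it is set.
        datadir = environ["TESSDATA_PREFIX"]
    elif isdir(os.path.join(SYSTEM_PREFIX, "tessdata")):
        # 2. Otherwise use the system directory if it has a tessdata folder.
        datadir = SYSTEM_PREFIX
    if not datadir:
        # 3. Still empty: fall back to the current directory.
        datadir = "./"
    return datadir


if __name__ == "__main__":
    print(resolve_datadir("", os.environ))
```

Note that in this sketch a non-empty initial value (for example, a path passed by the caller) is kept when neither of the first two rules applies.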

P.S. By the way, take a look at the repo: it already has prebuilt deb packages: https://bitbucket.org/3togo/python-tesseract/

Update:

If you get the following error when importing the tesseract module in Python:
Traceback (most recent call last):
  File "test_module.py", line 5, in <module>
    from tesseract_ocr import TesseractOCR
  File "/home/user/ocr/module/tesseract_ocr.py", line 9, in <module>
    import tesseract
  File "/usr/local/lib/python2.7/dist-packages/python_tesseract-0.9-py2.7-linux-x86_64.egg/tesseract.py", line 28, in <module>
    _tesseract = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/python_tesseract-0.9-py2.7-linux-x86_64.egg/tesseract.py", line 24, in swig_import_helper
    _mod = imp.load_module('_tesseract', fp, pathname, description)
ImportError: /usr/local/lib/python2.7/dist-packages/python_tesseract-0.9-py2.7-linux-x86_64.egg/_tesseract.so: undefined symbol: cvSetData

Check that the OpenCV library is linked into _tesseract.so, because cvSetData is an OpenCV function:
ldd _tesseract.so | grep libopencv

If the output is empty, try building _tesseract.so manually with this command:

sudo c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/tesseract_wrap.o build/temp.linux-x86_64-2.7/main.o -lstdc++ -ltesseract -llept -lopencv_superres -lopencv_video -lopencv_videostab -lopencv_ml -lopencv_ocl -lopencv_contrib -lopencv_flann -lopencv_calib3d -lopencv_imgproc -lopencv_core -lopencv_legacy -lopencv_stitching -lopencv_features2d -lopencv_photo -lopencv_ts -lopencv_objdetect -lopencv_highgui -lopencv_gpu -o build/lib.linux-x86_64-2.7/_tesseract.so

NB: Run python setup.py build before the command above; otherwise the following errors will be printed:
c++: error: build/temp.linux-x86_64-2.7/tesseract_wrap.o: No such file or directory
c++: error: build/temp.linux-x86_64-2.7/main.o: No such file or directory

After _tesseract.so builds successfully, copy it to the python-tesseract directory:
sudo cp ./build/lib.linux-x86_64-2.7/_tesseract.so .

Then check again that the OpenCV library is linked into _tesseract.so:
ldd _tesseract.so | grep libopencv
The output should look something like this:
libopencv_core.so.2.4 => /usr/local/lib/libopencv_core.so.2.4 (0x000070d310313370)

Now install python-tesseract:
python setup.py install

After that, the "undefined symbol: cvSetData" error should be gone.
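
The ldd check used above can also be scripted, for example to run it automatically after each build. This is a rough sketch: linked_opencv_libs and run_ldd are hypothetical helper names, and the parsing assumes the usual "name => path (address)" layout of ldd output.

```python
import subprocess
import sys


def linked_opencv_libs(ldd_output):
    """Return the names of OpenCV libraries listed in ldd output."""
    libs = []
    for line in ldd_output.splitlines():
        fields = line.split()
        # The first field of each ldd line is the library name.
        if fields and fields[0].startswith("libopencv"):
            libs.append(fields[0])
    return libs


def run_ldd(path):
    """Run ldd on a shared object and return its output as text."""
    return subprocess.check_output(["ldd", path]).decode("utf-8", "replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    libs = linked_opencv_libs(run_ldd(sys.argv[1]))
    if libs:
        print("OpenCV libraries linked: " + ", ".join(libs))
    else:
        print("No OpenCV libraries are linked; expect 'undefined symbol: cvSetData'")
```

Run it with the path to _tesseract.so as the only argument.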

If you get the following error:
>>> import tesseract
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/tesseract.py", line 28, in <module>
    _tesseract = swig_import_helper()
  File "/usr/lib/python2.7/dist-packages/tesseract.py", line 24, in swig_import_helper
    _mod = imp.load_module('_tesseract', fp, pathname, description)
ImportError: /usr/lib/python2.7/dist-packages/_tesseract.x86_64-linux-gnu.so: undefined symbol: pixGenerateFlateData

Try using an older python-tesseract svn revision (e.g. 659 or 660):
https://code.google.com/p/python-tesseract/source/detail?r=659
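
Both import errors in this update have the same shape, so a small helper can pull the symbol name out of the message and hint at which library to check with ldd. This is a sketch only: the prefix table is a heuristic of mine (cv for OpenCV is from this post; pix for Leptonica is an assumption based on Leptonica's naming), and both function names are hypothetical.

```python
import re

_UNDEFINED = re.compile(r"^(?P<lib>.+): undefined symbol: (?P<symbol>\S+)$")

# Heuristic mapping from symbol prefix to library (an assumption, not exhaustive).
PREFIX_HINTS = [("cv", "OpenCV"), ("pix", "Leptonica")]


def parse_undefined_symbol(message):
    """Return (shared_object, symbol) from an ImportError message, or None."""
    match = _UNDEFINED.match(message.strip())
    if match is None:
        return None
    return match.group("lib"), match.group("symbol")


def guess_library(symbol):
    """Guess which library should provide a symbol, based on its prefix."""
    for prefix, library in PREFIX_HINTS:
        if symbol.startswith(prefix):
            return library
    return None


if __name__ == "__main__":
    try:
        import tesseract  # the python-tesseract module built above
    except ImportError as exc:
        parsed = parse_undefined_symbol(str(exc))
        if parsed is None:
            print("Import failed: " + str(exc))
        else:
            print("%s is missing %s (probably from %s)"
                  % (parsed[0], parsed[1], guess_library(parsed[1])))
```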

Update:
I've added a new tutorial: "Installing tesseract for python on Ubuntu 15.10".

Thursday, October 2, 2014

Useful tools for CTF

I've selected useful and must-have tools for CTF games and computer security competitions. Most of these tools are indispensable during the games (especially task-based/jeopardy CTF games).
I've grouped the tools by category, just like in CTF games: Reverse, Steganography, Networking, Forensics, Cryptography, Scripting.
Most of the tools are cross-platform, but some of them are only for Windows or Linux.
Here are the light and dark editions of the cheat sheets/posters with the tools:
Utilities, programs and tools for CTF games
This is the first version of the CTF tools cheat sheets. I'm planning to update them with new useful tools.
Thanks to shr for the good advice to add links to the tools. Here are the links to the tools from the cheat sheets:

Reverse Engineering:
GDB - http://www.gnu.org/software/gdb/download/
IDA Pro - https://www.hex-rays.com/products/ida/support/download.shtml
Immunity Debugger - http://debugger.immunityinc.com/
OllyDbg - http://www.ollydbg.de/
radare2 - http://www.radare.org/y/?p=download
Hopper - http://www.hopperapp.com/download.html
nm - unix/linux tool
objdump - linux tool
strace - linux tool
ILSpy - http://ilspy.net/
JD-GUI - http://jd.benow.ca/#jd-gui-overview
FFDec - http://www.free-decompiler.com/flash/download.html
dex2jar - http://code.google.com/p/dex2jar/
uncompyle2 - https://github.com/wibiti/uncompyle2
Hex editors:
Windows:
HxD - http://mh-nexus.de/en/hxd/
Neo - http://www.new-hex-editor.com/hex-editor-downloads.html
Linux:
Bless - http://home.gna.org/bless/downloads.html
wxHexEditor - http://www.wxhexeditor.org/download.php
Exe unpackers - Unpacking Kit 2012 - http://forum.exetools.com/showthread.php?t=13610

Networking:
Wireshark, tshark - https://www.wireshark.org/download.html
OpenVPN - https://openvpn.net/
OpenSSL - https://www.openssl.org/related/binaries.html
tcpdump - http://www.tcpdump.org/
netcat - http://netcat.sourceforge.net/
nmap - http://nmap.org/download.html

Steganography:
OpenStego - http://www.openstego.info/
OutGuess - http://www.outguess.org/download.php
SilentEye - http://www.silenteye.org/download.html
Steghide - http://steghide.sourceforge.net/download.php
StegFS - http://sourceforge.net/projects/stegfs/
pngcheck - http://www.libpng.org/pub/png/apps/pngcheck.html
GIMP - http://www.gimp.org/downloads/
Audacity - http://audacity.sourceforge.net/download/
MP3Stego - http://www.petitcolas.net/steganography/mp3stego/
ffmpeg (for video analysis) - https://www.ffmpeg.org/download.html

Forensics:
dd - unix/linux tool
strings - unix/linux tool
scalpel - https://github.com/sleuthkit/scalpel
TrID - http://mark0.net/soft-trid-e.html
binwalk - http://binwalk.org/
foremost - http://foremost.sourceforge.net/
ExifTool - http://www.sno.phy.queensu.ca/~phil/exiftool/
Digital Forensics Framework (DFF) - http://www.digital-forensic.org/download/
Computer Aided INvestigative Environment (CAINE) Linux forensics live distribution - http://www.caine-live.net/
The Sleuth Kit (TSK) - http://www.sleuthkit.org/sleuthkit/download.php
Volatility - http://code.google.com/p/volatility/

Scripting / PPC (Professional Programming and Coding):
Text editors:
Sublime Text - http://www.sublimetext.com/
Notepad++ - http://notepad-plus-plus.org/
vim - http://www.vim.org/
emacs - http://www.gnu.org/software/emacs/

Crypto:
Cryptool - https://www.cryptool.org/
hashpump - https://github.com/bwall/HashPump
Sage - http://www.sagemath.org/
John the Ripper - http://www.openwall.com/john/
xortool - https://github.com/hellman/xortool
Online tools:
http://www.crypo.com/
http://www.cryptool-online.org/
http://rumkin.com/tools/cipher/
Python modules: pycrypto - https://www.dlitz.net/software/pycrypto/