Building and installing tesseract for python on Ubuntu 14.04.
root@server:/home/user/tesseract# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"
Install packages
sudo apt-get install python-distutils-extra tesseract-ocr tesseract-ocr-eng libopencv-dev libtesseract-dev libleptonica-dev python-all-dev swig libcv-dev python-opencv python-numpy python-setuptools build-essential subversion
sudo apt-get install autoconf automake libtool
sudo apt-get install libpng12-dev libjpeg62-dev libtiff4-dev zlib1g-dev
For tesseract training, also install the following packages:
sudo apt-get install libicu-dev libpango1.0-dev libcairo2-dev
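To confirm that the apt-get commands above really installed everything, you can ask dpkg about each package from Python. This is a sketch, not part of the original procedure. The helper names are hypothetical, and it assumes `dpkg-query` is available (it is on Ubuntu) and that an installed package reports the status "install ok installed".

```python
import subprocess

# A subset of the packages from the apt-get commands above; extend as needed.
REQUIRED = [
    "tesseract-ocr", "tesseract-ocr-eng", "libtesseract-dev", "libleptonica-dev",
    "libopencv-dev", "python-all-dev", "swig", "python-numpy", "build-essential",
    "subversion", "autoconf", "automake", "libtool",
]


def is_installed(package):
    """Hypothetical helper: True if dpkg reports 'install ok installed'."""
    try:
        status = subprocess.check_output(
            ["dpkg-query", "-W", "--showformat=${Status}", package],
            stderr=subprocess.STDOUT,
        ).decode()
    except (OSError, subprocess.CalledProcessError):
        # Unknown package, or dpkg-query is not available on this system.
        return False
    return status.strip() == "install ok installed"


def missing_packages(packages, check=is_installed):
    """Return the packages that are not installed, keeping their order."""
    return [p for p in packages if not check(p)]
```

For example, `print(missing_packages(REQUIRED))` lists whatever still needs `sudo apt-get install`.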
Download leptonica
wget http://www.leptonica.com/source/leptonica-1.71.tar.gz
tar xvf leptonica-1.71.tar.gz
and build it
cd leptonica-1.71
./configure
make
sudo make install
Download tesseract-ocr
wget https://bitbucket.org/3togo/python-tesseract/downloads/tesseract-3.03-rc1.tar.gz
tar xvf tesseract-3.03-rc1.tar.gz
and build it
cd tesseract-3.03
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
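After `sudo ldconfig`, the freshly built Leptonica and tesseract libraries (installed under /usr/local/lib with the default ./configure prefix) should appear in the dynamic linker cache. One way to check this from Python is to parse the output of `ldconfig -p`, as in the sketch below. The helper names are hypothetical, and the parser assumes the usual "name (flags) => path" line format.

```python
import subprocess


def read_ldconfig_cache():
    """Return the text printed by `ldconfig -p` (the dynamic linker cache)."""
    return subprocess.check_output(["ldconfig", "-p"]).decode()


def libraries_in_cache(cache_text, prefixes=("liblept", "libtesseract")):
    """Map each library name prefix to the cached paths whose name starts with it."""
    found = dict((prefix, []) for prefix in prefixes)
    for line in cache_text.splitlines():
        if "=>" not in line:
            continue  # e.g. the "N libs found in cache" header line
        name_part, path = line.split("=>", 1)
        fields = name_part.split()
        if not fields:
            continue
        name = fields[0]  # e.g. "libtesseract.so.3"
        for prefix in prefixes:
            if name.startswith(prefix):
                found[prefix].append(path.strip())
    return found
```

`print(libraries_in_cache(read_ldconfig_cache()))` should show at least one path for both liblept and libtesseract; an empty list means that library has not been picked up by ldconfig.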
Download (checkout) python-tesseract
svn checkout http://python-tesseract.googlecode.com/svn/trunk/src python-tesseract
I used revision 659 (add -r 659 to the svn checkout command to get exactly that revision).
and build it
cd python-tesseract
python setup.py clean
python setup.py build
python setup.py install
After that, try to run your Python example.
If you get an error like this:
Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
AdaptedTemplates != NULL:Error:Assert failed:in file adaptmatch.cpp, line 174
Segmentation fault (core dumped)
You can fix it by patching the "mainblk.cpp" file in the tesseract-3.03/ccutil/ folder as follows.
In "mainblk.cpp", find this code:
    if (argv0 != NULL) {
        datadir = argv0;
    } else {
        if (getenv("TESSDATA_PREFIX")) {
            datadir = getenv("TESSDATA_PREFIX");
        } else {
#ifdef TESSDATA_PREFIX
#define _STR(a) #a
#define _XSTR(a) _STR(a)
            datadir = _XSTR(TESSDATA_PREFIX);
#undef _XSTR
#undef _STR
#endif
        }
    }
    // insert code here
    // datadir may still be empty:
    if (datadir.length() == 0) {
        datadir = "./";
    ...
and at the "// insert code here" line add:
    if (getenv("TESSDATA_PREFIX")) {
        datadir = getenv("TESSDATA_PREFIX");
    } else {
        // check for the system directory with tessdata
        struct stat sb;
        if (stat("/usr/share/tesseract-ocr/tessdata", &sb) == 0 && S_ISDIR(sb.st_mode)) {
            datadir = "/usr/share/tesseract-ocr";
        }
    }
Then add this include at the top of the file:
    #include <sys/stat.h>
Then rebuild and reinstall tesseract-ocr:
cd tesseract-3.03
make
sudo make install
After that, if the TESSDATA_PREFIX environment variable is set, it will be used; otherwise, if a tessdata folder exists in /usr/share/tesseract-ocr/, that one will be used; otherwise, the tessdata folder is looked up relative to the current directory (./), so run your Python example from the directory that contains the tessdata folder.
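To see which tessdata directory the patched code will pick, the lookup order can be modeled in a few lines of Python. This is only a sketch of the C++ logic above, not code that tesseract runs. Here argv0 stands for the datapath argument that reaches the mainblk.cpp code (for example "."), compiled_prefix for the compile-time TESSDATA_PREFIX, and the function names are hypothetical. Membership in env mirrors C's getenv(), which returns non-NULL even for a variable set to an empty string.

```python
import os

SYSTEM_PREFIX = "/usr/share/tesseract-ocr"  # directory checked by the patch


def resolve_datadir(argv0=None, env=None, isdir=os.path.isdir, compiled_prefix=None):
    """Model the patched datadir lookup from mainblk.cpp (sketch only)."""
    if env is None:
        env = os.environ
    # Original code: an explicit datapath wins, then the environment
    # variable, then the compile-time TESSDATA_PREFIX (if any).
    if argv0 is not None:
        datadir = argv0
    elif "TESSDATA_PREFIX" in env:
        datadir = env["TESSDATA_PREFIX"]
    else:
        datadir = compiled_prefix or ""
    # Patched block: TESSDATA_PREFIX, then the system tessdata folder,
    # override whatever was chosen above.
    if "TESSDATA_PREFIX" in env:
        datadir = env["TESSDATA_PREFIX"]
    elif isdir(os.path.join(SYSTEM_PREFIX, "tessdata")):
        datadir = SYSTEM_PREFIX
    # Original fallback: an empty datadir means the current directory.
    if not datadir:
        datadir = "./"
    return datadir


def traineddata_path(datadir, lang="eng"):
    """Language file looked up under datadir, e.g. ./tessdata/eng.traineddata."""
    return os.path.join(datadir, "tessdata", lang + ".traineddata")
```

For example, `print(traineddata_path(resolve_datadir(argv0=".")))` shows which eng.traineddata file would be opened on your machine. The real C++ code does more with datadir afterwards, which this sketch leaves out, and a relative result such as "./" is resolved against the current working directory.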
P.S. By the way, take a look at the repo; it already has prebuilt deb packages: https://bitbucket.org/3togo/python-tesseract/
Update:
If you get the following error message when importing the tesseract module in Python:
Traceback (most recent call last):
File "test_module.py", line 5, in <module>
from tesseract_ocr import TesseractOCR
File "/home/user/ocr/module/tesseract_ocr.py", line 9, in <module>
import tesseract
File "/usr/local/lib/python2.7/dist-packages/python_tesseract-0.9-py2.7-linux-x86_64.egg/tesseract.py", line 28, in <module>
_tesseract = swig_import_helper()
File "/usr/local/lib/python2.7/dist-packages/python_tesseract-0.9-py2.7-linux-x86_64.egg/tesseract.py", line 24, in swig_import_helper
_mod = imp.load_module('_tesseract', fp, pathname, description)
ImportError: /usr/local/lib/python2.7/dist-packages/python_tesseract-0.9-py2.7-linux-x86_64.egg/_tesseract.so: undefined symbol: cvSetData
Check that the OpenCV library is linked into _tesseract.so, because cvSetData is an OpenCV function:
ldd _tesseract.so | grep libopencv
If the output is empty, try to rebuild _tesseract.so with this command:
sudo c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/tesseract_wrap.o build/temp.linux-x86_64-2.7/main.o -lstdc++ -ltesseract -llept -lopencv_superres -lopencv_video -lopencv_videostab -lopencv_ml -lopencv_ocl -lopencv_contrib -lopencv_flann -lopencv_calib3d -lopencv_imgproc -lopencv_core -lopencv_legacy -lopencv_stitching -lopencv_features2d -lopencv_photo -lopencv_ts -lopencv_objdetect -lopencv_highgui -lopencv_gpu -o build/lib.linux-x86_64-2.7/_tesseract.so
NB: python setup.py build must be run before the command above; otherwise you will get these errors:
c++: error: build/temp.linux-x86_64-2.7/tesseract_wrap.o: No such file or directory
c++: error: build/temp.linux-x86_64-2.7/main.o: No such file or directory
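The hard-coded list of -lopencv_* flags in that command fails on machines where some OpenCV modules are not installed. One alternative is to take the OpenCV linker flags from `pkg-config --libs opencv` and build the c++ command in Python. This is an untested sketch: it assumes your OpenCV 2.4 install ships an opencv.pc file, it drops the distutils hardening and compiler flags of the original command for brevity (add them back if you need them), and the helper names are hypothetical.

```python
import subprocess

BUILD_OBJECTS = [
    "build/temp.linux-x86_64-2.7/tesseract_wrap.o",
    "build/temp.linux-x86_64-2.7/main.o",
]
OUTPUT = "build/lib.linux-x86_64-2.7/_tesseract.so"


def opencv_link_flags():
    """Ask pkg-config for the linker flags of the installed OpenCV."""
    out = subprocess.check_output(["pkg-config", "--libs", "opencv"])
    return out.decode().split()


def link_command(objects, opencv_flags, output):
    """Build the c++ argument list that links _tesseract.so.

    As in the original command, the libraries come after the object files
    so the linker can resolve the objects' undefined symbols from them.
    """
    return (["c++", "-pthread", "-shared"] + list(objects)
            + ["-lstdc++", "-ltesseract", "-llept"] + list(opencv_flags)
            + ["-o", output])
```

Run it from the python-tesseract directory after python setup.py build, for example with `subprocess.check_call(link_command(BUILD_OBJECTS, opencv_link_flags(), OUTPUT))`.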
After _tesseract.so has been built successfully, copy it to the python-tesseract directory:
sudo cp ./build/lib.linux-x86_64-2.7/_tesseract.so .
Then check again that the OpenCV library is linked into _tesseract.so:
ldd _tesseract.so | grep libopencv
The output should look something like this:
libopencv_core.so.2.4 => /usr/local/lib/libopencv_core.so.2.4 (0x000070d310313370)
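Both ldd checks can also be scripted. The sketch below (helper names are hypothetical) splits the libopencv entries of ldd output into resolved and missing ones. This also catches the case where grep does print a line but ldd reports that library as "not found".

```python
import subprocess


def ldd_output(path):
    """Return the text printed by `ldd path`."""
    return subprocess.check_output(["ldd", path]).decode()


def opencv_dependencies(ldd_text):
    """Split the libopencv_* entries of ldd output into (resolved, missing) names."""
    resolved, missing = [], []
    for line in ldd_text.splitlines():
        line = line.strip()
        if not line.startswith("libopencv"):
            continue
        name = line.split()[0]  # e.g. "libopencv_core.so.2.4"
        if "not found" in line:
            missing.append(name)
        else:
            resolved.append(name)
    return resolved, missing
```

With `resolved, missing = opencv_dependencies(ldd_output("_tesseract.so"))`, two empty lists mean OpenCV is not linked at all and _tesseract.so needs the rebuild described above, while a non-empty missing list means it is linked but the library cannot be found at run time.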
Now install python-tesseract:
python setup.py install
After that, the "undefined symbol: cvSetData" problem will be solved.
If you get the following error:
>>> import tesseract
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/tesseract.py", line 28, in <module>
_tesseract = swig_import_helper()
File "/usr/lib/python2.7/dist-packages/tesseract.py", line 24, in swig_import_helper
_mod = imp.load_module('_tesseract', fp, pathname, description)
ImportError: /usr/lib/python2.7/dist-packages/_tesseract.x86_64-linux-gnu.so: undefined symbol: pixGenerateFlateData
Try an older python-tesseract svn revision (e.g. 659 or 660):
https://code.google.com/p/python-tesseract/source/detail?r=659
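All of the import errors in this post surface as an ImportError with an "undefined symbol" message, so a small helper can pull out the symbol and point to the matching fix. This is a hypothetical sketch: the prefix-to-library mapping is an assumption based on naming conventions (cv* for OpenCV's C API, pix* for Leptonica, and _ZN9tesseract for mangled C++ names in the tesseract namespace, like the TessTextRenderer error in the comments), not something python-tesseract provides.

```python
import re

# Assumed mapping from symbol prefixes to the library and the fix in this post.
SYMBOL_HINTS = [
    ("cv", "OpenCV: relink _tesseract.so against the OpenCV libraries"),
    ("pix", "Leptonica: try an older python-tesseract svn revision (659 or 660)"),
    ("_ZN9tesseract", "tesseract C++ API: see the Ubuntu 15.10 tutorial"),
]


def undefined_symbol(message):
    """Extract the symbol name from an 'undefined symbol: X' message, or None."""
    match = re.search(r"undefined symbol: (\S+)", message)
    return match.group(1) if match else None


def suggest_fix(message):
    """Return a hint for the library that should define the missing symbol."""
    symbol = undefined_symbol(message)
    if symbol is None:
        return None
    for prefix, hint in SYMBOL_HINTS:
        if symbol.startswith(prefix):
            return hint
    return "unknown library for symbol " + symbol


def try_import(name="tesseract"):
    """Return None if the module imports cleanly, else the ImportError text."""
    try:
        __import__(name)
    except ImportError as exc:
        return str(exc)
    return None
```

For example, `error = try_import()` followed by `print(suggest_fix(error) if error else "import OK")`.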
Update:
I've added a new tutorial, "Installing tesseract for python on Ubuntu 15.10": http://delimitry.blogspot.com/2016/02/installing-tesseract-for-python-on.html
Thanks VERY MUCH for this! Built this recently and still had to apply the fix for undefined symbol...
You're welcome!
Thanks a million. Installation worked perfectly using your guide and later on when I got some nasty errors, this was my life saviour. Explains everything crystal clear. Thank you again!!!!!
I'm glad it helped :)
Hi,
I have Ubuntu 14.04 and am getting the following error while trying to rebuild _tesseract.so as you described. Can you please help me fix this? (I did run 'python setup.py build' before running the command to fix the .so.)
surinder@suriubu:~/leptonica-1.71/tesseract-3.03/python-tesseract$ sudo c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/tesseract_wrap.o build/temp.linux-x86_64-2.7/main.o -lstdc++ -ltesseract -llept -lopencv_superres -lopencv_video -lopencv_videostab -lopencv_ml -lopencv_ocl -lopencv_contrib -lopencv_flann -lopencv_calib3d -lopencv_imgproc -lopencv_core -lopencv_legacy -lopencv_stitching -lopencv_features2d -lopencv_photo -lopencv_ts -lopencv_objdetect -lopencv_highgui -lopencv_gpu -o build/lib.linux-x86_64-2.7/_tesseract.so
/usr/bin/ld: cannot find -lopencv_superres
/usr/bin/ld: cannot find -lopencv_videostab
/usr/bin/ld: cannot find -lopencv_ocl
/usr/bin/ld: cannot find -lopencv_contrib
/usr/bin/ld: cannot find -lopencv_legacy
/usr/bin/ld: cannot find -lopencv_objdetect
/usr/bin/ld: cannot find -lopencv_highgui
collect2: error: ld returned 1 exit status
I do have opencv on my machine as below :
surinder@suriubu:~$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>>
and my /usr/bin/ has an ld file of 1.1Mb as link to program
Please check that all opencv libs are installed:
pkg-config --cflags --libs opencv
Try to use an older python-tesseract svn revision (e.g. 659).
I solved the ld error with:
sudo apt-get install libopencv-superres-dev libopencv-videostab-dev libopencv-ocl-dev libopencv-contrib-dev libopencv-legacy-dev libopencv-objdetect-dev libopencv-highgui-dev
Hi,
Could you help me with this?
import tesseract
File "/usr/local/lib/python2.7/dist-packages/python_tesseract-0.9.1-py2.7-linux-x86_64.egg/tesseract.py", line 28, in <module>
_tesseract = swig_import_helper()
File "/usr/local/lib/python2.7/dist-packages/python_tesseract-0.9.1-py2.7-linux-x86_64.egg/tesseract.py", line 24, in swig_import_helper
_mod = imp.load_module('_tesseract', fp, pathname, description)
ImportError: /usr/local/lib/python2.7/dist-packages/python_tesseract-0.9.1-py2.7-linux-x86_64.egg/_tesseract.so: undefined symbol: _ZN9tesseract16TessTextRendererC1Ev
Also have this issue... Anyone found a solution? Thanks!
Hi, please take a look at my new tutorial "Installing tesseract for python on Ubuntu 15.10" - there I've solved the problem with TessTextRenderer: http://delimitry.blogspot.com/2016/02/installing-tesseract-for-python-on.html
I don't know who you are..
But I will find you and hug you!
Thank you so much sir :)
Hello, when I try to run this line in my code: tesseract.pixThresholdToBinary(pixImage, long(160)), the following error occurs: TypeError: in method 'pixThresholdToBinary', argument 2 of type 'l_int32'. Does anyone know how to fix this problem? Thanks! Manuel
Has anyone found a solution for:
ImportError: /usr/local/lib/python2.7/dist-packages/python_tesseract-0.9.1-py2.7-linux-x86_64.egg/_tesseract.so: undefined symbol: _ZN9tesseract16TessTextRendererC1Ev
I'm using Tesseract 3.04, libleptonica 1.72. Thanks in advance!
Take a look at my new tutorial "Installing tesseract for python on Ubuntu 15.10"; it uses an updated version with TessTextRenderer: http://delimitry.blogspot.com/2016/02/installing-tesseract-for-python-on.html
Thanks for the good tutorial, although I get stuck on the last error message in your guide:
Traceback (most recent call last):
File "test.py", line 2, in <module>
import tesseract
File "/usr/local/lib/python2.7/dist-packages/python_tesseract-0.9.1-py2.7-linux-x86_64.egg/tesseract.py", line 28, in <module>
_tesseract = swig_import_helper()
File "/usr/local/lib/python2.7/dist-packages/python_tesseract-0.9.1-py2.7-linux-x86_64.egg/tesseract.py", line 24, in swig_import_helper
_mod = imp.load_module('_tesseract', fp, pathname, description)
ImportError: /usr/local/lib/python2.7/dist-packages/python_tesseract-0.9.1-py2.7-linux-x86_64.egg/_tesseract.so: undefined symbol: pixGenerateFlateData
I have changed the content of the file "/var/python-tesseract/allheader_mini.h" to the following, which is in revision 659: https://code.google.com/p/python-tesseract/source/browse/trunk/src/allheaders_mini.h?spec=svn659&r=659
I also added the following file to the same path, "/var/python-tesseract/allheader_mini_170.h": https://code.google.com/p/python-tesseract/source/browse/trunk/src/allheaders_mini_170.h?spec=svn659&r=659
However, the error still shows up, even after restarting the server. What am I doing wrong?
BTW, check my new tutorial "Installing tesseract for python on Ubuntu 15.10"; maybe it will help you: http://delimitry.blogspot.com/2016/02/installing-tesseract-for-python-on.html