Tesseract 3.0.2

Program Specifications

Version: 3.0.2
Size: 12.90 MB
Publisher: Ray Smith
Date Added: Nov 20, 2013
License: Open Source
Operating System: iOS, Windows XP, Windows Vista, Windows 2008, Windows 7, Unix, Mac OS X, Linux, Android

Publisher's Description of Tesseract

"Most accurate open source OCR engine available. It can read a wide variety of image formats and convert them to text in over 60 languages."
- From Ray Smith

Tesseract is probably the most accurate open source OCR engine available. Combined with the Leptonica Image Processing Library, it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 little work was done on it, but since then it has been improved extensively by Google.

It is released under the Apache License 2.0. It can be used directly or, for programmers, through an API.

Tesseract works on Linux, Windows (with VC++ Express or Cygwin) and Mac OS X. It can also be compiled for other platforms, including Android and the iPhone, though these platforms are not as well tested.

There are two parts to install: the engine itself and the training data for each language you want to recognize.

Linux
Tesseract is available directly from many Linux distributions. The package is generally called 'tesseract' or 'tesseract-ocr'; search your distribution's repositories to find it. Packages are also generally available for language training data (again, search the repositories). If not, you will need to download the appropriate training data, unpack it, and copy the .traineddata file into the 'tessdata' directory, probably /usr/share/tesseract-ocr/tessdata or /usr/share/tessdata.
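
As a rough illustration of the manual copy step, the Python sketch below places a downloaded .traineddata file into the first 'tessdata' directory that exists. The helper name install_traineddata and the candidate list are assumptions made for this example (the two paths are simply the ones mentioned above), and writing to these directories normally requires root privileges.

```python
import shutil
from pathlib import Path

# Candidate locations taken from the text above; your distribution may differ.
CANDIDATE_TESSDATA_DIRS = (
    Path("/usr/share/tesseract-ocr/tessdata"),
    Path("/usr/share/tessdata"),
)


def install_traineddata(traineddata_file, candidates=CANDIDATE_TESSDATA_DIRS):
    """Copy a .traineddata file into the first existing tessdata directory.

    Hypothetical helper, not part of Tesseract. Returns the destination path.
    """
    source = Path(traineddata_file)
    if source.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {source}")
    for directory in candidates:
        directory = Path(directory)
        if directory.is_dir():
            destination = directory / source.name
            shutil.copy(source, destination)
            return destination
    raise FileNotFoundError("no tessdata directory found among the candidates")
```

For example, install_traineddata("deu.traineddata") would copy the German data into whichever of the two directories is present, or raise an error if neither exists.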

If Tesseract isn't available for your distribution, or you want to use a newer version than is available, you can compile your own. Note that older versions of Tesseract only supported processing .tiff files.

Mac OS X
The easiest way to install Tesseract is through Homebrew. Once Homebrew is installed, you can install Tesseract by running the command: brew install tesseract

If you want to use language training data not included with the Homebrew package, download the appropriate training data, open it with Finder, and copy the .traineddata file into the /usr/local/Cellar/tesseract//share/tessdata directory.

Windows
An installer is available for Windows from our download page. This includes the English training data.

If you want to use another language, download the appropriate training data, unpack it using 7-Zip, and copy the .traineddata file into the 'tessdata' directory, probably C:\Program Files\Tesseract OCR\tessdata.

Running Tesseract
Tesseract doesn't have a built-in GUI, but there are several available from the 3rdParty page.

Tesseract is a command-line program, so first open a terminal or command prompt. The command is used like this:

tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

So basic usage to do OCR on an image called 'myscan.png' and save the result to 'out.txt' would be:

tesseract myscan.png out

Or to do the same with German:

tesseract myscan.png out -l deu
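
For anyone who wants to call the command from a script, here is a minimal Python sketch that builds the same command line and runs it with the standard subprocess module. The function names build_tesseract_command and run_ocr are made up for this example, and the sketch assumes the 'tesseract' executable is on your PATH. The -psm spelling follows the usage line above; other Tesseract versions may spell that option differently.

```python
import subprocess


def build_tesseract_command(image, output_base, lang=None, psm=None, configs=()):
    """Build the argument list for the command-line form shown above."""
    command = ["tesseract", str(image), str(output_base)]
    if lang is not None:
        command += ["-l", lang]  # language code, e.g. "deu" for German
    if psm is not None:
        command += ["-psm", str(psm)]  # page segmentation mode
    command += list(configs)  # optional config file names
    return command


def run_ocr(image, output_base, **options):
    """Run Tesseract and return the name of the text file it should write."""
    command = build_tesseract_command(image, output_base, **options)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return f"{output_base}.txt"
```

Calling run_ocr("myscan.png", "out", lang="deu") corresponds to the German example above and, if Tesseract succeeds, the recognized text should end up in 'out.txt'.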
