Tesseract 3.0.2

Program Specifications

Version: 3.0.2
Size: 12.90 MB
Publisher: Ray Smith
License: Open Source
Operating System: iOS, Windows XP, Windows Vista, Windows 2008, Windows 7, Unix, Mac OS X, Linux, Android

Publisher's Description of Tesseract

"Most accurate open source OCR engine available. It can read a wide variety of image formats and convert them to text in over 60 languages."
- From Ray Smith

Tesseract is probably the most accurate open source OCR engine available. Combined with the Leptonica Image Processing Library, it can read a wide variety of image formats and convert them to text in over 60 languages. It was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 little work was done on it, but since then it has been improved extensively by Google.

It is released under the Apache License 2.0. It can be used directly, or (for programmers) using an API.

Tesseract works on Linux, Windows (with VC++ Express or Cygwin) and Mac OS X. It can also be compiled for other platforms, including Android and the iPhone, though these platforms are not as well tested.

There are two parts to install: the engine itself and the training data for a language.

Linux
Tesseract is available directly from many Linux distributions. The package is generally called 'tesseract' or 'tesseract-ocr'; search your distribution's repositories to find it. Packages are also generally available for language training data (again, search the repositories). If there is none for your language, download the appropriate training data, unpack it, and copy the .traineddata file into the 'tessdata' directory, probably /usr/share/tesseract-ocr/tessdata or /usr/share/tessdata.
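The manual copy step can be scripted. The following is a minimal sketch, not part of Tesseract: the function names find_tessdata_dir and install_traineddata are made up for illustration, and the candidate directories are only the two "probable" locations named above, so your system may use a different one.

```shell
#!/bin/sh
# Print the first argument that is an existing directory.
# Returns non-zero if none of the candidates exists.
find_tessdata_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Copy one .traineddata file into a tessdata directory.
# Usage: install_traineddata FILE DIR
install_traineddata() {
    file=$1
    dest=$2
    case "$file" in
        *.traineddata) ;;
        *) echo "not a .traineddata file: $file" >&2; return 1 ;;
    esac
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }
    cp -- "$file" "$dest"/
}

# Example (run as root, since system directories are usually not writable):
#   dir=$(find_tessdata_dir /usr/share/tesseract-ocr/tessdata /usr/share/tessdata)
#   install_traineddata deu.traineddata "$dir"
```

Checking the file extension and the target directory before copying catches the most common mistakes, such as copying the still-packed archive.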

If Tesseract isn't available for your distribution, or you want to use a newer version than is available, you can compile your own. Note that older versions of Tesseract only supported processing .tiff files.

Mac OS X
The easiest way to install Tesseract is through Homebrew. Once Homebrew is installed, you can install Tesseract by running the command: brew install tesseract.

If you want to use language training data not included with the Homebrew package, download the appropriate training data, unpack it with Finder, and copy the .traineddata file into the /usr/local/Cellar/tesseract//share/tessdata directory.
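Because the Cellar path contains the installed version, it may be easier to ask Homebrew for the location than to type it. This sketch assumes that `brew --prefix tesseract` prints the install prefix of the formula and that the training data lives in share/tessdata below it, as in the path above. The function name brew_tessdata_dir is made up for illustration.

```shell
#!/bin/sh
# Print the tessdata directory of a Homebrew-installed Tesseract.
# With no argument, ask Homebrew for the formula prefix (assumption: the
# 'tesseract' formula is installed). An explicit prefix may be passed instead.
brew_tessdata_dir() {
    if [ -n "${1:-}" ]; then
        prefix=$1
    else
        prefix=$(brew --prefix tesseract) || return 1
    fi
    printf '%s\n' "$prefix/share/tessdata"
}

# Example:
#   cp deu.traineddata "$(brew_tessdata_dir)"/
```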

Windows
An installer is available for Windows from our download page. This includes the English training data.

If you want to use another language, download the appropriate training data, unpack it using 7-Zip, and copy the .traineddata file into the 'tessdata' directory, probably C:\Program Files\Tesseract OCR\tessdata.

Running Tesseract
Tesseract doesn't have a built-in GUI, but there are several available from the 3rdParty page.

Tesseract is a command-line program, so first open a terminal or command prompt. The command is used like this:

tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

So basic usage to do OCR on an image called 'myscan.png' and save the result to 'out.txt' (Tesseract appends the .txt extension to the output base) would be:

tesseract myscan.png out

Or to do the same with German:

tesseract myscan.png out -l deu
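To process many scans at once, the same command can be wrapped in a loop. The sketch below is one possible way to do it and uses only the imagename, outputbase and -l arguments shown above. The function names output_base and ocr_dir are made up for illustration, and the loop assumes the scans are .png files in a single directory.

```shell
#!/bin/sh
# Derive the output base for an image by removing its extension, so that
# 'scans/page1.png' becomes 'scans/page1' and the text ends up in page1.txt.
output_base() {
    printf '%s\n' "${1%.*}"
}

# Run OCR on every .png file in a directory.
# Usage: ocr_dir DIR [LANG]   (LANG is a language code such as deu)
ocr_dir() {
    dir=$1
    lang=${2:-}
    for image in "$dir"/*.png; do
        # Skip the unexpanded pattern when the directory has no .png files.
        [ -f "$image" ] || continue
        base=$(output_base "$image")
        if [ -n "$lang" ]; then
            tesseract "$image" "$base" -l "$lang"
        else
            tesseract "$image" "$base"
        fi
    done
}

# Example:
#   ocr_dir ./scans deu
```

Each text file is written next to its image, which keeps results easy to match with their source scans.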
