Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text. It is used to convert paper books and documents into electronic files.
Lime OCR is built with tesseract-ocr, an OCR engine that was developed at HP Labs between 1985 and 1995 and is now developed at Google. Lime OCR was initially developed for internal use at Lime Consultants and is now published under the GNU General Public License v3.
Lime OCR is free and simple to use. It currently supports 29 languages and works with all tesseract-ocr trained data files. It is fully UTF-8 capable and fully trainable using tesseract-ocr. Lime OCR evolved from Tesseract-GUI, a Linux application by Juan Ramon Castan, so it includes all of Tesseract-GUI's features.
Like Tesseract-GUI, Lime OCR is not a front-end for tesseract-ocr. It is just a graphical way to use it, with simple image manipulation through ImageMagick.
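How a graphical wrapper of this kind hands an image to tesseract-ocr can be sketched in a few lines of Python. This is only an illustration, not Lime OCR's actual code: it assumes the `tesseract` executable is on the PATH, and the function names `build_ocr_command` and `run_ocr` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ocr_command(image, output_base, lang="eng"):
    """Return the command line: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", image, output_base, "-l", lang]


def run_ocr(image, output_base, lang="eng"):
    """Run tesseract and return the path of the text file it writes."""
    subprocess.run(build_ocr_command(image, output_base, lang), check=True)
    # tesseract appends ".txt" to the output base name
    return Path(output_base + ".txt")
```

Keeping the command construction separate from its execution makes it possible to inspect or log the command before anything is run.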
Some features are:
- Image formats: Supports over 50 types of images.
- PDF Input Support: OCR scanned PDF files with the PDF add-on.
- Auto-index: Process many images, then index and rename all output text files automatically.
- Rotate: Correct the angle of images after scanning and before converting them.
- Crop: Convert just an area of the image. You can use it to create columns.
- Normalize: Try to get the best possible contrast in images before converting them.
- Generalize: Apply the changes to every image.
- Concatenate: Combine the output into a single final text file.
- More Languages: New languages are added to the GUI automatically.
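
Several of the features above map onto standard ImageMagick operations followed by ordinary file handling. The following Python sketch is an illustration under stated assumptions, not Lime OCR's implementation: the helper names, the zero-padded naming scheme and the newline separator are hypothetical, and the ImageMagick executable is assumed to be available as `convert` (newer ImageMagick releases call it `magick`).

```python
from pathlib import Path


def build_preprocess_command(src, dst, rotate_degrees=0, crop=None, normalize=False):
    """Build an ImageMagick `convert` command for rotate, crop and normalize.

    `crop` is an optional (width, height, x, y) tuple with non-negative offsets.
    """
    cmd = ["convert", src]
    if rotate_degrees:
        cmd += ["-rotate", str(rotate_degrees)]
    if crop is not None:
        width, height, x, y = crop
        # +repage resets the canvas so the cropped area becomes the whole image
        cmd += ["-crop", f"{width}x{height}+{x}+{y}", "+repage"]
    if normalize:
        cmd.append("-normalize")
    cmd.append(dst)
    return cmd


def indexed_names(count, prefix="page", start=1):
    """Auto-index: generate zero-padded output names such as page_001.txt."""
    return [f"{prefix}_{i:03d}.txt" for i in range(start, start + count)]


def concatenate(text_files, final_file):
    """Concatenate: join per-image text files into one final text file."""
    parts = [Path(p).read_text(encoding="utf-8") for p in text_files]
    Path(final_file).write_text("\n".join(parts), encoding="utf-8")
```

Applying the same `rotate_degrees`, `crop` and `normalize` arguments to every image in a batch corresponds to the Generalize option.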