Library archive software OCR image zoom

Search for a word in the newspaper

- only on current page

The autosuggest search field above takes its data from XML or HTML files that contain OCR data. The data is loaded via /axZm/zoomLoadOCR.php, which returns JSON suited to this example.

Currently zoomLoadOCR.php supports two schemes: "hOCR" and "ALTO". However, it can easily be extended to process any other structured OCR data, and the data could also come from a database rather than from flat files.
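To illustrate what processing "ALTO" data involves, here is a minimal Python sketch that reads word positions from an ALTO document. It is not the code of zoomLoadOCR.php. The function name parse_alto_words, the sample markup, and the output keys (text, x, y, w, h) are assumptions made for this illustration and do not describe the JSON that zoomLoadOCR.php actually returns. The sketch relies only on the standard ALTO "String" element with its CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes, and it assumes the coordinates are in pixels.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real ALTO files carry more metadata and layout blocks.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1" WIDTH="800" HEIGHT="600"><PrintSpace>
    <TextBlock ID="B1"><TextLine>
      <String CONTENT="Bruxelles" HPOS="40" VPOS="55" WIDTH="120" HEIGHT="22"/>
      <SP WIDTH="8"/>
      <String CONTENT="journal" HPOS="168" VPOS="55" WIDTH="95.5" HEIGHT="22"/>
    </TextLine></TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""


def parse_alto_words(xml_text):
    """Return one dict per ALTO String element (word text plus bounding box)."""
    root = ET.fromstring(xml_text)
    words = []
    for el in root.iter():
        # Drop the "{namespace}" prefix so any ALTO schema version matches.
        local_name = el.tag.rsplit("}", 1)[-1]
        if local_name != "String":
            continue
        words.append({
            "text": el.get("CONTENT", ""),
            "x": float(el.get("HPOS", 0)),
            "y": float(el.get("VPOS", 0)),
            "w": float(el.get("WIDTH", 0)),
            "h": float(el.get("HEIGHT", 0)),
        })
    return words


if __name__ == "__main__":
    for word in parse_alto_words(SAMPLE_ALTO):
        print(word)
```

Matching on the local element name keeps the sketch independent of the ALTO namespace version. If a file declares a measurement unit other than pixels, the values would need to be converted before they are used to place highlights on the image.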

For "hOCR" you could use the free, Apache 2.0 licensed tesseract-ocr software (ver. 3.0+) to process your images containing text and retrieve the positions of the words in "hOCR" format, which are then saved directly in HTML files.
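As an illustration of how word positions can be read back from such an "hOCR" file, here is a minimal Python sketch. It is not how zoomLoadOCR.php is implemented. The names HocrWordParser, parse_hocr_words and find_word, the sample markup, and the output keys are assumptions made for this example. It relies on the hOCR convention that each word is an element with the class "ocrx_word" whose title attribute contains "bbox x0 y0 x1 y1".

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" part of an hOCR title attribute.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")

# Hypothetical sample; tesseract output contains a full XHTML page.
SAMPLE_HOCR = """<div class='ocr_page' title='bbox 0 0 800 600'>
<span class='ocr_line' title='bbox 10 20 300 50'>
<span class='ocrx_word' title='bbox 10 20 90 50; x_wconf 93'>Library</span>
<span class='ocrx_word' title='bbox 100 20 210 50; x_wconf 88'><strong>archive</strong></span>
</span></div>"""


class HocrWordParser(HTMLParser):
    """Collects text and bounding box of every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word currently being read
        self._text = []     # text fragments of the current word
        self._depth = 0     # nesting depth of tags inside the word

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            self._depth += 1  # e.g. <strong> inside a word
            return
        attr = dict(attrs)
        if "ocrx_word" in (attr.get("class") or "").split():
            match = BBOX_RE.search(attr.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []
                self._depth = 0

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        text = "".join(self._text).strip()
        if text:
            x0, y0, x1, y1 = self._bbox
            self.words.append(
                {"text": text, "x": x0, "y": y0, "w": x1 - x0, "h": y1 - y0}
            )
        self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def parse_hocr_words(html_text):
    parser = HocrWordParser()
    parser.feed(html_text)
    parser.close()
    return parser.words


def find_word(words, term):
    """Case-insensitive exact match, returning every position of the word."""
    term = term.lower()
    return [w for w in words if w["text"].lower() == term]


if __name__ == "__main__":
    print(find_word(parse_hocr_words(SAMPLE_HOCR), "archive"))
```

The sketch converts the corner coordinates of the bounding box into a position plus width and height, which is a convenient form for drawing a highlight rectangle over the zoomed image.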

At ajax-zoom.com this example is based on images courtesy of "Bibliothèque royale de Belgique" and the OCR data scheme is "ALTO". In the download package the OCR data is "hOCR" made with "tesseract".

Please note that, unlike in most other examples, several transitions are either disabled or accelerated here. Of course, these and many other settings can easily be adjusted in the config file "/axZm/zoomConfigCustom.inc.php" after elseif ($_GET['example'] == 'ocr'){

In general, this example can serve as a basis for more sophisticated applications that can be extended with the AJAX-ZOOM API and other scripts. On request, any customization or integration task can be partly or fully done by the AJAX-ZOOM team.

Test: load different content (images) and OCR data without reloading the page.

[Last updated: 2015-03-02]