
Tesseract engine

With the Tesseract OCR engine, the pdf, pdfa, and txt formats are supported.

Supported languages: Afrikaans (afr), Albanian (sqi), Azerbaijani (aze), Azerbaijani Cyrillic, Belarusian (bel), Bosnian (bos), Breton (bre), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Corsican (cos), Croatian (hrv), Czech (ces), Danish (dan), Dutch/Flemish (nld), English (eng), English, Middle (1100-1500) (enm), Esperanto (epo), Estonian (est), Faroese (fao), Filipino (fil), Finnish (fin), French (fra), Galician (glg), Georgian, German (deu), Haitian Creole (hat), Hebrew (heb), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kyrgyz (kir), Latin (lat), Latvian (lav), Lithuanian (lit), Macedonian (mkd), Malay (msa), Maltese (mlt), Maori (mri), Norwegian (nor), Occitan (oci), Polish (pol), Portuguese (por), Quechua (que), Romanian/Moldovan (ron), Russian (rus), Scottish Gaelic (gla), Serbian (srp), Serbian Latin (srp_latn), Slovak (slk), Slovenian (slv), Spanish (spa), Sundanese (sun), Swahili (swa), Swedish (swe), Tajik (tgk), Tonga (ton), Turkish (tur), Ukrainian (ukr), Uzbek (uzb), Uzbek Cyrillic (uzb_cyrl), Vietnamese (vie), Welsh (cym), Western Frisian (fry), Yoruba (yor).

Selecting multiple languages significantly increases the time needed to process the files.
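
As a rough illustration of how language codes reach the engine itself, the Python sketch below builds a command line for the standalone `tesseract` CLI. There, several codes are joined with `+` in a single `-l` argument (for example `eng+fra`), and a trailing `pdf` or `txt` config name selects the output format. The helper name `build_tesseract_cmd` and the file names are hypothetical, and the sketch is not meant to describe how this product calls the engine. `pdfa` is left out because the sketch assumes only Tesseract's stock `pdf` and `txt` configs.

```python
import shlex

# Output formats this sketch maps onto Tesseract's stock config files.
# Assumption: "pdfa" is handled elsewhere, so it is not accepted here.
SUPPORTED_FORMATS = {"pdf", "txt"}


def build_tesseract_cmd(image_path, output_base, languages, fmt="txt"):
    """Return the argv list for a tesseract CLI call (hypothetical helper).

    Several language codes are joined with "+", which is how the
    tesseract command line accepts multiple languages in one -l argument.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    lang_arg = "+".join(languages)
    # tesseract appends the extension (.pdf or .txt) to output_base itself.
    return ["tesseract", image_path, output_base, "-l", lang_arg, fmt]


if __name__ == "__main__":
    cmd = build_tesseract_cmd("scan.png", "scan", ["eng", "fra"], fmt="pdf")
    print(shlex.join(cmd))
    # → tesseract scan.png scan -l eng+fra pdf
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)` on a machine where `tesseract` and the requested language data are installed. Returning an argument list rather than a single shell string avoids quoting problems with file names that contain spaces.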

For further information about the engine, see the dedicated documentation from its developer.
