Tesseract engine
With the Tesseract OCR engine, the pdf, pdfa, and txt formats are supported.
Supported languages: Afrikaans (afr), Albanian (sqi), Azerbaijani (aze), Belarusian (bel), Bosnian (bos), Breton (bre), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Corsican (cos), Croatian (hrv), Czech (ces), Danish (dan), Dutch/Flemish (nld), English (eng), Middle English 1100-1500 (enm), Esperanto (epo), Estonian (est), Faroese (fao), Filipino (fil), Finnish (fin), French (fra), Gaelic (gla), Galician (glg), German (deu), Haitian (hat), Hebrew (heb), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kyrgyz (kir), Latin (lat), Latvian (lav), Lithuanian (lit), Macedonian (mkd), Malay (msa), Maltese (mlt), Maori (mri), Norwegian (nor), Occitan (oci), Polish (pol), Portuguese (por), Quechua (que), Romanian/Moldovan (ron), Russian (rus), Serbian (srp), Serbian Latin (srp_latn), Slovak (slk), Slovenian (slv), Spanish (spa), Sundanese (sun), Swahili (swa), Swedish (swe), Tajik (tgk), Tonga (ton), Turkish (tur), Ukrainian (ukr), Uzbek (uzb), Uzbek Cyrillic (uzb_cyrl), Vietnamese (vie), Welsh (cym), Western Frisian (fry), Yoruba (yor), Azerbaijani Cyrillic (Азəрбајҹан), Georgian (ქართული ენა).
Selecting multiple languages considerably increases the time needed to process the files.
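For readers who run Tesseract directly rather than through this interface, the following is a minimal sketch of how several of the language codes above can be combined. It assumes the standard tesseract command-line tool, which takes multiple languages joined with "+" in its -l option and output types such as pdf and txt as trailing arguments. The helper names build_lang_arg and build_command and the file names are hypothetical, and nothing here describes how this service calls the engine internally.

```python
import shlex


def build_lang_arg(codes):
    """Join language codes the way Tesseract's -l option expects, e.g. eng+fra."""
    unique = []
    for code in codes:
        code = code.strip()
        # Skip empty entries and duplicates; each extra language adds processing time.
        if code and code not in unique:
            unique.append(code)
    if not unique:
        raise ValueError("at least one language code is required")
    return "+".join(unique)


def build_command(image_path, output_base, codes, formats=("txt",)):
    """Return the argument list for a direct tesseract CLI call (hypothetical helper)."""
    return ["tesseract", image_path, output_base, "-l", build_lang_arg(codes), *formats]


if __name__ == "__main__":
    # Hypothetical file names; the command is only printed, not executed.
    cmd = build_command("page.png", "page", ["eng", "fra"], formats=("pdf", "txt"))
    print(shlex.join(cmd))
```

Keeping the language list as short as the document allows is the simplest way to limit the extra processing time.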
For further information about the engine, see the dedicated documentation from its developer.