When you scan a document that contains text or numeric data, you can read and understand what is written in the scanned image. To a computer, however, the resulting image file is as meaningless an assortment of pixels as a landscape photo. To turn that information into an editable format that you can search, copy, and modify without retyping it by hand, you need Optical Character Recognition (OCR) software.

There is a wide variety of OCR software available. All of it can convert images of machine-printed (not handwritten) text or numbers into an editable format, but the individual products differ in features, accuracy, price, and language options.

Our OCR Software Guide and Comparison Chart explain the differences between the available software and give our recommendation for the best overall software for converting English documents. The products also differ in the number and selection of languages they can convert. Below is a list of the languages that each of our top three choices can convert, with the languages that have dictionary support marked in italics.
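If you need several languages at once, you can check the lists below mechanically. The following Python sketch is illustrative only: the language sets are short excerpts hand-copied from the lists on this page, and the engine labels ("ABBYY", "IRIS", "SimpleSoftware") are our own shorthand, not product names.

```python
from __future__ import annotations

# Hand-copied excerpts from the lists on this page (not the full lists).
# The engine labels are shorthand chosen for this example.
ENGINE_LANGUAGES = {
    "ABBYY": {"English", "Hebrew", "Japanese", "Thai", "Yiddish"},
    "IRIS": {"English (USA)", "Farsi (PC Only)", "Hebrew", "Japanese"},
    "SimpleSoftware": {"English", "Hebrew", "Japanese", "Thai", "Yiddish"},
}


def engines_supporting(required: set[str], catalog: dict[str, set[str]]) -> list[str]:
    """Return, alphabetically, the engines whose list contains every required language."""
    # required <= langs is a subset test: each required name must match exactly.
    return sorted(name for name, langs in catalog.items() if required <= langs)


if __name__ == "__main__":
    print(engines_supporting({"Thai", "Yiddish"}, ENGINE_LANGUAGES))
    print(engines_supporting({"English"}, ENGINE_LANGUAGES))
```

Because the match is exact, IRIS drops out of the {"English"} query: its list says "English (USA)". A real comparison would need to normalize vendor naming first.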

Some language groups are more recent additions to OCR. These include right-to-left scripts such as Arabic and Hebrew, and East Asian scripts such as Chinese. Not all software supports them out of the box, but support is gradually being added, first as add-ons to the base software and eventually as part of the default language selection.

SimpleSoftware products use two different OCR engines for language support. The languages available to you depend on the base edition of SimpleIndex you have installed; add-ons (SimpleIndex Server, SimpleCoversheet, and so on) do not add any language support.

All SimpleSoftware products include Tesseract 5 OCR language support. You can learn more about Tesseract and download additional language libraries here. To check which Tesseract language libraries are installed on your workstation, or to add more, go to:

C:\Program Files (x86)\SimpleIndex\Tesseract\v5.3.0\
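Tesseract stores each language library as a file named after its language code, such as eng.traineddata for English. Assuming SimpleIndex keeps its libraries in that standard format somewhere under the folder above (the exact subfolder is not stated here), a short Python sketch can list the installed languages. The function name is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Folder named in the text above (a raw string cannot end in a backslash).
TESSERACT_DIR = Path(r"C:\Program Files (x86)\SimpleIndex\Tesseract\v5.3.0")


def installed_tesseract_languages(folder: Path) -> list[str]:
    """Return the language codes of all *.traineddata files under folder.

    Assumption: SimpleIndex keeps its Tesseract language libraries as
    standard <code>.traineddata files somewhere inside this folder.
    """
    if not folder.is_dir():
        return []
    # "eng.traineddata" -> "eng"; the set drops duplicates found in subfolders.
    return sorted({path.stem for path in folder.rglob("*.traineddata")})


if __name__ == "__main__":
    codes = installed_tesseract_languages(TESSERACT_DIR)
    print(codes if codes else f"No .traineddata files found under {TESSERACT_DIR}")
```

If the orientation and script detection data is installed, an entry such as osd may also appear; it is not a language.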

SimpleIndex Pro and SimpleIndex OCR use the ABBYY FineReader engine, which has one of the largest libraries of supported OCR languages. To check which FineReader OCR languages are available on your workstation, see:

C:\Program Files (x86)\SimpleIndex\OCRLanguages.txt
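The format of OCRLanguages.txt is not described here, so the following Python sketch assumes a UTF-8 text file with one language name per line. Under that assumption, it reports which of the languages you need are missing from the file. The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Path given in the text above.
FINEREADER_LIST = Path(r"C:\Program Files (x86)\SimpleIndex\OCRLanguages.txt")


def load_language_names(list_file: Path) -> set[str]:
    """Read language names, assuming one name per line (hypothetical format)."""
    # utf-8-sig also accepts a leading byte-order mark; the encoding is an assumption.
    text = list_file.read_text(encoding="utf-8-sig", errors="replace")
    return {line.strip() for line in text.splitlines() if line.strip()}


def missing_languages(required: list[str], available: set[str]) -> list[str]:
    """Return the required languages absent from available (case-insensitive)."""
    lowered = {name.casefold() for name in available}
    return [lang for lang in required if lang.casefold() not in lowered]


if __name__ == "__main__":
    if FINEREADER_LIST.exists():
        names = load_language_names(FINEREADER_LIST)
        print(missing_languages(["English", "Hebrew", "Thai"], names))
    else:
        print(f"File not found: {FINEREADER_LIST}")
```

If the file turns out to use a different layout (for example, codes plus names on each line), only load_language_names would need to change.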

ABBYY Vantage RPA Machine Learning OCR Data Capture

Abkhaz
Adyghe
Afrikaans
Agul
Albanian
Altaic
Arabic (Saudi Arabia)
Armenian (Eastern)
Armenian (Grabar)
Armenian (Western)
Avar
Aymara
Azeri (Cyrillic)
Azeri (Latin)
Bashkir
Basic
Basque
Belarusian
Bemba
Blackfoot
Breton
Bugotu
Bulgarian
Buryat
C/C++
Catalan
Cebuano
Chamorro
Chechen
Chinese Simplified
Chinese Traditional
Chukchee
Chuvash
COBOL
Corsican
Crimean Tatar
Croatian
Crow
Czech
Dakota
Danish
Dargwa
Dungan
Dutch (Belgian)
Dutch
English
Eskimo (Cyrillic)
Eskimo (Latin)
Esperanto
Estonian
Even
Evenki
Faroese
Fijian
Finnish
Fortran
French
Frisian
Friulian
Gagauz
Galician
Ganda
German (Luxembourg)
German (new spelling)
German
Greek
Guarani
Hani
Hausa
Hawaiian
Hebrew
Hungarian
Icelandic
Ido
Indonesian
Ingush
Interlingua
Irish
Italian
Japanese
Java
Jingpo
Kabardian
Kalmyk
Karachay-Balkar
Karakalpak
Kasub
Kawa
Kazakh
Khakass
Khanty
Kikuyu
Kirghiz
Kongo
Korean (Hangul)
Korean
Koryak
Kpelle
Kumyk
Kurdish
Lak
Latin
Latvian
Lezgi
Lithuanian
Luba
Macedonian
Malagasy
Malay
Malinke
Maltese
Mansi
Maori
Mari
Maya
Miao
Minangkabau
Mohawk
Moldavian
Mongol
Mordvin
Nahuatl
Nenets
Nivkh
Nogay
Norwegian (Bokmal)
Norwegian (Nynorsk)
Nyanja
Occidental
Occitan
Ojibway
Ossetian
Papiamento
Pascal
Polish
Portuguese (Brazil)
Portuguese
Quechua
Rhaeto-Romance
Romanian
Romany
Rundi
Russian (old spelling)
Russian
Russian with accents
Rwanda
Sami (Lappish)
Samoan
Scottish Gaelic
Selkup
Serbian (Cyrillic, Latin)
Shona
Simple chemical formulas
Slovak
Slovenian
Somali
Sorbian
Sotho
Spanish
Sunda
Swahili
Swazi
Swedish
Tabasaran
Tagalog
Tahitian
Tajik
Tatar
Thai
Tok Pisin
Tongan
Tswana
Tun
Turkish
Turkmen (Cyrillic)
Turkmen (Latin)
Tuvinian
Udmurt
Uighur (Cyrillic, Latin)
Ukrainian
Uzbek (Cyrillic, Latin)
Vietnamese
Welsh
Wolof
Xhosa
Yakut
Yiddish
Zapotec
Zulu

Italics signify dictionary support.

IRIS Powerscan OCR Server

Afaan Oromo
Afrikaans
Albanian
Arabic (PC Only)
Asturian
Aymara
Azeri (Latin)
Balinese
Basque
Bemba
Bikol
Bislama
Bosnian (Cyrillic)
Bosnian (Latin)
Brazilian
Breton
Bulgarian
Bulgarian-English
Byelorussian
Byelorussian-English
Catalan
Cebuano
Chamorro
Chinese (Simplified)
Chinese (Traditional)
Corsican
Croatian
Czech
Danish
Dutch
English (UK)
English (USA)
Esperanto
Estonian
Faroese
Farsi (PC Only)
Fijian
Finnish
French
Frisian
Friulian
Galician
Ganda
German
German (Switzerland)
Greek
Greek-English
Greenlandic
Haitian Creole
Hani
Hebrew
Hiligaynon
Hungarian
Icelandic
Ido
Ilocano
Indonesian
Interlingua
Irish (Gaelic)
Italian
Japanese
Javanese
Kapampangan
Kazakh (PC Only)
Kicongo
Kinyarwanda
Korean
Kurdish
Latin
Latvian
Lithuanian
Luba
Luxemburg
Macedonian
Macedonian-English
Madurese
Malagasy
Malay
Manx (Gaelic)
Maori
Mayan
Mexican
Minangkabau
Moldovan
Mongolian (Cyrillic) (PC Only)
Nahuatl
Norwegian
Numeric
Nyanja
Nynorsk
Occitan
Papiamento
Pidgin English (Nigeria)
Polish
Portuguese
Quechua
Rhaeto-Roman
Romanian
Rundi
Russian
Russian-English
Samoan
Sardinian
Scottish (Gaelic)
Serbian
Serbian (Latin)
Serbian-English
Shona
Slovak
Slovenian
Somali
Sotho
Spanish
Sundanese
Swahili
Swedish
Tagalog
Tahitian
Tatar (Latin)
Tetum
Tok Pisin
Tonga
Tswana
Turkish
Turkmen (Latin)
Ukrainian
Ukrainian-English
Uzbek
Waray
Welsh
Wolof
Xhosa
Zapotec
Zulu

Afrikaans
Albanian
Aymara
Basque
Bemba
Blackfoot
Breton
Bugotu
Bulgarian
Byelorussian
Catalan
Chamorro
Chechen
Chinese (Simplified)
Chinese (Traditional)
Corsican
Croatian
Crow
Czech
Danish
Dutch
English
Esperanto
Estonian
Faroese
Fijian
Finnish
French
Frisian
Friulian
Gaelic (Irish)
Gaelic (Scottish)
Galician
Ganda/Luganda
German
Greek
Guarani
Hani
Hawaiian
Hungarian
Icelandic
Ido
Indonesian
Interlingua
Inuit
Italian
Japanese
Kabardian
Kasub
Kikuyu
Kongo
Korean
Kpelle
Kurdish
Latin
Latvian
Lithuanian
Luba
Luxembourgian
Macedonian
Malagasy
Malay
Malinke
Maltese
Maori
Mayan
Miao
Minangkabau
Mohawk
Moldavian
Nahuatl
Norwegian
Nyanja
Occidental
Ojibway
Papiamento
Pidgin English
Polish
Portuguese
Portuguese (Brazilian)
Provencal
Quechua
Rhaetic
Romanian
Romany
Ruanda
Rundi
Russian
Sami
Sami Lule
Sami Northern
Sami Southern
Samoan
Sardinian
Serbian (Cyrillic)
Serbian (Latin)
Shona
Sioux
Slovak
Slovenian
Somali
Sorbian
Sotho
Spanish
Sundanese
Swahili
Swazi
Swedish
Tagalog
Tahitian
Tongan
Tswana
Tun
Turkish
Ukrainian
Visayan
Wa
Welsh
Wolof
Xhosa
Zapotec
Zulu

Italics signify dictionary support.

Simple Software Document Scanning and OCR Software
Afrikaans
Amharic
Arabic
Assamese
Azerbaijani
Azerbaijani – Cyrillic
Belarusian
Bengali
Tibetan
Bosnian
Bulgarian
Catalan; Valencian
Cebuano
Czech
Chinese – Simplified
Chinese – Traditional
Cherokee
Welsh
Danish
German
Dzongkha
Greek, Modern (1453-)
English
English, Middle (1100-1500)
Esperanto
Estonian
Basque
Persian
Finnish
French
German Fraktur
French, Middle (ca. 1400-1600)
Irish
Galician
Greek, Ancient (-1453)
Gujarati
Haitian; Haitian Creole
Hebrew
Hindi
Croatian
Hungarian
Inuktitut
Indonesian
Icelandic
Italian
Italian – Old
Javanese
Japanese
Kannada
Georgian
Georgian – Old
Kazakh
Central Khmer
Kirghiz; Kyrgyz
Korean
Kurdish
Lao
Latin
Latvian
Lithuanian
Malayalam
Marathi
Macedonian
Maltese
Malay
Burmese
Nepali
Dutch; Flemish
Norwegian
Oriya
Panjabi; Punjabi
Polish
Portuguese
Pushto; Pashto
Romanian; Moldavian; Moldovan
Russian
Sanskrit
Sinhala; Sinhalese
Slovak
Slovenian
Spanish; Castilian
Spanish; Castilian – Old
Albanian
Serbian
Serbian – Latin
Swahili
Swedish
Syriac
Tamil
Telugu
Tajik
Tagalog
Thai
Tigrinya
Turkish
Uighur; Uyghur
Ukrainian
Urdu
Uzbek
Uzbek – Cyrillic
Vietnamese
Yiddish
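This list appears to be sorted by Tesseract's own language codes (afr, amh, ara, ...). When you drive the standalone tesseract command-line tool, you pass those codes to its -l option and join several languages with +. The following Python sketch maps a few names from the list to their codes; the mapping is only an excerpt, the helper names are hypothetical, and it assumes a tesseract executable that SimpleIndex itself may not expose.

```python
from __future__ import annotations

# Excerpt: names from the list above mapped to Tesseract language codes.
TESSERACT_CODES = {
    "Arabic": "ara",
    "Chinese – Simplified": "chi_sim",
    "Chinese – Traditional": "chi_tra",
    "English": "eng",
    "French": "fra",
    "German": "deu",
    "Hebrew": "heb",
    "Japanese": "jpn",
    "Russian": "rus",
    "Serbian – Latin": "srp_latn",
    "Spanish; Castilian": "spa",
}


def tesseract_lang_arg(languages: list[str]) -> str:
    """Join language names into a value for tesseract's -l option, e.g. "eng+deu"."""
    try:
        return "+".join(TESSERACT_CODES[name] for name in languages)
    except KeyError as err:
        raise ValueError(f"No Tesseract code in this excerpt for {err.args[0]!r}") from None


def tesseract_command(image: str, output_base: str, languages: list[str]) -> list[str]:
    """Build an argument list for: tesseract <image> <output_base> -l <codes>."""
    return ["tesseract", image, output_base, "-l", tesseract_lang_arg(languages)]


if __name__ == "__main__":
    print(tesseract_command("scan.png", "scan", ["English", "German"]))
```

Passing that argument list to subprocess.run(..., check=True) would write scan.txt, provided tesseract is on your PATH and both eng.traineddata and deu.traineddata are available to it.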