eSpeak Speech Synthesizer

3. LANGUAGES

Help Needed

Many of these are just experimental attempts at these languages, produced after a quick reading of the corresponding article on wikipedia.org. They will need work or advice from native speakers to improve them. Please contact me if you want to advise or assist with these or other languages.

The sound of some phonemes may be poorly implemented, particularly [r] since I'm English and therefore unable to make a "proper" [r] sound.

A major factor is the rhythm or cadence. An Italian speaker told me the Italian voice improved from "difficult to understand" to "good" by changing the relative length of stressed syllables. Identifying unstressed function words in the xx_list file is also important to make the speech flow well. See Adding or Improving a Language.

Character sets

Languages recognise text either as UTF-8 or alternatively in an 8-bit character set which is appropriate for that language. For example, for Polish this is Latin2, for Russian it is KOI8-R. This choice can be overridden by a line in the voices file to specify an ISO 8859 character set, eg. for Russian the line:

     charset 5

will mean that ISO 8859-5 is used as the 8-bit character set rather than KOI8-R.
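As a rough sketch of where such a line sits, the shell snippet below writes a small Russian voice file that selects ISO 8859-5. The file name ru-iso, the name value russian-iso and the output directory are assumptions made for illustration, not files shipped with eSpeak. The name and language attributes are the ones described in Voices. To try the result, the file would have to be placed in espeak-data/voices.

```shell
# Hypothetical example: write a Russian voice file that reads 8-bit text as ISO 8859-5.
# The file name "ru-iso" and the name "russian-iso" are illustrative assumptions.
write_charset_voice() {
    out_dir="$1"    # directory to write into, e.g. a scratch copy of espeak-data/voices
    mkdir -p "$out_dir" || return 1
    cat > "$out_dir/ru-iso" <<'EOF'
name russian-iso
language ru
charset 5
EOF
}
```

Calling write_charset_voice with a directory name creates the three-line file there; nothing is installed or spoken by the function itself.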

In the case of a language which uses a non-Latin character set (eg. Greek or Russian) if the text contains a word with Latin characters then that particular word will be pronounced using English pronunciation rules and English phonemes. Speaking entirely English text using a Greek or Russian voice will sound OK, but each word is spoken separately so it won't flow properly.

Sample texts in various languages can be found at http://<language>.wikipedia.org and www.gutenberg.org

3.1 Voice Files

A number of Voice files are provided in the espeak-data/voices directory. You can select one of these with the -v <voice filename> parameter to the espeak command, eg:

   espeak -vaf

to speak using the Afrikaans voice.
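If the same text has to be spoken with several voices, the -v option can be wrapped in a small shell function. This is only a sketch: the function name say_in and the ESPEAK variable (which lets the command be swapped out, for example when testing without audio) are assumptions and not part of eSpeak.

```shell
# ESPEAK is an assumed override hook; by default the espeak command is used.
ESPEAK="${ESPEAK:-espeak}"

# say_in VOICE TEXT...
# Speak TEXT using the voice file named VOICE (passed through to the -v option).
say_in() {
    voice="$1"
    shift
    "$ESPEAK" -v "$voice" "$@"
}
```

For example, say_in af "Goeie more" runs espeak -v af "Goeie more".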

Language voices generally start with the 2 letter ISO 639-1 code for the language. If the language does not have an ISO 639-1 code, then the 3 letter ISO 639-3 code can be used.

For details of the voice files see Voices.

Default Voice

default: This voice is used if none is specified in the espeak command. Copy your preferred voice to "default" so you can use the espeak command without the need to specify a voice.
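The copy can be done by hand, or with a small helper such as the sketch below. The function name set_default_voice is hypothetical, and the voices directory must be passed in explicitly because the location of espeak-data depends on the installation.

```shell
# set_default_voice VOICES_DIR VOICE
# Copy the voice file VOICE inside VOICES_DIR to the file named "default".
# VOICES_DIR is an assumption: pass the path of your own espeak-data/voices directory.
set_default_voice() {
    voices_dir="$1"
    voice="$2"
    if [ ! -f "$voices_dir/$voice" ]; then
        echo "no such voice file: $voices_dir/$voice" >&2
        return 1
    fi
    # Overwrites any existing "default" file.
    cp "$voices_dir/$voice" "$voices_dir/default"
}
```

Note that this overwrites the existing default file, so keep a copy elsewhere if you may want to restore it.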

3.2 English Voices

en: is the standard default English voice.
en-us: American English.
en-sc: English with a Scottish accent.
en-n en-rp en-wm: are different English voices. These can be considered caricatures of various British accents: Northern, Received Pronunciation, West Midlands respectively.

3.3 Voice Variants

To make alternative voices for a language, you can make additional voice files in espeak-data/voices that contain commands to change various voice and pronunciation attributes. See voices.html.

Alternatively there are some preset voice variants which can be applied to any of the language voices, by appending + and a variant name. Their effects are defined by files in espeak-data/voices/!v.

The variants are +m1 +m2 +m3 +m4 +m5 +m6 +m7 for male voices, +f1 +f2 +f3 +f4 +f5 for female voices, and +croak +whisper for other effects. For example:

   espeak -ven+m3

The available voice variants can be listed with:

   espeak --voices=variant
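To compare the variants by ear, it can help to generate every voice+variant name for a language. The sketch below uses the variant names listed above; the VARIANTS variable and the variant_voices function are names made up for this example, and an installation may provide a different set of variants.

```shell
# Variant names as listed in this section; your installation may differ.
VARIANTS="m1 m2 m3 m4 m5 m6 m7 f1 f2 f3 f4 f5 croak whisper"

# variant_voices VOICE
# Print VOICE+variant for each preset variant, one name per line.
variant_voices() {
    for v in $VARIANTS; do
        printf '%s+%s\n' "$1" "$v"
    done
}
```

Each name can then be passed to the -v option, for example:

   for name in $(variant_voices en); do espeak -v "$name" "This is $name"; done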

3.4 Other Languages

The eSpeak speech synthesizer does text to speech for the following additional languages.

af Afrikaans: This has been worked on by native speakers and it should be OK.
bs Bosnian: Usable, but I'm unsure whether wrong stressed syllables are a problem. It accepts both Latin and Cyrillic characters. This voice is similar to sr Serbian and hr Croatian.
ca Catalan
cs Czech: Usable.
da Danish: Usable.
de German: This has improved from earlier versions. Remaining problems are stress placement (which, like English, is irregular), prosody, and the use of compound words, where correct detection of the sub-word boundaries would probably be needed for accurate pronunciation.
el Greek: Stress position is marked in text and spelling is fairly regular, so it shouldn't be too bad. It uses a different alphabet and switches to English pronunciation for words which contain Latin characters a-z.
eo Esperanto: Esperanto has simple and regular pronunciation rules, so it should be OK.
es Spanish: Spanish has good spelling rules, so it should be OK.
es-la Spanish - Latin America: This contains a few changes from es, notably the pronunciation of "z","ce","ci".
fi Finnish: This has had assistance from native speakers and should be usable.
fr French: This has been improved by a native speaker, and should be OK.
hr Croatian: Usable, but I'm unsure whether wrong stressed syllables are a problem. It accepts both Latin and Cyrillic characters. This voice is similar to sr Serbian and bs Bosnian.
hu Hungarian: This has had assistance from a native speaker and it should be OK.
it Italian: This has had some feedback from a native speaker but more work is needed. Spelling is fairly regular, but stress marks and vowel accents are often omitted from text, so for some words the dictionary/exceptions list will need to determine the stress position or whether to use open/close [e] or [E] and [o] or [O].
kn Kannada: Not much feedback yet, but I'm told that it sounds reasonable.
ku Kurdish: Not much work yet, but Kurdish has good spelling rules so it should be OK.
lv Latvian: This has had assistance from a native speaker and it should be OK.
nl Dutch: Needs improvement of the spelling-to-phoneme rules.
pl Polish: Usable.
pt Portuguese (Brazil): Brazilian Portuguese. This has had assistance from a native speaker and it should be OK. As with Italian, further work is needed on the spelling ambiguity between open/close "e" and "o" vowels.
pt-pt Portuguese (European)
ro Romanian: Probably OK. More work is needed to improve the position of stress within words.
sk Slovak: This has had assistance from a native speaker, so it should be OK.
sr Serbian: Usable. Wrong stressed syllables may be a problem. It accepts both Latin and Cyrillic characters. This voice is similar to hr Croatian and bs Bosnian.
sv Swedish: This has now had some work done on the pronunciation rules, so it should be usable.
sw Swahili: Not much feedback yet, but the spelling and stress rules are fairly regular, so it's probably usable.
ta Tamil: This has had assistance from a native speaker, so it should be OK.
tr Turkish: Not much work yet, but I'm told it sounds reasonable.
zh Mandarin Chinese: This speaks Pinyin text and Chinese characters. There is only a simple one-to-one translation of Chinese characters to a single Pinyin pronunciation. There is no attempt yet at recognising different pronunciations of Chinese characters in context, or of recognising sequences of characters as "words". The eSpeak installation includes a basic set of Chinese characters. More are available in an additional data file for Mandarin Chinese at: http://espeak.sourceforge.net/data/.

3.5 Provisional Languages

These languages are only initial naive implementations which have had little or no feedback and improvement from native speakers.

cy Welsh: An initial guess, awaiting feedback.
grc Ancient Greek: Includes a short pause between words to help understanding.
hi Hindi: This is interesting because it uses the Devanagari characters. I'm not sure about Hindi stress rules, and I expect the sound of aspirated/unaspirated consonant pairs needs improvement.
hy Armenian: Needs feedback from native speakers. The hy-west voice has different pronunciation of some consonants for Western Armenian pronunciation.
id Indonesian: An initial guess, no feedback yet.
is Icelandic: An initial guess, awaiting feedback.
jbo Lojban: An artificial language.
ka Georgian: An initial guess, awaiting feedback.
la Latin: Stress rules are implemented, but it needs text where long vowels are marked with macrons.
mk Macedonian: This is similar to hr Croatian, so it's probably usable. It accepts both Latin and Cyrillic characters.
no Norwegian: An initial guess, awaiting feedback.
ru Russian: So far it's just an initial attempt with basic pronunciation rules. Work is needed especially on the consonants. Russian has two versions of most consonants, "hard" and "soft" (palatalised) and in most cases eSpeak doesn't yet make a proper distinction.
Russian stress position is unpredictable so a large lookup dictionary is needed of those words where eSpeak doesn't guess correctly. To avoid increasing the size of the basic eSpeak package, this is available separately at: http://espeak.sourceforge.net/data/
sq Albanian: Some initial feedback, but needs more work.
vi Vietnamese: This is interesting because it's a tone language. I don't know how it should sound, so it's just a guess and I need feedback.
zh-yue Cantonese Chinese: Just a naive simple one-to-one translation from single Simplified Chinese characters to phonetic equivalents in Cantonese. There is only a limited attempt at disambiguation, grouping characters into words, or adjusting tones according to their surrounding syllables. This voice needs Chinese character to phonetic translation data, which is available as a separate download for Cantonese at: http://espeak.sourceforge.net/data/.
The voice can also read Jyutping romanised text.

3.6 Mbrola Voices

Some additional voices, whose names start with mb- (for example mb-en1), use eSpeak as a front-end to Mbrola diphone voices. eSpeak does the spelling-to-phoneme translation and intonation. See mbrola.html.