Basics of Russian lyric diction and guide to Russian IPA


There are six vowel sounds in Russian. Most of them are the same as in Italian:

[i] би́рка birra
[ɛ] э́ра era (imp., essere)
[ɑ] па́з casa
[ɔ] со́рт sorte
[u] бу́рый burro
[ɨ] бы́ть (no match)

The only sound that doesn’t exist in standard languages is [ɨ]. It is a common misconception (especially among English native speakers) that this sound is very dark, low, and is formed very far back in the mouth. In reality, it is a high vowel and the singer should always keep that in mind. It is barely lower than [i] (see the vowel trapezoid). In singing, this sound should always have the same resonance as [i].

One way to find this sound is the following. Say a long [i] in a comfortable range, then move the tongue to [u] without rounding the lips. It is very important to keep the lips neutral. After this is accomplished without any tension in any of the muscles, move the tongue again about half-way toward [i]. That will give you the [ɨ] sound.

Remember that [ɨ] is based on [i] and should be as resonant.

Just like any other vowel, this one should always feel comfortable in singing. If it feels somehow artificial, if it falls too far from the rest of the vowels or is not as resonant, then it should be adjusted. Once again, the position for [ɨ] is not far from [i].

Double and triple vowels

In some cases, there is more than one vowel in a row.

Any number of vowels in a row should be treated the same way as in Italian. They should be connected and sung legato. No additional sound or stop should be inserted between them.1

Two of the same vowels in a row should be treated as one.


Just like the vowels, most Russian consonants are the same as in the standard languages:

[b] баня It. bagno
[p] порт It. porto
[d] дать It. dare
[t] тост It. tosto
[ɡ] гараж It. garage
[k] кухня It. cucina
[ʟ] лом En. loam
[m] музей It. museo
[n] нос It. nostro
[r] роза It. rosa
[z] звон It. svolto
[ʒ] жанр Fr. genre
[s] сон It. sogno
[ʃ] шанс Fr. chance
[ᶋː] щит (no match)
[v] воля It. voglia
[f] форма It. forma
[x] хунта Sp. junta
[ɣ] всех зову (no match)
[t͜s] пицца It. pizza
[t͜ʃ] черти It. certo
[j] паёк It. pajolo

It is important to note that the plosive consonants [p], [t], and [k] are unaspirated, just like in Italian or French.

The only three Russian consonants that don’t exist in Italian, French, or German are [ᶋː], [x], and [ɣ].

[ʃ] and [ᶋː]

There should be a clear distinction between [ʃ] and [ᶋː]. [ʃ] is somewhat lower than the English consonant in she or the French consonant in chaud. A good way to find the correct Russian sound is to sustain the consonant in she and lower the tip of the tongue slightly, away from the soft palate.

[ᶋː] is the palatalized counterpart of [ʃ] (on palatalization, see below). It is a much more energetic sound similar to the initial consonant in Italian sciagurato, especially when said with expression. Just like in Italian, it is always long.

[x] and [ɣ]

Another Russian consonant unfamiliar from the standard languages is [x]. It is the same as in Spanish in jugo, junta etc. It is very tempting for most singers to substitute it with a German ich-laut, especially in final position in words like белых [ˈᶀeʟɪx], смех [sᶆex], лемех [ˈᶅeᶆex], and even more so when there is a palatalized [ᶍ] in the same word: тихих [ˈƫiᶍix].

The position of the tongue for [x] is the same as for [k]. A good way to find it is to form a [k] and let the air go between the palate and the tongue, slightly releasing the closure (“fricative [k]”).

In comparison to German ach-laut, the Russian [x] has much less friction. It is formed less deep in the mouth and never has any vibration. It is closer to aspirated [h], although it has considerably more noise.

In singing, [x] is usually quite short and shouldn’t be overthought and overdone. However, in the final position it should be audible and have a sufficient length to be clearly heard.

The sound [ɣ] is the voiced version of [x]. It is formed the same way as [x], except the vocal cords are vibrating. It is rather rare in modern Russian and occurs mostly in assimilations between words to facilitate smooth transition between the consonants and to avoid shadow vowels.

Первых дней [ˈᶈervɨɣ ͜ ˈᶁᶇej] (of the first days)
В этих звуках [ˈvɛƫiɣ ͜ ˈzvukɑx] (in these sounds)
Всех зову [ˈfᶊeɣ ͜ zɑˈvu] (I call everyone)
Зловещих дум [zʟɑˈᶌeᶋːiɣ ͜ ˈdum] (of fateful thoughts)


Palatalization is a very important feature in Russian. Most Russian consonants form pairs of non-palatalized (also called dark) and palatalized (bright) sounds. Palatalization is indicated with a hook underneath the symbol2:

ᶀ ᶈ ᶁ ƫ ᶃ ᶄ ᶆ ᶇ ᶉ ᶎ ᶌ ᶂ ᶍ
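
For readers working from sources that follow the current IPA convention (a superscript ʲ after the consonant), the correspondence with the hooked symbols can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name to_hooked is hypothetical, the table adds [ᶅ] and [ᶊ] because they appear in the transcriptions in this guide, and the sketch drops every ʲ, so it does not reproduce the practice of keeping ʲ after a final palatalized consonant.

```python
# Illustrative mapping from "consonant + superscript j" to the hooked symbols.
# The table and the function name are assumptions made for this sketch.
HOOKED = {
    "b": "ᶀ", "p": "ᶈ", "d": "ᶁ", "t": "ƫ", "ɡ": "ᶃ", "k": "ᶄ",
    "m": "ᶆ", "n": "ᶇ", "r": "ᶉ", "z": "ᶎ", "v": "ᶌ", "f": "ᶂ",
    "x": "ᶍ", "l": "ᶅ", "s": "ᶊ",
}


def to_hooked(ipa: str) -> str:
    """Replace each 'consonant + ʲ' pair with a single hooked symbol."""
    out = []
    i = 0
    while i < len(ipa):
        ch = ipa[i]
        # A consonant followed by ʲ becomes one hooked symbol.
        if i + 1 < len(ipa) and ipa[i + 1] == "ʲ" and ch in HOOKED:
            out.append(HOOKED[ch])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(to_hooked("lʲubˈlʲu"))
```

The point of the single-symbol notation is visible in the result: nothing in the output can be misread as a separate j-glide.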

It may seem like there are a lot of new consonants, but that is not true. The only difference in articulation between a "normal" and a palatalized consonant is the position of the front of the tongue, while everything else remains the same.

In fact, palatalization occurs in English as well. Compare the [s] in sit and seat, [b] in bin and bean, etc. Note how the front of the tongue is raised when the consonant occurs before a higher vowel. In English, palatalization of consonants is not used to distinguish different words, which is why most English speakers don't notice it. In Russian it can alter the meaning of the word, and therefore it is very important to distinguish between palatalized and non-palatalized consonants.

In articulation of palatalized consonants, the front of the tongue is raised toward the hard palate to create an [i] shape in addition to the main articulation of the consonant. It accompanies the main articulation without otherwise changing it. The two shapes create a single merged palatalized consonant. For more details, see Avanesov, 1984, pp. 63-64, and Jones & Ward, 1969, pp. 81-83.

Palatalization before a front vowel

In most cases, palatalization occurs when a consonant is followed by a front vowel ([i] or [e]). In such cases it occurs naturally. The speaker needs only to pronounce the consonant in the position of the vowel:

лисица [ᶅiˈᶊit͜sɑ] (a fox)
берег [ˈᶀeᶉek] (a shore)
перепел [ˈᶈeᶉeᶈeʟ] (a quail)
прежний [ˈpᶉeʒᶇij] (former)
телесный [ƫeˈᶅesnɨj] (corporeal)

Palatalization before other vowels

However, in about 13% of all consonant+vowel combinations, a palatalized consonant is followed by a non-front vowel. In that case, a palatalized (front and high) consonant differs in tongue position from the following low or back vowel. One needs to pay much attention to these occurrences.

In such positions, to an English native speaker a palatalized consonant sounds like a combination of a “default” consonant and a separate j-glide. If this is reproduced in speaking or singing, additional j-glides are added instead of changing the shape of the consonants to make them high and bright. For example, instead of [ᶉɑ] with a palatalized [ᶉ], [rjɑ] is pronounced. This is probably the most distinct feature of a strong English accent.

люблю [ᶅubˈᶅu] and not [ʟjubˈʟju] (I love)
меня [ᶆeˈᶇɑ] and not [mjɛˈnjɑ] (of me)
грёзы [ˈɡᶉɔzɨ] and not [ˈɡrjɔzɨ] (dreams)

One of the hardest words in that regard is люблю [ᶅubˈᶅu] ("I love"). English native speakers tend to default to one of two ways of saying it. One is to add j-glides [ʟjubʟju]; the other is to ignore palatalization altogether and say something like ‘loobloo’ [ʟubʟu]. Needless to say, both are incorrect.

One way to fix this is to add a short [i] after the palatalized consonant: [ᶅiubˈᶅiu]. Sustain the consonant in the high [i] position making sure it is palatalized (bright). Once this is mastered, make the [i] shorter and shorter. The goal is to get rid of [i] altogether and to go straight to the vowel: [ᶅubˈᶅu].

Again, there is a high [i] position combined with the main articulation of the consonant, but no separate [i] or [j] sound.

Combinations of a palatalized consonant with a j-glide also occur, although more rarely. While in these cases there is a real j-glide, the consonant should nevertheless also be palatalized:

рьяный [ˈᶉjɑnɨj] (fervent)
платье [ˈpʟɑƫjɛ] (a dress)
бьётся [ˈᶀjɔtt͜sɑ] (beats)
деревья [ᶁeˈᶉeᶌjɑ] (trees)

The "Russian L"

The difference between dark [ʟ] and bright [ᶅ] is one of the most noticeable features of Russian. There is a popular misconception that there is a special "Russian L" that is extremely dark. In fact, both sounds are familiar to any English native speaker with some knowledge of Italian. The dark [ʟ] is the same as in American English in words like mile, pile, loom, balloon, etc. No additional effort should be put into making this sound any darker than a natural English dark [l]. The bright [ᶅ] is the same as in Italian, for example in lieto, malia, pagliacci, etc.

Palatalization before a stop

When a palatalized consonant occurs before a stop (a rest, a breath, a pause of any kind), a very short vocalic offset is heard. This is a natural result of a consonant being pronounced in the high [i]-position rather than a conscious articulatory act. In IPA this offset is indicated with a superscript ʲ ([znɑƫʲ], [ᶌeᶉʲ], [kɑˈrɑᶊʲ]) mostly to reinforce the idea of palatalization of the final consonant. It should not become a full j-glide.

It is worth noting that Russian palatalized consonants are not the same as Czech “soft” consonants. For example, a soft Czech [t’] is pronounced with the tip of the tongue against the lower teeth, which changes the main articulation considerably. In Russian, however, in palatalized [ƫ] the main articulation (with the tip of the tongue against the upper teeth) remains the same as in [t]. This is important to remember for singers with much experience singing in Czech. “Soft” Czech [t’], [d’], and [ɲ] may even be perceived as a speech disorder in Russian.

Stress and inflection

Stresses are indicated with the symbol ˈ in front of the stressed syllable: [pɑˈᶅeznɨj], [zɑˈmɑx], [ᶁeᶉeˈᶌɑnnɨj]. Russian stress is similar to Italian, meaning the stressed syllables are longer and unstressed ones are shorter3. This is very important in faster passages, recitatives, or when a composer writes a number of even notes in a row. Such rhythm never sounds natural if performed exactly as written.

Necessity of choices

Any living language, just like any living organism, is an immensely complex system with lots of elements, some static, but most constantly moving and changing. In any language there is a relatively stable core and a large area of uncertainty and flux. The language is always changing with time, but there are always aspects that native speakers disagree on at any given moment. This includes vocabulary, pronunciation, and sometimes even grammar (we are not talking about regional dialects; modern Russian is quite uniform, at least much more uniform than modern English). There is also considerable difference between formal and informal speech.

In the case of diction, this means that once we move past the basics, there is a large area of uncertainty. For example, there is almost no variation in how the Russian consonant [t] is formed, but there is a whole spectrum when it comes to things like regressive palatalization, reduction of unstressed vowels, devoicing of final consonants in clitics, and so on.

Any diction resource for non-native speakers must reduce the ever-changing living organism of language to a forever static form of phonetic transcription. One and only one choice must be made in any situation where options are possible. For this reason, no phonetic reading can contain the ultimate truth. There is always room for disagreement and discussion. We see the transcription given on this resource as a set of recommendations and not as the only possible solution.

The author believes that there are two main sources on which choices should be based: 1) recommendations of respectable scholars and institutions on pronunciation based on study of modern spoken Russian, and 2) research on diction of Russian native singers. In other words, the first source is the consensus on what the pronunciation of a modern educated native speaker should be, and the second one is what the pronunciation is in actual singing. For countless reasons, these two often contradict each other, and ultimately the goal of the author, the way he sees it, is to find a balance between these two sources. We believe that the author's personal speech habits (“what sounds right to me”) should be disregarded as much as possible.

It is important to understand that such balance cannot be found once and for all. There can be no perfect solution, and any aspect of the solution can be disagreed with. We think, however, that the solution offered here provides the most important qualities necessary for lyric diction: it is clear, intelligible, free of regionalisms, more elevated than the everyday language, but at the same time doesn’t sound dated.

Contemporary Standard Russian and Old Muscovite

The standard dialect of Russian is called Contemporary Standard Russian (CSR). To a large degree it is based on the dialect of educated Muscovites. We believe that Russian lyric diction should be based on the pronunciation of CSR, with necessary adjustments.2

CSR is rooted in Old Muscovite (OM) pronunciation, with considerable differences. OM was recommended for the stage up until the 1970s and was considered more elevated than CSR. One can find features of OM in some vocal phonetic readings published outside Russia even today. However, OM sounds dated to a modern Russian, and a singer using it can appear pretentious.

One of the biggest sources of difference in pronunciation of modern educated Russian native speakers is the influence of Old Muscovite and other dialects that contributed to CSR. The degree of such influence and its specific aspects can vary greatly from person to person. Russian native singers often let their speech habits influence their diction in singing. The result is a wide variety in pronunciation. Unfortunately, many Russian singers don’t pay enough attention to diction in their native language.


Scholars talk about “elliptic” code (informal speech) and “explicit” code (formal speech) in modern spoken Russian4. The list of specifics is rather long, but the most notable are the following. In the elliptic code, the tempo of speech is faster and there is more reduction in more positions than in the explicit code. Elliptic code is generally more relaxed and less clear. Of course, there is a whole spectrum between these two points.

This is another source of inconsistency in the pronunciation of Russian native singers: different singers shift more toward the elliptic code or toward the explicit code. Once again, this seems to be due to the lack of attention to diction. Most singers likely don’t make these choices consciously.

We believe that lyric diction should be largely based on the explicit code for its clarity. Also, it is somewhat more elevated than everyday speech and therefore is more suited for poetic texts.

Reduction of unstressed vowels

While stressed vowels in Russian are generally stable, there is a lot of reduction of unstressed vowels in spoken language. One of the main reasons for reduction is shortening of unstressed vowels.

Length of the vowels is one of the biggest differences between spoken language and lyric diction. In most cases, in vocal music the tempo is much slower than average speech tempo. For example, in Russian, the vowel in the second pretonal syllable is on average 55 milliseconds long5 (the first vowel in a word like хорошо). For comparison, an average blink takes about 100 milliseconds. In Rachmaninoff’s How fair this spot, the first syllable of the word хорошо is set as an eighth note. At an average speed of quarter = 60 (one quarter note per second), an eighth note lasts half a second, or 500 milliseconds. This is an order of magnitude longer than in speech. Of course, this example is extremely generalized, but it gives a good idea of the magnitude of difference.
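
The arithmetic behind this comparison can be checked with a short Python sketch. The helper name note_duration_ms is hypothetical; the tempo (quarter = 60) and the 55-millisecond figure are the ones quoted above, and the sketch is only as general as that example.

```python
# Sketch of the sung-vs-spoken duration comparison; names are illustrative.
def note_duration_ms(quarter_bpm: float, quarters: float) -> float:
    """Length in milliseconds of a note lasting `quarters` quarter notes."""
    return 60_000 / quarter_bpm * quarters


SPOKEN_SECOND_PRETONAL_MS = 55  # average spoken length quoted in the text

eighth_ms = note_duration_ms(60, 0.5)  # an eighth note is half a quarter
ratio = eighth_ms / SPOKEN_SECOND_PRETONAL_MS

print(f"eighth note: {eighth_ms:.0f} ms, about {ratio:.1f} times the spoken vowel")
```

At faster tempos or shorter note values the ratio shrinks quickly, which is why very short notes are treated as a separate case in the recommendations on reduction.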

Reduced length of unstressed vowels in speech is the main reason for the reduction in their phonetic quality. The word хорошо from the example above would be pronounced [xərʌˈʃɔ] at natural speech tempo. The effect disappears, however, if the speaker is asked to say the same word slower. In that case, none of the vowels are reduced: [xɑrɑˈʃɔ].

There is a correlation between vibrato and the amount of reduction of a vowel in Russian native singers. In most cases we analyzed, if a vowel is too short to have any vibrato, it is more likely to come out reduced phonetically. It is a dynamic process, and if a native singer is asked to lengthen the note, the vowel in most cases will not be reduced.

As a general principle, we recommend reduction of unstressed vowels only on very short notes, i.e., on a note too short to have any vibrato. The most common cases are discussed in more detail below.6

Reduction of unstressed /a/

In spoken Russian, there are two levels of reduction of unstressed /a/: [ʌ] in the first pretonal syllable, and [ə] in all other cases. It is important to remember that Russian [ə] (schwa) is quite different from English or German. Russian [ə] is somewhat similar to [ɨ]. This causes much confusion for English native speakers. For this reason, we never use the symbol “ə” in our materials.
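
The two-level rule for spoken Russian can be written out as a small Python sketch. It describes speech only, not the lyric diction recommended here, and it uses “ə” solely to mirror the spoken-language description. The function name and the input format (a list of unreduced vowels plus the index of the stressed syllable) are assumptions made for the illustration.

```python
# Sketch of the spoken two-level reduction of unstressed /a/.
# Input format and function name are assumptions for this illustration.
def spoken_a_reduction(vowels: list[str], stressed_index: int) -> list[str]:
    """Return the vowels with unstressed [ɑ] reduced as in speech."""
    result = []
    for i, vowel in enumerate(vowels):
        if vowel != "ɑ" or i == stressed_index:
            result.append(vowel)   # stressed or not /a/: unchanged
        elif i == stressed_index - 1:
            result.append("ʌ")     # first pretonal syllable
        else:
            result.append("ə")     # all other unstressed positions
    return result


# хорошо at slow tempo has the vowels ɑ, ɑ, ɔ with stress on the last syllable.
print(spoken_a_reduction(["ɑ", "ɑ", "ɔ"], 2))
```

For хорошо this gives the vowels of the natural-tempo transcription [xərʌˈʃɔ] quoted earlier.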

Most American sources recommend two levels of reduction of unstressed /a/ for lyric diction, as they occur in spoken language. Most Russian sources, on the contrary, recommend avoiding reduction of unstressed /a/ in most cases, unless it occurs on a very short note. See, for example, Sadovnikov's recommendations (1958, p. 10).

This is also what most Russian singers do in singing. In Rachmaninoff’s Daisies, most native singers pronounce маргариток as [mɑrɡɑˈᶉitɑk], few sing [mʌrɡɑˈᶉitɑk], while we didn’t find one clear case of [mərɡʌˈᶉitɑk], as this word would be pronounced in speech.7

To summarize, reduction of quality of unstressed vowels in Russian is a direct result of their reduced length: unstressed vowels are quite short in spoken Russian (about 1/20 to 1/10 of a second), therefore their phonetic quality changes. It seems questionable to apply the same logic to vocal music, where most notes are much longer than an average length of unstressed vowels in speech. [ə] on a note of considerable length with full vibrato sounds rather strange to a native speaker. For these reasons, we recommend singing clear [ɑ] in almost all cases, except on very short notes.

Reduction of other unstressed vowels. Ikanie and Ekanie

Another noticeable difference between OM and CSR is reduction of unstressed vowels after palatalized consonants and [j] in pretonal syllables. In OM, phonemes [ɛ], [ɔ], and [ɑ] in these positions are realized as [e], and in CSR as [i]. These two systems are called ekanie and ikanie. The differences are discussed at length by Panov (1979, pp. 153-160).


[Table: ekanie (OM) compared with ikanie (CSR)]

It is easy to see that in ekanie there are three different vowel sounds that occur in these positions: [e], [i], [u]. In ikanie there are only two: [i] and [u]. For this reason, ekanie is more intelligible than ikanie. It also sounds more elevated to a modern Russian ear, while ikanie often sounds too prosaic in a poetic text.
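
The contrast between the two systems can be sketched as a simple lookup in Python. The sketch follows only what is stated above: [ɛ], [ɔ], and [ɑ] become [e] in ekanie and [i] in ikanie. That [i] and [u] pass through unchanged is an assumption inferred from the vowel counts given above, and the function name is hypothetical.

```python
# Sketch: pretonal vowels after palatalized consonants and [j].
# Assumption: [i] and [u] are unaffected in both systems.
REDUCING = {"ɛ", "ɔ", "ɑ"}


def pretonal_after_palatalized(phoneme: str, system: str) -> str:
    """Return the vowel heard in the given system ('ekanie' or 'ikanie')."""
    if system not in ("ekanie", "ikanie"):
        raise ValueError("system must be 'ekanie' or 'ikanie'")
    if phoneme in REDUCING:
        return "e" if system == "ekanie" else "i"
    return phoneme


vowels = ["i", "ɛ", "ɑ", "ɔ", "u"]
for system in ("ekanie", "ikanie"):
    heard = sorted({pretonal_after_palatalized(v, system) for v in vowels})
    print(system, heard)
```

Counting the distinct outputs gives three vowels for ekanie and two for ikanie, which is the difference in intelligibility described above.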

Following the general logic described above, we recommend any reduction of unstressed vowels only on very short notes, including after palatalized consonants and [j], where the difference between ikanie and ekanie is apparent.

While American materials on Russian lyric diction mostly seem to recommend ikanie, we recommend ekanie in most cases for the reasons described above.

Vowels between two palatalized consonants

In spoken Russian, vowels are more fronted before or after a palatalized consonant, and significantly so between two palatalized consonants. For example, the vowel in сад is a dark [ɑ], while in сядь it is [æ]. The reason for this assimilation is simple: the tongue is further front and higher for palatalized consonants, and it doesn’t leave that space fully between them. However, this logic applies only when all three sounds are pronounced within tenths of a second from each other. If a native speaker is asked to say the same word at a slower speed, the effect disappears. It is easy to see that the same logic applies to singing. For these reasons, we don’t recommend modifying vowels based on surrounding palatalized consonants in singing: сад [sɑt] and сядь [ᶊɑƫ] would have the same vowel in lyric diction.

Regressive palatalization

There is a clear trend toward reduction of the amount of regressive palatalization in modern Russian. While about a century ago most consonants before another palatalized consonant would be palatalized, in modern Russian such palatalization occurs only in some positions. The older, more pervasive pronunciation now sounds dated to most Russian native speakers. There is an extensive description of various aspects of this issue by Kalenchuk and Kasatkina (2001, pp. 14-39). According to the authors, it is acceptable in modern Russian for any consonant to be palatalized before any other palatalized consonant, although in most cases it is not recommended. Following their recommendations, in our materials, we apply regressive palatalization only in the following cases: [s] before [ƫ]; [z] before [ᶁ]; [s] and [z] before [ᶇ]; [n] before [ƫ], [ᶁ], [t͜ʃ], and [ᶋ]; [x] before [ᶄ]. Two more cases are added to facilitate legato in singing: [b] before [ᶆ]; [d] before [ᶇ].
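
The list of cases above can be restated as a lookup table. The Python sketch below is an illustration only, not a transcription tool: it assumes the word has already been split into a list of segments, the function name is hypothetical, and the long [ᶋː] is accepted alongside [ᶋ] as an assumption about how that segment might be written.

```python
# Sketch: the regressive palatalization cases listed above, as a lookup table.
# Segment lists, names, and the "ᶋː" variant are assumptions of this sketch.
HOOKED = {"s": "ᶊ", "z": "ᶎ", "n": "ᶇ", "x": "ᶍ", "b": "ᶀ", "d": "ᶁ"}

TRIGGERS = {
    "s": {"ƫ", "ᶇ"},
    "z": {"ᶁ", "ᶇ"},
    "n": {"ƫ", "ᶁ", "t͜ʃ", "ᶋ", "ᶋː"},
    "x": {"ᶄ"},
    "b": {"ᶆ"},  # added to facilitate legato
    "d": {"ᶇ"},  # added to facilitate legato
}


def regressive_palatalize(segments: list[str]) -> list[str]:
    """Palatalize a consonant when the next segment is one of its triggers."""
    out = list(segments)
    for i in range(len(out) - 1):
        triggers = TRIGGERS.get(out[i])
        if triggers and out[i + 1] in triggers:
            out[i] = HOOKED[out[i]]
    return out


# жизнь: [z] stands before palatalized [ᶇ], so it is palatalized as well.
print("".join(regressive_palatalize(["ʒ", "ɨ", "z", "ᶇ"])))
```

Any cluster that is not in the table is left untouched, so the [ɡ] before [ᶉ] in грёзы stays non-palatalized.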

Double consonants within a word become palatalized if the second one is palatalized: оттенок [ɑˈƫƫenɑk], аллея [ɑˈᶅᶅejɑ], весенний [ᶌeˈᶊeᶇᶇij], оббить [ɑˈᶀᶀiƫ], ввиду [ᶌᶌiˈdu], аммиак [ɑᶆᶆiˈɑk] etc.

Double consonants occurring between words do not always follow this rule. We think, however, that it is acceptable in such cases for all pairs of consonants to assimilate, except for [ʟ ᶅ] and [ᶅ ʟ], where the difference is the most obvious. We recommend pronouncing two consonants of different palatalization when there is enough time. In fast passages, it is probably inevitable that only the second consonant is pronounced.

Here are some examples from generally slower sections:

Мой ангел ли хранитель

mɔj ˈɑnᶃeʟ ᶅi xrɑˈᶇiƫeᶅʲ

Обман неопытной души

ɑˈbmɑ ͜ ᶇᶇeˈɔpɨtnɑj duˈʃɨ

Сон нисходит

ˈsɔ ͜ ᶇᶇiˈsxɔᶁit

Закатилось солнце

zɑkɑˈƫiʟɑ ͜ ˈssɔnt͜sɛ


In the case of double plosives, it doesn’t matter if the two components have different palatalization because only the second one is pronounced. We still indicate both as palatalized in the transcription:

Служить тебе

sʟuˈʒɨ ͜ ƫƫeˈᶀe

Средь диких скал

sᶉe ͜ˈᶁᶁiᶄix ˈskɑʟ



There are some cases, some in the most famous Russian pieces, where the pronunciation has changed significantly since the time when the poem was written. If the poem is spoken according to modern rules, the rhyme is destroyed. The choice is either to preserve the rhyme and to sound somewhat dated to the modern ear, or to sound modern but destroy the rhyme. Different singers solve this dilemma differently. We recommend keeping the rhyme and therefore the structure of the poetic text.

One notable case is in the opening lines of Rachmaninoff’s Do not sing, oh beautiful maiden:



ᶇe ˈpɔj krɑˈsɑᶌit͜sɑ pᶉi ˈmᶇe
Не пой, красавица, при мне


tɨ ˈᶈeᶊen ˈɡruᶎii ᶈeˈt͜ʃɑᶅnɑj
Ты песен Грузии печальной:


nɑpɑᶆiˈnɑjut ˈmᶇe ɑˈᶇe
Напоминают мне оне


druˈɡuju ˈʒɨᶎᶇʲ i ˈᶀeᶉeɡ ˈdɑᶅnɑj
Другую жизнь и берег дальной.


The rhyme structure of the verse is ABAB. In modern Russian, the last syllables of AA would be pronounced [mᶇe]/[ɑˈᶇi], and in BB [ᶈeˈt͜ʃɑᶅnɑj]/[ˈdɑᶅnij]. To keep the rhyme, we recommend [mᶇe]/[ɑˈᶇe] and [ᶈeˈt͜ʃɑᶅnɑj]/[ˈdɑᶅnɑj], which is closer to how it was pronounced in Pushkin's time.



1 This is different from Czech, and singers with much experience singing in that language need to be careful not to bring this habit to Russian (Cheek, 2001, p. 20).

2 According to current IPA standards, palatalized consonants are represented with superscript j-glides. For example, люблю is transcribed [lʲubˈlʲu]. However, this looks to an average English native speaker without much experience with Russian as if there were j-glides after each palatalized consonant. For this reason, obsolete symbols with hooks are used: [ᶅubˈᶅu].

3 In spoken Russian, unstressed vowels undergo reduction according to rather complex rules. In lyric diction, reduction occurs to a much smaller degree. This is discussed in more detail below.

4 First summarized by Panov (O stiljax proiznošenija (v svjazi s obščimi problemami stilistiki) [On styles of pronunciation (in connection with general issues of style)], 2004), original article published in 1963. Comrie (1996) follows his ideas and terminology.

5 Kodzasov S., 2001, p. 474.

6 The letter "o" always denotes the phoneme /a/ when not under stress (except for a few borrowed words), which can undergo reduction as described below. This is an issue of spelling, not vowel reduction.

7 It is interesting to note that higher voices tend to be more consistent in this regard than low voices. Russian basses and baritones in general tend to sing in a more open manner than their Western colleagues. The variety of shapes of vowels even under stress in Russian basses can be overwhelming.