In November 2021, linguists from around the world met in Lausanne, Switzerland, for the seventh edition of a conference focusing specifically on the “R” sound. The conference, called ’R-Atics, included a presentation on the intrusive R used in the Falkland Islands, a reconstruction of what R sounded like in historical Armenian, and a discussion of the R sounds in Shiwiar, an indigenous Ecuadorian language spoken by well under 10,000 people, among other events and talks. Don’t be too surprised if, at a future ’R-Atics conference, the “crispy R” joins the ranks of esoteric presentations from linguists obsessed with the weirdness and variation of this particular sound.
The crispy R is a phenomenon that some linguists had noticed, but which had gone largely unstudied—until the phrase “crispy R” was bestowed on it by Brian Michael Firkus, better known as Trixie Mattel, the winner of the third season of RuPaul’s Drag Race All Stars, and later popularized via TikTok. The sound is easier to point out than it is to either describe or reproduce. Some of the most frequent users of this unusual-sounding R include Kourtney Kardashian, Max Greenfield of New Girl fame, Stassi from Vanderpump Rules, and Ezra Koenig of Vampire Weekend. It sounds, to me at least, like a sort of elongated, curled sound, a laconic way of saying R.
To figure out what’s going on with this linguistic quirk, I pored over spectrograms of a podcast I like, ranking various spoken words on their degree of crispiness. I silently mouthed the word “crispy” over and over during interviews with several linguists who, I have to say, were at least as interested and enthusiastic about the crispy R as Katya (Brian Joseph McCook, Firkus’s frequent collaborator and cohost), who literally screams several times upon hearing the sound.
The linguists were careful to note that any conclusions about the crispy R at this stage are still preliminary. They’ll have to do more listening surveys, more spectrograms, and ideally capture one of these rare natural crispy R speakers and try to get an ultrasound of the way their tongues move inside their mouths. But to understand their explanation, we first need to explain what a weird, distinctive, unusual thing the R sound is.
The R sound is indeed fascinating enough to be worthy of its own conference. It was the subject of a seminal study in 1966 by the godfather of sociolinguistics, Bill Labov. This involved talking to salespeople at Manhattan department stores that catered to different socioeconomic groups, and getting them to say the phrase “fourth floor,” to see whether there was a connection between the ways they pronounce their R sounds and their social milieux.
Linguists use the word “rhoticity” to talk about the R sound on a very basic level. English is an especially ridiculous language in the way letters don’t always do what they appear to, and don’t even do that in any kind of consistent manner: Think about how utterly silly it is that “rough,” “cough,” “bough,” and “though” are all pronounced with different vowel sounds. A pretty substantial percentage of native English speakers are what’s called “non-rhotic,” meaning that in many situations, they act as if the letter R isn’t even there. Speakers of many dialects of British English, Australian English, South African English, and a vanishing number of dialects of American English are non-rhotic.
In American English, that non-rhoticity was, from the colonial era up to the early 20th century, considered prestigious: It was associated with the wealthy port cities of the Northeast that had extensive contact with Europe. (Think of FDR’s “the only thing we have to feah is feah itself,” or basically anything JFK ever said.) Labov’s study documented the decline of that prestigious connection. Non-rhoticity vanished from the upper classes as the United States overtook England as a world power. You can learn an awful lot about people, culture, and politics by studying R, it turns out. Also, it’s really weird, and linguists love that.
Here’s an example: R isn’t really a consonant. It’s kind of a vowel, at least phonologically. From the perspective of how sounds are physically generated, consonants are made by constricting or closing some part of your mouth or throat. Sometimes that’s done by closing your lips (P), or by blocking the flow of air with your tongue and then suddenly releasing it (T), that kind of thing. Vowels, on the other hand, are made with basically an open tube, from your vibrating vocal cords through your throat and out of your mouth. You make different vowel sounds by moving your tongue and lips around, but the tube stays open. R, like some other “consonants” such as W and Y, isn’t produced the way a K or B is. It’s produced like a vowel, at least in English.
The sounds indicated by many letters are pretty much the same across most languages. Take a nasal, such as M, which is basically the same in every language that has it. R is extremely not. The most obvious examples are the rolled R in Spanish; its quicker cousin in Scottish English (called an alveolar flap, basically a single beat of the tongue roll that Spanish speakers do); and the French R (called a uvular R, a guttural sound also made in other central and northern European languages, but only sometimes in Québécois French). Aside from Spanish and Scottish, these variants have no relation to each other and are not produced in remotely the same way. But they’re all Rs.
I spoke to Tara McAllister, a linguist turned speech pathology researcher. “I got interested in R because while it’s linguistically interesting, and a lot of linguists study it, it’s also clinically interesting,” she says. “A lot of kids who are in speech therapy for a long time get stuck or plateau on it.” The R sound is often one of the last sounds that young speakers learn to make, and one of the most challenging; it’s why “baby talk” often includes swapping the difficult R for an easier W. (A widdle easiew, anyway.) McAllister works with ultrasound technology, which enables her to show kids, in real time, what their tongue is doing as they speak, and to help her coach them to make changes to produce the R sound.
For most sounds, that’s a pretty simple process, in theory if not in practice, but not for R—because it can be made in multiple completely different ways. When we learn how to speak, at least outside speech therapy, we aren’t taught tongue movements. Nobody says, “Hey, in order to make a ‘th’ sound, gently pinch your tongue between your slightly open teeth and then squeeze air out of your mouth as you sort of flick your tongue back inside your mouth.” We all generally, while very young, experiment with different ways of making noise: vibrating our vocal cords, changing the shape of our mouths and lips, toying around with airflow.
Typically, we all end up doing roughly the same thing; after all, to make, say, an M sound, you kind of have to follow certain steps (air through the nose, lips closed, vocal cords vibrating). But it’s not essential; the only thing that’s really important is that a listener will interpret the sound you’re making correctly.
R is unusual in that respect because in English there are two totally different ways to make the sound. They’re more like points in a spectrum, really. “There are as many ways to pronounce R as there are speakers,” says McAllister. But linguists generally talk about two main shapes: bunched and retroflex. Bunched is, as its name suggests, made with the tongue pulled back in the mouth, all folded and crammed in there. Retroflex is made with the tip of the tongue pointed upward. They’re completely different shapes, but somehow they both end up making an R sound.
It has generally been thought that a retroflex R and a bunched R can’t reliably be distinguished by the human ear. That’s not to say there’s no difference in the way they sound, but rather that the human ear is not very good at recognizing those differences. Luckily we’re no longer restricted to the clumsy appendage of the human ear! This turns out to be key to understanding what’s going on with the crispy R, and so we turn to the machines to reveal what our dumb ears and brains have trouble distinguishing.
To find out, I talked to Jeff Mielke, a phonologist at North Carolina State University and one of the premier experts on the American R. He did a spectrogram, which shows all the frequencies in human speech, for the TikTok videos of the crispy R. But those videos have an inherent problem: The speakers are imitating the crispy R, not naturally producing it. Are they making it in the same way as natural crispy R speakers or using some totally different way to create the audio qualities they hear?
When I heard the TikTok the first time, I immediately recognized the phenomenon being discussed. It was a sound I had heard before and taken note of as different. It turns out not everyone can hear it, including many of my friends and one linguist for whom I played the video. Revealing this unexpected talent to Mielke meant that I was now a valuable resource in the early stages of crispy R research. He asked me to send him any clips I could think of featuring natural crispy R speakers. I sent him an episode of the very good podcast TrueAnon, whose two hosts, Liz Franczak and Brace Belden, both demonstrate the crispy R to varying degrees.
Some comments on the original TikTok suggested that what is being called the crispy R is actually just a retroflex R. McAllister mentioned that it’s very common to jump to a conclusion that any odd-sounding R might be retroflex rather than bunched; in fact, she suspected that her own daughter might be a retroflexer, and excitedly tested her out with the ultrasound. (Why have access to fun equipment if you’re not going to use it?) Turns out, as with most of the retroflex guessing, McAllister’s daughter was, as she put it, “the bunchiest buncher.”
This makes sense. From a study written by Mielke: “These different articulations are well known, as is the observation that the different configurations do not make a perceptible difference to the listener. That is, in contrast to many other linguistic variables, whether a person is bunching or retroflexing is not apparent just from listening.”
Studies are not really conclusive on this, but have over the years indicated that some people use bunched Rs for everything, some use retroflex Rs for everything, and some use both depending on the context. Aside from doing an ultrasound of tongue movement, which sounds very fun and like something I’d love to do at some point, there are other ways to figure out whether an R is bunched or retroflex. Mielke walked me through a couple of spectrograms of TrueAnon cohost Franczak saying the word “crew.”
Speech sounds are like musical chords. The dominant sound we hear is the lowest-frequency tone—our ears are just better at picking that up—but then those sounds have cascading harmonics, higher notes that go alongside them. It’s a little like playing a note on a piano, but then adding another note, more softly, higher up, and then another and another. Eventually the human ear stops being able to hear the difference when a new harmonic is added, or the brain stops caring about those higher notes because they’re unlikely to affect meaning.
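If it helps to see the piano analogy written down, here is a minimal Python sketch of one note with softer notes stacked above it. The 110 Hz starting pitch and the rule that each higher note is half as loud as the one below are assumptions chosen for illustration, not measurements of anyone’s voice, and the function names are hypothetical.

```python
import math


def harmonic_stack(fundamental_hz, n_harmonics, rolloff=0.5):
    """Return (frequency, relative loudness) pairs for a tone and its harmonics.

    Each harmonic sits at a whole-number multiple of the lowest tone.
    The rolloff factor (an assumption for this sketch) makes each one
    quieter than the harmonic below it.
    """
    return [
        (fundamental_hz * k, rolloff ** (k - 1))
        for k in range(1, n_harmonics + 1)
    ]


def sample(stack, t):
    """Height of the combined sound wave at time t, in seconds."""
    return sum(amp * math.sin(2 * math.pi * freq * t) for freq, amp in stack)


# Illustrative values only: a low 110 Hz tone with four notes above it.
stack = harmonic_stack(110.0, 5)
for freq, amp in stack:
    print(f"{freq:7.1f} Hz  relative loudness {amp:.3f}")
```

By the fifth note in this toy stack the relative loudness is already down to about six percent of the lowest tone, which gives a feel for why the higher layers are so easy to miss.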
Each of those notes is called a formant, and the frequency of those formants reveals the difference between Rs. The first and second formants, the lowest, are what’s most important for the human ear. “When we talk about vowel formants, we often disregard the third and higher formants. But where they’re important is in talking about how R is different from other vowel-like sounds,” says Mielke. We can technically hear frequencies at those higher formant levels, but our brains just sort of ignore them. Studies have shown that it takes a lot of weird stuff happening in those formants for us to notice.
In a bunched R, the fourth and fifth formants are very close together. In a retroflex R, they’re much farther apart. It’s a giveaway, albeit one we can’t really process without technological help. But there’s more going on than that. For one thing, it’s not exclusive: Every crispy R seems to be retroflex, but not every retroflex R sounds audibly crispy.
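The spacing idea can be sketched in a few lines of Python. Everything numeric here is made up for illustration: the 1,000 Hz cutoff and the two sets of formant values are assumptions, not figures from Mielke or from any study, and the function names are hypothetical. The only thing taken from the explanation above is the direction of the comparison: a small gap between the fourth and fifth formants points to bunched, a large gap to retroflex.

```python
def f4_f5_gap(formants_hz):
    """Distance in Hz between the fourth and fifth formants.

    formants_hz lists formant frequencies from lowest (first) upward.
    """
    if len(formants_hz) < 5:
        raise ValueError("need at least five formant values")
    return formants_hz[4] - formants_hz[3]


def guess_r_shape(formants_hz, gap_threshold_hz=1000.0):
    """Guess 'bunched' or 'retroflex' from the fourth-to-fifth formant gap.

    The threshold is an assumption for this sketch, not a published value.
    """
    if f4_f5_gap(formants_hz) > gap_threshold_hz:
        return "retroflex"
    return "bunched"


# Hypothetical formant readings in Hz, invented for illustration.
close_together = [450.0, 1300.0, 1700.0, 3400.0, 3900.0]
far_apart = [450.0, 1300.0, 1700.0, 3200.0, 4500.0]

print(guess_r_shape(close_together))
print(guess_r_shape(far_apart))
```

A real analysis would pull the formant values out of a spectrogram rather than typing them in, and a single cutoff is almost certainly too blunt for real speech. The sketch only shows the comparison itself.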
So we have something peculiar: If we assume that crispy Rs are retroflex, how is it possible that I and others can differentiate them? The spectrograms suggest that we can, but we really shouldn’t be able to. And if we can tell the difference, why don’t all retroflex Rs sound crispy?
One possible explanation shows up in the spectrogram. The Rs that I rated as “crispiest” are clumped with hard consonant sounds such as K and B. And in those cases, there’s a pretty substantial gap between the consonant and the R that follows, so “crew” sounds almost like “kuh-rew.” This is why, I think, Firkus decided on the term “crispy” to describe it. It’s not exactly evocative of a “crispy” sound (What would that even be? Like the sound of a knife running along the edge of a fresh loaf of sourdough?) but it’s useful: If you make crispy Rs, when you say the name of the phenomenon, you’ll be demonstrating it right there.
McAllister suggests that what might be happening is that, well, it’s not really about the R, but rather what the R does to a neighboring sound. A consonant such as K or B is called a “stop,” which means it is a sound that requires the cessation of noise. As you transition from that to an R sound—in a word like “crispy”—the shape of your tongue will change the path of the burst of air used for the combined sound. In “crispy,” according to this theory, it’s not the R that’s crispy. It’s the K.
(There’s another element: crispy Rs that allegedly appear at the end of words. I hear this in the way Death Cab for Cutie singer Ben Gibbard pronounces the word “year” in this song. It’s unclear if this ending R is related to the other crispy R; more study is needed.)
An especially fun thing I’ve found about linguists over the years is that they are universally very excited to hear about some weird new accent or linguistic quirk. Both McAllister and Mielke immediately got to work as soon as I introduced them to the crispy R. They posted about it on social media, shared it with other linguists whose specialties and subspecialties might provide insight, made videos, isolated and analyzed audio clips. I didn’t ask them to do this stuff. They were just psyched to dig into something new.
It’s also pretty likely that, even if linguists come up with a more precise name for the phenomenon, it will be forever referred to in academic literature and conferences as “known in the wider population as the ‘crispy R.’” All from some TikToks.