The meme hit in 2012. A noble Tumblr artist first created it. It was picked up by BuzzFeed, followed by a flood of YouTube uploads. In 2016, you’ll see tweets about it. Justin Timberlake has acknowledged it with typical good humor, deigning to sing the meme when asked—even now, more than a decade and a half later.

It’s not super fun to explain a meme, but we kind of have to, so: The “it’s gonna be may” meme is a reference to NSYNC’s 2000 hit “It’s Gonna Be Me,” in which lead singer Timberlake memorably sings the title of the song as “it’s gonna be may.” But I think what makes the meme resonate is that “it’s gonna be may” is just one example of a linguistic tendency that was weirdly popular in the late 1990s and early 2000s.

Think of Mandy Moore’s “Can-day,” Britney Spears growling “oh bay-bay bay-bay,” Gwen Stefani chanting “hey bay-bay hey bay-bay HEY.” The trend to turn the “ee” sound into “ay” continued for years, maybe most memorably in Gnarls Barkley’s “Crazy.” (Cray-zay, really.) This isn’t one guy’s vocal quirk: this is a trend, maybe a virus. Why did all these singers change their vowels in that particular way?

Accents in singers are an exercise in frustration; it’s often impossible to connect the accent in a singer’s speaking voice with the accent in his or her singing voice. Billie Joe Armstrong of Green Day—born and bred in Northern California—sings in a British accent. Adele, a Londoner, has an indistinct North American accent in her songs.

But I thought this one might be different. I asked some linguists whether any accent groups in North America turn their “ee” into “ay,” and sure enough, one does: the South. “The Southern Shift lowers and diphthongizes the vowel /i/, which is the pattern you’re observing here,” says Kara Becker, a linguist at Reed College. That’s a bit to unpack, but she’s referring to the Southern Vowel Shift, which is responsible for the changing of various vowel sounds in Southern speech. A “shift,” in linguistics, is a kind of broad-scale changing of vowel sounds: if one vowel sound changes, then something else will probably change to take the place of the first vowel. The Southern Vowel Shift is why, in Southern speech, “ride” sounds somewhere between “rod” and “rad,” “rat” sounds like “ray-at,” that kind of thing.

Britney Spears performing in 2011. Jen/CC BY 2.0

Southern speech also has a tendency to “diphthongize” sounds. In linguistics, a monophthong is a simple, single-part vowel, like, well, “ee.” A diphthong is a two-part, more complex vowel. The “ay” sound, like in “may,” is a diphthong: it’s constructed by starting at the monophthong “eh” and sliding to the monophthong “ee.” Southern speech has tons of diphthongs, even some triphthongs (that’s a three-part vowel), way more than other dialects in North America, which is part of the reason why Southerners have a reputation for “drawling” or speaking slowly. It’s not actually slower; Southern vowels just have more stuff crammed into them.

So, okay, turning “me” to “may” is kind of Southern. This makes sense! Justin Timberlake is from Tennessee, Britney Spears is from Louisiana, Mandy Moore is from Florida, and Cee-Lo Green, the singer of Gnarls Barkley, is from Georgia. Southern singers using Southern vowels! Problem solved, right?

Except, no, not really. Those singers don’t use any other Southern vowel sounds in their songs. And we still have the vexing problem of Mandy Moore, who, though she was technically raised in the South, is from Orlando, which does not have a traditional Southern accent. Moore doesn’t demonstrate any Southern elements in her normal speaking voice, either.

Audible Southern accents are exceedingly rare in this kind of pop music. They’re common in pop-country, of course, and there are some R&B singers who will slot in Southern elements to pay homage to the fact that R&B was created in the South (using the word “ain’t,” for example). But certainly nobody would say that Britney Spears or NSYNC were Southern-sounding pop acts.

Gwen Stefani in concert in 2015. Lorie Shaull/CC BY-SA 2.0

Shockingly, linguists have not really studied the linguistics of early 2000s pop singers. But others have thought intensely about the way these stars sing. I called Lis Lewis, a professional voice teacher based in Los Angeles, who over the past 40 years has trained a dizzying array of pop stars: Rihanna, Gwen Stefani, Britney Spears, Miguel, Courtney Love, the Pussycat Dolls, many more. Her job as a voice teacher isn’t just to get a singer’s voice sounding good. She also functions sort of like a physical trainer, getting these singers prepared for the hardships of belting out songs for hours every day while on tour or in the studio. Without proper preparation, it’s easy for a singer to, essentially, pull a muscle, and lose or weaken his or her voice for a period of time—very dangerous, given the amount of money on the line.

“We have two voices, one is the high voice and one is the low voice,” says Lewis. “The high voice is called ‘head voice’ and the low voice is called ‘chest voice.’ When a song gets really big and exciting, usually toward the bridge and the last chorus, the chest voice gets higher and kind of angst-ridden, it makes it sound really urgent.”

This is very complicated and not very well understood; physiologically, there isn’t wide agreement on what makes up chest voice as opposed to head voice, or where the divide is. There doesn’t seem to be any difference in how the vocal cords vibrate, so some voice teachers avoid the distinction, but it’s been ingrained for singers for so long that it’s still the norm to talk of the two. Generally, amongst vocal teachers, it’s taught that chest voice is a range of notes wherein the breastbone is felt to vibrate, and head voice is the range higher than that, where the bones of the jaw and skull are felt to vibrate. (Falsetto, for the record, is something different.) More power is thought to come from chest voice.

What Lewis is talking about is the very upper end of the chest voice, which comes with a set of characteristics: emotion-filled, maybe a little scratchy, certainly loud. This is what we’re talking about when we say someone is “belting” out a note. This is right at the top of the range for the singer’s chest voice, meaning that the emotion-filled tone is the result of, basically, the singer having to strain to hit that note.

Billie Joe Armstrong of Green Day performing in 2009. Naomi Lir/CC BY-SA 2.0

Certain vowel sounds are easier or harder to sing when you’re straining so hard to hit a high chest voice note. “When the vowels stay small, like in ‘ee’ or ‘ooh,’ you can’t get up as high,” says Lewis. “Ee” and “ooh” are, in linguistic terms, “close” (or “high”) vowels, which means that the tongue sits high and the opening of the mouth is very small. Belting out an intense, top-of-your-range note using a close vowel is really difficult, and, says Lewis, can even lead to straining the singer’s vocal cords.

“When you’ve got those little vowels, you tend to want to slide over into your head voice,” says Lewis. But sliding into head voice would lose the power and tone you want. “So when you need a high note, you’d generally open a word like ‘me’ to ‘may,’ or ‘candy’ to ‘canday,’ or ‘you’ to ‘yuh,’” she says. Ah ha! We’ve solved it!

Listen to Stevie Wonder’s “Superstition.” The line “thirteen-month-old baby,” in most of the verses, is pronounced with a clean “ee” sound. That holds until, just as Lewis predicted, the build-up to the last chorus, when Wonder takes the melody higher and more intense. Listen at around 3:00 into the song for his sudden change to “thirteen-month-old baybay.”

Except, most of those examples, especially “it’s gonna be may” and “canday,” aren’t especially high notes for Justin Timberlake or Mandy Moore. They aren’t actually straining to hit them. So why are they acting like they are?

Lewis’s theory, which makes sense to me, is that this is an attempt to co-opt the signifiers of intensity without actually needing to use them. Wonder sings “baybay” late in “Superstition” because he’s worked his energy level up, he’s hitting a high, hard note, it bursts out naturally because that’s the way it’s comfortable for him to sing it. “It’s gonna be may” is not like that; Timberlake could sing a clean “me” there perfectly comfortably. But listeners like the intensity of lines like Wonder’s; it’s big and bold and passionate. And Wonder’s vowel sound there has come to indicate to listeners that he’s being big and bold and passionate. Timberlake, Moore, and Spears all just…use that signifier, without any of the physiological need for it. It’s fake energy. Fake passion.

Gnarls Barkley’s “Crazy” is different; when Cee-Lo Green sings the word “cray-zay,” he’s belting out a difficult high note, hence the mutation. But the very next line ends with a much lower note in “possibly,” which he sings with a clean “ee” sound. He doesn’t need to sing “possiblay,” so he doesn’t.

Turning “ee” to “ay” when it’s not strictly necessary is a savvy kind of trick. In a recent interview, Justin Timberlake even said that Max Martin, the songwriter and producer of the song (along with “I Kissed A Girl,” “We Are Never Ever Getting Back Together,” “I Want It That Way,” and about a billion other songs), told Timberlake to sing it that way. Timberlake said he thought Martin wanted him to “sound like I’m from Tennessee.”