It’s not a what, it’s a who. George Kingsley Zipf (1902–1950) wasn’t the first person to notice that the number of occurrences of any word in any given corpus of literature – of any language – is roughly inversely proportional to that word’s rank in the table of most common words, but he was the first person to realise how totally bizarre this is. For example, the most common word in English is “the”. The next most common word is “of”, and that occurs just half as often as “the” does. The third most common word occurs a third as often, the fourth a quarter, and so on until the two hundred thousand, three hundred and forty-seventh most common word (“Zipf”, if you’re interested), which occurs 1/200,347 as often as “the”. The reason this is totally out of left field is that we’re saying a cardinal quantity (one, fifty-seven, 200,347, etc.) is somehow a function of an ordinal ranking (first, second, third, etc.). The two are completely different kinds of entity – it’s like saying apples are inversely proportional to oranges. Strange strange strange.
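If you fancy checking this for yourself, here is a rough Python sketch of the idea: count the words in a text, rank them by how common they are, and compare each count with what an idealised "top count divided by rank" rule would predict. The function names (`rank_frequencies`, `zipf_prediction`) and the sample sentence are my own inventions for illustration, and the sample is far too short to show the effect properly; you would want to feed it a whole book.

```python
from collections import Counter
import re


def rank_frequencies(text):
    """Return (rank, word, count) tuples, most common word first."""
    # Crude tokeniser (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(counts, start=1)]


def zipf_prediction(top_count, rank):
    """Count that an idealised Zipf rule predicts for the word at this rank."""
    return top_count / rank


if __name__ == "__main__":
    # Made-up sample text; swap in a real corpus to see the pattern emerge.
    sample = "the cat and the dog and the bird saw the cat"
    ranked = rank_frequencies(sample)
    top_count = ranked[0][2]
    for rank, word, count in ranked[:5]:
        print(rank, word, count, round(zipf_prediction(top_count, rank), 2))
```

On a real corpus the actual counts should only roughly track the predicted column, not match it exactly.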
I’m a big fan of Vsauce on YouTube, mainly because, by all the rules of how to make a science podcast, he ought to be as irritating as hell, yet he isn’t at all. There you go; there’s hope for us all. Anyhow, his one on the Zipf effect is well worth 15 minutes of your time.
Edited to add: the Zipf effect also explains the perversities of capitalism: why the big have an unfair advantage over the small, one much greater than their relative sizes would suggest, and why Nero can have a queue coming out the door while the better – and cheaper! – indie coffee shop next door is empty. Right-wingers always talk about a “level playing field” – one of the reasons I’m a socialist is that I’d actually like one.