I raised an important point about language in yesterday’s blog (the topic of which I promised never to mention again). In it I brought up the issue of “potential words” and linked it to my article on “How Many Words are in English?”

As I say in the article, this question really doesn’t make sense for several reasons but the main reason is that not all words are real things. Let’s compare the question to the question, “How many sentences are there in English?” No one asks that question because we create sentences “on the fly”, as they say in geekish, so that there is no way to count them. Moreover, sentences are composed of words which may be rearranged in near infinite ways.

Sentences may contain an infinite number of subordinate clauses:

This is the maiden all forlorn, that milked the cow with the crumpled horn, that tossed the dog, that worried the cat, that killed the rat, that ate the malt, that lay in the house that Jack built.

This sentence continues for four more verses, where it ends perfectly arbitrarily, for it could go on forever.
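For the geekishly inclined, here is a minimal Python sketch of how that sentence grows. The names STEPS, noun_phrase and sentence are my own inventions for illustration; the noun and verb pairs come straight from the verse quoted above. Each new noun phrase simply wraps the one before it, which is all the recursion amounts to.

```python
# Each pair is (new noun, verb linking it to the previous noun phrase),
# taken from the verse quoted above. All names here are hypothetical.
STEPS = [
    ("the malt", "lay in"),
    ("the rat", "ate"),
    ("the cat", "killed"),
    ("the dog", "worried"),
    ("the cow with the crumpled horn", "tossed"),
    ("the maiden all forlorn", "milked"),
]


def noun_phrase(depth: int) -> str:
    """Build a noun phrase with `depth` nested relative clauses."""
    if depth == 0:
        return "the house that Jack built"
    noun, verb = STEPS[depth - 1]
    # The new noun takes the whole previous phrase as part of its clause.
    return f"{noun}, that {verb} {noun_phrase(depth - 1)}"


def sentence(depth: int) -> str:
    """Wrap the nested noun phrase in the rhyme's frame, 'This is ... .'"""
    return f"This is {noun_phrase(depth)}."


print(sentence(2))
print(sentence(len(STEPS)))
```

The sketch stops at six clauses only because STEPS has six entries. Append another pair and the sentence gets longer; nothing in the rule itself sets a limit.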

Now, words comprise morphemes, parts of words with meaning: amuse is a morpheme. -Ing is a morpheme that may be attached to amuse, giving amusing. Un- may be attached to that word, giving unamusing, an adjective from which the adverb unamusingly may be derived.

From unamusingly we don’t seem to have anywhere to go, so infinitely long words seem impossible in English. English is a language with a dearth of affixes (prefixes and suffixes): only about 36, most of which are seldom used. Eskimo languages, however, have somewhere in the neighborhood of 200 affixes and words in those languages get very long.

Even in affix-poor English, there are word constructions that suggest infinite extension. Let’s start with nation, from which we can derive an adjective, national. Now there is a verbal suffix -ize which attaches to any word ending in -al, giving nationalize. Guess how you get a noun from verbs in -ize: right, the suffix -ation, which puts us right back where we were with nation, doesn’t it? So why not nationalizational, setting us up for nationalizationalize, a process which could go on forever.
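The loop is easy to mimic in a few lines of Python. This is only a sketch under my own assumptions: the function name derive is made up, and the one spelling rule it applies (drop the final e of -ize before adding -ation) is the only adjustment this particular chain needs, not a general account of English spelling.

```python
from itertools import islice


def derive(stem: str = "nation"):
    """Yield the endless chain of derivations: -al, -ize, -ation, -al, ..."""
    word = stem
    while True:
        word = word + "al"           # noun -> adjective (nation -> national)
        yield word
        word = word + "ize"          # adjective -> verb (national -> nationalize)
        yield word
        word = word[:-1] + "ation"   # verb -> noun, dropping the final e
        yield word


# The generator never stops on its own, so take only the first five words.
words = list(islice(derive(), 5))
print(words)
```

The first five outputs are national, nationalize, nationalization, nationalizational and nationalizationalize. Ask islice for fifty and you get fifty, each longer than the last.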

But, I hear you whining, these words don’t mean anything! In fact, they do. The problem is—well, there are two problems. The first is that once we get past nationalizational we don’t have anything in real life for all the other derivations to refer to. But that isn’t English’s fault; the fact remains, this derivation could go on forever if its outputs were necessary. In fact, I’m not sure what the sentence in “This is the House that Jack Built” refers to, either.

The reason is the second problem: the human brain. The human brain can process only a limited amount of information in one chunk, whether that chunk be a sentence or a word. We can process long sentences better than long words, apparently, but Eskimos process words as long as English sentences, so that may be simply a matter of practice.

The biggest reason no one can ever answer the question, “How many words are there in English?”, is that most grammatically possible words in English are potential, created when needed on the fly by using the rules of lexical grammar. Even if we could spell out all those rules (and I know most of them) and could predict their output, it would not help because certain combinations of rules, as we saw above, create an infinite number of infinitely long words.

To me this aspect of English is far, far more surprising, fascinating, intriguing than a hard number for the English vocabulary. I don’t know why anyone would even be curious as to what such a number would be. I am infinitely uninterested in it.

