Monday, 11 June 2012

Investigating words: Validity (part 1: basic rules)

Posts in this series: part 1; part 2; part 3; part 4.

Since I'm not going to reblog the current episodes (here's the link page for them), I thought this would be a good week to talk about some other things.  We'll see how that turns out.

An ongoing concern for me and perhaps for other watchers of the show -- and certainly for contestants! -- is how to tell whether a particular word is going to be allowed.  A lot of the time this is fairly easy to answer, but there is also a considerable grey area.  When a contestant steps into that realm of uncertainty, David makes a ruling, but the players at home are not so fortunate.  This is the first of four posts in which I am going to attempt to set out the situation as I understand it.

Additionally, over the course of this blog several words of uncertain validity (or uncertain at the time) have come up for consideration.  If they had been chosen on the show then we would have got a ruling, of course, so these are words which have not yet been tried.  (Or possibly they were tried in episodes before this blog started.)  I'm going to list many of them in the appropriate parts, often with my guesses as to validity.

So what is the difficulty?  In a nutshell, it is the inflected forms.  English has a few standard inflected forms that cover the majority of grammatical uses; the ones of relevance to this discussion are:
  • plurals of nouns;
  • past tenses, (third person) present tenses, and present participles of verbs; and
  • comparatives and superlatives of adjectives.
Most of these follow a simple pattern (or one of a few simple patterns); inflected forms that follow these patterns are called the regular inflected forms.  Like most dictionaries, the Macquarie generally does not list these forms since they can be deduced by the reader.  Here's the Macquarie's explanatory note about inflected forms:
Inflected forms

If a headword has irregularly inflected forms, the summary of these forms is given immediately after the relevant part of speech.  Regularly inflected forms, not generally shown, include:
  1. Nouns forming a plural merely by the addition of -s or -es, such as dog (dogs) or class (classes);
  2. Verbs forming the past tense by adding -ed, such as halt (halted);
  3. Verbs forming the present tense by adding -s or -es, such as talk (talks) or smash (smashes);
  4. Verbs forming the present participle by adding -ing, such as walk (walking);
  5. Adjectives forming the comparative and superlative by adding -er and -est, such as black (blacker, blackest).
Regular forms are given, however, when necessary for clarity or the avoidance of confusion.

The past tense, past participle and present participle are given as the inflected forms of verbs.  Where, as commonly happens, the past tense and past participle are the same in form, this form is shown once.  For example, the inflected forms indicated for love are loved, loving, where loved is both the past tense and past participle.
Those with a certain mindset will note that the above explanation includes ambiguities in cases 1 and 3: Which (if either) of -s or -es is the right one to use?  Sadly, we are just supposed to know.  I regard this as terrible laziness on the part of the dictionary; they must have some internal guidelines that they work to, and they should make those explicit.
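
To make that ambiguity concrete, here is a minimal Python sketch of the five regular patterns from the note above.  The function name `regular_forms` and the part-of-speech labels are my own inventions for illustration, not anything the Macquarie publishes.  Since the note does not say when to use -s rather than -es, the sketch simply returns both candidates.

```python
def regular_forms(word, pos):
    """Return candidate regular inflected forms of `word`.

    `pos` is one of "noun", "verb" or "adjective" (hypothetical labels).
    Forms are built by bare suffixing only, as in the quoted note; no
    spelling adjustments (doubling a consonant, dropping a final e) are made.
    """
    if pos == "noun":
        suffixes = ["s", "es"]                 # pattern 1: plural
    elif pos == "verb":
        suffixes = ["ed", "s", "es", "ing"]    # patterns 2, 3 and 4
    elif pos == "adjective":
        suffixes = ["er", "est"]               # pattern 5
    else:
        raise ValueError(f"unknown part of speech: {pos!r}")
    return {word + suffix for suffix in suffixes}
```

Calling `regular_forms("class", "noun")` yields both "classes" and the obviously wrong "classs"; choosing between them is exactly the knowledge the reader is assumed to bring.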

There's a less obvious issue which is another case of being expected to already know the answer: Some adjectives, particularly those with three or more syllables, form their comparatives and superlatives by prepending "more" or "most" instead of suffixing -er or -est.  For example, we say amazing / more amazing / most amazing, rather than amazing / amazinger / amazingest.

Some other adjectives have meanings that do not admit comparative or superlative forms; common examples are adjectives describing a property which an object either has or does not have, with no gradation, such as "unique" or "pregnant".

One further issue that can arise is the question of whether certain nouns are even pluralisable; this tends to make up the majority of the cases of uncertainty, at least on my part.  The main class of interest is mass nouns such as "furniture" or "knowledge", but there may be others (such as proper nouns that name a single unique thing, although almost all proper nouns are capitalised).  My perception is that the show has always taken a rather lenient attitude towards plurals of mass nouns (rightly erring on the side of caution), but that perception is arguably affected by an error on my part, which I will explain in part 3.

Into this morass of uncertainty wades David Astle, wielding his mighty powers of arbitration.  When a word is not clearly valid then he has to make the decision as to whether it will be allowed or not.  This is a somewhat thankless task, and the ambiguities above can make it quite difficult.  Sometimes the resolution depends on technical details which the definition in the dictionary has glossed over, which is a further complication.

Probably more than any of us, David would like these decisions to be easy and obviously correct -- it is his reputation on the line, after all.  The show, too, wants a general perception that the rulings are fair.  So to cope with these ambiguous situations David has a certain number of rules that he applies, and when one of them comes into play he explains the rule so that we (the audience) can understand how to apply it in similar situations.  Eventually we can hope to build up a mostly complete picture of what will be allowed and what will not.

I will note that these rulings have changed a little over time; a quick flip through the first book (episodes 1 to 50) turns up five words (three attributed to contestants, two to David) that would definitely be rejected today.  This could be due to errors in the book -- it does have many typos, I'm afraid, and I'm prepared to believe that a contestant's invalid word is sometimes listed as valid -- but I think that it is instead indicative of David realising that a consistently applied set of rules is more predictable and thus fairer to contestants.  Even when the rules allow words that seem silly, or reject words which seem like they should be allowed, the advantages of consistency and predictability more than make up for those occasional shortcomings.

Here are the show's basic guidelines about validity:
  • Valid words are headwords, their inflected forms (regular and irregular), run-on headwords, and variant spellings.  Headwords, run-on headwords, and variant spellings must appear in bold face. 
  • Valid words must not be hyphenated, capitalised, or include an apostrophe.
  • Words that appear only in combination are not valid.
The second of these points just invites nitpicking, as it does not rule out items like "keV" (the kilo-electron-volt unit in physics) or "o.y.o." ("own-your-own").  But I think it is a reasonable inference that any punctuation or uppercase letter in the word will render it invalid; anyone who tries such a word is just asking for trouble.

The third rules out words like "mistle" that occur only in combination (in this case, only in the term "mistle thrush"); a more common example (to my dismay, since for a long time I thought that I had outdone David by finding it) is "cabernet", which is only listed in the Macquarie as part of the terms "cabernet franc" and "cabernet sauvignon".  It thus only ever appears in combination, rendering it invalid.

The first is not as appropriately encompassing as it perhaps should be.  Strictly speaking, it only allows inflected forms of headwords, and not of run-on headwords or variant spellings.  Perhaps more troubling (from an adjudicator's point of view) is that it raises the possibility of unmentioned irregular forms being allowed, which would call for external knowledge.

Fortunately the show has been sensible enough to make these guidelines rather than rules, and to have them interpreted by David.  My impression is that David is operating by very similar rules but would allow inflected forms of run-on headwords and variant spellings.  Additionally, irregular inflected forms must be listed before they will be considered.
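
Putting the guidelines and my reading of David's practice together gives something like the following Python sketch.  Everything named here is hypothetical: `TOY_ENTRIES` is invented sample data (not checked against the Macquarie), and `passes_surface_rules` and `is_valid` are my own names.  It treats headwords, run-on headwords and variant spellings alike as "bases", accepts irregular forms only if they are listed, and builds regular forms by bare suffixing.

```python
import string

# Regular suffixes per part of speech, following the Macquarie note.
REGULAR_SUFFIXES = {
    "noun": ("s", "es"),
    "verb": ("ed", "s", "es", "ing"),
    "adjective": ("er", "est"),
}

# Invented toy data standing in for dictionary entries.
TOY_ENTRIES = {
    # bold-face headwords, run-on headwords and variant spellings
    "bases": {"dog": "noun", "halt": "verb", "black": "adjective", "keV": "noun"},
    # irregular inflected forms that are explicitly listed
    "listed_irregular_forms": {"geese"},
    # words that occur only inside a longer term
    "combination_only": {"mistle", "cabernet"},
}

def passes_surface_rules(word):
    """True if the word is non-empty and made only of lowercase a-z.

    This encodes the inference that any punctuation or capital letter
    makes a word invalid, not just hyphens and apostrophes.
    """
    return word != "" and all(ch in string.ascii_lowercase for ch in word)

def is_valid(word, entries=TOY_ENTRIES):
    """Guess whether `word` would be allowed, under the assumptions above."""
    if not passes_surface_rules(word):
        return False
    if word in entries["combination_only"]:
        return False
    if word in entries["bases"] or word in entries["listed_irregular_forms"]:
        return True
    # Regular inflected form of some base, by bare suffixing only.
    for base, pos in entries["bases"].items():
        if any(word == base + suffix for suffix in REGULAR_SUFFIXES[pos]):
            return True
    return False
```

With this toy data, `is_valid("keV")`, `is_valid("o.y.o.")` and `is_valid("cabernet")` are all False, while `is_valid("halting")` and `is_valid("geese")` are True.  The sketch over-accepts in one respect: it allows "doges" as well as "dogs", because it makes no attempt to settle the -s versus -es question.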



Victor said...

Words that appear only in combination can be rather contentious. I first noted them when in some earlier game you ruled AMOUNTED invalid since it only appears followed by "to" (although even this is not listed as a verb sense).

So what do you think about EARSHOT? It is both a headword (thus valid) and only defined in combination with "within ..." or "out of ..." (and thus invalid).

On a side note, bad luck to Colin having DEBONES disallowed. This is the only time I've genuinely been annoyed with the Macquarie, because it IS listed in the online version! I suppose we'll see it in the 6th edition and that they overlooked it when printing the 5th, as it surely did not come into common usage between 2009 and today.

Geoff Bailey said...

Yes, I agree -- the whole "only in combination" thing is somewhat misapplied. My impression is that it is meant to rule out the individual words of compound terms like "bok choy" or "mistle thrush". In some cases, particularly with verb phrases like "amount to", this may feel like it falls on the wrong side of things.

I'll note that being a headword is not in and of itself a guarantee of validity; there are many headwords ruled out by other conditions, such as capitalisation, hyphenation, etc. The only case I can recall in the blog where this might matter is that of VERNICLE, which I closed the case on prematurely because of its headword listing; I shall have to revisit that at some point.

As for EARSHOT, my interpretation of matters is that it is only listed in combination in the Macquarie 5th. It is unusual (but not unique) in that the combining words come before it; thus, when it is listed the headword is provided so that the alphabetical ordering is preserved. (Another approach would be to list it only under the entries of "within" and "out of", but this would be unhelpful.)

That is, since the headword has no associated definition or part of speech, I interpret its presence as purely there for help in looking up the phrase listings associated with it, and not an indicator that it is a word in its own right (from the Macquarie's point of view).

Thus I would deem EARSHOT invalid for the show's purposes, which would be unlucky for any contestant trying it.

I should add that I disagree strongly with the Macquarie here. There's a perfectly serviceable definition of EARSHOT as a noun: "range of hearing". In fact, they define "WITHIN EARSHOT" as "within range of hearing" and "OUT OF EARSHOT" as "out of range of hearing", which just highlights the absurdity of not defining EARSHOT on its own.