Playing Letters and Numbers: Finals preview (series 4): Rating the finalists

Thursday, 1 March 2012

Now that the eight finalists are known -- although it came down to the very last game! -- I thought that I would review their performance and make some predictions about likely results in the finals. Very little should be read into this, since the game has high variability and the contestants are likely to have changed in ability in the meantime; I know that when I was told I might make it to the finals I put in some more practice, for instance. That was around the time that I started this blog, I think, and my consistency has certainly changed since then.

Note: Shaun Ellis's first two games were played last series, and I do not have a record of them. I could probably work something out, but it seemed simpler just to omit them from consideration. Also, Sam Gaffney's fourth game needed a second conundrum round during actual play. However, I am treating that game as stopping after the first conundrum in order to make the comparisons match up more sensibly.

To start with, here are the solo totals for each contestant, ordered by their average score per game. The solo total is what they would have scored for all their rounds if there were no opponent.

Contestant	Game 1	Game 2	Game 3	Game 4	Game 5	Game 6	Total	Average
Sam Gaffney	51	81	67	64	64	55	382	63.67
Kerin White	68	63	59	63	71	51	375	62.50
Alan Nash	60	73	59	54	55	69	370	61.67
Toby Baldwin	65	48	60	48	54	50	325	54.17
Daniel Chua	51	59	56	53	53	52	324	54.00
Roman Turkiewicz	68	55	57	52	34		266	53.20
Sebastian Ham	49	55	65	40	56		265	53.00
Shaun Ellis			43	59	45	44	191	47.75
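
As a rough illustration, the Total and Average columns can be reproduced with a short Python sketch. The names `solo_scores` and `total_and_average` are made up for this example; the scores are copied from the table, and contestants with fewer recorded games simply have shorter lists.

```python
# Solo scores per recorded game, copied from the table above.
solo_scores = {
    "Sam Gaffney": [51, 81, 67, 64, 64, 55],
    "Kerin White": [68, 63, 59, 63, 71, 51],
    "Alan Nash": [60, 73, 59, 54, 55, 69],
    "Toby Baldwin": [65, 48, 60, 48, 54, 50],
    "Daniel Chua": [51, 59, 56, 53, 53, 52],
    "Roman Turkiewicz": [68, 55, 57, 52, 34],
    "Sebastian Ham": [49, 55, 65, 40, 56],
    "Shaun Ellis": [43, 59, 45, 44],  # first two games not recorded
}


def total_and_average(scores):
    """Return the total and the per-game average (two decimal places)."""
    total = sum(scores)
    return total, round(total / len(scores), 2)


# Order contestants by average solo score per game, highest first.
ranking = sorted(
    solo_scores,
    key=lambda name: sum(solo_scores[name]) / len(solo_scores[name]),
    reverse=True,
)

for name in ranking:
    total, average = total_and_average(solo_scores[name])
    print(f"{name:18}{total:5}{average:8.2f}")
```

Dividing by the number of recorded games, rather than by six, is what keeps the five-game and four-game contestants comparable with the rest.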

This shows a quite clear stratification, with the top three very close to each other, then the next four clustered further down, and then another gap to Shaun. I also find it interesting how consistent Daniel was, with only an eight point gap between his lowest and highest scores.

Of course, these figures don't tell us that much; the rounds in one game may have been much easier than the rounds in another. Below is the same table with my solo scores added for comparison; this time the table is sorted by the contestant's total score as a percentage of my total score, i.e., how close they were to my performance on the same games.

(It might be more objective to compare against the combined performance of David and Lily, but that makes judging the conundrums tricky; more importantly, they are too good. Suppose the contestant matches David with a seven-letter word. If the only one was GUANACO, then that's a fantastic result; if there were a couple including MAGPIES then it's a good result; and if there were many including TEARING then it is an average result. Loosely speaking, David is equally likely to find any of those, while I will be more towards the good end of the spectrum. Or so I would like to believe.)

Contestant	Game 1	Game 2	Game 3	Game 4	Game 5	Game 6	Total	Average	% of my total
Sam Gaffney	51	81	67	64	64	55	382	63.67	97.95%
Me	63	57	68	77	64	61	390	65.00	97.95%
Kerin White	68	63	59	63	71	51	375	62.50	91.91%
Me	75	59	65	65	77	67	408	68.00	91.91%
Alan Nash	60	73	59	54	55	69	370	61.67	89.16%
Me	87	60	61	77	64	66	415	69.17	89.16%
Shaun Ellis			43	59	45	44	191	47.75	77.96%
Me			53	66	58	68	245	61.25	77.96%
Roman Turkiewicz	68	55	57	52	34		266	53.20	77.78%
Me	76	73	57	68	68		342	68.40	77.78%
Toby Baldwin	65	48	60	48	54	50	325	54.17	74.88%
Me	75	83	72	63	69	72	434	72.33	74.88%
Daniel Chua	51	59	56	53	53	52	324	54.00	72.81%
Me	77	76	74	76	76	66	445	74.17	72.81%
Sebastian Ham	49	55	65	40	56		265	53.00	71.24%
Me	68	74	85	69	76		372	74.40	71.24%
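
The percentage column is just the contestant's solo total divided by my solo total over the same games. Here is a minimal Python sketch using three rows from the table; `totals` and `percent_of_baseline` are illustrative names only.

```python
# (contestant solo total, my solo total on the same games), from the table above.
totals = {
    "Toby Baldwin": (325, 434),
    "Kerin White": (375, 408),
    "Shaun Ellis": (191, 245),
}


def percent_of_baseline(contestant_total, baseline_total):
    """Contestant's total as a percentage of the baseline total."""
    return 100 * contestant_total / baseline_total


# Sort by percentage of the baseline, highest first.
by_percent = sorted(
    totals,
    key=lambda name: percent_of_baseline(*totals[name]),
    reverse=True,
)

for name in by_percent:
    print(f"{name:14}{percent_of_baseline(*totals[name]):7.2f}%")
```

These three rows show why the ordering can differ from the raw averages: Toby's solo average is higher than Shaun's, but my totals in Toby's games were much higher, so his percentage comes out lower.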

On this basis there is further separation. If we make the unrealistic assumption that my performance is a suitable baseline of comparison, then the top three are the same but Sam has a much clearer lead over Kerin and Alan than the solo scores alone would suggest; Shaun's standing has improved greatly -- my average score was the lowest in his games -- but is still well behind the top three; and Roman has moved up a little also.

(As a curiosity, I note that my solo scores during Daniel's run were almost more consistent than his. His opponent in the last game solved the conundrum too quickly; otherwise there might have been just a three point gap between my lowest and highest scores. I erroneously thought that was the case at first, because I had not checked that game well enough.)

Of course, solo scores do not reflect the scoring of the game, and in particular the cost of finding a weak answer when a better one was relatively easy to find. Finding RATING instead of TEARING might only show up as a single point loss, instead of the seven point loss that it should be in practice. So in an attempt to take this into consideration, here are the head-to-head results (as recorded in this blog) between each finalist and myself. (Note: There are some slight differences between numbers here and those posted, due to ignoring the other contestant in those games.)

This table is sorted by the contestant's total score as a percentage of my total score; I also show the average per-game difference between their scores and mine. Positive values would reflect that the difference favours them; negative values indicate a corresponding advantage to me.

Contestant	Game 1	Game 2	Game 3	Game 4	Game 5	Game 6	Total	Avg Δ	%
Sam Gaffney	38	81	57	52	64	43	335	-1.00	98.24%
Me	63	41	51	77	54	55	341	-1.00	98.24%
Alan Nash	46	59	45	34	43	57	284	-18.17	72.26%
Me	87	60	55	67	64	60	393	-18.17	72.26%
Kerin White	38	37	35	49	35	39	233	-27.00	58.99%
Me	75	59	52	65	77	67	395	-27.00	58.99%
Shaun Ellis			33	35	33	13	114	-28.25	50.22%
Me			48	59	52	68	227	-28.25	50.22%
Roman Turkiewicz	30	21	40	32	11		134	-41.60	39.18%
Me	76	73	57	68	68		342	-41.60	39.18%
Sebastian Ham	14	43	23	27	38		145	-45.40	38.98%
Me	68	74	85	69	76		372	-45.40	38.98%
Toby Baldwin	23	21	33	31	36	14	158	-43.00	37.98%
Me	75	83	72	51	63	72	416	-43.00	37.98%
Daniel Chua	10	28	45	42	7	32	164	-44.67	37.96%
Me	77	69	68	76	76	66	432	-44.67	37.96%
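
To make the head-to-head arithmetic concrete, here is a small Python sketch. It assumes the usual letters-round scoring (one point per letter, 18 for a nine-letter word) and that only the better answer in a round scores, with both scoring when they tie; the function names are invented for this illustration. It also shows how the Avg Δ and % columns follow from the per-game scores, using Sam's row.

```python
def letters_points(word):
    """Assumed scoring: one point per letter, 18 for a nine-letter word."""
    return 18 if len(word) == 9 else len(word)


def head_to_head_round(points_a, points_b):
    """Only the better result scores; equal results both score."""
    if points_a > points_b:
        return points_a, 0
    if points_b > points_a:
        return 0, points_b
    return points_a, points_b


def average_delta(their_scores, my_scores):
    """Average per-game difference; negative values favour me."""
    return (sum(their_scores) - sum(my_scores)) / len(their_scores)


# RATING versus TEARING: a one point gap solo, seven points head-to-head.
rating, tearing = letters_points("RATING"), letters_points("TEARING")
solo_gap = tearing - rating
a, b = head_to_head_round(rating, tearing)
head_to_head_gap = b - a
print(solo_gap, head_to_head_gap)  # 1 7

# Sam's row from the table above.
sam = [38, 81, 57, 52, 64, 43]
me_vs_sam = [63, 41, 51, 77, 54, 55]
print(f"{average_delta(sam, me_vs_sam):.2f}")       # -1.00
print(f"{100 * sum(sam) / sum(me_vs_sam):.2f}%")    # 98.24%
```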

On this metric the differences are massive. (Of course, it is of very dubious validity, but we'll see where it takes us anyway.) It's no surprise that Sam stays way on top, but the gap between him and Alan has stretched out greatly, as has that between Alan and Kerin. Shaun ends up pretty well separated from the remaining four, who are all very close to each other.

Based on this data, the first three quarter-finals should go with the higher-ranked seeds, but the fourth one has Shaun facing Toby. Toby is the higher seed (his total of 296 beating Shaun's total of 280), but Shaun's head-to-head percentage against me was much larger. Will this reflect what actually occurs? I guess we'll have to see!

Update: Commenter Victor suggested that the contestants be rated by their percentage of "maximums" -- times that they achieved the best possible results from the round. I have some doubts about this as a useful measure, as does commenter Mark, but here's a table anyway:

Contestant	Letters	Numbers	Conundrum	Average per game
Sam	10	12	3	4.17
Kerin	11	9	3	3.83
Alan	8	8	3	3.17
Daniel	7	10	2	3.17
Toby	9	4	4	2.83
Sebastian	4	7	2	2.60
Shaun	6	1	1	2.00
Roman	2	3	3	1.60
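
The last column appears to be the combined number of maximums divided by the number of games counted for each contestant. The sketch below works on that assumption, with the game counts taken from the earlier tables (so only four games for Shaun); `maximums` and `maximums_per_game` are illustrative names.

```python
# (letters, numbers, conundrum maximums, games counted).
# Game counts are an assumption taken from the earlier tables.
maximums = {
    "Sam": (10, 12, 3, 6),
    "Kerin": (11, 9, 3, 6),
    "Alan": (8, 8, 3, 6),
    "Daniel": (7, 10, 2, 6),
    "Toby": (9, 4, 4, 6),
    "Sebastian": (4, 7, 2, 5),
    "Shaun": (6, 1, 1, 4),
    "Roman": (2, 3, 3, 5),
}


def maximums_per_game(letters, numbers, conundrums, games):
    """Total maximums across all round types, averaged per game."""
    return (letters + numbers + conundrums) / games


for name, counts in maximums.items():
    print(f"{name:10}{maximums_per_game(*counts):5.2f}")
```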

8 comments:

Victor said...: Hi Geoff,

Another metric you could try for comparison is percentage of maximums achieved, ie. in what percentage of rounds did the contestant find the best answer.

This may give a way of comparing contestants across shows of varying difficulty. In the long run (ie. over many shows) those who achieve a higher percentage of maximums should tend to win over those who achieve fewer.; 1 March 2012 at 10:51
Geoff Bailey said...: Victor: That's an interesting idea, and one I've seen applied to Countdown. I don't think it works that well for Letters and Numbers, though, in essence because we simply don't get enough maximums for statistical significance.

(Most of these remarks are restricted to letters rounds, as the numbers would be much more amenable to that kind of approach.)

One reason is that there's much less data to work with. A retiring champion on Letters and Numbers has played 30 letters rounds, while a Countdown octochamp has played 88.

Another is that the maximums just aren't reached that often, or at least that has been my impression; I'd have to check that. (Obviously nine-letter words are maximums, but there have been three of those from contestants all series.) Certainly programmatic searching has turned up enough obscure words that David has missed along the way.

I'd say much of this is attributable to the much lesser prominence that the show has here as opposed to the UK. (Very natural, given the differences in how long it has been running!) Combined with the population difference, we have many fewer people who are inclined to put in the practice until they can spot those maximums. (Unlike, say, Kirk Bevins, and the other rising set of Apterous players.); 1 March 2012 at 13:40
Mark said...: I don't like percentage of maximums. If a letters maximum is 9, then a player getting an 8 will be treated the same as a player getting a 5.

Also, I don't think it would necessarily be true that someone who achieves a higher percentage of maximums will tend to win over someone with a lower percentage, although often it will be true. Again using letters as an example, I think that a consistent player who usually gets 8 or 7 letter words and never gets full monties should usually beat a player who gets an occasional full monty (and therefore has a higher percentage of maximums) but also gets lots of 5 and 6 letter words.; 1 March 2012 at 17:17
Geoff Bailey said...: Mark: Your statement seems to be assuming that a full monty is always available, which is decidedly not true! Or perhaps you have misinterpreted the use of "maximums" in this context: It is a best-possible-result based on the letters (or numbers), not a nine-letter word (or exactly on target).

I still think it is a flawed metric, for the reasons you mention. However, I think it works somewhat for Countdown because of the much longer games and also that the top end players are much better -- in a finals series it will be fairly common for at least one contestant to get a maximum in each round, so the percentage of them matches well with the winner.; 1 March 2012 at 17:50
Mark said...: Geoff, yes you're right. I knew that "maximum" meant the best available, but I somehow got mixed up and wrote the second paragraph above with "maximum" meaning full monty. Silly me.; 1 March 2012 at 20:17
Victor said...: Ahh, on closer examination and actually looking over some of the blog posts here again, it does seem the analysis I proposed is quite unsuitable for the data at hand! As you noted, there is simply not enough data to work with.

I'll see if I can devise some robust method by the finals of next series :P; 1 March 2012 at 21:14
Geoff Bailey said...: No worries, Mark. It's a complicated thing to try.

I'll be interested in what you come up with, Victor. It's all a lost cause anyway as the targets shift greatly, mind you. With the exception of Roman, all of the finalists have had significant time to practice further by the time the finals came around.; 1 March 2012 at 21:23
Allan S said...: Why not define "maximum" to what David & Lily get each time...; 1 March 2012 at 22:46