the collected works of skrumgaer
table of contents:
i. the test against pure luck (TAPL)
ii. why the TAPL was changed
iii. the quality-of-opponent adjustment to the TAPL
iv. other tests
v. the test against zero datum (TAZD)
vi. the convolution integral (CI)
vii. guts per game (GPG)
viii. a rough-and-ready formula for spotting pga's
ix. points per root games played (PPRG)
x. baseball-style standings (BSS) for kdice
i. the test against pure luck (TAPL)
The Test Against Pure Luck (TAPL) is a measure of skill that eliminates the random fluctuations that are characteristic of the game-by-game scoring system and rewards players who play more games. The TAPL can be calculated from data that Ryan gives us on our profiles. This method of scoring has no official endorsement from Ryan.
In math jargon, the TAPL is a Pearson's chi-square goodness of fit test against a uniform distribution. For background, search Wikipedia for "goodness of fit".
If luck and nothing else governed the playing of kdice, a player would be equally likely to place first, second, third, fourth, fifth, sixth, or seventh in a particular game, and with a sufficient number of games played, the number of first place finishes, second place finishes, etc., for that player would be approximately the same. With skill, the distribution of games would be less uniform.
The TAPL is the normalized sum of seven numbers. The first number is found by taking the difference between the expected number of first-place finishes and the actual number of first-place finishes, given the number of games played, squaring this difference and dividing it by the expected number of first place finishes. The second number is obtained by doing the same calculation for the second-place finishes. The third number is for the third-place finishes, and so on for all seven levels of finishes. The sum is then divided by the square root of the number of games played to get a result expressed in standard deviations. The mean of the chi square goes as the number of games played, and its variance goes as the number of games played, so its standard deviation goes as the square root of the number of games played. The standard deviation is the commonly used measure for comparing scores (for example, the SAT scores are designed so that one standard deviation is 100 points) so I will use the same measure (mean divided by standard deviation) for the TAPL. The result thus obtained is multiplied by 1000 to yield a score that is comparable in magnitude to the old Elo scores used earlier in kdice.
For example, suppose I have played 35 games and I have 7 first place finishes, 6 second place finishes, 5 third place finishes, 4 fourth place finishes, 4 fifth place finishes, 4 sixth place finishes, and 5 seventh-place finishes. The expected number of finishes at each level is 5. My TAPL would be
Sum = (7 - 5)squared/5 + (6 - 5)squared/5 + (5 - 5)squared/5 + (4 - 5)squared/5 + (4 - 5)squared/5 + (4 - 5)squared/5 + (5 - 5)squared/5 = 1.60. Divide the 1.60 by 5.92 (the square root of 35), then multiply by 1000 to get a TAPL of 270.
Ryan gives us our number of places expressed as percentages instead of games played. The example above expressed in percentages would be 20%-17%-14%-11%-11%-11%-14%.
The sum of the squares can be calculated directly from the percentages, rather than the number of games, and the resultant is multiplied by the square root of number of games played and by 1000 to get the TAPL. The square root is a multiple instead of a divisor when you use the percentages because the number of games is factored out of the percentages, and the result is divided by the square root of the number of games so the overall effect is a multiple of the square root of the number of games.
To summarize, the TAPL can be calculated either by
((Sum of (expected games - actual games)squared/expected games)/square root of games) x 1000
or
(Sum of (expected % - actual %)squared/expected %) x square root of games x 1000.
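For readers who prefer code to a spreadsheet, here is a minimal Python sketch of the two formulas above. The function names are hypothetical, and the percentages are assumed to be given as decimals (0.20 for 20%). Run on the 35-game example, both routes should give the same score, about 270.

```python
from math import sqrt

def tapl_from_counts(counts):
    """TAPL from the number of finishes in each of the seven places."""
    games = sum(counts)
    expected = games / len(counts)  # pure luck: equal share for every place
    chi_square = sum((expected - actual) ** 2 / expected for actual in counts)
    return chi_square / sqrt(games) * 1000

def tapl_from_fractions(fractions, games):
    """TAPL from place percentages expressed as decimals (0.20 for 20%)."""
    expected = 1 / len(fractions)
    per_game = sum((expected - f) ** 2 / expected for f in fractions)
    return per_game * sqrt(games) * 1000

counts = [7, 6, 5, 4, 4, 4, 5]  # the 35-game example from the text
fractions = [c / sum(counts) for c in counts]
print(round(tapl_from_counts(counts)))
print(round(tapl_from_fractions(fractions, sum(counts))))
```

Note that the percentages shown on a profile are rounded, so a score computed from them will differ slightly from one computed from exact game counts.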
The TAPL is easily calculated by lifting the numbers directly off your profile and loading them into a spreadsheet. Use the Copy command on the profile page and use Paste Special (Unicode Text or Plain Text, but not HTML) in the spreadsheet.
**NOTE. Ryan's numbers are formatted as percentages and they are decimals in the spreadsheet. If you express your percentages as integers instead of decimals, you need to make a x 10 multiplication instead of a x 1000 multiplication at the end to get the TAPL.**
***NOTE. If the expected number of games in a category is less than 5 (which is to say that the total number of games played is less than 35), the TAPL will give an overstated result unless Yates' Correction is applied. To apply Yates' Correction, subtract half a game (not half a percentage point) from the absolute value of expected games less actual games before squaring. The percentage equivalent of half a game depends on how many games you have played. For example, if you have played 10 games, half a game would be 5%. After you have figured out the percentage value of half a game, set any percentage that differs from 14% by less than that value to 14%. For the remainder of the percentages, subtract the value from your percentages over 14% and add that value to your percentages below 14%. With these adjusted percentages, calculate the corrected TAPL.***
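The correction in the note above can be sketched in Python as follows. This is only an illustration working from game counts rather than percentages; the function name and the 10-game sample record are hypothetical. Deviations smaller than half a game are set to zero, which corresponds to resetting those percentages to 14%.

```python
from math import sqrt

def tapl_yates(counts):
    """TAPL with Yates' Correction, from finishes in each of the seven places."""
    games = sum(counts)
    expected = games / len(counts)
    chi_square = 0.0
    for actual in counts:
        # Shrink each deviation by half a game, but never below zero.
        deviation = max(abs(expected - actual) - 0.5, 0.0)
        chi_square += deviation ** 2 / expected
    return chi_square / sqrt(games) * 1000

# Hypothetical 10-game record: fewer than 35 games, so the correction applies.
print(round(tapl_yates([3, 2, 2, 1, 1, 1, 0])))
```

The corrected score is always at or below the uncorrected one, which is the point of the correction for small samples.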
ii. why the TAPL was changed
The original TAPL was directly proportional to the number of games played. This resulted in too wide a distribution of scores among players with different numbers of games played. The difference between two TAPLs, or a TAPL and a norm, should be expressed in standard deviations, and, as stated in section i above, the standard deviation is achieved when the square root of the number of games is used.
iii. quality-of-opponent adjustment of the TAPL
An objection that has been raised about the TAPL is that it doesn't take into account the skill of opponents, that is, a win against better opponents should count more than a win against not-so-good opponents. Ryan doesn't provide us a breakdown of wins according to quality of opponent, but if you keep track of how often you play at a particular level of table, you can make a rough quality-of-opponent adjustment to your test against pure luck score in the manner described below.
If the outcome of a game is due to pure luck (as if each player were randomly assigned a number from 1 to 7 and whoever drew the 1 was first, whoever drew the 2 was second, and so on), the scores would be based purely on luck. Those with higher scores would advance to a higher table, but the opponents at the higher table would also have arrived there as a result of pure luck. Therefore, for all games played at that table, your distribution of wins would resolve to 14-14-14-14-14-14-14, just as at the lower table. By extension of the reasoning, your pure luck distribution would be 14-14-14-14-14-14-14 at all levels of table at which you play, the only difference being that the number of games you play at the higher tables would be less because you would be eligible to play there less often.
The Quality of Opponent Adjustment (QOOA) is a comparison of how many times you play at a particular level of table compared to how many times you would play at that level if your score was a matter of pure luck.
The distribution of the number of tables of each level follows approximately the Golden Ratio (reference http://en.wikipedia.org/wiki/Golden_ratio), known from the time of the ancient Greeks. This ratio is 1.618034, and its conjugate and inverse is 0.618034. The number of kdice tables varies from time to time, but the ratio of the number of zero-limit tables to all higher tables, and the ratio of 100-limit tables to all higher tables, and the ratio of 500-limit tables to all higher tables, etc., seem to approximate the golden ratio. There is something aesthetically pleasing about this ratio for the tables, because regardless of your level of play, the proportion of the time that you can reach the next higher table is constant.
Therefore I will set up a QOOA that does not have to be re-calibrated when new levels of tables are added. It is designed for six levels of tables, which is sufficient when the proportion of games played at each level is rounded off to a whole percent. The QOOA is
QOOA = p0/371 + p1/142 + p2/54 + p3/21 + p4/8 + p5/3
where p0 is the percentage of games you have played at the zero limit tables, expressed as an integer, p1 for the next higher level of table, p2 next higher level, and so on. The table limits are to be disregarded; what is important is the level of table. The level of table is to be determined at the time you sit down. For example, if you sit down at a 500 table when there are zero-limit and 100-limit tables available, that particular game is to be counted as part of p2. If you sit at a 500 table and a 100 table is open, that game is counted in p2, but if you sit at a 500 table and there are no 100 tables open, that game would be counted in p1. If Ryan adds a new level of table or changes the limits of a table you do not need to go back and recompute your current QOOA because the level of table for each game was ascertained when you sat down. As you play more games your QOOA will change if the percentage distribution among the tables changes.
Some sample calculations:
Suppose you have played 50% of your games at the zero limit tables and 50% at the 100-limit tables. Your QOOA would be 50/371 + 50/142 or 0.49. Suppose you have played 20% of your games at the zero limit tables and 80% of your games at the 100-limit. Your QOOA would be 20/371 + 80/142 or 0.62. Your QOOA increases when you play more at the higher level tables. The QOOA multiplied by the TAPL gives you your TAPL adjusted for quality of opponents.
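The sample calculations above can be checked with a short Python sketch. The names are hypothetical; the denominators are the ones in the QOOA formula, and the percentages are whole numbers listed from the lowest level of table upward.

```python
# Denominators from the QOOA formula, lowest table level first.
QOOA_DENOMINATORS = [371, 142, 54, 21, 8, 3]

def qooa(level_percentages):
    """QOOA from the integer percentage of games played at each table level."""
    if len(level_percentages) > len(QOOA_DENOMINATORS):
        raise ValueError("the formula covers at most six levels of table")
    return sum(p / d for p, d in zip(level_percentages, QOOA_DENOMINATORS))

def adjusted_tapl(tapl, level_percentages):
    """TAPL adjusted for quality of opponents: QOOA multiplied by TAPL."""
    return qooa(level_percentages) * tapl

print(round(qooa([50, 50]), 2))  # 0.49, first sample calculation
print(round(qooa([20, 80]), 2))  # 0.62, second sample calculation
```

Levels at which no games were played can simply be left off the end of the list.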
Incidentally, the denominators in the QOOA formula closely approximate alternate numbers from the Fibonacci series 1, 1, 2, 3, 5, 8, etc. The Fibonacci numbers and golden ratio are closely linked.
iv. other tests
Plain luck is not very tough to beat; even negative skill can do it. So I offer some other tests as well:
Test Against Own Worst Rating (TAOWR): instead of using 14-14-14-14-14-14-14 as the expected score, use the profile from your lowest rating.
Test Against Own Last Zero (TAOLZ). As your rating goes up and down, you will fall back to zero periodically. Each time you are at zero, update the expected percentages with your current profile. This will toughen the challenge each time you do it. You can also have Test Against Own Last 1000, 2000, etc.
Test Against Known Bad Example (TAKBE). For this, use as the expected scores the profile of a player who is known not to have good kdice skills. For instance, you may use this profile from a player whom I won't name: 03-03-06-20-28-21-17.
v. the Test Against Zero Datum (TAZD).
When Ryan began publishing the entire roster of players (roughly 8,000 in number) in the leaderboard, I had sufficient data to estimate the typical percentage profile for a zero-score player. I took a sample of roughly 1,000 players and came up with this profile:
10.0 - 10.0 - 10.3 - 12.9 - 15.5 - 17.2 - 19.4
The Test Against Zero Datum is like the TAPL except that it uses the numbers I have just given instead of 14 across the board. Thus the TAZD gives more weight to positive skill than does the TAPL.
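A test against any expected profile, whether the zero datum or one of the profiles from section iv, can be sketched in Python as below. The function name is hypothetical. One assumption is mine and not part of the method as described: because the quoted profiles do not sum to exactly 100, the sketch rescales both profiles to fractions that sum to 1 before comparing them.

```python
from math import sqrt

# Zero-datum profile quoted above, in percent, first place to seventh.
ZERO_DATUM = [10.0, 10.0, 10.3, 12.9, 15.5, 17.2, 19.4]

def score_against_profile(actual, expected, games):
    """Chi-square score of an actual place profile against an expected one."""
    # Assumption: rescale both profiles so that each sums to 1.
    actual_total = sum(actual)
    expected_total = sum(expected)
    per_game = 0.0
    for a, e in zip(actual, expected):
        a_frac = a / actual_total
        e_frac = e / expected_total
        per_game += (e_frac - a_frac) ** 2 / e_frac
    return per_game * sqrt(games) * 1000

profile = [20, 17, 14, 11, 11, 11, 14]  # sample profile from section i
tazd = score_against_profile(profile, ZERO_DATUM, games=35)
tapl = score_against_profile(profile, [1] * 7, games=35)
print(round(tazd), round(tapl))
```

Passing a uniform expected profile reproduces the TAPL, so the same function covers both tests. For a profile that is better than average, the zero-datum score comes out higher than the pure-luck score.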
vi. the convolution integral
"Convolution integral" is a fancy name for the expected payoff per point of buy-in for a player based on his percentage profile and the payoff for each place of finish at the table. For example, the expected payoff for a player who plays only at the zero-level (60-point buy-in) tables would be
CI = (60*p1 + 42*p2 + 35*p3 + 0*p4 - 35*p5 - 42*p6 - 60*p7)/60.
The CI is exact for the 0 and 100 level tables, but it is inexact for the higher level tables because of dom. My estimates of CI for the higher level tables suggest that dom is not significant, so for now I make no correction for dom and use the CI for all levels of table. The value of the CI ranges from +1.00 to -1.00. Top players have a CI of +0.25 or so, though with the advent of tourney play, some players can reach the top 100 even with negative CIs. That is because the player's percentage profile includes only non-tourney games.
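As a minimal Python sketch of the CI formula for the zero-level tables (the names are hypothetical, and the place percentages are assumed to be decimals, first place to seventh):

```python
# Payoff for each place of finish at a 60-point buy-in table, first to seventh.
PAYOFFS = [60, 42, 35, 0, -35, -42, -60]
BUY_IN = 60

def convolution_integral(place_fractions, payoffs=PAYOFFS, buy_in=BUY_IN):
    """Expected payoff per point of buy-in for a given place profile."""
    expected_payoff = sum(pay * frac for pay, frac in zip(payoffs, place_fractions))
    return expected_payoff / buy_in

# Sample profile from section i, as decimals.
print(round(convolution_integral([0.20, 0.17, 0.14, 0.11, 0.11, 0.11, 0.14]), 2))
```

A player who always finishes first gets +1.00, one who always finishes seventh gets -1.00, and a pure-luck profile gives zero, which matches the range stated above.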
vii. guts per game (GPG)
A player's guts per game is his average buy-in per game. This statistic is not provided by Ryan, but it can be estimated by taking the ratio of a player's earned points per game (PPG) to his convolution integral (CI). The GPG rewards players who risk their points at higher tables. Since the CI has some error and a CI of zero would lead to a division-by-zero error, the GPG is capped at 6000. Players who play only at the lowest level table will have a GPG of 60.
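The estimate can be sketched in Python as follows. The function name is hypothetical, and two choices are my own reading rather than a stated rule: the 6000 cap is applied to the resulting GPG, and a CI of exactly zero returns the cap. The sketch makes no attempt to handle negative ratios.

```python
def guts_per_game(ppg, ci, cap=6000):
    """Estimated average buy-in per game: points per game divided by CI."""
    if ci == 0:
        # Assumption: a zero CI is treated as hitting the cap.
        return cap
    return min(ppg / ci, cap)

# A player with a CI of 0.25 who earns 15 points per game.
print(guts_per_game(15, 0.25))  # 60.0, consistent with play at the lowest table only
```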
viii. a rough-and-ready formula for spotting pga's
Pga's can be spotted from their stats because they play each other more often than one would expect from probabilities. What I mean by a pga is a "pre-game affiliate" which could be a pre-game ally, a pre-game enemy, or a pre-game secret admirer. To further separate these three we would need stats on how often they attack each other. For now I will stick with the general category of pre-game affiliate.
Suppose you want to know if players x and y have some sort of affiliation. Suppose that player x has sat down at a table at which player y has not appeared. What would be the probability that player y would show up and sit at that table? It would be the number of empty seats left, divided by the number of players watching at all tables of that level (assuming they are all eligible to sit in), times the probability that player y is currently among the watchers.
For each successive game played by player x (considering for the moment only games in which player y has not sat down before player x), the expected number of games in which player y would show up (if he is watching) would depend on the number of empty seats and the number of watchers at each game. For a reasonable number of games (say twenty), though, the law of averages would set in, and we could say that the expected number of times that y would show up would be the probability that y is watching, times the number of games, times the average number of empty seats when x sits down, divided by the average number of watchers.
It turns out that the number of times that y shows up is governed by the Poisson distribution, whose variance equals its mean and which for a reasonable number of games resembles the normal distribution. So we can develop a statistic that determines whether player y shows up significantly more often than would be the case if the distribution of players at the tables were purely random. I will call this statistic the Passive Participation Coefficient (PPC) and its formula is
PPC = f * square root of (w*n / (p*e))
where f is the fraction of the games played by x in which y did not sit down before x but shows up later, w is the average number of watchers, n is the number of games played by x in which y did not sit down before (whether or not y shows up), p is the probability that y is watching, and e is the average number of empty seats. If the value of the PPC is 3 or more, player y has an affiliation with player x.
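A minimal Python sketch of the PPC follows. The function name and the sample figures are hypothetical and chosen only to show the arithmetic. Because the expected number of appearances is p*n*e/w, the formula is equivalent to the observed number of appearances divided by the square root of the expected number.

```python
from math import sqrt

def ppc(f, w, n, p, e):
    """Passive Participation Coefficient.

    f: fraction of x's qualifying games in which y shows up later
    w: average number of watchers
    n: number of qualifying games played by x
    p: probability that y is watching
    e: average number of empty seats when x sits down
    """
    return f * sqrt(w * n / (p * e))

# Hypothetical figures: y shows up in 8 of 20 games, 30 watchers on average,
# 3 empty seats on average, and y is assumed to be always watching (p = 1).
score = ppc(f=8 / 20, w=30, n=20, p=1.0, e=3)
print(round(score, 2), score >= 3)
```

With these made-up figures the expected number of appearances is 2, against 8 observed, so the PPC comes out well above the threshold of 3.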
I use the word "passive" to describe this statistic because no assumption is made that player x is initiating any attempt to form an affiliation. This is merely player y's affiliation to player x. To measure player x's affiliation for player y, we would have to use different data. We would use all the games played by y in which x did not sit down earlier and calculate player y's PPC. Player y's PPC is player x's APC, which means the Active Participation Coefficient (APC). The same number serves two functions. Player x's PPC with player y is player y's APC with player x. Since we are looking for intent, it is the APC that we use to determine whether a player is a pga.
Before you rush out and use this formula to try to nail some pga's, keep a few points in mind. First, you don't know if the affiliation is friendly or hostile. Second, the value of p is hard to come by. We know how many watchers there are, but we don't know who they are. To identify watchers, you would have to have spies at all the tables, all the time, when either x or y is playing. Incidentally, if you are spying, you have to subtract yourself from the number of watchers! If you don't have a good idea of what p is, you will have to give x and y the benefit of the doubt and make p equal to one. Of course that means a smaller APC and a lesser likelihood of your having found a pga. Third, you need a good number of games, at least twenty, to have a reasonably reliable number.
ix. points per root games played (PPRG)
Points per square root of games played is a compromise between total points earned and points earned per game as a measure of player skill. Under the zero-minimum point system now in effect, the expected value of a random-walk score (no skill) goes as the square root of the number of games. So a measure of skill is the total points divided by the square root of the number of games played.
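As a small Python illustration (the function name and the two sample players are hypothetical), the PPRG sits between total points and points per game when comparing players with very different numbers of games:

```python
from math import sqrt

def pprg(total_points, games):
    """Points per root games: total points over the square root of games played."""
    return total_points / sqrt(games)

# Hypothetical players: A has more total points, B has more points per game.
print(pprg(3000, 100))  # player A: 30 points per game over 100 games
print(pprg(2000, 25))   # player B: 80 points per game over 25 games
```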
x. baseball-style standings (BSS)
In baseball, a team that loses a game is said to fall a game behind the team that wins the game, so games behind is the number of additional games that the trailing team would have to win (compared to the leader) to catch up. Teams that miss games or are not playing that day lose half a game against winners and gain half a game against losers.
The adaptation for kdice is to provide for degrees of loss. The player who finishes seventh in a game loses a full game against the winner. The player who finishes sixth loses 5/6 of a game against the winner, the player who finishes fifth loses 4/6 of a game, and so on. The total games behind assessed per contest is 3 ½. Players who have not played games are assessed one-half of a game per game not played compared to the player who has played the most games.
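The assessment can be sketched in Python as below. The function name and the sample records are hypothetical, and the sketch assumes the loss falls off evenly by place, so that a finish in place k costs (k - 1)/6 of a game. That assumption reproduces the stated total of 3 ½ games per contest.

```python
def games_behind_assessed(place_counts, most_games_played):
    """Games behind assessed to one player.

    place_counts: finishes in each place, first to seventh
    most_games_played: games played by whoever has played the most
    """
    games = sum(place_counts)
    # Assumption: place k costs (k - 1)/6 of a game, so first place costs nothing.
    sixths = sum(count * (place - 1)
                 for place, count in enumerate(place_counts, start=1))
    unplayed = 0.5 * (most_games_played - games)  # half a game per game not played
    return sixths / 6 + unplayed

# One finish in every place adds up to the 3 1/2 games assessed per contest.
print(games_behind_assessed([1, 1, 1, 1, 1, 1, 1], most_games_played=7))

# Hypothetical player with 35 games when the busiest player has 50.
print(games_behind_assessed([7, 6, 5, 4, 4, 4, 5], most_games_played=50))
```

Standings would then rank players by the smallest total, with the gap to the leader read as games behind.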
The BSS places more emphasis on number of games played than does the TAPL because the latter uses the square root of games.
(last revised 25 January 2009)