As an aside, I realize that many computer raters regard their rating schemes as some sort of "black art" and are hesitant to divulge much or any of their formulas. This rating is based on standard and straightforward statistical principles, and thus has no "magic" ingredients. However, since it would be possible to reproduce my predictive rating given the information presented here, I will require that any use of the information on this page give credit to me.
A bit of the nomenclature:
I will warn you at the start that this is a mathematically intensive description. In layman's terms, my goal is to identify the quality of a team's performance in each game, and use that to predict the odds that it will win a future game. The end result is a very accurate predictive system, which is good if you want to guess the results of a future game. However, the system sees very little difference between a close loss and a close win, thus making rather poor postseason selections.
In many ways, one has to approach golf the same way one does baseball. In baseball, the number of at-bats per game is not fixed, but rather depends on the success a team has at the plate: a team that gets more hits will send more men to the plate. Golf is the opposite, as a player's round is prolonged by each bad shot.
There are many ways to treat this from a ratings standpoint; however, given the easily available data (number of strokes per round), I opt for the simple approach outlined here.
Suppose a player is sufficiently good that he will make a "perfect" shot, on average, a fraction "p" of the time. By "perfect" shot I mean one such that the player would expect to birdie a par 3 or 4 hole with all perfect shots, or eagle a par 5. Note that I treat a par 5 as a difficult par 4 for two reasons. One is that many players routinely reach these holes in 2 shots, meaning that they play like par 4s. The other is that the USGA frequently changes par 5s into 4s to protect par; it makes no sense to rate the players any differently in such a case.
Accepting this definition, the likelihood that such a player would take "N" shots to complete his round on a course whose par equals "s+18" (so that a round of all perfect shots takes "s" shots) is given by the negative binomial distribution:
P(N|s,p) = (N-1)! / (s-1)! / (N-s)! * p^s * (1-p)^(N-s).
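As a rough illustration of the formula above, the following Python sketch evaluates P(N|s,p) directly. The function name and the example values (a par-72 course, so s = 54, and p = 0.76) are assumptions chosen for illustration only; they are not taken from the actual ratings.

```python
from math import comb


def score_probability(n_shots: int, s: int, p: float) -> float:
    """P(N|s,p): chance that the round takes exactly n_shots shots,
    given that s perfect shots are needed and each shot is perfect
    with probability p."""
    if n_shots < s:
        return 0.0  # a round cannot take fewer than s shots
    # (N-1)! / (s-1)! / (N-s)! is the binomial coefficient C(N-1, s-1)
    return comb(n_shots - 1, s - 1) * p**s * (1 - p) ** (n_shots - s)


# Illustrative values (assumptions): par 72 gives s = 72 - 18 = 54
s = 54
p = 0.76

# Probabilities for every score from s up to a generous cutoff
probs = {n: score_probability(n, s, p) for n in range(s, 200)}
total = sum(probs.values())                         # should be very close to 1
mean_score = sum(n * q for n, q in probs.items())   # should be close to s / p
```

For this distribution the expected score is s/p and the variance is s(1-p)/p^2, which is the round-to-round scatter the simplified model itself would predict.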
The statisticians will certainly note several simplifications in this model; in the interest of openness I enumerate them here:
Interestingly, these factors have minimal effects on the final rankings. The only noticeable difference is that the scatter in a player's rating from round to round is less than what you might expect from the statistical model.
If every PGA Tour event were held at the same course in the same weather, one could stop here. It would be necessary only to find each player's shot probability ("p") that best describes his performance. This is not the case, of course. Tournaments are rarely held at the same venue twice, and weather can change from day to day. Consequently, one needs to have a method of adjusting player shot probabilities for the course.
The simplest way of accomplishing this is to rate players on a linear scale from -inf to inf, and courses+days on the same scale. The player's shot probability for a given round is thus equal to some function of the sum of the player rating plus the course rating. There are other ways to do this, but since we can choose any function we desire, this will suffice.
Based on trial and error, I have found the best function to be:
p = CP(r+c),
Again, this is an approximation. Certain players perform better on certain courses than they do on others for reasons other than sheer difficulty.
Building a set of player and course ratings is fairly simple from this point onwards; one searches for the set of player and course ratings that gives the highest joint probability (the product of the individual round probabilities) of all players shooting all of their observed scores.
It should be noted that there is one free parameter too many, since one could increase all player ratings by any fixed amount, decrease all course ratings equally, and leave the probabilities unchanged. I opt to eliminate this problem by forcing the average of the course ratings to be zero.
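The search described above can be sketched as a small maximum-likelihood fit. This is only a minimal illustration under stated assumptions: the link function used here is a logistic curve standing in for the actual function, the optimizer is plain gradient ascent, and the player names, course-day labels, and scores are made up.

```python
from math import exp, lgamma, log


def link(x):
    # Stand-in link function (an assumption, not the page's actual function)
    return 1.0 / (1.0 + exp(-x))


def log_likelihood(rounds, player, course):
    """Sum of log P(N|s,p) over all rounds, with p = link(r + c)."""
    total = 0.0
    for who, where, shots, s in rounds:
        p = link(player[who] + course[where])
        # log of (N-1)! / (s-1)! / (N-s)!
        log_coeff = lgamma(shots) - lgamma(s) - lgamma(shots - s + 1)
        total += log_coeff + s * log(p) + (shots - s) * log(1.0 - p)
    return total


def fit(rounds, steps=2000, rate=0.002):
    """Gradient ascent on the log-likelihood, keeping mean course rating at zero."""
    player = {who: 0.0 for who, _, _, _ in rounds}
    course = {where: 0.0 for _, where, _, _ in rounds}
    for _ in range(steps):
        grad_p = dict.fromkeys(player, 0.0)
        grad_c = dict.fromkeys(course, 0.0)
        for who, where, shots, s in rounds:
            p = link(player[who] + course[where])
            g = s - shots * p  # d(log-likelihood)/d(r + c) for the logistic link
            grad_p[who] += g
            grad_c[where] += g
        for who in player:
            player[who] += rate * grad_p[who]
        for where in course:
            course[where] += rate * grad_c[where]
        # Remove the free parameter: shift so the course ratings average zero.
        shift = sum(course.values()) / len(course)
        for where in course:
            course[where] -= shift
        for who in player:
            player[who] += shift  # probabilities are unchanged by the shift
    return player, course


# Hypothetical data: (player, course-day, shots taken, s = par - 18)
rounds = [
    ("A", "day1", 68, 54), ("B", "day1", 72, 54), ("C", "day1", 75, 54),
    ("A", "day2", 70, 54), ("B", "day2", 74, 54), ("C", "day2", 78, 54),
]
player, course = fit(rounds)
```

Shifting the course ratings by their mean and adding the same amount to every player rating leaves each sum r+c, and therefore each probability, unchanged, which is why the constraint costs nothing in fit quality.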
The end result is that I obtain ratings for each player and course difficulty ratings from every round of every tournament. To translate these into more practical values, I provide the following stats on the ratings pages:
In practice, however, the scatter is much lower because of the correlation between the difficulty of consecutive shots and the fact that shots aren't limited to "perfect" and "perfectly horrible". Typical scatter in a player's shot rating from round to round is 0.03, meaning that a player who has played 36 rounds (9 tournaments) has an uncertainty of about 0.005 in his shot rating and 0.5 in his score rating.
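The 0.005 figure follows from the usual standard-error rule, assuming the rounds are treated as independent measurements of the same rating. A short check of the arithmetic (variable names are illustrative):

```python
from math import sqrt

round_scatter = 0.03   # typical round-to-round scatter in the shot rating
rounds_played = 36     # 9 tournaments of 4 rounds each

# Uncertainty of the average shrinks as 1 / sqrt(number of rounds)
uncertainty = round_scatter / sqrt(rounds_played)   # 0.03 / 6 = 0.005
```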
Note: if you use any of the facts, equations, or mathematical principles on this page, you must give me credit.