Golf Ratings

The newest addition to the family of "Dolphin Ratings" is the golf ratings. My other ratings are primarily useful for team sports, where there is head-to-head competition. Although golf does have head-to-head competition (match play), most professional golf tournaments use stroke play, in which each golfer competes against the whole field. Consequently one needs an entirely different mechanism for golf ratings, which is described on this page.

As an aside, I realize that many computer raters regard their rating schemes as some sort of "black art" and are hesitant to divulge much or any of their formulas. This rating is based on standard and straightforward statistical principles, and thus has no "magic" ingredients. However, since it would be possible to reproduce my predictive rating given the information presented here, I will require that any use of the information on this page give credit to me.

Contents of this page:

Modeling Golf Scores
Modeling Golf Courses
Putting it Together

A bit of the nomenclature:

"X": used to define the variable X in text
X^Y: X to the Yth power
P(B|A): Probability that B will occur, based on A
sum(i=A,B) C: The sum of C, with variable "i" going from A to B
exp, ln, sqrt: exponential, natural log, and square root functions
CP(x) = integral(z=-inf,x) exp(-0.5*z^2)/sqrt(2*pi) dz

I will warn you at the start that this is a mathematically-intensive description. In layman's terms, my goal is to identify the quality of a team's performance in each game, and use that to predict the odds that it will win a future game. The end result is a very accurate predictive system, which is good if you want to guess the results of a future game. However, the system sees very little difference between a close loss and close win, thus making rather poor postseason selections.

Modeling Golf Scores

In many ways, one has to approach golf the same way one does baseball. The number of at-bats per game is not fixed, but rather depends on the success a team has at the plate. A team that gets more hits will send more men to the plate. Golf is the opposite, as a player's round is prolonged by each bad shot.

There are many ways to treat this from a ratings standpoint; however given the easily-available data (number of strokes per round) I opt for the simple approach outlined here.

Suppose a player is sufficiently good that he will make a "perfect" shot, on average, a fraction "p" of the time. By "perfect" shot I mean one such that the player would expect to birdie a par 3 or 4 hole with all perfect shots, or eagle a par 5. Note that I treat a par 5 as a difficult par 4 for two reasons. One is that many players routinely reach these holes in 2 shots, meaning that they play like par 4s. The other is that the USGA frequently changes par 5s into 4s to protect par; it makes no sense to rate the players any differently in such a case.

Accepting this definition, the likelihood that such a player would take "N" shots to complete his round on a course whose par equals "s+18" (so that "s" perfect shots complete the round) is given by the negative binomial distribution:

   P(N|s,p) = (N-1)! / (s-1)! / (N-s)! * p^s * (1-p)^(N-s).

In English, this is the probability that the round takes exactly "N" shots: the first N-1 shots include s-1 perfect shots, and shot N is the final (s-th) perfect shot, given a probability "p" of a perfect shot.
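
As a rough illustration, the formula above can be evaluated directly. The sketch below is only one way to code it; the function name and the example values s = 50 and p = 0.70 are assumptions chosen for illustration (s = 50 would correspond to a par-72 course with four par 5s counted as par 4s), not figures taken from the ratings pages.

```python
from math import comb

def score_probability(n_shots: int, s: int, p: float) -> float:
    """P(N|s,p): chance that the round takes exactly n_shots shots.

    (N-1)! / (s-1)! / (N-s)! is the binomial coefficient C(N-1, s-1).
    """
    if n_shots < s:
        return 0.0  # a round cannot finish in fewer than s shots
    return comb(n_shots - 1, s - 1) * p**s * (1.0 - p) ** (n_shots - s)

# Illustrative values only (assumptions, not from the ratings pages)
s, p = 50, 0.70
for n in (66, 70, 74):
    print(n, round(score_probability(n, s, p), 4))

# Sanity check: the probabilities over all possible scores sum to 1
total = sum(score_probability(n, s, p) for n in range(s, 400))
print("sum over scores:", round(total, 6))
```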

The statisticians will certainly note several simplifications in this model; in the interest of openness I enumerate them here:

Not every shot is equally difficult. Some shots are innately more difficult than others.
The success of shot outcomes is not merely "perfect" or "horrid"; most imperfect shots result in an easier subsequent shot, so that the odds of two consecutive failed shots are much less than (1-p)^2.
Players tend to have strengths and weaknesses, which means that the odds of any one shot being made are not the same as for any other shot. For players whose greatest assets/weaknesses involve putting, driving, or iron play, this will largely be averaged out. However, players who are particularly strong in recovery shots or the short game tend to use these shots after bad shots.
Players can occasionally get two "perfect" shots with one, in the case of holing approach shots or driving the green on a very short par 4. This is very rare, so the statistics are not overly harmed by such an assumption.

Interestingly, these factors have minimal effects on the final rankings. The only noticeable difference is that the scatter in a player's ranking from round to round is less than what you might expect from the statistical model.

Modeling Golf Courses

If every PGA tour event were held at the same course in the same weather, one could stop here. It would be necessary only to find each player's shot probability ("p") that best describes his performance. This is not the case, of course. Tournaments are rarely held at the same venue twice, and weather can change from day to day. Consequently, one needs a method of adjusting player shot probabilities for the course.

The simplest way of accomplishing this is to rate players on a linear scale from -inf to inf, and courses+days on the same scale. The player's shot probability for a given round is thus equal to some function of the sum of the player rating plus the course rating. There are other ways to do this, but since we can choose any function we desire, this will suffice.

Based on trial and error, I have found the best function to be:

   p = CP(r+c),

where "r" is the player's rating and "c" is the course difficulty.
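
For readers who want to evaluate this function, CP(x) as defined in the nomenclature is the cumulative standard normal distribution, which can be written with the error function. The following is a minimal sketch; the function names are my own labels, not part of the rating system. Note that with this sign convention a larger "c" raises the shot probability.

```python
from math import erf, sqrt

def cp(x: float) -> float:
    """CP(x): integral of exp(-0.5*z^2)/sqrt(2*pi) from -inf to x."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def shot_probability(r: float, c: float) -> float:
    """p = CP(r+c) for player rating r and course rating c."""
    return cp(r + c)

# A combined rating of zero gives a 50% shot probability
print(shot_probability(0.0, 0.0))
# A combined rating near 0.52 gives roughly 70%
print(round(shot_probability(0.5244, 0.0), 3))
```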

Again, this is an approximation. Certain players perform better on certain courses than they do on others for reasons other than sheer difficulty.

Putting it Together

Building a set of player and course ratings is fairly simple from this point onwards; one searches for the set of player and course ratings that gives the highest joint probability, i.e. the product of P(N|s,p) over every round played, of all players shooting the scores they actually shot.

It should be noted that there is one free parameter too many, since one could increase all player ratings by any fixed amount, decrease all course ratings equally, and leave the probabilities unchanged. I opt to eliminate this problem by forcing the average of the course ratings to be zero.
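
The search itself can be done in many ways, and this page does not say which optimizer is used. The sketch below is one simple possibility under stated assumptions: it maximizes the log of the probability one rating at a time by bisection, then shifts the result so the course ratings average zero. The function names, the data layout (player, course, shots, s), and the four sample rounds are all made up for illustration.

```python
from math import erf, exp, log, pi, sqrt

def cp(x):
    p = 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return min(max(p, 1e-12), 1.0 - 1e-12)  # keep logs and ratios finite

def log_likelihood(rounds, player, course):
    """Sum of ln P(N|s,p), dropping the factorial term (rating-independent)."""
    total = 0.0
    for name, venue, n_shots, s in rounds:
        p = cp(player[name] + course[venue])
        total += s * log(p) + (n_shots - s) * log(1.0 - p)
    return total

def slope(rounds, player, course, key, is_player):
    """Derivative of the log-likelihood with respect to one rating."""
    g = 0.0
    for name, venue, n_shots, s in rounds:
        if (name if is_player else venue) != key:
            continue
        x = player[name] + course[venue]
        p = cp(x)
        density = exp(-0.5 * x * x) / sqrt(2.0 * pi)
        g += (s / p - (n_shots - s) / (1.0 - p)) * density
    return g

def fit(rounds, sweeps=50):
    player = {name: 0.0 for name, _, _, _ in rounds}
    course = {venue: 0.0 for _, venue, _, _ in rounds}
    for _ in range(sweeps):
        for table, is_player in ((player, True), (course, False)):
            for key in table:
                lo, hi = -5.0, 5.0
                for _ in range(50):  # bisection on the sign of the slope
                    table[key] = 0.5 * (lo + hi)
                    if slope(rounds, player, course, key, is_player) > 0:
                        lo = table[key]
                    else:
                        hi = table[key]
    # Remove the free parameter: force the course ratings to average zero
    shift = sum(course.values()) / len(course)
    course = {k: v - shift for k, v in course.items()}
    player = {k: v + shift for k, v in player.items()}
    return player, course

# Hypothetical rounds: (player, course/day, shots taken, s)
rounds = [("A", "X", 70, 50), ("B", "X", 72, 50),
          ("A", "Y", 72, 50), ("B", "Y", 74, 50)]
player, course = fit(rounds)
print(player, course)
```

Working with the logarithm changes nothing about where the maximum lies, but it avoids multiplying many tiny probabilities together. A real data set would need every player and course to be connected through shared rounds for the ratings to be comparable.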

The end result is that I obtain ratings for each player and course difficulty ratings from every round of every tournament. To translate these into more practical values, I provide the following stats on the ratings pages:

For players, the odds of making a successful shot on an average course. This tends to be around 70% for average players and 73% for the best players.
For players, the typical number of strokes it would take to play an average course. This tends to be around 71 for average players and 68 for the best players.
For courses/rounds, the odds of an average player making a successful shot on that course on that day. Aside from tournaments with dramatic weather changes, this tends to be consistent from round to round. Note that the difficulty is affected by the number of par 5s, which are treated as par 4s in my ratings.
For courses/rounds, the typical number of strokes an average player would take. This generally is better than the field average for rounds played before the cut, and is usually worse for rounds after the cut or all rounds of elite events.
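
To see how the shot odds and stroke counts quoted above relate, note that the mean of the shot-count distribution is s/p. The sketch below assumes that "typical" means this average, and assumes s = 50 (a par-72 course with four par 5s counted as par 4s); neither assumption is stated on this page.

```python
def expected_strokes(s: int, p: float) -> float:
    """Mean number of shots needed to collect s perfect shots at rate p."""
    return s / p

S = 50  # assumed number of perfect shots needed on an "average" course

for label, p in (("average player", 0.70), ("best players", 0.73)):
    print(label, round(expected_strokes(S, p), 1))
# average player 71.4
# best players 68.5
```

Under these assumptions the results line up with the "around 71" and "around 68" figures in the list.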

The mathematical uncertainty in a player's shot rating from a single round is roughly 0.05, corresponding to a scatter in scores of 4.5 strokes. This uncertainty drops as the square root of the number of rounds played, meaning that after four rounds the uncertainties are halved.

In practice, however, the scatter is much lower because of the correlation between the difficulty of consecutive shots and the fact that shots aren't limited to "perfect" and "perfectly horrible". Typical scatter in a player's shot rating from round to round is 0.03, meaning that a player who has played 36 rounds (9 tournaments) has an uncertainty of about 0.005 in his shot rating and 0.5 in his score rating.
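
The arithmetic behind these figures is the usual square-root rule. The short sketch below reproduces it; the conversion of 90 strokes per unit of shot rating is simply the ratio of the two numbers quoted above (4.5 strokes per 0.05), and treating that conversion as linear is my assumption.

```python
from math import sqrt

def rating_uncertainty(per_round_scatter: float, rounds_played: int) -> float:
    """Uncertainty after averaging independent rounds: scatter / sqrt(n)."""
    return per_round_scatter / sqrt(rounds_played)

# Ratio of the quoted figures: 4.5 strokes per 0.05 of shot rating
STROKES_PER_RATING_UNIT = 4.5 / 0.05

for scatter, rounds_played in ((0.05, 4), (0.03, 36)):
    u = rating_uncertainty(scatter, rounds_played)
    strokes = u * STROKES_PER_RATING_UNIT
    print(f"{rounds_played} rounds: rating +/- {u:.4f}, score +/- {strokes:.2f}")
```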


Note: if you use any of the facts, equations, or mathematical principles on this page, you must give me credit.