In a game analysis section, I showed how likely one is to observe a particular score in a game between two opponents whose location-adjusted rating difference is dr. It seems we're almost there -- just calculate the G(sa,sb) values for all games, and run with it! Not quite. Bayes' theorem indeed allows for the inversion of P(sa,sb|dr) to P(dr|sa,sb), but it introduces additional terms: the priors. Expanding dr into the difference between the two team ratings (a and b) plus the home field factor (h, assuming A is the home team), the correct equation for the inversion is:
P(a,b,h|sa,sb) = P(sa,sb|a,b,h) P(a) P(b) P(h),
up to a normalization constant. The factors P(a), P(b), and P(h) are the priors: the probabilities of the team ratings and the homefield factor before any game results are considered.
"But wait", you ask, "aren't computer ratings supposed to be unbiased? How can you justify built-in prejudices to the rankings?" Well, I justify it because Bayes' theorem demands it. If no team in the history of sport has ever achieved a ranking of 1000 sigma above the league average, the odds of one doing it now are quite slim. The trick is to define the prior in such a way as to not bias the ranking in favor of or against any team.
The way I address this problem is to use the same prior for all teams. (In college football, I use different priors for I-A, I-AA, and so on, but still rank all teams within I-A using the same prior.) This obeys the requirement that a prior be used, while not rating Ohio State better than Northern Illinois merely on the basis that Ohio State has historically been a better team. To calculate the prior mean and width, I first calculate rankings with no prior, estimate the mean and the inherent spread of the ratings (the raw standard deviation with the average rating uncertainty subtracted in quadrature), make that the prior, and recompute the rankings. If the prior mean is m and its width is d, the prior P(a) equals:
P(a) = NP(-0.5*((a-m)/d)^2)
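As a concrete sketch of the prior calibration step (the function and variable names here are my own illustrations, not part of the actual rating code), the mean and width can be estimated from an unconstrained fit like so:

```python
import numpy as np

def estimate_prior(ratings, sigmas):
    """Estimate the prior mean m and width d from a no-prior fit.

    ratings: team ratings from the unconstrained (no-prior) fit
    sigmas:  per-team rating uncertainties from that same fit
    The inherent spread is the raw standard deviation with the
    average rating uncertainty removed in quadrature.
    """
    ratings = np.asarray(ratings, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    m = ratings.mean()
    inherent_var = ratings.var() - np.mean(sigmas ** 2)
    d = np.sqrt(max(inherent_var, 0.0))  # clip in case noise exceeds spread
    return m, d
```

With m and d in hand, each team's prior contributes ((a-m)/d)^2 to -2 lnP, which is exactly the prior term that appears in the season-wide equation below.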
OK, now we have all the pieces. The probability of the entire season being produced given a set of team ratings equals the product of the probabilities of each game and all the priors. In other words:
P(r1,r2,...,rn,h) = prod(i=games) P(sai,sbi|rai,rbi,h) * prod(i=teams) P(ri) * P(h)
Most rating systems stop here, compute the maximum likelihood solution for the ratings and homefield factor, and call it a ranking. It's possible to do much better, however. Recall that all of the probabilities are NP(x) functions, which can be trivially multiplied as NP(x)*NP(y)*NP(z) = NP(x+y+z). This means that we can rewrite the above equation as:
-2 lnP = sum(i=games) (rai-rbi+h-G(sai,sbi))^2 + sum(i=teams) (ri-m)^2/d^2 + (h-hm)^2/hd^2
where hm and hd are the prior mean and width of the homefield factor.
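To make the bookkeeping explicit, here is a minimal sketch of that sum of squares (the data layout and names are illustrative assumptions, not the system's actual code):

```python
def neg2_lnP(r, h, games, m, d, hm, hd):
    """-2 lnP for a full season, up to an additive constant.

    r:      sequence of team ratings, indexed by team number
    h:      homefield factor
    games:  iterable of (home, road, G) with G = G(sa, sb) for that game
    m, d:   prior mean and width for the team ratings
    hm, hd: prior mean and width for the homefield factor
    """
    total = 0.0
    for home, road, G in games:
        total += (r[home] - r[road] + h - G) ** 2   # one term per game
    for ri in r:
        total += ((ri - m) / d) ** 2                # one prior term per team
    total += ((h - hm) / hd) ** 2                   # homefield prior term
    return total
```

Minimizing this quantity over the ratings and h is the maximum likelihood fit; the algebra that follows shows how to go further and extract uncertainties as well.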
Multiplying all of this out, one finds that -2 lnP is a second-order polynomial and can be written as:
-2 lnP = C + [ sum(i=teams) sum(j=teams) Mij ri rj ] + [ sum(i=teams) Vi ri ]
Isolating a single team's rating rk, with D collecting every term that does not involve rk, this becomes:
-2 lnP = C + D + Mkk rk^2 + [Vk + sum(i!=k) Mik ri] rk
Completing the square in rk gives:
-2 lnP = C + D - 0.25*[Vk + sum(i!=k) Mik ri]^2/Mkk + Mkk (rk + 0.5*[Vk + sum(i!=k) Mik ri]/Mkk)^2
That is, viewed as a function of the single rating r = rk alone, -2 lnP is quadratic:
-2 lnP = A r^2 + B r + C
with A = Mkk, B = Vk + sum(i!=k) Mik ri, and C absorbing everything that does not depend on rk.
P = NP(-0.5*A*(r+B/(2A))^2)
In other words, rk is normally distributed with mean -B/(2A) and standard deviation 1/sqrt(A).
Repeating this process for all teams in the rating, one can arrive at a statistically accurate rating, including uncertainty, that accounts for all interdependencies among the individual team ratings.
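One way to carry out that repeated sweep is a Gauss-Seidel-style pass over the teams; the sketch below is my own construction (assuming a symmetric matrix M, so the linear coefficient for rk picks up a factor of two from the cross terms), updating each rating to its conditional mean -B/(2A) and recording the conditional uncertainty 1/sqrt(A):

```python
import numpy as np

def fit_ratings(M, V, n_sweeps=100):
    """Coordinate-wise fit of -2 lnP = C + r.M.r + V.r (hypothetical helper).

    M must be symmetric positive definite; V is the linear-term vector.
    For each team k, holding the others fixed, -2 lnP = A rk^2 + B rk + const
    with A = M[k,k] and B = V[k] + 2 * sum_{i != k} M[k,i] r[i], so the
    conditional distribution of rk is normal with mean -B/(2A) and
    standard deviation 1/sqrt(A).
    """
    n = len(V)
    r = np.zeros(n)
    sigma = np.zeros(n)
    for _ in range(n_sweeps):
        for k in range(n):
            A = M[k, k]
            B = V[k] + 2.0 * (M[k] @ r - M[k, k] * r[k])
            r[k] = -B / (2.0 * A)        # conditional mean
            sigma[k] = 1.0 / np.sqrt(A)  # conditional uncertainty
    return r, sigma
```

For example, one game between two teams with G = 1, unit-width priors centered at zero, and no homefield term gives -2 lnP = (r0-r1-1)^2 + r0^2 + r1^2, i.e. M = [[2,-1],[-1,2]] and V = [-2,2]; the sweep pulls the teams to ratings of +1/3 and -1/3, shrunk toward the prior mean rather than the raw margin.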
This strength rating is not shown anywhere on my ranking pages, but underlies the standard, median likelihood, and predictive rankings.
Note: if you use any of the facts, equations, or mathematical principles on this page, you must give me credit.