The Pi-Rate Ratings

December 7, 2014

Introducing PiRate Basketball Ratings

Filed under: College Basketball | piratings @ 11:15 am

Today, the PiRate Ratings venture back into college basketball. Every year, usually starting after the Super Bowl, we devote our full attention to March Madness, using unique formulae to determine which teams will advance in the NCAA Tournament and which teams are pretenders.

For many years, our method was very accurate. We identified sleepers like George Mason, Virginia Commonwealth, and Butler when these teams made their famous runs deep into the dance. More than once, we flagged teams like Georgetown and Vanderbilt as highly vulnerable to an upset, and more than once the Hoyas and Commodores lost to double-digit seeds in the Round of 64.

This method chose eight NCAA Champions in a period of 11 years, but in the last three years, the game seems to have changed just enough so that the formula stopped being as effective. We knew we had to come up with a different formula, and for several months, we tested certain statistical data trying to figure out how to adjust our numbers.

In the end, we chose to totally scrap the old formula and start from scratch. A few years ago, our founder, a mathematical nerd for sure, read an interesting book, at least interesting for him. This book, Basketball on Paper, written by Dean Oliver, introduced him to “The Four Factors” that determine the outcome of basketball games.

Oliver used the same statistical parts that any basketball fan would use, but the “All-Star Mathlete” put clout behind the obvious statistics by determining how important each statistical part is. Here is what he determined:

1. Field Goal Accuracy and Defense of the same: 40%
2. Rebounding: 20%
3. Prevention of Turnovers and Ability to Force Turnovers: 25%
4. Free Throws—both getting to the line and making them: 15%

Oliver tested these four factors in the NBA over the course of multiple seasons, and they were subsequently shown to be accurate for college basketball as well, with a minor adjustment.
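To make the weighting concrete, here is a minimal Python sketch of how weights like these could be applied. It is an illustration only: the post does not describe how the PiRate formulae scale or combine the factors, so the `weighted_edge` function, the common scale, and the example numbers are all assumptions.

```python
# Oliver's Four Factor weights as quoted above, in whole percentage points.
FOUR_FACTOR_WEIGHTS = {
    "shooting": 40,
    "rebounding": 20,
    "turnovers": 25,
    "free_throws": 15,
}


def weighted_edge(edges):
    """Combine per-factor edges into one number using the weights above.

    `edges` maps each factor name to a hypothetical advantage that has
    already been put on a common scale. That scaling is an assumption;
    it is not the PiRate method.
    """
    total = sum(FOUR_FACTOR_WEIGHTS[name] * edges[name]
                for name in FOUR_FACTOR_WEIGHTS)
    return total / 100


# Made-up edges, purely for illustration.
example_edges = {
    "shooting": 2.0,
    "rebounding": 1.0,
    "turnovers": -1.0,
    "free_throws": 0.0,
}

print(sum(FOUR_FACTOR_WEIGHTS.values()))  # the weights cover 100 points
print(weighted_edge(example_edges))
```

The useful point is simply that shooting counts for as much as rebounding and free throws put together plus a bit more, so a shooting edge moves the result the most.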

Last season, we began trying to take these Four Factors and create an algorithm that “spits out” a pointspread for college games. Obviously, two more factors must be included in college basketball predictions: strength of schedule and home court advantage. The latter includes visiting team disadvantage, since some teams play worse on the road than others, while a Kentucky might actually go on the road and receive points if 5,000 Blue Misters get into the gym.

We are big fans of backtesting. It has shown positive results in stock picking, and it has shown positive results in picking football teams against the spread. You can test as many formulae as you like and find certain tendencies that lead you to higher accuracy.

After months of backtesting over the summer, we found three formulae that started to come close to actual pointspreads in past NCAA Tournaments. While we are not going to announce that we have cracked the code and found a surefire method to become wealthy, we have found the formulae that we believe produce the smallest errors when using the Four Factors.

If you know a little about statistics, you must be familiar with means and standard deviations. A mean is simply the average. If you have the numbers 2, 3, 4, 7, and 9, the mean or average of these numbers is 5. The standard deviation is a little more involved, but it basically measures how far the numbers vary from the mean. In the above sample, the standard deviation is about 2.9; loosely speaking, the numbers in the sample typically sit about 2.9 away from the mean.
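For anyone who wants to check that arithmetic, here is a short Python sketch using the standard library. The 2.9 figure matches the sample standard deviation (dividing by n - 1); the population version (dividing by n) comes out a little lower.

```python
import statistics

numbers = [2, 3, 4, 7, 9]

mean = statistics.mean(numbers)              # 25 / 5
sample_sd = statistics.stdev(numbers)        # divides by n - 1
population_sd = statistics.pstdev(numbers)   # divides by n

print(f"mean: {mean}")
print(f"sample standard deviation: {sample_sd:.1f}")
print(f"population standard deviation: {population_sd:.1f}")
```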

When the standard deviation of a formula's errors is high, the mean is not all that useful in something like picking sides against a basketball pointspread. The lower the standard deviation goes, the more accurate the formula will be. For weeks, we sought a formula that produces the lowest possible standard deviation.
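One way to picture this kind of comparison is sketched below in Python. Everything in it is hypothetical: the game margins, the two sets of predictions, and the `error_spread` helper are made up for illustration and are not our actual backtesting data or code.

```python
import statistics


def error_spread(predicted, actual):
    """Sample standard deviation of the errors (actual minus predicted)."""
    errors = [a - p for p, a in zip(predicted, actual)]
    return statistics.stdev(errors)


# Hypothetical final margins for five past games (home minus visitor).
actual_margins = [7, -3, 12, 5, -8]

# Hypothetical predicted margins from two candidate formulae.
formula_a = [5.0, -1.0, 10.0, 6.0, -6.0]
formula_b = [1.0, 4.0, 3.0, 9.0, 0.0]

spread_a = error_spread(formula_a, actual_margins)
spread_b = error_spread(formula_b, actual_margins)

# The formula with the smaller spread misses by a more consistent amount.
best = "A" if spread_a < spread_b else "B"
print(f"A: {spread_a:.2f}  B: {spread_b:.2f}  lower spread: formula {best}")
```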

In the end, we found three separate formulae, each of which had the lowest standard deviation at certain points in past NCAA seasons. Thus, as an experiment, we will go with three ratings this year to determine winners in college basketball games.

Because the five of us all work full-time jobs doing something else, and because figuring the Four Factors for every NCAA team requires typing in an entirely new set of statistics after every game, we cannot possibly pick every college basketball game. Additionally, until every college team has played about 10-12 games, the standard deviations of these formulae are far too high.

Thus, beginning in January, we will start to issue our ratings and picks for select Atlantic Coast Conference, Big Ten Conference, Big 12 Conference, Pac-12 Conference, and Southeastern Conference games, as well as other top games including teams like Gonzaga and Wichita State.

Once the season ends, we will select all the March Madness games. Remember, this is strictly an experimental exercise this year as we put these formulae into use in real time.

Here, in a nutshell, are the Four Factors plus our added strength of schedule and home court advantage. Each set of data is used in both an offensive and a defensive subset.

1. Effective Field Goal Percentage
EFG% adds three-point shooting to the equation of accuracy. If you make one three-point shot in three attempts, you have made one point per shot attempt. If you make one layup and miss one short jumper, you have also made one point per shot attempt.

Formula: [Field Goals Made + (0.5 * Three-Point Shots Made)] / Field Goals Attempted

Examples
Kentucky through 8 games: FG 222-477, 3Pt FG: 41
[222+(.5*41)]/477 = .508 or 50.8%

Kentucky Defensively: FG 116-412, 3Pt FG: 36
[116+(.5*36)]/412 = .325 or 32.5%

North Carolina through 7 games: FG 188-444, 3Pt FG: 34
[188+(.5*34)]/444 = .462 or 46.2%

North Carolina Defensively: FG 142-423, 3Pt FG: 41
[142+(.5*41)]/423 = .384 or 38.4%

Kentucky has a big advantage here on the surface before you look at who the two teams played and where these games were played.
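If you would rather let a computer do the typing, the EFG% formula is a one-liner. This is just a sketch in Python using the season totals quoted above; the function and variable names are our own for illustration.

```python
def effective_fg_pct(fg_made, fg_attempted, three_made):
    """Effective field goal percentage as a fraction (0.508 means 50.8%)."""
    return (fg_made + 0.5 * three_made) / fg_attempted


# Season totals from the examples above: FG made, FG attempted, threes made.
kentucky_offense = effective_fg_pct(222, 477, 41)
kentucky_defense = effective_fg_pct(116, 412, 36)
unc_offense = effective_fg_pct(188, 444, 34)
unc_defense = effective_fg_pct(142, 423, 41)

for label, value in [
    ("Kentucky", kentucky_offense),
    ("Kentucky opponents", kentucky_defense),
    ("North Carolina", unc_offense),
    ("North Carolina opponents", unc_defense),
]:
    print(f"{label}: {value:.1%}")
```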

2. Rebounding Rate
Getting offensive rebounds has always been a major factor in basketball success. Offensive rebounds tend to produce higher percentage shots, like tip-ins. Preventing the opponent from getting offensive rebounds is obviously equally important. This formula calculates the rate at which a team gets an offensive rebound or prevents the opponent from getting an offensive rebound.

Formula: Offensive Rebounds/(Offensive Rebounds + Opponents’ Defensive Rebounds)

Examples
Kentucky: Offensive Rebounds: 125 Opponents’ Defensive Rebounds: 148
125/(125+148) = .458 or 45.8%

Kentucky Defensively: Opponents’ Offensive Rebounds: 100 Kentucky’s Defensive Rebounds: 214
100/(100+214) = .318 or 31.8%

North Carolina: Offensive Rebounds: 120 Opponents’ Defensive Rebounds: 163
120/(120+163) = .424 or 42.4%

North Carolina Defensively: Opponents’ Offensive Rebounds: 115 North Carolina Defensive Rebounds: 190
115/(115+190) = .377 or 37.7%

Once again, Kentucky has an advantage here all things being equal.
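The same rebounding numbers can be reproduced with a short Python sketch. The function name is our own; the inputs are the totals from the examples above.

```python
def offensive_rebound_rate(off_rebounds, opp_def_rebounds):
    """Share of available offensive rebounds that the offense grabbed."""
    return off_rebounds / (off_rebounds + opp_def_rebounds)


# Offensive side: a team's own offensive boards vs. opponents' defensive boards.
kentucky_off = offensive_rebound_rate(125, 148)
unc_off = offensive_rebound_rate(120, 163)

# Defensive side: opponents' offensive boards vs. the team's defensive boards.
kentucky_allowed = offensive_rebound_rate(100, 214)
unc_allowed = offensive_rebound_rate(115, 190)

print(f"Kentucky: {kentucky_off:.1%} grabbed, {kentucky_allowed:.1%} allowed")
print(f"North Carolina: {unc_off:.1%} grabbed, {unc_allowed:.1%} allowed")
```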

3. Turnover Rate
Turnover rate, or turnover percentage, is simply the number of turnovers committed per 100 possessions; defensively, it is the number of turnovers forced per 100 possessions. Obviously, this adds another quantity that must be calculated: possessions. There are a couple of sites online that list the average number of possessions per game for each college team, but you can approximate this number rather accurately.

Calculating possessions for each team: FGA+(.475*FTA)-OR+TO, where FGA is field goals attempted, FTA is free throws attempted, OR is offensive rebounds, and TO is turnovers.
A possession ends with a field goal attempt that is made or missed and rebounded by the opponent, a free throw that is made or missed and rebounded by the opponent, or a turnover. Because some free throws are the front end of a two-shot foul, not all free throws are counted, so we multiply free throw attempts by the constant .475 (thanks to Mr. Ken Pomeroy at http://www.kenpom.com for this bit of data).

Remember that possessions per game can be affected by overtime games, where the game is more than 40 minutes long. For TO rate, this does not matter, but it will when we put pace to the equation in our algorithm.
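Here is a small Python sketch of the possession estimate, using the season totals for Kentucky (8 games) and North Carolina (7 games) that appear in the examples in this section. The function name is our own. The whole-number rounding before dividing by games simply mirrors how the per-game figures in this post were worked out.

```python
FT_WEIGHT = 0.475  # share of free throw attempts counted as ending a possession


def estimate_possessions(fga, fta, off_reb, turnovers):
    """Approximate possessions from box-score totals."""
    return fga + FT_WEIGHT * fta - off_reb + turnovers


kentucky_total = estimate_possessions(fga=477, fta=202, off_reb=125, turnovers=87)
unc_total = estimate_possessions(fga=444, fta=193, off_reb=120, turnovers=90)

# Round to whole possessions first, then average over games played.
kentucky_per_game = round(kentucky_total) / 8
unc_per_game = round(unc_total) / 7

print(f"Kentucky: {round(kentucky_total)} total, {kentucky_per_game:.1f} per game")
print(f"North Carolina: {round(unc_total)} total, {unc_per_game:.1f} per game")
```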

Formula: (TO/Possessions)*100, or turnovers per 100 possessions

Examples
Kentucky Possessions per game: FGA 477, FTA 202, OR 125 TO 87 Overtime minutes: 0
477+(.475*202)-125+87 = 534.95, or about 535 possessions in 8 games, which averages to 66.9 possessions per game
Calculating this formula defensively, UK’s opponents show 65.9 possessions per game, which can be attributed to UK winning the tip and finishing the game with the ball more than average.

Turnover Rate: 87/535*100=16.3%
Defensive Turnover Rate: 148/527*100=28.1%

North Carolina Possessions per game: FGA 444 FTA 193 OR 120 TO 90 Overtime minutes: 0
444+(.475*193)-120+90 = 505.7, or about 506 possessions in 7 games, which averages to 72.3 possessions per game
UNC’s Opponents’ Possessions=498 or 71.1 possessions per game

Turnover Rate: 90/506*100=17.8%
Defensive Turnover Rate: 112/498*100=22.5%

Once again, Kentucky enjoys the advantage in these two examples.
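The turnover rates above can be checked with a few lines of Python. This is only a sketch; the helper name is ours, and the possession totals are the rounded figures from the examples (535 and 527 for Kentucky and its opponents, 506 and 498 for North Carolina and its opponents).

```python
def per_100_possessions(count, possessions):
    """Express a counting stat as a rate per 100 possessions."""
    return count / possessions * 100


# Turnovers committed, over the team's own possessions.
kentucky_to_rate = per_100_possessions(87, 535)
unc_to_rate = per_100_possessions(90, 506)

# Turnovers forced, over the opponents' possessions.
kentucky_forced_rate = per_100_possessions(148, 527)
unc_forced_rate = per_100_possessions(112, 498)

print(f"Kentucky: {kentucky_to_rate:.1f} committed, {kentucky_forced_rate:.1f} forced")
print(f"North Carolina: {unc_to_rate:.1f} committed, {unc_forced_rate:.1f} forced")
```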

4. Free Throw Rate
This calculation has multiple mathematical geniuses in a little bit of disagreement, as there are at least three different philosophies on how to calculate this stat. The stat measures both how frequently a team gets to the foul line and how accurately it shoots foul shots, but not all math wizards agree on the proper method.

Oliver, in his original book, set FT Rate at: Free Throws Attempted/Field Goals Attempted. He posited that attempting free throws was all that mattered and that getting to the line satisfied this criterion, since it put the opposing team in foul trouble as fouls added up.

A second school of thought supported the formula as: Free Throws Made/Field Goals Attempted, believing that a made free throw added the obvious point accumulation while still including the fact that a foul was committed by the opponent.

Yet a third school of thought developed later that believed that free throws made per possession was more accurate in determining how important free throws were to the game. In mathematical tests, this metric actually proved to be a tad more accurate, but also a tad more time-consuming.

Accuracy is what we are looking for, so we will use the third option of FT Made per possession and multiply it by 100 to get a rate.

Formula: (FT Made/Possessions)*100, or free throws made per 100 possessions

Examples
Kentucky: FT Made 131 Possessions 535
131/535*100= 24.5%

Kentucky Opponents: FT Made 95 Possessions 527
95/527*100=18.0%

North Carolina: FT Made 133 Possessions 506
133/506*100=26.3%

North Carolina’s Opponents: FT Made 120 Possessions 498
120/498*100=24.1%

Kentucky enjoys a slight advantage in this statistic.
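To see how the three schools of thought differ, here is a Python sketch that runs all three definitions on Kentucky's totals from this post (202 free throws attempted, 131 made, 477 field goals attempted, 535 possessions) and then repeats the version we use for all four examples. The function names are our own labels, not standard terms.

```python
def ft_attempts_per_fga(fta, fga):
    """First definition: free throws attempted per field goal attempted."""
    return fta / fga


def ft_made_per_fga(ftm, fga):
    """Second definition: free throws made per field goal attempted."""
    return ftm / fga


def ft_made_per_100_possessions(ftm, possessions):
    """Third definition, the one used here: FT made per 100 possessions."""
    return ftm / possessions * 100


# Kentucky under all three definitions.
uk_first = ft_attempts_per_fga(202, 477)
uk_second = ft_made_per_fga(131, 477)
uk_third = ft_made_per_100_possessions(131, 535)
print(f"Kentucky: {uk_first:.3f}, {uk_second:.3f}, {uk_third:.1f}")

# The third definition for the other three examples above.
uk_opponents = ft_made_per_100_possessions(95, 527)
unc = ft_made_per_100_possessions(133, 506)
unc_opponents = ft_made_per_100_possessions(120, 498)
print(f"{uk_opponents:.1f}, {unc:.1f}, {unc_opponents:.1f}")
```

Note that the first two definitions are ratios to shot attempts while the third is a rate per 100 possessions, so the numbers are not on the same scale and should not be compared directly.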

And The Rest
Our formulae for determining Strength of Schedule as it applies to pointspreads and for determining home court advantage (and visiting team disadvantage) have not changed. How we figure these two sets of data would take much too long to explain, especially the home/visitor advantages, since there are 16 different possible variables, and in the end 90% of games will end up within a two-point swing.

Putting It All Together
Once we have the “Four Factors” calculated, and we have determined how many points to alter the final product based on schedule strength and where the game will be played, we are ready to construct a pointspread.

As previously mentioned, we ended up with three separate algorithms, each of which proved more accurate than all others tested at some point in the 21st century.

We will call these formulae: PiRate Red, PiRate White, and PiRate Blue, because there is no distinct numerical statistic that really dominates any of the trio. It is simply a rearranging of numbers, so we cannot call one rating a mean rating, another a bias rating, and the other the regular rating like we do in football.

Unlike football, where we must record the scores and stats of every game in order to calculate ratings for the entire season, this rating only requires that we have up-to-date cumulative statistics and whichever SOS rating we choose to use.

Using our example, since North Carolina visits Kentucky next Saturday, our three ratings show the Wildcats to be favored today by 12.7, 11.9, and 16.3 points in the three algorithms. Of course, both teams play other games prior to their meeting in Lexington, so these stats would be a little different by Saturday morning.

Since this is just an example, we will use this one for you to refer to. Hopefully, it will prove to be somewhat accurate, and the Wildcats will win by about 14 points.
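The "about 14 points" figure is simply the three spreads averaged. Here is a short Python sketch of that arithmetic; the post does not say which spread belongs to which of the Red, White, and Blue ratings, so the list below is left unlabeled.

```python
import statistics

# Kentucky's projected margins over North Carolina from the three algorithms.
spreads = [12.7, 11.9, 16.3]

average_spread = statistics.mean(spreads)
low, high = min(spreads), max(spreads)

print(f"average: {average_spread:.1f} points (range {low} to {high})")
```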

Look for our select basketball predictions to begin in January. In February, we will renew our weekly look at March Madness projections. Last year, we correctly picked 67 of the 68 teams on Selection Sunday morning.
