I’m not familiar with Pablo rankings. What is this about?
Pablo rankings were created almost 20 years ago for NCAA Women’s volleyball by Pablo, a poster at VolleyTalk who came up with the ranking system. For more information, there is a (badly outdated) FAQ describing the process at www.RichKern.com.
Pablo is based on a probabilistic model. Consider, for example, if you could clone a volleyball team (replace “team” with “pair” for the beach version) and had the clones play each other. Who would win? Since the teams are exactly equal, you can’t pick one over the other; there is a 50/50 chance that either team would win. On the other hand, if you put the McNamara sisters from UCLA up against a pair made up of me and my 9-year-old son, their chance of winning would be essentially 100% (there is a non-zero probability that lightning could strike them every time they tried to hit the ball, such that our team would win in a blowout! The probability of that happening might be on the order of 1 in 10^400K, but it could happen). For two teams at the collegiate level, however, the probability is going to be somewhere between these extremes. Pablo ratings try to reflect that probability that either team will win if they play. Therefore, if two teams are rated exactly the same and play, Pablo says it can’t predict who is going to win. However, if one team is rated higher than the other, then Pablo is telling you that if they play, the higher rated team is more likely to win. The greater the difference between the team ratings, the more likely the higher rated team is to win.
What’s the relationship between rating difference and winning probability?
I don’t know yet. In the Pablo FAQ at www.RichKern.com, there is a table that relates rating difference to win probability. However, that table only applies to the indoor game – best of 5, with sets to 25 (15 in set 5) – and it was determined through extensive modeling and comparison to years and years’ worth of results. The concept is the same for the beach, but I need to do the modeling, and we need more data, in order to establish the actual relationship. The principle holds either way: the larger the difference between the teams, the more likely the higher rated team is to win.
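To illustrate the general shape of such a relationship (not the actual Pablo table, which comes from modeling real results), an Elo-style logistic curve is a common way to map a rating difference to a win probability. The `scale` constant below is purely a made-up placeholder.

```python
import math

def win_probability(rating_diff, scale=1000.0):
    """Illustrative logistic mapping from rating difference to win
    probability. The logistic shape and the `scale` constant are
    assumptions for illustration only, not the real Pablo relationship."""
    return 1.0 / (1.0 + math.exp(-rating_diff / scale))

# Equal ratings give a coin flip; a large gap approaches certainty.
print(win_probability(0))           # 0.5
print(win_probability(3000) > 0.9)  # True
```

Note the symmetry: whatever probability the higher rated team gets, the lower rated team gets the complement, so the two always sum to 1.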
What’s all this talk of probabilities? Matches aren’t determined by coin flips or dice, they are determined by players and their ability to perform on the court.
As much as we’d like to think players control the outcomes of matches, there are factors outside their control that affect things. My favorite example was a match between Purdue and Utah that I attended. Purdue led 2 sets to 1 and was at match point in set 4, with Utah serving. The serve hit the net, crawled over, and fell to the floor to tie the score. Utah went on to win that set, and ultimately won set 5 to take the match. Now, you can praise the outstanding skill of the Utah team for coming from behind to pull out the win, but the truth is, had that serve been 1 inch lower, it would have been a net serve and Purdue would have won the point. Alternatively, had it been 1 inch higher, it would have passed cleanly over the net and Purdue would have had a clean play and a much better chance of siding out to win the match. It just happened that the serve landed where it did. No player has that much control over a serve; the fact that it was there, and not 1 inch higher or lower, was basically random. Yet it made the difference between winning and losing.
Over the course of a match, these types of little effects largely even out, but they don’t have to do so completely. There is a chance they won’t even out, with no explanation as to why.
I’m still not buying it. I agree that the little things matter, but as coaches, we focus on those little things, and making them to our advantage. Do you have any proof?
The best “proof” is in the pudding, as they say. And the short answer is, it works exactly as advertised. I have tested this model using years and years’ worth of real-life results for the indoor women (we are talking more than 100,000 matches), and it behaves exactly as I have described. When teams are predicted to be very close, the outcome is basically 50/50. When teams are separated by a lot, the higher rated team almost always wins (98 to 99% of the time, depending on how far apart they are). And if they are separated to the point where Pablo thinks the higher rated team should win 70% of the time, those teams win 70% of the time, and so on. Therefore, at least in indoor volleyball, the probabilistic model has proved to be robust. I will be testing the results for the beach, but for now, this is what we have.
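The kind of calibration check described above – do teams predicted to win 70% of the time actually win 70% of the time? – can be sketched as follows. This is an illustrative implementation run on synthetic data, not the actual Pablo analysis, and the `calibration_table` helper is hypothetical.

```python
import random

def calibration_table(predictions, outcomes, n_bins=5):
    """Group (predicted win probability, actual outcome) pairs into bins
    and report, per bin, the average prediction vs. the observed win
    rate. A well-calibrated model shows the two columns agreeing."""
    bins = [[] for _ in range(n_bins)]
    for p, won in zip(predictions, outcomes):
        i = min(int(p * n_bins), n_bins - 1)
        bins[i].append((p, won))
    table = []
    for b in bins:
        if b:
            avg_pred = sum(p for p, _ in b) / len(b)
            win_rate = sum(w for _, w in b) / len(b)
            table.append((round(avg_pred, 2), round(win_rate, 2), len(b)))
    return table

# Synthetic demo: outcomes drawn with exactly the predicted probability,
# so predicted and observed rates should roughly agree in every bin.
random.seed(1)
preds = [random.random() for _ in range(20000)]
outs = [1 if random.random() < p else 0 for p in preds]
for row in calibration_table(preds, outs):
    print(row)
```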
You have a team we beat rated higher than us. Doesn’t that show that there’s a problem? We already beat them once, so why do you think they’d beat us?
One of the most important features of Pablo ratings and the probability model is that upsets happen. In fact, upsets MUST happen – if they didn’t, then the probability model would be wrong. For example, if Pablo thinks the higher rated team should win 70% of the time, that doesn’t mean the lower rated team can’t win. The lower rated team should (and does) win about 30% of the time! If the lower rated team didn’t pull off the upset 30% of the time, it would mean the 70% win probability was wrong. If your team is rated lower than another team despite beating them, it means that Pablo considers your win an upset. This is based on the sum total of results for both teams: the ratings reflect each team’s full body of work, of which this one match is only a small part.
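The “upsets must happen” point is easy to check with a quick simulation: if the favorite truly wins 70% of the time, the underdog’s share of wins converges to 30%.

```python
import random

# Simulate many matches where the favorite has a 70% win probability.
random.seed(42)
n_matches = 100_000
favorite_wins = sum(1 for _ in range(n_matches) if random.random() < 0.70)
upset_rate = 1 - favorite_wins / n_matches
print(f"upset rate: {upset_rate:.3f}")  # close to 0.300
```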
But in ranking two teams, shouldn’t head-to-head matter most?
The problem is, we aren’t only ranking those two teams. We have to rank them in the context of all their opponents. So if Pablo moves your team up ahead of the other team, all the teams that beat you need to be above them too, right? But what if one of those teams lost to the team you beat? Whoops. It is not uncommon in volleyball (or in any sport, for that matter) for team A to beat B, who beats C, who beats A. How do you create a ranking that accounts for all the head-to-head results? You can’t do it perfectly.
I have created a ranking (I call it the “Ultimate Pablo Ranking”) that tries to maximize the extent to which the rankings reflect who beat whom, so that they do the best they can to honor head-to-head outcomes. It can be done, but what is clear is that the rankings you get are less predictive than those from the original Pablo approach.
So how do you get the ratings that determine the probabilities?
Pablo ratings are based on a match-by-match assessment of what a team has done.
Teams do a lot of things. Can you be more specific?
OK, it’s an assessment of what they do on the court. A team’s outcome in a match is assessed by the percentage of the total points that they score – the total point percentage. That point-percentage advantage is used to calculate the rating difference between the two teams for that match. So if two teams score the same number of points in a match, Pablo concludes that, based on that match, they should be rated equally. If the point percentage indicates a rating difference of 2000 points, then Pablo concludes the teams should be separated by 2000 points based on that match. Do that for a bunch of matches, and then assign ratings to the teams that give the best agreement with what has happened on the court. The details of how that is done are proprietary, but that’s the idea.
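Since the real procedure is proprietary, here is only a rough sketch of the general idea: given an implied rating difference for each match, find team ratings that minimize the disagreement between each implied difference and the difference of the two teams’ ratings. The `fit_ratings` helper, the gradient-descent approach, and the example numbers are all assumptions for illustration.

```python
def fit_ratings(matches, n_iter=200, lr=0.1):
    """Sketch: find ratings minimizing the squared error between each
    match's implied rating difference and (rating[a] - rating[b]).
    `matches` is a list of (team_a, team_b, implied_diff) tuples, where
    implied_diff would come from the point percentage via some model
    (proprietary in the real system; assumed given here).
    Uses simple gradient descent, anchored so the mean rating is 0."""
    teams = {t for a, b, _ in matches for t in (a, b)}
    ratings = {t: 0.0 for t in teams}
    for _ in range(n_iter):
        for a, b, d in matches:
            err = (ratings[a] - ratings[b]) - d
            ratings[a] -= lr * err
            ratings[b] += lr * err
    mean = sum(ratings.values()) / len(ratings)
    return {t: r - mean for t, r in ratings.items()}

# Hypothetical matches with implied differences in Pablo-style points.
matches = [("A", "B", 2000), ("B", "C", 1000), ("A", "C", 3000)]
ratings = fit_ratings(matches)
print(sorted(ratings, key=ratings.get, reverse=True))  # ['A', 'B', 'C']
```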
Wait a minute! You base these rankings on points? And total points at that? First of all, what about winning? Doesn’t winning matter? And second, the way you win in volleyball is by winning more SETS than the opponent, not points. You don’t have to score more points than your opponents to win the match. Why should we take it seriously when it misses such fundamental concepts?
Oh, I know it – I know it very well, in fact. I know very well the number of teams that win despite being outscored (it happens, but it’s not all that common – about 5% of matches). And I’ve heard it plenty of times: “All we care about is winning.” I understand it. In terms of winning conference titles and tournaments, it’s true: you have to win.
But that’s not what Pablo rankings are trying to determine. We already know who won the conference or the tournament. Pablo is about looking forward to matches that haven’t been played and predicting who will win. And in doing that, the most important thing to consider is the total number of points.
I’m not just making that up. I’ve analyzed tens of thousands of match pairs, where two teams play each other twice in the same season, and I’ve looked at the relationship between the point percentage in the first match and the winning percentage in the second match. What I’ve found is that the winning percentage in the second match can be accurately predicted even if all you know is the point percentage in the first match. You don’t need to know the winner; just look at point percentage.
Now, to an extent that is because the team that scores more points usually wins. But given the choice between knowing who won without knowing the point percentage, and knowing the point percentage without knowing who won, if you want the better prediction, it’s better to know the point percentage. The same is true for sets. You can look at whether teams win in 3, 4 or 5 sets, and that predicts better than looking only at match wins, but that is because sets played is correlated with point percentage (for example, a team that wins despite being outscored usually wins in 5). I understand that, given our system, coaches are “only concerned about winning.” And that may be true. But despite all that, the fact is that the greater the percentage of points a team scores in a match, the more likely they are to win if they play the same team again.
But doesn’t winning matter at all? What about teams that have the ability to win those close ones?
Winning is not completely irrelevant. For example, if two teams score the same number of points in the first match and play again, the team that won the first match has about a 52% chance of winning the second match. Based on points alone, you would predict a 50% chance, but it’s more like 52%. THAT’S the premium you get for winning. It’s real, and it is absolutely incorporated into Pablo ratings. But the effect is small compared to other factors. For teams that score the same number of points, the difference between winning and losing is worth about 180 Pablo rating points. When the average match features a rating difference of 2200 points, you can see that the premium on winning is minor. In a 3-set beach match, it amounts to a 2-point difference in margin (the difference between winning 21 – 17, 17 – 21, 15 – 11 and 21 – 16, 18 – 21, 15 – 11).
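Point percentage is just points scored divided by total points played, so the two example score lines work out as follows. The `point_percentage` helper is hypothetical, but the arithmetic matches the scores in the text.

```python
def point_percentage(sets):
    """Points scored by the first-listed team as a fraction of all
    points played. `sets` is a list of (points_for, points_against)."""
    points_for = sum(f for f, _ in sets)
    points_against = sum(a for _, a in sets)
    return points_for / (points_for + points_against)

# The two example score lines from the text, both wins in 3 sets:
pct_a = point_percentage([(21, 17), (17, 21), (15, 11)])  # 53 of 102
pct_b = point_percentage([(21, 16), (18, 21), (15, 11)])  # 54 of 102
print(round(pct_a, 4), round(pct_b, 4))  # 0.5196 0.5294
```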
There are far more important factors. For example, whether the match is at home or on the road makes a big difference, so Pablo gives a rating boost to the home team. The home-court advantage is usually worth about 200 rating points (so in calculating the rating difference between two teams, give the home team, if there is one, an extra 200 points). Also, how long ago a match was played affects how predictive it is. This doesn’t show up when you look at the ratings, but it is taken into account when they are calculated: matches played longer ago are weighted less. Don’t get hung up on this part, though, because the dropoff isn’t all that large.
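The two adjustments just described can be sketched directly: add roughly 200 rating points to the home team before computing the expected difference, and down-weight older matches with a mild decay. The helper names and the half-life constant are assumptions; the text only says the dropoff is mild.

```python
def adjusted_diff(rating_a, rating_b, home=None, home_bonus=200):
    """Expected rating difference (A minus B), giving the home team,
    if any, the ~200-point boost described in the text."""
    diff = rating_a - rating_b
    if home == "A":
        diff += home_bonus
    elif home == "B":
        diff -= home_bonus
    return diff

def match_weight(days_ago, half_life_days=180):
    """Illustrative recency weight: older matches count less. The
    half-life value is a made-up placeholder."""
    return 0.5 ** (days_ago / half_life_days)

# A 100-point edge on the road can be flipped by the home boost.
print(adjusted_diff(5000, 4900, home="B"))              # -100
print(match_weight(0), match_weight(180))               # 1.0 0.5
```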
OK, so if point percentage is so important, does that mean that if I want to boost our Pablo ranking, we should run up the score?
No, don’t run up the score. First, although your Pablo ranking can be interesting to look at, it has no value in itself. It does no good to run up the score to boost your Pablo ranking a couple of spots. You might think it looks better, but you also know you have manipulated it. No one is impressed.
But more importantly, the system has built-in safeguards to minimize the effect of blowouts. It doesn’t matter whether you win 21 – 15, 21 – 15 or 21 – 10, 21 – 10; Pablo just looks at those matches and says you blew the other team away. So whatever small ratings benefit running up the score might bring, it is not worth the development you give up by not using that court time to work on other aspects of your game.
Anything else you can tell us, especially as it applies to the beach game?
This beach stuff is new, and so we are still working to optimize the model. More results will allow for that analysis.
One thing we really wanted to do with the beach rankings was to create individual player ratings, with a pair’s rating being the sum of its two players’ ratings (Karch Kiraly has wanted to do something like this for a long time, but it’s hard). We looked into doing that with NCAA Beach Volleyball, but it got too complicated.
Another feature of the NCAA Beach program is the best-of-five-pairs format. It’s a fascinating approach from a game theory perspective, ultimately a team event while based on individual pairs. As with the individual ratings, there were some concerns about comparing pairs across the different levels (1, 2, 3, 4 and 5). The good news is that there is enough crossover between levels that this is not a problem. Enough pairs play matches at levels 1, 2 and even 3, or 2, 3 and 4, or 3, 4 and 5, and so on, that it’s possible to get a good comparison of all the pairs, and they can all be ranked together.
Yeah, I’ve seen the rankings, and there are a lot of teams all together. Is there any way you could, for example, split them into level 1, 2, 3, 4 and 5? That would make life a lot easier.
I’m not quite sure how to do that. The problem is, if you have a pair that plays 10 matches at level 1 and 10 matches at level 2, do you put them in the level 1 group or level 2 group? And, more importantly, why would you bother? Why not rank them just in the overall scheme of things?
There are some reasonable distinctions you can make, however. For example, it does make sense to require a given number of matches. There are quite a few pairs who might only play together for 1 or 2 matches a year. In calculating the Pablo rankings, we absolutely include them and their matches, but the rankings listed on the website will likely have some minimum number of matches played to get included. Just for information purposes, there were 972 different pairs who played at least one match in 2018, but only 344 had as many as 10 matches total (across all levels).
We can also rank pairs by division (D1, D2 and D3).
What about teams? This is supposed to be a team sport.
There are ways to do this, and once I figure out the best way, I’ll update this section. Stay tuned.