Pablo Beach Volleyball Rankings FAQ
I’m not familiar with Pablo rankings. What is this about?
Pablo rankings were created almost 20 years ago for NCAA Women’s volleyball by Pablo, a poster at VolleyTalk who came up with the ranking system. There is a (badly outdated) FAQ describing the process at www.RichKern.com.
Pablo is based on a probabilistic model. Consider, for example, if you could clone a volleyball team (replace “team” with “pair” for the beach version) and had them play each other. Who would win? Since the two teams are exactly equal, you can’t pick one over the other, and there is a 50/50 chance that either team would win. On the other hand, if you put the McNamara sisters from UCLA up against a pair made up of me and my 9-year-old son, their chance of winning would be essentially 100% (there is a non-zero probability that lightning could strike them every time they tried to hit the ball, such that our team would win in a blowout! The probability of that happening might be on the order of 1 in 10^400K, but it could happen). For two teams at the collegiate level, however, the probability is going to be somewhere between these extremes. Pablo rankings try to reflect the probability that either team will win if they play. Therefore, if two teams are rated exactly the same and play, Pablo says it can’t predict who is going to win. If one team is rated higher than the other, Pablo is telling you that the higher rated team is more likely to win. The greater the difference between the ratings, the more likely the higher rated team is to win.
What’s the relationship between rating difference and winning probability?
I don’t know yet. In the Pablo FAQ at www.RichKern.com, there is a table that relates rating difference to win probability. However, that table only applies to the indoor game (best of 5, with sets to 25 and 15 in set 5), and it was determined through extensive modeling and comparison against years and years’ worth of results. The concept is the same for the beach, but I need to do the modeling, and we need more data, to establish the actual relationship. Either way, the larger the difference between the teams, the more likely the higher rated team is to win.
What’s all this talk of probabilities? Matches aren’t determined by coin flips or dice, they are determined by players and their ability to perform on the court.
As much as we’d like to think players control the outcomes of matches, there are factors outside their control that affect things. My favorite example was a match between Purdue and Utah that I attended. Purdue led 2 sets to 1 and was at match point in set 4, with Utah serving. The serve hit the net and crawled over to fall on the floor, tying the score. Utah went on to win that set, and ultimately won set 5 to win the match. Now, you can praise the outstanding skill of the Utah team in coming back to pull out the win, but the truth is, had that serve been 1 inch lower, it would have been a net serve and Purdue would have won the point. Had it been 1 inch higher, it would have passed over the net and Purdue would have had a clean play and a much better chance of siding out to win the match. It just happened that the serve was in that place. No individual has that much control over a serve; the fact that it was there and not 1 inch higher or lower was basically random. Yet it made the difference between winning and losing.
Over the course of a match, these types of little effects largely even out, but they don’t have to do so completely. There is a chance they won’t even out, with no explanation as to why.
I’m still not buying it. I agree that the little things matter, but as coaches, we focus on those little things and on turning them to our advantage. Do you have any proof?
The best “proof” is in the pudding, as they say. And the short answer is, it works exactly as advertised. I have tested this model using years and years’ worth of real-life results for the indoor women (more than 100,000 matches) and it behaves exactly as I have described. When teams are predicted to be very close, the outcome is basically 50/50. When teams are separated by a lot, the higher rated team almost always wins (98 – 99% of the time, depending on how far separated). And if they are separated to the point where Pablo thinks the higher rated team should win 70% of the time, those teams win 70% of the time, and so on. Therefore, at least in indoor volleyball, the probabilistic model has proved to be robust. I will be testing the results for the beach, but for now, this is what we have.
You have a team we beat rated higher than us. Doesn’t that show that there’s a problem? We already beat them once, so why do you think they’d beat us?
One of the most important features of Pablo ratings and the probability model is that upsets happen. In fact, upsets MUST happen – if they didn’t, the probability model would be wrong. For example, if Pablo thinks that a higher rated team should win 70% of the time, that doesn’t mean the lower rated team can’t win. The lower rated team should (and does) win about 30% of the time! If the lower rated team didn’t pull off the upset 30% of the time, it would mean the 70% win probability is wrong. If your team is rated lower than another team despite beating them, it means that Pablo thinks your win was an upset. The ratings are based on the full body of work for both teams, of which this match is only a small part.
But in ranking two teams, shouldn’t head-to-head matter most?
The problem is, we aren’t only ranking those two teams. We have to rank those teams in the context of all their opponents, so if Pablo moves your team up ahead of the other team, all the teams that beat you need to be above them too, right? But what if one of those teams lost to the team you beat? Whoops. It is not uncommon in volleyball (or in any sport, for that matter) for team A to beat B, who beats C, who beats A. How do you create a ranking that accounts for all the head-to-head results? You can’t do it perfectly.
I have created a ranking (I call it the “Ultimate Pablo Ranking”) that maximizes the extent to which the rankings reflect who beat whom, so that they do the best they can to honor head-to-head outcomes. Make no mistake, it’s doable, but what is clear is that the rankings you get are less predictive than what you get with the original Pablo approach.
So how do you get the ratings that determine the probabilities?
Pablo ratings are based on a match-by-match assessment of what a team has done.
Teams do a lot of things. Can you be more specific?
OK, it’s an assessment of what they do on the court. A team’s outcome is assessed by the percentage of points they score in a match overall, so the total point percentage. The point-percentage advantage is used to calculate the rating difference between the two teams for that match. Therefore, if two teams score the same number of points in the match, Pablo concludes that they should be rated equally based on that match. Or, if the point percentage indicates a rating difference of 2000 points, then Pablo concludes the teams should be separated by 2000 points based on that match. So you do that for a bunch of matches and you assign ratings to the teams that give you the best agreement with what has happened on the court. Details of how that is done are proprietary, but that’s the idea.
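The total point percentage described above is simple to compute from set scores. This is a minimal illustrative sketch (the function name is my own, and the mapping from point percentage to a rating difference is proprietary, so only the percentage itself is shown):

```python
# Illustrative only: compute a team's total point percentage for one match.
def total_point_percentage(sets):
    """sets: list of (our_points, their_points) tuples, one per set."""
    ours = sum(s[0] for s in sets)
    theirs = sum(s[1] for s in sets)
    return ours / (ours + theirs)

# A 3-set win, 21-17, 17-21, 15-11: 53 of the 102 total points
print(round(total_point_percentage([(21, 17), (17, 21), (15, 11)]), 4))  # -> 0.5196
```

Note that the winner here scored just under 52% of the points; equal point totals would give exactly 0.5 and, per the model, equal ratings based on that match.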
Wait a minute! You base these rankings on points? And total points at that? First of all, what about winning? Doesn’t winning matter? And second, the way you win in volleyball is by winning more SETS than the opponent, not points. You don’t have to score more points than your opponents to win the match. Why should we take it seriously when it misses such fundamental concepts?
Oh, I know it very well. I know how many teams win despite being outscored (it happens, but it’s not all that common – about 5% of matches). And I’ve heard it plenty of times: “All we care about is winning.” I understand it. In terms of winning conference titles and tournaments, it’s true, you have to win.
But that’s not what Pablo rankings are trying to determine. We already know who won the conference or the tournament. Pablo is about looking forward to matches that haven’t been played and predicting who will win. And in doing that, the most important thing to consider is the total number of points.
I’m not just making that up. I’ve analyzed tens of thousands of match pairs, where two teams play each other twice in the same season, and I’ve looked at the relationship between the point percentage in the first match and the winning percentage in the second match. What I’ve found is that the winning percentage in the second match can be accurately predicted even if all you know is the point percentage in the first match. You don’t need to know the winner; just look at point percentage.
Now, to an extent that is because the team that scores more points usually wins. But given the choice between knowing who won (without the point percentage) and knowing the point percentage (without knowing who won), you get the better prediction from the point percentage. The same is true for sets: whether teams win in 3, 4 or 5 sets predicts better than match wins alone, but that is because sets played is correlated with point percentage (for example, teams that win despite being outscored usually win in 5). I understand that, given our system, coaches are “only concerned about winning.” And that may be true. But despite all that, the fact is that the greater the percentage of points a team scores in a match, the more likely they are to win if they play the same team again.
But doesn’t winning matter at all? What about teams that have the ability to win those close ones?
Winning is not completely irrelevant. For example, if two teams score the same number of points in the first match and play again, the team that won the first match has about a 52% chance of winning the second match. Based on points alone, you would predict 50%, but it’s more like 52%. THAT’S the premium you get for winning. It’s real, and it is absolutely incorporated into Pablo ratings. But the effect is small compared to the other factors. For teams that score the same number of points, the difference between winning and losing is worth about 180 Pablo rating points. When the average match outcome corresponds to a rating difference of 2200 points, you can see that the premium on winning is trivial. In a 3-set beach match, that’s a 2-point swing in one match (it’s the difference between winning 21 – 17, 17 – 21, 15 – 11 and 21 – 16, 18 – 21, 15 – 11).
There are far more important factors. For example, whether the match is at home or on the road makes a big difference, so Pablo gives a rating boost to the home team. The home court advantage is usually about 200 rating points (so in calculating the rating difference between teams, give the home team, if there is one, an extra 200 points). Also, how long ago a match was played affects how predictive it is; matches played longer ago are weighted less when ratings are calculated. Don’t get hung up on this part, though, because the dropoff isn’t all that large.
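The home-court adjustment above is easy to express directly. This is a sketch of my own (the function and structure are not Pablo’s actual code; only the ~200-point figure comes from the FAQ):

```python
HOME_BOOST = 200  # approximate home-court advantage in Pablo rating points (from the FAQ)

def effective_rating_diff(rating_a, rating_b, home=None):
    """Effective rating difference for a match; positive favors team A.
    home is 'A', 'B', or None for a neutral site."""
    diff = rating_a - rating_b
    if home == "A":
        diff += HOME_BOOST
    elif home == "B":
        diff -= HOME_BOOST
    return diff

# A 100-point rating edge is more than erased by playing on B's home court
print(effective_rating_diff(5100, 5000, home="B"))  # -> -100
```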
OK, so if point percentage is so important, does that mean that if I want to boost our Pablo ranking, we should run up the score?
No, don’t run up the score. First, although your Pablo ranking can be interesting to look at, it has no value in itself. It does no good to run up the score to boost your Pablo ranking a couple of spots. You might think it looks better, but you also know you have manipulated it. No one is impressed.
But more importantly, the system has built-in safeguards to minimize the effect of blowouts. It doesn’t matter whether you win 21 – 15, 21 – 15 or 21 – 10, 21 – 10; Pablo just looks at those matches and says you blew the other team away. So the small benefit of running up the score is not worth the development you sacrifice by not working on other aspects of your game.
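The FAQ does not describe the actual safeguard mechanism, but one simple way such a cap could work is to clamp the point percentage at a ceiling, so that any margin beyond a blowout earns nothing extra. This is purely hypothetical (both the cap value and the approach are my assumptions, not Pablo’s disclosed method):

```python
CAP = 0.58  # hypothetical ceiling, NOT the real (proprietary) value

def capped_point_percentage(pct):
    """Clamp a match point percentage into [1 - CAP, CAP]."""
    return min(max(pct, 1 - CAP), CAP)

# 21-15, 21-15 (42 of 72 points) and 21-10, 21-10 (42 of 62 points)
# both land at the cap, so the extra blowout margin earns nothing.
print(capped_point_percentage(42 / 72), capped_point_percentage(42 / 62))
```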
Anything else you can tell us, especially as it applies to the beach game?
This beach stuff is new, and so we are still working to optimize the model. More results will allow for that analysis.
One thing we really wanted to do with the beach rankings is create individual ratings, with pair ratings being the sum of the two individual ratings (Karch Kiraly has wanted to do something like this for a long time, but it’s hard). We looked into doing that with NCAA Beach Volleyball, but it got too complicated.
Another feature of the NCAA Beach program is the best-of-five-pairs format. It’s a fascinating approach from a game theory perspective, ultimately a team event while based on the individual pairs. Just as with the individual ratings, there were concerns about the different levels of pairs (1 through 5) and how to compare them. The good news is that there is enough crossover between levels that this is not a problem. Enough pairs play matches at levels 1, 2 and even 3, or at 2, 3 and 4, or at 3, 4 and 5, and so on, that it’s possible to get a good comparison of all the pairs, and they can all be ranked together.
Yeah, I’ve seen the rankings, and there are a lot of teams all together. Is there any way you could, for example, split them into level 1, 2, 3, 4 and 5? That would make life a lot easier.
I’m not quite sure how to do that. The problem is, if you have a pair that plays 10 matches at level 1 and 10 matches at level 2, do you put them in the level 1 group or level 2 group? And, more importantly, why would you bother? Why not rank them just in the overall scheme of things?
There are some reasonable distinctions you can make, however. For example, it does make sense to require a given number of matches. There are quite a few pairs who might only play together for 1 or 2 matches a year. In calculating the Pablo rankings, we absolutely include them and their matches, but the rankings listed on the website will likely have some minimum number of matches played to get included. Just for information purposes, there were 972 different pairs who played at least one match in 2018, but only 344 had as many as 10 matches total (across all levels).
We can also rank pairs by division (D1, D2 and D3).
What about teams? This is supposed to be a team sport.
College beach volleyball has a unique setup, with 5 pairs competing as part of the team. The competition is like tennis, where the pairs face off head-to-head and the winning team is the one that wins the majority of the matchups. We need to keep this in mind when ranking teams.
One of the key features of the team strength is that it depends on all five pairs. A team could have the best #1 pair in the country, but that only gives them one win, and if their other 4 pairs are lousy, the team could still lose all its matches. On the other hand, if the team does have the best #1 pair in the country, they still have an advantage over the others in that they can generally count on winning that first matchup, and then they only need to win 2 of the other 4. Pablo takes this all into account in creating the team rankings. The process for ranking teams is described below:
a) The first step in the process is choosing the 5 pairs for each team. This is probably the hardest part, because there is no simple way to automate it, so it is done manually and has some subjectivity. It works most simply when a team has a mostly consistent lineup; in that case, the team just consists of the pairs that have been playing in the 5 positions. However, many teams have a lot of variation in their lineups, and then selecting the 5 pairs is a little harder. The primary selection criterion is that whichever pair played the most at each level for the team gets selected. This can be an issue: if a team plays pair AB at level 1 for 10 matches and CD at level 2, and then switches to AC at level 1 and BD at level 2 for the next 8 matches, Pablo will still use the pairs AB and CD, even though the current lineup is AC and BD. A very important constraint that overlays the whole process is that no player can be on more than one pair for the team (so in the example above, it has to be AB and CD, and can’t be AB and AC). If multiple pairs for the team have the same number of matches at a given level, Pablo looks to the other levels to see if there is a clear preference. For example, if AB and AC both have 10 matches at level 1 for a team, but CD is the clear choice for level 2, then Pablo will choose AB at level 1 and CD at level 2 (AC would conflict with CD, since both include player C). If there is still no clear preference after this assessment, Pablo just chooses the highest rated pair among the tied options.
b) Once the pairs for the team are selected, Pablo has to assign them to levels. Pablo simply assigns levels on the basis of pair rating, with the highest rated pair at level 1, the next highest at level 2, and so on. This may be the most controversial step in the process, because there can be differences between the levels Pablo assigns and those where the pairs have actually played during the season. However, this approach means that pairs are set at levels that correspond to how well they have actually performed, and it is not biased by the coach’s assessment of where they should be. As a result, each team has its 5 pairs, in order from level 1 to 5.
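Steps (a) and (b) can be sketched in a simplified form: pick each level’s most-used pair without reusing a player, then reorder the chosen pairs by rating. This is my own illustrative code; it omits the tie-breaking subtleties described above, and the data structures and names are assumptions:

```python
# Simplified sketch of steps (a) and (b); not Pablo's actual algorithm.
def build_team(usage, ratings):
    """usage: {level: [(pair, match_count), ...]} where pair is a frozenset of players.
    ratings: {pair: rating}. Returns the chosen pairs ordered level 1..5 by rating."""
    used_players, chosen = set(), []
    for level in sorted(usage):
        # take the most-played pair at this level whose players are still free
        for pair, _count in sorted(usage[level], key=lambda x: -x[1]):
            if not (pair & used_players):
                chosen.append(pair)
                used_players |= pair
                break
    # step (b): assign levels purely by rating, highest rated pair at level 1
    return sorted(chosen, key=lambda p: -ratings[p])

usage = {
    1: [(frozenset({"A", "B"}), 10), (frozenset({"A", "C"}), 8)],
    2: [(frozenset({"C", "D"}), 10)],
}
ratings = {frozenset({"A", "B"}): 5200, frozenset({"C", "D"}): 5400}
print([sorted(p) for p in build_team(usage, ratings)])  # -> [['C', 'D'], ['A', 'B']]
```

Note how CD ends up at level 1 despite having played at level 2, because its rating is higher: exactly the kind of reassignment the FAQ calls controversial.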
c) With the teams set, the next step is to rank them in some way. This is the most interesting part, because it requires thinking deeply about game theory and the nature of the competition. Given that a match between two teams involves head-to-head competition at each level, but no cross-level competition, the starting point is within each level. It is trivially easy to rank the pairs within each level on the basis of their ratings, but it is not clear how to use this in the team assessment. Here, Pablo relies on the probabilistic nature of the system.
Consider the matchup between two teams, A and B, made up of A1, A2, A3, A4 and A5 and B1, B2, B3, B4 and B5. Given the Pablo ratings of the pairs, it is trivially easy to calculate the probability that A1 beats B1 (P1), A2 beats B2 (P2), A3 beats B3 (P3), A4 beats B4 (P4) and A5 beats B5 (P5). Given the probabilities P1 – P5, it is possible to calculate the probability that team A will win at least three of the matches. The calculation itself is somewhat involved, but not really difficult. It just needs to consider all the different ways that team A can get at least three wins. For example, it could be WWWLL, or WLWLW, or LWWWL. Or it could also be WWWWL, or LWWWW or even WWWWW (there are actually 16 different ways for team A to get at least 3 wins). So given the probabilities of the different levels (P1 – P5), it is possible to calculate the probability that team A beats team B in a head-to-head matchup.
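The calculation above can be sketched by enumerating all 32 win/loss patterns and summing the probability of the 16 with at least three wins. This is my own illustrative implementation; it takes P1 – P5 as given, since the mapping from ratings to those probabilities is not public:

```python
from itertools import product

def team_win_probability(p):
    """p: list of 5 probabilities that team A wins levels 1..5.
    Returns the probability that team A wins at least 3 of the 5 matchups."""
    total = 0.0
    for outcome in product([True, False], repeat=5):  # every W/L pattern
        if sum(outcome) >= 3:  # team A takes the dual match
            prob = 1.0
            for won, pi in zip(outcome, p):
                prob *= pi if won else (1 - pi)
            total += prob
    return total

# Five evenly matched levels: exactly a coin flip for the team result
print(team_win_probability([0.5] * 5))  # -> 0.5
```

With P1 = 1.0 and the other levels at 0.5, this gives 0.6875 (11 of the 16 equally likely patterns among levels 2 – 5), illustrating the point below that a guaranteed level 1 win still leaves the match riding on the other levels.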
This approach is a natural extension of the Pablo approach, which is all about probability, and it treats the matchup exactly as it should in that context. For example, it readily handles the extreme examples described in the introduction. In the case where team A’s level 1 pair is much better and an almost assured win (P1 = 1.000), the probability of team A winning depends on how likely it is to win levels 2 – 5. In that case, it doesn’t matter whether B1 is relatively mediocre or awful; team A doesn’t gain any extra advantage, and it still comes down to levels 2 – 5. It has taken some time to figure out, but in the end, this is the right way to examine the problem.
There are a couple of ways to turn this into a team ranking. One would be to rank the teams based on how likely they are to beat an average team. It’s a legitimate approach, but it has a drawback. In practice, the best teams are all very likely to beat an average team: the best team might have a 99.9% chance of beating an average team, the second best a 99.7% chance, and the third best a 99.6% chance. Although the teams are separated, the differences between them are misleading, because they are all at the point where they are pretty much going to win.
Alternatively, what Pablo does is a head-to-head comparison of all the teams. The current process has two groups, with D1, D2, D3 and NAIA teams in one division, and community college teams in a separate division. For each team, Pablo calculates its probability of beating every other team in the division. The sum of these probabilities gives the expected number of wins for that team if it were to play everyone else in a round-robin format. The advantage of this approach is that if the best team is clearly better than the next, then even if both beat up on all the other teams in the division, they will still be separated by 1 game in the final standings, because the top team wins the head-to-head matchup.
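The round-robin step above amounts to summing pairwise win probabilities. Here is a small sketch of my own (the team names and the win-probability table are made up for illustration; in the real process the pairwise probabilities come from the five-level calculation described earlier):

```python
# Illustrative only: rank teams by expected wins in a full round robin.
def expected_wins(teams, win_prob):
    """win_prob[(a, b)]: probability that team a beats team b.
    Returns (team, expected_wins) pairs sorted best-first."""
    scores = {
        a: sum(win_prob[(a, b)] for b in teams if b != a)
        for a in teams
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])

teams = ["UCLA", "USC", "FSU"]
win_prob = {("UCLA", "USC"): 0.6, ("USC", "UCLA"): 0.4,
            ("UCLA", "FSU"): 0.8, ("FSU", "UCLA"): 0.2,
            ("USC", "FSU"): 0.7, ("FSU", "USC"): 0.3}
print(expected_wins(teams, win_prob))
# UCLA ranks first with 1.4 expected wins, then USC (1.1), then FSU (0.5)
```

Notice that even if two top teams each had near-certain wins over everyone else, the head-to-head probability between them would still separate them in the standings, which is the advantage claimed above.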
In the end, teams are ranked on this basis – the predicted number of wins in a hypothetical round-robin schedule against all the other teams in the division. The team rankings list also includes a breakdown of the head-to-head results at each level, and the pair ranking within each level. This information is not used in calculating the team rankings, but it is useful when comparing teams.