The European cycling season is in full swing, which for many means competing in various cycling fantasy leagues, with the main event being, of course, the Tour de France.
A fantasy league is a competition, often held between coworkers or friends, where participants compose a team of players for a certain sports season or event. The goal is to acquire as many points as possible determined by the league's score system while adhering to its constraints, most often in the form as a budget cap.
This is the first in a series of posts that will guide you through the various facets of designing a good fantasy league team. They generally won't introduce any particular strategies or models — sports are much too complex and varied to be captured in a single model — but rather present a number of criteria of a good model. The focus will be on cycling to keep things somewhat concrete, but most of the guidelines will apply to other sports as well.
Designing good fantasy leagues is a fascinating endeavor. It involves combining disciplines ranging from sports knowledge, team dynamics and the latest news on fitness to optimization, statistics and modeling. It will be valuable to be familiar with at least some of these subjects. Since they are so diverse, we'll aim our attention at a single aspect at a time, starting with optimization.
playing with a crystal ball
At heart, fantasy leagues are a kind of optimization problem: the objective is to maximize some league score function by assigning a riders to a team under a number of constraints. We're usually required to choose a team before the event starts and no points have been awarded. But before we tackle that problem, let's see how to solve it if the results were known in advance. Even with such perfect information, we'll see it's not exactly trivial.
Consider for example a Tour de France fantasy league where all participating riders have known (and fixed!) salary demands. Riders are awarded points at the end of the event for their individual accomplishments such as winning stages or finishing in Paris wearing the polka dot jersey. The total score for a team is determined by the sum of points awarded to the riders in the team under the constraint that the sum of the riders' salaries stays within a fixed budget.
Selecting the highest scoring team under these constraints is equivalent to the knapsack problem, a well studied problem in combinatorial optimization. Given a set of items, each given a weight and a value, the objective of the knapsack problem is to choose a subset of items to put in a knapsack that maximizes the total value while staying under a given weight limit. By perceiving riders with a known score and salary as objects with values and weights respectively, we can translate the team selection problem to a knapsack problem. This helps bring things into the realm of mathematical optimization, which provides many tools for handling such well studied problems.
If we add more constraints, like a limit on the number of riders in a team, the problem no longer fits the mold of the knapsack problem. We can still model it using the more general method of integer programming. The idea is to represent every rider by a unique number, say through , and then assign a to that number if the rider is in our team and otherwise. This can be seen as a list of length containing ones and zeros. Let's refer to this binary list using the symbol ; so when rider is part of our team, we write . These are the variables that we are optimizing for. Further, every rider is assigned a value and a cost . The corresponding integer program is commonly written like this:
It looks intimidating, but it is a direct translation of our team selection problem. On the bottom line is the requirement that every element in our list must be either or . This seems sensible: a rider is either in our team or not. The top line specifies the objective function. It is sum over the products , and because every is either zero or one, it is really summing up the values of the riders in our team! The middle line specifies the constraints. The budget constraint is for example given by , with being the team budget. Other limits, like a cap on the number of riders from the same team or the number of sprinters, can be encoded in a similar fashion.
Solving integer programs and computing optimal knapsacks is NP-hard, which means that there is no known algorithm that can solve the general case in reasonable time. In theory, it could take centuries to compute the optimal team from hundreds of riders, but usually the set of interesting riders is so small that a common laptop can do it in a few seconds.
Let's apply this approach for a fantasy league that is already underway. The velogames spring classics league is a good fit since it has typical cycling league rules. Scores are determined by the sum of the riders' points, and there is a limit on the team's cost. Additionally, there can be no more than nine riders to a team. We can encode this by adding the constraint to our integer program. Again, since every is either zero or one, this effectively counts the number of riders in our team, which is exactly what we need.
As with essentially all things that could conceivably be computed, there exists a python package that enables us to do this with only a few lines of code. Below is an annotated python script that uses the cvxpy library to find the optimal team using integer programming. The costs and scores for riders were taken from the velogames standings page on April 10, 2019.
import pandas import cvxpy import numpy as np # Load data table = pandas.read_csv('data.csv') weights = table['Cost'].values values = table['Points'].values rider_count = len(weights) # The variable we are solving for: a binary vector whose length # is given by the total number of riders selection = cvxpy.Variable(rider_count, boolean=True) # The total cost should be no more than 100, and the team # cannot contain more than 9 riders cost_constraint = weights * selection <= 100 size_constraint = np.ones(rider_count) * selection <= 9 # Our total utility is the sum of the item utilities total_utility = values * selection # We tell cvxpy that we want to maximize total utility # subject to cost_constraint and size_constraint knapsack_problem = cvxpy.Problem( cvxpy.Maximize(total_utility), [cost_constraint, size_constraint] ) # Solve the problem best_score = knapsack_problem.solve() # Retrieve rider indices indices = [i for i, val in enumerate(selection.value) if val >= 0.9] # Display the dream team and its score print(table.ix[indices]) print(best_score)
It produces the following team:
|Alexander Kristoff||UAE-Team Emirates||335||16|
|Wout Van Aert||Team Jumbo-Visma||305||14|
|Oliver Naesen||AG2R La Mondiale||435||14|
|Bob Jungels||Deceuninck - Quick Step||280||14|
|Mathieu Van Der Poel||Corendon - Circus||395||12|
|Nils Politt||Team Katusha - Alpecin||195||8|
|Kasper Asgreen||Deceuninck - Quick Step||250||8|
|Alberto Bettiol||EF Education First Pro Cycling Team||310||6|
|Anthony Turgis||Direct Energie||145||4|
This team consists of exactly 9 riders and has a cost of 96, which makes it a valid solution. Somewhat surprisingly, its total score of 2650 is only slightly better than that of the leader on the boards at the time of writing, who had 2530 points without any posterior information. Impressive!
We've seen how to optimally design a team when rider performance is known. This alone isn't enough to create a good fantasy league team for the upcoming Tour de France, but it can be used as a building block in models for the case without perfect information. One possible starting point would be to estimate riders' expected performance and to then use those expected values as weights in an integer program.
In future posts, we may look at how to maximize fantasy league win chances in the presence of uncertainty. It will involve not only predicting rider performance, but also dissecting the score function, computing correlations between riders, and analyzing the competition. That's where things get really interesting!