The goal of the challenge is to predict the number of passengers per plane on some flights in the US. The data is provided to us by a single company. This is a supervised regression problem.
From the company's point of view, the interest of this challenge is to evaluate the percentage of no-show reservations in order to properly calibrate overbooking.
Some passengers make a reservation but do not show up for the flight, leaving empty seats on the plane. Estimating the number of passengers who actually board the plane is thus important for the company. The left-out data has dates that come after the training data, so a time series approach is possible.
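Because the left-out data comes after the training data in time, a local validation set can mimic this by holding out the most recent flights. The following is a minimal sketch, not an official procedure: the helper name `chronological_split`, the 20% holdout fraction and the toy values are assumptions made for illustration.

```python
import pandas as pd


def chronological_split(df, date_col="flight_date", holdout_fraction=0.2):
    """Return (train, validation) where validation holds the most recent rows."""
    ordered = df.sort_values(date_col).reset_index(drop=True)
    # Keep at least one row for validation, even on tiny frames.
    n_valid = max(1, round(len(ordered) * holdout_fraction))
    cutoff = len(ordered) - n_valid
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]


# Toy data, made up for illustration only.
toy = pd.DataFrame({
    "flight_date": pd.to_datetime(
        ["2012-01-05", "2012-01-01", "2012-01-03", "2012-01-04", "2012-01-02"]
    ),
    "target": [10.5, 11.0, 9.8, 10.1, 10.9],
})
train_part, valid_part = chronological_split(toy, holdout_fraction=0.2)
```

Note that this split is row-based: flights sharing the same date can end up on both sides of the cutoff, so splitting on a cutoff date may be preferable on the real data.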
The training data is made available as a dataframe, whose columns are:
- flight_date: the flight's takeoff day
- from: the IATA code of the departure airport
- to: the IATA code of the arrival airport
- avg_weeks: average number of weeks between booking and flight date, across passengers
- std_weeks: standard deviation of the number of weeks between booking and flight date, across passengers. You can think of the last two columns as being outputs of another, unknown ML pipeline.
The target variable is a transformation of the number of passengers boarding the plane, and is named `target` in the training dataframe.
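To make the column layout concrete, here is a small sketch that builds a toy dataframe with the columns listed above and derives calendar features from `flight_date`. The values are made up, and the helper `add_date_features` and the new column names (`year`, `month`, `day_of_week`, `is_weekend`) are assumptions, not part of the provided data.

```python
import pandas as pd

# Toy rows with the documented columns; all values are invented.
toy = pd.DataFrame({
    "flight_date": ["2012-06-19", "2012-09-10", "2012-10-05"],
    "from": ["ORD", "LAS", "DEN"],
    "to": ["DFW", "DEN", "LAX"],
    "avg_weeks": [12.9, 14.3, 10.9],
    "std_weeks": [9.8, 9.5, 7.5],
    "target": [12.3, 10.8, 11.1],
})


def add_date_features(df):
    """Return a copy of df with simple calendar features added."""
    out = df.copy()
    dates = pd.to_datetime(out["flight_date"])
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    out["day_of_week"] = dates.dt.dayofweek  # Monday is 0, Sunday is 6
    out["is_weekend"] = dates.dt.dayofweek >= 5
    return out


features = add_date_features(toy)
```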
The main difficulty of the challenge is the limited amount of information per flight (the data is "thin": it has few columns). Participants are encouraged to enrich the data with other sources, e.g. weather, holiday calendar, etc.
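As an illustration of enriching the data, the sketch below joins a small hand-written holiday table onto the flights by date. The `holidays` table, the `holiday_name` and `is_holiday` columns and the flight rows are assumptions made for the example; a real calendar would have to come from an external source.

```python
import pandas as pd

# Toy flights, made up for illustration.
flights = pd.DataFrame({
    "flight_date": pd.to_datetime(["2012-07-03", "2012-07-04", "2012-12-25"]),
    "from": ["ORD", "LAS", "DEN"],
    "to": ["DFW", "DEN", "LAX"],
})

# Hand-written external table: two US federal holidays in 2012.
holidays = pd.DataFrame({
    "flight_date": pd.to_datetime(["2012-07-04", "2012-12-25"]),
    "holiday_name": ["Independence Day", "Christmas Day"],
})

# A left join keeps every flight; non-holiday dates get a missing name.
enriched = flights.merge(holidays, on="flight_date", how="left")
enriched["is_holiday"] = enriched["holiday_name"].notna()
```

A left join is used so that the number of flights is unchanged, provided the external table has at most one row per date.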
The performance of the prediction will be quantified on left-out data, using the RMSE (Root Mean Squared Error).
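For local evaluation, the RMSE is the square root of the mean of the squared differences between true and predicted values. A minimal sketch follows; the function name `rmse` and the sample numbers are assumptions, and the platform's own scoring code may differ in details.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Errors are -1, 0 and 2, so the mean squared error is 5/3.
score = rmse([10.0, 12.0, 11.0], [11.0, 12.0, 9.0])
```

Since the target is a transformation of the passenger count, a score computed this way is expressed in the units of the transformed target, not in passengers.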
The test data is available in the same format as the training data, minus the target column.
There will be a kickoff event on December 10th at 17:30 Paris time.
The registration for the competition will close on December 13th at 23:55 Paris time.
The submissions will be open until January 15th and the results will be announced on the 25th.
• English will be the official language during the whole competition.
• This competition is open to students only. The competition winners will need to prove their enrollment to a university program.
• Final teams of 1 to 3 people. Teams can be formed via the dedicated Discord forum. Once a team creation has been submitted, its members cannot leave and form a new team.
• One-strike policy. Treat everyone (fellow participants, company sponsors, organizers) with respect. Absolutely no harassment, witch hunting, sexism, racism, or hate speech will be tolerated. We want this competition to be a welcoming space. Anyone disrespecting those rules will be banned from the competition, at the sole discretion of the organizers.
• Once a day, competitors may submit their model/outputs to update the leaderboard on a validation set.
• Moderators may ask the participants for their code after each submission.
• If the code doesn't run or if the results are not reproducible, the score will not be taken into account.
• Using models from a public source is allowed, as long as the reference is cited.
• Prizes: there is no monetary prize for this competition.
• Specific rules for EMINES students: EMINES students should work in a private GitHub repository and make it public on January 15th. The commit history will be used to judge contributions and grade accordingly. Code should follow the PEP8 coding style, e.g. it should not raise any errors when checked with `flake8`. There should be a script named `main.py` that fetches the additional data (if any), trains a model on the training data, and returns the prediction vector submitted to the platform as the final submission.
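One possible shape for such a `main.py` is sketched below. It is only an illustration: the function names, the per-route mean baseline and the toy dataframes are assumptions, and loading the actual training and test files is left out because their names are not specified here.

```python
import pandas as pd


def fetch_additional_data():
    """Placeholder: return external data, or None when none is used."""
    return None


def train_model(train_df):
    """Baseline 'model': mean target per (from, to) route, plus a global mean."""
    route_means = (
        train_df.groupby(["from", "to"])["target"].mean().rename("pred").reset_index()
    )
    return {"route_means": route_means, "global_mean": train_df["target"].mean()}


def predict(model, test_df):
    """Look up the route mean; fall back to the global mean for unseen routes."""
    merged = test_df.merge(model["route_means"], on=["from", "to"], how="left")
    return merged["pred"].fillna(model["global_mean"]).to_numpy()


def main(train_df, test_df):
    fetch_additional_data()  # unused by this baseline
    model = train_model(train_df)
    return predict(model, test_df)


# Toy data, made up for illustration only.
toy_train = pd.DataFrame({
    "from": ["ORD", "ORD", "LAS"],
    "to": ["DFW", "DFW", "DEN"],
    "target": [10.0, 12.0, 9.0],
})
toy_test = pd.DataFrame({"from": ["ORD", "DEN"], "to": ["DFW", "LAX"]})
predictions = main(toy_train, toy_test)
```

Keeping the fetch, train and predict steps in separate functions makes it easier to rerun the script end to end when moderators check reproducibility.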