Using Python to find the dream team, best formation, and ideal starting XI for each national team in the World Cup 2022
The FIFA World Cup 2022 has already started, and there are many insights we can get from data about this competition.
Things like the World Cup 2022 dream team, the best 26-man squad for each national team, and their ideal starting XI can be obtained using Python.
But we won't stop there! We'll also add some data visualizations to make this analysis easier to understand, and we'll even plot a soccer pitch with the best lineup using Python.
Table of Contents
1. Dataset Overview
2. Adjusting the FIFA 22 Dataset to the World Cup 2022
3. The Metric To Use for This Analysis
4. Dream Team World Cup 2022
5. The Players with the Highest Rating on Each National Team
6. The Best 26-Man Squad for Each National Team
7. Best Starting XI and Formation
- Spain: Best Starting XI (4–2–3–1)
- Portugal: Best Starting XI (4–2–3–1)
- England: Best Starting XI (4–4–2)
- Brazil: Best Starting XI (4–3–3)
- France: Best Starting XI (4–2–3–1)
- Argentina: Best Starting XI (4–3–3)
- Germany: Best Starting XI (4–2–3–1)
The data we'll use for this analysis comes from the game FIFA 22, which contains information about most soccer players around the world. You can download this dataset here.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style('darkgrid')
df = pd.read_csv('players_22.csv', low_memory=False)
This dataset contains more than 100 columns, but for this analysis, we'll only use a few of them.
df = df[['short_name', 'age', 'nationality_name', 'overall', 'potential',
'club_name', 'value_eur', 'wage_eur', 'player_positions']]
One thing to keep in mind is that the player_positions column can list multiple positions, so we should select only one (the first) for this analysis. We also have to drop NaN data.
# selecting only one position (the first one listed)
df['player_positions'] = df['player_positions'].str.split(',', expand=True)[0]
# dropping nan
df.dropna(inplace=True)
Here's how the df dataframe looks at this point.
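To make the position step concrete, here is a small sketch on made-up rows (the player names and positions are invented for illustration). It shows that str.split(',', expand=True) returns one column per listed position, that column 0 holds the first one, and that a row with a missing position ends up NaN and is removed by dropna().

```python
import pandas as pd

# Made-up stand-in for the FIFA 22 columns used above
df_toy = pd.DataFrame({
    'short_name': ['Player A', 'Player B', 'Player C'],
    'player_positions': ['ST, LW', 'CB', None],
})

# expand=True gives one column per position; column 0 is the first position listed
df_toy['player_positions'] = df_toy['player_positions'].str.split(',', expand=True)[0]

# the row whose position was missing is NaN now and gets dropped
df_toy = df_toy.dropna()
print(df_toy)
```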
Now we're going to adapt the FIFA 22 dataset to the World Cup 2022. This means we'll only consider the countries that qualified for the competition and only the players who were called up to their national team.
# dropping players who will miss the World Cup
players_missing_worldcup = ['K. Benzema', 'S. Mané', 'S. Agüero', 'Sergio Ramos', 'P. Pogba',
                            'M. Reus', 'Diogo Jota', 'A. Harit', 'N. Kanté', 'G. Lo Celso', 'Piqué']
drop_index = df[df['short_name'].isin(players_missing_worldcup)].index
df.drop(drop_index, axis=0, inplace=True)
# filtering only the national teams in the World Cup
teams_worldcup = [
'Qatar', 'Brazil', 'Belgium', 'France', 'Argentina', 'England', 'Spain', 'Portugal',
'Mexico', 'Netherlands', 'Denmark', 'Germany', 'Uruguay', 'Switzerland', 'United States', 'Croatia',
'Senegal', 'Iran', 'Japan', 'Morocco', 'Serbia', 'Poland', 'South Korea', 'Tunisia',
'Cameroon', 'Canada', 'Ecuador', 'Saudi Arabia', 'Ghana', 'Wales', 'Costa Rica', 'Australia'
]
df = df[df['nationality_name'].isin(teams_worldcup)]
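Because isin() only keeps exact string matches, a team whose name is spelled differently in the dataset would silently disappear from df. A quick check like the one below can catch that. It is a sketch: unmatched_teams is a hypothetical helper, and the spelling mismatch in the toy data ('Korea Republic' vs 'South Korea') is only an illustrative assumption, not a claim about the FIFA 22 file.

```python
import pandas as pd

def unmatched_teams(df, teams, column='nationality_name'):
    """Hypothetical helper: return the names in `teams` that never appear in df[column]."""
    present = set(df[column].unique())
    return [team for team in teams if team not in present]

# Toy data with an assumed spelling mismatch
df_toy = pd.DataFrame({'nationality_name': ['Brazil', 'Spain', 'Korea Republic']})
missing = unmatched_teams(df_toy, ['Brazil', 'Spain', 'South Korea'])
print(missing)  # ['South Korea']
```

Running unmatched_teams(df, teams_worldcup) before filtering would list every team that needs its spelling adjusted.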
For this analysis, we'll use the FIFA rating of each player. The rating is stored in the overall column.
Let's sort the dataframe by overall (in case several players have the same rating, we'll also sort by the potential and value_eur columns).
df.sort_values(by=['overall', 'potential', 'value_eur'], ascending=False, inplace=True)
Here's how the dataframe looks now.
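The tie-breaking works column by column: potential is only consulted when overall is equal, and value_eur only when both overall and potential are equal. A tiny made-up example shows the resulting order.

```python
import pandas as pd

# Made-up players: all tie on overall; B and C also tie on potential
df_toy = pd.DataFrame({
    'short_name': ['A', 'B', 'C'],
    'overall': [90, 90, 90],
    'potential': [90, 92, 92],
    'value_eur': [100.0, 50.0, 80.0],
})

df_sorted = df_toy.sort_values(by=['overall', 'potential', 'value_eur'], ascending=False)
print(df_sorted['short_name'].tolist())  # ['C', 'B', 'A']
```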
Now, let's look at how the ratings (aka overall) are distributed among all the players in the World Cup 2022.
import numpy as np

fig, ax = plt.subplots(figsize=(12, 5), tight_layout=True)
sns.histplot(df, x='overall', binwidth=1)
# one x tick per rating value, including the maximum rating
bins = np.arange(df['overall'].min(), df['overall'].max() + 1, 1)
plt.xticks(bins)
plt.show()
It seems that most players have a rating of 65 or 67. For our dream team, we'll consider only the players with the highest rating in each position.
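To read the peak of the histogram numerically instead of by eye, value_counts() on the overall column can be used. This is a small sketch with made-up ratings; most_common_ratings is a hypothetical helper name.

```python
import pandas as pd

def most_common_ratings(df, n=3):
    """Hypothetical helper: the n most frequent 'overall' values with their counts."""
    return df['overall'].value_counts().head(n)

# Made-up ratings for illustration
df_toy = pd.DataFrame({'overall': [65, 65, 67, 67, 67, 70]})
top = most_common_ratings(df_toy, n=2)
print(top)
```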
A simple way to get the dream team is to drop every duplicate in the player_positions column (df is sorted by overall, so the rows that remain will be the players with the highest ratings).
df.drop_duplicates('player_positions')
If we select only the columns short_name, overall, club_name, and player_positions, we get the dataframe below, which represents our dream team.
Now that we have the best player for each position, we can use this Python tool to plot a soccer pitch with any players we want (you only need to edit the names of the players at the bottom of the script).
Right here’s my dream group.
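The dream-team step relies on drop_duplicates keeping the first row for each position, which is the highest-rated player only because the data is already sorted by overall. The sketch below reproduces that on made-up players and selects the same four columns.

```python
import pandas as pd

# Made-up rows, sorted by overall descending as df is at this point
df_toy = pd.DataFrame({
    'short_name': ['P1', 'P2', 'P3', 'P4'],
    'overall': [93, 91, 90, 88],
    'club_name': ['Club X', 'Club Y', 'Club Z', 'Club W'],
    'player_positions': ['RW', 'ST', 'RW', 'GK'],
}).sort_values('overall', ascending=False)

# keep='first' (the default) keeps the highest-rated player per position
dream_team = df_toy.drop_duplicates('player_positions')[
    ['short_name', 'overall', 'club_name', 'player_positions']
]
print(dream_team)
```

If the sort were skipped, the same call would keep whichever player happened to come first, so the order of operations matters here.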
Now let's create a dataframe df_best_players that shows the best player from each national team. We'll extract the first 3 letters of the nationality_name column to show where each player comes from.
df_best_players = df.copy()
df_best_players = df_best_players.drop_duplicates('nationality_name').reset_index(drop=True)
country_short = df_best_players['nationality_name'].str.extract(r'(^\w{3})', expand=False).str.upper()
df_best_players['name_nationality'] = df_best_players['short_name'] +' (' + country_short + ')'
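Here is the country-code extraction in isolation. The regex captures the first three word characters, and expand=False returns a Series rather than a DataFrame. Note that these are simply the first three letters, not official FIFA codes (for example, 'United States' becomes 'UNI', not 'USA').

```python
import pandas as pd

names = pd.Series(['Brazil', 'United States', 'Costa Rica'])

# capture group = first three word characters; upper-case them for display
codes = names.str.extract(r'(^\w{3})', expand=False).str.upper()
print(codes.tolist())  # ['BRA', 'UNI', 'COS']
```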
Let's make a barplot to visualize the df_best_players dataframe.
fig, ax = plt.subplots(figsize=(10, 6), tight_layout=True)
sns.barplot(df_best_players, x='overall', y='name_nationality',
            palette=sns.color_palette('pastel'), width=0.5)
plt.show()
Let's create a function that returns the best possible squad for each national team. To do so, we'll select the best 2 players for each position on each national team. Once we have the best squad, we'll sort it by player_positions, overall, and potential.
def best_squad(nationality):
    df_best_squad = df.copy()
    # keep the 2 highest-rated players per position for every nation (df is sorted by overall)
    df_best_squad = df_best_squad.groupby(['nationality_name', 'player_positions']).head(2)
    df_best_squad = df_best_squad[df_best_squad['nationality_name'] == nationality].sort_values(
        ['player_positions', 'overall', 'potential'], ascending=False)
    return df_best_squad
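The key line in best_squad is groupby(...).head(2): it keeps the first two rows of each (nation, position) group in their existing order, so it only picks the best two because the rows are already sorted by rating. A made-up example:

```python
import pandas as pd

# Made-up players; Brazil has three goalkeepers
df_toy = pd.DataFrame({
    'short_name': ['GK1', 'GK2', 'GK3', 'ST1', 'GK4'],
    'nationality_name': ['Brazil', 'Brazil', 'Brazil', 'Brazil', 'Spain'],
    'player_positions': ['GK', 'GK', 'GK', 'ST', 'GK'],
    'overall': [89, 85, 80, 86, 84],
})

# sort first, then keep the top 2 per (nation, position); GK3 is the one left out
top_two = (df_toy.sort_values('overall', ascending=False)
                 .groupby(['nationality_name', 'player_positions'])
                 .head(2))
print(top_two['short_name'].tolist())  # ['GK1', 'ST1', 'GK2', 'GK4']
```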
Let's look at the best squad for Brazil.
best_squad('Brazil')
As you can see, we get more than 26 players because we're selecting 2 players per position and there are around 15 positions. You could make some tweaks to select only 26 players from this dataframe.
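One possible tweak, sketched below under a simple assumption: keep the 26 highest-rated rows (overall, then potential) from the best_squad result and re-sort them the same way best_squad does. trim_squad is a hypothetical helper, and this naive rule ignores positional balance (for instance, it does not guarantee three goalkeepers), so treat it as a starting point.

```python
import pandas as pd

def trim_squad(df_squad, size=26):
    """Hypothetical helper: keep the `size` highest-rated rows of a best_squad() result."""
    return (df_squad.sort_values(['overall', 'potential'], ascending=False)
                    .head(size)
                    .sort_values(['player_positions', 'overall'], ascending=False))

# Made-up 30-player squad for illustration
df_toy = pd.DataFrame({
    'short_name': [f'P{i}' for i in range(30)],
    'player_positions': ['GK', 'CB', 'ST'] * 10,
    'overall': list(range(60, 90)),
    'potential': list(range(60, 90)),
})
trimmed = trim_squad(df_toy)
print(len(trimmed))  # 26
```

With the real data this would be called as trim_squad(best_squad('Brazil')).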
Now let's see the average rating of each national team's squad. Then we'll sort the teams in descending order by AVG_Overall.
average_overall = [best_squad(team)['overall'].mean() for team in teams_worldcup]
df_average_overall = pd.DataFrame({'Teams': teams_worldcup, 'AVG_Overall': average_overall})
df_average_overall = df_average_overall.dropna()
df_average_overall = df_average_overall.sort_values('AVG_Overall', ascending=False)
df_average_overall
Here are the 10 national teams with the highest average overall rating.
Let’s visualize this with a barplot.
fig, ax = plt.subplots(figsize=(12, 5), tight_layout=True)
sns.barplot(df_average_overall[:10], x='Teams', y='AVG_Overall',
            palette=sns.color_palette('pastel'))
plt.show()
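The dropna() in the averaging step matters: if a name in teams_worldcup matches no rows, best_squad returns an empty dataframe, and the mean of an empty column is NaN. The sketch below mimics that with a made-up team ('Atlantis') that has no players.

```python
import pandas as pd

df_toy = pd.DataFrame({'nationality_name': ['Brazil', 'Brazil'], 'overall': [88, 84]})
teams = ['Brazil', 'Atlantis']  # 'Atlantis' is made up and has no rows

# mean() over an empty selection returns NaN
averages = [df_toy.loc[df_toy['nationality_name'] == t, 'overall'].mean() for t in teams]
df_avg = pd.DataFrame({'Teams': teams, 'AVG_Overall': averages})
print(df_avg)
print(df_avg.dropna())  # only Brazil remains
```

A side effect is that a misspelled team is dropped without any error, which is why checking the team names against the dataset is worthwhile.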
Now that we have the best squad for each national team, we can find the best starting XI and formation.
Here are all the formations I came up with (you can add more if you want).
dict_formation = {
'4-3-3': ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CM', 'CAM', 'RW', 'ST', 'LW'],
'4-4-2': ['GK', 'RB', 'CB', 'CB', 'LB', 'RM', 'CM', 'CM', 'LM', 'ST', 'ST'],
'4-2-3-1': ['GK', 'RB', 'CB', 'CB', 'LB', 'CDM', 'CDM', 'CAM', 'CAM', 'CAM', 'ST'],
}
Now we create a best_lineup function that returns the best starting XI for the formation we want.
def best_lineup(nationality, lineup):
    # count how many times each position appears in the formation
    lineup_count = [lineup.count(i) for i in lineup]
    df_lineup = pd.DataFrame({'position': lineup, 'count': lineup_count})
    positions_non_repeated = df_lineup[df_lineup['count'] <= 1]['position'].values
    positions_repeated = df_lineup[df_lineup['count'] > 1]['position'].values
    df_squad = best_squad(nationality)
    # one player per single position; every squad player (at most 2) per repeated position
    df_lineup = pd.concat([
        df_squad[df_squad['player_positions'].isin(positions_non_repeated)].drop_duplicates('player_positions', keep='first'),
        df_squad[df_squad['player_positions'].isin(positions_repeated)]
    ])
    return df_lineup[['short_name', 'overall', 'club_name', 'player_positions']]
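One thing worth knowing about best_lineup: best_squad keeps at most two players per position, and a repeated position takes every squad player in that position, so a formation that lists a position three times (the three CAMs in 4-2-3-1) can get at most two of them, i.e. a ten-player lineup. That is consistent with the one-decimal averages printed for the 4-2-3-1 teams (85.1 for Spain is 851/10, which no sum of eleven integer ratings divided by 11 can produce). The sketch below is a hypothetical variant, not the author's method: it takes exactly lineup.count(position) players per position from a pool that is assumed to be one nation's players sorted best first, for example df[df['nationality_name'] == nationality].

```python
from collections import Counter

import pandas as pd

def lineup_from_pool(df_pool, lineup):
    """Hypothetical variant of best_lineup: pick exactly lineup.count(pos) players per position.

    Assumes df_pool holds one nation's players, already sorted by rating (best first).
    """
    needed = Counter(lineup)  # e.g. {'GK': 1, 'CAM': 3}
    picks = [
        df_pool[df_pool['player_positions'] == position].head(count)
        for position, count in needed.items()
    ]
    return pd.concat(picks)

# Made-up pool with four CAMs
df_pool = pd.DataFrame({
    'short_name': ['G1', 'G2', 'C1', 'C2', 'C3', 'C4'],
    'player_positions': ['GK', 'GK', 'CAM', 'CAM', 'CAM', 'CAM'],
    'overall': [85, 80, 88, 87, 86, 70],
}).sort_values('overall', ascending=False)

result = lineup_from_pool(df_pool, ['GK', 'CAM', 'CAM', 'CAM'])
print(result['short_name'].tolist())  # ['G1', 'C1', 'C2', 'C3']
```

If a nation has fewer players than needed in a position, head() simply returns what exists, so the lineup can still be short; checking len(result) == 11 before comparing averages would flag that.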
Finally, let's loop through the top 7 teams in the World Cup and find the formation with the highest average rating for each.
for index, row in df_average_overall[:7].iterrows():
    max_average = None
    for key, values in dict_formation.items():
        average = best_lineup(row['Teams'], values)['overall'].mean()
        if max_average is None or average > max_average:
            max_average = average
            formation = key
    print(row['Teams'], formation, max_average)
Great! Now we have the best formation for each national team.
Spain 4-2-3-1 85.1
Portugal 4-2-3-1 84.9
England 4-4-2 84.45454545454545
Brazil 4-3-3 84.81818181818181
France 4-2-3-1 83.9
Argentina 4-3-3 83.54545454545455
Germany 4-2-3-1 84.1
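The max_average / formation bookkeeping in the loop can also be written with the built-in max() and a key function. This is a sketch with made-up averages; best_formation is a hypothetical helper. Like the loop's strict > comparison, max() keeps the first formation when two averages tie.

```python
def best_formation(averages):
    """Hypothetical helper: return (formation, average) with the highest average."""
    formation = max(averages, key=averages.get)
    return formation, averages[formation]

# Made-up averages for illustration
result = best_formation({'4-3-3': 84.2, '4-4-2': 83.9, '4-2-3-1': 84.5})
print(result)  # ('4-2-3-1', 84.5)
```

With the real data, the dictionary could be built as {key: best_lineup(team, values)['overall'].mean() for key, values in dict_formation.items()}.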
Let's find out the best starting XI for Spain, Portugal, England, Brazil, France, Argentina, and Germany.
1. Spain: Best Starting XI (4–2–3–1)
best_lineup('Spain', dict_formation['4-2-3-1'])
2. Portugal: Best Starting XI (4–2–3–1)
best_lineup('Portugal', dict_formation['4-2-3-1'])
3. England: Best Starting XI (4–4–2)
best_lineup('England', dict_formation['4-4-2'])
4. Brazil: Best Starting XI (4–3–3)
best_lineup('Brazil', dict_formation['4-3-3'])
5. France: Best Starting XI (4–2–3–1)
best_lineup('France', dict_formation['4-2-3-1'])
6. Argentina: Best Starting XI (4–3–3)
best_lineup('Argentina', dict_formation['4-3-3'])
7. Germany: Best Starting XI (4–2–3–1)
best_lineup('Germany', dict_formation['4-2-3-1'])
That's it! You can find the complete code on my GitHub.