BSc (Hons) DS Final Project

Introduction to the model

This model was built to act as an objective baseline for choosing defenders to be included in the FTOTS. An overview of the model is provided below.

Imputation method: eXtreme Gradient Boosting
Algorithm chosen: Extremely Randomised Trees
Number of features chosen: 43 (40 selected using a customised implementation of Recursive Feature Elimination, after which the binary-encoded league variable was included)
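
The three choices above can be sketched end to end as follows. This is a minimal illustration assuming scikit-learn, not the project's code: the data is synthetic, scikit-learn's standard RFE stands in for the customised elimination procedure, and a GradientBoostingRegressor inside IterativeImputer stands in for the XGBoost-based imputation. The sizes are shrunk for speed (5 selected features rather than 40), and the three appended 0/1 league columns are a hypothetical encoding, since the text does not say how the league variable was laid out.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 200 defender-seasons, 12 numeric features.
X = rng.normal(size=(200, 12))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # hypothetical "included in FTOTS" label
X[rng.random(X.shape) < 0.05] = np.nan       # blank out roughly 5% of the values

# Step 1: impute nulls. Each feature is predicted from the others by a
# gradient-boosted regressor (a stand-in for XGBoost, not the project's imputer).
imputer = IterativeImputer(
    estimator=GradientBoostingRegressor(n_estimators=20, random_state=0),
    max_iter=2,
    random_state=0,
)
X_imputed = imputer.fit_transform(X)

# Step 2: recursive feature elimination, dropping the least important feature
# each round until 5 remain (the project kept 40).
selector = RFE(
    ExtraTreesClassifier(n_estimators=50, random_state=0),
    n_features_to_select=5,
    step=1,
)
X_selected = selector.fit_transform(X_imputed, y)

# Step 3: append the league columns after selection, so they cannot be eliminated.
# Three 0/1 columns are a hypothetical binary encoding of a league identifier.
league_bits = rng.integers(0, 2, size=(200, 3))
X_final = np.hstack([X_selected, league_bits])

# Step 4: fit Extremely Randomised Trees on the final feature set.
model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X_final, y)
scores = model.predict_proba(X_final)[:, 1]  # in-sample scores, for illustration only
```

Appending the league columns only after elimination guarantees they survive feature selection, which matches the order described above.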

Accordingly, the features selected for the defender model were as follows:

The interactive plot below shows a three-component Principal Component Analysis (PCA) of the 43 variables above, computed on the test set across all available seasons for defenders after null values were imputed, and annotated with the model's choices. Each data point represents a defender. The colours of the points have the following meanings:

Points marked in green signify players included in both the actual FTOTS and the model's choices.
Points marked in grey signify players not included in either the actual FTOTS or the model's choices.
Points marked in blue signify players only included in the model's choices.
Points marked in pink signify players only included in the actual FTOTS.

Click a legend entry once to hide a trace. Double-click it to isolate that trace.
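
The projection and colouring described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming NumPy; the function names `pca_3d` and `colour_group` are hypothetical, and the features are centred but not rescaled, which may differ from the preprocessing actually used.

```python
import numpy as np

def pca_3d(X):
    """Project the rows of X onto its first three principal components."""
    Xc = X - X.mean(axis=0)                        # centre each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)                # share of variance per component
    return Xc @ Vt[:3].T, explained[:3]

def colour_group(in_actual, in_model):
    """Map the two membership flags to the plot's four colour groups."""
    if in_actual and in_model:
        return "green"   # in both the actual FTOTS and the model's choices
    if in_model:
        return "blue"    # only in the model's choices
    if in_actual:
        return "pink"    # only in the actual FTOTS
    return "grey"        # in neither

# Synthetic stand-in for the imputed test set: 50 defenders, 43 variables.
rng = np.random.default_rng(0)
X_test = rng.normal(size=(50, 43))
in_actual = rng.random(50) < 0.2   # hypothetical membership flags
in_model = rng.random(50) < 0.2

coords, explained = pca_3d(X_test)
colours = [colour_group(a, m) for a, m in zip(in_actual, in_model)]
```

Each row of `coords` gives one defender's position on the three plot axes, and the matching entry of `colours` gives the trace it belongs to.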

The same plots, broken down by league, are provided below. Overall, the points in blue (players included only in the model's choices) appear to be scattered closer to the points in green (players included in both the actual FTOTS and the model's choices) than the points in pink (players included only in the actual FTOTS). This suggests that the model has captured a general trend in how players are chosen for the FTOTS.
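
The visual impression above could also be checked numerically. The sketch below is one possible check, not something the project reports: it reads the comparison as the blue-to-green distance versus the pink-to-green distance, measures each as the mean Euclidean distance to the centroid of the green points, and uses hand-written coordinates that are purely illustrative. The function name `mean_distance_to_centroid` is hypothetical.

```python
import numpy as np

def mean_distance_to_centroid(points, reference):
    """Mean Euclidean distance from each point to the centroid of `reference`."""
    centroid = reference.mean(axis=0)
    return float(np.linalg.norm(points - centroid, axis=1).mean())

# Illustrative PCA coordinates only; these are not the project's data.
green = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
blue = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
pink = np.array([[4.0, 4.0, 0.0], [5.0, 0.0, 3.0]])

blue_to_green = mean_distance_to_centroid(blue, green)
pink_to_green = mean_distance_to_centroid(pink, green)
blue_is_closer = blue_to_green < pink_to_green
```

A smaller value for `blue_to_green` than for `pink_to_green` on the real coordinates would support the reading given above.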

Of the 111 possible inclusions in the test set, the defender model chose an equivalent or better alternative to the actual inclusion 79.28% of the time, where equivalent means either the exact player chosen by the current system or one with the same score under the scoring method used. Strictly better alternatives were chosen 30.63% of the time.
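
The arithmetic behind these percentages can be sketched as below. The counts 88 and 34 are not stated in the text; they are inferred here as the only whole numbers out of 111 that round to 79.28% and 30.63%. The helper names `compare` and `inclusion_rates` are hypothetical, and treating a higher score as better is an assumption about the scoring method.

```python
def compare(model_score, actual_score):
    """Classify one model pick against the actual inclusion it replaces.

    Assumes a higher score is better, which the text does not confirm.
    """
    if model_score > actual_score:
        return "better"
    if model_score == actual_score:
        return "equivalent"
    return "worse"

def inclusion_rates(total, equivalent_or_better, strictly_better):
    """Return both rates as percentages rounded to two decimal places."""
    return (
        round(100 * equivalent_or_better / total, 2),
        round(100 * strictly_better / total, 2),
    )

# 88 and 34 are inferred counts, not figures quoted from the project.
rates = inclusion_rates(total=111, equivalent_or_better=88, strictly_better=34)
```

Under these inferred counts, 88 - 34 = 54 of the 111 inclusions would be equivalent picks and the remaining 23 would be cases where the actual inclusion scored higher.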

More information on the defenders in the test set who were included in the FTOTS and those chosen by the model can be accessed by season under Findings.

BSc (Hons) Data Science Final Project | COBScDS221P-008
