An overview of the model created for goalkeepers.
This model was built to serve as an objective baseline for choosing goalkeepers for inclusion in the FTOTS. The key modelling choices are summarised below.
Imputation method: eXtreme Gradient Boosting
Algorithm chosen: Random Forest
Number of features chosen: 35 (32 selected using a customised implementation of Recursive Feature Elimination, after which the binary-encoded league variable was included)
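The feature-selection step described above can be sketched roughly as follows. This is a minimal illustration, not the customised implementation used for the model: it uses scikit-learn's stock `RFE`, assumes a Random Forest as the ranking estimator, runs on synthetic data, and assumes the binary-encoded league variable occupies three columns (which would be consistent with 32 + 3 = 35). The names `league_id` and `league_bits` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the goalkeeper data: 300 player-seasons, 60 candidate features.
X, y = make_classification(
    n_samples=300, n_features=60, n_informative=10, random_state=0
)

# Recursive Feature Elimination: fit the forest, drop the least important
# features (4 per round here), and repeat until 32 features remain.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=32,
    step=4,
)
selector.fit(X, y)
X_selected = X[:, selector.support_]

# Hypothetical league labels (assumption: 5 leagues), binary-encoded into
# 3 columns by taking the low three bits of each integer label.
rng = np.random.default_rng(0)
league_id = rng.integers(0, 5, size=X.shape[0])
league_bits = (league_id[:, None] >> np.arange(3)) & 1

# Final design matrix: 32 selected features plus 3 league columns.
X_final = np.hstack([X_selected, league_bits])
print(X_final.shape)
```

Adding the league columns after selection, rather than letting RFE rank them, keeps league information in the model regardless of its measured importance.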
Accordingly, the features selected for the goalkeeper model were as follows:
The interactive plot below shows the results of a three-component Principal Component Analysis (PCA) of the 35 variables above, computed on the test set across all available goalkeeper seasons after null values were imputed, together with information about the model's choices. Each data point represents a goalkeeper. The colours of the points represent the following:
Click a legend entry once to exclude that trace. Double-click it to isolate the trace.
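For readers unfamiliar with the projection behind the plot, a three-component PCA can be sketched as below. This is an illustration under stated assumptions only: the matrix is synthetic, the features are standardised before projection (the source does not say whether this was done), and the PCA is computed directly with a singular value decomposition rather than with whichever library the original analysis used.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder matrix: 120 goalkeeper-seasons by 35 features (synthetic values).
X = rng.normal(size=(120, 35))

# Standardise each feature (assumption), then project onto the top three principal axes.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
components = X_std @ Vt[:3].T  # shape (120, 3): one 3-D point per goalkeeper

# Share of the total variance captured by each of the three components.
explained_ratio = (S**2 / np.sum(S**2))[:3]
print(components.shape, explained_ratio.round(3))
```

Each row of `components` gives the three coordinates at which one goalkeeper would be drawn in a plot of this kind.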
The same plot, broken down by league, is provided below. Overall, the points in blue (players included only in the model's choices) appeared to be scattered closer to the points in green (players included in both the actual FTOTS and the model's choices) than the points in pink (players included only in the actual FTOTS) were. This suggests that the model may have captured a general trend in how players are chosen for the FTOTS.
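The visual impression of "closer" could in principle be checked numerically. The sketch below shows one possible measure, the mean distance from each point in a group to its nearest green point. The helper `mean_nearest_distance` and all coordinates are hypothetical and synthetic; this measure was not part of the original analysis, and the numbers it prints say nothing about the real data.

```python
import numpy as np

def mean_nearest_distance(points, reference):
    """Average Euclidean distance from each row of `points` to its closest row in `reference`."""
    diffs = points[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return float(dists.min(axis=1).mean())

rng = np.random.default_rng(1)
# Synthetic 3-D PCA coordinates for the three colour groups (illustrative only).
both = rng.normal(loc=0.0, size=(20, 3))        # green: FTOTS and model
model_only = rng.normal(loc=0.5, size=(12, 3))  # blue: model only
ftots_only = rng.normal(loc=3.0, size=(12, 3))  # pink: FTOTS only

d_model = mean_nearest_distance(model_only, both)
d_ftots = mean_nearest_distance(ftots_only, both)
print(f"blue -> green: {d_model:.2f}, pink -> green: {d_ftots:.2f}")
```

A smaller blue-to-green value than pink-to-green value on the real coordinates would support the observation above; comparing group centroids would be a simpler alternative.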
Of the 35 possible inclusions in the test set, the goalkeeper model chose an equivalent or better alternative 91.43% of the time, where an equivalent is either the exact player chosen by the current system or one with the same score under the scoring method used. Directly better alternatives were chosen 40% of the time.
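The percentages can be related back to counts with a short calculation. The counts below are inferred from the reported percentages and the total of 35; they are not stated in the source.

```python
TOTAL_SLOTS = 35  # possible inclusions in the test set (from the text)

# Counts inferred by inverting the reported percentages (assumption).
equivalent_or_better = round(0.9143 * TOTAL_SLOTS)  # 32
directly_better = round(0.40 * TOTAL_SLOTS)         # 14

# Slots where the model's pick was equivalent but not better, and where it was worse.
equivalent_only = equivalent_or_better - directly_better
worse = TOTAL_SLOTS - equivalent_or_better

print(f"{equivalent_or_better / TOTAL_SLOTS:.2%}")  # 91.43%
print(f"{directly_better / TOTAL_SLOTS:.2%}")       # 40.00%
print(equivalent_only, worse)                       # 18 3
```

Under this reading, 32 of the 35 choices were equivalent or better, 14 of those were directly better, and 3 were worse than the current system's choice.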
More information on the test-set goalkeepers included in the FTOTS and those chosen by the model can be accessed by season under findings.