Sports science data mining is currently applied to

  • Olympic swimming
  • archery
  • swimming for athletes with disabilities

When it comes to the research targets, we are trying to

  • model the effects of different training strategies on the outcome of an upcoming tournament,
  • predict the tournament time (or any standardized measure of success) at the Olympic Games.


So far, the neural network has been trained on the training data in DataEngine. Because the data set is small, cross validation had to be applied, which is quite costly when it has to be done by hand. The plan is therefore to use MatLab to automate the generation and validation of different networks and so speed up the mining process. The MatLab scripting language is well suited to this kind of automation.

Before tuning any network parameters, we want to make sure that the results obtained with MatLab are consistent with those obtained from DataEngine. The MatLab network therefore uses the same parameters as the DataEngine setup:

  • multi-layer perceptron with 3 layers: 10 linear input neurons, 2 tanh hidden neurons, 1 linear output neuron
  • leave-one-out cross validation
  • backpropagation learning algorithm, random presentation of training samples
  • weight initialization in the range from -0.1 to 0.1
  • online learning strategy (weights are updated after every sample)
  • learning rate: 0.1, learning rate decay factor: 0.99999
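
The parameter list above can be made concrete with a small sketch. It is written in plain Python rather than MatLab, purely as an illustration, and it is not the DataEngine implementation. The class name TinyMLP, the bias terms, the squared-error loss and the choice to draw one random sample per step (with replacement) are all assumptions that the list does not specify.

```python
import math
import random


class TinyMLP:
    """Sketch of a 10-2-1 perceptron: linear inputs, tanh hidden layer, linear output."""

    def __init__(self, n_in=10, n_hidden=2, init_range=0.1, seed=0):
        self.rng = random.Random(seed)
        u = lambda: self.rng.uniform(-init_range, init_range)
        # One row per hidden neuron; the last entry of each row is a bias (assumption).
        self.w_hidden = [[u() for _ in range(n_in + 1)] for _ in range(n_hidden)]
        # Output weights, again with a trailing bias (assumption).
        self.w_out = [u() for _ in range(n_hidden + 1)]

    def forward(self, x):
        # zip stops at the shorter sequence, so the bias entry is added separately.
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        y = sum(w * h for w, h in zip(self.w_out, hidden)) + self.w_out[-1]
        return hidden, y

    def predict(self, x):
        return self.forward(x)[1]

    def train(self, samples, max_steps=10_000, lr=0.1, decay=0.99999):
        """Online backpropagation: one randomly chosen sample per learning step."""
        for _ in range(max_steps):
            x, target = self.rng.choice(samples)
            hidden, y = self.forward(x)
            err = y - target  # derivative of 0.5 * (y - target)^2 with respect to y
            # Update the hidden layer first so it sees the old output weights.
            for j, row in enumerate(self.w_hidden):
                delta = err * self.w_out[j] * (1.0 - hidden[j] ** 2)
                for i, xi in enumerate(x):
                    row[i] -= lr * delta * xi
                row[-1] -= lr * delta
            for j, h in enumerate(hidden):
                self.w_out[j] -= lr * err * h
            self.w_out[-1] -= lr * err
            lr *= decay  # multiplicative learning rate decay after every step
```

A network is created with TinyMLP() and trained with net.train(samples), where samples is a list of (inputs, target) pairs with 10 inputs each. With a decay factor of 0.99999, the learning rate only falls from 0.1 to about 0.0905 over 10,000 steps, so the decay is very gentle at this training length.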

Because so little data is available, good results were obtained when the plain, untrained network was first pre-trained with the data from two different training phases. This moves the network towards a stable starting point from which it can be trained with the actual data of the Olympic finalist. Each pre-trained network was then fully trained, up to a maximum of 10,000 learning steps. At the end of the training process, the data held out by the cross validation was fed through the neural network to calculate the predicted result. This was done for both phases and all n data sets, and the error between the actual and the predicted result was calculated. The predicted results were quite close to the actual ones.
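
One possible reading of this procedure, again sketched in plain Python rather than MatLab: every fold starts from a copy of the same pre-trained model, is trained on all samples but one, and then predicts the sample that was left out. The names leave_one_out, mean_abs_error and RunningMean are hypothetical, and RunningMean is only a trivial stand-in for the neural network so that the loop can be run on its own. Any object with train(samples) and predict(x) methods could take its place.

```python
import copy


class RunningMean:
    """Hypothetical stand-in model: predicts the mean of all targets seen so far."""

    def __init__(self):
        self.targets = []

    def train(self, samples):
        self.targets.extend(t for _, t in samples)

    def predict(self, x):
        return sum(self.targets) / len(self.targets)


def leave_one_out(samples, make_model, pretrain_phases=()):
    """Return one (actual, predicted) pair per held-out sample."""
    base = make_model()
    for phase in pretrain_phases:      # optional pre-training, one pass per phase
        base.train(phase)
    results = []
    for i, (x, actual) in enumerate(samples):
        model = copy.deepcopy(base)    # each fold starts from the same pre-trained state
        model.train(samples[:i] + samples[i + 1:])   # all samples except number i
        results.append((actual, model.predict(x)))
    return results


def mean_abs_error(results):
    return sum(abs(actual - predicted) for actual, predicted in results) / len(results)


# Toy data only: three samples with targets 1, 2 and 3.
data = [([0.0], 1.0), ([0.0], 2.0), ([0.0], 3.0)]
results = leave_one_out(data, RunningMean)
print(results)                  # [(1.0, 2.5), (2.0, 2.0), (3.0, 1.5)]
print(mean_abs_error(results))  # 1.0
```

With n samples this trains n models, which is exactly the part that is tedious to do by hand and cheap to script.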

So much for the existing work. I will try to reproduce the above process and its parameters in MatLab and publish the script here. Once the old DataEngine results can be reproduced with MatLab as well, the script can be fed new data from competitive swimming and archery to predict the competition results at the 2008 Olympic Games.