Data Mining et al - Georg Ruß' PhD Blog - R, clustering, regression, all on spatial data, hence it's:

The “squared” in “mean squared error”

By russ

The latest experiments with the sports science data consistently produced outrageously high errors, which most of the time exceeded the scale of the original attribute by orders of magnitude. After some experiments that led to negative conclusions such as:

the error is too high, so the network cannot be trained on these data, or
there is some tendency to overfit as the network size and the learning rate increase, but the error is far too large anyway,

I presented the examples to the sports science people, who were also quite surprised by the order of magnitude of the error. Back in my office, it suddenly dawned on me what I had actually shown in the graphs: the mean squared error, whose unit is the square of the target's unit. After taking the square root of the MSE, I ended up with what I had actually wanted to show in the plots: an error on the scale of the target attribute, i.e. the absolute difference between the network's prediction and the actual result. (Strictly, the two coincide only for a single record; over several records, the root of the MSE is at least as large as the mean absolute error.) It’s somewhat embarrassing, but at least I now know the cause of the high errors.
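To make the unit issue concrete, here is a minimal Python sketch; the example values are made up and are not the sports science data. The MSE is in squared units of the target, while its square root (RMSE) is back on the target's scale and, for a single record, equals the absolute error.

```python
import math


def mse(pred, actual):
    """Mean of the squared differences; its unit is the target's unit squared."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def rmse(pred, actual):
    """Square root of the MSE; back on the scale of the target attribute."""
    return math.sqrt(mse(pred, actual))


def mae(pred, actual):
    """Mean of the absolute differences between prediction and actual value."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


# Hypothetical predictions for a target in the range of several hundred.
actual = [900.0, 950.0, 1000.0]
pred = [910.0, 930.0, 1000.0]
print(mse(pred, actual))   # about 166.67 (squared units)
print(rmse(pred, actual))  # about 12.91 (target units)
print(mae(pred, actual))   # 10.0 (target units)
```

Plotting `mse` where `rmse` or `mae` was intended inflates errors of ten or twenty units into hundreds, which matches the symptom described above.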

For the plot below, the sports science people also advised me to take out some redundant data (four of the 24 columns are each more or less the sum of three others). Now the error lies in a more reasonable range, given that the target attribute is expected to range from 841 (min) to 1004 (max). The plot shows the error vs. the sizes of the first and second hidden layers of the feed-forward network.
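Columns that are sums of other columns add no information and can be found mechanically. The following Python sketch is a hypothetical helper, not part of the original scripts: it brute-forces all triples of other columns and reports any column that equals their sum in every row, within a tolerance `tol` (a looser `tol` would be needed for the “more or less” sums mentioned above).

```python
from itertools import combinations


def find_sum_columns(rows, tol=1e-6):
    """Return {col: (a, b, c)} for each column that, within tol, equals the sum
    of three other columns in every row. Brute force over all triples."""
    n_cols = len(rows[0])
    found = {}
    for target in range(n_cols):
        others = [c for c in range(n_cols) if c != target]
        for a, b, c in combinations(others, 3):
            if all(abs(r[target] - (r[a] + r[b] + r[c])) <= tol for r in rows):
                found[target] = (a, b, c)
                break  # one explaining triple is enough
    return found


def drop_columns(rows, cols):
    """Return a copy of rows without the given column indices."""
    keep = [j for j in range(len(rows[0])) if j not in cols]
    return [[r[j] for j in keep] for r in rows]


# Toy data: column 3 is the sum of columns 0, 1 and 2.
rows = [[1, 2, 3, 6, 5],
        [2, 0, 1, 3, 7],
        [4, 4, 4, 12, 1]]
derived = find_sum_columns(rows)
print(derived)                      # {3: (0, 1, 2)}
print(drop_columns(rows, derived))  # [[1, 2, 3, 5], [2, 0, 1, 7], [4, 4, 4, 1]]
```

With 24 columns this checks 24 × 1771 triples, which is still cheap for a data set of a few dozen records.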

Posted in data mining, English, sports science | Comments off

Overfitting vs. Cross-Validation

By russ

One more experiment with the sports science data clearly shows the neural network overfitting. I devised a script that automatically generates networks with two hidden layers of neurons and systematically varies the size of each layer from 2 to 16. No matter how the parameters of the learning process were set, the mean squared error on the training data, plotted against the two layer sizes (right: first hidden layer, left: second hidden layer), drops to zero as soon as the networks get larger:
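The systematic sweep boils down to a double loop over both layer sizes, i.e. 15 × 15 = 225 networks. This is a Python sketch, not the author's MatLab script; `train_and_score` is a hypothetical callback that would build a network with hidden layers of sizes `h1` and `h2`, train it and return its error.

```python
def sweep_layer_sizes(train_and_score, sizes=range(2, 17)):
    """Call train_and_score(h1, h2) for every pair of hidden-layer sizes and
    collect the returned error in a dict keyed by (h1, h2)."""
    errors = {}
    for h1 in sizes:      # size of the first hidden layer
        for h2 in sizes:  # size of the second hidden layer
            errors[(h1, h2)] = train_and_score(h1, h2)
    return errors


# Toy stand-in: the "error" shrinks as the network grows, like training error.
errors = sweep_layer_sizes(lambda h1, h2: 1.0 / (h1 * h2))
print(len(errors))                  # 225
print(min(errors, key=errors.get))  # (16, 16)
```

The resulting grid is exactly what a surface plot of error vs. the two layer sizes needs.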

However, when cross-validation is applied (test set: 1 record, validation set: 1 record, training set: 38 records), the error rises, especially towards the larger layer sizes. This is a clear sign of overfitting (same scales as in the previous figure):
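The split above (40 records: 1 test, 1 validation, 38 training) can be rotated so that every record serves as the test record once. The post does not say how the validation record was chosen; the Python sketch below assumes, hypothetically, that it is simply the next record in cyclic order.

```python
def rotating_splits(n_records, n_test=1, n_val=1):
    """Yield (test, validation, training) index lists for each fold.
    Fold k uses record k as test and the following record (cyclically) as
    validation; all remaining records are training data."""
    for k in range(n_records):
        test = [(k + i) % n_records for i in range(n_test)]
        val = [(k + n_test + i) % n_records for i in range(n_val)]
        held_out = set(test) | set(val)
        train_idx = [j for j in range(n_records) if j not in held_out]
        yield test, val, train_idx


for test, val, train_idx in list(rotating_splits(40))[:2]:
    print(test, val, len(train_idx))
# [0] [1] 38
# [1] [2] 38
```

The training error only measures how well the 38 training records are memorized; the error on the held-out test record is the one that reveals overfitting.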

As usual, here is the MatLab script for this entry, which doesn’t differ much from the previous ones: neuro10-new_plot.m.

Posted in data mining, English, sports science | Comments off

Normalization in the context of sports science data

By russ

At the moment I’m somewhat split between teaching “Intelligent Systems” courses and thinking about the agriculture data as well as the sports science data. In yesterday’s meeting, we discussed the three data blocks that are available from the sports scientists. I’ll receive the cleaned-up data soon. One issue that came up was normalization.
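The post does not say which normalization was discussed. As one common option, here is a Python sketch of min-max scaling to [-1, 1], which is also the default output range of the nnet toolbox's `mapminmax` function; the helper names are hypothetical.

```python
def fit_minmax(values, lo=-1.0, hi=1.0):
    """Return (normalize, denormalize) functions that map [min, max] of values
    linearly onto [lo, hi] and back."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        raise ValueError("a constant column cannot be min-max normalized")

    def normalize(x):
        return lo + (x - vmin) * (hi - lo) / span

    def denormalize(y):
        return vmin + (y - lo) * span / (hi - lo)

    return normalize, denormalize


# Target range taken from the earlier post: min 841, max 1004.
normalize, denormalize = fit_minmax([841.0, 900.0, 1004.0])
print(normalize(841.0), normalize(1004.0))  # -1.0 1.0
print(denormalize(0.0))                     # 922.5
```

An absolute error measured on the normalized scale has to be multiplied by (max - min) / (hi - lo), here 163 / 2 = 81.5, to get back to the target's units; a squared error has to be multiplied by the square of that factor.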

Posted in data mining, English, sports science | Comments off

Highly recommended: Matlab’s nnet manual

By russ

Diving deeper into Matlab’s seemingly endless built-in functions, I discovered (i.e. read) MathWorks’ nnet manual. I usually abhor user manuals for specific programming languages, but MathWorks has made it an enjoyable read.

Posted in agriculture, data mining, English, sports science | Comments off

Some more results for the sports science data

By russ

I finally ended up simplifying the whole task and starting again from the very beginning. I had two data sets from two athletes with the same training attributes (data columns). The earlier MatLab script did a kind of pre-training with one data set and the main training and cross-validation with the other. Remember, I am still trying to reproduce the results from the paper (which were generated with Data Engine) using MatLab.

Posted in data mining, English, sports science | Comments off

Some results for the sports science data

By russ

The prediction capabilities of the neural network coded in the last post do not seem to be as good as expected, at least not in the standard configuration. When I fed the data set (which I will not publish here) through the network and the cross-validation, I got the following results:

Posted in data mining, English, sports science | Comments off

MatLab script v1 for the sports science data

By russ

A well-commented script that tries to model the sports scientists’ data mining process is online.
Below is a quick screenshot for reading; the script can be downloaded here.

Training the network consists of two main steps, pre-training and main training. Since there is not much data available for training, additional data was taken from another athlete. The procedure is as follows:

the network is initialized once and stored in a variable,
the network is pre-trained on the other athlete’s data, on the assumption that it can then adapt better to the actual training data,
the main training is performed starting from the pre-trained network,
this is repeated once for each data record, and cross-validation is carried out.
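The four steps can be sketched in Python. This is not the author's MatLab script: to keep the example self-contained, a linear model trained by per-sample gradient descent stands in for the neural network, and all function names are hypothetical. What matters is the order of operations: initialize once, pre-train on the other athlete's data, then for each held-out record start again from a copy of the pre-trained weights.

```python
import copy
import random


def init_model(n_features, seed=0):
    """Step 1: initialize the model once (small random weights, zero bias)."""
    rng = random.Random(seed)
    return {"w": [rng.uniform(-0.1, 0.1) for _ in range(n_features)], "b": 0.0}


def predict(model, x):
    return sum(w * xi for w, xi in zip(model["w"], x)) + model["b"]


def train(model, X, y, epochs=500, lr=0.05):
    """Per-sample gradient descent on the squared error; updates model in place."""
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict(model, xi) - yi
            model["w"] = [w - lr * err * xj for w, xj in zip(model["w"], xi)]
            model["b"] -= lr * err
    return model


def pretrain_then_loo(X_other, y_other, X, y):
    """Pre-train on the other athlete's data, then run leave-one-out
    cross-validation on the main athlete's data; returns absolute errors."""
    base = init_model(len(X[0]))          # step 1: initialize once
    train(base, X_other, y_other)         # step 2: pre-train
    errors = []
    for k in range(len(X)):               # step 4: once per data record
        model = copy.deepcopy(base)       # start from the pre-trained weights
        X_train, y_train = X[:k] + X[k + 1:], y[:k] + y[k + 1:]
        train(model, X_train, y_train)    # step 3: main training
        errors.append(abs(predict(model, X[k]) - y[k]))
    return errors


# Toy, noise-free data following the same rule for both "athletes".
X_other = [[i / 20, (7 * i % 20) / 20] for i in range(20)]
y_other = [2 * a - b + 1 for a, b in X_other]
X = [[i / 10, (3 * i % 10) / 10] for i in range(10)]
y = [2 * a - b + 1 for a, b in X]
errors = pretrain_then_loo(X_other, y_other, X, y)
```

The `copy.deepcopy(base)` call matters: without it, each fold would keep training the same model, and later folds would already have seen the records held out earlier.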


Posted in data mining, English, sports science | Comments off

Details on the sports science data mining process

By russ

The sports science data mining is currently applied to

Olympic swimming
archery
disabled swimming

As for the research targets, we are trying to

model the effects of different training strategies on the outcome of an upcoming tournament,
predict the tournament time (or any standardized measure of success) at the Olympic Games.


Posted in data mining, English, sports science | Comments off

Prediction using sports science data

By russ

This project ties in with earlier work done by Jürgen Edelmann-Nusser and Nico Ganter: predicting athletes’ tournament swimming times using only their training data. It works as follows:

During the athletes’ training sessions, their amount of training in different disciplines (running, strength, stamina) is recorded.
The athletes complete a tournament, and their results are recorded as well.
These data, consisting of the training times in the different fields and the respective tournament result, can be used to train one or more neural networks.
Once the neural networks are trained, one can predict (or try to predict) the outcome of the upcoming tournament.
Furthermore, one could adapt the athletes’ training strategy by varying the training parameters and applying the strategy with the best predicted tournament result.
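The last step, searching for the training strategy with the best predicted result, can be sketched as an exhaustive search over candidate training plans. In this Python sketch, `predict_time` is a hypothetical stand-in for a trained network; the discipline names follow the post, while the candidate hours and the toy predictor are invented for illustration.

```python
from itertools import product


def best_strategy(predict_time, hours_options):
    """Try every combination of training hours per discipline and return the
    plan with the lowest predicted tournament time (lower is better).

    predict_time: callable mapping a dict {discipline: hours} to a time.
    hours_options: dict {discipline: list of candidate hours}.
    """
    disciplines = list(hours_options)
    best_plan, best_time = None, float("inf")
    for combo in product(*(hours_options[d] for d in disciplines)):
        plan = dict(zip(disciplines, combo))
        t = predict_time(plan)
        if t < best_time:
            best_plan, best_time = plan, t
    return best_plan, best_time


def toy_predict(plan):
    """Hypothetical predictor with a known optimum at 4/2/6 hours."""
    return (50.0 + (plan["running"] - 4) ** 2
            + (plan["strength"] - 2) ** 2 + (plan["stamina"] - 6) ** 2)


options = {"running": [2, 4, 6], "strength": [1, 2, 3], "stamina": [4, 6, 8]}
print(best_strategy(toy_predict, options))
# ({'running': 4, 'strength': 2, 'stamina': 6}, 50.0)
```

Exhaustive search is only practical for a handful of disciplines and candidate values, since the number of plans grows multiplicatively; the predictions are also only as trustworthy as the network's cross-validated error.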

Presumably, this work will be done using MatLab and its Neural Network Toolbox (nnet). Since I’m on the application side of the work, I will probably be scripting the neural network parts in MatLab and publishing the scripts here.

Posted in data mining, English, sports science | Comments off
