Georg Ruß' PhD Blog — R, clustering, regression, all on spatial data, hence it's:

October 1st, 2007

Some results for the sports science data

The prediction capabilities of the neural network that was coded in the last post do not seem to be as good as expected, at least not in the standard configuration. When I fed the data set (which I will not publish here) through the network and the cross validation, the results were as follows:
Read the rest of this entry »

September 28th, 2007

MATLAB script v1 for the sports science data

A well-commented script that tries to model the sports scientists’ data mining process is online.
Below is a quick screenshot for reading; the script can be downloaded here.

Training the network involves several steps, with pre-training and main training as the two main ones:

  • since there is not much data available for training, additional data was taken from another athlete,
  • the network is initialized once and stored in a variable,
  • the network is pre-trained: it is assumed that it can then better adapt to the actual training data,
  • the main training is performed starting from the pre-trained network,
  • this is repeated once per data record (number of data), and cross validation is carried out.
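
The steps above can be sketched roughly as follows. This is a Python illustration, not the script from this post: the small network class `TinyNet`, the synthetic data and all hyperparameters are made up, and the sketch assumes that pre-training uses the other athlete's data and that the cross validation leaves out one record at a time.

```python
import copy
import math
import random


class TinyNet:
    """Hypothetical network: one tanh hidden layer, linear output, plain SGD."""

    def __init__(self, n_in, n_hidden, rng):
        # The last weight in every row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def _hidden(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x + [1.0])))
                for row in self.w1]

    def predict(self, x):
        return sum(w * v for w, v in zip(self.w2, self._hidden(x) + [1.0]))

    def train(self, data, epochs, lr=0.05):
        for _ in range(epochs):
            for x, y in data:
                h = self._hidden(x)
                err = self.predict(x) - y
                xb = x + [1.0]
                # Update the hidden layer first, using the old output weights.
                for j, row in enumerate(self.w1):
                    grad_h = err * self.w2[j] * (1.0 - h[j] ** 2)
                    for i in range(len(row)):
                        row[i] -= lr * grad_h * xb[i]
                for j, v in enumerate(h + [1.0]):
                    self.w2[j] -= lr * err * v


def leave_one_out(initial_net, pretrain_data, main_data):
    """Return the absolute prediction error for each held-out record."""
    errors = []
    for k in range(len(main_data)):
        held_out_x, held_out_y = main_data[k]
        rest = main_data[:k] + main_data[k + 1:]
        net = copy.deepcopy(initial_net)      # same stored start for every fold
        net.train(pretrain_data, epochs=200)  # pre-training (other athlete)
        net.train(rest, epochs=200)           # main training
        errors.append(abs(net.predict(held_out_x) - held_out_y))
    return errors


def make_data(rng, n, offset):
    """Synthetic records: two training inputs, one tournament result."""
    data = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        data.append((x, 0.6 * x[0] + 0.3 * x[1] + offset))
    return data


data_rng = random.Random(0)
pretrain_data = make_data(data_rng, 20, 0.05)   # "other athlete", made up
main_data = make_data(data_rng, 8, 0.0)         # "actual athlete", made up
initial_net = TinyNet(2, 3, random.Random(1))   # initialized once and stored
errors = leave_one_out(initial_net, pretrain_data, main_data)
print("mean absolute error:", sum(errors) / len(errors))
```

Copying the stored initial network for every fold keeps the folds comparable, since each one starts from identical weights.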

Read the rest of this entry »

September 20th, 2007

Details on the sports science data mining process

Sports science data mining is currently being applied to

  • Olympic swimming
  • archery
  • disabled swimming

When it comes to the research targets, we are trying to

  • model the effects of different training strategies on the outcome of an upcoming tournament,
  • predict the tournament time (or any standardized measure of success) at the Olympic Games.

Read the rest of this entry »

September 18th, 2007

Prediction using sports science data

This project ties in with earlier work done by Jürgen Edelmann-Nusser and Nico Ganter: predicting athletes’ tournament swimming times using only their training data. It works as follows:

  • During the athletes’ training sessions, their amount of training in different disciplines (running, strength, stamina) is recorded.
  • The athletes complete a tournament and their results are recorded as well.
  • These data, consisting of the training times per discipline and the respective tournament result, can be used to train one or more neural networks.
  • Once the neural networks are trained, one can try to predict the outcome of the upcoming tournament.
  • Furthermore, one could adapt the athletes’ training strategy by varying the training parameters and applying the strategy with the best predicted tournament result.

Presumably, this work will be done using MATLAB and its Neural Network Toolbox (nnet). Since I’m on the application side of the work, I will probably be scripting the neural network stuff in MATLAB and publishing the scripts here.
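
The last step of the list above, varying the training parameters and keeping the strategy with the best predicted result, can be illustrated with a small Python sketch. The function `predicted_time` is a hypothetical stand-in for a trained network, its coefficients are made up, and the weekly hour budget is an assumption added for illustration.

```python
from itertools import product


def predicted_time(running, strength, stamina):
    """Hypothetical stand-in for a trained network (seconds, lower is better)."""
    return (60.0
            + 0.02 * (running - 5) ** 2
            + 0.03 * (strength - 3) ** 2
            + 0.01 * (stamina - 8) ** 2)


def best_strategy(predict, hour_options, max_total_hours):
    """Try every combination within the hour budget and keep the fastest."""
    candidates = [s for s in product(hour_options, repeat=3)
                  if sum(s) <= max_total_hours]
    return min(candidates, key=lambda s: predict(*s))


hour_options = range(0, 11)   # assumed: 0 to 10 hours per discipline
best = best_strategy(predicted_time, hour_options, max_total_hours=16)
print("best strategy (running, strength, stamina):", best)
print("predicted time:", predicted_time(*best))
```

An exhaustive search like this is only feasible for a handful of parameters with coarse steps; with more parameters a smarter search would be needed.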

September 13th, 2007

Classification using neuroscience data

Work by Christoph Reichert (diploma thesis, computer science) and his supervisor Jochem Rieger, who works at the neuroscience school of the medical department, points towards a cooperation between neuroscience and computer science. In a typical neurological experiment, a subject is presented with a stimulus (an image) and has to decide whether he will recognize that particular image later on. During this time, his brain’s activity is recorded using magnetoencephalography (MEG) with high spatial (i.e. many sensors) and high temporal resolution. This activity is made accessible to a computer scientist using MATLAB.

First, the task is to predict, from brain activity only, whether the subject will recognize the image or not. Due to the high dimensionality of the data, this classification (yes/no) will be performed by an SVM. From the SVM (or its separating hyperplane), the most significant activity that led to the choice of the classification plane can be obtained. Thus, the classifier contributes to understanding which part of the brain is the most active or most relevant for the given task. Furthermore, a transformation from the spatial to the frequency domain using wavelets yielded some additional interesting results.

This work will be continued and the results so far look very promising.
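
The idea of reading relevance off the separating hyperplane can be illustrated with a minimal Python sketch. It assumes a linear SVM, trained here by simple stochastic subgradient descent on the hinge loss, and uses synthetic data in which only one of five made-up "sensors" carries signal; none of this is taken from the actual MEG experiments.

```python
import random


def train_linear_svm(samples, labels, epochs=20, lr=0.01, lam=0.01):
    """Linear SVM via stochastic subgradient descent on the hinge loss."""
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            violated = margin < 1.0
            for i in range(n_features):
                # Regularization always applies, the data term only on violation.
                grad = lam * w[i] - (y * x[i] if violated else 0.0)
                w[i] -= lr * grad
            if violated:
                b += lr * y
    return w, b


def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1


# Synthetic stand-in data: labels are +1 / -1, only sensor 2 is informative.
rng = random.Random(42)
INFORMATIVE = 2
samples, labels = [], []
for _ in range(200):
    y = rng.choice([-1, 1])
    x = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    x[INFORMATIVE] = y * rng.uniform(1.0, 2.0)
    samples.append(x)
    labels.append(y)

w, b = train_linear_svm(samples, labels)
# Sensors sorted by the absolute weight they receive in the hyperplane normal.
ranking = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
accuracy = sum(classify(w, b, x) == y
               for x, y in zip(samples, labels)) / len(samples)
print("sensor ranking by |weight|:", ranking)
print("training accuracy:", accuracy)
```

Ranking by absolute weight is only meaningful when the features are on comparable scales, which the synthetic data here are by construction.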

September 12th, 2007

Optimization using agricultural data

A friend of mine works at the Martin-Luther-Universität Halle-Wittenberg in the agriculture department and he’s interested in using neural networks for yield optimization and/or forecasting. The project has much to do with precision farming, i.e., using modern differential GPS and sensor technology to automate and optimize fertilizer usage. It works as follows:

  • In one year, the amount of fertilizer used and the yield are measured and the data are stored. This is done on small-scale areas, hence the term precision farming.
  • Miscellaneous data from various sensors are obtained and stored with the yield data.
  • A neural network, decision trees, or another forecasting method can be trained on the data.
  • In subsequent years, using currently measured sensor input, the amount of fertilizer can be adjusted precisely to optimize the outcome, environmentally or economically.
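
The last step, adjusting the fertilizer amount per small-scale area from a learned model, might look like the following Python sketch. The function `predicted_yield` is a hypothetical stand-in for a trained model, the prices and sensor values are made up, and only the economic objective is shown.

```python
import math

GRAIN_PRICE = 150.0       # EUR per tonne of grain (made-up value)
FERTILIZER_PRICE = 1.0    # EUR per kg of fertilizer (made-up value)


def predicted_yield(sensor_value, fertilizer):
    """Hypothetical stand-in for a learned model: yield in t/ha.

    Yield rises with the fertilizer rate (kg/ha) but with diminishing returns.
    """
    plateau = 6.0 + 2.0 * sensor_value
    return plateau * (1.0 - math.exp(-fertilizer / 80.0))


def profit(sensor_value, fertilizer):
    """Revenue from the predicted yield minus the fertilizer cost, per hectare."""
    revenue = GRAIN_PRICE * predicted_yield(sensor_value, fertilizer)
    return revenue - FERTILIZER_PRICE * fertilizer


def best_rate(sensor_value, rates):
    """Pick the candidate fertilizer rate with the highest predicted profit."""
    return max(rates, key=lambda rate: profit(sensor_value, rate))


rates = range(0, 251, 10)                                 # candidate rates, kg/ha
plots = {"plot A": 0.0, "plot B": 0.5, "plot C": 1.0}     # sensor reading per plot
recommended = {name: best_rate(value, rates) for name, value in plots.items()}
for name, rate in recommended.items():
    print(name, "->", rate, "kg/ha")
```

An environmental objective could be sketched the same way by replacing `profit` with a function that penalizes the fertilizer amount more heavily.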

The project could be developed towards a full-scale, all-inclusive solution for farmers:

  1. A contract between the data mining company (DMC) and the farmer is agreed upon.
  2. The DMC supplies (lends) sensory equipment to the farmers.
  3. The farmers fertilize and harvest the crop as usual, thereby harvesting data as well.
  4. The DMC acquires the data and mines them.
  5. In subsequent years, the farmer can obtain optimized fertilization data from the DMC. He can use it to reduce fertilizer usage or to optimize otherwise.

The ideas are based on work by Georg Weigert, “N-Düngung bei Winterweizen mit Entscheidungsbäumen” (nitrogen fertilization of winter wheat using decision trees), or his dissertation, and by Martin Schneider.

September 7th, 2007

Blogging reloaded

Since doing a PhD is quite demanding and requires setting deadlines and documenting work accomplished, this blog will serve both purposes. Three projects that came up during the last six months currently interest me:

  • data mining w/ optimization on agricultural data
  • data mining w/ classification on neuroscience data
  • data mining w/ prediction on sports science data

I will give some more details on these data mining projects soon.