Georg Ruß' PhD Blog — R, clustering, regression, all on spatial data, hence it's:

May 28th, 2009

Publication submitted for SGAI AI-2009

Since the series of AI conferences run by the BCS Specialist Group on AI has been useful the last two times, I decided to submit yet another paper there. Again, I’m working with the agriculture data for yield prediction.

The question this time is: which of the features in my data sets are actually useful for yield prediction? In recent publications I investigated different regression models for yield prediction. Taking this a step further, I’m now looking at feature selection. There’s a lot of research in that area (though mostly on classification, not regression), and I haven’t quite finished getting to grips with it. Nevertheless, the central story line is emerging. For the SGAI AI-2009 paper I developed or adapted a feature selection approach based on forward selection, i.e. starting with an empty set of features and repeatedly adding the most promising remaining one. The resulting feature subsets are evaluated using support vector regression and regression trees. Finally, a ranking of the features is presented which points out the important (relevant) features and also shows those that are redundant or irrelevant.
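The forward selection loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the R script used for the paper. The subset evaluation is abstracted into a hypothetical `error` callable; in the paper’s setting, it would be the error of a support vector regression or regression tree trained on that feature subset. The feature names and the error values in `toy_error` are made up to show how a redundant feature ("c", useful alone but not once "a" is in) and an irrelevant one ("d") end up at the bottom of the ranking.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_selection(
    features: Sequence[str],
    error: Callable[[FrozenSet[str]], float],
) -> List[Tuple[str, float]]:
    """Greedy forward selection.

    Starts with an empty subset and, in each round, adds the single remaining
    feature whose inclusion gives the lowest error. Returns the features in the
    order they were added, with the subset error after each addition; this
    order doubles as a feature ranking.
    """
    selected: FrozenSet[str] = frozenset()
    remaining = list(features)
    ranking: List[Tuple[str, float]] = []
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        scored = [(error(selected | {f}), f) for f in remaining]
        best_err, best_f = min(scored)
        selected = selected | {best_f}
        remaining.remove(best_f)
        ranking.append((best_f, best_err))
    return ranking


def toy_error(subset: FrozenSet[str]) -> float:
    """Hypothetical stand-in for a model's validation error (made-up numbers)."""
    err = 10.0
    if "a" in subset:
        err -= 4.0
    if "b" in subset:
        err -= 2.0
    if "c" in subset and "a" not in subset:  # c is redundant once a is present
        err -= 3.0
    if "d" in subset:  # d is irrelevant noise that slightly hurts the model
        err += 0.5
    return err + 0.1 * len(subset)  # small penalty per feature


ranking = forward_selection(["a", "b", "c", "d"], toy_error)
for feature, err in ranking:
    print(f"{feature}: subset error {err:.2f}")

# Keep the prefix of the ranking that reaches the lowest error.
best_k = min(range(len(ranking)), key=lambda i: ranking[i][1]) + 1
print("selected:", [f for f, _ in ranking[:best_k]])
```

The greedy search evaluates far fewer subsets than an exhaustive search would. The price is that it can miss features that only help in combination with others.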

As mentioned before, I’m now using R for the computations, so the scripts will be in R from now on. I won’t be converting the older Matlab scripts to R, though. The script for the above SGAI AI-2009 paper (titled: Feature Selection for Wheat Yield Prediction) can be found here: 2009sgai-featuresel.R (or use the script overview)

It also seems that the central theme of my dissertation will be along these lines.

April 16th, 2009

ICDM/MLDM 2009, both accepted

Both papers I submitted to the ICDM and MLDM conferences have been accepted: the first one with two overall positive reviews, and the second one with just one review, which admitted it to the poster session.

Apart from this, there’s not much to report, except that I’m currently switching to R (for the scripts) for the computations. It seems even more high-level and abstract than Matlab. And: it’s GNU software, and I can use vim for editing and running scripts.

March 23rd, 2009

Matlab script for IFCS/GfKl/ADAC 2009 article

I’ll submit an extended version of what was planned for the IFCS 2009 conference to the ADAC journal (also published by Springer). It addresses some of the issues raised during the IFCS conference.

Two new baselines for the regression model comparison have been computed:

  • a simple linear regression
  • a naive prediction: using the previous year’s yield as the prediction

The result is that support vector regression outperforms the MLP, RBF and regression tree (RegTree) models as well as both baselines above.
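The two baselines are simple enough to write down directly. The Python sketch below is not the Matlab scripts mentioned in this post. It computes the error of the naive "last year’s yield" predictor and of a one-variable least-squares line. Several things here are assumptions:

- The error measure is RMSE.
- The linear model’s single predictor is last year’s yield, which may differ from the features used in the actual comparison.
- The four yield values are invented.
- The error is computed in-sample for brevity. A fair comparison would fit on training data and evaluate on held-out data, e.g. via cross-validation.

```python
import math
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted yields."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def naive_baseline(prev_yield: Sequence[float]) -> List[float]:
    """Naive prediction: this year's yield equals last year's yield."""
    return list(prev_yield)


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x with a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def predict_linear(a: float, b: float, x: Sequence[float]) -> List[float]:
    """Evaluate the fitted line at each x."""
    return [a + b * xi for xi in x]


# Invented yields (t/ha) for four plots: last year's and this year's.
prev_yield = [6.0, 7.0, 8.0, 9.0]
this_yield = [7.0, 7.5, 8.5, 9.0]

naive_err = rmse(this_yield, naive_baseline(prev_yield))
a, b = fit_simple_linear(prev_yield, this_yield)
lin_err = rmse(this_yield, predict_linear(a, b, prev_yield))
print(f"naive RMSE: {naive_err:.3f}, linear RMSE: {lin_err:.3f}")
```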

The Matlab scripts (an outer one for the data set selection and an inner one for the actual model comparison) are online:

March 23rd, 2009

Two emerging issues

I gave my talk at last week’s IFCS/GfKl conference in Dresden, sticking to the 15 minutes allotted to speakers plus five minutes for discussion. Unbeknownst to me, Alexander Brenning attended that particular session. He had worked on the final stages of the preagro collaborative research project, in which my co-authors had also been involved. Hence, he knew exactly what I was talking about, knew the data and where they came from, and could therefore raise two important issues:

March 14th, 2009

Updated slides for IFCS2009

Here are my updated slides for IFCS 2009, which I’m currently participating in: slides-russ2009ifcs.pdf

So far, there have been many interesting talks, and I got some new visualization ideas. In particular, Michael Greenacre showed some interesting videos with morphing data points and nice visualizations. Earlier in this blog, I had the idea of visualizing data (or projections of them) to find split values, so there might be something interesting in this direction.

March 11th, 2009

Submission for IDA 2009, Lyon

At the moment, I’m preparing slides for this week’s IFCS talk and, in parallel, working on an upcoming publication for IDA 2009. I will extend the work on the regression models with ideas on how to identify useful heterogeneity sensors. Some of those ideas will already be presented in this week’s talk at the IFCS. An evaluation will follow in due course.

March 6th, 2009

Yet another talk of mine at a conference

I’ve spent the past five weeks thinking about how to proceed with my research. I’m not quite done shuffling and restructuring thoughts and ideas, but I’ll get there sooner or later.

Anyway, I’m giving a talk at next week’s IFCS conference in Dresden, Germany, scheduled for Sunday afternoon, March 15th. I’ll present the regression model comparison shown earlier in this blog and some ideas on how to evaluate whether a sensor is useful for yield prediction. I expect the conference to yield some more ideas on the latter question.

My slides, as of now, are here: slides-russ2009ifcs.pdf

January 26th, 2009

A comparison of regression models — ICDM 2009 conference

I’ve just finished writing a paper on the data sets I have for agricultural yield prediction. It will be submitted to ICDM 2009 in Leipzig.

The abstract of the paper:
Nowadays, precision agriculture refers to the application of state-of-the-art GPS technology in connection with small-scale, sensor-based treatment of the crop. This introduces large amounts of data which are collected and stored for later usage. Making appropriate use of these data often leads to considerable gains in efficiency and therefore economic advantages. However, the amount of data poses a data mining problem — which should be solved using data mining techniques. One of the tasks that remains to be solved is yield prediction based on available data. From a data mining perspective, this can be formulated and treated as a multi-dimensional regression task. This paper deals with appropriate regression techniques and evaluates four different techniques on selected agriculture data. A recommendation for a certain technique is provided.


January 12th, 2009

Extended Deadlines MLDM/ICDM

The deadlines for ICDM 2009 and MLDM 2009 have been mysteriously extended so that they now coincide with the written examination for the course on Intelligent Systems, which I’m teaching this term. Nevertheless, the MLDM paper is almost finished, and most of the work for the ICDM 2009 paper has been done; it just has to be documented and ‘paperized’ appropriately.

For the MLDM work, which is about applying Sammon’s mapping and Self-Organizing Maps to the agriculture data, there were some changes. For example, one of the data sets contains data for different fertilization strategies; it can be split into two sub-data sets, one per strategy, for in-depth analysis. One of the strategies was to use a neural network for yield prediction, and so far it has been unclear what kind of connections the NN has learnt. The idea is to visualize the data the NN used for training and prediction: projecting those data onto a low-dimensional mapping could reveal something about the internal workings of the NN. This is more or less what the paper is about. Useful, interesting, and encouraging.
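The Self-Organizing Map part of this can be illustrated with a minimal online SOM in plain Python. This is a generic textbook-style variant (Gaussian neighbourhood, linearly decaying learning rate and radius), not the implementation used for the MLDM paper. The grid size, number of epochs and the two synthetic clusters are arbitrary assumptions. Each data record is mapped to its best-matching unit on the grid. This makes it possible to colour grid cells by, e.g., the fertilization strategy of the records that land there.

```python
import math
import random
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]  # rows x cols x weight vector


def best_matching_unit(grid: Grid, x: Sequence[float]) -> Tuple[int, int]:
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(grid):
        for j, w in enumerate(row):
            d = math.dist(w, x)
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(data: Sequence[Sequence[float]], rows: int = 4, cols: int = 4,
              epochs: int = 30, lr0: float = 0.5, seed: int = 0) -> Grid:
    """Online SOM training with a Gaussian neighbourhood that shrinks linearly."""
    rng = random.Random(seed)
    # Initialise every node with a copy of a randomly chosen data point.
    grid = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.sample(list(data), len(data)):  # shuffled pass over the data
            frac = t / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bi, bj = best_matching_unit(grid, x)
            for i in range(rows):
                for j in range(cols):
                    # Neighbourhood weight uses distance on the grid, not in data space.
                    h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2.0 * sigma ** 2))
                    w = grid[i][j]
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])
            t += 1
    return grid


# Two synthetic, well-separated groups of 3-D records (made-up data).
rng = random.Random(1)
group_a = [[rng.gauss(0.0, 0.3) for _ in range(3)] for _ in range(20)]
group_b = [[rng.gauss(5.0, 0.3) for _ in range(3)] for _ in range(20)]
som = train_som(group_a + group_b)

# Grid cells that the records of each group are mapped to.
cells_a = {best_matching_unit(som, x) for x in group_a}
cells_b = {best_matching_unit(som, x) for x in group_b}
print("group A cells:", sorted(cells_a))
print("group B cells:", sorted(cells_b))
```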

Without further comment, here are some graphs:

I do see correlations. What about you?

December 30th, 2008

Sammon’s mapping for visualization

While everyone else was at home, I rode to work and implemented Sammon’s mapping for two of the agriculture data sets. The data set descriptions are in the paper I presented at the SGAI. One of the sets has three farming strategies, the other just two. Applying Sammon’s mapping for dimensionality reduction turns out to work quite well for both data sets, although the results are much easier to interpret for the second one. For starters, here are the plots:

[Figures: Sammon’s mapping, F330 data set; Sammon’s mapping, F131 data set]
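For anyone who wants to try this, here is a minimal Python sketch of Sammon’s mapping. It is not the implementation used for the plots above. It searches for 2-D points whose pairwise distances d_ij match the original distances d*_ij by minimising Sammon’s stress, E = (1 / sum of d*_ij) * sum of (d*_ij - d_ij)^2 / d*_ij, with both sums over all pairs i < j. Assumptions:

- It uses plain gradient descent with a simple backtracking step size, not Sammon’s original pseudo-Newton update.
- The initialisation is random.
- All input points are distinct.
- The eight 3-D toy points are made up.

```python
import math
import random
from typing import List, Sequence, Tuple

Points = List[List[float]]


def pairwise_distances(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Euclidean distance matrix of the original (high-dimensional) points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def sammon_stress(dstar: List[List[float]], y: Sequence[Sequence[float]]) -> float:
    """Sammon's stress; assumes distinct original points (dstar[i][j] > 0)."""
    n = len(y)
    scale = 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(y[i], y[j])
            scale += dstar[i][j]
            total += (dstar[i][j] - d) ** 2 / dstar[i][j]
    return total / scale


def sammon(points: Sequence[Sequence[float]], dim: int = 2, iters: int = 300,
           seed: int = 0) -> Tuple[Points, float]:
    """Gradient descent on Sammon's stress with a backtracking step size."""
    dstar = pairwise_distances(points)
    n = len(points)
    scale = sum(dstar[i][j] for i in range(n) for j in range(i + 1, n))
    rng = random.Random(seed)
    y = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    stress = sammon_stress(dstar, y)
    step = 1.0
    for _ in range(iters):
        # dE/dy_p = -(2/scale) * sum_j (d*_pj - d_pj) / (d*_pj * d_pj) * (y_p - y_j)
        grad = [[0.0] * dim for _ in range(n)]
        for p in range(n):
            for j in range(n):
                if j == p:
                    continue
                d = max(math.dist(y[p], y[j]), 1e-12)  # avoid division by zero
                coef = -2.0 / scale * (dstar[p][j] - d) / (dstar[p][j] * d)
                for k in range(dim):
                    grad[p][k] += coef * (y[p][k] - y[j][k])
        # Halve the step until the stress decreases; then let it grow a little.
        while step > 1e-12:
            cand = [[y[p][k] - step * grad[p][k] for k in range(dim)] for p in range(n)]
            cand_stress = sammon_stress(dstar, cand)
            if cand_stress < stress:
                y, stress = cand, cand_stress
                step *= 1.2
                break
            step /= 2.0
    return y, stress


# Two small made-up groups of 3-D points.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
          (5, 5, 5), (6, 5, 5), (5, 6, 5), (5, 5, 6)]
_, start_stress = sammon(points, iters=0)  # stress of the random start
embedding, final_stress = sammon(points)
print(f"stress: {start_stress:.4f} -> {final_stress:.4f}")
```

Because a step is only accepted when it lowers the stress, the stress never increases. Like any local method, though, it may end up in a local minimum that depends on the random start.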