Data Mining et al - Georg Ruß' PhD Blog - R, clustering, regression, all on spatial data, hence it's:

Publication submitted for SGAI AI-2009

By russ

Since the series of AI conferences run by the BCS Specialist Group on AI has been useful the last two times, I decided to submit yet another paper there. Once again, I’m working with the agriculture data for yield prediction.

The question this time is: Which of the features in the data sets I have are actually useful for yield prediction? In recent publications I have done some research into different regression models which enable yield prediction. Taking this a step further, I’m now looking at feature selection. There’s a lot of research in that area (but mostly on classification, not regression) and I’m not quite finished getting to grips with it. Nevertheless, the central story line is emerging. In the SGAI AI-2009 paper I have developed or adapted a feature selection approach which uses forward selection (i.e. starting with an empty set of features and subsequently adding the most promising ones). The subsets are then evaluated using support vector regression and regression trees. In the end, a ranking of the features is presented which points out important (relevant) features and also shows the ones that are redundant or irrelevant.
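To make the forward-selection idea concrete, here is a minimal sketch in Python. It is not the linked R script. It assumes a leave-one-out k-nearest-neighbour regressor as a stand-in evaluator, in place of the support vector regression and regression trees used in the paper. The toy data and all function names are made up for illustration.

```python
import math
import random


def knn_loo_rmse(rows, targets, features, k=3):
    """Leave-one-out RMSE of a k-NN regressor restricted to the given feature subset."""
    n = len(rows)
    squared_errors = []
    for i in range(n):
        point_i = [rows[i][f] for f in features]
        # Distances from sample i to all other samples, using only the chosen features.
        neighbours = sorted(
            (math.dist(point_i, [rows[j][f] for f in features]), j)
            for j in range(n) if j != i
        )[:k]
        prediction = sum(targets[j] for _, j in neighbours) / len(neighbours)
        squared_errors.append((targets[i] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / n)


def forward_selection(rows, targets, evaluate):
    """Start with no features and greedily add the one that gives the lowest error."""
    remaining = list(range(len(rows[0])))
    selected, history = [], []
    while remaining:
        scored = [(evaluate(rows, targets, selected + [f]), f) for f in remaining]
        best_error, best_feature = min(scored)
        selected.append(best_feature)
        remaining.remove(best_feature)
        history.append((best_feature, best_error))
    return history  # order of addition, with the error reached at each step


# Hypothetical toy data: feature 0 is relevant, feature 1 is a noisy copy of it
# (largely redundant), feature 2 is pure noise (irrelevant).
rng = random.Random(0)
rows, targets = [], []
for _ in range(60):
    x0 = rng.uniform(0, 1)
    rows.append([x0, x0 + rng.gauss(0, 0.3), rng.uniform(0, 1)])
    targets.append(10 * x0 + rng.gauss(0, 0.1))

ranking = forward_selection(rows, targets, knn_loo_rmse)
for feature, error in ranking:
    print(f"added feature {feature}: leave-one-out RMSE {error:.3f}")
```

The order in which features are added serves as the ranking. A feature whose addition barely lowers the error, or even raises it, is a candidate for being redundant or irrelevant.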

As mentioned before, I’m now using R for computing, hence the scripting language will also change. I won’t be converting the existing scripts from MATLAB to R, though. The script for the above SGAI AI-2009 paper (titled: Feature Selection for Wheat Yield Prediction) can be found here: 2009sgai-featuresel.R (or use the script overview)

It also seems as if the central theme of my dissertation will be along the above lines.

Posted in agriculture, conferences, data mining, English, SGAI AI-2009 | Comments disabled

ICDM/MLDM 2009, both accepted

By russ

Both papers that I submitted to the ICDM and MLDM conferences have been accepted. The first one received two positive reviews overall; the second one received just one review, which admitted it to the poster session.

Apart from this, there’s not much to report, except that I’m currently converting (the scripts) to R for the computing work. It seems even more high-level and more abstract than MATLAB. And: it’s GNU, and I can use vim for editing and running scripts.

Posted in agriculture, conferences, data mining, English, ICDM 2009, MLDM 2009 | Comments disabled

Matlab script for IFCS/GfKl/ADAC 2009 article

By russ

I’ll submit an extended version of what was planned for the IFCS2009 conference to the ADAC journal (also published by Springer). It addresses some of the issues raised during the IFCS conference.

New baselines for the regression model comparison are computed:

a simple linear regression
a naive prediction: giving previous year’s yield as prediction

The result is that support vector regression outperforms MLP, RBF, RegTree and the two above predictors.
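For illustration, the two baselines can be sketched in a few lines of Python. This is not the linked MATLAB code. The yield figures below are invented placeholders, and RMSE is assumed as the error measure. The linear baseline here regresses on the previous year's yield only, which is an assumption of this sketch.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = slope * x + intercept for a single predictor."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return slope, mean_y - slope * mean_x


# Invented placeholder yields per site (not real data): previous and current year.
previous_yield = [7.1, 8.3, 6.5, 9.0, 7.8, 8.8, 6.9, 7.4]
current_yield = [7.9, 8.9, 7.0, 9.8, 8.1, 9.5, 7.6, 8.0]

# Hold out the last three sites for testing.
train_x, test_x = previous_yield[:5], previous_yield[5:]
train_y, test_y = current_yield[:5], current_yield[5:]

# Naive baseline: predict this year's yield as last year's yield.
naive_rmse = rmse(test_y, test_x)

# Simple linear regression baseline, fitted on the training sites only.
slope, intercept = fit_simple_linear(train_x, train_y)
linear_rmse = rmse(test_y, [slope * x + intercept for x in test_x])

print(f"naive RMSE: {naive_rmse:.3f}, linear regression RMSE: {linear_rmse:.3f}")
```

Any more elaborate model (SVR, MLP, RBF, regression tree) would be scored on the same held-out sites and compared against these two numbers.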

The MATLAB scripts (an outer one for the data set selection and an inner one for the actual model comparison) are online:

ifcs09_svm_mlp_rbf_regtree_extended.m (outer script, calls the one below)
modelcomparison.m (inner script, replaces earlier version)

Posted in agriculture, conferences, data mining, English, IFCS 2009 | Comments disabled

Two emerging issues

By russ

I gave my talk at last week’s IFCS/GfKl conference in Dresden, sticking to the 15 minutes that had been set for speakers, with an additional five minutes for discussion. Unbeknownst to me, Alexander Brenning attended that particular session. He had worked on the final stages of the preagro collaborative research project, in which my co-authors had also been involved. Hence, he knew exactly what I was talking about, knew the data and where they came from, and could therefore raise two important issues:
Read the rest of this entry »

Posted in agriculture, conferences, data mining, English, IFCS 2009 | Comments disabled

Updated slides for IFCS2009

By russ

Here are my updated slides for IFCS 2009, which I’m currently participating in: slides-russ2009ifcs.pdf

So far, there have been a lot of interesting talks and I got some new visualization ideas. In particular, Michael Greenacre had some interesting videos with morphing data points and nice visualizations to show. Earlier on in this blog, I already had this idea to visualize data (or projections of those) to find split values. So there might be some interesting things in this direction.

Posted in agriculture, conferences, data mining, English, IFCS 2009 | Comments disabled

Submission for IDA 2009, Lyon

By russ

At the moment, I’m (in parallel) preparing slides for this week’s IFCS talk and working on an upcoming publication for IDA 2009. I will extend the work on the regression models with ideas on how to identify useful heterogeneity sensors. Some of those ideas will already be presented in this week’s talk at the IFCS. An evaluation is to follow in due course.

Posted in agriculture, conferences, data mining, English, IDA 2009 | Comments disabled

Yet another talk of mine at a conference

By russ

I spent the past five weeks thinking about how to proceed with my research. I’m not quite done shuffling and restructuring thoughts and ideas, but I’ll get there sooner or later.

Anyway, there’s a talk of mine at next week’s IFCS conference in Dresden, Germany. My talk is scheduled for Sunday, March 15th, in the afternoon. I’ll present the regression model comparison shown earlier in this blog and some ideas on how to evaluate whether a sensor is useful or not for the purpose of yield prediction. I assume that the conference is going to yield some more ideas on the latter question.

My slides, as of now, are here: slides-russ2009ifcs.pdf

Posted in agriculture, conferences, data mining, English, IFCS 2009 | Comments disabled

A comparison of regression models — ICDM 2009 conference

By russ

I’ve just finished writing a paper which deals with the data sets I have for agricultural yield prediction. It will be submitted to ICDM 2009 in Leipzig.

The abstract of the paper:
Nowadays, precision agriculture refers to the application of state-of-the-art GPS technology in connection with small-scale, sensor-based treatment of the crop. This introduces large amounts of data which are collected and stored for later usage. Making appropriate use of these data often leads to considerable gains in efficiency and therefore economic advantages. However, the amount of data poses a data mining problem — which should be solved using data mining techniques. One of the tasks that remains to be solved is yield prediction based on available data. From a data mining perspective, this can be formulated and treated as a multi-dimensional regression task. This paper deals with appropriate regression techniques and evaluates four different techniques on selected agriculture data. A recommendation for a certain technique is provided.

Read the rest of this entry »

Posted in agriculture, conferences, data mining, English, ICDM 2009 | Comments disabled

Extended Deadlines MLDM/ICDM

By russ

The deadlines for ICDM 2009 and MLDM 2009 have been mysteriously extended such that they coincide with the written examination for the course on Intelligent Systems, which I’m teaching this term. Nevertheless, the MLDM paper is almost finished, whereas most of the work for the ICDM 2009 paper has been done but still has to be documented and ‘paperized’ appropriately.

For the MLDM work, which is about applying Sammon’s mapping and Self-Organizing Maps to the agriculture data, there were some changes. For example, one of the data sets contains data for different fertilization strategies. This data set can also be split into two sub-data sets, one for each strategy, for in-depth analysis. One of the strategies was to use a neural network for yield prediction — and it has been unclear what kind of connections the NN has learnt so far. The idea is to visualize the data that the NN has used for training and prediction. So, projecting those data onto a mapping could yield interesting results as to the internal workings of the NN. This is more or less what the paper is about. Useful, interesting, and encouraging.
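As a rough sketch of what such a projection involves, the following Python code computes Sammon's stress and minimises it. It is not the implementation used for the paper. It swaps Sammon's original pseudo-Newton update for plain gradient descent with step halving, assumes Euclidean distances and no duplicate points, and runs on synthetic data. All names are illustrative.

```python
import math
import random


def pairwise_distances(points):
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def sammon_stress(original, layout):
    """Sammon's stress: squared distance errors weighted by 1 / original distance.

    Assumes all original distances are non-zero (no duplicate points).
    """
    n = len(layout)
    total, error = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(layout[i], layout[j])
            total += original[i][j]
            error += (original[i][j] - d) ** 2 / original[i][j]
    return error / total


def sammon_mapping(points, dims=2, iterations=200, learning_rate=1.0, seed=0):
    """Project points to `dims` dimensions by gradient descent on Sammon's stress."""
    rng = random.Random(seed)
    n = len(points)
    original = pairwise_distances(points)
    scale = sum(original[i][j] for i in range(n) for j in range(i + 1, n))
    layout = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    history = [sammon_stress(original, layout)]
    for _ in range(iterations):
        gradient = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = max(math.dist(layout[i], layout[j]), 1e-12)
                weight = (original[i][j] - d) / (original[i][j] * d)
                for k in range(dims):
                    gradient[i][k] -= 2.0 / scale * weight * (layout[i][k] - layout[j][k])
        candidate = [[layout[i][k] - learning_rate * gradient[i][k] for k in range(dims)]
                     for i in range(n)]
        candidate_stress = sammon_stress(original, candidate)
        if candidate_stress < history[-1]:
            # Accept the step only if it lowers the stress.
            layout = candidate
            history.append(candidate_stress)
        else:
            # Otherwise shrink the step size and try again.
            learning_rate *= 0.5
    return layout, history


# Synthetic stand-in data: two groups of 4-dimensional points.
rng = random.Random(1)
data = [[rng.gauss(centre, 0.3) for _ in range(4)] for centre in (0.0, 3.0) for _ in range(10)]
layout, history = sammon_mapping(data)
print(f"stress: {history[0]:.4f} -> {history[-1]:.4f}")
```

The resulting two-dimensional `layout` can be drawn as a scatter plot. Colouring the points by yield or by fertilization strategy is one way to look for structure in the data a network was trained on.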

Without further comment, here are some graphs:

I do see correlations. What about you?

Posted in agriculture, conferences, data mining, English, ICDM 2009, MLDM 2009 | Comments disabled

Sammon’s mapping for visualization

By russ

While everyone else is at home, I rode to work and implemented Sammon’s mapping on two of the agriculture data sets. The data set descriptions are in the paper I presented at the SGAI. One of the sets has three farming strategies, the other one just two. I decided to apply the dimensionality reduction of Sammon’s mapping and it turns out that it works quite well for both of the data sets, although the interpretation is much easier for the second one. For starters, here are the plots:

[Figures: Sammon’s mapping of the F330 data set; Sammon’s mapping of the F131 data set]
Read the rest of this entry »

Posted in agriculture, conferences, data mining, English, MLDM 2009, Uncategorized | Comments disabled
