Georg Ruß' PhD Blog — R, clustering, regression, all on spatial data, hence it's:

June 27th, 2010

Slides for my talk at the IPMU’2010

Just a quick post to make the slides for tomorrow’s IPMU 2010 talk available. The talk is about the management of spatial information, and especially about the issues that arise when non-spatial models are used on spatial data.

Slides link: russ2010ipmu-slides.pdf

As usual, the paper is in our publication database: Data Mining in Precision Agriculture: Management of Spatial Information.

June 11th, 2010

Paper submission for IEEE ICDM

I’ve had a whole lot of fun writing a paper for the IEEE ICDM conference, which
is going to take place in Sydney, Australia, this year. The programming work
was already done, I had some novel data sets to analyse, and I came to some cool
conclusions using my homebrew algorithm, which explicitly assumes spatial
autocorrelation in the data sets. I could also show that the algorithm produces
meaningless results when spatial autocorrelation does not exist.

It also implements a more or less standard hierarchical agglomerative
clustering procedure on spatial data — there just was no existing work which
fit the problem and the data set, so I had to create my own algorithm using a
straightforward and easily explainable divide-and-conquer approach. I hope that
my reviewers at the IEEE ICDM conference like the idea.
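
To make the notion of spatial autocorrelation a bit more concrete: one common way to quantify it is Moran's I. The following is a minimal Python sketch, not taken from the paper or its R scripts. It assumes values on a regular grid and simple binary weights between the four direct neighbours; the function name morans_i and the toy grids are made up for illustration.

```python
def morans_i(grid):
    """Global Moran's I for a 2-D grid, using binary 4-neighbour (rook) weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(v for row in grid for v in row) / n
    dev = [[v - mean for v in row] for row in grid]

    denom = sum(d * d for row in dev for d in row)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")

    num = 0.0      # sum of w_ij * dev_i * dev_j over all neighbour pairs
    w_sum = 0      # sum of all weights w_ij
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += dev[r][c] * dev[rr][cc]
                    w_sum += 1
    return (n / w_sum) * (num / denom)


# Toy data (assumption): a smooth gradient and a checkerboard pattern.
gradient = [[float(c) for c in range(4)] for _ in range(4)]
checkerboard = [[float((r + c) % 2) for c in range(4)] for r in range(4)]

print(morans_i(gradient))      # clearly positive: neighbours are similar
print(morans_i(checkerboard))  # strongly negative: neighbours are dissimilar
```

Values near zero would indicate no spatial structure, which is the situation in which an algorithm that assumes autocorrelation cannot be expected to produce meaningful results.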

I’m still looking for an easily pronounceable acronym, maybe HACSAD-PA
will do: hierarchical agglomerative clustering for spatially autocorrelated
data from precision agriculture :-)
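
For readers who have not come across the general idea: hierarchical agglomerative clustering with a spatial contiguity constraint can be sketched in a few lines. The Python below is only a generic toy version under my own assumptions (a regular grid, one value per cell, merging the two adjacent clusters with the most similar mean). It is not the algorithm from the paper, and the name constrained_hac is hypothetical.

```python
def constrained_hac(grid, k):
    """Merge spatially adjacent clusters bottom-up until k clusters remain.

    grid: list of rows with one numeric value per cell.
    Returns a dict mapping (row, col) to a cluster id.
    """
    rows, cols = len(grid), len(grid[0])
    # Every cell starts as its own cluster.
    clusters = {r * cols + c: [(r, c)] for r in range(rows) for c in range(cols)}
    label = {(r, c): r * cols + c for r in range(rows) for c in range(cols)}

    def cluster_mean(cid):
        cells = clusters[cid]
        return sum(grid[r][c] for r, c in cells) / len(cells)

    while len(clusters) > k:
        best = None  # (cost, smaller id, larger id) of the cheapest adjacent pair
        for (r, c), a in label.items():
            # Looking right and down is enough to visit every pair of neighbours.
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols:
                    b = label[(rr, cc)]
                    if a != b:
                        cost = abs(cluster_mean(a) - cluster_mean(b))
                        pair = (cost, min(a, b), max(a, b))
                        if best is None or pair < best:
                            best = pair
        if best is None:
            break  # no adjacent clusters left to merge
        _, keep, drop = best
        for cell in clusters[drop]:
            label[cell] = keep
        clusters[keep].extend(clusters.pop(drop))
    return label


# Toy field (assumption): low values in the left half, high values in the right half.
field = [[1.0] * 3 + [5.0] * 3 for _ in range(4)]
zones = constrained_hac(field, 2)
for r in range(4):
    print([zones[(r, c)] for c in range(6)])
```

Because only neighbouring clusters may merge, the resulting zones are always spatially contiguous. This naive version recomputes the cluster means in every pass, so it is only suitable for small examples.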

Read the rest of this entry »

June 4th, 2010

Best graduate student paper award at ICPA 2010

Yesterday I was informed that I’ve been given the best graduate student paper award at the International Conference on Precision Agriculture 2010. This is something like the flagship conference in precision agriculture, much as the IEEE ICDM (held in Sydney, Australia, this year) and the PKDD (held in Barcelona, Spain, this year) are for data mining and knowledge discovery in databases. I had to be nominated for this award, and part of the nomination was my vision for the area of precision agriculture, which is quoted below:
Read the rest of this entry »

May 21st, 2010

Some remarks regarding my talk

Again, my talk was really good and it felt like everyone was listening. The questions were more or less the standard ones to be expected when presenting this type of research to this specific IDA audience. There are two things which I realised only after the presentation:

  • First, it’s not so much the specifics of the modeling solution I presented that matter. The most interesting part for the audience is rather the fact that agriculture is turning into a data-driven discipline. I caught a lot of comments along the lines of “It’s amazing what kind and what amounts of data are nowadays collected in agriculture.”
  • The other thing is that the specific modeling setup I presented is likely to be valid, since the results obtained with it are the ones that would normally be expected. So the main point is not so much that REIP49 turned out to be the best predictor, but rather that the setup I created may be used for future variable importance assessment as new data come in.

I already guessed the first point, but the second one only occurred to me in hindsight. Seems to happen rather often in research, though.

May 20th, 2010

Staying at the Biosphere II site

Biosphere II, Arizona, USA

As planned, I’m currently at the Biosphere II, somewhere in the middle of nowhere in Arizona. The conference is great, there are no parallel sessions, I can actually talk to everyone and I know roughly what everyone’s doing. We had our greenhouse tour yesterday and it was quite impressive to see such a huge structure being completely sealed from the outside.

My previous talk at UW went rather smoothly. When listening to the IDA sessions yesterday I already had a few more ideas to make my thesis somewhat more reviewer-proof — seems like a side-effect of conferences when other people’s work has an inspiring effect on one’s own work.

May 14th, 2010

Slides for the talk at the IDA 2010 conference

My slides for the IDA’2010 conference are here: russ2010ida-slides.pdf. My upcoming talk at the University of Waterloo will be meandering along the same lines. The respective publication for the IDA conference is here: Spatial Variable Importance Assessment for Yield Prediction in Precision Agriculture. There’s also the SpringerLink URL: http://www.springerlink.com/content/p63pn0561u18r34w/.

May 12th, 2010

Invited talk at the University of Waterloo

I’ll be giving a talk at the University of Waterloo, Canada, on Monday, May 17th, 2010. The following is from the invitation e-mail for this talk:

“Spatial data mining in precision agriculture” is the topic of a talk
on Monday May 17 at 2pm in room EV1-1001 by PhD candidate Georg Ruß
from the Computational Intelligence group at the University of
Magdeburg, Germany. Mr. Ruß will introduce us to the utility of
geomatics technologies in precision agriculture, and present novel
spatial data mining tools that can be used to optimize the spatial
allocation of fertilizer and pesticide doses in site-specific crop
management. This involves novel statistical and machine-learning
techniques such as the support vector machine, spatial
cross-validation, and spatial clustering algorithms, as well as lots
and lots of high-resolution geodata.

The invitation came from Alexander Brenning. Afterwards I’ll be traveling on to Arizona (Tucson, Oracle, Biosphere II) for the IDA 2010 conference. Further information will be given right here in due course.

March 16th, 2010

Two more papers and therefore two more conferences

Our paper on Spatial Variable Importance Assessment for Yield Prediction in Precision Agriculture (with Alexander Brenning) has been accepted for the IDA 2010 conference, which takes place in Oracle, Arizona, at the Biosphere II site. On the outbound flight to Arizona I’ll stop over in Toronto to visit Alex.

Yet another paper, Data Mining in Precision Agriculture: Management of Spatial Information, has been accepted for the IPMU 2010 conference, which will take place at the end of June in Dortmund, Germany. The reviews were quite encouraging, and some of the points raised were ones I had already thought of and which are covered in my thesis draft.

The difference between these two conferences couldn’t be greater: 20 talks (in three days) for the IDA, and about 300 in five days for the IPMU, with a lot of sessions.

January 5th, 2010

R scripts for ICDM’2010

The following is a link to the R scripts which generate the figures used in the ICDM’2010 (to-be-reviewed) paper. The functions for computing the root mean squared error are in 20-*R and 21-*R: the first covers the non-spatial case, the second the spatial analysis, including clustering (which is a one-liner in R, like many other things). The relevant functions are NonSpatialRegression() and spatialPredictionWithClustering(). The scripts might not be of much use without the data sets, but they can easily be tailored to other data sets. Should you have questions, feel free to drop me a few lines; I’m happy to answer. You might also consider participating in my workshop on Data Mining in Agriculture (DMA’2010).
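
For those without R or the data sets at hand, here is a rough, self-contained Python sketch of the comparison those two scripts are about: the root mean squared error under ordinary (random) cross-validation versus cross-validation on spatially contiguous folds. Everything in it is an assumption made for illustration: the toy field, the nearest-neighbour predictor and the function names are stand-ins and do not reproduce NonSpatialRegression() or spatialPredictionWithClustering().

```python
import math
import random


def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def nearest_neighbour_predict(train, target_xy):
    """Predict the value at target_xy from the closest training site."""
    x, y = target_xy
    _, value = min(train, key=lambda site: (site[0][0] - x) ** 2 + (site[0][1] - y) ** 2)
    return value


def cross_validated_rmse(sites, fold_of):
    """Hold out each fold in turn; fold_of[i] is the fold index of sites[i]."""
    predicted, observed = [], []
    for fold in sorted(set(fold_of)):
        train = [s for s, f in zip(sites, fold_of) if f != fold]
        for (xy, value), f in zip(sites, fold_of):
            if f == fold:
                predicted.append(nearest_neighbour_predict(train, xy))
                observed.append(value)
    return rmse(predicted, observed)


# Toy field (assumption): a 10 x 10 grid whose "yield" varies smoothly in space.
sites = [((x, y), float(x + y)) for x in range(10) for y in range(10)]

# Non-spatial folds: sites are spread over 4 folds at random.
rng = random.Random(0)
random_folds = [i % 4 for i in range(len(sites))]
rng.shuffle(random_folds)

# Spatial folds: the field is cut into 4 contiguous 5 x 5 blocks.
block_folds = [2 * (x // 5) + (y // 5) for (x, y), _ in sites]

rmse_random = cross_validated_rmse(sites, random_folds)
rmse_blocks = cross_validated_rmse(sites, block_folds)
print("random folds:", rmse_random)
print("spatial blocks:", rmse_blocks)
```

On this toy field the block-wise error is larger than the random-fold error, because with random folds almost every held-out site has a training site right next to it.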

Link: Rscripts-icdm2010.tar

January 5th, 2010

Paper summary for ICDM’2010

The following is a paper summary for the ICDM 2010 conference, which will be held in Berlin in July. It mainly elaborates on the issue of spatial autocorrelation in the agriculture data I’m using. It refers to my previous publications at this conference (2008, 2009), where I presented standard regression approaches using different techniques for the task of yield prediction. It seems that, due to spatial autocorrelation, these techniques considerably underestimate the prediction error. I therefore developed an approach based on k-means clustering to enable yield prediction on spatial data sets. The conference reports from the previous years are here: 2008, 2009.
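
As a rough illustration of the clustering step: k-means on the site coordinates alone already yields spatially compact groups that can serve as hold-out folds. The sketch below is plain Python written for this post's readers, not the code from the paper (in R this is essentially a call to kmeans()). The function names, the grid of coordinates and the choice of four clusters are assumptions.

```python
import random


def nearest_center(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: (point[0] - centers[j][0]) ** 2 + (point[1] - centers[j][1]) ** 2,
    )


def kmeans_folds(coords, k, seed=0, max_iter=100):
    """Plain Lloyd's k-means on (x, y) coordinates; returns one fold label per site."""
    rng = random.Random(seed)
    centers = rng.sample(coords, k)  # start from k distinct sites
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest_center(p, centers) for p in coords]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


# Toy coordinates (assumption): sites on a regular 10 x 10 grid.
coords = [(float(x), float(y)) for x in range(10) for y in range(10)]
folds = kmeans_folds(coords, k=4)
for fold in sorted(set(folds)):
    print("fold", fold, "holds", folds.count(fold), "sites")
```

Each cluster is then held out in turn while a model is trained on the remaining sites, so that test sites are not surrounded by near-identical training neighbours.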
Read the rest of this entry »