Georg Ruß' PhD Blog — R, clustering, regression, all on spatial data, hence it's:

May 21st, 2010

Some remarks regarding my talk

Again, my talk went really well and it felt like everyone was listening. The questions were more or less standard, the kind to be expected when presenting this type of research to this specific IDA audience. There are two things which I realised only after the presentation:

  • First, what matters is not so much the specifics of the modeling solution that I presented. The most interesting part for the audience is rather the fact that agriculture is turning into a data-driven discipline. I caught a lot of comments along the lines of “It’s amazing what kind and what amounts of data are nowadays collected in agriculture.”
  • The other thing is that the specific modeling setup I presented is likely to be valid, since the results obtained with it are the ones that would normally be expected. So what matters is not so much the result that REIP49 is the best predictor, but rather that the setup I created may be used for future variable importance assessment as new data come in.

I already guessed the first point, but the second one only occurred to me in hindsight. Seems to happen rather often in research, though.

May 20th, 2010

Staying at the Biosphere II site

Biosphere II, Arizona, USA


As planned, I’m currently at the Biosphere II, somewhere in the middle of nowhere in Arizona. The conference is great, there are no parallel sessions, I can actually talk to everyone and I know roughly what everyone’s doing. We had our greenhouse tour yesterday and it was quite impressive to see such a huge structure being completely sealed from the outside.

My previous talk at UW went rather smoothly. When listening to the IDA sessions yesterday I already had a few more ideas to make my thesis somewhat more reviewer-proof — seems like a side-effect of conferences when other people’s work has an inspiring effect on one’s own work.

May 14th, 2010

Slides for the talk at the IDA 2010 conference

My slides for the IDA’2010 conference are here: russ2010ida-slides.pdf. My upcoming talk at the University of Waterloo will be meandering along the same lines. The respective publication for the IDA conference is here: Spatial Variable Importance Assessment for Yield Prediction in Precision Agriculture. There’s also the Springerlink URL:

May 12th, 2010

Invited talk at the University of Waterloo

I’ll be giving a talk at the University of Waterloo, Canada on Monday, May 17th, 2010. The following is from the invitation e-mail to this talk:

“Spatial data mining in precision agriculture” is the topic of a talk
on Monday May 17 at 2pm in room EV1-1001 by PhD candidate Georg Ruß
from the Computational Intelligence group at the University of
Magdeburg, Germany. Mr. Ruß will introduce us to the utility of
geomatics technologies in precision agriculture, and present novel
spatial data mining tools that can be used to optimize the spatial
allocation of fertilizer and pesticide doses in site-specific crop
management. This involves novel statistical and machine-learning
techniques such as the support vector machine, spatial
cross-validation, and spatial clustering algorithms, as well as lots
and lots of high-resolution geodata.

The invitation was offered by Alexander Brenning. I’ll be traveling further to Tucson/Oracle/Biosphere II/Arizona for the IDA 2010 conference. Further information is going to be given right here in due course.

March 16th, 2010

Two more papers and therefore two more conferences

Our paper on Spatial Variable Importance Assessment for Yield Prediction in Precision Agriculture (with Alexander Brenning) has been accepted for the IDA 2010 conference taking place in Oracle, Arizona, at the Biosphere II site. On the outbound flight to Arizona, I’ll make a stopover in Toronto to visit Alex.

And yet another paper on the topic of Data Mining in Precision Agriculture: Management of Spatial Information has been accepted for the IPMU 2010 conference, which will take place at the end of June in Dortmund, Germany. The reviews were quite encouraging and some points were made which I’d already thought of before and which are in my thesis draft.

The difference between these two conferences couldn’t be greater: 20 talks (in three days) for the IDA, and about 300 in five days for the IPMU, with a lot of sessions.

February 26th, 2010

Hierarchical Average-Linkage Clustering with Spatial Constraints

Got it. Last week I finished the programming work for something I’d call hierarchical agglomerative average-linkage clustering with spatial constraints. It seems complicated and sounds complicated, but the R code is only around 300 lines long. The exact reasoning behind what I did is going to be in one of the next papers. Nevertheless, two illustrative videos emerged.
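
To give a rough idea of what such a procedure can look like, here is a minimal sketch. It is written in Python rather than R, it is not the roughly 300 lines mentioned above, and the exact spatial constraint used in the thesis is not described in this post. The sketch assumes one plausible constraint: two clusters may only be merged if at least one pair of their points lies within a distance `radius` of each other. The function names, `radius` and the toy transect are all hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, values):
    """Mean absolute attribute difference over all cross-cluster point pairs."""
    return sum(abs(values[i] - values[j]) for i in a for j in b) / (len(a) * len(b))


def are_adjacent(a, b, coords, radius):
    """Assumed spatial constraint: some cross-cluster pair lies within `radius`."""
    return any(dist(coords[i], coords[j]) <= radius for i in a for j in b)


def spatial_average_linkage(coords, values, n_clusters, radius):
    """Merge spatially adjacent clusters, closest (average linkage) first."""
    clusters = [[i] for i in range(len(coords))]  # start with singletons
    while len(clusters) > n_clusters:
        candidates = [
            (average_linkage(a, b, values), ia, ib)
            for (ia, a), (ib, b) in combinations(enumerate(clusters), 2)
            if are_adjacent(a, b, coords, radius)
        ]
        if not candidates:  # nothing left that may be merged
            break
        _, ia, ib = min(candidates)
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]  # ib > ia, so index ia stays valid
    return clusters


# Toy transect: low values at both ends, high values in the middle.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
values = [1.0, 1.1, 5.0, 5.2, 1.05, 0.9]
print(spatial_average_linkage(coords, values, n_clusters=3, radius=1.0))
# [[0, 1], [2, 3], [4, 5]]
```

On this toy transect the two low-value ends end up in separate clusters because they are not spatially adjacent. In an unconstrained run, points from opposite ends would be joined first, since their values differ least.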

February 10th, 2010


F440, k-means tessellation


February 10th, 2010

Two talks given, covering my thesis

In preparation for my thesis defence, I recently gave two more or less local talks at our university. The first took place on Jan 14th 2010, as part of the DKE-Doktorandenkolloquium. My talk covered the three major parts of my dissertation, namely the introduction of spatial cross-validation for yield prediction, variable importance measurements and a spatial clustering approach tailored to precision agriculture data. My direct supervisor happened to be there and gave me positive feedback, along with some points to make watertight for the final defence presentation of my thesis. Some questions could be preempted, or at least I could prepare the answers beforehand. The slides: slides-russ2010dkekoll.pdf.

Then, in fulfillment of our institute’s formal requirements for obtaining a PhD title, I gave my talk at the Doktorandentag on Feb 02 2010, which is a kind of preparatory talk for the final defence. Again, my talk went much better than the one before, even though it was heavily shortened. I guess I have some routine in giving talks by now, after the numerous conferences I went to. It seems that this was the second talk I gave in German during my PhD time. That’s why the slides are in German, too: slides-russ2010dday.pdf.

January 5th, 2010

R scripts for ICDM’2010

The following is a link to the R scripts which generate the figures used in the ICDM’2010 (to-be-reviewed) paper. The functions for computing the root mean squared error are in 20-*R and 21-*R, where the first is for the non-spatial case and the second is for the spatial analysis, including clustering (which is a one-liner in R, like many other things). The relevant functions are NonSpatialRegression() and spatialPredictionWithClustering(). The scripts might not be of much use without the data sets, but they can easily be tailored to other data sets. Should you have questions, feel free to drop me a few lines; I’m happy to answer. You might also consider participating in my workshop on Data Mining in Agriculture (DMA’2010).

Link: Rscripts-icdm2010.tar
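
For readers without the archive at hand, the error measure itself is easy to sketch. The following is a minimal Python illustration, not a port of NonSpatialRegression(): it computes the root mean squared error and applies it in an ordinary (non-spatial) cross-validation with random folds. The helper names `random_folds` and `mean_baseline_cv`, and the predictor that simply returns the mean of the training data, are assumptions made purely for illustration.

```python
import random
from math import sqrt


def rmse(observed, predicted):
    """Root mean squared error of paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(squared / len(observed))


def random_folds(n, k, seed=0):
    """Non-spatial split: shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def mean_baseline_cv(yields, folds):
    """Per-fold RMSE of a baseline predicting the mean yield of the training part.

    Needs at least two non-empty folds, otherwise there is nothing to train on.
    """
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [y for i, y in enumerate(yields) if i not in held]
        prediction = sum(train) / len(train)
        observed = [yields[i] for i in held_out]
        scores.append(rmse(observed, [prediction] * len(observed)))
    return scores
```

A real regression model would replace the mean baseline; the fold handling and the error computation stay the same.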

January 5th, 2010

Paper summary for ICDM’2010

The following is a paper summary for the ICDM 2010 conference, which will be held in Berlin in July. It mainly elaborates on the issue of spatial autocorrelation in the agriculture data I’m using. It refers to my previous publications (2008, 2009) at this conference, where I presented standard regression approaches using different techniques for the task of yield prediction. It seems that, due to spatial autocorrelation, these techniques considerably underestimate the prediction error. I therefore developed an approach based on k-means clustering to enable yield prediction on spatial data sets. The conference reports from the previous years are here: 2008, 2009.
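
The basic idea of cluster-based folds can be sketched as follows. This is a minimal Python illustration under assumptions, not the code from the paper: it assumes the clustering is done on the point coordinates only, runs a plain k-means on them and then holds out one spatial cluster at a time, so that test points are not surrounded by their own training neighbours. The function names, the random initialisation and the `seed` parameter are hypothetical.

```python
import random
from math import dist


def kmeans_labels(coords, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on coordinates; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(coords, k)  # k distinct points as initial centres
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in coords]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == c]
            if members:  # keep the old centre if a cluster went empty
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def spatial_folds(coords, k, seed=0):
    """Yield (train_indices, test_indices), holding out one spatial cluster at a time."""
    labels = kmeans_labels(coords, k, seed=seed)
    for fold in sorted(set(labels)):
        test = [i for i, lab in enumerate(labels) if lab == fold]
        train = [i for i, lab in enumerate(labels) if lab != fold]
        yield train, test
```

Each pair of index lists can then be used to fit a model on `train` and compute the prediction error on `test`. Because k-means depends on its starting centres, different seeds can give different tessellations.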