Georg Ruß’ Blog — matlab, data mining, neural networks …

February 26th, 2010

Hierarchical Average-Linkage Clustering with Spatial Constraints

Got it. Last week I finished the programming work for something I’d call hierarchical agglomerative average-linkage clustering with spatial constraints. It looks and sounds complicated, but the R code is only around 300 lines long. The detailed reasoning behind the approach will appear in one of my next papers. In the meantime, two illustrative videos emerged.
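The post does not describe the algorithm itself, so here is a minimal, hypothetical sketch of one common reading of “average-linkage clustering with spatial constraints” (contiguity-constrained agglomeration). It is written in Python, not R, and it is not my actual implementation. Two clusters may only merge if they are spatially adjacent, which here means that some pair of their points lies within an assumed distance `radius`. Among the adjacent pairs, the one with the smallest average pairwise distance in attribute space is merged. The function name, the `radius` rule and the Euclidean distances are all assumptions made for illustration.

```python
import math
from itertools import combinations


def constrained_average_linkage(coords, features, n_clusters, radius):
    """Hypothetical sketch: agglomerative average-linkage clustering that only
    merges spatially adjacent clusters.

    coords: list of (x, y) positions; features: list of attribute vectors.
    radius: assumed adjacency threshold. Two clusters count as neighbours if
    any pair of their points is at most `radius` apart.
    """
    clusters = [[i] for i in range(len(coords))]

    def adjacent(a, b):
        # spatial constraint: some point of a lies within `radius` of some point of b
        return any(math.dist(coords[i], coords[j]) <= radius for i in a for j in b)

    def average_linkage(a, b):
        # mean pairwise Euclidean distance in attribute (feature) space
        total = sum(math.dist(features[i], features[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            if not adjacent(clusters[p], clusters[q]):
                continue
            d = average_linkage(clusters[p], clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        if best is None:
            break  # no spatially adjacent pair is left to merge
        _, p, q = best  # p < q, so deleting q leaves index p valid
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Toy example: points 3 and 4 share the attribute value of points 0 and 1,
# but lie far away spatially, so they end up in their own cluster.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
features = [[0.0], [0.0], [5.0], [0.0], [0.0]]
print(constrained_average_linkage(coords, features, n_clusters=2, radius=1.5))
# [[0, 1, 2], [3, 4]]
```

Without the adjacency check, plain average linkage would group points 0, 1, 3 and 4 together because their attribute values are identical. The spatial constraint keeps clusters contiguous, which is the point of such an approach for field data. The naive pairwise loops are fine for a sketch but far too slow for real data sets.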

February 10th, 2010

Two talks given, covering my thesis

In preparation for my thesis defence, I recently gave two more or less local talks at our university. The first took place on Jan 14th 2010, as part of the DKE-Doktorandenkolloquium. My talk covered the three major parts of my dissertation, namely the introduction of spatial cross-validation for yield prediction, variable importance measurements, and a spatial clustering approach tailored to precision agriculture data. My direct supervisor happened to be there and gave me positive feedback, as well as some points to make watertight for the final defence presentation. This way, some questions can be preempted, or at least I can prepare the answers beforehand. The slides: slides-russ2010dkekoll.pdf.

Then, to fulfil our institute’s formal requirements for obtaining a PhD, I gave my talk at the Doktorandentag on Feb 02 2010, which serves as a kind of preparatory talk for the final defence. This talk went even better than the first one, even though it was heavily shortened. I guess I have developed some routine by now, after the numerous conferences I have attended. It seems this was the second talk I gave in German during my PhD time, which is why the slides are in German, too: slides-russ2010dday.pdf.

January 5th, 2010

R scripts for ICDM’2010

The following is a link to the R scripts which generate the figures used in my ICDM’2010 paper (currently under review). The functions for computing the root mean squared error are in 20-*R and 21-*R: the first is for the non-spatial case, and the second is for the spatial analysis, including clustering (which, like many other things, is a one-liner in R). The relevant functions are NonSpatialRegression() and spatialPredictionWithClustering(). The scripts might not be of much use without the data sets, but they can easily be adapted to other data sets. Should you have questions, feel free to drop me a few lines; I’m happy to answer. You might also consider participating in my workshop on Data Mining in Agriculture (DMA’2010).
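For readers without R at hand, here is a rough, hypothetical Python sketch of the difference between a non-spatial and a spatial error estimate. It is not a port of NonSpatialRegression() or spatialPredictionWithClustering(). The non-spatial variant assigns points to cross-validation folds at random, while the spatial variant uses k-means clusters of the coordinates as folds, so that neighbouring (autocorrelated) points do not end up on both sides of a train/test split. All function names are made up for this sketch. The 1-nearest-neighbour regressor is only a stand-in for the actual regression techniques, and the toy field is invented data.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def kmeans_labels(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 2-D coordinates; returns one cluster label per point."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster runs empty
                centres[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def random_labels(n, k, seed=0):
    """Non-spatial folds: assign each point to one of k folds at random."""
    rng = random.Random(seed)
    return [rng.randrange(k) for _ in range(n)]


def nearest_neighbour(train_X, train_y, test_X):
    """1-nearest-neighbour regression (stand-in for the real models)."""
    return [train_y[min(range(len(train_X)), key=lambda j: math.dist(x, train_X[j]))]
            for x in test_X]


def cross_validated_rmse(X, y, labels, fit_predict):
    """Hold out each fold in turn, train on the rest, and pool all predictions."""
    preds = [0.0] * len(y)
    for fold in set(labels):
        train = [i for i, lab in enumerate(labels) if lab != fold]
        test = [i for i, lab in enumerate(labels) if lab == fold]
        fitted = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        for i, p in zip(test, fitted):
            preds[i] = p
    return rmse(y, preds)


# Hypothetical toy field: yield varies smoothly with position (spatial autocorrelation).
coords = [(x, yy) for x in range(10) for yy in range(10)]
yields = [math.sin(x / 3) + math.cos(yy / 3) for x, yy in coords]
print("random folds :", cross_validated_rmse(coords, yields, random_labels(len(coords), 10), nearest_neighbour))
print("spatial folds:", cross_validated_rmse(coords, yields, kmeans_labels(coords, 10), nearest_neighbour))
```

With random folds, almost every held-out point has a close neighbour in the training data, so the estimated error tends to look optimistic. Holding out whole spatial clusters removes that advantage, which is the effect the spatial analysis is meant to expose.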

Link: Rscripts-icdm2010.tar

January 5th, 2010

Paper summary for ICDM’2010

The following is a paper summary for the ICDM 2010 conference, which will be held in Berlin in July. It mainly elaborates on the issue of spatial autocorrelation in the agriculture data I’m using. It refers to my previous publications (2008, 2009) at this conference, where I presented standard regression approaches using different techniques for the task of yield prediction. It turns out that, due to spatial autocorrelation, these techniques considerably underestimate the prediction error. I therefore developed an approach based on k-means clustering to enable yield prediction on spatial data sets. The conference reports from the previous years are here: 2008, 2009.

December 14th, 2009

Back from Canberra and off to Cambridge

It seems the cities I’ve visited recently all share the prefix Ca. Anyway, I’m back from Canberra after yet another three flights, including a 20-minute bus ride at DXB, plus a 4-hour train ride within Germany. In hindsight, it was really useful to present my work (past, present and future) in a comprehensive talk at the Australian Taxation Office. I had around 20 direct listeners, some of whom were from The Australian National University and from the Commonwealth Scientific and Industrial Research Organisation. Some additional listeners around the country were connected via a telephone conferencing system.

My direct conversation partners and hosts were Graham Williams and Warwick Graco. I could talk about my ideas at length and got very valuable feedback from them regarding methodologies, techniques and possible pitfalls. Apart from the business talks, the city of Canberra is really worth a visit, which might be due to the fact that I was shown around by these two seasoned guys who really know their city. I also happened to visit the National Gallery of Australia, where Masterpieces from Paris are on display: another really worthwhile exhibition.

Anyway, I’m off to Cambridge tomorrow for the AI-2009 conference, once again at freezing Peterhouse College. My talk will present the results of the respective paper, spiced up with some introductory and motivational slides from the ATO talk. The slides: slides-russ2009sgai.pdf.

December 4th, 2009

Slides for my talk at the ATO

I’ll be giving a talk at the Australian Taxation Office, likely on 11 December at 11:00 local time (Canberra, ACT). The slides can be obtained here: slides-russ2009ato.pdf. The abstract is as follows:

Data Mining in Agriculture

In recent years, due to new and affordable technological advances, data
collection has turned into an everyday task. Nowadays, especially with the
advent of the global positioning system and modern farming vehicles, sensors
and equipment, even agriculture has turned into a data-driven discipline
called precision agriculture. However, as in numerous other research and
production areas, collecting data is not sufficient for economic or ecological
well-being. The collected data have to be mined and turned into usable
knowledge.

Therefore, this talk presents some approaches towards data mining in
agriculture. The talk will begin with a short overview of the origins of
the actual agriculture data. The difference between spatial and non-spatial
approaches will be emphasised using the example of yield prediction. Some of
the non-spatial techniques, such as clustering, regression and feature
selection, may be carried over to spatial approaches. Most of the presented
work addresses very recent issues which, to the speaker’s knowledge, remain
unsolved in this discipline. Furthermore, the presented work is
an excerpt of what is going to be the speaker’s PhD thesis, which is likely to
cover Data Mining in Agriculture from a computer scientist’s perspective.

November 19th, 2009

Research Update

From the end of the summer term into the first few weeks of our university’s winter term, I have been able to continue my research, quite unlike recent years, when this period meant shifting back to teaching activities. I’ve been able to fill a complete chapter of what is likely to become my dissertation thesis.

September 18th, 2009

Workshop Invitation: Data Mining in Agriculture

I have been invited by Petra Perner, the head of the ibai institute, which organizes the ICDM and MLDM conferences, to hold a workshop on “data mining in agriculture” at next year’s ICDM conference, which will take place in Berlin in July.

The website is currently under construction: http://dma2010.de. The important details are already there, and the PDF call for papers will be published soon.

July 27th, 2009

Report: MLDM 2009

Last week I also attended MLDM 2009, the biennial conference on Machine Learning and Data Mining, organised by the same team as the ICDM series. My paper was accepted as a poster presentation, and I also chaired a session on association rules, a topic strongly related to my diploma thesis. The conference was a bit larger than the ICDM, with around 60 scheduled talks, of which only 48 took place due to dropouts. It was a bit more theoretical than the ICDM, but still well worth it, since the data mining problems were usually closely motivated by real-world applications.