Georg Ruß' PhD Blog — R, clustering, regression, all on spatial data, hence it's:

December 14th, 2009

Back from Canberra and off to Cambridge

It seems the cities I’ve visited recently all start with Ca. Anyway, I’m back from Canberra after yet another three flights, including a 20-minute bus ride at DXB and a 4-hour train ride within Germany. In hindsight it’s been really useful to present my work (past, present and future) in a comprehensive talk at the Australian Taxation Office. I had around 20 direct listeners, some of whom were from The Australian National University and from the Commonwealth Scientific and Industrial Research Organisation. Some additional listeners around the country were connected via a telephone conferencing system.

My direct conversation partners and hosts were Graham Williams and Warwick Graco. I could talk about my ideas at length and got very valuable feedback from them regarding methodologies, techniques and possible pitfalls. Apart from the business talks, the city of Canberra is really worth a visit, which might be due to the fact that I was shown around by these two seasoned guys who really know their city. I also happened to visit the National Gallery of Australia, where Masterpieces from Paris are on display, another really worthwhile exhibition.

Nevertheless, I’m off to Cambridge tomorrow for the AI-2009 conference, yet again at freezing Peterhouse College. The slides for my talk will cover the results of the respective paper, spiced up with some introductory and motivational slides from the ATO talk. The slides: slides-russ2009sgai.pdf.

July 7th, 2009

Paper for SGAI AI-2009 accepted

The paper which I mentioned in the previous post has been accepted for publication at the SGAI AI-2009 conference. The reviewers were rather confident about the paper contents and it seems that my work is quite interesting for computer scientists.

Nevertheless, I’ve started digging somewhat deeper into the issue with spatial autocorrelation which is likely to exist in the georeferenced data sets I’m using. So far, this has usually been neglected and might lead to biased results when regression is carried out. My main idea for my PhD contribution is to develop or find a regression model which does take the spatial autocorrelation into account.
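To make the notion of spatial autocorrelation a bit more concrete, here is a minimal sketch of Moran's I, a common global measure of it. This is written in plain Python rather than R and is not taken from my own scripts. The 4x4 grid, the rook (edge-sharing) neighbourhood and the helper names `morans_i` and `rook_weights` are assumptions made purely for illustration; real field data would need a neighbourhood definition that fits the sampling layout.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a dict {(i, j): w_ij} of spatial weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Cross-products of deviations, summed over all neighbouring pairs.
    num = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    den = sum(d * d for d in dev)
    w_sum = sum(weights.values())
    return (n * num) / (w_sum * den)


def rook_weights(rows, cols):
    """Binary weights for a regular grid: cells sharing an edge are neighbours (assumption)."""
    weights = {}
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    weights[(i, rr * cols + cc)] = 1.0
    return weights


w = rook_weights(4, 4)
# Hypothetical data: a smooth gradient (similar neighbours) and a checkerboard (opposite neighbours).
gradient = [float(r + c) for r in range(4) for c in range(4)]
checker = [1.0 if (r + c) % 2 == 0 else -1.0 for r in range(4) for c in range(4)]

i_gradient = morans_i(gradient, w)
i_checker = morans_i(checker, w)
print("gradient:", i_gradient, "checkerboard:", i_checker)
```

Values near +1 indicate that neighbouring locations carry similar values, values near -1 that they carry opposite ones, and values near 0 that there is no spatial structure. A regression model that assumes independent samples ignores exactly this kind of structure.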

To give you an idea of the data sets and fields I’m working with, here’s a georeferenced plot of the N2 fertilizer on one of the fields during 2007:

N2 dressing on one of the fields in 2007

R is really great for working with (georeferenced) shapefiles.

May 28th, 2009

Publication submitted for SGAI AI-2009

Since the series of AI conferences by the BCS Specialist Group on AI has been useful the last two times, I decided to submit yet another paper there. Again, I’m currently working with the agriculture data for yield prediction.

The question this time is: Which of the features in the data sets I have are actually useful for yield prediction? In recent publications I have done some research into different regression models which enable yield prediction. Taking this a step further, I’m now looking at feature selection. There’s a lot of research in that area (but mostly on classification, not regression) and I’m not quite finished getting to grips with it. Nevertheless, the central story line is emerging. In the SGAI AI-2009 paper I have developed or adapted a feature selection approach which uses forward selection (i.e. starting with an empty set of features and subsequently adding the most promising ones). The subsets are then evaluated using support vector regression and regression trees. In the end, a ranking of the features is presented which points out important (relevant) features and also shows the ones that are redundant or irrelevant.
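The forward selection loop described above can be sketched roughly as follows. This is an illustration in plain Python, not the R script from the paper. To keep it self-contained, the subsets are scored with a leave-one-out k-nearest-neighbour regressor as a stand-in for support vector regression and regression trees, and the data set is synthetic. The names `loo_knn_error` and `forward_selection` are hypothetical.

```python
import random
from statistics import mean


def loo_knn_error(X, y, features, k=3):
    """Leave-one-out mean squared error of k-NN regression using only `features`.

    Stand-in evaluator (assumption): the paper uses SVR and regression trees instead.
    """
    n = len(X)
    total = 0.0
    for i in range(n):
        # Squared distances to all other points, restricted to the chosen features.
        dists = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), j)
            for j in range(n) if j != i
        )
        pred = mean(y[j] for _, j in dists[:k])
        total += (y[i] - pred) ** 2
    return total / n


def forward_selection(X, y, evaluate=loo_knn_error):
    """Start with an empty feature set and greedily add the most promising feature."""
    remaining = list(range(len(X[0])))
    selected = []
    ranking = []  # (feature index, error after adding it), in order of selection
    while remaining:
        scores = {f: evaluate(X, y, selected + [f]) for f in remaining}
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        ranking.append((best, scores[best]))
    return ranking


# Synthetic data (assumption): the target depends on features 0 and 1, feature 2 is pure noise.
rng = random.Random(0)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(80)]
y = [3 * row[0] + 2 * row[1] + rng.gauss(0, 0.05) for row in X]

ranking = forward_selection(X, y)
for feature, error in ranking:
    print(f"added feature {feature}: LOO error {error:.3f}")
```

The order in which features are added gives the ranking. A feature whose addition no longer lowers the error (or even raises it) is a candidate for being redundant or irrelevant. Any regression model can be plugged in through the `evaluate` argument.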

As mentioned before, I’m now using R for computing, hence the scripting language will also change. I won’t be converting scripts from MATLAB to R, though. The script for the above SGAI AI-2009 paper (titled: Feature Selection for Wheat Yield Prediction) can be found here: 2009sgai-featuresel.R (or use the script overview).

It also seems as if the central theme of my dissertation will be along the above lines.