Georg Ruß' PhD Blog — R, clustering, regression, all on spatial data.

May 14th, 2008

Precision farming reaches the F.A.Z.

By now, the topic of precision farming (in German: Präziser Ackerbau or Teilflächenspezifische Bewirtschaftung, i.e. site-specific crop management) has reached the mass media as well. The Frankfurter Allgemeine Zeitung of 10.05.2008 ran an article on page 20 titled Mit GPS in die Ackerfurche ("With GPS into the furrow"). It clearly points out the above all economic advantages of PF in ploughing, fertilizing and sowing. The possibilities of satellite photos are mentioned too, and the sometimes considerable cost savings they enable are nothing to sneeze at.

Added on 19.05.2008: A similar, somewhat more emotional article appeared in the Wirtschaftswoche.

May 8th, 2008

Preparing another paper for SGAI AI-2008

I feel as if I have only just returned from AI-2007 (at least the expenses were reimbursed by the University just recently), and already the deadline for AI-2008 is approaching, again held at Peterhouse College, UK.

April 22nd, 2008

Interesting article on "Science 2.0"

Scientific American just posted an interesting article on Science 2.0, which describes the use of Web 2.0 techniques in scientific work. Well, I've somehow made that transition already with this blog. For larger research groups there is also the wiki approach to enable closer collaboration. There have been earlier attempts from MIT, such as OpenWetWare, which is used to

promote the sharing of information, know-how, and wisdom among researchers and groups who are working in biology & biological engineering.

According to the SciAm article, people publish almost all of their work in progress online: how-tos, intermediate results from experiments, discussions of publications and so on. This is also my philosophy of working in research. It's not about publish-or-perish, but about getting into discussions and high-level arguments, receiving early feedback and passing useful ideas and hints on to other researchers.

At the moment I'm still collecting ideas for data mining of agricultural data. It looks as if it's going to be research and experimentation with supervised learning methods on those data, describing, in the process, the data flow from the wheat field to the machine learning task and back to the field. The next paper deadline is for AI-2008, again held at Peterhouse College, Cambridge, UK. I'll have to sort out the content first, though.

March 31st, 2008

Another paper accepted, ICDM’08

As recently as March 28th, another paper on my/our work on the agriculture data was accepted. The conference is the Industrial Conference on Data Mining 2008, which will take place not far away in Leipzig, Germany. Based on the results and the information from those conferences, I might try to submit more advanced work to BCS AI-2008, which will again take place at (frosty) Peterhouse College in Cambridge, UK. I won't be there for long, though, as offspring is already on his/her way.

March 10th, 2008

Two papers accepted

Two papers of mine have been accepted recently. Both describe the process of data mining with neural networks on agriculture data. Therefore I'm quite confident that this will be my PhD thesis' major point of interest. The two conferences, which also provided worthwhile and in-depth reviews, are IPMU 2008 in Málaga, Spain, and IFIP AI 2008 in Milan, Italy. Of course, the papers have to be revised and my presentations will have to be prepared, but it's encouraging nevertheless.

January 18th, 2008

More data (analysis) in agriculture

Quite a few publication deadlines are approaching in January, and I will submit another paper detailing some of the recent accomplishments on the agriculture data. One of the conferences is ICDM 2008, held in Leipzig, Germany. It targets industrial applications of data mining, and I feel the paper fits in quite nicely.

I have also received more data sets from Martin Schneider at the Martin-Luther-University of Halle-Wittenberg, which will have to be mined. There are quite a lot of interesting tasks to be performed on those data, which requires thorough planning. I probably won't be able to do that planning until I return from the organizational business trip to Melbourne, which starts a cooperation project between our research group and the one I worked with in 2004/2005.


January 8th, 2008

The "squared" in "mean squared error"

The latest experiments with the sports science data consistently produced outrageously high errors, which in most cases missed the scale of the original attribute by orders of magnitude. After conducting some experiments that led to negative conclusions like:

  • the error is too high, therefore the network cannot be trained on these data or
  • there is some tendency to overfitting when the network size and the learning rate increase, but the error is way too large anyway,

I presented the examples to the sports science people, who were also quite surprised by the error's order of magnitude. Back at my office, I suddenly realized what I had actually shown in the graphs: after taking the square root of the MSE, I ended up with what I had wanted to show in the plots all along: the absolute error, i.e. the absolute difference between the network's predicted result and the actual result. It's somewhat embarrassing, but at least I now know the cause of the high-error issue.
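The mix-up is easy to reproduce. A minimal sketch in Python (the original experiments used Matlab; the numbers below are made up, roughly on the scale of the target attribute):

```python
import math

# Hypothetical predictions vs. actual results, made up for illustration.
y_true = [860.0, 910.0, 955.0, 1000.0]
y_pred = [870.0, 905.0, 940.0, 990.0]

n = len(y_true)
mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n  # squared units!
rmse = math.sqrt(mse)  # square root brings it back to the original scale
mae = sum(abs(p - t) for p, t in zip(y_pred, y_true)) / n  # mean absolute error

print(mse, rmse, mae)  # 112.5  10.606...  10.0
```

Plotting the raw MSE puts squared units into a figure read as an error on the attribute's original scale, which is exactly why the values looked orders of magnitude too high.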

For the plot below I also got some advice from the sports science people to remove some redundant data (four of the 24 columns are more or less sums of three others each). Now the error is in a more reasonable region, given that a min/max of 841/1004 is expected for the target attribute. The plot shows the error vs. the sizes of the first and second hidden layers of the feed-forward network, respectively.
neuro12, absolute error instead of mse
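Dropping such sum columns can be sketched in a few lines of plain Python; the column index and the data below are hypothetical, since the actual layout of the 24 columns isn't given here:

```python
# Hypothetical 24-column records; here column 3 is constructed as the sum
# of columns 0-2, mimicking one of the four redundant sum columns.
records = [[float(i + j) for j in range(24)] for i in range(5)]
for row in records:
    row[3] = row[0] + row[1] + row[2]

SUM_COLS = {3}  # hypothetical indices of the redundant columns
reduced = [[v for j, v in enumerate(row) if j not in SUM_COLS]
           for row in records]
print(len(reduced[0]))  # 23 columns remain
```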

December 18th, 2007

Back from SGAI2007

I've returned from the meeting of the British Computer Society's Specialist Group on Artificial Intelligence at Peterhouse College, Cambridge, UK. The event was quite worthwhile; undoubtedly the best speaker was Max Bramer himself. On Monday he gave a talk to PhD students on "How to survive a viva". Since he often has been, and will be, an external supervisor or reviewer of PhD students' theses, he told us what he thinks is important in a thesis, how to write it, what to do and what not to do, and so on. His talk was riddled with hilarious real examples of what he has seen in thesis drafts or even final theses. That event, offered free of charge to PhD students, was actually included at the conference for the first time, before the main conference began on Tuesday.

December 7th, 2007

SGAI 2007

I'll be presenting a paper about my master's thesis' work at the SGAI 2007 International Conference on Artificial Intelligence. See the technical stream of the conference for my talk about Relevance Feedback for Association Rules by Leveraging Concepts from Information Retrieval. The slides are available here: presentation slides for AI-2007.

December 4th, 2007

Overfitting vs. Cross-Validation

One more experiment with the sports science data clearly shows the issue of overfitting in the neural network. I devised a script that automatically generates networks with two hidden layers and systematically varies each layer's size from 2 to 16. No matter how the parameters for the learning process were set, the mean squared error plotted against the two layer sizes (right: first hidden layer, left: second hidden layer) drops to zero as soon as the networks get larger:
Without cross validation, the error is zero with larger networks

However, when applying cross-validation (test set: 1 record, validation set: 1 record, training set: 38 records), the error rises, especially towards the larger layer sizes. This is a clear sign of overfitting (scales as in the previous figure):
With cross validation, one can see the effects of overfitting quite clearly
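The effect can be reproduced with a small sketch, here using scikit-learn's MLPRegressor as a stand-in for the Matlab feed-forward network and pure noise as a stand-in for the 40-record data set (all sizes and parameters below are illustrative, not the original experiment's):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Pure-noise stand-in for the 40-record sports science data set,
# so any good fit on the training set is pure memorization.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))
y = rng.normal(size=40)

# Split as in the entry: 38 training records, 1 held out for testing.
X_train, y_train = X[:38], y[:38]
X_test, y_test = X[38:39], y[38:39]

# A comparatively large network (16 neurons per hidden layer) can drive
# the training error towards zero, while the error on the held-out
# record typically stays large -- the overfitting seen in the figure.
net = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
net.fit(X_train, y_train)
train_mse = float(np.mean((net.predict(X_train) - y_train) ** 2))
test_mse = float(np.mean((net.predict(X_test) - y_test) ** 2))
print(train_mse, test_mse)
```

Wrapping the fit in two loops over `hidden_layer_sizes=(h1, h2)` for h1, h2 in 2..16 reproduces the grid behind both plots.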

As usual, the Matlab script for this entry, which doesn't differ much from the previous ones: neuro10-new_plot.m.