Georg Ruß' PhD Blog — R, clustering, regression, all on spatial data.

May 14th, 2008

Precision farming reaches the F.A.Z.

By now, the topic of precision farming (in German: Präziser Ackerbau or Teilflächenspezifische Bewirtschaftung, i.e. site-specific crop management) has reached the mass media as well. The Frankfurter Allgemeine Zeitung of 10.05.2008 ran an article on page 20 titled Mit GPS in die Ackerfurche ("With GPS into the furrow"). It clearly points out the above all economic advantages of PF in ploughing, fertilizing and sowing. The possibilities of satellite photos are mentioned too, and the sometimes considerable cost savings they enable are nothing to sneeze at.

Added on 19.05.2008: A similar, somewhat more emotional article appeared in the Wirtschaftswoche.

May 8th, 2008

Preparing another paper for SGAI AI-2008

I feel as if I have only just returned from AI-2007 (at least the expenses were reimbursed by the University just recently), and already the deadline for AI-2008 is approaching, again held at Peterhouse College, UK.

April 22nd, 2008

Interesting article on "Science 2.0"

Scientific American just posted an interesting article on Science 2.0, which describes the use of Web 2.0 techniques in scientific work. Well, I've somehow made that transition already with this blog. For larger research groups there is also the wiki approach to enable closer collaboration. There have been earlier attempts from MIT, such as OpenWetWare, which is used to

promote the sharing of information, know-how, and wisdom among researchers and groups who are working in biology & biological engineering.

According to the SciAm article, people publish almost all of their work in progress online: how-tos, intermediate results from experiments, discussions of publications and so on. This is also my philosophy of working in research. It's not about publish-or-perish, but about getting into discussions and high-level arguments, receiving early feedback and passing useful ideas and hints on to other researchers.

At the moment I'm still collecting ideas for data mining of agricultural data. It looks as if it's going to be research and experimentation with supervised learning methods on those data, describing, in the process, the data flow from the wheat field to the machine learning task and back to the field. The next paper deadline is for AI-2008, again held at Peterhouse College, Cambridge, UK. I'll have to sort out the content first, though.

March 31st, 2008

Another paper accepted, ICDM’08

As recently as March 28th, another paper on my/our work on the agriculture data was accepted. The conference is the Industrial Conference on Data Mining 2008, which will take place not far away in Leipzig, Germany. Based on the results and the information from those conferences, I might try to submit more advanced work to BCS AI-2008, which will again take place at (frosty) Peterhouse College in Cambridge, UK. I won't be there for long, though, as offspring is already on his/her way.

March 10th, 2008

Two papers accepted

Two papers of mine have been accepted recently. Both describe the process of data mining with neural networks on agriculture data. Therefore I'm quite confident that this will be my PhD thesis' major point of interest. The two conferences, which also provided worthwhile and in-depth reviews, are IPMU 2008 in Málaga, Spain, and IFIP AI 2008 in Milan, Italy. Of course, the papers have to be revised and my presentations will have to be prepared, but it's encouraging nevertheless.

January 18th, 2008

More data (analysis) in agriculture

Quite a few publication deadlines are approaching in January, and I will submit another paper detailing some of the recent accomplishments on the agriculture data. One of the conferences is ICDM 2008, held in Leipzig, Germany. It targets industrial applications of data mining, and I feel the paper fits in quite nicely.

I have also received more data sets from Martin Schneider at the Martin-Luther-University of Halle-Wittenberg, which will have to be mined. There are quite a lot of interesting tasks to be performed on those data, which requires thorough planning. I probably won't be able to do that planning until I return from the organizational business trip to Melbourne, which starts a cooperation project between our research group and the one I worked with in 2004/2005.


January 8th, 2008

The "squared" in "mean squared error"

The latest experiments with the sports science data consistently produced outrageously high errors, which in most cases missed the scale of the original attribute by orders of magnitude. After conducting some experiments that led to negative conclusions like:

  • the error is too high, therefore the network cannot be trained on these data or
  • there is some tendency to overfitting when the network size and the learning rate increase, but the error is way too large anyway,

I presented the examples to the sports science people, who were also quite surprised by the error's order of magnitude. Back at my office, I suddenly realized what I had actually shown in the graphs: after taking the square root of the MSE, I ended up with what I had wanted to show in the plots all along: the absolute error, i.e. the absolute difference between the network's predicted result and the actual result. It's somewhat embarrassing, but at least I now know the cause of the high-error issue.
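The mix-up is easy to reproduce. A minimal sketch in Python (the original experiments used Matlab; the numbers below are made up, roughly on the scale of the target attribute):

```python
import math

# Hypothetical predictions vs. actual results, made up for illustration.
y_true = [860.0, 910.0, 955.0, 1000.0]
y_pred = [870.0, 905.0, 940.0, 990.0]

n = len(y_true)
mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n  # squared units!
rmse = math.sqrt(mse)  # square root brings it back to the original scale
mae = sum(abs(p - t) for p, t in zip(y_pred, y_true)) / n  # mean absolute error

print(mse, rmse, mae)  # 112.5  10.606...  10.0
```

Plotting the raw MSE puts squared units into a figure read as an error on the attribute's original scale, which is exactly why the values looked orders of magnitude too high.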

For the plot below I also got some advice from the sports science people to remove some redundant data (four of the 24 columns are more or less sums of three others each). Now the error is in a more reasonable region, given that a min/max of 841/1004 is expected for the target attribute. The plot shows the error vs. the sizes of the first and second hidden layers of the feed-forward network, respectively.
neuro12, absolute error instead of mse
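Dropping such sum columns can be sketched in a few lines of plain Python; the column index and the data below are hypothetical, since the actual layout of the 24 columns isn't given here:

```python
# Hypothetical 24-column records; here column 3 is constructed as the sum
# of columns 0-2, mimicking one of the four redundant sum columns.
records = [[float(i + j) for j in range(24)] for i in range(5)]
for row in records:
    row[3] = row[0] + row[1] + row[2]

SUM_COLS = {3}  # hypothetical indices of the redundant columns
reduced = [[v for j, v in enumerate(row) if j not in SUM_COLS]
           for row in records]
print(len(reduced[0]))  # 23 columns remain
```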

December 18th, 2007

Back from SGAI2007

I've returned from the meeting of the British Computer Society's Specialist Group on Artificial Intelligence at Peterhouse College, Cambridge, UK. The event was quite worthwhile; undoubtedly the best speaker was Max Bramer himself. On Monday he gave a talk to PhD students on "How to survive a viva". Since he often has been, and will be, an external supervisor or reviewer of PhD students' theses, he told us what he thinks is important in a thesis, how to write it, what to do and what not to do, and so on. His talk was riddled with hilarious real examples of what he has seen in thesis drafts or even final theses. That event, offered free of charge to PhD students, was actually included at the conference for the first time, before the main conference began on Tuesday.

December 7th, 2007

SGAI 2007

I'll be presenting a paper about my master's thesis' work at the SGAI 2007 International Conference on Artificial Intelligence. See the technical stream of the conference for my talk about Relevance Feedback for Association Rules by Leveraging Concepts from Information Retrieval. The slides are available here: presentation slides for AI-2007.

December 4th, 2007

Overfitting vs. Cross-Validation

One more experiment with the sports science data clearly shows the issue of overfitting in the neural network. I devised a script that automatically generates networks with two hidden layers and systematically varies each layer's size from 2 to 16. No matter how the parameters for the learning process were set, the mean squared error plotted against the two layer sizes (right: first hidden layer, left: second hidden layer) drops to zero as soon as the networks get larger:
Without cross validation, the error is zero with larger networks

However, when applying cross-validation (test set: 1 record, validation set: 1 record, training set: 38 records), the error rises, especially towards the larger layer sizes. This is a clear sign of overfitting (scales as in the previous figure):
With cross validation, one can see the effects of overfitting quite clearly
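The effect can be reproduced with a small sketch, here using scikit-learn's MLPRegressor as a stand-in for the Matlab feed-forward network and pure noise as a stand-in for the 40-record data set (all sizes and parameters below are illustrative, not the original experiment's):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Pure-noise stand-in for the 40-record sports science data set,
# so any good fit on the training set is pure memorization.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))
y = rng.normal(size=40)

# Split as in the entry: 38 training records, 1 held out for testing.
X_train, y_train = X[:38], y[:38]
X_test, y_test = X[38:39], y[38:39]

# A comparatively large network (16 neurons per hidden layer) can drive
# the training error towards zero, while the error on the held-out
# record typically stays large -- the overfitting seen in the figure.
net = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
net.fit(X_train, y_train)
train_mse = float(np.mean((net.predict(X_train) - y_train) ** 2))
test_mse = float(np.mean((net.predict(X_test) - y_test) ** 2))
print(train_mse, test_mse)
```

Wrapping the fit in two loops over `hidden_layer_sizes=(h1, h2)` for h1, h2 in 2..16 reproduces the grid behind both plots.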

As usual, the Matlab script for this entry, which doesn't differ much from the previous ones: neuro10-new_plot.m.