Georg Ruß' PhD Blog — R, clustering, regression, all on spatial data, hence it's:

August 26th, 2011

ICDM and DMA workshop in NYC next week

Just before I head off into the weekend, the latest update on where I’ll be next week:

Industrial Conference on Data Mining, taking place from Tuesday, August 30th, until Saturday, September 3rd, in New York City (actually, it’s near Newark Airport [EWR] in New Jersey, but close enough). I’ll be presenting a continuation of my work on HACC-spatial (hierarchical agglomerative spatially constrained clustering), which I showed at my workshop and at the ICDM 2010 conference last year.

Therefore, my talks’ content will be along similar lines, with similar but updated slides:

The second talk, for my workshop, will also contain a few slides about the joint paper with Antonio Mucherino, who won’t be able to attend in person but who contributed a nice survey to my 2nd Workshop on Data Mining in Agriculture.

August 17th, 2011

Navin MiniHomer, gpsbabel

I recently bought a Navin miniHomer for geocaching, geo-logging and wayfinding. It’s a really nice device, and I got it to work under Linux with gpsbabel by following these steps:

  • Zeroth, check whether a /dev/ttyUSB* node is created when you plug in the device. If not, compile the respective kernel module; it’s under USB support → USB serial converter → Prolific …, and the module is called pl2303.
  • First, get the gpsbabel sources (currently 1.4.2) from gpsbabel.org or grab the CVS version
  • For the source (non-cvs) version, apply the patch written by Josef Reisinger and linked in this thread: http://sourceforge.net/mailarchive/attachment.php?list_name=gpsbabel-code&message_id=4CD96A99.9000603%40aol.com&counter=1
  • Compile and install.
  • Have a look at the sources (the patch) to see what functionality is available, i.e. the files prefixed with miniHomer in the xmldoc directory.
  • Feel free to use the bash script below to access the functions of the Navin miniHomer.
  • Drop me an email with comments if necessary; my email address is in the bash script.

The script below supports

  • minihomertool erase
  • minihomertool set [1-5] latitude longitude
  • minihomertool read
  • minihomertool init
  • minihomertool dump


The first command erases the log. The second sets one of the five waypoints, numbered in the order they appear when cycling through the miniHomer’s menu (House to Bar), with latitude and longitude in decimal degrees separated by spaces. The third reads the device’s log and splits it by day, and the fourth initializes the device to a certain speed (I haven’t had to use it so far). The bash script requires setting the path to the (patched) gpsbabel binary and the USB device. It works for me, except that gpsbabel produces strange GPX files in which the timestamps of the points are set to sometime in the year 2031. I don’t care at the moment; it seems to be just a fixed shift. The last command (dump) grabs the log dump from the logger, writes it to a file and processes it further, even correcting the strange date with an offset of -172032 hours. gpsbabel segfaults at first, but still writes the log (though it misses the waypoints in the dump, which I don’t need anyway).
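The -172032-hour offset is less arbitrary than it looks: 172032 hours are exactly 1024 weeks (7168 days), the length of the legacy GPS week-number cycle (a 10-bit week counter), so the 2031 dates are probably a week-number rollover problem somewhere between the logger and the patched gpsbabel code. Here is a minimal Python sketch, not part of my bash script, that applies the same shift to the <time> elements of a GPX file. The function name shift_gpx_times is hypothetical, and it assumes UTC timestamps of the form 2031-04-02T10:15:00Z (fractional seconds are dropped on output).

```python
import re
from datetime import datetime, timedelta

# One GPS week-number cycle: 1024 weeks = 7168 days = 172032 hours.
ROLLOVER = timedelta(weeks=1024)

# Matches the text content of every <time>...</time> element (GPX 1.0 and 1.1).
TIME_RE = re.compile(r"<time>([^<]+)</time>")


def shift_gpx_times(gpx_text, delta=-ROLLOVER):
    """Return gpx_text with every <time> value shifted by delta (hypothetical helper)."""

    def fix(match):
        raw = match.group(1).strip()
        # fromisoformat() before Python 3.11 does not accept a trailing "Z".
        ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        shifted = ts + delta
        return "<time>" + shifted.strftime("%Y-%m-%dT%H:%M:%SZ") + "</time>"

    return TIME_RE.sub(fix, gpx_text)
```

Usage: read the dumped .gpx file into a string, pass it through shift_gpx_times() and write the result back. Working on the text with a regular expression leaves the rest of the file (namespaces, attributes, formatting) untouched, which a round trip through an XML parser would not guarantee.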

Here’s the script: minihomertool bash script. It’s certainly not perfect, doesn’t handle errors and could be more elegant, but anyone who wants to can customize it.

There’s more information on the German znex site: http://www.znex.de/downloads.html.

July 14th, 2011

Book “Computational Intelligence”

After a long preparation period, the book “Computational Intelligence”, of which I am a co-author, will be published shortly. In particular, the accompanying web pages at computational-intelligence.eu were written by me.

June 15th, 2011

First thesis draft submitted

Last week I handed in the first 228-page draft of what’s probably going to be
my thesis. Let’s see what the reviewers say; I hope there aren’t too many
fundamental issues with that draft.

April 27th, 2011

Thesis status

My thesis is progressing as expected and planned. The second main chapter is finished and off to the first reviewer, while the first main chapter is currently being written. The experiments are running on the lab machines (which, thanks to new hardware, run R much faster than half a year ago), and the plots will be generated soon. Time for applications. Deadlines seem to work :-)

My two latest papers have been accepted at SCAI and ICDM. There’s also an upcoming journal article, likely for GeoInformatica, and our working group’s forthcoming book on Computational Intelligence.

Those were the days …

March 31st, 2011

Yet another talk at MLU

On Tuesday I gave another talk at the MLU to a partly different audience and
received a lot of additional input for my work and my PhD thesis. There’s a lot
of geospatial analysis to be done, as Joachim Spilke pointed out.

The two main tasks clarified on Tuesday for the first half of my thesis revolve
around the continuation of Georg Weigert’s work on the yield (potential)
prediction. The first is whether it’s actually necessary to consider the
spatial information in the regression, i.e. whether the spatial
cross-validation I’ve developed is necessary and useful in practice. The second
is which regression model is to be chosen in a practical setup. Currently, it’s
a neural network, but if a different technique turns out to produce better
(whatever “better” means) predictions, that might be tried.
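Some background on why spatial cross-validation matters: with random cross-validation, neighbouring (and therefore similar) points end up in the training and test sets at the same time, which can make the error estimate look better than it would be on a new, unseen part of a field. A common remedy is block cross-validation, where whole spatial blocks are held out together. The Python sketch below illustrates that generic scheme; it is not necessarily the exact procedure from my thesis. The square blocks, the hypothetical block_size parameter, the round-robin assignment of shuffled blocks to k folds and the user-supplied fit/predict callables are all assumptions made for illustration.

```python
import math
import random


def spatial_folds(coords, block_size, k, seed=0):
    """Assign points to k folds so that all points in one square block share a fold."""
    def block(x, y):
        return (math.floor(x / block_size), math.floor(y / block_size))

    blocks = sorted({block(x, y) for x, y in coords})
    random.Random(seed).shuffle(blocks)  # deterministic for a given seed
    block_fold = {b: i % k for i, b in enumerate(blocks)}
    return [block_fold[block(x, y)] for x, y in coords]


def random_folds(n, k, seed=0):
    """Ordinary (non-spatial) k-fold assignment, for comparison."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for pos, i in enumerate(order):
        folds[i] = pos % k
    return folds


def cv_rmse(X, y, folds, fit, predict):
    """Root mean squared error over all held-out points of all folds."""
    sq_err, count = 0.0, 0
    for f in range(max(folds) + 1):
        train = [i for i in range(len(y)) if folds[i] != f]
        test = [i for i in range(len(y)) if folds[i] == f]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            sq_err += (predict(model, X[i]) - y[i]) ** 2
            count += 1
    return math.sqrt(sq_err / count)
```

Running cv_rmse() once with random_folds() and once with spatial_folds() on the same data and model can show how much a random split flatters the model; if both estimates are close, the spatial split may not be needed in practice, which is exactly the first question above.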

February 28th, 2011

Slides for talk at MLU Halle

Tomorrow’s going to be my second talk in German at MLU Halle. Here are the slides: russ2011mlu-slides.

And there’s also a video of the clustering here: http://www.youtube.com/watch?v=Xk7eT4-F2Fg In short, the video compares a spatial clustering of the precision agriculture data I have, using four variables (P, pH, Mg, K), with low spatial contiguity (left) and high spatial contiguity (right). The clustering is hierarchical agglomerative, starting from an initial tessellation of the field into 250 clusters, which are subsequently merged. It has been implemented in R (generating a .png file for each plot), with subsequent video encoding using ImageMagick (convert) and MPlayer (mencoder). Nice demo, I guess.
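To make the idea of a contiguity setting a bit more concrete, here is a toy Python sketch of agglomerative clustering in which the merge criterion mixes the feature distance and the spatial distance between cluster centroids through a weight alpha. This is a simplified stand-in for illustration, not the HACC-spatial algorithm itself: the linear mixing, the centroid linkage and the name spatial_hac are assumptions. The optional init argument mimics starting from an initial tessellation (such as the 250 clusters in the video) instead of single points.

```python
import math


def centroid(rows):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(rows)
    return tuple(sum(r[d] for r in rows) / n for d in range(len(rows[0])))


def spatial_hac(coords, features, n_clusters, alpha, init=None):
    """Greedily merge the closest pair of clusters until n_clusters remain.

    alpha = 0 uses only the feature distance, alpha = 1 only the spatial
    distance between centroids (hypothetical weighting for illustration).
    init is an optional starting partition (list of index lists).
    """
    clusters = [list(c) for c in init] if init else [[i] for i in range(len(coords))]
    while len(clusters) > n_clusters:
        cents = [(centroid([coords[i] for i in c]), centroid([features[i] for i in c]))
                 for c in clusters]
        best, pair = math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = ((1 - alpha) * math.dist(cents[a][1], cents[b][1])
                     + alpha * math.dist(cents[a][0], cents[b][0]))
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return sorted(sorted(c) for c in clusters)
```

With four points, two on the left and two on the right of a field, each side holding one low and one high value, alpha=0 groups the points by value, while alpha=1 groups them by location. On real data the variables and coordinates would need to be standardized first so that the two distances are comparable.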

Februar 7th, 2011

Third thesis reviewer is set

The third reviewer for my thesis is confirmed: Prof. Peter Wagner from Martin-Luther-Universität Halle-Wittenberg, from whom I indirectly received my data via Martin Schneider, who is now at Agri Con GmbH, Jahna.

January 17th, 2011

Two really useful R books

Looking back on the work I’ve done so far (finding a thesis topic, finding data, finding tools), I can definitely recommend the two books below. They’re R-related and contain a lot of examples that still help me implement my ideas. The first is Modern Applied Statistics with S (Venables/Ripley), and the other is Applied Spatial Data Analysis with R (Bivand/Pebesma/Gómez-Rubio) from the “Use R!” series. Both are perfect for looking up things you might need in your current implementation. Besides, there are still the R mailing lists for your questions, and the authors of the above books are typically active on those lists.

If you prefer a bookstore, look out for these on the shelves:

January 13th, 2011

Plan for 2011

This year is going to be the deadline for my PhD thesis. It seems like I’m nearly there. A few things are still left to do, and a few more are coming up.

At the moment there’s a journal article for GeoInformatica about the spatial variable importance stuff I’ve developed based on Alex’s suggestions. The results (at the moment) fit nicely into my thesis. The outcome is still open, but it looks as if an additional variable, Elevation, introduced into the regression models for yield prediction has a major influence on the quality of the prediction itself. I’m going to run a few more data sets through the models to see whether I keep getting similar results, and I’ll try to falsify my hypothesis.
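One standard way to quantify how much a single variable such as Elevation contributes to a fitted model is permutation importance: shuffle that variable’s column and measure how much the prediction error grows. The Python sketch below shows the plain, non-spatial version for any predict callable. It is not the spatial variable importance method from the GeoInformatica article, and the function and parameter names are hypothetical.

```python
import random


def permutation_importance(predict, X, y, col, n_repeats=10, seed=0):
    """Mean increase in squared error when column `col` of X is shuffled.

    X is a list of rows (lists), y the list of targets, and predict a
    callable mapping one row to a prediction.
    """
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    base = mse(X)
    rng = random.Random(seed)
    increases = []
    for _ in range(n_repeats):
        values = [row[col] for row in X]
        rng.shuffle(values)
        # Replace only column `col`, keeping all other variables as they are.
        permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, values)]
        increases.append(mse(permuted) - base)
    return sum(increases) / n_repeats
```

A variable the model ignores gets an importance of zero, while a large value means the predictions depend heavily on it. Note that ordinary permutation ignores spatial autocorrelation, which is one reason a spatially aware variant can give different answers.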

Then there’s the SCAI 2011 conference in Trondheim, where I’ll submit an article about the spatial clustering I’ve developed. And I’ll try to meet a few other people while I’m there to see whether there are any postdoc opportunities at NTNU.

My own workshop, Data Mining in Agriculture 2011, is going to take place in conjunction with ICDM 2011, which is going to be held in NYC, US.

And then there’s the book Computational Intelligence, of which I’m one of the authors, due by the end of March.