Georg Ruß' PhD Blog — R, clustering, regression, all on spatial data, hence it's:

August 7th, 2012

HACC-spatial scripts

Since I’ve recently been asked quite often to publish the R scripts for the HACC-spatial algorithm I developed in my PhD thesis, this post lists them and gives a few explanations.

For now, here’s the zip file with R scripts:

From the thesis abstract:

The second task is concerned with management zone delineation. Based on a literature review of existing approaches, a lack of exploratory algorithms for this task is concluded, in both the precision agriculture and the computer science domains. Hence, a novel algorithm (HACC-spatial) is developed, fulfilling the requirements posed in the literature. It is based on hierarchical agglomerative clustering incorporating a spatial constraint. The spatial contiguity of the management zones is the key parameter in this approach. Furthermore, hierarchical clustering offers a simple and appealing way to explore the data sets under study, which is one of the main goals of data mining.
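To make "hierarchical agglomerative clustering incorporating a spatial constraint" a bit more concrete, here is a minimal Python sketch. It is not the HACC-spatial implementation from the R scripts: the fixed neighborhood `radius`, the merge cost based on centroid distance, and the way the contiguity factor `cf` damps that cost for clusters sharing many neighboring point pairs are my own illustrative assumptions. What it does show is the core constraint: two clusters may only be merged if they touch in space, so every cluster stays spatially contiguous.

```python
import math
from itertools import combinations


def spatial_hac(coords, values, radius, n_clusters, cf=0.0):
    """Spatially constrained agglomerative clustering (illustrative sketch only).

    coords: list of (x, y) positions; values: list of feature tuples.
    Points closer than or equal to `radius` are neighbors; only clusters that
    contain at least one neighboring point pair may be merged.
    Hypothetical cost: centroid distance in feature space divided by
    (1 + cf * number of neighboring point pairs between the two clusters).
    Returns (labels, merges): a cluster id per point and the merged (a, b) pairs.
    """
    n = len(coords)
    # Point-level neighborhood from a fixed spatial radius.
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if math.dist(coords[i], coords[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)

    clusters = {i: [i] for i in range(n)}  # cluster id -> member point indices
    label = list(range(n))                 # point index -> cluster id
    merges = []                            # merge history, i.e. the hierarchy

    def centroid(members):
        dim = len(values[members[0]])
        return [sum(values[m][d] for m in members) / len(members) for d in range(dim)]

    while len(clusters) > n_clusters:
        # Count neighboring point pairs between each pair of touching clusters.
        links = {}
        for i in range(n):
            for j in adj[i]:
                a, b = label[i], label[j]
                if a < b:
                    links[(a, b)] = links.get((a, b), 0) + 1
        if not links:
            break  # the remaining clusters do not touch each other

        def cost(pair):
            a, b = pair
            d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
            return d / (1.0 + cf * links[pair])  # hypothetical use of cf

        a, b = min(links, key=cost)
        clusters[a].extend(clusters.pop(b))  # merge b into a
        for m in clusters[a]:
            label[m] = a
        merges.append((a, b))
    return label, merges


labels, merges = spatial_hac(
    [(0, 0), (1, 0), (2, 0), (3, 0)], [(0.0,), (0.1,), (5.0,), (5.2,)],
    radius=1.0, n_clusters=2)
print(labels, merges)
```

Calling it with `n_clusters=1` on a spatially connected data set records the complete merge history in `merges`, which can then be cut at any level; that is the exploratory aspect of hierarchical clustering the abstract refers to. The naive recomputation of all costs in every step keeps the sketch short but makes it slow.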

The thesis itself can be found here: PhD thesis (32MB pdf); the algorithm is described on pdf page 124 (print page 114): hacc-spatial-algorithm.pdf.

Further explanations and shorter descriptions can be found in two publications, both available in full text: Exploratory Hierarchical Clustering for Management Zone Delineation in Precision Agriculture and Machine Learning Methods for Spatial Clustering on Precision Agriculture Data.

Let me know if there are questions, comments or even successful results when applying the algorithm to your data sets.

There are also two YouTube videos of the clustering (with an additional pre-clustering step, the “initial phase”): F440-REIP32-movie.avi and F611-REIP32-movie.avi. The interesting part is probably towards the end of both videos. Compare them with the plots for the REIP32 variable of the F440 and F611 data sets (F440: PhD pdf page 185 (clustering on page 138) and F611: PhD pdf page 195).

Important points

  • The algorithm was designed to work with spatial data sets: each data record/point in the data set represents a vector of values which also has a location in space (2D/3D).
  • The data points should be roughly uniformly distributed in space (probably with high density, although that doesn’t really matter). That is, the algorithm does not and cannot rely on density differences in the geospatial data distribution.
  • The input structure for the R scripts is a SpatialPointsDataFrame with variables. The algorithm (the function) allows selecting particular variable(s) for clustering, i.e. you may use multiple variables for clustering.
  • The algorithm is definitely not optimized for speed. It served my purposes well, but may take a while to run on your data.
  • The right value of the contiguity factor cf has to be found by experimentation.
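The points about spatial records, variable selection and roughly uniform spacing can be illustrated with a small Python sketch. Records here are plain dicts with `x`, `y` and further variables, standing in for the coordinates and data of a SpatialPointsDataFrame; the helper names `select_features` and `neighbor_counts` are hypothetical, and the z-standardization of the selected variables is my assumption (useful when several variables with different units are combined), not necessarily what the R scripts do. Counting neighbors within a fixed radius is a quick way to check the spacing: on a roughly uniform grid, interior points all get the same count, which a fixed-radius neighborhood relies on.

```python
import math
from statistics import mean, pstdev


def select_features(records, names):
    """Pick the named variables from each record and z-standardize each of them.

    records: list of dicts with 'x', 'y' and further variables (hypothetical layout).
    Returns (coords, features), both in the order of `records`.
    """
    coords = [(r["x"], r["y"]) for r in records]
    columns = []
    for name in names:
        col = [float(r[name]) for r in records]
        mu, sd = mean(col), pstdev(col)
        # A constant variable carries no information for clustering; map it to 0.
        columns.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    features = [tuple(col[i] for col in columns) for i in range(len(records))]
    return coords, features


def neighbor_counts(coords, radius):
    """Number of other points within `radius` of each point."""
    return [
        sum(1 for j, q in enumerate(coords) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(coords)
    ]


# Usage: a 3 x 3 grid with spacing 1 and two made-up variables.
records = [{"x": x, "y": y, "yield": x + y, "n": 2 * x} for x in range(3) for y in range(3)]
coords, features = select_features(records, ["yield", "n"])
print(neighbor_counts(coords, radius=1.0))  # [2, 3, 2, 3, 4, 3, 2, 3, 2]
```

On this grid the center point has 4 neighbors, edge points 3 and corners 2; large deviations from such a pattern in real data would hint at uneven spacing.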

Apart from that, there’s not much to comment on (yet). Let me know about questions or issues and I may be able to fix them or list further requirements here.


April 29th, 2012

HTTP-Refresh :-)

It continues at .

March 13th, 2012

Photos of the defense

Photos of the defense are here: Album Dissertationsverteidigung

The dissertation has now been published by the library and can also be found online here:

February 24th, 2012

Defense successful

Yesterday, on 23.02.2012, I successfully defended my dissertation titled “Spatial Data Mining in Precision Agriculture”. Here are the slides from the talk.

January 12th, 2012

Slides for Application Lecture at UFZ

Tomorrow’s going to be my first (true) application lecture, at UFZ Leipzig. The slides are here: UFZ, Application Lecture 13.01.2012

December 20th, 2011

Star X18i rooted

My new Android smartphone, a Star X18i (ordered from that particular shop) running Android 2.3.4, has been rooted using the instructions in this thread: (method 2b, using the zergRush exploit, v3). Nice work on the script; I just followed the instructions step by step. The phone itself seems to be a Sony Xperia X10 clone.


November 29th, 2011

Environmental Data Mining

It just occurred to me that I should probably further develop my research profile and find an appropriate umbrella term that best covers my research interests. A quick suggestion, including a definition, would be Environmental Data Mining: the task of finding interesting, novel and potentially useful knowledge (=data mining) in georeferenced (spatial) and temporal multi-layered data sets (=environmental data). I haven’t done any research on this umbrella term yet (search engines returned only a few hits), but if I stay in research, this is probably where I’d try to head. Computer science is (to me) an ancillary science that needs specific applications and builds/provides solutions to specific tasks based on actual data sets collected in practice. And R is the best tool for this :-)

(this merits a new category at the top level)

November 21st, 2011

Dissertation submitted

"Spatial Data Mining in Precision Agriculture"


Today I handed in my dissertation, with the official submission date being 23.11.2011. Now it’s the faculty council’s turn, then the reviewers’, and if everything goes smoothly, it’s my turn at the defense. Subject to approval by the faculty council, the defense will take place on 23.02.2012 at 3 p.m. in room 29-301. The title of the dissertation matches the heading of this blog.

By the way, the binding cost me 42 EUR. That can’t be a coincidence!

September 28th, 2011

Three/Eight weeks to go

There’s a preliminary deadline for handing in my thesis: I’ve set 2011-10-19 as my personal deadline, after which the thesis draft goes to my supervisor again. I’m not sure whether I’m going to make it; there are lots of improvements yet to do. The official deadline is 2011-11-23. The site’s header has been changed to reflect the actual topic of my PhD thesis.
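For the record, the two numbers in the title are simply the distances from this post’s date to the two deadlines, which a few lines of Python confirm:

```python
from datetime import date

posted = date(2011, 9, 28)              # date of this post
personal_deadline = date(2011, 10, 19)
official_deadline = date(2011, 11, 23)

weeks_to_personal = (personal_deadline - posted).days / 7
weeks_to_official = (official_deadline - posted).days / 7
print(f"{weeks_to_personal:g}/{weeks_to_official:g} weeks to go")  # prints "3/8 weeks to go"
```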

September 24th, 2011

An Update on the MiniHomer script

Well, I updated the minihomer script described in this post. The issue was that the old script output just one GPX file containing the miniHomer’s complete log, currently around 23MB and covering two continents. Although the GPX file is internally grouped into tracks, it’s cumbersome and slow to open in Viking (which is not Viking’s fault). Therefore, I added a few lines at the very end of the “dump” option that call the gpxsplitter Python script. It simply takes the existing GPX file and outputs one file per track, which is exactly what I wanted.
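Splitting a GPX file into one file per track is simple enough to sketch with Python’s standard library alone. This is not the gpxsplitter script used in minihomertool, just an illustration of the same idea: it assumes a GPX 1.0 or 1.1 file with the tracks as `<trk>` elements directly under the root, and the function names `split_gpx`/`write_tracks` and the `track-001.gpx` naming scheme are hypothetical. Waypoints and metadata are not copied into the per-track files.

```python
import xml.etree.ElementTree as ET


def split_gpx(gpx_text):
    """Return one GPX document (as a string) per <trk> element in gpx_text."""
    root = ET.fromstring(gpx_text)
    # Take the GPX namespace (1.0 or 1.1) from the root tag, e.g. "{http://...}gpx".
    ns = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
    if ns:
        ET.register_namespace("", ns)  # serialize as default namespace, no "ns0:" prefixes
    trk_tag = f"{{{ns}}}trk" if ns else "trk"
    docs = []
    for trk in root.findall(trk_tag):
        out = ET.Element(root.tag, root.attrib)  # keeps the version/creator attributes
        out.append(trk)
        docs.append(ET.tostring(out, encoding="unicode"))
    return docs


def write_tracks(path, prefix="track"):
    """Split the GPX file at `path` into prefix-001.gpx, prefix-002.gpx, ..."""
    with open(path, encoding="utf-8") as f:
        docs = split_gpx(f.read())
    names = []
    for n, doc in enumerate(docs, start=1):
        name = f"{prefix}-{n:03d}.gpx"
        with open(name, "w", encoding="utf-8") as out:
            out.write('<?xml version="1.0" encoding="UTF-8"?>\n' + doc)
        names.append(name)
    return names
```

Reading the whole file into memory is fine at around 23MB; for much larger logs, a streaming parser such as `ET.iterparse` would be the more economical choice.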

Here’s the script: minihomertool, version 2011-09-24