Data Mining et al - Georg Ruß' PhD Blog - R, clustering, regression, all on spatial data, hence it's:

HACC-spatial scripts

By russ

Since I’ve recently been asked quite often to publish the scripts for the HACC-spatial algorithm I developed in my PhD thesis, this post lists them and gives a few explanations.

For now, here’s the zip file with R scripts: hacc-spatial.zip.

From the thesis abstract:

The second task is concerned with management zone delineation. Based on a literature review of existing approaches, a lack of exploratory algorithms for this task is concluded, in both the precision agriculture and the computer science domains. Hence, a novel algorithm (HACC-spatial) is developed, fulfilling the requirements posed in the literature. It is based on hierarchical agglomerative clustering incorporating a spatial constraint. The spatial contiguity of the management zones is the key parameter in this approach. Furthermore, hierarchical clustering offers a simple and appealing way to explore the data sets under study, which is one of the main goals of data mining.

The thesis itself is available here: PhD thesis (32 MB PDF). The algorithm is described on PDF page 124 (print page 114); see also hacc-spatial-algorithm.pdf.

Further explanations and shorter descriptions can be found in two publications, both available in full text: Exploratory Hierarchical Clustering for Management Zone Delineation in Precision Agriculture and Machine Learning Methods for Spatial Clustering on Precision Agriculture Data.

Let me know if you have questions or comments, or even successful results from applying the algorithm to your own data sets.

There are also two YouTube videos of the clustering (with an additional pre-clustering step, the "initial phase"): F440-REIP32-movie.avi and F611-REIP32-movie.avi. The interesting part is probably toward the end of both videos. Compare them with the plots of the REIP32 variable for the F440 and F611 data sets in the thesis (F440: PDF page 185, clustering on page 138; F611: PDF page 195).

Important points

The algorithm was designed to work with spatial data sets: each data record (point) is a vector of values that also has a location in space (2D or 3D).
The data points should be roughly uniformly distributed in space (preferably with high density, although that doesn’t really matter). In other words, the algorithm does not and cannot rely on differences in the spatial density of the data.
The input to the R scripts is a SpatialPointsDataFrame with variables. The clustering function lets you select which variable(s) to cluster on, i.e. you may use multiple variables.
The algorithm is definitely not optimized for speed. It served my purposes well, but may take a while to run on your data.
The value of the contiguity factor cf has to be found by experimentation.
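The core idea from the abstract and the points above is agglomerative clustering in which only spatially contiguous clusters may be merged. As a rough illustration only (in Python rather than R, and not a port of the scripts in hacc-spatial.zip), here is a minimal sketch under a few loudly stated assumptions: two points count as spatial neighbours if they lie within a hypothetical `radius`, two clusters are adjacent if any of their points are neighbours, the merge cost is the Euclidean distance between the clusters' mean value vectors, and the contiguity factor cf and the initial pre-clustering phase are not modelled at all. The function name `spatially_constrained_hac` is hypothetical.

```python
import math
from itertools import combinations


def spatially_constrained_hac(coords, values, n_clusters, radius):
    """Agglomerative clustering where only spatially adjacent clusters merge.

    Hypothetical simplification for illustration, NOT the HACC-spatial
    algorithm from the thesis (no contiguity factor cf, no initial phase).

    coords:     list of (x, y) positions, one per point
    values:     list of feature vectors (tuples of floats), one per point
    n_clusters: stop when this many clusters remain (or no merge is possible)
    radius:     assumed neighbourhood radius that defines spatial adjacency
    Returns a list with the id of the final cluster of each point.
    """
    n = len(coords)
    # Point-level adjacency: points within `radius` of each other are neighbours.
    neighbours = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if math.dist(coords[i], coords[j]) <= radius:
            neighbours[i].add(j)
            neighbours[j].add(i)

    # Start with every point as its own cluster.
    clusters = {i: [i] for i in range(n)}
    label = list(range(n))  # label[p] = id of the cluster containing point p

    def mean(members):
        # Component-wise mean of the feature vectors of a cluster.
        dim = len(values[members[0]])
        return tuple(sum(values[m][d] for m in members) / len(members)
                     for d in range(dim))

    while len(clusters) > n_clusters:
        # Collect all pairs of distinct clusters that touch spatially.
        candidates = set()
        for p in range(n):
            for q in neighbours[p]:
                a, b = label[p], label[q]
                if a != b:
                    candidates.add((min(a, b), max(a, b)))
        if not candidates:
            break  # remaining clusters are spatially disconnected
        # Merge the adjacent pair with the most similar mean values
        # (ties broken deterministically by the cluster ids).
        a, b = min(candidates,
                   key=lambda ab: (math.dist(mean(clusters[ab[0]]),
                                             mean(clusters[ab[1]])), ab))
        for p in clusters[b]:
            label[p] = a
        clusters[a].extend(clusters.pop(b))

    return label
```

For example, six points on a line at x = 0..5 with single values 1.0, 1.1, 0.9, 5.0, 5.2, 4.9, `radius=1.0` and `n_clusters=2` end up labelled `[0, 0, 0, 3, 3, 3]` (the ids are those of the surviving clusters, not 0..k-1). The adjacency check is what enforces contiguity: two distant points with identical values are never merged, and the sketch uses no density information, in line with the uniform-distribution assumption above.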

Apart from that, there’s not much to comment on (yet). Let me know about questions or issues; I may be able to fix them or list further requirements here.

mail: researchblog@georgruss.ch

Posted in agriculture, data mining, English, R | Comments disabled

Star X18i rooted

By russ

My new Android smartphone, a Star X18i (ordered from that particular shop) running Android 2.3.4, has been rooted using the instructions in this thread: http://forum.xda-developers.com/showthread.php?p=18615502#post8615502 (method 2b, using the zergRush exploit, v3). Kudos for the script; I just followed the instructions step by step. The phone itself seems to be a Sony Xperia X10 clone.


Posted in English, personal | Comments disabled

Environmental Data Mining

By russ

It just occurred to me that I should probably develop my research profile further and find an umbrella term that best covers my research interests. A quick suggestion, including a definition, would be Environmental Data Mining: the task of finding interesting, novel and potentially useful knowledge (= data mining) in georeferenced (spatial) and temporal multi-layered data sets (= environmental data). I haven’t done any research on this umbrella term yet (search engines returned only a few hits), but if I stay in research, this is probably where I’d try to head. Computer science is, to me, an ancillary science that needs specific applications and provides solutions to specific tasks based on actual data sets collected in practice. And R is the best tool for this :-)

(this merits a new category at the top level)

Posted in data mining, English, Environmental Data Mining, R | Comments disabled

Dissertation submitted

By russ

Today I submitted my dissertation, with the official date of 23 November 2011. Now it’s the faculty council’s turn, then the reviewers’, and if everything goes smoothly, it’s my turn at the defense. Subject to approval by the faculty council, the defense will take place on 23 February 2012 at 3 p.m. in room 29-301. The dissertation title matches the header of this blog.

By the way, I paid 42 EUR for the binding. That can’t be a coincidence!

Posted in agriculture, data mining, Deutsch | Comments disabled

Three/Eight weeks to go

By russ

There’s now a preliminary deadline for handing in my thesis. I’m not sure whether I’ll make it, but I’ve set 2011-10-19 as my personal deadline, after which the thesis draft goes to my supervisor once more. There are still lots of improvements to make. The official deadline is 2011-11-23. I’ve changed the site’s header to reflect the actual topic of my PhD thesis.

Posted in conferences | Comments disabled

An Update on the MiniHomer script

By russ

Well, I updated the miniHomer script described in this post. The issue was that the old script output just one GPX file containing the complete miniHomer log, currently around 23 MB and spanning two continents. Although the GPX file is internally grouped into tracks, it’s quite cumbersome and slow to open with Viking (which is not Viking’s fault). Therefore, I added a few lines to the "dump" option that call the gpxsplitter Python script at the very end. It simply takes the existing GPX file and writes separate files, each containing just one track, which is exactly what I wanted.

Here’s the script: minihomertool, version 2011-09-24
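The splitting step described above is simple enough to sketch with Python's standard library. This is not the gpxsplitter script itself (its interface is not described here); it is a minimal illustration, assuming the input is a namespaced GPX document whose tracks are direct `<trk>` children of the root. The function name `split_gpx_tracks` is hypothetical, and non-track content such as metadata or waypoints is simply dropped.

```python
import copy
import xml.etree.ElementTree as ET


def split_gpx_tracks(gpx_text):
    """Return one GPX document (as a string) per <trk> element in gpx_text.

    Illustrative sketch, not the gpxsplitter script. Only the root element
    (with its attributes) and one track are kept in each output document.
    """
    root = ET.fromstring(gpx_text)
    # The root tag looks like '{namespace}gpx'; reuse that namespace for <trk>.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    if ns:
        # Serialise the GPX namespace as the default one (no 'ns0:' prefixes).
        # Note: this registration is global to the ElementTree module.
        ET.register_namespace("", ns[1:-1])
    outputs = []
    for trk in root.findall(f"{ns}trk"):
        new_root = ET.Element(root.tag, root.attrib)  # copy version, creator, ...
        new_root.append(copy.deepcopy(trk))
        outputs.append(ET.tostring(new_root, encoding="unicode"))
    return outputs
```

Each returned string can then be written to its own file (for instance one file per track name or index), which gives Viking several small files to open instead of one 23 MB log.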

Posted in conferences | Comments disabled
