My PhD thesis is titled Spatial Data Mining in Precision Agriculture and focuses on the following two topics:
- Spatial Variable Importance for Precision Agriculture Data
- Management Zone Delineation as a Spatial Clustering Problem
Here’s the thesis: Dissertation: Spatial Data Mining in Precision Agriculture (32 MB .pdf due to images)
Here’s the English abstract (copied and pasted from the official PDF document):
Technological advances are nowadays often based on improvements in information and data
processing capabilities. Even modern agriculture is to a large extent based on adequate data
processing, since the usage of novel information devices, GPS-based georeferenced data
collection and high-resolution spatial data sets have become standard modes of operation,
turning the once uniform site management into site-specific management as one of the
most important sub-fields in precision agriculture. On the one hand, the resulting data
sets clearly provide the foundations for economic and ecologic improvements. On the other
hand, these data sets pose novel challenges for spatial data mining. Two specific tasks are
explored in this study: spatial variable importance and management zone delineation.

The foundations of this thesis are data originating in site-specific management operations.
They typically include electrical conductivity readings, fertilizer applications, soil
sampling results, vegetation indicators and yield measurements. These variables are georeferenced,
i.e. for a particular point of the site under study the variables and their values are
known at a certain spatial resolution. These spatial data sets are furthermore augmented
with digital elevation models from which terrain attributes such as slope, wetness index and
curvatures are derived.
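
To make the terrain-attribute step concrete, here is a minimal sketch of deriving slope from a gridded elevation model with central differences. It is an illustration only and is not taken from the thesis: the function name `slope_degrees`, the list-of-rows DEM layout and the `cell_size` parameter are assumptions made for this example.

```python
import math

def slope_degrees(dem, cell_size):
    """Slope in degrees for the interior cells of a DEM given as a list of rows.

    Border cells are left as None because central differences need both neighbours.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation change per unit distance in the two grid directions.
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out

# Hypothetical 3x3 DEM: elevation rises by 1 unit per cell from left to right.
dem = [[0.0, 1.0, 2.0],
       [0.0, 1.0, 2.0],
       [0.0, 1.0, 2.0]]
slopes = slope_degrees(dem, cell_size=1.0)
```

Wetness index and curvature need more machinery (flow accumulation, second derivatives) and are left out of this sketch.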

The first of the tasks is concerned with yield prediction and based on an existing dissertation
in this area. Yield prediction is handled as a multivariate regression task using
spatial data sets. However, taking the spatial relationships of the data sets into account
requires some changes in the standard cross-validation to make it aware of spatial relationships
in the data sets. Based on this addition, the question can be answered which of a
variety of regression models are best suited for yield prediction. Eventually the regression
models help to estimate which of the variables are important for yield prediction using
permutation-based variable importance measures.
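
The two ideas in this paragraph can be sketched in a few lines. The abstract does not say how the cross-validation is made spatially aware, so the block-based fold assignment below is one common option and an assumption on my part, not the thesis's procedure. All names (`spatial_block_folds`, `permutation_importance`, `block_size`) are made up for this example, and the "model" is just a fixed prediction function.

```python
import random

def spatial_block_folds(coords, block_size, n_folds):
    """Assign each point to a fold by the square block it lies in,
    so that neighbouring points end up in the same fold."""
    def block(x, y):
        return (int(x // block_size), int(y // block_size))
    blocks = sorted({block(x, y) for x, y in coords})
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    return [fold_of_block[block(x, y)] for x, y in coords]

def mse(predict, X, y):
    """Mean squared error of a prediction function on rows X and targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when one column of X is shuffled."""
    rng = random.Random(seed)
    base = mse(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(predict, X_perm, y) - base)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical data: the target depends only on the first variable.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance(lambda row: 2.0 * row[0], X, y)

# Four points, two per 10 m block, split into two folds.
folds = spatial_block_folds([(0.5, 0.5), (1.5, 0.5), (10.5, 0.5), (11.0, 0.9)],
                            block_size=10.0, n_folds=2)
```

In a real setup the importance would be computed on the held-out fold of each spatial split rather than on the training data.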

The second task is concerned with management zone delineation. Based on a literature
review of existing approaches, a lack of exploratory algorithms for this task is concluded, in
both the precision agriculture and the computer science domains. Hence, a novel algorithm
(HACC-spatial) is developed, fulfilling the requirements posed in the literature. It is based
on hierarchical agglomerative clustering incorporating a spatial constraint. The spatial
contiguity of the management zones is the key parameter in this approach. Furthermore,
hierarchical clustering offers a simple and appealing way to explore the data sets under
study, which is one of the main goals of data mining.
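
To illustrate the general idea of agglomerative clustering under a spatial constraint, here is a minimal sketch in which only spatially adjacent clusters may be merged. This is not HACC-spatial itself (the abstract does not give its details); the function name, the single-variable distance between cluster means and the toy data are all assumptions for this example.

```python
def constrained_agglomerative(values, neighbours, n_clusters):
    """Merge adjacent clusters with the closest mean values until n_clusters remain.

    values:     dict cell id -> measured value (one variable, for simplicity)
    neighbours: dict cell id -> list of spatially adjacent cell ids
    Returns a dict cell id -> cluster label.
    """
    cluster_of = {cell: cell for cell in values}
    members = {cell: [cell] for cell in values}

    def mean(label):
        cells = members[label]
        return sum(values[c] for c in cells) / len(cells)

    while len(members) > n_clusters:
        best = None
        # Only pairs of clusters that share a spatial border are candidates.
        for cell in sorted(neighbours):
            a = cluster_of[cell]
            for other in sorted(neighbours[cell]):
                b = cluster_of[other]
                if a != b:
                    d = abs(mean(a) - mean(b))
                    if best is None or d < best[0]:
                        best = (d, a, b)
        if best is None:
            break  # no adjacent clusters left to merge
        _, a, b = best
        for c in members[b]:
            cluster_of[c] = a
        members[a].extend(members.pop(b))
    return cluster_of

# Hypothetical strip of six cells; cell 5 resembles cells 0-2 but is not adjacent to them.
values = {0: 1.0, 1: 1.1, 2: 0.9, 3: 5.0, 4: 5.2, 5: 1.0}
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
zones = constrained_agglomerative(values, neighbours, n_clusters=3)
```

Cell 5 ends up in its own zone even though its value matches the first group, which is exactly what the contiguity constraint is for. Recording the order of merges would give the hierarchy that makes this kind of clustering useful for exploration.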