Georg Ruß' PhD Blog — R, clustering, regression, all on spatial data.

December 10th, 2008

Report of participation at SGAI 2008

From 10th to 11th of December, 2008, I participated in the Twenty-eighth SGAI International Conference on Artificial Intelligence. My talk was about using self-organizing maps for data mining, with the agriculture data as a basis. I didn't get much substantial feedback right after the talk, but I was able to speak with some of the academics, who all agreed that the topic is relevant and worth working on. Regarding my PhD, the main concern I'll have to take into account is that it is supposed to be a PhD in computer science, not in agriculture, so I can't get away with merely applying some data mining techniques to some agriculture data. My ideas are still intact, though, and my thesis structure has also started to emerge.

For reference, my slides from last week's conference: slides-2008sgai.

October 17th, 2008

Four models to be compared [update]

Same procedure as in the recent posts … I now have four models to compare. I've put them into this script (which runs for about an hour on my machine), and the results can be seen below. The simple regression tree is the worst model, but it takes almost no time to compute. The RBF takes the longest, while the SVM is still both more accurate and quicker. I'll probably run this script on different data to see how it performs there.

Figures: Mean Absolute Error and Root Mean Squared Error, four models compared.
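For the curious, here is a rough sketch of what such a comparison loop can look like in Matlab. This is not the original script: X and y stand for the feature matrix (samples in rows) and the yield vector, the model settings are placeholders, and the Statistics and Neural Network Toolboxes are assumed.

    k  = 10;
    cv = cvpartition(length(y), 'kfold', k);
    maeTree = zeros(k,1);  rmseTree = zeros(k,1);

    for i = 1:k
        tr = training(cv, i);  te = test(cv, i);

        % regression tree
        t    = classregtree(X(tr,:), y(tr));
        yhat = eval(t, X(te,:));
        maeTree(i)  = mean(abs(yhat - y(te)));
        rmseTree(i) = sqrt(mean((yhat - y(te)).^2));

        % RBF network (newrb expects samples in columns); the MLP (newff/train)
        % and the external SVMTorch call would be evaluated on exactly the same folds
        net  = newrb(X(tr,:)', y(tr)', 0, 1, 70, 25);
        yrbf = sim(net, X(te,:)')';
        % ... collect MAE/RMSE for the other models in the same way
    end

    plot(1:k, maeTree, '-o');  xlabel('fold');  ylabel('MAE, regression tree');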


October 16th, 2008

Regression tree results

Another very simple way to construct a model from data (with one continuous target variable) is to use a regression tree. Matlab provides the classregtree function for constructing such trees. I ran it on my data with this script. The performance graphs are shown below, as well as an exemplary regression tree. There have been no optimisations so far, but the tree has been pruned to reduce its complexity for viewing.

Figures: Regression tree, mean absolute error and root mean squared error.

Exemplary regression tree
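As a side note, a minimal sketch of the classregtree/prune calls for such a tree (not my actual script; X and y again stand for the features and the yield, and the pruning level is arbitrary):

    % Fit a regression tree and prune it for readability (Statistics Toolbox).
    t  = classregtree(X, y);         % y is continuous, so a regression tree is built
    tp = prune(t, 'level', 4);       % cut a few levels purely for easier viewing
    view(tp);                        % display the pruned tree
    yhat = eval(tp, X);              % fitted values on the training data
    fprintf('training MAE: %.3f\n', mean(abs(yhat - y)));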

October 15th, 2008

Update: MLP vs. SVM vs. RBF

In the previous article on the MLP vs. SVM vs. RBF comparison, the RBF performed worse than the other two. Well, even after some optimisation of the RBF parameters (hidden layer size), it is still consistently worse than the SVM and the MLP, although the margin is smaller.

Figures: Mean Absolute Error and Root Mean Squared Error, MLP vs. SVM vs. RBF.

October 15th, 2008

RBF parameters

Since the size of the hidden layer of the RBF network seems to be the most important parameter, I've run a short simulation that outputs a graph of the network's performance (MAE, RMSE) plotted against the hidden layer's size. As expected, the curve flattens out for larger numbers of neurons. A good tradeoff seems to be fixing the size at 70 neurons (for the given data set, of course).
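A minimal sketch of that simulation (assuming a single split into Xtr/ytr and Xte/yte; the goal and spread for newrb are left at placeholder values, and unlike my script this plots both error measures into one figure):

    sizes   = 10:10:150;                 % candidate hidden layer sizes
    maeRbf  = zeros(size(sizes));
    rmseRbf = zeros(size(sizes));

    for i = 1:length(sizes)
        % newrb grows the network neuron by neuron up to the given maximum
        net  = newrb(Xtr', ytr', 0, 1, sizes(i), 25);
        yhat = sim(net, Xte')';
        maeRbf(i)  = mean(abs(yhat - yte));
        rmseRbf(i) = sqrt(mean((yhat - yte).^2));
    end

    plot(sizes, maeRbf, '-o', sizes, rmseRbf, '-s');
    xlabel('hidden layer size');  legend('MAE', 'RMSE');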

Figures: RBF parameters, MAE and RMSE.

(I could have plotted them into one figure, but I was too lazy to change the script.)

I'd like to mention that the cross-validation partitioning step was done just once and the network's parameter was varied only for this one data split. This might be a problem, but, as we saw in the previous post, the three models I've trained all perform similarly, with similar ups and downs in performance over different data partitions. It should therefore be justified to run the RBF parameter experiment on just one split.

October 15th, 2008

MLP vs. SVM vs. RBF

Yet another neural network, the radial basis function (RBF) network, was used as a function approximator to compare against the MLP and SVM models. The parameter settings for the RBF have not been optimised so far; I simply ran it against the MLP/SVM on the same cross-validation data. The results can be seen in the following two graphics:

Figures: Mean Absolute Error and Root Mean Squared Error, MLP vs. SVM vs. RBF.

The script for the above graphics is online.

At the moment I'm running some simulations to determine the size of the hidden layer of the RBF network, as this seems to be the most important parameter. The Matlab implementation of the RBF network also takes some time, since it incrementally adds neurons up to a (user-specified) maximum number.
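For illustration, a minimal sketch of fitting the RBF and the MLP on one and the same split (Xtr/ytr and Xte/yte are assumed to exist with samples in rows; the layer sizes are placeholders, and the newff call uses the R2008-era Neural Network Toolbox syntax):

    % RBF: newrb adds neurons incrementally up to the maximum (here 70)
    rbf  = newrb(Xtr', ytr', 0, 1, 70, 25);
    yrbf = sim(rbf, Xte')';

    % MLP: one hidden layer with 10 neurons
    mlp  = newff(Xtr', ytr', 10);
    mlp  = train(mlp, Xtr', ytr');
    ymlp = sim(mlp, Xte')';

    fprintf('RBF MAE: %.3f, MLP MAE: %.3f\n', ...
            mean(abs(yrbf - yte)), mean(abs(ymlp - yte)));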

October 10th, 2008

SVM vs. MLP (reversed result, using normalization)

In the previous article I arrived at the result that the SVM performs slightly worse than the MLP neural network, each with a more or less optimal configuration. Well, that was the preliminary result; after I added normalization to the script, the outcome is the other way around. The graphs now show the SVM consistently beating the MLP. I'll have to check this result on other data sets, though.
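A minimal sketch of what the added normalization step can look like (a guess at the general idea rather than my exact script; Xtr and Xte hold training and test samples in rows, and mapstd comes from the Neural Network Toolbox):

    % Compute normalization statistics on the training part only,
    % then apply the very same statistics to the test part.
    [XtrN, ps] = mapstd(Xtr');             % zero mean, unit variance per feature
    XteN = mapstd('apply', Xte', ps);      % reuse the training statistics
    XtrN = XtrN';  XteN = XteN';           % back to samples in rows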

October 8th, 2008

Preliminary model comparison: MLP vs. SVM

After figuring out some of the SVM parameters, I did a comparison of an MLP (feedforward neural network) against the SVM (support vector regression) as a predictor. The data were split into training and test sets at a ratio of 9:1, both the SVM and the MLP were trained on those data, and this was repeated a few (20) times. It turns out that the neural network seems to perform better and oscillates less over the trial runs. The figures tell the tale more precisely.
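A minimal sketch of the repeated 9/1 evaluation (not the original script; trainAndPredict is a hypothetical stand-in for fitting either the MLP or the SVM and predicting on the held-out part):

    nrep   = 20;
    maeRun = zeros(nrep, 1);
    for r = 1:nrep
        cv = cvpartition(length(y), 'holdout', 0.1);     % fresh 9/1 split each run
        tr = training(cv);  te = test(cv);
        yhat = trainAndPredict(X(tr,:), y(tr), X(te,:)); % hypothetical helper
        maeRun(r) = mean(abs(yhat - y(te)));
    end
    plot(1:nrep, maeRun, '-o');  xlabel('run');  ylabel('MAE');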

September 30th, 2008

Figuring out SVM parameters

The last few days saw me experimenting with one particular data set and different parameters of the SVM regression model for those data. For the data set at hand, I settled on epsilon, the width of the error tube, being 0.3 and the standard deviation of the RBF kernel being 12. Other kernels won't work on those data, and I'll still have to relate these results to the number of support vectors, which is a further quantity that characterises the models.
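Just to illustrate what the two numbers mean (this is not the SVMTorch call itself): residuals inside the epsilon tube cost nothing, and the kernel width controls how quickly the similarity between two samples decays.

    epsilon = 0.3;                                   % width of the error tube
    sigma   = 12;                                    % std of the RBF kernel

    epsLoss = @(r) max(abs(r) - epsilon, 0);         % epsilon-insensitive loss
    % one common Gaussian kernel parameterisation; SVMTorch's exact convention may differ
    rbfKern = @(x1, x2) exp(-sum((x1 - x2).^2) / (2*sigma^2));

    fprintf('loss for residual 0.2: %.2f, for 0.5: %.2f\n', epsLoss(0.2), epsLoss(0.5));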

September 26th, 2008

SVM script updated for new Matlab version

The updated script that uses SVMTorch in regression mode now uses the cvpartition function from the Statistics Toolbox in Matlab R2008a, which I happened to install today. It seems that my splitCV script is deprecated now.
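A minimal sketch of the cvpartition usage (y is assumed to be the target vector of the data set; the fold count is just an example):

    cv = cvpartition(length(y), 'kfold', 10);   % replaces the old splitCV helper
    for i = 1:cv.NumTestSets
        tr = training(cv, i);                   % logical index of training rows
        te = test(cv, i);                       % logical index of test rows
        % ... write X(tr,:), y(tr) out for SVMTorch, evaluate on X(te,:) ...
    end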

Further info: I added a page on the left that provides easy access to some of the matlab scripts that I’ve created so far.