Since the series of AI conferences run by the BCS Specialist Group on AI (SGAI) has been useful the last two times, I decided to submit yet another paper there. Again, I’m working with the agricultural data for yield prediction.
The question this time is: which of the features in my data sets are actually useful for yield prediction? In recent publications I looked into different regression models for yield prediction. Taking this a step further, I’m now looking at feature selection. There’s a lot of research in that area (though mostly on classification, not regression) and I haven’t quite finished getting to grips with it. Nevertheless, the central story line is emerging. In the SGAI AI-2009 paper I have developed, or rather adapted, a feature selection approach which uses forward selection (i.e. starting with an empty set of features and successively adding the most promising one). The candidate subsets are evaluated using support vector regression and regression trees. In the end, a ranking of the features is presented which points out the important (relevant) features and also shows which ones are redundant or irrelevant.
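To make the forward selection idea concrete, here is a minimal sketch. It is not the script from the paper: it is written in Python rather than R, it uses plain least-squares regression as a stand-in for support vector regression and regression trees, and it runs on made-up data with one relevant, one redundant and one irrelevant feature. The function names (`fit_least_squares`, `cv_error`, `forward_selection`), the 5-fold cross-validation and the mean squared error criterion are all assumptions chosen for illustration.

```python
import random

def fit_least_squares(X, y):
    """Fit y ~ intercept + X by solving the normal equations (Gauss-Jordan)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for i in range(p):
        A[i][i] += 1e-9  # tiny ridge term for numerical stability
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def cv_error(X, y, features, k=5):
    """Mean squared error of the stand-in model under k-fold cross-validation."""
    n, sq = len(y), 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        w = fit_least_squares([[X[i][f] for f in features] for i in train],
                              [y[i] for i in train])
        for i in test:
            pred = w[0] + sum(wj * X[i][f] for wj, f in zip(w[1:], features))
            sq += (y[i] - pred) ** 2
    return sq / n

def forward_selection(X, y, k=5):
    """Start with no features; repeatedly add the one giving the lowest CV error."""
    selected, remaining, ranking = [], list(range(len(X[0]))), []
    while remaining:
        scores = {f: cv_error(X, y, selected + [f], k) for f in remaining}
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        ranking.append((best, scores[best]))
    return ranking

# Made-up data: feature 0 is relevant, feature 1 is a noisy copy of it
# (redundant), feature 2 is pure noise (irrelevant).
rng = random.Random(0)
X, y = [], []
for _ in range(60):
    x0 = rng.random()
    X.append([x0, x0 + rng.gauss(0, 0.3), rng.random()])
    y.append(3 * x0 + rng.gauss(0, 0.1))

ranking = forward_selection(X, y)
for feature, error in ranking:
    print(f"added feature {feature}, CV error {error:.4f}")
```

The order in which features are added gives the ranking. A feature that barely lowers the error once the earlier ones are in the set is a candidate for being redundant or irrelevant.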
As mentioned before, I’m now using R for computing, so the scripting language changes as well. I won’t be converting the existing scripts from MATLAB to R, though. The script for the above SGAI AI-2009 paper (titled: Feature Selection for Wheat Yield Prediction) can be found here: 2009sgai-featuresel.R (or use the script overview).
It also looks as if the central theme of my dissertation will be along these lines.