A first go at big data: using a naive density estimator and a Bayes classifier
Andrew Moore has written some really great tutorials (actually lecture notes) on these topics. You can find them on his website.
The UCI Machine Learning Repository provides a useful collection of data sets. You can find it here.
I'm using PostgreSQL to manage the data. It all runs fine on my BeagleBone Black.
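To give a rough idea of the technique named in the title, here is a minimal Python sketch of a Bayes classifier built on a naive density estimator, meaning one that treats the attributes as independent within each class. It is not the code that runs on the BeagleBone Black: the function names `train` and `predict`, the toy weather rows, and the add-one (Laplace) smoothing are all assumptions made for illustration, and everything is kept in memory instead of in PostgreSQL.

```python
from collections import Counter, defaultdict
from math import log


def train(rows, labels):
    """Count how often each class and each (attribute, class, value) occurs."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # values_seen[attribute index] -> set of distinct values for that attribute
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest log P(class) + sum of log P(value | class)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = log(n / total)  # class prior
        for i, value in enumerate(row):
            k = len(values_seen[i])
            # Naive assumption: attributes are independent given the class.
            # Add-one smoothing (an assumption here) avoids log(0) for unseen values.
            score += log((value_counts[(i, label)][value] + 1) / (n + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data: (outlook, temperature) -> play?
rows = [
    ("sunny", "hot"),
    ("sunny", "mild"),
    ("rainy", "mild"),
    ("rainy", "cool"),
    ("overcast", "hot"),
]
labels = ["no", "no", "yes", "yes", "yes"]

model = train(rows, labels)
print(predict(model, ("rainy", "cool")))
print(predict(model, ("sunny", "hot")))
```

The counting step in `train` is the sort of work that could be pushed into the database as grouped counts, but that is a design guess on my part and not a description of the actual setup.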
There are still a number of todos and fixes left. For instance, the prediction accuracy still needs to be tested and backed up with statistics. The user-friendliness also needs an upgrade. Perhaps a nice HTTP POST / REST interface?