MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy.
I put together a complete step-by-step classification example based on the Iris flower data set, running in a Jupyter notebook with the BeakerX kernel. It covers the following steps:
- Setup
- Data Preparation
- Testing and Prediction
- Validation
The example is written in Scala, but you could use any other language that is supported by the JVM.
My example can be found in this Gist.
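For orientation, here is a minimal sketch of what the four steps could look like with the Spark ML Pipeline API. It is not the code from the Gist. It assumes that the Spark SQL and MLlib dependencies are already on the notebook's classpath, that a headerless CSV copy of the data set is available as `iris.csv` (a hypothetical path), and it uses a `RandomForestClassifier` purely as a stand-in classifier. The column names are my own choice as well.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// Setup: a local Spark session that uses all available cores
val spark = SparkSession.builder()
  .appName("iris-classification")
  .master("local[*]")
  .getOrCreate()

// Data preparation: "iris.csv" is a hypothetical path to a headerless file
// with four numeric measurements followed by the species name
val iris = spark.read
  .option("inferSchema", "true")
  .csv("iris.csv")
  .toDF("sepalLength", "sepalWidth", "petalLength", "petalWidth", "species")

// Combine the four measurements into a single feature vector column
val assembler = new VectorAssembler()
  .setInputCols(Array("sepalLength", "sepalWidth", "petalLength", "petalWidth"))
  .setOutputCol("features")

// Turn the species name into a numeric label
val indexer = new StringIndexer()
  .setInputCol("species")
  .setOutputCol("label")

// Hold back 30% of the rows for testing (the ratio is an arbitrary choice)
val Array(training, test) = iris.randomSplit(Array(0.7, 0.3), seed = 42L)

// Stand-in classifier; an assumption, not necessarily the one used in the Gist
val classifier = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")

val model = new Pipeline()
  .setStages(Array(assembler, indexer, classifier))
  .fit(training)

// Testing and prediction: apply the fitted pipeline to the held-back rows
val predictions = model.transform(test)
predictions.select("species", "label", "prediction").show(5)

// Validation: share of test rows that were classified correctly
val accuracy = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("accuracy")
  .evaluate(predictions)
println(s"Test accuracy: $accuracy")
```

Putting the feature assembly, label indexing and classifier into one `Pipeline` means the same transformations are applied to the training and the test data without repeating them by hand.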
I leave it up to you to replace the classifier with, e.g., NaiveBayes or a MultilayerPerceptronClassifier. The MLlib Programming Guide contains the right level of information and is easy to use.
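As a rough sketch of what such a replacement could look like, the two alternatives might be configured as below. The column names `features` and `label` are assumptions about how your pipeline is set up, and the hidden layer size, iteration count and seed are arbitrary illustrative values rather than tuned settings.

```scala
import org.apache.spark.ml.classification.{MultilayerPerceptronClassifier, NaiveBayes}

// Naive Bayes works here because all Iris measurements are non-negative
val naiveBayes = new NaiveBayes()
  .setLabelCol("label")
  .setFeaturesCol("features")

// Layers: 4 input features, one hidden layer with 5 nodes (arbitrary),
// and 3 output nodes, one per Iris species
val mlp = new MultilayerPerceptronClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setLayers(Array(4, 5, 3))
  .setMaxIter(100)
  .setSeed(42L)
```

Either object can then be used as the classifier stage of the pipeline. The rest of the example should not need to change, since all three are Spark ML estimators that read a feature vector column and write a `prediction` column.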