Analysis of a multiclass image dataset using Scikit-learn
Project link: https://github.com/HL-Boisvert/Data_Mining_Portfolio
Dataset link: https://www.kaggle.com/andrewmvd/animal-faces
This dataset consists of 16,130 images of 512×512 pixels in 3 classes: cat, dog and wild animal.
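Scikit-learn estimators expect a 2-D feature matrix, so each image has to be flattened into one row of pixel values. The sketch below assumes the images are already loaded into a NumPy array of shape (n_images, height, width, channels) with uint8 values. The block-averaging step, its `factor` parameter and the function name `images_to_features` are illustrative assumptions, not necessarily what the project does. For a 512×512 RGB image, flattening without downsampling gives 512 × 512 × 3 = 786,432 features, while `factor=8` reduces that to 64 × 64 × 3 = 12,288.

```python
import numpy as np


def images_to_features(images, factor=8):
    """Hypothetical helper: downsample images by block averaging, then flatten.

    images: array of shape (n, h, w, c) or (n, h, w), uint8 values in [0, 255].
    Returns a float array of shape (n, (h // factor) * (w // factor) * c).
    """
    images = np.asarray(images, dtype=np.float64) / 255.0  # scale to [0, 1]
    if images.ndim == 3:                                   # grayscale: add channel axis
        images = images[..., None]
    n, h, w, c = images.shape
    if h % factor or w % factor:
        raise ValueError("image height and width must be divisible by factor")
    # Split each axis into (blocks, pixels-in-block) and average inside each block.
    pooled = images.reshape(n, h // factor, factor, w // factor, factor, c).mean(axis=(2, 4))
    return pooled.reshape(n, -1)                           # one row per image
```

A larger `factor` trades spatial detail for a much smaller feature matrix, which keeps memory use and training time manageable for classical scikit-learn models.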
Several interesting conclusions were drawn from this challenging dataset:
- Using Pearson correlation, it is possible to determine which pixels have the highest correlation score and thus optimize the training.
- The classes were found not to be linearly separable.
- Classification using k-means clustering performs poorly because of the properties of the dataset (35% accuracy on the testing dataset, barely above the 33% expected from guessing uniformly at random among three classes).
- Classifying with a random forest works very well, as does the Multi-Layer Perceptron (MLP) classifier (75% and 80% accuracy on the testing dataset, respectively).
- However, classifying with Convolutional Neural Networks (CNNs) gives the best results, reaching 95% accuracy with optimal hyperparameters.
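The Pearson correlation pixel selection from the list above can be sketched in NumPy as follows. This is a minimal version under stated assumptions: each pixel is scored by its largest absolute Pearson correlation with a one-vs-rest indicator of each class (the project may instead have correlated against integer class codes), and the `k` best pixels are kept. The function names are hypothetical.

```python
import numpy as np


def pearson_pixel_scores(X, y):
    """Hypothetical helper: for every pixel (column of X), the largest |Pearson r|
    between that pixel and a one-vs-rest indicator of any class in y."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    Xc = X - X.mean(axis=0)                    # centre each pixel column
    x_norm = np.sqrt((Xc ** 2).sum(axis=0))    # per-pixel norm of the deviations
    scores = np.zeros(X.shape[1])
    for cls in np.unique(y):
        t = (y == cls).astype(np.float64)      # 1 for this class, 0 otherwise
        tc = t - t.mean()
        denom = x_norm * np.sqrt((tc ** 2).sum())
        # Constant pixels have zero variance: score them 0 instead of NaN.
        with np.errstate(divide="ignore", invalid="ignore"):
            r = np.where(denom > 0, (Xc.T @ tc) / denom, 0.0)
        scores = np.maximum(scores, np.abs(r))
    return scores


def select_top_pixels(X, y, k):
    """Indices of the k highest-scoring pixels (best first), plus all scores."""
    scores = pearson_pixel_scores(X, y)
    top = np.argsort(scores)[::-1][:k]
    return top, scores
```

Training then uses only the selected columns, e.g. `X_selected = X[:, top]`, which shrinks the input for every downstream model.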
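One way to check the statement that the classes are not linearly separable is an exact linear-programming feasibility test. In the multiclass sense (predict the class with the highest linear score), the data are linearly separable exactly when there exist weights w_c and biases b_c such that (w_{y_i} - w_c)·x_i + (b_{y_i} - b_c) >= 1 for every sample i and every other class c. The sketch below uses SciPy's `linprog`; this formulation and the name `is_linearly_separable` are assumptions about how one might verify the claim, not necessarily the project's method, and the problem is only tractable on a reduced feature set (for example the selected pixels), not on raw 512×512 images.

```python
import numpy as np
from scipy.optimize import linprog


def is_linearly_separable(X, y):
    """Hypothetical helper: True if some argmax-of-linear-scores classifier
    fits (X, y) with zero training error, decided by an LP feasibility check."""
    X = np.asarray(X, dtype=np.float64)
    classes, y_idx = np.unique(y, return_inverse=True)
    n, d = X.shape
    k = len(classes)
    if k < 2:
        return True
    Xa = np.hstack([X, np.ones((n, 1))])       # append a constant column for the bias
    block = d + 1                              # variables per class: w_c and b_c
    rows = []
    for i in range(n):
        yi = y_idx[i]
        for c in range(k):
            if c == yi:
                continue
            # Encode -(score_yi - score_c) <= -1 as one row of A_ub.
            row = np.zeros(k * block)
            row[yi * block:(yi + 1) * block] = -Xa[i]
            row[c * block:(c + 1) * block] = Xa[i]
            rows.append(row)
    A_ub = np.array(rows)
    b_ub = -np.ones(len(rows))
    # Zero objective: we only care whether any feasible point exists.
    res = linprog(np.zeros(k * block), A_ub=A_ub, b_ub=b_ub,
                  bounds=(None, None), method="highs")
    return res.status == 0                     # 0 = feasible, 2 = infeasible
```

An exact LP answers the question directly, whereas a linear classifier falling short of 100% training accuracy is only a heuristic hint, since it may be limited by regularization or unfinished optimization.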
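The k-means result in the list above comes from using a clustering algorithm as a classifier. A common way to do this, sketched here as an assumption about the setup, is to fit `KMeans` with one cluster per class on the training set, label each cluster with the majority class of its training members, and predict the label of the nearest cluster for test images. Because k-means groups samples purely by Euclidean distance, the near-chance score reported above suggests the clusters found in pixel space do not line up with the animal classes. The `KMeansClassifier` wrapper is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


class KMeansClassifier:
    """Hypothetical wrapper: k-means clusters mapped to majority training labels."""

    def __init__(self, n_clusters=3, random_state=0):
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)

    def fit(self, X, y):
        y = np.asarray(y)
        clusters = self.kmeans.fit_predict(X)      # unsupervised: labels are not used here
        values, counts = np.unique(y, return_counts=True)
        default = values[np.argmax(counts)]        # fallback for an empty cluster
        self.cluster_to_label_ = np.empty(self.kmeans.n_clusters, dtype=y.dtype)
        for k in range(self.kmeans.n_clusters):
            members = y[clusters == k]
            if members.size == 0:
                self.cluster_to_label_[k] = default
                continue
            # Assign the most frequent training label among this cluster's members.
            vals, cnts = np.unique(members, return_counts=True)
            self.cluster_to_label_[k] = vals[np.argmax(cnts)]
        return self

    def predict(self, X):
        # Nearest centroid, then translate cluster id into a class label.
        return self.cluster_to_label_[self.kmeans.predict(X)]
```

On well-separated synthetic blobs this wrapper classifies almost perfectly; its accuracy depends entirely on whether the natural clusters coincide with the classes.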
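The random forest and MLP comparison can be outlined with scikit-learn's `RandomForestClassifier` and `MLPClassifier`. This sketch runs on synthetic data from `make_classification` as a stand-in for the flattened (and possibly pixel-selected) images. The hyperparameters (200 trees, one hidden layer of 128 units, standard scaling before the MLP) and the function name `compare_classifiers` are illustrative assumptions, not the project's settings, and the accuracies it prints say nothing about the 75% and 80% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_classifiers(X, y, random_state=0):
    """Hypothetical helper: test accuracy of a random forest and an MLP
    on a stratified 75/25 train/test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    models = {
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
        # MLPs are sensitive to feature scale, so standardise the inputs first.
        "mlp": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(128,), max_iter=1000, random_state=random_state),
        ),
    }
    return {
        name: accuracy_score(y_test, model.fit(X_train, y_train).predict(X_test))
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic 3-class stand-in for the image features.
    X, y = make_classification(
        n_samples=600, n_features=64, n_informative=10, n_classes=3, random_state=0
    )
    print(compare_classifiers(X, y))
```

Random forests need no feature scaling, which is why only the MLP is wrapped in a pipeline with `StandardScaler`.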