The data was obtained from https://www.kaggle.com/jsphyg/weather-dataset-rattle-package. The dataset was originally intended for predicting next-day rain in Australia, but the aim of this task was instead to predict whether or not the following day will be cloudy.
The data contained multiple attributes and a class attribute, “CloudTomorrow”.
Pre-processing was required to make the data set suitable for modelling. Within this project, five classification models were used to predict “cloudiness”. For each classifier, the confidence of each case was calculated and used to produce an ROC curve.
There were also attempts at creating a decision tree by hand, and the project touches on Artificial Neural Network classifiers and how one would perform on this dataset.
The full report can be found in my GitHub repository (private): https://github.com/TingHanGan/FIT3152_Assignment2.

| Classifier    | AUC       |
|---------------|-----------|
| Decision Tree | 0.716593  |
| Naïve Bayes   | 0.7202077 |
| Bagging       | 0.7333656 |
| Boosting      | 0.6941912 |
| Random Forest | 0.7398179 |

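Producing an ROC curve for each classifier, together with the corresponding AUC table, makes it easy to compare how well the classifiers perform across all classification thresholds. Random Forest has the highest AUC (about 0.740), followed closely by Bagging (about 0.733), while Boosting has the lowest (about 0.694). Random Forest is therefore the best classifier in this case.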
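
To illustrate how an AUC value is derived from per-case confidences, here is a minimal Python sketch using only the standard library. It is not the code used in the project: the function name `roc_auc` and the small `demo_labels` / `demo_scores` lists are made up for illustration, and only the `reported_auc` values are taken from the table above.

```python
def roc_auc(labels, scores):
    """Return (fpr, tpr, auc) for binary labels (1 = positive) and confidence scores.

    Hypothetical helper for illustration; not the project's implementation.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")

    # Rank cases from most to least confident of the positive class.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Handle all cases that share a score together, so ties
        # contribute a single diagonal segment to the curve.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
        i = j

    # Area under the curve by the trapezoidal rule.
    auc = sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )
    return fpr, tpr, auc


# Made-up toy data (NOT from the weather dataset): 1 = cloudy tomorrow.
demo_labels = [1, 1, 0, 1, 0, 0]
demo_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
_, _, demo_auc = roc_auc(demo_labels, demo_scores)
print(f"toy AUC: {demo_auc:.3f}")

# AUC values as reported in the table above.
reported_auc = {
    "Decision Tree": 0.716593,
    "Naïve Bayes": 0.7202077,
    "Bagging": 0.7333656,
    "Boosting": 0.6941912,
    "Random Forest": 0.7398179,
}
best = max(reported_auc, key=reported_auc.get)
print(f"highest reported AUC: {best} ({reported_auc[best]:.4f})")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, so the reported values of roughly 0.69 to 0.74 indicate moderate performance with fairly small differences between the classifiers.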