Building and evaluating Naive Bayes classifier with WEKA

This is a follow-up to the previous post, where we calculated a Naive Bayes prediction on a given data set analytically. This time I want to demonstrate how all of this can be done using the WEKA application.

[Screenshot: the WEKA GUI]

For those who don't know what WEKA is, I highly recommend visiting their website and getting the latest release. It is a really powerful machine learning toolkit written in Java. You can find plenty of tutorials on YouTube on how to get started with WEKA, so I won't go into details here; I'm sure you'll be able to follow along anyway.

Preparing data for classification

We are going to use the same data set as in the previous example, with the weather features temperature and humidity and the class yes/no for playing golf.

The data is stored in the ARFF file format, which is specific to WEKA, and looks like this:

@relation 'weather.symbolic-weka.filters.unsupervised.attribute.Remove-R1,4'

@attribute temperature {hot,mild,cool}
@attribute humidity {high,normal}
@attribute play {yes,no}

@data
hot,high,no
hot,high,no
hot,high,yes
mild,high,yes
cool,normal,yes
cool,normal,no
cool,normal,yes
mild,high,no
cool,normal,yes
mild,normal,yes
mild,normal,yes
mild,high,yes
hot,normal,yes
mild,high,no

Here we can see the attribute declarations temperature, humidity and play, followed by the data table. Using this data set we are going to train a Naive Bayes model and then apply it to a new instance with temperature cool and humidity high to see which class it gets assigned to.
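By the way, everything we do in the Explorer below can also be scripted against WEKA's Java API. Here is a minimal sketch of loading the data set shown above; the file name weather.symbolic.arff is just an assumed placeholder for wherever you saved the ARFF file.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadWeatherData {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file (file name is a placeholder).
        DataSource source = new DataSource("weather.symbolic.arff");
        Instances data = source.getDataSet();

        // The last attribute (play) is the class we want to predict.
        data.setClassIndex(data.numAttributes() - 1);

        System.out.println("Loaded " + data.numInstances() + " instances with "
                + data.numAttributes() + " attributes.");
    }
}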

First of all, in the WEKA Explorer's Preprocess tab, we need to open our ARFF data file:

[Screenshot: loading the data in the WEKA Explorer]

Here we can see basic statistics for the attributes. If you click the Edit button, a new Viewer window with the data table opens.

[Screenshot: the weather data table in the Viewer]

In the Viewer you can edit the data as you like, and you can always save the modified data set with the Save button in the Explorer. We will do exactly that to create a test set with the values cool and high. For this we just delete all data rows except the first one and edit its values to look like this:

[Screenshot: the weather test data table in the Viewer]

Leave the play attribute empty, because we don't know its value yet.

Click OK and then Save the data as a separate file. The file should look like this:

@relation 'weather.symbolic-weka.filters.unsupervised.attribute.Remove-R1,4'

@attribute temperature {hot,mild,cool}
@attribute humidity {high,normal}
@attribute play {yes,no}

@data
cool,high,?

The question mark "?" is the standard way of representing a missing value in WEKA.
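If you prefer not to edit the file in the Viewer, the same single-row test set can be built programmatically. The sketch below reuses the training file's header so the attributes match exactly; both file names are assumed placeholders.

import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;

import java.io.File;

public class BuildTestSet {
    public static void main(String[] args) throws Exception {
        // Load the training data to copy its attribute structure.
        Instances train = new DataSource("weather.symbolic.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);

        // Empty copy with the same attributes, then one row: cool, high, ?
        Instances test = new Instances(train, 0);
        Instance row = new DenseInstance(test.numAttributes());
        row.setDataset(test);
        row.setValue(test.attribute("temperature"), "cool");
        row.setValue(test.attribute("humidity"), "high");
        row.setMissing(test.classIndex());   // play is unknown, i.e. "?"
        test.add(row);

        // Save the single-instance test set as its own ARFF file.
        ArffSaver saver = new ArffSaver();
        saver.setInstances(test);
        saver.setFile(new File("weather.test.arff"));
        saver.writeBatch();
    }
}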

Building Naive Bayes model

Now that we have the data prepared, we can proceed to building the model. Load the full weather data set again in the Explorer and then go to the Classify tab.

Here you need to press the Choose button in the Classifier panel and select NaiveBayes from the tree menu. Make sure the play attribute is selected as the class attribute, then press the Start button to build the model.

[Screenshot: building the Naive Bayes model in WEKA]

The model output includes information on how accurately it classifies the data, along with other statistics:

Correctly Classified Instances       9      64.2857 %
Incorrectly Classified Instances     5      35.7143 %

You can see that on this data set the accuracy of the classifier is about 64%, so keep in mind that you shouldn't always take the results as given. To get better results you might want to try different classifiers or preprocess the data further. We won't get into that right now; we need to demonstrate how to use the model on new, unseen data.
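For reference, the same model building and evaluation can be done through the Java API. The sketch below uses 10-fold cross-validation, which is the Explorer's default test option when you press Start; the file name is again an assumed placeholder.

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

import java.util.Random;

public class TrainNaiveBayes {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("weather.symbolic.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Build the Naive Bayes model on the full data set.
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(data);

        // 10-fold cross-validation on a fresh copy of the classifier,
        // which mirrors what the Start button reports by default.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}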

Evaluating classifier with test set

Now that we have a model, we need to load the test data we created before. For this, select Supplied test set and click the Set button.

[Screenshot: loading the supplied test set]

Click More options and, in the new window, choose PlainText for Output predictions, as follows:

[Screenshot: the test set More options dialog]

Then right-click the recently created model in the result list and select Re-evaluate model on current test set.

[Screenshot: Re-evaluate model on current test set]

And you should see the prediction for the given values cool and high, like this:

=== Predictions on user test set ===

    inst#     actual  predicted error prediction
        1        1:?      1:yes       0.531

As you can see, the instance has been predicted as yes, and the 0.531 in the prediction column is the probability the model assigns to that class, i.e. about 53.1%. This is close to the roughly 50% we arrived at in the previous analytical example.
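The same prediction can also be obtained in code by asking the trained model for its class distribution on the test instance. A sketch, assuming the placeholder file names used above; the printed probability for yes should correspond to the value in the prediction column.

import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PredictTestInstance {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("weather.symbolic.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);

        Instances test = new DataSource("weather.test.arff").getDataSet();
        test.setClassIndex(test.numAttributes() - 1);

        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(train);

        // Predicted class and class probability distribution for the cool/high instance.
        Instance inst = test.instance(0);
        double predicted = nb.classifyInstance(inst);
        double[] dist = nb.distributionForInstance(inst);

        System.out.println("Predicted class: " + train.classAttribute().value((int) predicted));
        // Class values follow the ARFF declaration order: index 0 = yes, index 1 = no.
        System.out.println("P(yes) = " + dist[0] + ", P(no) = " + dist[1]);
    }
}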
