4. Using the MEKA wrapper¶
The MEKA project provides an open source implementation of methods for multi-label learning and evaluation. In multi-label classification, we want to predict multiple output variables for each input instance.
MEKA is based on the WEKA Machine Learning Toolkit; it includes dozens of multi-label methods from the scientific literature, as well as a wrapper to the related MULAN framework.
An introduction to multi-label classification and MEKA is given in a JMLR MLOSS-track paper. Note that while MEKA is GPL-licensed, using this wrapper does not incur GPL limitations on your code.
4.1. Setting up MEKA¶
In order to use the scikit-multilearn interface to MEKA you need to have Java and MEKA installed. Paths to both are passed to the class’s constructor. The wrapper supports MEKA 1.9.1+; the currently officially supported MEKA version is 1.9.2.
You can download MEKA with the download_meka function, which returns the path to the downloaded MEKA classes.
In [1]:
from skmultilearn.ext import download_meka
meka_classpath = download_meka()
meka_classpath
MEKA 1.9.2 found, not downloading
Out[1]:
'/home/niedakh/scikit_ml_learn_data/meka/meka-release-1.9.2/lib/'
If you want to use a different version, just pass the version number as an argument to download_meka.
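For example, a minimal sketch (assuming the argument is named version, as in recent scikit-multilearn releases; check your installed version’s documentation):

from skmultilearn.ext import download_meka

# Download (or reuse a cached copy of) a specific MEKA release and get its classpath back.
meka_classpath = download_meka(version='1.9.2')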
Note that you will need to have liac-arff installed if you want to use the MEKA wrapper; you can install it using: pip install liac-arff.
You will also need Java. You can pass a path to the Java binary in the MEKA wrapper constructor. In Python 2.7 the pip package whichcraft is used to detect the location of the java executable if no path is provided to the constructor; you can install it via pip install whichcraft. In Python 3 whichcraft is not used and the java path is found using the standard library.
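Under Python 3 this lookup is essentially what shutil.which from the standard library does; a minimal sketch of locating the executable yourself:

import shutil

# Returns e.g. '/usr/bin/java', or None if java is not on the PATH.
java_path = shutil.which('java')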
4.2. Using MEKA via scikit-multilearn¶
Starting from scikit-multilearn 0.0.2 the MEKA wrapper is available from skmultilearn.ext (ext as in external) and is a fully scikit-compatible multi-label classifier.
To use the interface class, start by importing skmultilearn’s module, then create an object of the Meka class using the constructor and perform the standard fit & predict scenario.
Let’s load up some data to see how it works.
In [2]:
from skmultilearn.dataset import load_dataset
X_train, y_train, _, _ = load_dataset('scene', 'train')
X_test, y_test, _, _ = load_dataset('scene', 'test')
scene - exists, not redownloading
scene - exists, not redownloading
Now that we have a data set let’s classify it using MEKA and WEKA! If you are new to the MEKA and WEKA stack, the available classifiers are listed in the MEKA and WEKA documentation.
Let’s start by importing Meka and constructing a MEKA wrapper classifier:
In [3]:
from skmultilearn.ext import Meka
meka = Meka(
meka_classifier = "meka.classifiers.multilabel.BR", # Binary Relevance
weka_classifier = "weka.classifiers.bayes.NaiveBayesMultinomial", # with Naive Bayes single-label classifier
meka_classpath = meka_classpath, #obtained via download_meka
java_command = '/usr/bin/java' # path to java executable
)
meka
Out[3]:
Meka(java_command='/usr/bin/java',
meka_classifier='meka.classifiers.multilabel.BR',
meka_classpath='/home/niedakh/scikit_ml_learn_data/meka/meka-release-1.9.2/lib/',
weka_classifier='weka.classifiers.bayes.NaiveBayesMultinomial')
Where:
- meka_classifier is the MEKA classifier class
- weka_classifier is the WEKA base classifier class, if used
- java_command is the path to java
- meka_classpath is the path to where meka.jar and weka.jar are located; they usually come together in MEKA releases, so this points to the lib subfolder of the folder where the meka-<version>-release.zip file was unzipped. If not provided, the path is taken from the MEKA_CLASSPATH environment variable, as sketched below.
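For example, a minimal sketch of relying on the environment variable instead of passing meka_classpath explicitly (the classpath below is a hypothetical placeholder):

import os
from skmultilearn.ext import Meka

# Hypothetical location; point this at the lib/ folder of your unzipped MEKA release.
os.environ['MEKA_CLASSPATH'] = '/path/to/meka-release-1.9.2/lib/'

meka = Meka(
    meka_classifier="meka.classifiers.multilabel.BR",
    weka_classifier="weka.classifiers.bayes.NaiveBayesMultinomial",
    java_command='/usr/bin/java'
)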
Now let’s train and test the classifier and see what Hamming loss we get.
In [4]:
X_train
Out[4]:
<1211x294 sparse matrix of type '<class 'numpy.float64'>'
with 351805 stored elements in LInked List format>
In [5]:
meka.fit(X_train, y_train)
predictions = meka.predict(X_test)
In [6]:
from sklearn.metrics import hamming_loss
In [7]:
hamming_loss(y_test, predictions)
Out[7]:
0.14659977703455965
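As an optional extra check (not part of the original run), subset accuracy, i.e. the fraction of samples whose full label set is predicted exactly, can be computed on the same predictions:

from sklearn.metrics import accuracy_score

# Subset accuracy: a sample counts as correct only if every label matches.
accuracy_score(y_test, predictions)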
4.3. Citing MEKA¶
@article{MEKA,
author = {Read, Jesse and Reutemann, Peter and Pfahringer, Bernhard and Holmes, Geoff},
title = {{MEKA}: A Multi-label/Multi-target Extension to {Weka}},
journal = {Journal of Machine Learning Research},
year = {2016},
volume = {17},
number = {21},
pages = {1--5},
url = {http://jmlr.org/papers/v17/12-164.html},
}
@article{Hall:2009:WDM:1656274.1656278,
author = {Hall, Mark and Frank, Eibe and Holmes, Geoffrey and Pfahringer, Bernhard and Reutemann, Peter and Witten, Ian H.},
title = {The WEKA Data Mining Software: An Update},
journal = {SIGKDD Explor. Newsl.},
issue_date = {June 2009},
volume = {11},
number = {1},
month = nov,
year = {2009},
issn = {1931-0145},
pages = {10--18},
numpages = {9},
url = {http://doi.acm.org/10.1145/1656274.1656278},
doi = {10.1145/1656274.1656278},
acmid = {1656278},
publisher = {ACM},
address = {New York, NY, USA},
}