4. Using the MEKA wrapper

The MEKA project provides an open source implementation of methods for multi-label learning and evaluation. In multi-label classification, we want to predict multiple output variables for each input instance.

MEKA is based on the WEKA Machine Learning Toolkit; it includes dozens of multi-label methods from the scientific literature, as well as a wrapper to the related MULAN framework.

An introduction to multi-label classification and MEKA is given in a JMLR MLOSS-track paper. Note that while MEKA is GPL-licensed, using this wrapper does not incur GPL limitations on your code.

4.1. Setting up MEKA

In order to use the scikit-multilearn interface to MEKA you need to have Java and MEKA installed. Paths to both are passed to the class’s constructor. The current version supports MEKA 1.9.1+.

The currently officially supported MEKA version is 1.9.2.

You can download it using the download_meka function, which returns the path to the MEKA classes.

In [1]:
from skmultilearn.ext import download_meka

meka_classpath = download_meka()
meka_classpath
MEKA 1.9.2 found, not downloading
Out[1]:
'/home/niedakh/scikit_ml_learn_data/meka/meka-release-1.9.2/lib/'

If you want to use a different version, just pass the version number as an argument to download_meka.
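For example, to pin the download to a particular release (a small sketch; it assumes the version is given as a plain version string, such as the officially supported 1.9.2):

from skmultilearn.ext import download_meka

# fetch (or reuse a cached copy of) a specific MEKA release
meka_classpath = download_meka(version='1.9.2')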

Note that you will need to have liac-arff installed if you want to use the MEKA wrapper; you can get it using: pip install liac-arff.

You will also need Java. You can pass a path to the Java binary in the MEKA wrapper constructor. Under Python 2.7, the pip package whichcraft is used to detect the location of the java executable if no path is provided to the constructor; you can install it via pip install whichcraft. Under Python 3, whichcraft is not needed and the Java path is found using the standard library.
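As a rough illustration of that Python 3 lookup (a sketch of the idea, not the wrapper’s exact internals), the standard library’s shutil.which can locate the executable, and the result can then be passed as java_command:

import shutil

# find the java executable on the PATH; returns None if Java is not installed
java_command = shutil.which('java')
java_command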

4.2. Using MEKA via scikit-multilearn

Starting from scikit-multilearn 0.0.2, the MEKA wrapper is available from skmultilearn.ext (ext as in external) and is a fully scikit-compatible multi-label classifier.

To use the interface class, start by importing skmultilearn’s module, then create an object of the Meka class using the constructor and perform the standard fit & predict scenario.

Let’s load up some data to see how it works.

In [2]:
from skmultilearn.dataset import load_dataset

X_train, y_train, _, _ = load_dataset('scene', 'train')
X_test,  y_test, _, _ = load_dataset('scene', 'test')
scene - exists, not redownloading
scene - exists, not redownloading

Now that we have a data set, let’s classify it using MEKA and WEKA! If you are new to the MEKA and WEKA stack, the lists of available classifiers can be found in the MEKA and WEKA documentation.

Let’s start by importing the Meka class and constructing a MEKA wrapper classifier:

In [3]:
from skmultilearn.ext import Meka

meka = Meka(
        meka_classifier = "meka.classifiers.multilabel.BR", # Binary Relevance
        weka_classifier = "weka.classifiers.bayes.NaiveBayesMultinomial", # with Naive Bayes single-label classifier
        meka_classpath = meka_classpath, #obtained via download_meka
        java_command = '/usr/bin/java' # path to java executable
)
meka
Out[3]:
Meka(java_command='/usr/bin/java',
   meka_classifier='meka.classifiers.multilabel.BR',
   meka_classpath='/home/niedakh/scikit_ml_learn_data/meka/meka-release-1.9.2/lib/',
   weka_classifier='weka.classifiers.bayes.NaiveBayesMultinomial')

Where:

  • meka_classifier is the MEKA classifier class
  • weka_classifier is the WEKA base classifier class if used
  • java_command is the path to java
  • meka_classpath is the path to where meka.jar and weka.jar are located; they usually come together in MEKA releases, so this points to the lib subfolder of the folder where the meka-release-<version>.zip file was unzipped. If not provided, the path is taken from the MEKA_CLASSPATH environment variable (see the sketch just below this list).
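
For completeness, here is a minimal sketch of the environment-variable route mentioned above (setting MEKA_CLASSPATH from Python is just one way to do it; exporting it in your shell works equally well):

import os
from skmultilearn.ext import Meka

# point the wrapper at the lib/ subfolder of an unzipped MEKA release,
# e.g. the path returned by download_meka()
os.environ['MEKA_CLASSPATH'] = meka_classpath

# with the environment variable set, meka_classpath can be omitted
meka = Meka(
        meka_classifier = "meka.classifiers.multilabel.BR",
        weka_classifier = "weka.classifiers.bayes.NaiveBayesMultinomial",
        java_command = '/usr/bin/java'
)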

Now let’s train and test the classifier and see what Hamming loss we get.

In [4]:
X_train
Out[4]:
<1211x294 sparse matrix of type '<class 'numpy.float64'>'
    with 351805 stored elements in LInked List format>
In [5]:
meka.fit(X_train, y_train)
predictions = meka.predict(X_test)
In [6]:
from sklearn.metrics import hamming_loss
In [7]:
hamming_loss(y_test, predictions)
Out[7]:
0.14659977703455965
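
Hamming loss is only one option; since the predictions come back as a standard multi-label indicator matrix, other scikit-learn metrics apply directly as well, for example subset accuracy (this snippet is illustrative and was not part of the run above):

from sklearn.metrics import accuracy_score

# subset accuracy: the fraction of samples whose entire label set is predicted exactly
accuracy_score(y_test, predictions)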

4.3. Citing MEKA

@article{MEKA,
    author = {Read, Jesse and Reutemann, Peter and Pfahringer, Bernhard and Holmes, Geoff},
    title = {{MEKA}: A Multi-label/Multi-target Extension to {Weka}},
    journal = {Journal of Machine Learning Research},
    year = {2016},
    volume = {17},
    number = {21},
    pages = {1--5},
    url = {http://jmlr.org/papers/v17/12-164.html},
}

@article{Hall:2009:WDM:1656274.1656278,
    author = {Hall, Mark and Frank, Eibe and Holmes, Geoffrey and Pfahringer, Bernhard and Reutemann, Peter and Witten, Ian H.},
    title = {The WEKA Data Mining Software: An Update},
    journal = {SIGKDD Explor. Newsl.},
    issue_date = {June 2009},
    volume = {11},
    number = {1},
    month = nov,
    year = {2009},
    issn = {1931-0145},
    pages = {10--18},
    numpages = {9},
    url = {http://doi.acm.org/10.1145/1656274.1656278},
    doi = {10.1145/1656274.1656278},
    acmid = {1656278},
    publisher = {ACM},
    address = {New York, NY, USA},
}