{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using the MEKA wrapper\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [MEKA](http://waikato.github.io/meka/) project provides an open source implementation of methods for multi-label learning and evaluation. In multi-label classification, we want to predict multiple output variables for each input instance. \n", "\n", "MEKA is based on the [WEKA](http://www.cs.waikato.ac.nz/ml/weka/) Machine Learning Toolkit; it includes dozens of multi-label methods from the scientific literature, as well as a wrapper to the related [MULAN](http://mulan.sourceforge.net/) framework.\n", "\n", "An introduction to multi-label classification and MEKA is given in a [JMLR MLOSS-track paper](http://jmlr.org/papers/volume17/12-164/12-164.pdf). Note that while MEKA is GPL-licensed, using this wrapper does not incur GPL limitations on your code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up MEKA\n", "\n", "In order to use the scikit-multilearn interface to MEKA you need to have JAVA and MEKA installed. Paths to both are passed to the class's constructor. **The current version supports meka 1.9.1+**\n", "\n", "The currently officially supported MEKA version is [1.9.2](https://github.com/Waikato/meka/releases/tag/meka-1.9.2). \n", "\n", "You can download it using the :fun:`download_meka`, the function returns path to MEKA classes. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MEKA 1.9.2 found, not downloading\n" ] }, { "data": { "text/plain": [ "'/home/niedakh/scikit_ml_learn_data/meka/meka-release-1.9.2/lib/'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from skmultilearn.ext import download_meka\n", "\n", "meka_classpath = download_meka()\n", "meka_classpath" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to use a different version, just pass the version number as an argument to `download_meka`.\n", "\n", "Note that you will need to have ``liac-arff`` installed if you want to use the MEKA wrapper, you can get them using: ``pip install liac-arff``.\n", "\n", "You will also need Java. You can pass a path to the Java binary in MEKA wrapper constructor. In Python 2.7 the pip package ``whichcraft`` is used to detect the location of java executables if no path is provided to the constructor. You can install it via ``pip install whichcraft``. In Python 3 ``whichcraft`` is not used, java path will be found using the standard library.\n", "\n", "## Using MEKA via scikit-multilearn \n", "\n", "Starting from scikit-multilearn ``0.0.2`` the meka wrapper is available from ``skmultilearn.ext`` (ext as in external) and is a fully scikit-compatible multi-label classifier.\n", "\n", "To use the interface class start with importing skmultilearn's module, then create an object of the ``Meka`` class using the constructor and perform the standard fit & predict scenario. \n", "\n", "Let's load up some data to see how it works." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "scene - exists, not redownloading\n", "scene - exists, not redownloading\n" ] } ], "source": [ "from skmultilearn.dataset import load_dataset\n", "\n", "X_train, y_train, _, _ = load_dataset('scene', 'train')\n", "X_test, y_test, _, _ = load_dataset('scene', 'test')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have a data set let's classify it using MEKA and WEKA! If you are new to the MEKA and WEKA stack you can find available classifiers under the following links:\n", "\n", "- [WEKA base classifier list](http://weka.sourceforge.net/doc.dev/weka/classifiers/Classifier.html)\n", "- [MEKA multi-label classifier list](http://waikato.github.io/meka/methods/)\n", "\n", "Let's start by importing :class:`Meka` and constructing a MEKA wrapper classifier:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Meka(java_command='/usr/bin/java',\n", " meka_classifier='meka.classifiers.multilabel.BR',\n", " meka_classpath='/home/niedakh/scikit_ml_learn_data/meka/meka-release-1.9.2/lib/',\n", " weka_classifier='weka.classifiers.bayes.NaiveBayesMultinomial')" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from skmultilearn.ext import Meka\n", "\n", "meka = Meka(\n", " meka_classifier = \"meka.classifiers.multilabel.BR\", # Binary Relevance\n", " weka_classifier = \"weka.classifiers.bayes.NaiveBayesMultinomial\", # with Naive Bayes single-label classifier\n", " meka_classpath = meka_classpath, #obtained via download_meka\n", " java_command = '/usr/bin/java' # path to java executable\n", ")\n", "meka" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Where:\n", "\n", "- ``meka_classifier`` is the MEKA classifier class \n", "- ``weka_classifier`` is the WEKA base classifier class if used\n", "- ``java_command`` is the path to java\n", "- ``meka_classpath`` is the path to where meka.jar and weka.jar are located, usually they come together in meka releases, so this points to the ``lib`` subfolder of the folder where ``meka--realease.zip`` file was unzipped. If not provided the path is taken from environmental variable: ``MEKA_CLASSPATH``\n", "\n", "Now let's train and test the classifier - we'll se what level of hamming loss do we get?\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<1211x294 sparse matrix of type ''\n", "\twith 351805 stored elements in LInked List format>" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_train" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "meka.fit(X_train, y_train)\n", "predictions = meka.predict(X_test)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import hamming_loss" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.14659977703455965" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hamming_loss(y_test, predictions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Citing meka\n", "\n", "```latex\n", "@article{MEKA,\n", " author = {Read, Jesse and Reutemann, Peter and Pfahringer, Bernhard and Holmes, Geoff},\n", " title = {{MEKA}: A Multi-label/Multi-target Extension to {Weka}},\n", " journal = {Journal of Machine Learning Research},\n", " year = {2016},\n", " volume = {17},\n", " number = {21},\n", " pages = {1--5},\n", " url = {http://jmlr.org/papers/v17/12-164.html},\n", "}\n", "\n", "@article{Hall:2009:WDM:1656274.1656278,\n", " author = {Hall, Mark and Frank, Eibe and Holmes, Geoffrey and Pfahringer, Bernhard and Reutemann, Peter and Witten, Ian H.},\n", " title = {The WEKA Data Mining Software: An Update},\n", " journal = {SIGKDD Explor. Newsl.},\n", " issue_date = {June 2009},\n", " volume = {11},\n", " number = {1},\n", " month = nov,\n", " year = {2009},\n", " issn = {1931-0145},\n", " pages = {10--18},\n", " numpages = {9},\n", " url = {http://doi.acm.org/10.1145/1656274.1656278},\n", " doi = {10.1145/1656274.1656278},\n", " acmid = {1656278},\n", " publisher = {ACM},\n", " address = {New York, NY, USA},\n", "}\n", "\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }