scikit-multilearn: multi-label classification for Python

pip install scikit-multilearn

Native Python implementation

A native Python implementation of a variety of multi-label classification algorithms.

Interface to Meka

For reference purposes and integration needs, a Meka wrapper class is implemented, providing access to all methods available in MEKA, MULAN and WEKA, the reference standard of the field.

Builds upon giants!

Team up with the power of numpy and scikit. You can use scikit-learn base classifiers as scikit-multilearn's classifiers. Scikit-multilearn classifiers follow the API of scikit-learn classifiers.

Free as in BSD

The licensing model follows scikit's BSD licence to allow maximum interoperability. The repository is hosted on GitHub at scikit-multilearn/scikit-multilearn.

Join US!

This project was started by niedakh for the purpose of experiments for his PhD, with lots of thanks to the Data Science Group at the Wrocław University of Technology. The Python world needs a multi-label classification library: help build one and join our team!

If you want to help - join us at scikit-multilearn-dev.

What's new in 0.0.5? (released 2017-02-25)

Cite US!

If you use scikit-multilearn in your research and publish it, please consider citing us; it will help us get funding for making the library better. The paper is available on arXiv. To cite it, use the following BibTeX:

author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
title = "{A scikit-based Python environment for performing multi-label classification}",
journal = {ArXiv e-prints},
archivePrefix = "arXiv",
eprint = {1702.01460},
primaryClass = "cs.LG",
keywords = {Computer Science - Learning, Computer Science - Mathematical Software},
year = 2017,
month = feb,

Supported classifiers

Classifier name | Module | Class name | Relevant paper | Available since version
Binary Relevance kNN | skmultilearn.adapt | BinaryRelevanceKNN | An Empirical Study of Lazy Multilabel Classification Algorithms | 0.0.4
k Nearest Neighbours multi-label classifier | skmultilearn.adapt | MLkNN | ML-KNN: A lazy learning approach to multi-label learning | 0.0.4
Binary Relevance | skmultilearn.problem_transform | BinaryRelevance | - | 0.0.1
Classifier Chains | skmultilearn.problem_transform | ClassifierChains | Classifier chains for multi-label classification | 0.0.2
Label Powerset | skmultilearn.problem_transform | LabelPowerset | - | 0.0.1
Stochastic Blockmodel based label space clusterer | skmultilearn.cluster | GraphToolCooccurenceClusterer | - | 0.0.3
iGraph community detection based (modularity, walktrap, infomap) label space clusterer | skmultilearn.cluster | IGraphLabelCooccurenceClusterer | How Is a Data-Driven Approach Better than Random Choice in Label Space Division for Multi-Label Classification? | 0.0.1
Label space partitioner using a clusterer from skmultilearn.cluster | skmultilearn.ensemble | LabelSpacePartitioningClassifier | How Is a Data-Driven Approach Better than Random Choice in Label Space Division for Multi-Label Classification? | 0.0.3
Distinct Random k-Labelsets | skmultilearn.ensemble | RakelD | Random k-Labelsets for Multilabel Classification | 0.0.1
Overlapping Random k-Labelsets | skmultilearn.ensemble | RakelO | Random k-Labelsets for Multilabel Classification | 0.0.1
Wrapper for MEKA | skmultilearn.ext | MEKA | MEKA: A Multi-label/Multi-target Extension to WEKA | 0.0.1
Hierarchical ARAM Neural Network (HARAM) | skmultilearn.neurofuzzy | MLARAM | HARAM: A Hierarchical ARAM Neural Network for Large-Scale Text Classification | 0.0.4

Junior tasks

These are classifier implementation tasks that have not been undertaken yet, listed with the related papers and links to the corresponding GitHub issues. If you want to take one, write a comment on the relevant issue.

Classifier name | Relevant paper | GitHub issue in scikit-multilearn
Bayes Optimal Probabilistic Classifier Chains | Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains | #20
Selective Ensemble of Classifier Chains | Selective Ensemble of Classifier Chains | #13
ML-c4.5 | Knowledge Discovery in Multi-label Phenotype Data | #10
QWML | Efficient voting prediction for pairwise multilabel classification | #9
Calibrated Label Ranking | Multilabel classification via calibrated label ranking | #8
Hierarchy Of Multilabel classifiERs (HOMER) | Effective and Efficient Multilabel Classification in Domains with Large Number of Labels | #2