RAkELd: random label space partitioning with Label Powerset

class skmultilearn.ensemble.RakelD(base_classifier=None, labelset_size=3, base_classifier_require_dense=None)[source]

Bases: skmultilearn.base.base.MLClassifierBase

Distinct RAndom k-labELsets multi-label classifier.

Divides the label space into disjoint partitions of size k, trains a Label Powerset classifier per partition, and predicts by summing the results of all trained classifiers.
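The partitioning step can be sketched in plain Python. This is an illustrative helper, not the library's internal API (internally, skmultilearn delegates this to skmultilearn.ensemble.RandomLabelSpaceClusterer); the function name and seed parameter are made up for the example:

```python
import random

def random_disjoint_labelsets(n_labels, labelset_size, seed=None):
    """Shuffle label indices and cut them into disjoint subsets of at
    most `labelset_size` labels each (illustrative sketch of RAkELd)."""
    rng = random.Random(seed)
    labels = list(range(n_labels))
    rng.shuffle(labels)
    return [labels[i:i + labelset_size]
            for i in range(0, n_labels, labelset_size)]

# e.g. 7 labels with k=3 yield three disjoint sets of sizes 3, 3 and 1
partitions = random_disjoint_labelsets(7, 3, seed=42)
```

Note that when k does not divide the number of labels, the last partition is smaller; each label still appears in exactly one partition.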

Parameters:
  • base_classifier (sklearn.base) – the base classifier that will be used in each sub-classifier, will be automatically put under self.classifier for future access.
  • base_classifier_require_dense ([bool, bool]) – whether the base classifier requires [input, output] matrices in dense representation, will be automatically put under self.require_dense
  • labelset_size (int) – the desired size of each partition, the parameter k from the paper. Defaults to 3, which the paper reports as giving the best results
_label_count

the number of labels the classifier is fit to, set by fit()

Type: int
model_count_

the number of sub-classifiers trained, set by fit()

Type: int
classifier_

the underlying classifier that performs the label space partitioning using a random clusterer skmultilearn.ensemble.RandomLabelSpaceClusterer

Type: skmultilearn.ensemble.LabelSpacePartitioningClassifier

References

If you use this class please cite the paper introducing the method:

@ARTICLE{5567103,
    author={G. Tsoumakas and I. Katakis and I. Vlahavas},
    journal={IEEE Transactions on Knowledge and Data Engineering},
    title={Random k-Labelsets for Multilabel Classification},
    year={2011},
    volume={23},
    number={7},
    pages={1079-1089},
    doi={10.1109/TKDE.2010.164},
    ISSN={1041-4347},
    month={July},
}

Examples

Here’s a simple example of how to use this class with a base classifier from scikit-learn to train non-overlapping classifiers, each on at most four labels:

from sklearn.naive_bayes import GaussianNB
from skmultilearn.ensemble import RakelD

classifier = RakelD(
    base_classifier=GaussianNB(),
    base_classifier_require_dense=[True, True],
    labelset_size=4
)

classifier.fit(X_train, y_train)
prediction = classifier.predict(X_train)
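What fit and predict do can be approximated with scikit-learn alone: each partition's label columns are encoded into a single multiclass target (the Label Powerset transform), one base classifier is trained per partition, and each multiclass prediction is decoded back into binary label columns. The following is a minimal sketch under assumed dense numpy data and a fixed seed, not the library's actual implementation, which delegates to LabelSpacePartitioningClassifier:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_multilabel_classification

# toy data: 100 samples with 6 labels (sizes here are illustrative)
X, y = make_multilabel_classification(n_samples=100, n_features=8,
                                      n_classes=6, random_state=0)

# randomly partition the 6 label indices into disjoint sets of size k=3
rng = np.random.default_rng(0)
order = rng.permutation(y.shape[1])
partitions = [order[i:i + 3] for i in range(0, y.shape[1], 3)]

prediction = np.zeros_like(y)
for labels in partitions:
    # Label Powerset transform: each binary label combination in this
    # partition becomes one class of a single multiclass problem
    powers = 2 ** np.arange(len(labels))
    target = y[:, labels] @ powers
    clf = GaussianNB().fit(X, target)
    # decode the multiclass predictions back into binary label columns
    decoded = clf.predict(X)[:, None]
    prediction[:, labels] = (decoded // powers) % 2
```

Because the partitions are disjoint, every label column is predicted by exactly one sub-classifier, which is what distinguishes RAkELd from the overlapping variant RAkELo.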
fit(X, y)[source]

Fit classifier to multi-label data

Parameters:
  • X (numpy.ndarray or scipy.sparse) – input features, can be a dense or sparse matrix of size (n_samples, n_features)
  • y (numpy.ndarray or scipy.sparse {0,1}) – binary indicator matrix with label assignments, shape (n_samples, n_labels)
Returns:fitted instance of self
Return type:RakelD

predict(X)[source]

Predict label assignments

Parameters:X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features)
Returns:binary indicator matrix with label assignments with shape (n_samples, n_labels)
Return type:scipy.sparse of int
predict_proba(X)[source]

Predict label probabilities

Parameters:X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features)
Returns:matrix with probabilities of label assignment with shape (n_samples, n_labels)
Return type:scipy.sparse of float
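The sparse probability matrix returned by predict_proba can be thresholded to obtain hard label assignments. A small sketch with made-up probabilities; the 0.5 cutoff is an assumed convention, not one the library prescribes:

```python
import numpy as np
from scipy import sparse

# hypothetical predict_proba output for two samples and three labels
proba = sparse.csr_matrix(np.array([[0.9, 0.2, 0.6],
                                    [0.1, 0.8, 0.4]]))

# threshold at 0.5 to obtain a binary indicator matrix
prediction = sparse.csr_matrix(proba.toarray() >= 0.5)
```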