RAkELo: random overlapping label space division with Label Powerset
class skmultilearn.ensemble.RakelO(base_classifier=None, model_count=None, labelset_size=3, base_classifier_require_dense=None)
Bases: skmultilearn.base.base.MLClassifierBase
Overlapping RAndom k-labELsets multi-label classifier
Divides the label space into m random, possibly overlapping subsets of size k, trains a Label Powerset classifier on each subset, and assigns a label to an instance if more than half (a majority) of the classifiers whose subsets contain that label assigned it to the instance.
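The voting rule above can be sketched in plain Python. This is a minimal, library-independent illustration, not skmultilearn's implementation: the helper names `random_overlapping_labelsets` and `majority_vote` and the `seed` parameter are hypothetical, labelsets are assumed to be drawn as distinct random k-subsets, and each model's output for one sample is assumed to be a list of 0/1 decisions for the labels in its own labelset.

```python
import random
from math import comb
from typing import List, Sequence


def random_overlapping_labelsets(n_labels: int, k: int, m: int,
                                 seed: int = 0) -> List[List[int]]:
    """Draw m distinct random k-subsets of label indices; subsets may share labels."""
    if m > comb(n_labels, k):
        raise ValueError("m exceeds the number of distinct k-labelsets")
    rng = random.Random(seed)
    seen = set()
    labelsets: List[List[int]] = []
    while len(labelsets) < m:
        subset = tuple(sorted(rng.sample(range(n_labels), k)))
        if subset not in seen:  # reject exact duplicates, allow overlaps
            seen.add(subset)
            labelsets.append(list(subset))
    return labelsets


def majority_vote(labelsets: Sequence[Sequence[int]],
                  decisions: Sequence[Sequence[int]],
                  n_labels: int) -> List[int]:
    """Combine one sample's per-model 0/1 decisions into a final label vector.

    decisions[i][j] is model i's decision for label labelsets[i][j].
    """
    votes = [0] * n_labels   # models that assigned each label
    voters = [0] * n_labels  # models whose labelset contains each label
    for labelset, model_decisions in zip(labelsets, decisions):
        for label, decision in zip(labelset, model_decisions):
            votes[label] += decision
            voters[label] += 1
    # assign a label only when strictly more than half of its voters chose it
    return [1 if 2 * votes[label] > voters[label] else 0
            for label in range(n_labels)]
```

For example, with labelsets `[[0, 1], [1, 2], [0, 2]]` and decisions `[[1, 1], [0, 1], [1, 0]]`, label 0 receives 2 of 2 votes and is assigned, while labels 1 and 2 each receive only 1 of 2, so the result is `[1, 0, 0]`.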
Parameters:
- base_classifier (BaseEstimator) – scikit-learn compatible base classifier; will be set under self.classifier.classifier.
- base_classifier_require_dense ([bool, bool]) – whether the base classifier requires the [input, output] matrices in dense representation. Will be automatically set under self.classifier.require_dense.
- labelset_size (int) – the desired size of each partition, parameter k in the paper. According to the paper, the best value is 3, so it is the default. Will be automatically set under self.labelset_size.
- model_count (int) – the desired number of classifiers, parameter m in the paper. According to the paper, the best value is 2M, where M is the number of labels. Will be automatically set under self.model_count_.
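The recommended defaults can be checked with a little arithmetic. The sketch below uses a hypothetical helper, `rakelo_defaults`, that takes k = 3 and m = 2M, caps m at the number of distinct k-labelsets C(M, k) (the cap is an assumption of this sketch, not documented behaviour of RakelO), and reports the average number of models voting on each label, m·k/M.

```python
from math import comb
from typing import Tuple


def rakelo_defaults(n_labels: int, labelset_size: int = 3) -> Tuple[int, int, float]:
    """Return (model_count, distinct labelsets available, average voters per label)."""
    available = comb(n_labels, labelset_size)    # C(M, k) distinct k-labelsets
    model_count = min(2 * n_labels, available)   # m = 2M, capped (assumption)
    # each model covers k labels, so m * k label slots are spread over M labels
    avg_voters = model_count * labelset_size / n_labels
    return model_count, available, avg_voters


print(rakelo_defaults(6))  # (12, 20, 6.0)
```

Without the cap, m·k/M = 2M·3/M = 6, so each label is voted on by six models on average regardless of the size of the label space. For very small label spaces the cap matters: with M = 4 there are only C(4, 3) = 4 distinct labelsets, fewer than 2M = 8.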
classifier – the voting classifier, initialized with a LabelPowerset multi-label classifier wrapping base_classifier and a RandomLabelSpaceClusterer
Type: MajorityVotingClassifier
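Each member of the ensemble is a Label Powerset classifier, which turns its multi-label problem into a single multi-class problem by treating every distinct label combination as one class. The plain-Python sketch below shows only that transformation; the function names are hypothetical and this is not skmultilearn's LabelPowerset code. A multi-class base classifier would then be trained on the resulting class ids.

```python
from typing import Dict, List, Sequence, Tuple

Combo = Tuple[int, ...]


def powerset_encode(rows: Sequence[Sequence[int]]) -> Tuple[List[int], Dict[Combo, int]]:
    """Map each distinct 0/1 label combination to an integer class id."""
    combo_to_class: Dict[Combo, int] = {}
    classes: List[int] = []
    for row in rows:
        key = tuple(row)
        if key not in combo_to_class:
            combo_to_class[key] = len(combo_to_class)  # next unused class id
        classes.append(combo_to_class[key])
    return classes, combo_to_class


def powerset_decode(classes: Sequence[int],
                    combo_to_class: Dict[Combo, int]) -> List[List[int]]:
    """Turn predicted class ids back into 0/1 label rows."""
    class_to_combo = {cls: list(combo) for combo, cls in combo_to_class.items()}
    return [class_to_combo[cls] for cls in classes]


rows = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
classes, mapping = powerset_encode(rows)
print(classes)                            # [0, 1, 0]
print(powerset_decode(classes, mapping))  # [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
```

Because a model over k labels can see at most 2^k label combinations, small labelsets (k = 3 gives at most 8 classes) keep each member's multi-class problem small.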
References
If you use this class please cite the paper introducing the method:
```bibtex
@ARTICLE{5567103,
  author={G. Tsoumakas and I. Katakis and I. Vlahavas},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  title={Random k-Labelsets for Multilabel Classification},
  year={2011},
  volume={23},
  number={7},
  pages={1079-1089},
  doi={10.1109/TKDE.2010.164},
  ISSN={1041-4347},
  month={July},
}
```
Examples
Here’s a simple example of how to use this class with a base classifier from scikit-learn to train 6 classifiers, each on roughly a quarter of the labels. Six such labelsets cover about one and a half times as many label slots as there are labels, so the labelsets overlap:
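```python
from sklearn.naive_bayes import GaussianNB
from skmultilearn.ensemble import RakelO

# X_train, y_train and X_test are assumed to be defined already;
# y_train is a binary indicator matrix of shape (n_samples, n_labels)
classifier = RakelO(
    base_classifier=GaussianNB(),
    base_classifier_require_dense=[True, True],  # GaussianNB needs dense input
    labelset_size=y_train.shape[1] // 4,
    model_count=6,
)

classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test)
```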
fit(X, y)
Fits classifier to training data
Parameters:
- X (array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)) – input feature matrix
- y (array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)) – binary indicator matrix with label assignments
Returns: fitted instance of self
Return type: self
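fit expects y as a binary indicator matrix. If labels are stored as per-sample lists of label indices, a small conversion like the one below produces that layout as nested lists; the helper `to_indicator` is hypothetical (scikit-learn's `MultiLabelBinarizer` does the same job). The result can be wrapped with `numpy.array(...)` or `scipy.sparse.csr_matrix(...)` before calling fit.

```python
from typing import Iterable, List, Sequence


def to_indicator(label_lists: Sequence[Iterable[int]], n_labels: int) -> List[List[int]]:
    """Build an (n_samples, n_labels) 0/1 matrix from per-sample label indices."""
    matrix = [[0] * n_labels for _ in label_lists]
    for row, labels in zip(matrix, label_lists):
        for label in labels:
            row[label] = 1  # mark this label as assigned to this sample
    return matrix


print(to_indicator([[0, 2], [1], []], n_labels=3))
# [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
```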
predict(X)
Predict labels for X
Parameters: X (array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)) – input feature matrix
Returns: binary indicator matrix with label assignments
Return type: scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)
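predict returns a sparse indicator matrix. After converting it to nested lists (for a scipy.sparse matrix, `prediction.toarray().tolist()`), each row can be mapped back to label names. The helper `indicator_to_names` and the label names below are hypothetical.

```python
from typing import List, Sequence


def indicator_to_names(rows: Sequence[Sequence[int]],
                       label_names: Sequence[str]) -> List[List[str]]:
    """List the names of the labels set to 1 in each indicator row."""
    return [[name for name, flag in zip(label_names, row) if flag]
            for row in rows]


names = ["sports", "politics", "tech"]  # hypothetical label names
print(indicator_to_names([[1, 0, 1], [0, 0, 0]], names))
# [['sports', 'tech'], []]
```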
predict_proba(X)
Predict probabilities of label assignments for X
Parameters: X (array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)) – input feature matrix
Returns: matrix with label assignment probabilities
Return type: scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels)
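A probability matrix like the one predict_proba returns can be turned into label assignments by thresholding, for example after converting it with `toarray().tolist()`. The 0.5 cut-off and the helper name `threshold_probabilities` are illustrative assumptions, not part of the documented API; raising the threshold generally trades recall for precision.

```python
from typing import List, Sequence


def threshold_probabilities(probabilities: Sequence[Sequence[float]],
                            threshold: float = 0.5) -> List[List[int]]:
    """Assign a label (1) wherever its probability is strictly above the threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in probabilities]


print(threshold_probabilities([[0.9, 0.5, 0.2], [0.6, 0.7, 0.1]]))
# [[1, 0, 0], [1, 1, 0]]
```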