Label Powerset

class skmultilearn.problem_transform.LabelPowerset(classifier=None, require_dense=None)

Bases: skmultilearn.base.problem_transformation.ProblemTransformationBase

Transform multi-label problem to a multi-class problem

Label Powerset is a problem transformation approach to multi-label classification. It transforms a multi-label problem into a multi-class problem, with a single multi-class classifier trained on all unique label combinations found in the training data.

The method maps each label combination to a unique combination id, and performs multi-class classification using the base classifier as the multi-class classifier and the combination ids as classes.
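The mapping described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: fit_label_powerset is a hypothetical helper, and the string key format (label indexes joined by commas) is an assumption chosen for readability.

```python
from typing import Dict, List, Sequence, Tuple


def fit_label_powerset(
    y: Sequence[Sequence[int]],
) -> Tuple[List[int], Dict[str, int], List[List[int]]]:
    """Map each row of a binary indicator matrix to a combination id.

    Hypothetical helper for illustration; not part of skmultilearn.
    """
    unique_combinations: Dict[str, int] = {}    # combination string -> id
    reverse_combinations: List[List[int]] = []  # id -> label indexes
    class_ids: List[int] = []

    for row in y:
        # Indexes of the labels assigned to this sample.
        labels = [j for j, value in enumerate(row) if value == 1]
        key = ",".join(str(j) for j in labels)  # assumed key format
        if key not in unique_combinations:
            # First time this combination is seen: give it the next id.
            unique_combinations[key] = len(reverse_combinations)
            reverse_combinations.append(labels)
        class_ids.append(unique_combinations[key])

    return class_ids, unique_combinations, reverse_combinations


y_train = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
]
class_ids, unique_combinations, reverse_combinations = fit_label_powerset(y_train)
print(class_ids)             # [0, 1, 0, 2]
print(unique_combinations)   # {'0,2': 0, '1': 1, '': 2}
print(reverse_combinations)  # [[0, 2], [1], []]
```

The resulting class_ids vector is what a multi-class base classifier would be trained on in place of the original indicator matrix.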

Parameters:

classifier (BaseEstimator) – scikit-learn compatible base classifier
require_dense ([bool, bool], optional) – whether the base classifier requires dense representations for the input features and the classes/labels matrices in fit/predict. If no value is provided, sparse representations are used if the base classifier is an instance of skmultilearn.base.MLClassifierBase, and dense representations otherwise.

unique_combinations_

mapping from label combination (as a string) to label combination id; set by transform(), called via fit()

Type:	Dict[str, int]

reverse_combinations_

list, ordered by label combination id, of the label indexes in each combination; set by transform(), called via fit()

Type:	List[List[int]]

Notes

n_classes in this document denotes the number of unique label combinations present in the training y passed to fit(). In practice it is equal to len(self.unique_combinations_).
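As a rough illustration of how n_classes relates to the data, the snippet below counts distinct rows of a small, made-up indicator matrix. The variable names are hypothetical, and the upper bound is a general property of label combinations rather than something the library reports.

```python
# Made-up binary indicator matrix: 5 samples, 3 labels.
y_train = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]

n_samples = len(y_train)
n_labels = len(y_train[0])

# Each distinct row is one label combination, i.e. one class.
n_classes = len({tuple(row) for row in y_train})

# There can be no more classes than samples, and no more than
# the 2**n_labels possible subsets of labels.
upper_bound = min(n_samples, 2 ** n_labels)

print(n_classes, upper_bound)  # 3 5
```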

Examples

An example use case for Label Powerset with an sklearn.ensemble.RandomForestClassifier base classifier which supports sparse input:

from skmultilearn.problem_transform import LabelPowerset
from sklearn.ensemble import RandomForestClassifier

# initialize LabelPowerset multi-label classifier with a RandomForest base classifier
classifier = LabelPowerset(
    classifier=RandomForestClassifier(n_estimators=100),
    require_dense=[False, True]
)

# train (X_train: feature matrix, y_train: binary indicator label matrix)
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

Another way to use this classifier is to select the best scenario from a set of multi-class classifiers used with Label Powerset. This can be done using cross-validated grid search. In the example below, the model with the highest accuracy is selected from either a sklearn.ensemble.RandomForestClassifier or a sklearn.naive_bayes.MultinomialNB base classifier, along with the best parameters for that base classifier.

from skmultilearn.problem_transform import LabelPowerset
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier

parameters = [
    {
        'classifier': [MultinomialNB()],
        'classifier__alpha': [0.7, 1.0],
    },
    {
        'classifier': [RandomForestClassifier()],
        'classifier__criterion': ['gini', 'entropy'],
        'classifier__n_estimators': [10, 20, 50],
    },
]

# x: feature matrix, y: binary indicator label matrix
clf = GridSearchCV(LabelPowerset(), parameters, scoring='accuracy')
clf.fit(x, y)

print(clf.best_params_, clf.best_score_)

# result
# {
#   'classifier': RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
#             max_depth=None, max_features='auto', max_leaf_nodes=None,
#             min_impurity_decrease=0.0, min_impurity_split=None,
#             min_samples_leaf=1, min_samples_split=2,
#             min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=1,
#             oob_score=False, random_state=None, verbose=0,
#             warm_start=False), 'classifier__criterion': 'gini', 'classifier__n_estimators': 50
# } 0.16

fit(X, y)

Fits classifier to training data

Parameters:

X (array_like, `numpy.matrix` or `scipy.sparse` matrix, shape=(n_samples, n_features)) – input feature matrix
y (array_like, `numpy.matrix` or `scipy.sparse` matrix of {0, 1}, shape=(n_samples, n_labels)) – binary indicator matrix with label assignments

Returns:	fitted instance of self
Return type:	self

Notes

Input matrices are converted to sparse format internally if a numpy representation is passed.

inverse_transform(y)

Transforms multi-class assignment to multi-label

Transforms a single-label multi-class assignment, in which each class is a label combination, back into a multi-label binary indicator matrix.

Parameters:	y (numpy.ndarray of {0, … , n_classes-1}, shape=(n_samples,)) – multi-class output space vector of label combination ids
Returns:	binary indicator matrix with label assignments
Return type:	`scipy.sparse` matrix of {0, 1}, shape=(n_samples, n_labels)
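The reverse mapping can be sketched as follows. This is a minimal illustration, not the library's code: inverse_transform_sketch is a hypothetical helper, it takes the id-to-labels list explicitly, and it returns dense Python lists where the documented method returns a `scipy.sparse` matrix.

```python
from typing import List


def inverse_transform_sketch(
    class_ids: List[int],
    reverse_combinations: List[List[int]],
    n_labels: int,
) -> List[List[int]]:
    """Expand combination ids into rows of a binary indicator matrix.

    Hypothetical helper for illustration; not part of skmultilearn.
    """
    rows = []
    for class_id in class_ids:
        row = [0] * n_labels
        # Switch on every label that belongs to this combination.
        for label in reverse_combinations[class_id]:
            row[label] = 1
        rows.append(row)
    return rows


# id 0 -> labels {0, 2}, id 1 -> label {1}, id 2 -> no labels
reverse_combinations = [[0, 2], [1], []]
restored = inverse_transform_sketch([1, 0, 2, 0], reverse_combinations, n_labels=3)
print(restored)  # [[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 0, 1]]
```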

predict(X)

Predict labels for X

Parameters:	X (array_like, `numpy.matrix` or `scipy.sparse` matrix, shape=(n_samples, n_features)) – input feature matrix
Returns:	binary indicator matrix with label assignments
Return type:	`scipy.sparse` matrix of {0, 1}, shape=(n_samples, n_labels)

predict_proba(X)

Predict probabilities of label assignments for X

Parameters:	X (array_like, `numpy.matrix` or `scipy.sparse` matrix, shape=(n_samples, n_features)) – input feature matrix
Returns:	matrix with label assignment probabilities
Return type:	`scipy.sparse` matrix of float in [0.0, 1.0], shape=(n_samples, n_labels)
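One plausible way to obtain per-label probabilities from a multi-class classifier is to sum, for each label, the probabilities of all combinations that contain it. The sketch below assumes this aggregation rule and uses a hypothetical helper, label_probabilities, together with made-up class probabilities. It is not taken from the library's source.

```python
from typing import List


def label_probabilities(
    class_proba: List[List[float]],
    reverse_combinations: List[List[int]],
    n_labels: int,
) -> List[List[float]]:
    """Aggregate per-combination probabilities into per-label probabilities.

    Hypothetical helper for illustration; not part of skmultilearn.
    """
    result = []
    for sample_proba in class_proba:
        row = [0.0] * n_labels
        for class_id, probability in enumerate(sample_proba):
            # Every label in this combination receives its probability mass.
            for label in reverse_combinations[class_id]:
                row[label] += probability
        result.append(row)
    return result


# id 0 -> labels {0, 2}, id 1 -> label {1}, id 2 -> labels {0, 1}
reverse_combinations = [[0, 2], [1], [0, 1]]
# Made-up multi-class probabilities for two samples.
class_proba = [
    [0.5, 0.25, 0.25],
    [0.0, 1.0, 0.0],
]
label_proba = label_probabilities(class_proba, reverse_combinations, n_labels=3)
print(label_proba)  # [[0.75, 0.5, 0.5], [0.0, 1.0, 0.0]]
```

Note that the per-label values in a row need not sum to 1, because a single combination can contribute to several labels.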

transform(y)

Transform multi-label output space to multi-class

Transforms a multi-label problem into a single-label multi-class problem where each label combination is a separate class.

Parameters:	y (array_like, `numpy.matrix` or `scipy.sparse` matrix of {0, 1}, shape=(n_samples, n_labels)) – binary indicator matrix with label assignments
Returns:	a multi-class output space vector
Return type:	numpy.ndarray of {0, … , n_classes-1}, shape=(n_samples,)