Label Powerset
class skmultilearn.problem_transform.LabelPowerset(classifier=None, require_dense=None)
Bases: skmultilearn.base.problem_transformation.ProblemTransformationBase
Transform a multi-label problem into a multi-class problem
Label Powerset is a problem transformation approach to multi-label classification. It turns a multi-label problem into a multi-class problem in which a single multi-class classifier is trained on all unique label combinations found in the training data.
The method maps each label combination to a unique combination id, then performs multi-class classification with the base classifier, using the combination ids as classes.
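The mapping described above can be sketched in plain Python. This is a minimal illustration rather than the library's code: the helper name `to_combination_ids` is hypothetical, and assigning ids in first-seen order is an assumption made for the example.

```python
def to_combination_ids(y_rows):
    """Map each row of a binary indicator matrix to a combination id.

    Hypothetical helper for illustration only; ids are assigned in
    first-seen order, which is an assumption of this sketch.
    """
    unique_combinations = {}   # "0,2" -> combination id
    reverse_combinations = []  # combination id -> list of label indexes
    class_ids = []
    for row in y_rows:
        # Indexes of the labels that are switched on in this row.
        labels = [j for j, value in enumerate(row) if value]
        key = ",".join(map(str, labels))
        if key not in unique_combinations:
            unique_combinations[key] = len(reverse_combinations)
            reverse_combinations.append(labels)
        class_ids.append(unique_combinations[key])
    return class_ids, unique_combinations, reverse_combinations


y = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
]
class_ids, unique_combinations, reverse_combinations = to_combination_ids(y)
print(class_ids)             # [0, 1, 0, 2]
print(reverse_combinations)  # [[0, 2], [1], []]
```

Here the four samples contain three distinct label combinations, so the derived multi-class problem has three classes.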
Parameters:
- classifier (BaseEstimator) – scikit-learn compatible base classifier
- require_dense ([bool, bool], optional) – whether the base classifier requires dense representations for input features and classes/labels matrices in fit/predict. If no value is provided, sparse representations are used if the base classifier is an instance of skmultilearn.base.MLClassifierBase, and dense representations otherwise.
unique_combinations_
mapping from label combination (as a string) to label combination id; set by transform(), which is called via fit()
Type: Dict[str, int]
reverse_combinations_
list, ordered by label combination id, of the label indexes for each combination; set by transform(), which is called via fit()
Type: List[List[int]]
Notes
Note: n_classes in this document denotes the number of unique label combinations present in the training y passed to fit(). In practice it is equal to len(self.unique_combinations_).
Examples
An example use case for Label Powerset with an sklearn.ensemble.RandomForestClassifier base classifier, which supports sparse input:
    from skmultilearn.problem_transform import LabelPowerset
    from sklearn.ensemble import RandomForestClassifier

    # initialize LabelPowerset multi-label classifier with a RandomForest
    classifier = LabelPowerset(
        classifier=RandomForestClassifier(n_estimators=100),
        require_dense=[False, True]
    )

    # train
    classifier.fit(X_train, y_train)

    # predict
    predictions = classifier.predict(X_test)
Another way to use this classifier is to select the best scenario from a set of multi-class classifiers used with Label Powerset, which can be done with a cross-validated grid search. In the example below, the model with the highest accuracy is selected from either a sklearn.ensemble.RandomForestClassifier or a sklearn.naive_bayes.MultinomialNB base classifier, along with the best parameters for that base classifier:
    from skmultilearn.problem_transform import LabelPowerset
    from sklearn.model_selection import GridSearchCV
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.ensemble import RandomForestClassifier

    # one parameter grid per candidate base classifier
    parameters = [
        {
            'classifier': [MultinomialNB()],
            'classifier__alpha': [0.7, 1.0],
        },
        {
            'classifier': [RandomForestClassifier()],
            'classifier__criterion': ['gini', 'entropy'],
            'classifier__n_estimators': [10, 20, 50],
        },
    ]

    clf = GridSearchCV(LabelPowerset(), parameters, scoring='accuracy')
    clf.fit(x, y)

    print(clf.best_params_, clf.best_score_)

    # result
    # {
    #   'classifier': RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
    #             max_depth=None, max_features='auto', max_leaf_nodes=None,
    #             min_impurity_decrease=0.0, min_impurity_split=None,
    #             min_samples_leaf=1, min_samples_split=2,
    #             min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=1,
    #             oob_score=False, random_state=None, verbose=0,
    #             warm_start=False), 'classifier__criterion': 'gini', 'classifier__n_estimators': 50
    # } 0.16
fit(X, y)
Fits classifier to training data
Parameters:
- X (array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)) – input feature matrix
- y (array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)) – binary indicator matrix with label assignments
Returns: fitted instance of self
Return type: self
Notes
Note: Input matrices are converted to sparse format internally if a numpy representation is passed
inverse_transform(y)
Transforms multi-class assignment to multi-label
Transforms a single-label multi-class output vector, in which each class is a label combination, back into a multi-label binary indicator matrix.
Parameters: y (numpy.ndarray of {0, … , n_classes-1}, shape=(n_samples,)) – multi-class output vector of label combination ids
Returns: binary indicator matrix with label assignments
Return type: scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)
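The inverse direction can be pictured with a short plain-Python sketch. It assumes a list shaped like reverse_combinations_ (combination id to label indexes) and returns dense rows for readability, whereas the method documented above returns a scipy.sparse matrix. The helper name `to_indicator_rows` is hypothetical.

```python
def to_indicator_rows(class_ids, reverse_combinations, n_labels):
    """Expand combination ids into binary indicator rows.

    Hypothetical helper for illustration; dense lists stand in for
    the sparse matrix that the real method returns.
    """
    rows = []
    for class_id in class_ids:
        row = [0] * n_labels
        # Switch on every label that belongs to this combination.
        for label in reverse_combinations[class_id]:
            row[label] = 1
        rows.append(row)
    return rows


# Example lookup: id 0 -> labels {0, 2}, id 1 -> label {1}, id 2 -> no labels.
reverse_combinations = [[0, 2], [1], []]
y_multilabel = to_indicator_rows([1, 0, 2, 0], reverse_combinations, n_labels=3)
print(y_multilabel)  # [[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 0, 1]]
```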
predict(X)
Predict labels for X
Parameters: X (array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)) – input feature matrix
Returns: binary indicator matrix with label assignments
Return type: scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)
predict_proba(X)
Predict probabilities of label assignments for X
Parameters: X (array_like, numpy.matrix or scipy.sparse matrix, shape=(n_samples, n_features)) – input feature matrix
Returns: matrix with label assignment probabilities
Return type: scipy.sparse matrix of float in [0.0, 1.0], shape=(n_samples, n_labels)
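One plausible way to obtain per-label probabilities from a multi-class classifier is to sum, for each label, the probabilities of every combination that contains it. The sketch below illustrates that idea only. It is an assumption about the approach, not a statement of how the library computes its result, and the helper name `label_probabilities` is hypothetical.

```python
def label_probabilities(class_proba, reverse_combinations, n_labels):
    """Marginalise combination probabilities into per-label probabilities.

    Hypothetical helper: each label receives the total probability of
    all combinations in which it appears.
    """
    result = []
    for row in class_proba:
        label_row = [0.0] * n_labels
        for class_id, probability in enumerate(row):
            for label in reverse_combinations[class_id]:
                label_row[label] += probability
        result.append(label_row)
    return result


# Three combinations over three labels: {0, 2}, {1} and {0}.
reverse_combinations = [[0, 2], [1], [0]]
class_proba = [[0.5, 0.25, 0.25]]  # one sample, one probability per combination
print(label_probabilities(class_proba, reverse_combinations, n_labels=3))
# [[0.75, 0.25, 0.5]]
```

The per-label values need not sum to 1 across a row, because a sample can carry several labels at once.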
transform(y)
Transform multi-label output space to multi-class
Transforms a multi-label problem into a single-label multi-class problem where each label combination is a separate class.
Parameters: y (array_like, numpy.matrix or scipy.sparse matrix of {0, 1}, shape=(n_samples, n_labels)) – binary indicator matrix with label assignments
Returns: a multi-class output space vector
Return type: numpy.ndarray of {0, … , n_classes-1}, shape=(n_samples,)