Network-based label space partition ensemble classification


class skmultilearn.ensemble.LabelSpacePartitioningClassifier(classifier=None, clusterer=None, require_dense=None)

Bases: skmultilearn.problem_transform.br.BinaryRelevance

Partition label space and classify each subspace separately

This classifier performs classification by:

1. partitioning the label space into separate, smaller multi-label subproblems, using the supplied label space clusterer
2. training an instance of the supplied base multi-label classifier for each label space subset in the partition
3. predicting with each of the subclassifiers and returning the sum of their results
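
The three steps above can be sketched in plain Python. This is a minimal illustration of the idea, not the library's implementation: `NearestNeighbourMultiLabel`, `fit_partitioned` and `predict_partitioned` are hypothetical names introduced here, and the toy 1-nearest-neighbour base classifier stands in for whatever multi-label classifier would be supplied in practice.

```python
class NearestNeighbourMultiLabel:
    """Hypothetical toy base classifier: copies the labels of the closest training sample."""

    def fit(self, X, Y):
        self.X_ = [list(row) for row in X]
        self.Y_ = [list(row) for row in Y]
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            # squared Euclidean distance to every training sample
            dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in self.X_]
            predictions.append(list(self.Y_[dists.index(min(dists))]))
        return predictions


def fit_partitioned(X, Y, partition, make_classifier):
    """Step 2: train one base classifier per label subset in the partition."""
    models = []
    for subset in partition:
        # keep only the label columns that belong to this subset
        Y_sub = [[row[j] for j in subset] for row in Y]
        models.append(make_classifier().fit(X, Y_sub))
    return models


def predict_partitioned(X, models, partition, n_labels):
    """Step 3: add each subclassifier's output into its own columns of a zero matrix."""
    result = [[0] * n_labels for _ in X]
    for model, subset in zip(models, partition):
        for i, sub_row in enumerate(model.predict(X)):
            for value, label in zip(sub_row, subset):
                result[i][label] += value
    return result


# Step 1 is assumed to have been done already: a fixed partition of four labels
partition = [[0, 2], [1, 3]]
X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
Y = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]]

models = fit_partitioned(X, Y, partition, NearestNeighbourMultiLabel)
predictions = predict_partitioned(X, models, partition, n_labels=4)
```

Because the label subsets are disjoint, summing the per-subset outputs is equivalent to writing each subclassifier's predictions into its own columns of the full label matrix.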

Parameters:

classifier (BaseEstimator) – the base multi-label classifier trained on each label subset; stored as self.classifier.
clusterer (LabelSpaceClustererBase) – the object that partitions the output space; stored as self.clusterer.
require_dense ([bool, bool]) – whether the base classifier requires the [input, output] matrices in dense representation; stored as self.require_dense.

model_count_

number of trained models, in this classifier equal to the number of partitions

Type:	int

partition_

list of lists of label indexes, used to index the output space matrix, set in _generate_partition() via fit()

Type:	List[List[int]], shape=(model_count_,)

classifiers

list of classifiers trained per partition, set in fit()

Type:	List[`BaseEstimator`], shape=(model_count_,)
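
As a rough illustration of how a partition in the format of partition_ indexes the output space matrix, the sketch below checks that a list of label-index lists covers every label exactly once and then slices a small label matrix by subset. The helpers `check_partition` and `select_columns` are hypothetical and not part of the library, and the sketch assumes a partition in the strict sense (disjoint subsets covering all labels).

```python
def check_partition(partition, n_labels):
    """Raise ValueError unless each label index 0..n_labels-1 occurs exactly once."""
    seen = sorted(index for subset in partition for index in subset)
    if seen != list(range(n_labels)):
        raise ValueError("subsets must be disjoint and cover every label index")


def select_columns(Y, subset):
    """Return the columns of the label matrix Y listed in subset, in that order."""
    return [[row[j] for j in subset] for row in Y]


# Same shape of data as partition_: one list of label indexes per trained model
partition = [[1, 3, 4], [0, 2, 5]]
check_partition(partition, n_labels=6)

# Toy label matrix with two samples and six labels
Y = [
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 0],
]

# One sub-matrix per subset; the number of subsets plays the role of model_count_
sub_matrices = [select_columns(Y, subset) for subset in partition]
model_count = len(partition)
```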

References

If you use this classifier, please cite the clustering paper:

@Article{datadriven,
    author = {Szymański, Piotr and Kajdanowicz, Tomasz and Kersting, Kristian},
    title = {How Is a Data-Driven Approach Better than Random Choice in
    Label Space Division for Multi-Label Classification?},
    journal = {Entropy},
    volume = {18},
    year = {2016},
    number = {8},
    article_number = {282},
    url = {http://www.mdpi.com/1099-4300/18/8/282},
    issn = {1099-4300},
    doi = {10.3390/e18080282}
}

Examples

Here’s an example of building a partitioned ensemble of Classifier Chains:

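from skmultilearn.ensemble import LabelSpacePartitioningClassifier
from skmultilearn.cluster import FixedLabelSpaceClusterer
from skmultilearn.problem_transform import ClassifierChain
from sklearn.naive_bayes import GaussianNB

# X_train, y_train and X_test are assumed to be loaded already;
# y_train must have six label columns (indexes 0-5) to match the clusters below
classifier = LabelSpacePartitioningClassifier(
    # fixed partition of the six labels into two disjoint subsets
    clusterer=FixedLabelSpaceClusterer(clusters=[[1, 3, 4], [0, 2, 5]]),
    # one Classifier Chain is trained per label subset
    classifier=ClassifierChain(classifier=GaussianNB())
)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)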

More advanced examples can be found in the label relations exploration guide.

predict(X)

Predict labels for X

Parameters:	X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape `(n_samples, n_features)`
Returns:	binary indicator matrix with label assignments with shape `(n_samples, n_labels)`
Return type:	scipy.sparse of int