skmultilearn.cluster.igraph module

class skmultilearn.cluster.IGraphLabelGraphClusterer(graph_builder, method)

Bases: skmultilearn.cluster.base.LabelGraphClustererBase

Clusters the label space using igraph community detection methods

This clusterer constructs an igraph representation of the label graph generated by the graph builder and detects communities in it using community detection methods from the igraph library. The detected communities are then converted to a label space clustering. The approach is described in the paper on data-driven label space division listed under References.

Parameters:

graph_builder (a GraphBuilderBase inherited transformer) – the graph builder to provide the adjacency matrix and weight map for the underlying graph

method (string) – the community detection method to use. This clusterer supports the following methods:

Method name string	Description
fastgreedy	Detecting communities with largest modularity using incremental greedy search
infomap	Detecting communities through compression of information flow simulated via random walks
label_propagation	Detecting communities from colorings via multiple label propagation on the graph
leading_eigenvector	Detecting communities with largest modularity through adjacency matrix eigenvectors
multilevel	Recursive community detection with step-by-step modularity maximization
walktrap	Finding communities by trapping many random walks
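Several of the methods above are described as maximizing modularity. As a rough illustration of that quantity (not igraph's implementation), the sketch below computes weighted modularity for a given partition in plain Python. The function name `modularity`, the edge map layout, and the toy graph are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edge_weights, membership):
    """Weighted modularity of a partition of an undirected graph.

    edge_weights: dict mapping (i, j) node pairs to a positive weight,
                  each undirected edge listed once, no self-edges.
    membership:   list where membership[v] is the community id of node v.
    """
    total = sum(edge_weights.values())  # total edge weight m
    if total == 0:
        return 0.0
    inside = defaultdict(float)  # weight of edges inside each community
    degree = defaultdict(float)  # summed weighted degree per community
    for (i, j), w in edge_weights.items():
        degree[membership[i]] += w
        degree[membership[j]] += w
        if membership[i] == membership[j]:
            inside[membership[i]] += w
    # Q = sum over communities of (inside / m) - (degree / 2m)^2
    return sum(inside[c] / total - (degree[c] / (2 * total)) ** 2
               for c in degree)


# hypothetical label graph: two triangles joined by one weak edge
edges = {(0, 1): 3, (0, 2): 3, (1, 2): 3,
         (3, 4): 3, (3, 5): 3, (4, 5): 3,
         (2, 3): 1}
good = modularity(edges, [0, 0, 0, 1, 1, 1])  # one community per triangle
bad = modularity(edges, [0, 1, 0, 1, 0, 1])   # communities cut across triangles
```

Here `good` is higher than `bad`, which is the sense in which a modularity-maximizing method would prefer grouping each triangle of labels together.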

graph_ (igraph.Graph) – the igraph Graph object containing the graph representation of the graph builder's adjacency matrix and weights

weights_ ({'weight': list of values in edge order of graph edges}) – edge weights stored in a format recognizable by the igraph module
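To make the "edge order" requirement concrete, here is a minimal sketch of how an edge map might be split into an edge list and a weight dictionary of the shape described above. The helper `to_igraph_inputs` is hypothetical, and sorting the edges is an illustrative choice; what matters is that the i-th weight belongs to the i-th edge.

```python
def to_igraph_inputs(edge_map):
    """Split an edge map into an edge list and an igraph-style weight dict.

    edge_map: dict mapping (i, j) label pairs to an edge weight
              (assumed layout for this sketch).
    """
    edges = sorted(edge_map)  # fix one edge order and reuse it for the weights
    weights = {'weight': [edge_map[e] for e in edges]}
    return edges, weights


# hypothetical co-occurrence weights between three labels
edge_map = {(0, 1): 2.0, (1, 2): 1.0, (0, 2): 4.0}
edges, weights = to_igraph_inputs(edge_map)
# edges   -> [(0, 1), (0, 2), (1, 2)]
# weights -> {'weight': [2.0, 4.0, 1.0]}
```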

Note

This clusterer is GPL-licensed and will taint your code with GPL restrictions.

References

If you use this clusterer please cite the igraph paper and the clustering paper:

@Article{igraph,
    title = {The igraph software package for complex network research},
    author = {Gabor Csardi and Tamas Nepusz},
    journal = {InterJournal},
    volume = {Complex Systems},
    pages = {1695},
    year = {2006},
    url = {http://igraph.org},
}

@Article{datadriven,
    author = {Szymański, Piotr and Kajdanowicz, Tomasz and Kersting, Kristian},
    title = {How Is a Data-Driven Approach Better than Random Choice in
    Label Space Division for Multi-Label Classification?},
    journal = {Entropy},
    volume = {18},
    year = {2016},
    number = {8},
    article_number = {282},
    url = {http://www.mdpi.com/1099-4300/18/8/282},
    issn = {1099-4300},
    doi = {10.3390/e18080282}
}

Examples

Example code for using this clusterer with a classifier:

from sklearn.ensemble import RandomForestClassifier
from skmultilearn.problem_transform import LabelPowerset
from skmultilearn.cluster import IGraphLabelGraphClusterer, LabelCooccurrenceGraphBuilder
from skmultilearn.ensemble import LabelSpacePartitioningClassifier

# construct base forest classifier
base_classifier = RandomForestClassifier(n_estimators=1000)

# construct a graph builder that will include
# label relations weighted by how many times they
# co-occurred in the data, without self-edges
graph_builder = LabelCooccurrenceGraphBuilder(
    weighted=True,
    include_self_edges=False
)

# setup problem transformation approach with sparse matrices for random forest
problem_transform_classifier = LabelPowerset(classifier=base_classifier,
    require_dense=[False, False])

# setup the clusterer to use, we selected the fast greedy modularity-maximization approach
clusterer = IGraphLabelGraphClusterer(graph_builder=graph_builder, method='fastgreedy')

# setup the ensemble metaclassifier
classifier = LabelSpacePartitioningClassifier(problem_transform_classifier, clusterer)

# train (X_train, y_train and X_test are assumed to be loaded beforehand)
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

For more use cases see the label relations exploration guide.

fit_predict(X, y)

Performs clustering on y and returns list of label lists

Builds a label graph using the provided graph builder’s transform method on y and then detects communities using the selected method.

Sets self.weights_ and self.graph_.
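As a rough sketch of the graph-building step, the snippet below counts label co-occurrences in a small dense label matrix and returns them as an edge map. It only illustrates the idea of a weighted co-occurrence graph without self-edges; the function `cooccurrence_edge_map` and the list-of-rows input are assumptions for this example, not the graph builder's actual code, which works on sparse matrices.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edge_map(y_rows):
    """Count how often each pair of labels is assigned to the same sample.

    y_rows: iterable of binary rows, one per sample, one column per label.
    Returns a dict mapping (i, j) with i < j to the co-occurrence count.
    """
    counts = Counter()
    for row in y_rows:
        present = [j for j, value in enumerate(row) if value]
        # every unordered pair of labels in this sample co-occurs once
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return dict(counts)


# hypothetical label matrix: 3 samples, 3 labels
y = [[1, 1, 0],
     [1, 1, 1],
     [0, 0, 1]]
edge_map = cooccurrence_edge_map(y)
# edge_map -> {(0, 1): 2, (0, 2): 1, (1, 2): 1}
```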

Parameters:

X (None) – currently unused, left for scikit-learn compatibility

y (scipy.sparse) – label space of shape `(n_samples, n_labels)`

Returns:	label space division, each sublist represents labels that are in that community
Return type:	array of arrays of label indexes (numpy.ndarray)
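The shape of the return value can be illustrated with a small sketch that turns a per-label community assignment into a list of label lists. The helper `membership_to_partition` and the membership vector are hypothetical; community detection results often come as such a vector, but this is not the clusterer's own conversion code.

```python
def membership_to_partition(membership):
    """Group label indexes by the community they were assigned to.

    membership: list where membership[label] is that label's community id.
    Returns one list of label indexes per community, ordered by community id.
    """
    groups = {}
    for label, community in enumerate(membership):
        groups.setdefault(community, []).append(label)
    return [groups[c] for c in sorted(groups)]


# hypothetical result for five labels split into three communities
partition = membership_to_partition([0, 1, 0, 2, 1])
# partition -> [[0, 2], [1, 4], [3]]
```

Each sublist then corresponds to one subset of labels handled together, for example by one subclassifier in a partitioning ensemble.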