skmultilearn.cluster.graphtool module

class skmultilearn.cluster.StochasticBlockModel(nested, use_degree_correlation, allow_overlap, weight_model)
Bases: object
A Stochastic Blockmodel fit to Label Graph
This class holds a stochastic block model instance constructed for the block model variant specified in the parameters. It can be fit to a graph and a set of edge weights. More information on how to select parameters can be found in the extensive introduction to Stochastic Block Models in the graphtool documentation.
Parameters:  nested (boolean) – whether to build a nested Stochastic Block Model or the regular variant; stored under self.nested.
 use_degree_correlation (boolean) – whether to correct for degree correlation in modeling; stored under self.use_degree_correlation.
 allow_overlap (boolean) – whether to allow overlapping clusters or not; stored under self.allow_overlap.
 weight_model (string or None) – name of the edge weight model to fit, or None to fit an unweighted model; stored under self.weight_model.

model_
an instance of the fitted model obtained from graphtool
Type: graph_tool.inference.BlockState or its subclass

fit_predict(graph, weights)
Fits the model to a given graph and weights list
Sets self.model_ to the state of graphtool’s Stochastic Block Model after fitting.

graph
the graph to fit the model to
Type: graph_tool.Graph

weights
the edge property map (edge -> weight, double) to fit the model to, if a weighted variant is selected
Type: graph_tool.EdgePropertyMap<double>
Returns: partition of labels, each sublist contains label indices related to label positions in y
Return type: numpy.ndarray 
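
The returned partition groups label indices by the block each label was assigned to. As a rough illustration only (this is not the library's implementation, and the helper name membership_to_partition is hypothetical), a per-label block assignment can be turned into such a list of label lists like this:

```python
from collections import defaultdict


def membership_to_partition(membership):
    """Group label indices by the block id assigned to each label.

    membership[i] is the (hypothetical) block id inferred for label i.
    """
    blocks = defaultdict(list)
    for label_index, block_id in enumerate(membership):
        blocks[block_id].append(label_index)
    # Order blocks by their smallest label index for a deterministic result.
    return sorted(blocks.values(), key=lambda labels: labels[0])


# Five labels assigned to three blocks.
partition = membership_to_partition([2, 0, 2, 1, 0])
print(partition)
```

Each sublist holds column positions in y, so a downstream classifier can train one sub-model per sublist.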

class skmultilearn.cluster.GraphToolLabelGraphClusterer(graph_builder, model)
Bases: skmultilearn.cluster.base.LabelGraphClustererBase
Fits a Stochastic Block Model to the Label Graph and infers the communities
This clusterer clusters the label space by fitting a stochastic block model to the label network and inferring the community structure using graphtool. The obtained community structure is returned as the label clustering. More information on the inference itself can be found in the extensive introduction to Stochastic Block Models in the graphtool documentation.
Parameters:  graph_builder (a GraphBuilderBase inherited transformer) – the graph builder to provide the adjacency matrix and weight map for the underlying graph
 model (StochasticBlockModel) – the desired stochastic block model variant to use

graph_
object representing a label co-occurrence graph
Type: graph_tool.Graph

weights_
edge weights defined by the graph builder, stored in a graphtool-compatible format
Type: graph_tool.EdgePropertyMap<double>
Note
This functionality is still undergoing research.
Note
This clusterer is GPL-licensed and will taint your code with GPL restrictions.
References
If you use this class please cite:
Examples
An example code for using this clusterer with a classifier looks like this:
    from sklearn.ensemble import RandomForestClassifier
    from skmultilearn.problem_transform import LabelPowerset
    from skmultilearn.cluster import (
        GraphToolLabelGraphClusterer,
        LabelCooccurrenceGraphBuilder,
        StochasticBlockModel,
    )
    from skmultilearn.ensemble import LabelSpacePartitioningClassifier

    # construct base forest classifier
    base_classifier = RandomForestClassifier(n_estimators=1000)

    # construct a graph builder that will include
    # label relations weighted by how many times they
    # co-occurred in the data, without self-edges
    graph_builder = LabelCooccurrenceGraphBuilder(
        weighted=True,
        include_self_edges=False
    )

    # select parameters for the model: we fit a flat,
    # degree-corrected, non-overlapping model
    # which uses the normal distribution as the weight model
    model = StochasticBlockModel(
        nested=False,
        use_degree_correlation=True,
        allow_overlap=False,
        weight_model='real-normal'
    )

    # set up the problem transformation approach with sparse matrices for random forest
    problem_transform_classifier = LabelPowerset(classifier=base_classifier,
        require_dense=[False, False])

    # set up the clusterer, which fits the stochastic block model selected above
    clusterer = GraphToolLabelGraphClusterer(graph_builder=graph_builder, model=model)

    # set up the ensemble meta-classifier
    classifier = LabelSpacePartitioningClassifier(problem_transform_classifier, clusterer)

    # train (X_train and y_train are assumed to be defined elsewhere)
    classifier.fit(X_train, y_train)

    # predict (X_test is assumed to be defined elsewhere)
    predictions = classifier.predict(X_test)
For more use cases see the label relations exploration guide.

fit_predict(X, y)
Performs clustering on y and returns a list of label lists
Builds a label graph using the provided graph builder’s transform method on y and then detects communities using the selected method.
Sets self.weights_ and self.graph_.
Parameters:  X (None) – currently unused, left for scikit compatibility
 y (scipy.sparse) – label space of shape (n_samples, n_labels)
Returns: label space division, each sublist represents labels that are in that community
Return type: array of arrays of label indexes (numpy.ndarray)
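
To make the label graph construction step more concrete, here is a minimal pure-Python sketch of counting label co-occurrences in y. It is an illustration under simplifying assumptions, not the graph builder's actual code: the function name cooccurrence_edges is hypothetical, y is a small dense 0/1 list of rows rather than a scipy.sparse matrix, and no normalization is applied.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(y_rows, include_self_edges=False):
    """Count how often each pair of labels is assigned to the same sample.

    y_rows is a dense 0/1 label matrix given as a list of rows,
    with shape (n_samples, n_labels).
    """
    edges = Counter()
    for row in y_rows:
        # Indices of the labels assigned to this sample.
        active = [j for j, value in enumerate(row) if value]
        for a, b in combinations(active, 2):
            edges[(a, b)] += 1
        if include_self_edges:
            for a in active:
                edges[(a, a)] += 1
    return dict(edges)


y = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
edge_map = cooccurrence_edges(y)
print(edge_map)
```

The keys of edge_map play the role of graph edges between label indices and the values play the role of edge weights, which is the kind of information the clusterer stores in graph_ and weights_ before fitting the block model.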