skmultilearn.cluster.graphtool module
class skmultilearn.cluster.StochasticBlockModel(nested, use_degree_correlation, allow_overlap, weight_model)
Bases: object
A Stochastic Block Model fit to a label graph
This contains a stochastic block model instance constructed for the block model variant specified in the parameters. It can be fit to a graph and a set of weights. More information on how to select parameters can be found in the extensive introduction to Stochastic Block Models in the graph-tool documentation.
Parameters:
- nested (boolean) – whether to build a nested Stochastic Block Model or the regular variant; stored as self.nested.
- use_degree_correlation (boolean) – whether to correct for degree correlation in modeling; stored as self.use_degree_correlation.
- allow_overlap (boolean) – whether to allow overlapping clusters or not; stored as self.allow_overlap.
- weight_model (string or None) – which weight model to use for a weighted graph, or None for an unweighted one; stored as self.weight_model.
model_
an instance of the fitted model obtained from graph-tool
Type: graph_tool.inference.BlockState or its subclass
fit_predict(graph, weights)
Fits the model to a given graph and list of weights.
Sets self.model_ to the state of graph-tool's Stochastic Block Model after fitting.
graph
the graph to fit the model to
Type: graphtool.Graph
weights
the property map: edge -> weight (double) to fit the model to, if a weighted variant is selected
Type: graphtool.EdgePropertyMap<double>
Returns: partition of labels, each sublist contains label indices related to label positions in y
Return type: numpy.ndarray
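The returned partition groups label indices by the block each label was assigned to. As a minimal sketch of that grouping step (this is not the library's implementation; `blocks_to_partition` and the `membership` list are hypothetical names introduced for illustration), a per-label block assignment could be turned into a partition like this:

```python
from collections import defaultdict


def blocks_to_partition(membership):
    """Group label indices by block id.

    Assumption: membership[i] holds the block id assigned to label i.
    """
    groups = defaultdict(list)
    for label_index, block_id in enumerate(membership):
        groups[block_id].append(label_index)
    # Order the blocks by their smallest label index for a stable result.
    return sorted(groups.values(), key=lambda labels: labels[0])


# Five labels assigned to three blocks.
partition = blocks_to_partition([0, 2, 0, 1, 2])
print(partition)
```

Each sublist then holds the column indices of y that belong to one community.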
class skmultilearn.cluster.GraphToolLabelGraphClusterer(graph_builder, model)
Bases: skmultilearn.cluster.base.LabelGraphClustererBase
Fits a Stochastic Block Model to the label graph and infers the communities
This clusterer partitions the label space by fitting a stochastic block model to the label network and inferring the community structure using graph-tool. The obtained community structure is returned as the label clustering. More information on the inference itself can be found in the extensive introduction to Stochastic Block Models in the graph-tool documentation.
Parameters:
- graph_builder (a GraphBuilderBase inherited transformer) – the graph builder to provide the adjacency matrix and weight map for the underlying graph
- model (StochasticBlockModel) – the desired stochastic block model variant to use
graph_
object representing a label co-occurrence graph
Type: graphtool.Graph
weights_
edge weights defined by the graph builder, stored in a graph-tool compatible format
Type: graphtool.EdgePropertyMap<double>
Note
This functionality is still undergoing research.
Note
This clusterer is GPL-licenced and will taint your code with GPL restrictions.
References
If you use this class please cite:
Examples
Example code for using this clusterer with a classifier looks like this:
    from sklearn.ensemble import RandomForestClassifier
    from skmultilearn.problem_transform import LabelPowerset
    from skmultilearn.cluster import LabelCooccurrenceGraphBuilder
    from skmultilearn.cluster.graphtool import GraphToolLabelGraphClusterer, StochasticBlockModel
    from skmultilearn.ensemble import LabelSpacePartitioningClassifier

    # construct base forest classifier
    base_classifier = RandomForestClassifier(n_estimators=1000)

    # construct a graph builder that will include
    # label relations weighted by how many times they
    # co-occurred in the data, without self-edges
    graph_builder = LabelCooccurrenceGraphBuilder(
        weighted=True,
        include_self_edges=False
    )

    # select parameters for the model: we fit a flat (non-nested),
    # degree-corrected, non-overlapping model
    # that uses the normal distribution as the weight model
    model = StochasticBlockModel(
        nested=False,
        use_degree_correlation=True,
        allow_overlap=False,
        weight_model='real-normal'
    )

    # set up the problem transformation approach with sparse matrices for random forest
    problem_transform_classifier = LabelPowerset(
        classifier=base_classifier,
        require_dense=[False, False]
    )

    # set up the clusterer, which infers communities with the block model configured above
    clusterer = GraphToolLabelGraphClusterer(graph_builder=graph_builder, model=model)

    # set up the ensemble metaclassifier
    classifier = LabelSpacePartitioningClassifier(problem_transform_classifier, clusterer)

    # train (X_train and y_train are assumed to be defined already)
    classifier.fit(X_train, y_train)

    # predict (X_test is assumed to be defined already)
    predictions = classifier.predict(X_test)
For more use cases see the label relations exploration guide.
fit_predict(X, y)
Performs clustering on y and returns a list of label lists.
Builds a label graph using the provided graph builder's transform method on y and then detects communities using the selected method.
Sets self.weights_ and self.graph_.
Parameters:
- X (None) – currently unused, left for scikit compatibility
- y (scipy.sparse) – label space of shape (n_samples, n_labels)
Returns: label space division, each sublist represents labels that are in that community
Return type: array of arrays of label indexes (numpy.ndarray)
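To make the graph-building step concrete, the sketch below counts label co-occurrences in a small dense label matrix using plain Python. It only illustrates the idea of an edge weight map; `cooccurrence_edges` is a hypothetical helper, not the graph builder's transform method, and it assumes a dense list-of-lists input rather than a scipy.sparse matrix.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(y):
    """Count how often each pair of labels is assigned to the same sample.

    Assumption: y is a list of rows, each a list of 0/1 label indicators.
    Self-edges are skipped; keys are (i, j) tuples with i < j.
    """
    edges = Counter()
    for row in y:
        # Indices of the labels that are set for this sample.
        active = [index for index, value in enumerate(row) if value]
        for pair in combinations(active, 2):
            edges[pair] += 1
    return dict(edges)


# Three samples, three labels.
y = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
print(cooccurrence_edges(y))
```

In this toy input, labels 0 and 1 appear together in two samples, so their edge carries the largest weight and they are the most likely to end up in the same community.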