skmultilearn.cluster.graphtool module

class skmultilearn.cluster.StochasticBlockModel(nested, use_degree_correlation, allow_overlap, weight_model)

Bases: object

A Stochastic Block Model fit to a Label Graph

This class holds a stochastic block model instance for the block model variant specified by its parameters. It can be fit to a graph and a set of edge weights. For guidance on selecting parameters, see the extensive introduction to Stochastic Block Models in the graph-tool documentation.

Parameters:

nested (boolean) – whether to build a nested Stochastic Block Model or the regular variant; stored as self.nested.
use_degree_correlation (boolean) – whether to correct for degree correlation in modeling; stored as self.use_degree_correlation.
allow_overlap (boolean) – whether to allow overlapping clusters; stored as self.allow_overlap.
weight_model (string or None) – the edge weight model to fit, which decides whether the graph is treated as weighted or unweighted; stored as self.weight_model.

model_

an instance of the fitted model obtained from graph-tool

Type:	graph_tool.inference.BlockState or its subclass

fit_predict(graph, weights)

Fits the model to a given graph and list of weights

Sets self.model_ to the state of graph-tool’s Stochastic Block Model after fitting.

graph

the graph to fit the model to

Type:	graph_tool.Graph

weights

the property map: edge -> weight (double) to fit the model to, if a weighted variant is selected

Type:	graph_tool.EdgePropertyMap<double>

Returns:	partition of labels, each sublist contains label indices related to label positions in `y`
Return type:	numpy.ndarray
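The returned partition can be pictured as a regrouping of a per-label block assignment. The following is a minimal sketch of that idea in plain Python, not the library's implementation; the helper name `membership_to_partition` and the flat membership list it takes are assumptions made for illustration.

```python
from collections import defaultdict


def membership_to_partition(membership):
    """Group label indices by the block each label was assigned to.

    `membership[i]` is the (hypothetical) block id of label `i`.
    """
    blocks = defaultdict(list)
    for label_index, block_id in enumerate(membership):
        blocks[block_id].append(label_index)
    # Order communities by block id so the output is deterministic.
    return [blocks[block_id] for block_id in sorted(blocks)]


# Five labels assigned to two blocks.
partition = membership_to_partition([0, 1, 0, 1, 1])
print(partition)  # [[0, 2], [1, 3, 4]]
```

Each sublist holds the column indices of `y` that ended up in the same community.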

class skmultilearn.cluster.GraphToolLabelGraphClusterer(graph_builder, model)

Bases: skmultilearn.cluster.base.LabelGraphClustererBase

Fits a Stochastic Block Model to the Label Graph and infers the communities

This clusterer partitions the label space by fitting a stochastic block model to the label network and inferring the community structure using graph-tool. The obtained community structure is returned as the label clustering. More information on the inference itself can be found in the extensive introduction to Stochastic Block Models in the graph-tool documentation.

Parameters:

graph_builder (a GraphBuilderBase inherited transformer) – the graph builder to provide the adjacency matrix and weight map for the underlying graph
model (StochasticBlockModel) – the desired stochastic block model variant to use

graph_

object representing a label co-occurrence graph

Type:	graph_tool.Graph

weights_

edge weights defined by the graph builder, stored in a graph-tool compatible format

Type:	graph_tool.EdgePropertyMap<double>

Note

This functionality is still undergoing research.

Note

This clusterer is GPL-licenced and will taint your code with GPL restrictions.

References

If you use this class please cite:

Examples

Example code for using this clusterer with a classifier:

from sklearn.ensemble import RandomForestClassifier
from skmultilearn.problem_transform import LabelPowerset
from skmultilearn.cluster import GraphToolLabelGraphClusterer, StochasticBlockModel, LabelCooccurrenceGraphBuilder
from skmultilearn.ensemble import LabelSpacePartitioningClassifier

# construct base forest classifier
base_classifier = RandomForestClassifier(n_estimators=1000)

# construct a graph builder that will include
# label relations weighted by how many times they
# co-occurred in the data, without self-edges
graph_builder = LabelCooccurrenceGraphBuilder(
    weighted = True,
    include_self_edges = False
)

# select parameters for the model: we fit a flat,
# degree-corrected, non-overlapping partitioning model
# which uses the normal distribution as the weight model
model = StochasticBlockModel(
    nested=False,
    use_degree_correlation=True,
    allow_overlap=False,
    weight_model='real-normal'
)

# setup problem transformation approach with sparse matrices for random forest
problem_transform_classifier = LabelPowerset(classifier=base_classifier,
    require_dense=[False, False])

# set up the clusterer to use: stochastic block model inference via graph-tool
clusterer = GraphToolLabelGraphClusterer(graph_builder=graph_builder, model=model)

# setup the ensemble metaclassifier
classifier = LabelSpacePartitioningClassifier(problem_transform_classifier, clusterer)

# train (X_train and y_train are assumed to be an existing training split)
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

For more use cases see the label relations exploration guide.

fit_predict(X, y)

Performs clustering on y and returns list of label lists

Builds a label graph using the provided graph builder’s transform method on y and then detects communities using the selected method.

Sets self.weights_ and self.graph_.

Parameters:

X (None) – currently unused, left for scikit-learn compatibility
y (scipy.sparse) – label space of shape `(n_samples, n_labels)`

Returns:	label space division, each sublist represents labels that are in that community
Return type:	array of arrays of label indices (numpy.ndarray)
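The graph-building step that fit_predict performs on y can be illustrated without graph-tool. The sketch below counts label co-occurrences from a small dense 0/1 label matrix. It is an illustration only: the function name `cooccurrence_edges`, the dense list-of-rows input, and the returned dict layout are assumptions rather than the graph builder's actual interface.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(y_rows, include_self_edges=False):
    """Count how often each pair of labels is assigned to the same sample.

    `y_rows` is a dense 0/1 label matrix given as a list of rows.
    Returns a dict mapping (label_a, label_b), label_a <= label_b, to a count.
    """
    edges = Counter()
    for row in y_rows:
        # Column indices of the labels present in this sample.
        present = [j for j, value in enumerate(row) if value]
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
        if include_self_edges:
            for a in present:
                edges[(a, a)] += 1
    return dict(edges)


y = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
edge_weights = cooccurrence_edges(y)
print(edge_weights)  # {(0, 1): 2, (0, 2): 1, (1, 2): 1}
```

The resulting pairs play the role of graph edges and the counts play the role of edge weights, which is the kind of input a weighted block model is then fit to.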