skmultilearn.cluster package

The skmultilearn.cluster module gathers label space clustering methods.

class skmultilearn.cluster.LabelSpaceClustererBase[source]

Bases: future.types.newobject.newobject

An abstract base class for label space clustering.

Implement it in your clusterer according to Developing a label space clusterer.

fit_predict(X, y)[source]

Abstract method for clustering the label space.

Implement it in your clusterer according to Developing a label space clusterer.

Raises: NotImplementedError – this is just an abstract method
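
As a minimal sketch of the contract described above, the following pure-Python example defines a stand-in base class with the same abstract fit_predict(X, y) and a hypothetical FixedSizeClusterer subclass that splits the label indices into consecutive groups. The stand-in base class, the subclass name and its chunk_size parameter are assumptions made for illustration only; in real code you would inherit from skmultilearn.cluster.LabelSpaceClustererBase.

```python
class LabelSpaceClustererBase(object):
    """Stand-in for skmultilearn.cluster.LabelSpaceClustererBase (illustration only)."""

    def fit_predict(self, X, y):
        raise NotImplementedError("LabelSpaceClustererBase is an abstract class")


class FixedSizeClusterer(LabelSpaceClustererBase):
    """Hypothetical clusterer: groups consecutive label indices into chunks."""

    def __init__(self, chunk_size=2):
        self.chunk_size = chunk_size

    def fit_predict(self, X, y):
        # y is treated as a list of rows, one row of 0/1 values per sample.
        n_labels = len(y[0])
        labels = list(range(n_labels))
        # X is ignored: this toy clusterer only looks at the number of labels.
        return [labels[i:i + self.chunk_size]
                for i in range(0, n_labels, self.chunk_size)]


y = [[1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0]]
partition = FixedSizeClusterer(chunk_size=2).fit_predict(None, y)
print(partition)
```

The returned value is a list of lists of label indices, which is the shape the concrete clusterers in this package document for their own fit_predict.
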
class skmultilearn.cluster.LabelCooccurenceClustererBase(weighted=None, include_self_edges=None)[source]

Bases: skmultilearn.cluster.base.LabelSpaceClustererBase

Base class providing the API and common functions for all label co-occurrence based label space clusterers.

Parameters:

weighted : boolean

Whether to generate a weighted or unweighted graph.

include_self_edges : boolean

Whether to include self-edges, i.e. label 1 - label 1, in the co-occurrence graph.

generate_coocurence_adjacency_matrix(y)[source]

Generate an adjacency matrix from the label matrix.

This function generates a weighted or unweighted co-occurrence graph from the input binary label vectors and sets it to self.coocurence_graph.

Parameters:

y (dense or sparse matrix of {0, 1} (n_samples, n_labels)) – binary indicator matrix with label assignments

Returns:

edge_map : dict{ (int, int) : float }

A dict mapping pairs of label indexes to edge weights.
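
The edge map described above can be sketched in pure Python as follows. This is not the library's implementation: the function name cooccurrence_edge_map, the use of plain lists for y, and the convention of storing each pair as (i, j) with i <= j are assumptions made for illustration. Weighted edges count the samples that carry both labels, while unweighted edges are all 1.0.

```python
from itertools import combinations


def cooccurrence_edge_map(y, weighted=True, include_self_edges=False):
    """Map each co-occurring label pair (i, j), i <= j, to an edge weight."""
    edge_map = {}
    for row in y:
        # Indices of the labels assigned to this sample.
        present = [j for j, value in enumerate(row) if value]
        pairs = list(combinations(present, 2))
        if include_self_edges:
            pairs += [(j, j) for j in present]
        for pair in pairs:
            if weighted:
                # Weight = number of samples carrying both labels.
                edge_map[pair] = edge_map.get(pair, 0.0) + 1.0
            else:
                # Unweighted graph: every observed edge has weight 1.0.
                edge_map[pair] = 1.0
    return edge_map


y = [[1, 1, 0],
     [1, 1, 1],
     [0, 0, 1]]
weighted_map = cooccurrence_edge_map(y, weighted=True)
unweighted_map = cooccurrence_edge_map(y, weighted=False)
self_edge_map = cooccurrence_edge_map(y, weighted=True, include_self_edges=True)
```

In this toy input, labels 0 and 1 appear together in two samples, so their weighted edge is 2.0, and label 2 only gains a self-edge entry when include_self_edges is set.
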

class skmultilearn.cluster.GraphToolCooccurenceClusterer(weighted=None, include_self_edges=None, allow_overlap=None, n_iters=100, n_init_iters=10, use_degree_corr=None, model_selection_criterium='mean_field', verbose=False, equlibrate_options={})[source]

Bases: skmultilearn.cluster.base.LabelCooccurenceClustererBase

Clusters the label space using graph-tool’s stochastic block model community detection method.

Parameters:

weighted : boolean

Whether to generate a weighted or unweighted graph.

include_self_edges : boolean

Whether to include self-edges, i.e. label 1 - label 1, in the co-occurrence graph.

allow_overlap : boolean

Whether to allow clusters to overlap.

n_iters : int

Number of iterations to perform in sweeping

n_init_iters: int

Number of initialization iterations to perform

use_degree_corr : None or bool

Whether to use a degree-corrected stochastic blockmodel; if None, it is selected based on the model selection criterion

model_selection_criterium : ‘mean_field’ or ‘bethe’

Approach to use for model selection when use_degree_corr is None

verbose: bool

Be verbose about the output

equlibrate_options: dict

Additional options to pass to graph-tool’s mcmc_equilibrate

fit_predict(X, y)[source]

Performs clustering on y and returns a list of label lists.

Builds a label co-occurrence graph using LabelCooccurenceClustererBase.generate_coocurence_adjacency_matrix() on y and then detects communities using graph-tool’s stochastic block modeling.

Parameters:

X : sparse matrix (n_samples, n_features), feature space, not used in this clusterer

y : sparse matrix (n_samples, n_labels), label space

Returns:

partition : list of lists – list of lists of label indexes; each sublist contains the labels that belong to one community

generate_coocurence_graph()[source]

Constructs the label co-occurrence graph.

This function constructs a graph-tool graphtool.Graph object representing the label co-occurrence graph. Run it after self.edge_map has been populated using LabelCooccurenceClustererBase.generate_coocurence_adjacency_matrix() on y in fit_predict.

The graph is available as self.coocurence_graph, and the edge weights are stored in a graphtool.PropertyMap of type double, set as self.weights.

Edge weights are all 1.0 if self.weighted is False; otherwise each weight is the number of samples labelled with both labels joined by the edge.

Returns: g – graphtool.Graph object representing a label co-occurrence graph
predict_communities(deg_corr)[source]
class skmultilearn.cluster.IGraphLabelCooccurenceClusterer(method=None, weighted=None, include_self_edges=None)[source]

Bases: skmultilearn.cluster.base.LabelCooccurenceClustererBase

Clusters the label space using igraph community detection methods

Parameters:

method : enum from IGraphLabelCooccurenceClusterer.METHODS

the igraph community detection method that will be used

weighted : boolean

Whether to generate a weighted or unweighted graph.

include_self_edges : boolean

Whether to include self-edges, i.e. label 1 - label 1, in the co-occurrence graph.

METHODS – dict mapping each supported method name ('fastgreedy', 'walktrap', 'infomap', 'label_propagation', 'leading_eigenvector', 'multilevel') to the function that runs it
fit_predict(X, y)[source]

Performs clustering on y and returns a list of label lists.

Builds a label co-occurrence graph using LabelCooccurenceClustererBase.generate_coocurence_adjacency_matrix() on y and then detects communities using the selected method.

Parameters:

X : sparse matrix (n_samples, n_features), feature space, not used in this clusterer

y : sparse matrix (n_samples, n_labels), label space

Returns:

partition : list of lists – list of lists of label indexes; each sublist contains the labels that belong to one community
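
To illustrate how an edge map turns into a partition of this shape, here is a minimal pure-Python sketch. It does not use igraph and does not implement any of the methods listed in METHODS: connected components are swapped in as the simplest possible stand-in for community detection, and the function name and arguments are assumptions made for illustration.

```python
def connected_components(edge_map, n_labels):
    """Partition label indices into connected components of the edge map."""
    neighbours = {label: set() for label in range(n_labels)}
    for a, b in edge_map:
        neighbours[a].add(b)
        neighbours[b].add(a)

    partition, seen = [], set()
    for start in range(n_labels):
        if start in seen:
            continue
        # Depth-first search collects every label reachable from `start`.
        stack, community = [start], []
        seen.add(start)
        while stack:
            label = stack.pop()
            community.append(label)
            for other in neighbours[label]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        partition.append(sorted(community))
    return partition


edge_map = {(0, 1): 2.0, (1, 2): 1.0, (3, 4): 5.0}
partition = connected_components(edge_map, n_labels=6)
print(partition)
```

Real community detection methods can split a connected component further and may take the edge weights into account, which this stand-in ignores; a label that never co-occurs with another ends up in its own sublist here.
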

class skmultilearn.cluster.MatrixLabelSpaceClusterer(clusterer=None, pass_input_space=False)[source]

Bases: skmultilearn.cluster.base.LabelSpaceClustererBase

Clusters the label space using a matrix-based clusterer

Parameters:
  • clusterer – a clonable instance of a scikit-compatible matrix-based clusterer
  • pass_input_space (bool) – whether to take X into consideration when clustering; use only if you know that the clusterer can handle two parameters for clustering
fit_predict(X, y)[source]

Cluster the output space.

Uses the fit_predict method of the provided clusterer to perform label space division.

Returns: partition of labels; each sublist contains label indices corresponding to label positions in y
Return type: numpy.ndarray of numpy.ndarrays
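
The label space division described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes the clusterer is given the transposed label matrix so that each label becomes one item to cluster, and it uses a hypothetical FirstSampleClusterer and a hypothetical cluster_labels helper in place of a real scikit-compatible clusterer. Plain lists stand in for arrays.

```python
from collections import defaultdict


class FirstSampleClusterer(object):
    """Hypothetical clusterer: one cluster per distinct value in the first column."""

    def fit_predict(self, rows):
        return [row[0] for row in rows]


def cluster_labels(clusterer, y):
    # Transpose y so that each row describes one label across all samples.
    label_rows = [list(column) for column in zip(*y)]
    cluster_ids = clusterer.fit_predict(label_rows)
    # Group label indices by the cluster id they were assigned.
    groups = defaultdict(list)
    for label_index, cluster_id in enumerate(cluster_ids):
        groups[cluster_id].append(label_index)
    return [groups[cluster_id] for cluster_id in sorted(groups)]


y = [[1, 0, 1, 0],
     [0, 1, 1, 0]]
partition = cluster_labels(FirstSampleClusterer(), y)
print(partition)
```

Any clusterer whose fit_predict returns one cluster id per input row could be dropped in place of the toy class; the grouping step stays the same.
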

Submodules

skmultilearn.cluster.base module

class skmultilearn.cluster.base.LabelCooccurenceClustererBase(weighted=None, include_self_edges=None)[source]

Bases: skmultilearn.cluster.base.LabelSpaceClustererBase

Base class providing the API and common functions for all label co-occurrence based label space clusterers.

Parameters:

weighted : boolean

Whether to generate a weighted or unweighted graph.

include_self_edges : boolean

Whether to include self-edges, i.e. label 1 - label 1, in the co-occurrence graph.

generate_coocurence_adjacency_matrix(y)[source]

Generate an adjacency matrix from the label matrix.

This function generates a weighted or unweighted co-occurrence graph from the input binary label vectors and sets it to self.coocurence_graph.

Parameters:

y (dense or sparse matrix of {0, 1} (n_samples, n_labels)) – binary indicator matrix with label assignments

Returns:

edge_map : dict{ (int, int) : float }

A dict mapping pairs of label indexes to edge weights.

class skmultilearn.cluster.base.LabelSpaceClustererBase[source]

Bases: future.types.newobject.newobject

An abstract base class for label space clustering.

Implement it in your clusterer according to Developing a label space clusterer.

fit_predict(X, y)[source]

Abstract method for clustering the label space.

Implement it in your clusterer according to Developing a label space clusterer.

Raises: NotImplementedError – this is just an abstract method

skmultilearn.cluster.graphtool module

class skmultilearn.cluster.graphtool.GraphToolCooccurenceClusterer(weighted=None, include_self_edges=None, allow_overlap=None, n_iters=100, n_init_iters=10, use_degree_corr=None, model_selection_criterium='mean_field', verbose=False, equlibrate_options={})[source]

Bases: skmultilearn.cluster.base.LabelCooccurenceClustererBase

Clusters the label space using graph-tool’s stochastic block model community detection method.

Parameters:

weighted : boolean

Whether to generate a weighted or unweighted graph.

include_self_edges : boolean

Whether to include self-edges, i.e. label 1 - label 1, in the co-occurrence graph.

allow_overlap : boolean

Whether to allow clusters to overlap.

n_iters : int

Number of iterations to perform in sweeping

n_init_iters: int

Number of initialization iterations to perform

use_degree_corr : None or bool

Whether to use a degree-corrected stochastic blockmodel; if None, it is selected based on the model selection criterion

model_selection_criterium : ‘mean_field’ or ‘bethe’

Approach to use for model selection when use_degree_corr is None

verbose: bool

Be verbose about the output

equlibrate_options: dict

Additional options to pass to graph-tool’s mcmc_equilibrate

fit_predict(X, y)[source]

Performs clustering on y and returns a list of label lists.

Builds a label co-occurrence graph using LabelCooccurenceClustererBase.generate_coocurence_adjacency_matrix() on y and then detects communities using graph-tool’s stochastic block modeling.

Parameters:

X : sparse matrix (n_samples, n_features), feature space, not used in this clusterer

y : sparse matrix (n_samples, n_labels), label space

Returns:

partition : list of lists – list of lists of label indexes; each sublist contains the labels that belong to one community

generate_coocurence_graph()[source]

Constructs the label co-occurrence graph.

This function constructs a graph-tool graphtool.Graph object representing the label co-occurrence graph. Run it after self.edge_map has been populated using LabelCooccurenceClustererBase.generate_coocurence_adjacency_matrix() on y in fit_predict.

The graph is available as self.coocurence_graph, and the edge weights are stored in a graphtool.PropertyMap of type double, set as self.weights.

Edge weights are all 1.0 if self.weighted is False; otherwise each weight is the number of samples labelled with both labels joined by the edge.

Returns: g – graphtool.Graph object representing a label co-occurrence graph
predict_communities(deg_corr)[source]

skmultilearn.cluster.igraph module

class skmultilearn.cluster.igraph.IGraphLabelCooccurenceClusterer(method=None, weighted=None, include_self_edges=None)[source]

Bases: skmultilearn.cluster.base.LabelCooccurenceClustererBase

Clusters the label space using igraph community detection methods

Parameters:

method : enum from IGraphLabelCooccurenceClusterer.METHODS

the igraph community detection method that will be used

weighted : boolean

Whether to generate a weighted or unweighted graph.

include_self_edges : boolean

Whether to include self-edges, i.e. label 1 - label 1, in the co-occurrence graph.

METHODS – dict mapping each supported method name ('fastgreedy', 'walktrap', 'infomap', 'label_propagation', 'leading_eigenvector', 'multilevel') to the function that runs it
fit_predict(X, y)[source]

Performs clustering on y and returns a list of label lists.

Builds a label co-occurrence graph using LabelCooccurenceClustererBase.generate_coocurence_adjacency_matrix() on y and then detects communities using the selected method.

Parameters:

X : sparse matrix (n_samples, n_features), feature space, not used in this clusterer

y : sparse matrix (n_samples, n_labels), label space

Returns:

partition : list of lists – list of lists of label indexes; each sublist contains the labels that belong to one community

skmultilearn.cluster.matrix module

class skmultilearn.cluster.matrix.MatrixLabelSpaceClusterer(clusterer=None, pass_input_space=False)[source]

Bases: skmultilearn.cluster.base.LabelSpaceClustererBase

Clusters the label space using a matrix-based clusterer

Parameters:
  • clusterer – a clonable instance of a scikit-compatible matrix-based clusterer
  • pass_input_space (bool) – whether to take X into consideration when clustering; use only if you know that the clusterer can handle two parameters for clustering
fit_predict(X, y)[source]

Cluster the output space.

Uses the fit_predict method of the provided clusterer to perform label space division.

Returns: partition of labels; each sublist contains label indices corresponding to label positions in y
Return type: numpy.ndarray of numpy.ndarrays