skmultilearn.cluster.random module

class skmultilearn.cluster.RandomLabelSpaceClusterer(cluster_size, cluster_count, allow_overlap)

Bases: skmultilearn.cluster.base.LabelSpaceClustererBase

Randomly divides the label space into equally sized clusters.

This method divides the label space by drawing, without replacement, the desired number of equally sized label subsets, using either a partitioning or an overlapping scheme.

Parameters:
  • cluster_size (int) – desired size of a single cluster; stored as self.cluster_size.
  • cluster_count (int) – number of clusters to divide the label space into; stored as self.cluster_count.
  • allow_overlap (bool) – whether clusters are allowed to overlap; stored as self.allow_overlap.

Examples

The following code performs random label space partitioning.

from skmultilearn.cluster import RandomLabelSpaceClusterer

# assume X and y contain the data; in this example y has 5 labels
cluster_count = 2
cluster_size = y.shape[1] // cluster_count  # 5 // 2 == 2
clr = RandomLabelSpaceClusterer(cluster_size, cluster_count, allow_overlap=False)
clr.fit_predict(X, y)
# Result:
# array([list([0, 4]), list([2, 3]), list([1])], dtype=object)

Note that any labels left over after filling cluster_count clusters of cluster_size labels each are appended to one additional, final cluster of size at most cluster_size - 1.
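As a rough illustration of the partitioning scheme and its leftover cluster, the following standard-library sketch mimics the behaviour described above. It is not the library's implementation: the function name random_partition and its seed parameter are hypothetical and exist only for this example.

```python
import random


def random_partition(n_labels, cluster_size, cluster_count, seed=None):
    """Hypothetical helper: split label indexes 0..n_labels-1 into random disjoint clusters."""
    rng = random.Random(seed)
    labels = list(range(n_labels))
    rng.shuffle(labels)  # after shuffling, consecutive slices are random subsets

    clusters = []
    for i in range(cluster_count):
        chunk = labels[i * cluster_size:(i + 1) * cluster_size]
        if chunk:  # skip empty slices when there are fewer labels than slots
            clusters.append(chunk)

    # labels that did not fit go into one additional, final cluster
    leftover = labels[cluster_count * cluster_size:]
    if leftover:
        clusters.append(leftover)
    return clusters


division = random_partition(n_labels=5, cluster_size=2, cluster_count=2, seed=0)
print(division)  # two clusters of 2 labels plus a leftover cluster of 1 label
```

Every label index appears in exactly one cluster, which is what distinguishes the partitioning scheme from the overlapping one.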

You can also use this class to draw a random division of the label space in which clusters overlap, possibly several times:

from skmultilearn.cluster import RandomLabelSpaceClusterer

cluster_size = 3
cluster_count = 5
clr = RandomLabelSpaceClusterer(cluster_size, cluster_count, allow_overlap=True)
clr.fit_predict(X, y)

# Result:
# array([[2, 1, 3],
#        [3, 0, 4],
#        [2, 3, 1],
#        [2, 3, 4],
#        [3, 4, 0],
#        [3, 0, 2]])

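Note that the same draw never appears twice in the result. Rows that contain the same labels in a different order, such as [2, 1, 3] and [2, 3, 1] above, can still occur.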
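The overlapping scheme can be sketched with the standard library as follows. This is a minimal illustration under stated assumptions, not the library's code: the name random_overlapping_subsets and the seed parameter are hypothetical, the sketch compares subsets regardless of label order (stricter than what the sample output above shows), and it does not check that every label ends up in at least one subset.

```python
import math
import random


def random_overlapping_subsets(n_labels, cluster_size, cluster_count, seed=None):
    """Hypothetical helper: draw distinct label subsets that may share labels."""
    # guard against asking for more distinct subsets than exist
    if cluster_count > math.comb(n_labels, cluster_size):
        raise ValueError("not enough distinct subsets of that size")

    rng = random.Random(seed)
    seen = set()
    subsets = []
    while len(subsets) < cluster_count:
        # labels inside one subset are drawn without replacement
        subset = rng.sample(range(n_labels), cluster_size)
        key = frozenset(subset)  # order-insensitive identity of the subset
        if key not in seen:  # skip draws that repeat an earlier subset
            seen.add(key)
            subsets.append(subset)
    return subsets


for subset in random_overlapping_subsets(n_labels=5, cluster_size=3, cluster_count=5, seed=1):
    print(subset)
```

Different subsets can share labels, but no label is repeated within a single subset.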

fit_predict(X, y)

Cluster the output space.

Parameters:
  • X – currently unused, kept for scikit-learn compatibility
  • y (scipy.sparse) – label space of shape (n_samples, n_labels)

Returns:

label space division; each sublist holds the indexes of the labels that belong to one cluster

Return type:

array of arrays of label indexes (numpy.ndarray)
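To make the return value concrete, the sketch below shows one way each sublist of label indexes could be used to pick the matching columns of a label matrix. It uses a small dense list of lists as a stand-in for the sparse y, and select_columns is a hypothetical helper written for this example, not part of the library.

```python
# toy label matrix: 3 samples x 5 labels (dense stand-in for a sparse matrix)
y = [
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
]

# a division in the same shape as the first example's result
division = [[0, 4], [2, 3], [1]]


def select_columns(matrix, columns):
    """Hypothetical helper: keep only the given label columns, in the given order."""
    return [[row[c] for c in columns] for row in matrix]


# one smaller label matrix per cluster
sub_problems = [select_columns(y, cluster) for cluster in division]
for cluster, sub_y in zip(division, sub_problems):
    print(cluster, sub_y)
```

Each cluster yields a label matrix with the same number of rows as y and one column per label index in that cluster.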