skmultilearn.cluster.random module

class skmultilearn.cluster.RandomLabelSpaceClusterer(cluster_size, cluster_count, allow_overlap)
Bases: skmultilearn.cluster.base.LabelSpaceClustererBase
Randomly divides the label space into equally sized clusters.
This method divides the label space by drawing, without replacement, the desired number of equally sized subsets of the label space, in either a partitioning or an overlapping scheme.
Parameters:
- cluster_size (int) – desired size of a single cluster; stored automatically as self.cluster_size.
- cluster_count (int) – number of clusters to divide into; stored automatically as self.cluster_count.
- allow_overlap (bool) – whether to allow overlapping clusters; stored automatically as self.allow_overlap.
Examples
The following code performs random label space partitioning.
    from skmultilearn.cluster import RandomLabelSpaceClusterer

    # assume X,y contain the data, example y contains 5 labels
    cluster_count = 2
    cluster_size = y.shape[1]//cluster_count # == 2
    clr = RandomLabelSpaceClusterer(cluster_size, cluster_count, allow_overlap=False)
    clr.fit_predict(X,y)
    # Result:
    # array([list([0, 4]), list([2, 3]), list([1])], dtype=object)
Note that leftover labels that do not fit into the cluster_count clusters of cluster_size labels each are appended to an additional, final cluster of size at most cluster_size - 1.
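The partitioning scheme and the leftover rule described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the library's implementation: random_partition is a hypothetical helper, and it simply shuffles the label indices and slices them into consecutive chunks.

```python
import numpy as np


def random_partition(n_labels, cluster_size, cluster_count, seed=None):
    """Hypothetical helper: split label indices into random, disjoint clusters."""
    if cluster_size * cluster_count > n_labels:
        raise ValueError("not enough labels to fill the requested clusters")
    rng = np.random.default_rng(seed)
    # A random permutation is equivalent to drawing all labels without replacement.
    shuffled = rng.permutation(n_labels)
    clusters = [
        shuffled[i * cluster_size:(i + 1) * cluster_size].tolist()
        for i in range(cluster_count)
    ]
    # Labels that did not fit into the full-sized clusters form one extra cluster.
    leftover = shuffled[cluster_size * cluster_count:].tolist()
    if leftover:
        clusters.append(leftover)
    return clusters


# Same setting as the example above: 5 labels, 2 clusters of size 5 // 2 == 2.
division = random_partition(n_labels=5, cluster_size=2, cluster_count=2, seed=0)
sizes = [len(cluster) for cluster in division]  # two full clusters plus one leftover label
```

With 5 labels, the two full clusters use 4 of them and the remaining label ends up alone in a third cluster, which matches the shape of the result shown above.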
You can also use this class to get a random division of the label space, even with multiple overlaps:
    from skmultilearn.cluster import RandomLabelSpaceClusterer

    cluster_size = 3
    cluster_count = 5
    clr = RandomLabelSpaceClusterer(cluster_size, cluster_count, allow_overlap=True)
    clr.fit_predict(X,y)
    # Result
    # array([[2, 1, 3],
    #        [3, 0, 4],
    #        [2, 3, 1],
    #        [2, 3, 4],
    #        [3, 4, 0],
    #        [3, 0, 2]])
Note that you will never get the same label subset twice.
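One way to obtain overlapping subsets without repeats is rejection sampling, sketched below with the standard library only. This is an assumption-laden illustration rather than the library's code: random_overlapping_subsets is a hypothetical helper, it treats two draws as duplicates when they contain the same labels regardless of order, and it does not try to guarantee that every label is covered.

```python
import math
import random


def random_overlapping_subsets(n_labels, cluster_size, cluster_count, seed=None):
    """Hypothetical helper: draw distinct, possibly overlapping label subsets."""
    # Guard against an endless loop: there are only C(n_labels, cluster_size) subsets.
    if cluster_count > math.comb(n_labels, cluster_size):
        raise ValueError("not enough distinct subsets of the requested size")
    rng = random.Random(seed)
    seen = set()
    subsets = []
    while len(subsets) < cluster_count:
        # Labels inside one subset are drawn without replacement.
        subset = rng.sample(range(n_labels), cluster_size)
        key = frozenset(subset)
        if key not in seen:  # reject a subset that was already drawn
            seen.add(key)
            subsets.append(subset)
    return subsets


subsets = random_overlapping_subsets(n_labels=5, cluster_size=3, cluster_count=5, seed=0)
```

Different subsets may share labels, but no combination of labels appears twice in the returned list.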
fit_predict(X, y)
Cluster the output space.

Parameters:
- X – currently unused, left for scikit compatibility
- y (scipy.sparse) – label space of shape (n_samples, n_labels)
Returns: label space division; each sublist contains the labels that are in that community
Return type: array of arrays of label indexes (numpy.ndarray)
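As a rough illustration of how such a division might be consumed, the sketch below slices a sparse label matrix into one sub-matrix per cluster. The label matrix and the division are made-up values for demonstration (the division mirrors the first example's result); only standard NumPy and SciPy indexing is used, and nothing here calls the clusterer itself.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up label matrix with 3 samples and 5 labels.
y = csr_matrix(np.array([
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
]))

# A division in the documented format: each sublist holds the label indexes of one cluster.
division = [[0, 4], [2, 3], [1]]

# Select the label columns belonging to each cluster.
sub_problems = [y[:, cluster] for cluster in division]
shapes = [sub.shape for sub in sub_problems]
# shapes == [(3, 2), (3, 2), (3, 1)]
```

Each sub-matrix keeps all samples but only the label columns of its cluster, so it can be handed to a separate classifier.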