Binary Relevance kNN

class skmultilearn.adapt.BRkNNaClassifier(k=10)[source]

Binary Relevance multi-label classifier based on k-Nearest Neighbors method.

This version of the classifier assigns the labels that are assigned to at least half of the neighbors.

Parameters:k (int) – number of neighbours
knn_

the nearest neighbors single-label classifier used underneath

Type:an instance of sklearn.neighbors.NearestNeighbors
neighbors_

k neighbors of each sample

Type:array of arrays of int, shape = (n_samples, k)
confidences_

label assignment confidences

Type:matrix of int, shape = (n_samples, n_labels)
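
The "at least half of the neighbors" rule can be sketched for a single sample with plain NumPy. This is an illustrative sketch, not the library's implementation: the helper name `brknn_a_predict` and the toy `neighbor_labels` array are assumptions, and the neighbor search is taken as already done.

```python
import numpy as np

def brknn_a_predict(neighbor_labels):
    """Hypothetical helper: apply the BRkNN-a rule to one sample.

    neighbor_labels is a (k, n_labels) binary array holding the label
    rows of the sample's k nearest neighbors.
    """
    k = neighbor_labels.shape[0]
    # Fraction of neighbors carrying each label.
    confidences = neighbor_labels.sum(axis=0) / k
    # Assign every label carried by at least half of the neighbors.
    prediction = (confidences >= 0.5).astype(int)
    return prediction, confidences

# Toy data (assumed): 3 neighbors, 4 labels.
neighbor_labels = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
])
prediction, confidences = brknn_a_predict(neighbor_labels)
print(prediction)  # labels 0 and 2 are each carried by 2 of 3 neighbors
```

Note that with this rule a sample may receive no labels at all when no label reaches the one-half threshold.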

References

If you use this method please cite the relevant paper:

@inproceedings{EleftheriosSpyromitros2008,
   author = {Eleftherios Spyromitros and Grigorios Tsoumakas and Ioannis Vlahavas},
   booktitle = {Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008)},
   title = {An Empirical Study of Lazy Multilabel Classification Algorithms},
   year = {2008},
   location = {Syros, Greece}
}

Examples

Here’s a very simple example of using BRkNNaClassifier with a fixed number of neighbors:

from skmultilearn.adapt import BRkNNaClassifier

classifier = BRkNNaClassifier(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import BRkNNaClassifier
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1,3)}
score = 'f1_macro'

clf = GridSearchCV(BRkNNaClassifier(), parameters, scoring=score)
clf.fit(X, y)
fit(X, y)

Fit classifier with training data

Internally this method uses a sparse CSC representation for y (scipy.sparse.csc_matrix).

Parameters:
  • X (numpy.ndarray or scipy.sparse) – input features, can be a dense or sparse matrix of size (n_samples, n_features)
  • y (numpy.ndarray or scipy.sparse {0,1}) – binary indicator matrix with label assignments.
Returns:

fitted instance of self

Return type:

self
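
The CSC representation mentioned above stores the label matrix column by column, so the samples carrying a given label are cheap to look up. The following is a minimal sketch of that conversion using `scipy.sparse.csc_matrix`; the toy matrix `y_dense` is an assumption, and the snippet does not reproduce what `fit` does internally beyond the conversion itself.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Toy binary indicator matrix (assumed): 3 samples, 3 labels.
y_dense = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
])

# Convert to compressed sparse column format; only nonzero entries are stored.
y_csc = csc_matrix(y_dense)

# Row indices of the samples that carry label 0.
rows_with_label_0 = y_csc[:, 0].nonzero()[0]

# Convert back to a dense array when a dense view is needed.
y_roundtrip = y_csc.toarray()

print(y_csc.nnz, rows_with_label_0)
```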

get_params(deep=True)

Get parameters to sub-objects

Introspection of classifier for search models like cross-validation and grid search.

Parameters:deep (bool) – if True, the parameters of sub-objects are also introspected and appended to the output dictionary.
Returns:out – dictionary of all parameters and their values. If deep=True, the dictionary also holds the parameters of any parameter that is itself an object with parameters.
Return type:dict
predict(X)

Predict labels for X

Parameters:X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features)
Returns:binary indicator matrix with label assignments with shape (n_samples, n_labels)
Return type:scipy.sparse of int
score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric because it requires the entire label set of each sample to be predicted correctly.

Parameters:
  • X (array-like, shape = (n_samples, n_features)) – Test samples.
  • y (array-like, shape = (n_samples) or (n_samples, n_outputs)) – True labels for X.
  • sample_weight (array-like, shape = [n_samples], optional) – Sample weights.
Returns:

score – Mean accuracy of self.predict(X) with respect to y.

Return type:

float
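
To show why subset accuracy is harsh, the sketch below computes it with plain NumPy and compares it with the per-label accuracy on the same predictions. The helper name `subset_accuracy` and the toy arrays are assumptions for illustration; this is not the library's code.

```python
import numpy as np

def subset_accuracy(y_true, y_pred):
    """Hypothetical helper: fraction of samples whose whole label row matches."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # A sample counts as correct only if every one of its labels is correct.
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

# Toy data (assumed): 4 samples, 3 labels.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]])

accuracy = subset_accuracy(y_true, y_pred)
# Per-label accuracy on the same data, for comparison.
label_accuracy = float(np.mean(y_true == y_pred))

print(accuracy, label_accuracy)
```

Here only two single labels are wrong out of twelve, yet half of the samples count as misclassified.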

set_params(**parameters)

Propagate parameters to sub-objects

Set parameters as returned by get_params.

class skmultilearn.adapt.BRkNNbClassifier(k=10)[source]

Binary Relevance multi-label classifier based on k-Nearest Neighbors method.

This version of the classifier assigns the most popular m labels of the neighbors, where m is the average number of labels assigned to the object’s neighbors.

Parameters:k (int) – number of neighbours
knn_

the nearest neighbors single-label classifier used underneath

Type:an instance of sklearn.neighbors.NearestNeighbors
neighbors_

k neighbors of each sample

Type:array of arrays of int, shape = (n_samples, k)
confidences_

label assignment confidences

Type:matrix of int, shape = (n_samples, n_labels)
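
The "most popular m labels" rule can be sketched for a single sample with plain NumPy. This is an illustrative sketch under stated assumptions, not the library's implementation: the helper name `brknn_b_predict` and the toy data are hypothetical, m is rounded with Python's `round`, and ties between equally popular labels go to the lower label index.

```python
import numpy as np

def brknn_b_predict(neighbor_labels):
    """Hypothetical helper: apply the BRkNN-b rule to one sample.

    neighbor_labels is a (k, n_labels) binary array holding the label
    rows of the sample's k nearest neighbors.
    """
    k, n_labels = neighbor_labels.shape
    # Fraction of neighbors carrying each label.
    confidences = neighbor_labels.sum(axis=0) / k
    # m: average number of labels per neighbor, rounded (assumption).
    m = int(round(float(neighbor_labels.sum(axis=1).mean())))
    prediction = np.zeros(n_labels, dtype=int)
    if m > 0:
        # Indices of the m most popular labels; stable sort keeps ties
        # in label-index order (assumption).
        top = np.argsort(-confidences, kind="stable")[:m]
        prediction[top] = 1
    return prediction, m

# Toy data (assumed): 3 neighbors, 4 labels.
neighbor_labels = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
])
prediction, m = brknn_b_predict(neighbor_labels)
print(m, prediction)
```

Unlike the half-of-neighbors rule, this variant assigns exactly m labels, so a sample is left unlabeled only when m rounds to zero.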

References

If you use this method please cite the relevant paper:

@inproceedings{EleftheriosSpyromitros2008,
   author = {Eleftherios Spyromitros and Grigorios Tsoumakas and Ioannis Vlahavas},
   booktitle = {Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008)},
   title = {An Empirical Study of Lazy Multilabel Classification Algorithms},
   year = {2008},
   location = {Syros, Greece}
}

Examples

Here’s a very simple example of using BRkNNbClassifier with a fixed number of neighbors:

from skmultilearn.adapt import BRkNNbClassifier

classifier = BRkNNbClassifier(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import BRkNNbClassifier
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1,3)}
score = 'f1_macro'

clf = GridSearchCV(BRkNNbClassifier(), parameters, scoring=score)
clf.fit(X, y)
fit(X, y)

Fit classifier with training data

Internally this method uses a sparse CSC representation for y (scipy.sparse.csc_matrix).

Parameters:
  • X (numpy.ndarray or scipy.sparse) – input features, can be a dense or sparse matrix of size (n_samples, n_features)
  • y (numpy.ndarray or scipy.sparse {0,1}) – binary indicator matrix with label assignments.
Returns:

fitted instance of self

Return type:

self

get_params(deep=True)

Get parameters to sub-objects

Introspection of classifier for search models like cross-validation and grid search.

Parameters:deep (bool) – if True, the parameters of sub-objects are also introspected and appended to the output dictionary.
Returns:out – dictionary of all parameters and their values. If deep=True, the dictionary also holds the parameters of any parameter that is itself an object with parameters.
Return type:dict
predict(X)

Predict labels for X

Parameters:X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features)
Returns:binary indicator matrix with label assignments with shape (n_samples, n_labels)
Return type:scipy.sparse of int
score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric because it requires the entire label set of each sample to be predicted correctly.

Parameters:
  • X (array-like, shape = (n_samples, n_features)) – Test samples.
  • y (array-like, shape = (n_samples) or (n_samples, n_outputs)) – True labels for X.
  • sample_weight (array-like, shape = [n_samples], optional) – Sample weights.
Returns:

score – Mean accuracy of self.predict(X) with respect to y.

Return type:

float

set_params(**parameters)

Propagate parameters to sub-objects

Set parameters as returned by get_params.