Binary Relevance kNN¶
class skmultilearn.adapt.BRkNNaClassifier(k=10)[source]

Binary Relevance multi-label classifier based on the k-Nearest Neighbors method.

This version of the classifier assigns a label to a sample if at least half of the sample's neighbors carry that label.

Parameters: k (int) – number of neighbors
knn_
    the nearest-neighbors single-label classifier used underneath
    Type: an instance of sklearn.neighbors.NearestNeighbors

neighbors_
    the k neighbors of each sample
    Type: array of arrays of int, shape = (n_samples, k)

confidences_
    label assignment confidences
    Type: matrix of float, shape = (n_samples, n_labels)
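The half-of-neighbors decision rule described above can be sketched in plain NumPy. This is a toy illustration of the BRkNN-a assignment step, assuming the binary label matrix of a sample's k neighbors has already been gathered; it is not the library's implementation:

```python
import numpy as np

def brknn_a_assign(neighbor_labels):
    """Toy BRkNN-a rule: assign a label if at least half of the
    k neighbors carry it.

    neighbor_labels: (k, n_labels) binary matrix of the neighbors' labels.
    Returns (assignments, confidences).
    """
    k = neighbor_labels.shape[0]
    # fraction of neighbors carrying each label
    confidences = neighbor_labels.sum(axis=0) / k
    # label assigned when that fraction reaches 0.5
    return (confidences >= 0.5).astype(int), confidences

# k=4 neighbors, 3 labels
neighbors = np.array([[1, 0, 1],
                      [1, 1, 0],
                      [0, 0, 1],
                      [1, 0, 0]])
assigned, conf = brknn_a_assign(neighbors)
# label 0 appears in 3/4 neighbors, label 1 in 1/4, label 2 in 2/4,
# so labels 0 and 2 are assigned
```

Note that a label carried by exactly half of the neighbors (2 of 4 here) meets the threshold and is assigned.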
References
If you use this method please cite the relevant paper:
@inproceedings{EleftheriosSpyromitros2008,
    author = {Eleftherios Spyromitros and Grigorios Tsoumakas and Ioannis Vlahavas},
    booktitle = {Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008)},
    title = {An Empirical Study of Lazy Multilabel Classification Algorithms},
    year = {2008},
    location = {Syros, Greece}
}
Examples
Here’s a very simple example of using BRkNNaClassifier with a fixed number of neighbors:
from skmultilearn.adapt import BRkNNaClassifier

classifier = BRkNNaClassifier(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)
You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import BRkNNaClassifier
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1, 3)}
score = 'f1_macro'

clf = GridSearchCV(BRkNNaClassifier(), parameters, scoring=score)
clf.fit(X, y)
fit(X, y)

Fit classifier with training data.

Internally this method uses a sparse CSC representation for y (scipy.sparse.csc_matrix).

Parameters:
- X (numpy.ndarray or scipy.sparse) – input features, can be a dense or sparse matrix of size (n_samples, n_features)
- y (numpy.ndarray or scipy.sparse {0,1}) – binary indicator matrix with label assignments

Returns: fitted instance of self
Return type: self
get_params(deep=True)

Get parameters of this classifier and its sub-objects.

Introspection of the classifier for search models such as cross-validation and grid search.

Parameters: deep (bool) – if True, nested parameters are also introspected and appended to the output dictionary
Returns: out – dictionary of all parameters and their values. If deep=True, the dictionary also holds the parameters of the sub-objects.
Return type: dict
predict(X)

Predict labels for X.

Parameters: X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features)
Returns: binary indicator matrix with label assignments, shape (n_samples, n_labels)
Return type: scipy.sparse matrix of int
score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be predicted correctly.

Parameters:
- X (array-like, shape = (n_samples, n_features)) – Test samples.
- y (array-like, shape = (n_samples) or (n_samples, n_outputs)) – True labels for X.
- sample_weight (array-like, shape = [n_samples], optional) – Sample weights.

Returns: score – Mean accuracy of self.predict(X) with respect to y.
Return type: float
class skmultilearn.adapt.BRkNNbClassifier(k=10)[source]

Binary Relevance multi-label classifier based on the k-Nearest Neighbors method.

This version of the classifier assigns the m most popular labels among the neighbors, where m is the average number of labels assigned to the sample's neighbors.

Parameters: k (int) – number of neighbors
knn_
    the nearest-neighbors single-label classifier used underneath
    Type: an instance of sklearn.neighbors.NearestNeighbors

neighbors_
    the k neighbors of each sample
    Type: array of arrays of int, shape = (n_samples, k)

confidences_
    label assignment confidences
    Type: matrix of float, shape = (n_samples, n_labels)
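The BRkNN-b rule above (assign the m most popular labels, with m the average label-set size among the neighbors) can be sketched as follows. This is a toy illustration under assumed details the source does not specify: m is obtained with Python's round (which uses banker's rounding on ties) and ties between equally popular labels are broken by index order; the library's actual tie-breaking may differ:

```python
import numpy as np

def brknn_b_assign(neighbor_labels):
    """Toy BRkNN-b rule: assign the m most popular labels among the
    neighbors, where m is the (rounded) average label-set size.

    neighbor_labels: (k, n_labels) binary matrix of the neighbors' labels.
    """
    counts = neighbor_labels.sum(axis=0)             # popularity of each label
    avg_size = neighbor_labels.sum(axis=1).mean()    # mean label-set size
    m = int(round(avg_size))                         # number of labels to assign
    top_m = np.argsort(-counts)[:m]                  # indices of the m most popular
    out = np.zeros(neighbor_labels.shape[1], dtype=int)
    out[top_m] = 1
    return out

# k=4 neighbors, 3 labels; label-set sizes are 2, 2, 1, 1, so the
# average is 1.5 and m rounds to 2
neighbors = np.array([[1, 0, 1],
                      [1, 1, 0],
                      [0, 0, 1],
                      [1, 0, 0]])
assigned = brknn_b_assign(neighbors)
# the two most popular labels (0 and 2) are assigned
```

Unlike BRkNN-a, this variant always outputs a label set of a size typical for the neighborhood, so it cannot return an empty prediction when the neighbors carry labels.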
References
If you use this method please cite the relevant paper:
@inproceedings{EleftheriosSpyromitros2008,
    author = {Eleftherios Spyromitros and Grigorios Tsoumakas and Ioannis Vlahavas},
    booktitle = {Proc. 5th Hellenic Conference on Artificial Intelligence (SETN 2008)},
    title = {An Empirical Study of Lazy Multilabel Classification Algorithms},
    year = {2008},
    location = {Syros, Greece}
}
Examples
Here’s a very simple example of using BRkNNbClassifier with a fixed number of neighbors:
from skmultilearn.adapt import BRkNNbClassifier

classifier = BRkNNbClassifier(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)
You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import BRkNNbClassifier
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1, 3)}
score = 'f1_macro'

clf = GridSearchCV(BRkNNbClassifier(), parameters, scoring=score)
clf.fit(X, y)
fit(X, y)

Fit classifier with training data.

Internally this method uses a sparse CSC representation for y (scipy.sparse.csc_matrix).

Parameters:
- X (numpy.ndarray or scipy.sparse) – input features, can be a dense or sparse matrix of size (n_samples, n_features)
- y (numpy.ndarray or scipy.sparse {0,1}) – binary indicator matrix with label assignments

Returns: fitted instance of self
Return type: self
get_params(deep=True)

Get parameters of this classifier and its sub-objects.

Introspection of the classifier for search models such as cross-validation and grid search.

Parameters: deep (bool) – if True, nested parameters are also introspected and appended to the output dictionary
Returns: out – dictionary of all parameters and their values. If deep=True, the dictionary also holds the parameters of the sub-objects.
Return type: dict
predict(X)

Predict labels for X.

Parameters: X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features)
Returns: binary indicator matrix with label assignments, shape (n_samples, n_labels)
Return type: scipy.sparse matrix of int
score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be predicted correctly.

Parameters:
- X (array-like, shape = (n_samples, n_features)) – Test samples.
- y (array-like, shape = (n_samples) or (n_samples, n_outputs)) – True labels for X.
- sample_weight (array-like, shape = [n_samples], optional) – Sample weights.

Returns: score – Mean accuracy of self.predict(X) with respect to y.
Return type: float
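The subset accuracy computed by score can be sketched in a few lines of NumPy: a sample counts as correct only when its whole predicted label set matches the true one exactly. A minimal sketch, not the library's implementation (which also supports sample weights):

```python
import numpy as np

def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose entire label vector is predicted exactly."""
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],
                   [0, 1, 1],   # one wrong label makes the whole row wrong
                   [1, 1, 0]])
acc = subset_accuracy(y_true, y_pred)
# rows 0 and 2 match exactly, row 1 does not, so acc = 2/3
```

This is why the metric is described as harsh: a single wrong label on a sample zeroes out that sample's contribution, even if the other labels are correct.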