Multilabel k Nearest Neighbours

class skmultilearn.adapt.MLkNN(k=10, s=1.0, ignore_first_neighbours=0)

kNN classification method adapted for multi-label classification.
MLkNN uses k-NearestNeighbors to find the nearest examples to a test instance and applies Bayesian inference to select the assigned labels.
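The Bayesian step starts from a smoothed per-label prior estimated from the training set; the method described in Zhang and Zhou's paper uses P(H_j = 1) = (s + count of samples with label j) / (2s + n_samples). A pure-Python sketch of that prior (illustration only, not the library's implementation; the helper name is made up):

```python
# Smoothed label priors as in the ML-KNN paper (hypothetical helper,
# not part of skmultilearn). y is a list of binary label rows.
def label_priors(y, s=1.0):
    n = len(y)
    n_labels = len(y[0])
    counts = [sum(row[j] for row in y) for j in range(n_labels)]
    # P(H_j = 1) = (s + counts[j]) / (2*s + n)
    return [(s + c) / (2 * s + n) for c in counts]

y = [
    [1, 0],
    [1, 1],
    [0, 0],
]
print(label_priors(y))  # [0.6, 0.4]
```

At prediction time these priors are combined with the likelihood of observing a given number of neighbours carrying each label, and a label is assigned when the posterior for presence exceeds the posterior for absence.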
Parameters:
- k (int) – number of neighbours of each input instance to take into account
- s (float) – the smoothing parameter
- ignore_first_neighbours (int) – ability to ignore first N neighbours, useful for comparing with other classification software

Attributes:
- knn_ (an instance of sklearn.NearestNeighbors) – the nearest neighbors single-label classifier used underneath
Note

If you don’t know what ignore_first_neighbours does, the default is safe. Please see this issue.

References
If you use this classifier please cite the original paper introducing the method:
@article{zhang2007ml,
  title={ML-KNN: A lazy learning approach to multi-label learning},
  author={Zhang, Min-Ling and Zhou, Zhi-Hua},
  journal={Pattern recognition},
  volume={40},
  number={7},
  pages={2038--2048},
  year={2007},
  publisher={Elsevier}
}
Examples
Here’s a very simple example of using MLkNN with a fixed number of neighbors:
from skmultilearn.adapt import MLkNN

classifier = MLkNN(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)
You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import MLkNN
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1, 3), 's': [0.5, 0.7, 1.0]}
score = 'f1_macro'

clf = GridSearchCV(MLkNN(), parameters, scoring=score)
clf.fit(X, y)

print(clf.best_params_, clf.best_score_)
# output: ({'k': 1, 's': 0.5}, 0.78988303374297597)
fit(X, y)

Fit classifier with training data.

Parameters:
- X (numpy.ndarray or scipy.sparse) – input features, can be a dense or sparse matrix of size (n_samples, n_features)
- y (numpy.ndarray or scipy.sparse {0,1}) – binary indicator matrix with label assignments

Returns: fitted instance of self
Return type: self
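fit expects y as a binary indicator matrix: one row per sample, one column per label, with 1 marking an assigned label. A hypothetical helper (not part of skmultilearn) shows the format using plain nested lists, though dense numpy arrays or scipy.sparse matrices are what the classifier accepts:

```python
# Build a binary indicator matrix from per-sample label-index lists
# (illustrative helper; name and signature are made up).
def to_indicator(label_lists, n_labels):
    return [[1 if j in labels else 0 for j in range(n_labels)]
            for labels in label_lists]

# Sample 0 has labels {0, 2}; sample 1 has label {1}.
print(to_indicator([[0, 2], [1]], 3))  # [[1, 0, 1], [0, 1, 0]]
```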
get_params(deep=True)

Get parameters of this classifier and its sub-objects.

Introspection of the classifier for search models like cross-validation and grid search.

Parameters: deep (bool) – if True, parameters of sub-objects are introspected as well and appended to the output dictionary.
Returns: out – dictionary of all parameters and their values. If deep=True, the dictionary also holds the parameters of the parameters.
Return type: dict
predict(X)

Predict labels for X.

Parameters: X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features)
Returns: binary indicator matrix with label assignments of shape (n_samples, n_labels)
Return type: scipy.sparse matrix of int
predict_proba(X)

Predict probabilities of label assignments for X.

Parameters: X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features)
Returns: matrix with label assignment probabilities of shape (n_samples, n_labels)
Return type: scipy.sparse matrix of float
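When hard assignments are needed from predict_proba output, one common approach is to threshold the probabilities (0.5 is a typical cutoff, though the best value is task-dependent). Shown here with a plain nested list for illustration; the classifier actually returns a scipy.sparse matrix:

```python
# Threshold per-label probabilities into a binary indicator matrix
# (plain lists used here for illustration only).
probs = [
    [0.9, 0.2],
    [0.4, 0.7],
]
assignments = [[1 if p >= 0.5 else 0 for p in row] for row in probs]
print(assignments)  # [[1, 0], [0, 1]]
```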
score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be predicted exactly for each sample.

Parameters:
- X (array-like, shape = (n_samples, n_features)) – Test samples.
- y (array-like, shape = (n_samples) or (n_samples, n_outputs)) – True labels for X.
- sample_weight (array-like, shape = [n_samples], optional) – Sample weights.

Returns: score – Mean accuracy of self.predict(X) w.r.t. y.
Return type: float
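The subset accuracy that score reports can be sketched in plain Python (illustration only, not the library's implementation): a sample counts as correct only when its entire predicted label set matches the true one.

```python
# Subset accuracy: fraction of samples whose full label row matches exactly.
def subset_accuracy(y_true, y_pred):
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

y_true = [[1, 0], [0, 1], [1, 1]]
y_pred = [[1, 0], [0, 0], [1, 1]]
print(subset_accuracy(y_true, y_pred))  # 2/3: the second sample misses one label
```

Note how one wrong label in a row zeroes out that whole sample, which is why subset accuracy is described above as a harsh metric.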