# Multilabel k Nearest Neighbours

class skmultilearn.adapt.MLkNN(k=10, s=1.0, ignore_first_neighbours=0)[source]

kNN classification method adapted for multi-label classification

MLkNN uses k-nearest neighbours to find the training examples closest to a test instance, then applies Bayesian inference to select the labels to assign.

Parameters:
- k (int) – number of neighbours of each input instance to take into account
- s (float, default 1.0) – the smoothing parameter
- ignore_first_neighbours (int, default 0) – ability to ignore the first N neighbours, useful for comparing with other classification software
knn_

an instance of sklearn.neighbors.NearestNeighbors – the nearest-neighbours model used underneath

Note

If you don’t know what ignore_first_neighbours does, the default is safe. Please see this issue.

References

If you use this classifier please cite the original paper introducing the method:

@article{zhang2007ml,
title={ML-KNN: A lazy learning approach to multi-label learning},
author={Zhang, Min-Ling and Zhou, Zhi-Hua},
journal={Pattern recognition},
volume={40},
number={7},
pages={2038--2048},
year={2007},
publisher={Elsevier}
}


Examples

Here’s a very simple example of using MLkNN with a fixed number of neighbors:

from skmultilearn.adapt import MLkNN

classifier = MLkNN(k=3)

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)


You can also use GridSearchCV to find an optimal set of parameters:

from skmultilearn.adapt import MLkNN
from sklearn.model_selection import GridSearchCV

parameters = {'k': range(1, 3), 's': [0.5, 0.7, 1.0]}
score = 'f1_macro'

clf = GridSearchCV(MLkNN(), parameters, scoring=score)
clf.fit(X, y)

print(clf.best_params_, clf.best_score_)

# output
{'k': 1, 's': 0.5} 0.78988303374297597

fit(X, y)[source]

Fit classifier with training data

Parameters:
- X (numpy.ndarray or scipy.sparse) – input features, can be a dense or sparse matrix of size (n_samples, n_features)
- y (numpy.ndarray or scipy.sparse {0,1}) – binary indicator matrix with label assignments

Returns: self – fitted instance of self
get_params(deep=True)

Get parameters to sub-objects

Introspection of classifier for search models like cross-validation and grid search.

Parameters: deep (bool) – if True, all params will also be introspected and appended to the output dictionary

Returns: out (dict) – dictionary of all parameters and their values. If deep=True the dictionary also holds the parameters of the parameters.
predict(X)[source]

Predict labels for X

Parameters: X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features)

Returns: binary indicator matrix with label assignments, shape (n_samples, n_labels) – scipy.sparse matrix of int
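Since predict returns a scipy.sparse matrix, a common follow-up is converting it to a dense array for inspection. A minimal sketch, where the small matrix below is a hypothetical stand-in for a classifier's output:

```python
import numpy as np
from scipy.sparse import lil_matrix

# Hypothetical stand-in for the sparse output of predict:
# 2 samples, 3 labels.
predictions = lil_matrix(np.array([[0, 1, 1],
                                   [1, 0, 0]]))

dense = predictions.toarray()          # dense numpy array of 0/1 indicators
labels_per_sample = dense.sum(axis=1)  # labels assigned to each sample
```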
predict_proba(X)[source]

Predict probabilities of label assignments for X

Parameters: X (numpy.ndarray or scipy.sparse.csc_matrix) – input features of shape (n_samples, n_features)

Returns: matrix with label assignment probabilities, shape (n_samples, n_labels) – scipy.sparse matrix of float
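The probability matrix makes it possible to binarise label assignments at a custom threshold instead of using predict's default decision. A sketch with a hypothetical probability matrix standing in for predict_proba output:

```python
import numpy as np
from scipy.sparse import lil_matrix

# Hypothetical stand-in for predict_proba output: 2 samples, 3 labels.
probabilities = lil_matrix(np.array([[0.9, 0.2, 0.6],
                                     [0.1, 0.8, 0.4]]))

# Assign a label wherever its probability exceeds a chosen threshold.
threshold = 0.5
assignments = (probabilities.toarray() > threshold).astype(int)
```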
score(X, y, sample_weight=None)

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
- X (array-like, shape (n_samples, n_features)) – test samples
- y (array-like, shape (n_samples,) or (n_samples, n_outputs)) – true labels for X
- sample_weight (array-like, shape (n_samples,), optional) – sample weights

Returns: score (float) – mean accuracy of self.predict(X) w.r.t. y
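To see why subset accuracy is harsh, it can be computed by hand with numpy; the labels below are hypothetical, not from a fitted model:

```python
import numpy as np

# Hypothetical true labels and predictions: 3 samples, 3 labels each.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],   # exact match
                   [0, 1, 1],   # one wrong label -> whole sample counts as wrong
                   [1, 1, 0]])  # exact match

# A sample scores only if its entire label set is predicted correctly.
subset_accuracy = np.all(y_true == y_pred, axis=1).mean()
```

Here only 2 of 3 samples match exactly, so subset accuracy is 2/3 even though 8 of 9 individual labels are correct.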
set_params(**parameters)

Propagate parameters to sub-objects

Set parameters as returned by get_params. Please see this link.