skmultilearn package

Machine learning module for Python

skmultilearn is a Python module for multi-label classification, integrated with the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib). It aims to provide simple and efficient solutions to multi-label learning problems that are accessible to everybody and reusable in various contexts: machine learning as a versatile tool for science and engineering. See http://scikit.ml for complete documentation.

Submodules

skmultilearn.dataset module

skmultilearn.dataset.available_data_sets()[source]

Lists available data sets and their variants

Returns:list of available data sets and their variants
Return type:dict mapping each set name to a list of its variants
 
skmultilearn.dataset.clear_data_home(data_home=None)[source]

Delete all the content of the data home cache.

Parameters:data_home (string or None) – the path to the directory in which scikit-multilearn data sets should be stored
skmultilearn.dataset.download_dataset(set_name, variant)[source]

Downloads a data set

Parameters:
  • set_name (string) – name of the data set to download
  • variant (string) – variant of the data set to download

Returns:

path to the downloaded data set file
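
A hedged sketch of what a download-and-verify step could look like, built from the (md5, file name) row layout documented under get_dataset_list() and the base URL from get_download_base_url(); the helper names here are hypothetical and this is not the library's actual implementation:

```python
import hashlib
import os
import urllib.request

def md5_of(path):
    """Compute the md5 checksum of a file on disk."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

def fetch_dataset_file(base_url, file_name, expected_md5, target_dir):
    """Hypothetical helper: fetch base_url + file_name into target_dir
    and reject the file when its md5 checksum does not match the one
    listed in the data set list."""
    target_path = os.path.join(target_dir, file_name)
    urllib.request.urlretrieve(base_url + file_name, target_path)
    if md5_of(target_path) != expected_md5:
        os.remove(target_path)
        raise IOError('md5 mismatch for {}'.format(file_name))
    return target_path
```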

skmultilearn.dataset.get_data_home(data_home=None)[source]

Return the path of the scikit-multilearn data dir.

This folder is used by some large dataset loaders to avoid downloading the data several times.

By default the data dir is set to a folder named ‘scikit_ml_learn_data’ in the user home folder.

Alternatively, it can be set by the ‘SCIKIT_ML_LEARN_DATA’ environment variable or programmatically by giving an explicit folder path. The ‘~’ symbol is expanded to the user home folder.

If the folder does not already exist, it is automatically created.

Parameters:data_home (string or None) – the path to the directory in which scikit-multilearn data sets should be stored; if None, the path is resolved as described above
Returns:the path to the data home
Return type:string
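
The resolution order above can be restated in a few lines of plain Python; this is a sketch of the documented behaviour, not the library's own code:

```python
import os

def resolve_data_home(data_home=None):
    # Resolution order documented above: explicit argument, then the
    # SCIKIT_ML_LEARN_DATA environment variable, then the default
    # 'scikit_ml_learn_data' folder in the user home directory.
    if data_home is None:
        data_home = os.environ.get(
            'SCIKIT_ML_LEARN_DATA',
            os.path.join('~', 'scikit_ml_learn_data'))
    # The '~' symbol is expanded to the user home folder.
    data_home = os.path.expanduser(data_home)
    # The folder is created automatically if it does not exist yet.
    os.makedirs(data_home, exist_ok=True)
    return data_home
```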
skmultilearn.dataset.get_dataset_list()[source]

Loads the data set list

The format of the list is as follows:

  • each row corresponds to a variant of a data set
  • variants include: train, test and undivided; note that some data
    sets are not provided by their authors in a train/test division
  • in each row, column 0 is the md5 checksum and column 1 is the file name
    available under get_download_base_url()
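
A short illustration of consuming rows in that two-column layout; the semicolon delimiter and the md5 values are assumptions made up for the example:

```python
def parse_dataset_list(raw_text):
    """Split each row into its (md5, file_name) columns; the
    semicolon delimiter is an assumption for illustration."""
    return [tuple(line.split(';')) for line in raw_text.strip().splitlines()]

# Made-up excerpt following the documented two-column layout.
sample = (
    "0123456789abcdef0123456789abcdef;emotions-train.dump.bz2\n"
    "fedcba9876543210fedcba9876543210;emotions-test.dump.bz2\n"
)
entries = parse_dataset_list(sample)
print(entries[0][1])  # → emotions-train.dump.bz2
```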
skmultilearn.dataset.get_download_base_url()[source]

Returns base URL for data sets.

skmultilearn.dataset.load_dataset(set_name, variant)[source]

Loads a selected variant of the given data set

Parameters:
  • set_name (string) – name of the data set to load
  • variant (string) – variant of the data set to load (e.g. 'train', 'test' or 'undivided')

Returns:

the loaded multilabel data set variant in the scikit-multilearn format, see data_sets
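
A toy, hand-built stand-in for the scikit-multilearn format a loaded variant follows: sparse input and label matrices plus feature and label name lists. The names and numbers below are placeholders, not the output of an actual download:

```python
import numpy as np
from scipy import sparse

# X: input space, one row per example; y: binary label indicator
# matrix, one column per label.
X = sparse.csr_matrix(np.array([[0.1, 0.7],
                                [0.9, 0.2],
                                [0.4, 0.4]]))
y = sparse.csr_matrix(np.array([[1, 0, 1],
                                [0, 1, 0],
                                [1, 1, 0]]))
feature_names = ['feature_1', 'feature_2']
label_names = ['label_1', 'label_2', 'label_3']

# The first example carries label_1 and label_3.
print(y.toarray()[0].tolist())  # → [1, 0, 1]
```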

skmultilearn.dataset.load_dataset_dump(filename)[source]

Loads a compressed data set dump

Parameters:

filename : string

Path to the dump file; if it lacks the .bz2 extension, .bz2 will be appended.

Returns:

data: dictionary {‘X’: array-like of array-likes, ‘y’: array-like of binary label vectors }

The dictionary containing the data, with the ‘X’ key storing the input space array-like of input feature vectors and ‘y’ storing the labels assigned to each input vector as a binary indicator vector (i.e. if the 5th position has value 1, then the input vector carries label no. 5)
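
The binary indicator convention is easiest to see with plain Python; the label names below are invented for illustration:

```python
label_names = ['beach', 'sunset', 'people', 'urban', 'mountain']
# A 1 in position i means the example carries label no. i.
indicator = [0, 1, 0, 0, 1]
assigned = [name for name, flag in zip(label_names, indicator) if flag]
print(assigned)  # → ['sunset', 'mountain']
```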

skmultilearn.dataset.load_from_arff(filename, labelcount, endian='big', input_feature_type='float', encode_nominal=True, load_sparse=False, return_attribute_definitions=False)[source]

Method for loading ARFF files as numpy arrays

Parameters:

filename : string

Path to ARFF file

labelcount: integer

Number of labels in the ARFF file

endian: string{“big”, “little”}

Whether the ARFF file contains labels at the beginning of the attributes list (“big” endianness, MEKA format) or at the end (“little” endianness, MULAN format)

input_feature_type: numpy.type as string

The desired type of the contents of the returned ‘X’ array-likes; default ‘float’. Should be a numpy type, see http://docs.scipy.org/doc/numpy/user/basics.types.html

encode_nominal: boolean

Whether to convert categorical data into numeric factors – required for some scikit-learn classifiers that can’t handle non-numeric input features.

load_sparse: boolean

Whether to read the ARFF file in the sparse ARFF format; liac-arff breaks if sparse reading is enabled for non-sparse ARFF files.

return_attribute_definitions: boolean

Whether to additionally return the ARFF attribute definitions alongside X and y.

Returns:

X: scipy sparse matrix with input_feature_type elements,

y: scipy sparse matrix of binary label indicator matrix
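
The endian parameter only determines where the label columns sit relative to the input features in the attribute list. A dense numpy sketch of the two conventions (the numbers and labelcount=2 are arbitrary):

```python
import numpy as np

labelcount = 2
rows = np.array([[1, 0, 0.5, 0.2],
                 [0, 1, 0.1, 0.9]])

# "big" endian (MEKA): the first `labelcount` attributes are labels.
y_big, X_big = rows[:, :labelcount], rows[:, labelcount:]

# "little" endian (MULAN): the last `labelcount` attributes are labels.
X_little, y_little = rows[:, :-labelcount], rows[:, -labelcount:]

print(y_big.tolist())  # → [[1.0, 0.0], [0.0, 1.0]]
print(X_big.tolist())  # → [[0.5, 0.2], [0.1, 0.9]]
```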

skmultilearn.dataset.save_dataset_dump(filename, input_space, labels, feature_names, label_names)[source]

Saves a compressed data set dump

Parameters:

filename : string

Path to the dump file; if it lacks the .bz2 extension, .bz2 will be appended.

input_space: array-like of array-likes

Input space array-like of input feature vectors

labels: array-like of binary label vectors

Array-like of labels assigned to each input vector, as a binary indicator vector (i.e. if 5th position has value 1 then the input vector has label no. 5)

feature_names: array-like

optional, names of features

label_names: array-like

optional, names of labels
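
If one assumes the dump on disk is a bz2-compressed pickle of the {'X': …, 'y': …} dictionary described under load_dataset_dump (an assumption about the on-disk format, not something the docs above state), a save/load round trip can be sketched as:

```python
import bz2
import os
import pickle
import tempfile

def save_dump(filename, input_space, labels):
    """Hypothetical sketch of a compressed dump writer."""
    if not filename.endswith('.bz2'):
        filename += '.bz2'  # append the .bz2 extension when missing
    with bz2.BZ2File(filename, 'wb') as f:
        pickle.dump({'X': input_space, 'y': labels}, f)
    return filename

def load_dump(filename):
    """Hypothetical matching reader."""
    if not filename.endswith('.bz2'):
        filename += '.bz2'
    with bz2.BZ2File(filename, 'rb') as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), 'toy-dataset')
save_dump(path, [[0.1, 0.2], [0.3, 0.4]], [[1, 0], [0, 1]])
data = load_dump(path)
print(data['y'])  # → [[1, 0], [0, 1]]
```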

skmultilearn.dataset.save_to_arff(X, y, endian='little', save_sparse=True)[source]

Method for dumping data to ARFF files

Parameters:

X : array-like

Input feature matrix to save

y : array-like

Binary label indicator matrix to save

endian: string{“big”, “little”}

Whether to write the labels at the beginning of the attributes list (“big” endianness, MEKA format) or at the end (“little” endianness, MULAN format)

save_sparse: boolean

Whether to save the ARFF file in the sparse ARFF format

Returns:

string: the ARFF dump string

skmultilearn.repeat_classifier module

class skmultilearn.repeat_classifier.RepeatClassifier[source]

Bases: skmultilearn.base.base.MLClassifierBase

Simple classifier for handling cases where

fit(X, y)[source]
predict(X)[source]

skmultilearn.utils module

skmultilearn.utils.get_matrix_in_format(original_matrix, matrix_format)[source]
skmultilearn.utils.matrix_creation_function_for_format(sparse_format)[source]