Independent cross-validation (chamois cvi)#

usage: chamois cvi [-h] -f FEATURES -c CLASSES
                   [--min-class-occurrences MIN_CLASS_OCCURRENCES]
                   [--min-feature-occurrences MIN_FEATURE_OCCURRENCES]
                   [--min-class-groups MIN_CLASS_GROUPS]
                   [--min-feature-groups MIN_FEATURE_GROUPS]
                   [--min-cluster-length MIN_CLUSTER_LENGTH]
                   [--min-genes MIN_GENES] [--mismatch]
                   [--model {ridge,logistic,dummy,rf}] [--alpha ALPHA]
                   [--variance VARIANCE] [-k KFOLDS]
                   [--sampling {kennard-stone,group,random}] -o OUTPUT
                   [--metrics METRICS] [--report REPORT]
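
As an illustration of the synopsis above, the following sketch assembles an argument list for `chamois cvi` using only the documented flags. The helper `build_cvi_command` and the file names (`features.hdf5`, `classes.hdf5`, `probabilities.out`) are hypothetical and only serve the example.

```python
import shlex

# Choices copied from the usage synopsis.
ALLOWED_MODELS = {"ridge", "logistic", "dummy", "rf"}
ALLOWED_SAMPLING = {"kennard-stone", "group", "random"}


def build_cvi_command(features, classes, output, model="logistic",
                      kfolds=5, sampling="group", report=None):
    """Build a `chamois cvi` argument list (hypothetical helper)."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if sampling not in ALLOWED_SAMPLING:
        raise ValueError(f"unknown sampling: {sampling!r}")
    cmd = [
        "chamois", "cvi",
        "-f", features,          # mandatory feature table (HDF5)
        "-c", classes,           # mandatory classes table (HDF5)
        "--model", model,
        "-k", str(kfolds),
        "--sampling", sampling,
        "-o", output,            # mandatory output path
    ]
    if report is not None:
        cmd += ["--report", report]  # optional label-wise report
    return cmd


cmd = build_cvi_command("features.hdf5", "classes.hdf5", "probabilities.out")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the hypothetical paths are replaced with real ones.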

Input#

Input files required by the command.

-f, --features

The feature table in HDF5 format to use for training the predictor.

-c, --classes

The classes table in HDF5 format to use for training the predictor.

Preprocessing#

Parameters controlling data preprocessing, including feature and label filtering.

--min-class-occurrences

The minimum number of occurrences for a class to be retained.

Default: 0

--min-feature-occurrences

The minimum number of occurrences for a feature to be retained.

Default: 0

--min-class-groups

The minimum number of groups for a class to be retained.

Default: 5

--min-feature-groups

The minimum number of groups for a feature to be retained.

Default: 5
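
The occurrence and group thresholds above can be read as column filters on a binary table with one row per observation. The sketch below rests on an assumption about their semantics and is not the chamois implementation: a column is kept when it is present in at least `min_occurrences` rows and in at least `min_groups` distinct groups. The function `retained_columns` and the toy data are hypothetical.

```python
def retained_columns(table, groups, min_occurrences=0, min_groups=5):
    """Return indices of columns passing both thresholds (assumed semantics)."""
    n_cols = len(table[0]) if table else 0
    kept = []
    for j in range(n_cols):
        # Rows in which column j is present (non-zero).
        rows = [i for i, row in enumerate(table) if row[j]]
        occurrences = len(rows)
        distinct_groups = len({groups[i] for i in rows})
        if occurrences >= min_occurrences and distinct_groups >= min_groups:
            kept.append(j)
    return kept


# Four observations, three columns, three groups.
table = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]
groups = ["a", "a", "b", "c"]

kept = retained_columns(table, groups, min_occurrences=2, min_groups=2)
print(kept)
```

In this toy table the last column appears only once, so it fails the occurrence threshold, while the first two columns each span at least two groups.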

--min-cluster-length

The minimum length, in nucleotides, for a cluster to be retained.

Default: 0

--min-genes

The minimum number of genes for a cluster to be retained.

Default: 0

--mismatch

Whether to correct mismatching observations.

Default: False

Training#

Hyperparameters to use for training the model.

--model

Possible choices: ridge, logistic, dummy, rf

The kind of model to train.

Default: 'logistic'

--alpha

The strength of the parameter regularization.

Default: 1.0

--variance

The variance threshold for filtering features.
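
A variance threshold usually drops features whose values barely change across observations. The following is a minimal sketch, assuming the population variance is computed per column and that a column is kept only when its variance is strictly greater than the threshold; neither detail is specified here, and `high_variance_columns` is a hypothetical name.

```python
from statistics import pvariance


def high_variance_columns(table, threshold):
    """Return indices of columns whose population variance exceeds threshold."""
    columns = list(zip(*table))  # transpose rows into columns
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


# The first column is constant, so its variance is zero.
table = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
]

kept = high_variance_columns(table, threshold=0.0)
print(kept)
```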

Cross-validation#

Parameters controlling the cross-validation.

-k, --kfolds

The number of cross-validation folds to run.

Default: 5

--sampling

Possible choices: kennard-stone, group, random

The algorithm to use for partitioning folds.

Default: 'group'
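
Group-based partitioning generally means that all observations sharing a group label land in the same fold, so related observations never appear on both sides of a train/test split. The sketch below illustrates that idea in pure Python with a greedy assignment (largest groups first, each placed in the currently smallest fold). It is an assumption-laden illustration, not the algorithm chamois uses, and `group_folds` is a hypothetical name.

```python
from collections import Counter


def group_folds(groups, k=5):
    """Assign a fold index to each observation without splitting any group."""
    sizes = Counter(groups)
    fold_sizes = [0] * k
    fold_of_group = {}
    # Place larger groups first; ties are broken by group name for determinism.
    ordered = sorted(sizes.items(), key=lambda item: (-item[1], str(item[0])))
    for group, size in ordered:
        target = fold_sizes.index(min(fold_sizes))  # currently smallest fold
        fold_of_group[group] = target
        fold_sizes[target] += size
    return [fold_of_group[g] for g in groups]


groups = ["g1", "g1", "g2", "g3", "g3", "g3", "g4"]
folds = group_folds(groups, k=2)
print(folds)

# Each test fold is the set of observation indices assigned to it.
for fold in range(2):
    test_idx = [i for i, f in enumerate(folds) if f == fold]
    train_idx = [i for i, f in enumerate(folds) if f != fold]
    print(fold, train_idx, test_idx)
```

The greedy strategy keeps fold sizes roughly balanced, but balance is not guaranteed when a few groups dominate the data.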

Output#

Mandatory and optional outputs.

-o, --output

The path to which the predicted probabilities for each test fold are written.

--metrics

The path to an optional metrics file to write in DVC/JSON format.

--report

The path to an optional file in which to write a label-wise evaluation report.