Cross-validation (chamois cv)#

usage: chamois cv [-h] -f FEATURES -c CLASSES
                  [--min-class-occurrences MIN_CLASS_OCCURRENCES]
                  [--min-feature-occurrences MIN_FEATURE_OCCURRENCES]
                  [--min-class-groups MIN_CLASS_GROUPS]
                  [--min-feature-groups MIN_FEATURE_GROUPS]
                  [--min-cluster-length MIN_CLUSTER_LENGTH]
                  [--min-genes MIN_GENES] [--mismatch]
                  [--model {ridge,logistic,dummy,rf}] [--alpha ALPHA]
                  [--variance VARIANCE] [-k KFOLDS]
                  [--sampling {random,group,kennard-stone}] -o OUTPUT
                  [--metrics METRICS] [--report REPORT]
                  [--best-model BEST_MODEL]
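
As an illustration of how the mandatory and optional flags in the usage line fit together, the following sketch assembles a `chamois cv` command line from Python without running it. The file names (`features.hdf5`, `classes.hdf5`, `cv_output`) and the helper `build_cv_command` are hypothetical; only the flags come from the usage line above.

```python
import shlex


def build_cv_command(features, classes, output, kfolds=5,
                     sampling="group", model="logistic", report=None):
    """Assemble an argument list for `chamois cv` (hypothetical helper)."""
    argv = [
        "chamois", "cv",
        "-f", features,          # mandatory feature table
        "-c", classes,           # mandatory classes table
        "-o", output,            # mandatory output path
        "-k", str(kfolds),
        "--sampling", sampling,
        "--model", model,
    ]
    if report is not None:
        # Optional label-wise evaluation report.
        argv += ["--report", report]
    return argv


# File names below are placeholders, not files shipped with the tool.
argv = build_cv_command("features.hdf5", "classes.hdf5", "cv_output", kfolds=10)
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` once the input files exist.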

Input#

Mandatory input files required by the command.

-f, --features

The feature table in HDF5 format to use for training the predictor.

-c, --classes

The classes table in HDF5 format to use for training the predictor.

Preprocessing#

Parameters controlling data preprocessing, including feature and label filtering.

--min-class-occurrences

The minimum number of occurrences for a class to be retained.

Default: 0

--min-feature-occurrences

The minimum number of occurrences for a feature to be retained.

Default: 0

--min-class-groups

The minimum number of groups for a class to be retained.

Default: 5

--min-feature-groups

The minimum number of groups for a feature to be retained.

Default: 5
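
The occurrence and group thresholds can be pictured as column filters on a presence/absence table. The sketch below is one possible reading, not the tool's actual implementation: it assumes a binary matrix with one row per observation, one group label per row, and inclusive thresholds. The function name `retained_columns` and the toy data are hypothetical.

```python
def retained_columns(matrix, groups, min_occurrences=0, min_groups=5):
    """Return the indices of columns that pass both thresholds.

    Assumption: a column "occurs" in a row when its value is non-zero,
    and its group count is the number of distinct groups among those rows.
    """
    n_cols = len(matrix[0]) if matrix else 0
    kept = []
    for j in range(n_cols):
        rows = [i for i, row in enumerate(matrix) if row[j]]
        occurrences = len(rows)
        distinct_groups = len({groups[i] for i in rows})
        if occurrences >= min_occurrences and distinct_groups >= min_groups:
            kept.append(j)
    return kept


# Toy table: 4 observations, 3 columns (features or classes).
X = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
groups = ["a", "a", "b", "c"]

kept = retained_columns(X, groups, min_occurrences=2, min_groups=2)
print(kept)  # → [0, 1]
```

The last column is dropped because it occurs only once, in a single group.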

--min-cluster-length

The nucleotide length threshold for retaining a cluster.

Default: 0

--min-genes

The gene count threshold for retaining a cluster.

Default: 0

--mismatch

Whether to correct mismatching observations.

Default: False

Training#

Hyperparameters to use for training the model.

--model

Possible choices: ridge, logistic, dummy, rf

The kind of model to train.

Default: 'logistic'

--alpha

The strength of the parameter regularization.

Default: 1.0

--variance

The variance threshold for filtering features.
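
A variance threshold typically discards features that barely change across observations. The following minimal sketch assumes population variance and a strict comparison (a feature is kept only if its variance exceeds the threshold); whether the tool uses exactly this rule is not stated here. The function name `high_variance_columns` is hypothetical.

```python
from statistics import pvariance


def high_variance_columns(matrix, threshold):
    """Return indices of columns whose population variance exceeds `threshold`."""
    columns = list(zip(*matrix))  # transpose rows into columns
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


X = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]

# Column variances are 0.0, 0.1875 and 0.25 respectively.
print(high_variance_columns(X, 0.2))  # → [2]
print(high_variance_columns(X, 0.0))  # → [1, 2]
```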

Cross-validation#

Parameters controlling the cross-validation.

-k, --kfolds

The number of cross-validation folds to run.

Default: 5

--sampling

Possible choices: random, group, kennard-stone

The algorithm to use for partitioning folds.

Default: 'group'
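
Group-based sampling generally means that all observations sharing a group label land in the same fold, so that related observations never appear in both the training and the test split. The sketch below shows one common greedy scheme (largest group first, assigned to the currently smallest fold). It is an illustration of the idea, not necessarily the partitioning the tool performs, and `group_kfold` is a hypothetical name.

```python
from collections import Counter


def group_kfold(groups, k):
    """Assign a fold index to each observation, keeping groups intact."""
    sizes = Counter(groups)
    fold_sizes = [0] * k
    group_to_fold = {}
    # Largest groups first; ties broken by group name for determinism.
    for group, size in sorted(sizes.items(), key=lambda item: (-item[1], str(item[0]))):
        fold = fold_sizes.index(min(fold_sizes))  # currently smallest fold
        group_to_fold[group] = fold
        fold_sizes[fold] += size
    return [group_to_fold[g] for g in groups]


groups = ["a", "a", "a", "b", "b", "c", "d", "d"]
folds = group_kfold(groups, k=3)
print(folds)  # → [0, 0, 0, 1, 1, 1, 2, 2]
```

With random sampling, by contrast, the three observations of group `a` could be spread over several folds.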

Output#

Mandatory and optional outputs.

-o, --output

The path where the probabilities for each test fold are written.

--metrics

The path to an optional metrics file to write in DVC/JSON format.
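
DVC can read metrics from JSON files that map metric names to numeric values. As a rough sketch of what such a file might contain, the snippet below serializes a flat dictionary; the key names and the zero values are placeholders chosen for illustration and are not the names the tool actually writes.

```python
import json

# Placeholder metric names and values (assumptions, not real results).
metrics = {
    "macro_average_precision": 0.0,
    "micro_average_precision": 0.0,
}

text = json.dumps(metrics, indent=2, sort_keys=True)
print(text)
```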

--report

An optional file in which to generate a label-wise evaluation report.

--best-model

An optional file in which to write the model with the highest macro-average precision.
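
Macro-averaged average precision is usually computed by scoring each label separately and then taking the unweighted mean over labels. The sketch below assumes that reading and ignores tied scores; the function names and the toy labels are hypothetical.

```python
def average_precision(y_true, scores):
    """Average precision for one label: mean precision at each true positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def macro_average_precision(y_true_by_label, scores_by_label):
    """Unweighted mean of the per-label average precision."""
    values = [
        average_precision(y_true_by_label[label], scores_by_label[label])
        for label in y_true_by_label
    ]
    return sum(values) / len(values)


y_true = {"A": [1, 0, 1, 0], "B": [0, 1, 0, 0]}
scores = {"A": [0.9, 0.8, 0.7, 0.1], "B": [0.2, 0.6, 0.9, 0.1]}

ap_a = average_precision(y_true["A"], scores["A"])  # (1/1 + 2/3) / 2
ap_b = average_precision(y_true["B"], scores["B"])  # (1/2) / 1
macro = macro_average_precision(y_true, scores)
print(round(ap_a, 4), round(ap_b, 4), round(macro, 4))
```

Because every label contributes equally, rare labels weigh as much as frequent ones in this score.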