Cross-validation (`chamois cv`)#

usage: chamois cv [-h] -f FEATURES -c CLASSES
                  [--min-class-occurrences MIN_CLASS_OCCURRENCES]
                  [--min-feature-occurrences MIN_FEATURE_OCCURRENCES]
                  [--min-class-groups MIN_CLASS_GROUPS]
                  [--min-feature-groups MIN_FEATURE_GROUPS]
                  [--min-cluster-length MIN_CLUSTER_LENGTH]
                  [--min-genes MIN_GENES] [--mismatch]
                  [--model {ridge,logistic,dummy,rf}] [--alpha ALPHA]
                  [--variance VARIANCE] [-k KFOLDS]
                  [--sampling {random,group,kennard-stone}] -o OUTPUT
                  [--metrics METRICS] [--report REPORT]
                  [--best-model BEST_MODEL]
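
As an illustration of how the options in the synopsis combine, the following Python sketch assembles a `chamois cv` command line and prints it without running it. The helper `build_cv_command` and the file names (`features.hdf5`, `classes.hdf5`, `cv_output`) are hypothetical placeholders. Only the flags and their allowed choices come from the usage text above.

```python
import shlex

# Allowed values copied from the usage synopsis.
MODELS = ("ridge", "logistic", "dummy", "rf")
SAMPLINGS = ("random", "group", "kennard-stone")


def build_cv_command(features, classes, output, model="logistic",
                     kfolds=None, sampling=None):
    """Assemble an argument list for `chamois cv` (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if sampling is not None and sampling not in SAMPLINGS:
        raise ValueError(f"unknown sampling: {sampling!r}")
    cmd = ["chamois", "cv", "-f", features, "-c", classes, "-o", output,
           "--model", model]
    # Optional flags are only added when explicitly requested.
    if kfolds is not None:
        cmd += ["-k", str(kfolds)]
    if sampling is not None:
        cmd += ["--sampling", sampling]
    return cmd


# Placeholder file names, not files shipped with CHAMOIS.
command = build_cv_command("features.hdf5", "classes.hdf5", "cv_output",
                           kfolds=5, sampling="group")
print(shlex.join(command))
```

Once the input files exist, the resulting list can be passed to `subprocess.run`.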

Input#

Input files required by the command.

-f, --features: The feature table in HDF5 format to use for training the predictor.
-c, --classes: The classes table in HDF5 format to use for training the predictor.

Preprocessing#

Parameters controlling data preprocessing, including feature and label filtering.

--min-class-occurrences

The minimum number of occurrences for a class to be retained.

Default: 0

--min-feature-occurrences

The minimum number of occurrences for a feature to be retained.

Default: 0

--min-class-groups

The minimum number of groups for a class to be retained.

Default: 5

--min-feature-groups

The minimum number of groups for a feature to be retained.

Default: 5

--min-cluster-length

The minimum length, in nucleotides, for a cluster to be retained.

Default: 0

--min-genes

The minimum number of genes for a cluster to be retained.

Default: 0

--mismatch

Whether to correct mismatching observations.

Default: False
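
To make the occurrence and group thresholds above concrete, here is a minimal pure-Python sketch of how such filters could be applied to a binary observation-by-column matrix. It is not CHAMOIS's actual implementation. The function names, the toy matrix, the per-observation group labels, and the inclusive (`>=`) comparisons are all assumptions made for illustration.

```python
def occurrence_mask(matrix, min_occurrences):
    """One boolean per column: True if the column is set in at least
    `min_occurrences` rows (inclusive threshold is an assumption)."""
    n_cols = len(matrix[0])
    counts = [sum(row[j] for row in matrix) for j in range(n_cols)]
    return [count >= min_occurrences for count in counts]


def group_mask(matrix, groups, min_groups):
    """One boolean per column: True if the column is set in observations
    drawn from at least `min_groups` distinct groups."""
    n_cols = len(matrix[0])
    seen = [set() for _ in range(n_cols)]
    for row, group in zip(matrix, groups):
        for j, value in enumerate(row):
            if value:
                seen[j].add(group)
    return [len(s) >= min_groups for s in seen]


# Toy data: 4 observations (rows) x 3 columns, split into 2 groups.
X = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
groups = ["a", "a", "b", "b"]

by_count = occurrence_mask(X, 2)      # [True, False, True]
by_group = group_mask(X, groups, 2)   # [True, False, False]
print(by_count, by_group)
```

In this toy example the third column occurs twice but only within group `a`, so it passes the occurrence filter and fails the group filter. This shows why the two kinds of threshold are not interchangeable.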

Training#

Hyperparameters to use for training the model.

--model

Possible choices: ridge, logistic, dummy, rf

The kind of model to train.

Default: 'logistic'

--alpha

The strength of the parameter regularization.

Default: 1.0

--variance

The variance threshold for filtering features.