Cross-validation (chamois cv)
usage: chamois cv [-h] -f FEATURES -c CLASSES
                  [--min-class-occurrences MIN_CLASS_OCCURRENCES]
                  [--min-feature-occurrences MIN_FEATURE_OCCURRENCES]
                  [--min-class-groups MIN_CLASS_GROUPS]
                  [--min-feature-groups MIN_FEATURE_GROUPS]
                  [--min-cluster-length MIN_CLUSTER_LENGTH]
                  [--min-genes MIN_GENES] [--mismatch]
                  [--model {ridge,logistic,dummy,rf}] [--alpha ALPHA]
                  [--variance VARIANCE] [-k KFOLDS]
                  [--sampling {kennard-stone,group,random}] -o OUTPUT
                  [--metrics METRICS] [--report REPORT]
                  [--best-model BEST_MODEL]
Input
Input files required by the command.
- -f, --features
The feature table in HDF5 format to use for training the predictor.
- -c, --classes
The classes table in HDF5 format to use for training the predictor.
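As a sketch of how the mandatory arguments fit together, the helper below assembles a `chamois cv` invocation as an argument list suitable for `subprocess.run`. The function name `build_cv_command` and all file names are hypothetical; only the flags `-f`, `-c`, `-o`, `-k` and `--sampling`, and the sampling choices, are taken from the usage above.

```python
import shlex

# Sampling choices as listed in the usage above.
SAMPLING_CHOICES = ("kennard-stone", "group", "random")


def build_cv_command(features, classes, output, kfolds=None, sampling=None):
    """Assemble a `chamois cv` argument list (hypothetical helper)."""
    command = ["chamois", "cv", "-f", features, "-c", classes, "-o", output]
    if kfolds is not None:
        command += ["-k", str(kfolds)]
    if sampling is not None:
        if sampling not in SAMPLING_CHOICES:
            raise ValueError(f"unknown sampling method: {sampling!r}")
        command += ["--sampling", sampling]
    return command


# Hypothetical file names, for illustration only.
command = build_cv_command("features.hdf5", "classes.hdf5", "probas.out", kfolds=5)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the input tables exist.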
Preprocessing
Parameters controlling data preprocessing, including feature and label filtering.
- --min-class-occurrences
The minimum number of occurrences for a class to be retained.
Default:
0
- --min-feature-occurrences
The minimum number of occurrences for a feature to be retained.
Default:
0
- --min-class-groups
The minimum number of groups for a class to be retained.
Default:
5
- --min-feature-groups
The minimum number of groups for a feature to be retained.
Default:
5
- --min-cluster-length
The nucleotide length threshold for retaining a cluster.
Default:
0
- --min-genes
The gene count threshold for retaining a cluster.
Default:
0
- --mismatch
Whether to correct mismatching observations.
Default:
False
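The occurrence and group thresholds can be illustrated with a small pure-Python sketch. It assumes that an "occurrence" is an observation in which a binary column (feature or class) is present, that a "group" is a label attached to each observation, and that both thresholds are inclusive. None of this is confirmed by the text above, and the helper names are hypothetical rather than part of the tool.

```python
def occurrences(matrix, j):
    """Number of observations (rows) in which column j is present."""
    return sum(1 for row in matrix if row[j])


def group_count(matrix, groups, j):
    """Number of distinct groups with at least one observation having column j."""
    return len({group for row, group in zip(matrix, groups) if row[j]})


def select_columns(matrix, groups, min_occurrences=0, min_groups=5):
    """Indices of columns meeting both thresholds (assumed to be inclusive)."""
    n_columns = len(matrix[0]) if matrix else 0
    return [
        j
        for j in range(n_columns)
        if occurrences(matrix, j) >= min_occurrences
        and group_count(matrix, groups, j) >= min_groups
    ]


# Toy data: 4 observations, 3 binary columns, 3 groups (all hypothetical).
matrix = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
]
groups = ["g1", "g1", "g2", "g3"]
kept = select_columns(matrix, groups, min_occurrences=2, min_groups=2)
print(kept)  # [0, 1]: the last column occurs only once and is dropped
```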
Training
Hyperparameters to use for training the model.
- --model
Possible choices: ridge, logistic, dummy, rf
The kind of model to train.
Default:
'logistic'
- --alpha
The strength of the parameter regularization.
Default:
1.0
- --variance
The variance threshold for filtering features.
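To make the variance threshold concrete, the sketch below drops columns whose population variance does not exceed a threshold. For a binary feature present in a fraction p of observations, the variance is p(1 - p), so near-constant features are removed first. The strict comparison and the helper name `variance_filter` are assumptions made for illustration and may not match how the tool applies `--variance`.

```python
from statistics import pvariance


def variance_filter(matrix, threshold):
    """Indices of columns whose population variance exceeds the threshold."""
    columns = list(zip(*matrix))  # transpose rows into columns
    return [j for j, column in enumerate(columns) if pvariance(column) > threshold]


# Toy binary feature table (hypothetical): rows are observations.
matrix = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
# Column variances: 0.0 (constant), 0.25 (p = 0.5), 0.1875 (p = 0.25).
print(variance_filter(matrix, 0.2))  # [1]
print(variance_filter(matrix, 0.0))  # [1, 2]
```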
Cross-validation
Parameters controlling the cross-validation.
- -k, --kfolds
The number of cross-validation folds to run.
Default:
5
- --sampling
Possible choices: kennard-stone, group, random
The algorithm to use for partitioning folds.
Default:
'group'
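The idea behind group-based fold partitioning is that all observations sharing a group label land in the same fold, so that no group is split between training and test data. The sketch below shows one common greedy strategy (largest groups first, each assigned to the currently smallest fold). It is an illustration only, not necessarily the algorithm used by the `group` sampling method, and the function name `group_folds` is hypothetical.

```python
from collections import Counter


def group_folds(groups, k):
    """Assign each observation a fold index so that no group spans two folds."""
    sizes = Counter(groups)
    if k > len(sizes):
        raise ValueError("k cannot exceed the number of distinct groups")
    fold_sizes = [0] * k
    fold_of_group = {}
    # Largest groups first; ties are broken by label to keep the result deterministic.
    for group, size in sorted(sizes.items(), key=lambda item: (-item[1], item[0])):
        fold = fold_sizes.index(min(fold_sizes))  # currently smallest fold
        fold_of_group[group] = fold
        fold_sizes[fold] += size
    return [fold_of_group[group] for group in groups]


# Toy group labels (hypothetical): 8 observations in 4 groups.
groups = ["a", "a", "a", "b", "b", "c", "d", "d"]
folds = group_folds(groups, k=2)
print(folds)  # [0, 0, 0, 1, 1, 0, 1, 1]
```

Each fold index can then serve as the test set in turn, with the remaining folds used for training.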
Output
Mandatory and optional outputs.
- -o, --output
The path to which the predicted probabilities for each test fold are written.
- --metrics
The path to an optional metrics file to write in DVC/JSON format.
- --report
An optional file to which a label-wise evaluation report is written.
- --best-model
An optional file to which the model with the highest macro-average precision is written.
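The model selection criterion can be sketched as follows, assuming "macro-average precision" means the average precision (area under the precision-recall curve) computed per class and then averaged with equal weight across classes. This interpretation, the handling of classes without positives, and the helper names are assumptions. The sketch also assumes scores within a class are not tied.

```python
def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve for one class."""
    positives = sum(y_true)
    if positives == 0:
        return 0.0  # assumption: a class without positives contributes 0
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / positives


def macro_average_precision(Y_true, Y_score):
    """Unweighted mean of per-class average precision (rows are observations)."""
    per_class = [
        average_precision(true_column, score_column)
        for true_column, score_column in zip(zip(*Y_true), zip(*Y_score))
    ]
    return sum(per_class) / len(per_class)


# Toy labels and predicted probabilities (hypothetical): 4 observations, 2 classes.
Y_true = [[1, 0], [0, 1], [1, 0], [0, 0]]
Y_score = [[0.9, 0.2], [0.8, 0.6], [0.7, 0.7], [0.1, 0.1]]
score = macro_average_precision(Y_true, Y_score)
print(round(score, 4))  # 0.6667
```

Under this reading, the model written to `--best-model` would be the one whose test-fold predictions give the largest such score.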