Train (chamois train)

usage: chamois train [-h] -f FEATURES -c CLASSES
                     [--min-class-occurrences MIN_CLASS_OCCURRENCES]
                     [--min-feature-occurrences MIN_FEATURE_OCCURRENCES]
                     [--min-class-groups MIN_CLASS_GROUPS]
                     [--min-feature-groups MIN_FEATURE_GROUPS]
                     [--min-cluster-length MIN_CLUSTER_LENGTH]
                     [--min-genes MIN_GENES] [--mismatch]
                     [--model {ridge,logistic,dummy,rf}] [--alpha ALPHA]
                     [--variance VARIANCE] -o OUTPUT [--metrics METRICS]
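The synopsis above can also be driven from a script. The sketch below only assembles an argument list for one run. The helper `build_train_command` and the file names are hypothetical and are not part of chamois. The flags and the `--model` choices are taken from the synopsis.

```python
import shlex

MODELS = {"ridge", "logistic", "dummy", "rf"}  # choices listed for --model

def build_train_command(features, classes, output, model="logistic", alpha=1.0, metrics=None):
    """Assemble the argument list for one `chamois train` run (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    argv = [
        "chamois", "train",
        "-f", features,   # feature table (HDF5)
        "-c", classes,    # classes table (HDF5)
        "-o", output,     # trained model (JSON)
        "--model", model,
        "--alpha", str(alpha),
    ]
    if metrics is not None:
        argv += ["--metrics", metrics]  # optional DVC/JSON metrics file
    return argv

# Hypothetical file names; adjust to your own data.
cmd = build_train_command("features.hdf5", "classes.hdf5", "model.json", metrics="metrics.json")
print(shlex.join(cmd))
# To actually train (requires chamois on PATH):
# subprocess.run(cmd, check=True)
```

Passing a list rather than a single string to `subprocess.run` avoids shell quoting issues when paths contain spaces.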

Input

Input files required by the command.

-f, --features

The feature table in HDF5 format to use for training the predictor.

-c, --classes

The classes table in HDF5 format to use for training the predictor.

Preprocessing

Parameters controlling data preprocessing, including feature and label filtering.

--min-class-occurrences

The minimum number of occurrences for a class to be retained.

Default: 0

--min-feature-occurrences

The minimum number of occurrences for a feature to be retained.

Default: 0
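The two occurrence thresholds can be pictured as a column filter on a binary observation table. The sketch below assumes that "occurrences" means the number of rows (clusters) in which a class or feature is present and that the comparison is inclusive. Neither assumption is confirmed by this page, and the table and the helper `filter_by_occurrences` are hypothetical.

```python
def filter_by_occurrences(matrix, names, min_occurrences):
    """Keep columns present (non-zero) in at least `min_occurrences` rows."""
    # Count, for each column, the number of rows where it is present.
    counts = [sum(1 for row in matrix if row[j]) for j in range(len(names))]
    kept = [name for name, count in zip(names, counts) if count >= min_occurrences]
    return kept, counts

# Hypothetical binary table: rows are clusters, columns are classes (or features).
matrix = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
]
names = ["class_a", "class_b", "class_c"]
kept, counts = filter_by_occurrences(matrix, names, min_occurrences=2)
print(counts, kept)  # [3, 1, 1] ['class_a']
```

With the default threshold of 0, every column passes, so no filtering takes place.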

--min-class-groups

The minimum number of groups for a class to be retained.

Default: 5

--min-feature-groups

The minimum number of groups for a feature to be retained.

Default: 5
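Group thresholds differ from occurrence thresholds because many occurrences within a single group count only once. The sketch below assumes that each observation carries a group label and that a column is kept if it is present in at least the given number of distinct groups. This is an interpretation of the option description, not the chamois implementation, and the group labels are hypothetical.

```python
def filter_by_groups(matrix, names, groups, min_groups):
    """Keep columns present in at least `min_groups` distinct groups."""
    kept = []
    for j, name in enumerate(names):
        # Collect the distinct groups of the rows where column j is present.
        present_in = {groups[i] for i, row in enumerate(matrix) if row[j]}
        if len(present_in) >= min_groups:
            kept.append(name)
    return kept

matrix = [
    [1, 1],
    [1, 1],
    [1, 0],
]
names = ["class_a", "class_b"]
groups = ["g1", "g1", "g2"]  # hypothetical group label per row
print(filter_by_groups(matrix, names, groups, min_groups=2))  # ['class_a']
```

In this example, `class_b` occurs twice but only within group `g1`, so it is dropped even though an occurrence threshold of 2 would keep it.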

--min-cluster-length

The minimum nucleotide length for a cluster to be retained.

Default: 0

--min-genes

The minimum number of genes for a cluster to be retained.

Default: 0
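Unlike the class and feature thresholds, `--min-cluster-length` and `--min-genes` act on rows (clusters) rather than on columns. The sketch below assumes that a cluster must satisfy both thresholds and that the comparisons are inclusive. The `Cluster` record and its field names are hypothetical stand-ins for however the input tables actually store these values.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    # Hypothetical record; the real tables may store these values differently.
    name: str
    length: int   # nucleotide length
    genes: int    # number of genes

def filter_clusters(clusters, min_cluster_length=0, min_genes=0):
    """Keep clusters meeting both the length and the gene-count thresholds."""
    return [c for c in clusters if c.length >= min_cluster_length and c.genes >= min_genes]

clusters = [
    Cluster("BGC1", length=12_000, genes=8),
    Cluster("BGC2", length=3_500, genes=2),
    Cluster("BGC3", length=25_000, genes=3),
]
kept = filter_clusters(clusters, min_cluster_length=5_000, min_genes=4)
print([c.name for c in kept])  # ['BGC1']
```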

--mismatch

Whether to correct mismatching observations.

Default: False

Training

Hyperparameters to use for training the model.

--model

Possible choices: ridge, logistic, dummy, rf

The kind of model to train.

Default: 'logistic'

--alpha

The strength of the parameter regularization.

Default: 1.0
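For illustration only, assume an L2 (ridge-style) penalty. In that case, α scales a penalty term that is added to the training loss ℓ over the n training pairs (x_i, y_i), so larger values shrink the weights w more strongly:

```latex
\hat{w} = \arg\min_{w} \; \sum_{i=1}^{n} \ell\left(y_i,\, x_i^\top w\right) + \alpha \lVert w \rVert_2^2
```

Some libraries instead expose the inverse strength C = 1/α for logistic regression. This page does not state the penalty type or how α is passed to each `--model` choice.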

--variance

The variance threshold for filtering features.
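A variance threshold typically removes features that barely change across observations. The sketch below mirrors the behaviour of scikit-learn's `VarianceThreshold`, which keeps only features whose variance is strictly greater than the threshold. Whether chamois uses the same comparison, and what its default is, is not documented here. The helper and the table are hypothetical.

```python
from statistics import pvariance

def filter_by_variance(matrix, names, threshold=0.0):
    """Keep columns whose population variance is strictly above `threshold`."""
    kept = []
    for j, name in enumerate(names):
        column = [row[j] for row in matrix]
        if pvariance(column) > threshold:
            kept.append(name)
    return kept

matrix = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
]
names = ["f1", "f2", "f3"]
print(filter_by_variance(matrix, names, threshold=0.2))  # ['f2']
```

Here, `f1` is constant (variance 0), `f2` has variance 0.25, and `f3` has variance 0.1875. A threshold of 0 would therefore drop only the constant feature `f1`.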

Output

Mandatory and optional outputs.

-o, --output

The path to which the trained model is written, in JSON format.

--metrics

The path to an optional metrics file, written in DVC-compatible JSON format.