Contributing#

For bug fixes or new features, please file an issue before submitting a pull request. If the change is not trivial, it may be best to wait for feedback.

Generating data#

The Makefile in the project repository handles the generation of the training and benchmarking datasets. Simply running make will re-generate the mibig-2.0 and mibig-3.1 datasets, including the features.hdf5 and classes.hdf5 needed to train CHAMOIS.

Managing versions#

As it is a good project management practice, we should follow semantic versioning, so remember the following:

  • As long as the model predicts the same thing, retraining/updating the model should be considered a non-breaking change, so you should bump the MINOR version of the program.

  • Upgrading the internal HMMs could potentially change the output but won’t break the program, they should be treated as non-breaking change, so you should bump the MINOR version of the program.

  • If the model changes prediction (e.g. predicted classes change), then you should bump the MAJOR version of the program as it it a breaking change.

  • Changes in the code should be treated following semver the usual way.

  • Changed in the CLI should be treated as changed in the API (e.g. a new CLI option or a new command bumps the MINOR version, removal of an option bumps the MAJOR version).

Upgrading the internal model#

After having trained a new version of the model, simply copy the resulting JSON file to chamois/predictor/predictor.json.

Upgrading the internal HMMs#

To download the subset of Pfam HMMs required by the model, simply run:

$ python setup.py clean download_pfam --inplace