ChemicalOntologyPredictor

class chamois.predictor.ChemicalOntologyPredictor(ontology, n_jobs=None, max_iter=100, model='logistic', alpha=1.0, variance=None, seed=0)

A model for predicting chemical hierarchy from BGC compositions.

classmethod load(file)

Load a trained predictor from a JSON file.

classmethod trained()

Load the trained predictor embedded in CHAMOIS.

__init__(ontology, n_jobs=None, max_iter=100, model='logistic', alpha=1.0, variance=None, seed=0)

Create a new, uninitialized model.

Parameters:
  • ontology (Ontology) – The ontology object corresponding to the classes to predict.

  • n_jobs (int or None) – The number of parallel jobs to use for training.

  • max_iter (int) – The maximum number of iterations allowed for the linear models to converge.

  • model (str) – The model architecture to use: either ridge for L2-regularized linear regression, logistic for L1-regularized logistic regression, or dummy for dummy predictors that guess randomly based on the support of each class.

  • alpha (float) – The regularization strength (used for logistic and ridge models).

  • variance (float or None) – If given, the variance threshold used to filter features with the VarianceThreshold feature selection procedure.
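The `variance` option can be pictured with a small NumPy sketch of variance-based feature filtering. It assumes the option behaves like scikit-learn's `sklearn.feature_selection.VarianceThreshold`, which keeps only the features whose training-set variance is strictly greater than the threshold. The helper `variance_filter` and the toy matrix are hypothetical and not part of the CHAMOIS API.

```python
import numpy as np


def variance_filter(X, threshold):
    """Keep the columns of X whose variance exceeds `threshold`.

    Hypothetical helper illustrating the idea behind VarianceThreshold;
    not part of CHAMOIS.
    """
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0)        # per-feature (column) variance
    support = variances > threshold  # boolean mask of retained features
    return X[:, support], support


# Toy binary composition matrix: 4 samples x 3 features (made-up values).
X = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
])
# The first feature is constant (variance 0), so it is filtered out.
X_filtered, support = variance_filter(X, threshold=0.0)
```

In a real pipeline the mask would be computed on the training set only and then reused to filter new samples at prediction time, so that training and prediction see the same features.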

checksum(hasher=None)

Compute a checksum from the values of the learnt parameters.
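One way a checksum over learnt parameters can work is to feed the shape and raw bytes of each parameter array into an incremental hash object. The sketch below assumes the parameters are NumPy arrays and that `hasher` is any `hashlib`-style object with `update` and `hexdigest` methods, defaulting to SHA-256 here; the helper `parameters_checksum` and that default are hypothetical, not the actual implementation.

```python
import hashlib

import numpy as np


def parameters_checksum(parameters, hasher=None):
    """Hash a sequence of parameter arrays into a hex digest (hypothetical helper)."""
    if hasher is None:
        hasher = hashlib.sha256()  # assumed default hash function
    for array in parameters:
        # Normalize dtype and memory layout so equal values give equal bytes.
        data = np.ascontiguousarray(array, dtype=np.float64)
        hasher.update(str(data.shape).encode())  # distinguish (4,) from (2, 2)
        hasher.update(data.tobytes())
    return hasher.hexdigest()


coef = np.array([[0.5, -1.25], [0.0, 2.0]])
intercept = np.array([0.1, -0.3])
digest = parameters_checksum([coef, intercept])
```

Two predictors trained to identical parameters yield the same digest, while any change to a learnt value changes it, which makes such a checksum handy for checking that a saved model was reloaded unchanged.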

fit(X, Y, groups=None)

Fit the model on the given data.

Parameters:
information_content(Y)

Compute the information content of a prediction.

The information content \(ic(T)\) of an annotation subgraph \(T\) is defined as the sum of the information accretion \(ia(i)\) of every node \(i\) in \(T\). Information accretion is computed from conditional probabilities estimated on the training set:

\[ic(T) = \sum_{i \in T}{ia(i)} = \sum_{i \in T}{ -\log_2 P(i \mid \mathcal{P}(i)) }\]

where \(\mathcal{P}(i)\) is the parent of node \(i\) in the ontology graph.
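The definition above can be sketched in NumPy, assuming binary label matrices and a hypothetical `parents` mapping from each class index to the list of its parent indices (empty for root classes). Here \(P(i \mid \mathcal{P}(i))\) is estimated as the fraction of training samples annotated with all parents of \(i\) that are also annotated with \(i\). The function names are illustrative and do not reproduce the CHAMOIS implementation.

```python
import numpy as np


def information_accretion(Y_train, parents):
    """Estimate ia(i) = -log2 P(i | parents of i) from binary training labels."""
    Y_train = np.asarray(Y_train, dtype=bool)
    ia = np.zeros(Y_train.shape[1])
    for i, pars in parents.items():
        # Samples annotated with every parent of i (all samples for a root).
        has_parents = Y_train[:, pars].all(axis=1)
        n_parents = has_parents.sum()
        n_both = (has_parents & Y_train[:, i]).sum()
        if n_parents > 0 and n_both > 0:  # leave ia at 0 for unobserved classes
            ia[i] = -np.log2(n_both / n_parents)
    return ia


def information_content(Y, ia):
    """Sum the information accretion of every predicted node, per sample."""
    return np.asarray(Y, dtype=float) @ ia


# Toy hierarchy: class 0 is the root, classes 1 and 2 are its children.
parents = {0: [], 1: [0], 2: [0]}
Y_train = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
])
ia = information_accretion(Y_train, parents)
ic = information_content([[1, 1, 0], [1, 0, 1]], ia)  # approximately [1.0, 2.0]
```

In this toy example, the prediction containing the rarer child (class 2) receives a higher information content than the one containing the more common child (class 1).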

Parameters:

Y (numpy.ndarray of shape (n_samples, n_classes)) – The array of predicted class labels for which to compute the information content.

Returns:

numpy.ndarray of shape (n_samples,) – The computed information content for each sample prediction.

References

Clark WT, Radivojac P. Information-theoretic evaluation of predicted ontological annotations. Bioinformatics. 2013;29(13):i53-i61. doi:10.1093/bioinformatics/btt228.

predict(X, propagate=True)

Predict the classes for the given features.

Parameters:
predict_probas(X, propagate=True)

Predict class probabilities for the given features.

Parameters:
propagate(Y)

Propagate the probabilities from the leaves up to their ancestor nodes.

This method ensures that the probabilities produced for the whole hierarchy are consistent, by overriding the probability of each parent node with that of its child class whenever the child's probability is higher.
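The consistency rule above can be sketched as a fixed-point loop: each parent probability is raised to the probability of any child that exceeds it, repeating until nothing changes, so that a high leaf probability is carried up through several levels of the hierarchy. The `parents` mapping (class index to list of parent indices) and the function name are hypothetical; the actual method may traverse the ontology differently, for instance in a single pass in topological order.

```python
import numpy as np


def propagate(Y, parents):
    """Raise parent probabilities to those of their children (hypothetical helper)."""
    Y = np.array(Y, dtype=float)  # work on a copy, leave the input untouched
    changed = True
    while changed:
        changed = False
        for child, pars in parents.items():
            for parent in pars:
                # Rows (samples) where the child is more probable than its parent.
                higher = Y[:, child] > Y[:, parent]
                if higher.any():
                    Y[higher, parent] = Y[higher, child]
                    changed = True
    return Y


# Chain hierarchy: 0 is the root, 1 is a child of 0, 2 is a child of 1.
parents = {0: [], 1: [0], 2: [1]}
probas = np.array([
    [0.2, 0.5, 0.9],
    [0.7, 0.1, 0.3],
])
consistent = propagate(probas, parents)
```

After propagation, no class is predicted with a higher probability than any of its ancestors; the loop terminates because values only ever increase to probabilities already present in the row.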

save(file)

Save the trained predictor to a JSON file.
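Since trained predictors are stored as JSON, learnt NumPy parameters have to be converted to plain lists on save and back to arrays on load. A minimal round-trip sketch follows, in which the key names (`coef`, `intercept`) and the helpers `save_parameters` and `load_parameters` are hypothetical and do not describe the actual CHAMOIS file format.

```python
import io
import json

import numpy as np


def save_parameters(parameters, file):
    """Write a dict of NumPy arrays to a text file object as JSON."""
    json.dump({name: array.tolist() for name, array in parameters.items()}, file)


def load_parameters(file):
    """Read the JSON written by save_parameters back into NumPy arrays."""
    return {name: np.asarray(values) for name, values in json.load(file).items()}


params = {
    "coef": np.array([[0.5, -1.25], [0.0, 2.0]]),
    "intercept": np.array([0.1, -0.3]),
}
buffer = io.StringIO()  # stands in for a file opened in text mode
save_parameters(params, buffer)
buffer.seek(0)
restored = load_parameters(buffer)
```

Python's `json` module writes floats with their shortest round-tripping representation, so the restored arrays compare equal to the originals.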