ChemicalOntologyPredictor
- class chamois.predictor.ChemicalOntologyPredictor(ontology, n_jobs=None, max_iter=100, model='logistic', alpha=1.0, variance=None, seed=0)
A model for predicting chemical hierarchy from BGC compositions.
- classmethod load(file)
Load a trained predictor from a JSON file.
- classmethod trained()
Load the trained predictor embedded in CHAMOIS.
- __init__(ontology, n_jobs=None, max_iter=100, model='logistic', alpha=1.0, variance=None, seed=0)
Create a new, uninitialized model.
- Parameters:
  - ontology (Ontology) – The ontology object corresponding to the classes to predict.
  - n_jobs (int or None) – The number of jobs to use to train in parallel.
  - max_iter (int) – The maximum number of iterations to run to converge the linear models.
  - model (str) – The model architecture to use, either 'ridge' for L2-regularized linear regression, 'logistic' for L1-regularized logistic regression, or 'dummy' for dummy predictors using random guessing based on the support of each class.
  - alpha (float) – The regularization strength (used for 'logistic' and 'ridge' models).
  - variance (float or None) – If given, the variance threshold to use to filter the features using the feature selection procedure of VarianceThreshold.
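The effect of the variance parameter can be illustrated with a minimal, dependency-free sketch. It assumes that VarianceThreshold refers to the scikit-learn selector, which keeps only the features whose population variance is strictly greater than the threshold. The helper name `variance_filter` and the toy matrix are hypothetical, and this is not the code CHAMOIS runs internally.

```python
from statistics import pvariance


def variance_filter(X, threshold):
    """Keep the columns of X whose population variance exceeds the threshold.

    Hypothetical helper written for illustration only.
    """
    columns = list(zip(*X))  # transpose rows into columns
    kept = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    filtered = [[row[j] for j in kept] for row in X]
    return filtered, kept


# Toy presence/absence matrix: 4 samples (rows), 3 features (columns).
X = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]

# Column variances are 0, 0.1875 and 0.25, so only the last column survives.
X_filtered, kept = variance_filter(X, threshold=0.2)
print(kept)
print(X_filtered)
```

A constant feature (variance 0) is always dropped, whatever non-negative threshold is chosen, since it cannot help discriminate between classes.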
- checksum(hasher=None)
Compute a checksum from the values of the learnt parameters.
- fit(X, Y, groups=None)
Fit the model on the given data.
- Parameters:
  - X (AnnData) – The feature matrix, either as a raw numpy.ndarray, or as a compositional matrix built with chamois.compositions.build_compositions.
  - Y (AnnData) – The classes matrix, either as a raw numpy.ndarray, or as a multi-label binary matrix.
- information_content(Y)
Compute the information content of a prediction.
The information content \(ic(T)\) of an annotation subgraph \(T\) is defined as the sum of the information accretion \(ia(i)\) of every node \(i\) of \(T\). Information accretion is computed from conditional probabilities estimated from the training set:
\[ic(T) = \sum_{i \in T} ia(i) = \sum_{i \in T} -\log_2 P(i \mid \mathcal{P}(i))\]
where \(\mathcal{P}(i)\) is the parent of node \(i\) in the ontology graph.
- Parameters:
  - Y (numpy.ndarray of shape (n_samples, n_classes)) – The array of predicted class labels for which to compute the information content.
- Returns:
  - numpy.ndarray of shape (n_samples,) – The computed information content for each sample prediction.
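As a worked illustration of the definition above, the following sketch computes the information content of two predictions over a toy four-class hierarchy. The class names and the conditional probabilities in `COND_PROB` are made up for the example, and plain Python lists stand in for numpy arrays; it is a sketch under those assumptions, not the CHAMOIS implementation.

```python
import math

# Hypothetical toy hierarchy (names invented for this example):
#   root -> A -> A1
#   root -> B
CLASSES = ["root", "A", "B", "A1"]

# Assumed conditional probabilities P(i | parent(i)), as they might be
# estimated from a training set. The root is present in every annotation.
COND_PROB = {"root": 1.0, "A": 0.5, "B": 0.125, "A1": 0.5}


def information_accretion(cls):
    """Return ia(i) = -log2 P(i | parent(i)) for a single class."""
    return -math.log2(COND_PROB[cls])


def information_content(Y):
    """Return ic(T) for each row of a binary (n_samples, n_classes) matrix."""
    return [
        sum(information_accretion(c) for c, present in zip(CLASSES, row) if present)
        for row in Y
    ]


# Two predictions: {root, A, A1} and {root, B}.
Y = [
    [1, 1, 0, 1],
    [1, 0, 1, 0],
]
ic = information_content(Y)
print(ic)  # [2.0, 3.0]
```

The second prediction contains fewer classes but scores higher, because B is rare given its parent: rarer annotations carry more information.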
References
Clark WT, Radivojac P. Information-theoretic evaluation of predicted ontological annotations. Bioinformatics. 2013;29(13):i53-i61. doi:10.1093/bioinformatics/btt228.
- predict(X, propagate=True)
Predict the classes for the given features.
- Parameters:
  - X (AnnData) – The feature matrix, either as a raw numpy.ndarray, or as a compositional matrix built with chamois.compositions.build_compositions.
  - propagate (bool) – Whether to ensure consistency of the predicted probabilities with the propagate method.
- predict_probas(X, propagate=True)
Predict class probabilities for the given features.
- Parameters:
  - X (AnnData) – The feature matrix, either as a raw numpy.ndarray, or as a compositional matrix built with chamois.compositions.build_compositions.
  - propagate (bool) – Whether to ensure consistency of the predicted probabilities with the propagate method.
- propagate(Y)
Propagate the probabilities from the leaves up to their ancestor nodes.
This method ensures that the probabilities produced for the whole hierarchy are consistent by overriding the probability of each parent node with that of its child class whenever the child's probability is higher.
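The propagation rule described above can be sketched on a toy hierarchy. The `PARENT` mapping and the probability values are hypothetical, each class is assumed to have a single parent, and plain lists are used instead of numpy arrays; the actual method may be implemented differently.

```python
# Hypothetical toy hierarchy: child class index -> parent class index.
# Class 0 is the root, classes 1 and 2 are its children, class 3 is a child of 1.
PARENT = {1: 0, 2: 0, 3: 1}


def depth(node):
    """Number of edges between a node and the root."""
    d = 0
    while node in PARENT:
        node = PARENT[node]
        d += 1
    return d


def propagate(Y):
    """Raise each parent probability to at least that of its children."""
    # Visit the deepest classes first so that updates travel up to the root.
    order = sorted(PARENT, key=depth, reverse=True)
    result = []
    for row in Y:
        row = list(row)  # copy so the input is left untouched
        for child in order:
            parent = PARENT[child]
            row[parent] = max(row[parent], row[child])
        result.append(row)
    return result


probas = [
    [0.9, 0.2, 0.1, 0.7],  # class 3 is more probable than its parent, class 1
    [0.1, 0.2, 0.3, 0.8],  # class 3 is more probable than all of its ancestors
]
consistent = propagate(probas)
print(consistent)  # [[0.9, 0.7, 0.1, 0.7], [0.8, 0.8, 0.3, 0.8]]
```

After propagation no class is more probable than its parent, so any probability threshold yields a set of classes that is closed under the parent relation.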
- save(file)
Save the trained predictor to a JSON file.