Basic functionalities

The easiest way to get a prediction with CHAMOIS is to run the chamois predict command with a query BGC given as a GenBank record. For now, let’s use BGC0000703, the MIBiG BGC producing kanamycin in Streptomyces kanamyceticus. The record was pre-downloaded from MIBiG in GenBank format.

Note

This notebook calls the CHAMOIS CLI through the chamois.cli.run function. This is equivalent to calling the chamois command in your shell; it is done this way here only to integrate with the documentation generator. For instance, calling:

chamois.cli.run(["predict"])

is equivalent to running

$ chamois predict

in the console.
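
To make the correspondence concrete, the argument list passed to chamois.cli.run can be derived from a shell command line with the standard-library shlex module. This is only a minimal sketch: the helper name to_run_args is hypothetical and not part of CHAMOIS.

```python
import shlex

def to_run_args(command_line):
    """Split a shell-style command line into the argument list for chamois.cli.run.

    Hypothetical helper, not part of CHAMOIS: it drops the leading program name.
    """
    tokens = shlex.split(command_line)
    if not tokens or tokens[0] != "chamois":
        raise ValueError("expected a command line starting with 'chamois'")
    return tokens[1:]

args = to_run_args("chamois predict -i data/BGC0000703.4.gbk -o data/BGC0000703.4.hdf5")
print(args)
# ['predict', '-i', 'data/BGC0000703.4.gbk', '-o', 'data/BGC0000703.4.hdf5']
```

The resulting list has the same shape as the ones passed to chamois.cli.run in the cells of this notebook.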

[1]:
import chamois.cli
chamois.__version__
[1]:
'0.2.0'

Running predictions

Use the chamois predict command to run ChemOnt class predictions with CHAMOIS:

[2]:
# $ chamois predict -i data/BGC0000703.4.gbk -o data/BGC0000703.4.hdf5
chamois.cli.run(["predict", "-i", "data/BGC0000703.4.gbk", "-o", "data/BGC0000703.4.hdf5"])
     Loading embedded model
     Loading BGCs from 'data/BGC0000703.4.gbk'
     Warning install "ipywidgets" for Jupyter support
      Loaded 1 BGCs from 'data/BGC0000703.4.gbk'
     Finding genes with Pyrodigal
       Found 30 proteins in 1 clusters
   Searching protein domains with HMMER
       Found 60 domains under inclusion threshold in 30 proteins
  Predicting chemical class probabilities
   Computing information content of predictions
      Saving result probabilities to 'data/BGC0000703.4.hdf5'
[2]:
0
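
When several GenBank records need predictions, the output path can be derived from each input path. The sketch below assumes one output file per record and uses only the -i and -o options shown above; predict_args is a hypothetical helper, not a CHAMOIS function.

```python
from pathlib import PurePosixPath

def predict_args(gbk_path):
    """Build the argument list for `chamois predict` from an input GenBank path.

    Hypothetical helper: the output is written next to the input,
    with the final extension replaced by `.hdf5`.
    """
    src = PurePosixPath(gbk_path)
    dst = src.with_suffix(".hdf5")
    return ["predict", "-i", str(src), "-o", str(dst)]

print(predict_args("data/BGC0000703.4.gbk"))
# ['predict', '-i', 'data/BGC0000703.4.gbk', '-o', 'data/BGC0000703.4.hdf5']
```

Each list can then be passed to chamois.cli.run. The call above returned 0; treating any other return value as a failure is an assumption based on the usual exit-status convention.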

The resulting HDF5 file can be opened with the anndata package for further analysis:

[3]:
import anndata
data = anndata.read_h5ad("data/BGC0000703.4.hdf5")
data
[3]:
AnnData object with n_obs × n_vars = 1 × 539
    obs: 'source', 'length', 'genes', 'ic'
    var: 'name', 'description', 'n_positives', 'information_accretion'
    uns: 'chamois'

The observations (data.obs) store the metadata about the query BGCs:

[4]:
data.obs
[4]:
source length genes ic
BGC0000703 data/BGC0000703.4.gbk 33678 30 37.269656

The variables (data.var) store the metadata about the chemical classes predicted by the CHAMOIS predictor.

[5]:
data.var
[5]:
name description n_positives information_accretion
CHEMONTID:0000002 Organoheterocyclic compounds Compounds containing a ring with least one car... 1325 0.000000
CHEMONTID:0000004 Organosulfur compounds Organic compounds containing a carbon-sulfur b... 170 0.000000
CHEMONTID:0000007 Amidines Derivatives of oxoacids RnE(=O)OH in which the... 35 5.197146
CHEMONTID:0000011 Carbohydrates and carbohydrate conjugates Monosaccharides, disaccharides, oligosaccharid... 341 2.203840
CHEMONTID:0000012 Lipids and lipid-like molecules Fatty acids and their derivatives, and substan... 679 0.000000
... ... ... ... ...
CHEMONTID:0004808 Alpha-haloketones Organic compounds contaning a halogen atom att... 7 6.056831
CHEMONTID:0004809 Alpha-chloroketones Organic compounds contaning a chlorine atom at... 7 -0.000000
CHEMONTID:0004817 2-heteroaryl carboxamides Compounds containing a heteroaromatic ring tha... 53 3.881258
CHEMONTID:0004830 Dipeptides Organic compounds containing a sequence of exa... 62 2.403356
CHEMONTID:0004831 Oligopeptides Organic compounds containing a sequence of bet... 102 1.685127

539 rows × 4 columns

Visualizing results

The resulting HDF5 file contains the class probabilities for each of the records in the input GenBank file. The CLI can be used to quickly inspect the predicted classes:

[6]:
# $ chamois render -i data/BGC0000703.4.hdf5
chamois.cli.run(["render", "-i", "data/BGC0000703.4.hdf5"])
     Loading embedded model
     Loading probability predictions from 'data/BGC0000703.4.hdf5'
╭─────────────────────────────────── BGC0000703 ────────────────────────────────────╮
│ CHEMONTID:0000002 (Organoheterocyclic compounds): 0.924                           │
│ ├── CHEMONTID:0002012 (Oxanes): 0.924                                             │
│ └── CHEMONTID:0004140 (Oxacyclic compounds): 0.908                                │
│ CHEMONTID:0004150 (Hydrocarbon derivatives): 0.997                                │
│ CHEMONTID:0004557 (Organopnictogen compounds): 0.709                              │
│ CHEMONTID:0004603 (Organic oxygen compounds): 0.999                               │
│ └── CHEMONTID:0000323 (Organooxygen compounds): 0.999                             │
│     ├── CHEMONTID:0000011 (Carbohydrates and carbohydrate conjugates): 0.918      │
│     │   ├── CHEMONTID:0001540 (Monosaccharides): 0.814                            │
│     │   ├── CHEMONTID:0002105 (Glycosyl compounds): 0.744                         │
│     │   │   └── CHEMONTID:0002207 (O-glycosyl compounds): 0.744                   │
│     │   └── CHEMONTID:0003305 (Aminosaccharides): 0.918                           │
│     │       └── CHEMONTID:0000282 (Aminoglycosides): 0.918                        │
│     │           └── CHEMONTID:0001675 (Aminocyclitol glycosides): 0.918           │
│     ├── CHEMONTID:0000129 (Alcohols and polyols): 0.990                           │
│     │   ├── CHEMONTID:0001292 (Cyclic alcohols and derivatives): 0.953            │
│     │   │   └── CHEMONTID:0002509 (Cyclitols and derivatives): 0.891              │
│     │   │       └── CHEMONTID:0002510 (Aminocyclitols and derivatives): 0.876     │
│     │   ├── CHEMONTID:0001661 (Secondary alcohols): 0.969                         │
│     │   │   └── CHEMONTID:0002647 (Cyclohexanols): 0.906                          │
│     │   ├── CHEMONTID:0001670 (Tertiary alcohols): 0.516                          │
│     │   └── CHEMONTID:0002286 (Polyols): 0.721                                    │
│     └── CHEMONTID:0000254 (Ethers): 0.634                                         │
│         └── CHEMONTID:0001656 (Acetals): 0.634                                    │
│ CHEMONTID:0004707 (Organic nitrogen compounds): 0.992                             │
│ └── CHEMONTID:0000278 (Organonitrogen compounds): 0.992                           │
│     ├── CHEMONTID:0002449 (Amines): 0.971                                         │
│     │   ├── CHEMONTID:0002450 (Primary amines): 0.971                             │
│     │   │   └── CHEMONTID:0000469 (Monoalkylamines): 0.971                        │
│     │   └── CHEMONTID:0002460 (Alkanolamines): 0.745                              │
│     │       └── CHEMONTID:0001897 (1,2-aminoalcohols): 0.654                      │
│     └── CHEMONTID:0002674 (Cyclohexylamines): 0.876                               │
╰───────────────────────────────────────────────────────────────────────────────────╯
[6]:
0
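
For downstream analysis it is often useful to keep only the most confident classes. The sketch below uses a plain dictionary with a few probabilities copied from the rendered tree above, because this section does not show where the probabilities are stored in the AnnData object. The helper name confident_classes and the 0.9 cutoff are arbitrary choices made for illustration.

```python
# Probabilities copied from a few lines of the rendered tree above.
probabilities = {
    "CHEMONTID:0000002": 0.924,  # Organoheterocyclic compounds
    "CHEMONTID:0004150": 0.997,  # Hydrocarbon derivatives
    "CHEMONTID:0004557": 0.709,  # Organopnictogen compounds
    "CHEMONTID:0000254": 0.634,  # Ethers
    "CHEMONTID:0001670": 0.516,  # Tertiary alcohols
    "CHEMONTID:0000282": 0.918,  # Aminoglycosides
}

def confident_classes(probabilities, threshold=0.9):
    """Return (class, probability) pairs at or above `threshold`, highest first."""
    selected = [(c, p) for c, p in probabilities.items() if p >= threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)

for class_id, p in confident_classes(probabilities):
    print(f"{class_id}\t{p:.3f}")
```

With these values, only Hydrocarbon derivatives, Organoheterocyclic compounds and Aminoglycosides pass the cutoff.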

Screening predictions

Once predictions have been made, they can be screened against a particular query metabolite to see which BGC is the most likely to produce that metabolite. Let’s try with kanamycin as a sanity check. Molecules can be passed to chamois compare as either SMILES, InChI, or InChIKey.

Info

Passing a SMILES or an InChI requires the additional Python dependency rdkit to handle conversion to InChIKey.

[7]:
# $ chamois compare -i data/BGC0000703.4.hdf5 -q SBUJHOSQTJFQJX-NOAMYHISSA-N --render
chamois.cli.run(["compare", "-i", "data/BGC0000703.4.hdf5", "-q", 'SBUJHOSQTJFQJX-NOAMYHISSA-N', "--render" ])
     Loading embedded model
     Loading probability predictions from 'data/BGC0000703.4.hdf5'
  Retrieving 1 ClassyFire results for SBUJHOSQTJFQJX-NOAMYHISSA-N
   Computing distances to predictions
╭───────────── SBUJHOSQTJFQJX-NOAMYHISSA-N ─────────────╮╭─────── BGC0000703 (Jaccard=0.94 Distance=0.40) ────────╮
│ CHEMONTID:0000002 (Organoheterocyclic compounds): 1.… ││ CHEMONTID:0000002 (Organoheterocyclic compounds): 0.9… │
│ ├── CHEMONTID:0002012 (Oxanes): 1.000                 ││ ├── CHEMONTID:0002012 (Oxanes): 0.924                  │
│ └── CHEMONTID:0004140 (Oxacyclic compounds): 1.000    ││ └── CHEMONTID:0004140 (Oxacyclic compounds): 0.908     │
│ CHEMONTID:0004150 (Hydrocarbon derivatives): 1.000    ││ CHEMONTID:0004150 (Hydrocarbon derivatives): 0.997     │
│ CHEMONTID:0004557 (Organopnictogen compounds): 1.000  ││ CHEMONTID:0004557 (Organopnictogen compounds): 0.709   │
│ CHEMONTID:0004603 (Organic oxygen compounds): 1.000   ││ CHEMONTID:0004603 (Organic oxygen compounds): 0.999    │
│ └── CHEMONTID:0000323 (Organooxygen compounds): 1.000 ││ └── CHEMONTID:0000323 (Organooxygen compounds): 0.999  │
│     ├── CHEMONTID:0000011 (Carbohydrates and carbohy… ││     ├── CHEMONTID:0000011 (Carbohydrates and carbohyd… │
│     │   ├── CHEMONTID:0001540 (Monosaccharides): 1.0… ││     │   ├── CHEMONTID:0001540 (Monosaccharides): 0.814 │
│     │   ├── CHEMONTID:0002105 (Glycosyl compounds):  ││     │   ├── CHEMONTID:0002105 (Glycosyl compounds): 0… │
│     │   │   └── CHEMONTID:0002207 (O-glycosyl compou… ││     │   │   └── CHEMONTID:0002207 (O-glycosyl compoun… │
│     │   └── CHEMONTID:0003305 (Aminosaccharides): 1.… ││     │   └── CHEMONTID:0003305 (Aminosaccharides): 0.9… │
│     │       └── CHEMONTID:0000282 (Aminoglycosides):… ││     │       └── CHEMONTID:0000282 (Aminoglycosides):  │
│     │           └── CHEMONTID:0001675 (Aminocyclitol… ││     │           └── CHEMONTID:0001675 (Aminocyclitol … │
│     ├── CHEMONTID:0000129 (Alcohols and polyols): 1.… ││     ├── CHEMONTID:0000129 (Alcohols and polyols): 0.9… │
│     │   ├── CHEMONTID:0000286 (Primary alcohols): 1.… ││     │   ├── CHEMONTID:0001292 (Cyclic alcohols and de… │
│     │   ├── CHEMONTID:0001292 (Cyclic alcohols and d… ││     │   │   └── CHEMONTID:0002509 (Cyclitols and deri… │
│     │   │   └── CHEMONTID:0002509 (Cyclitols and der… ││     │   │       └── CHEMONTID:0002510 (Aminocyclitols… │
│     │   │       └── CHEMONTID:0002510 (Aminocyclitol… ││     │   ├── CHEMONTID:0001661 (Secondary alcohols): 0… │
│     │   ├── CHEMONTID:0001661 (Secondary alcohols):  ││     │   │   └── CHEMONTID:0002647 (Cyclohexanols): 0.… │
│     │   │   └── CHEMONTID:0002647 (Cyclohexanols): 1… ││     │   ├── CHEMONTID:0001670 (Tertiary alcohols): 0.… │
│     │   └── CHEMONTID:0002286 (Polyols): 1.000        ││     │   └── CHEMONTID:0002286 (Polyols): 0.721         │
│     └── CHEMONTID:0000254 (Ethers): 1.000             ││     └── CHEMONTID:0000254 (Ethers): 0.634              │
│         └── CHEMONTID:0001656 (Acetals): 1.000        ││         └── CHEMONTID:0001656 (Acetals): 0.634         │
│ CHEMONTID:0004707 (Organic nitrogen compounds): 1.000 ││ CHEMONTID:0004707 (Organic nitrogen compounds): 0.992  │
│ └── CHEMONTID:0000278 (Organonitrogen compounds): 1.… ││ └── CHEMONTID:0000278 (Organonitrogen compounds): 0.9… │
│     ├── CHEMONTID:0002449 (Amines): 1.000             ││     ├── CHEMONTID:0002449 (Amines): 0.971              │
│     │   ├── CHEMONTID:0002450 (Primary amines): 1.000 ││     │   ├── CHEMONTID:0002450 (Primary amines): 0.971  │
│     │   │   └── CHEMONTID:0000469 (Monoalkylamines):… ││     │   │   └── CHEMONTID:0000469 (Monoalkylamines):  │
│     │   └── CHEMONTID:0002460 (Alkanolamines): 1.000  ││     │   └── CHEMONTID:0002460 (Alkanolamines): 0.745   │
│     │       └── CHEMONTID:0001897 (1,2-aminoalcohols… ││     │       └── CHEMONTID:0001897 (1,2-aminoalcohols)… │
│     └── CHEMONTID:0002674 (Cyclohexylamines): 1.000   ││     └── CHEMONTID:0002674 (Cyclohexylamines): 0.876    │
╰───────────────────────────────────────────────────────╯╰────────────────────────────────────────────────────────╯
[7]:
0
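
The Jaccard value in the panel title can be cross-checked by hand. The sketch below assumes it is the standard Jaccard index (size of the intersection over size of the union) between the query's class set and the classes listed in the prediction panel. This section does not state how CHAMOIS computes it, nor how the Distance value is derived, so only the Jaccard index is reproduced here.

```python
def jaccard(a, b):
    """Standard Jaccard index between two sets: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# ChemOnt identifiers (numeric part) listed in the BGC0000703 panel above.
predicted = {
    "0000002", "0002012", "0004140", "0004150", "0004557", "0004603",
    "0000323", "0000011", "0001540", "0002105", "0002207", "0003305",
    "0000282", "0001675", "0000129", "0001292", "0002509", "0002510",
    "0001661", "0002647", "0001670", "0002286", "0000254", "0001656",
    "0004707", "0000278", "0002449", "0002450", "0000469", "0002460",
    "0001897", "0002674",
}
# The query panel lists the same classes, except that it has
# Primary alcohols (0000286) and lacks Tertiary alcohols (0001670).
query = (predicted - {"0001670"}) | {"0000286"}

print(f"{jaccard(query, predicted):.2f}")  # 31 shared / 33 in the union -> 0.94
```

The two panels differ by a single class on each side, which is why the index is close to, but not exactly, 1.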

Searching a catalog

Warning

This feature is experimental and has not been properly evaluated. Use with caution.

The predictions can be used to search a catalog of compounds encoded as a classes.hdf5 file, similar to what CHAMOIS uses for training. For instance, we can search for the MIBiG 3.1 compounds most similar to our prediction; we would expect to find BGC0000703 among the top hits:

[8]:
# $ chamois search -i data/BGC0000703.4.hdf5 --catalog ../../data/datasets/mibig3.1/classes.hdf5 --render
chamois.cli.run(["search", "-i", "data/BGC0000703.4.hdf5", "--catalog", "../../data/datasets/mibig3.1/classes.hdf5", "--render"])
     Loading embedded model
     Loading probability predictions from 'data/BGC0000703.4.hdf5'
     Loading compound catalog from '../../data/datasets/mibig3.1/classes.hdf5'
   Computing pairwise distances and ranks
┏━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ BGC        ┃ Index      ┃ Compound  ┃ Distance ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ BGC0000703 │ BGC0000703 │ kanamycin │ 0.40485  │
│            │ BGC0000702 │ kanamycin │ 0.40485  │
│            │ BGC0000704 │ kanamycin │ 0.40485  │
│            │ BGC0000706 │ kanamycin │ 0.40485  │
│            │ BGC0000705 │ kanamycin │ 0.40485  │
└────────────┴────────────┴───────────┴──────────┘
[8]:
0

Interpreting a prediction

The chamois explain command provides additional information about a prediction made by CHAMOIS. It takes the original sequences of the BGCs, re-annotates the genes, and inspects the model weights to break down the prediction into individual contributions from each gene, making it easier to understand the functions of the individual genes of the BGC. We call the chamois explain command with the --cds argument to ensure that the gene coordinates and identifiers are those already defined in the GenBank record:

[9]:
# $ chamois explain cluster --cds -i data/BGC0000703.4.gbk -o data/BGC0000703.4.tsv
chamois.cli.run(["explain", "cluster", "--cds", "-i", "data/BGC0000703.4.gbk", "-o", "data/BGC0000703.4.tsv"])
     Loading embedded model
     Loading BGCs from 'data/BGC0000703.4.gbk'
     Warning install "ipywidgets" for Jupyter support
      Loaded 1 BGCs from 'data/BGC0000703.4.gbk'
  Extracting genes from CDS features
       Found 29 proteins in 1 clusters
   Searching protein domains with HMMER
       Found 60 domains under inclusion threshold in 29 proteins
  Predicting chemical class probabilities
       Build gene contribution table
[9]:
0

The output is a table that shows the contribution of the genes of the BGC to each of the predicted classes. It can be easily loaded with pandas:

[10]:
import pandas
table = pandas.read_table("data/BGC0000703.4.tsv")
table
[10]:
class name probability CP970_06595 CP970_06600 CP970_06605 CP970_06610 CP970_06615 CP970_06620 CP970_06625 ... CP970_06690 CP970_06695 CP970_06700 CP970_06705 CP970_06710 CP970_06715 CP970_06720 CP970_06725 CP970_06730 CP970_06735
0 CHEMONTID:0000002 Organoheterocyclic compounds 0.923556 0.000000 0.877091 -0.112079 0.000000 0.0 -0.206155 0.000000 ... 0.095444 0.130619 -0.042749 0.0 0.415219 0.398794 0.0 0.0 -1.146724 0.427507
1 CHEMONTID:0000011 Carbohydrates and carbohydrate conjugates 0.917567 -0.020241 -0.666054 -0.187588 0.000000 0.0 0.343486 0.000000 ... 0.000000 0.000000 0.118805 0.0 1.548473 0.225796 0.0 0.0 -0.360273 -0.095254
2 CHEMONTID:0000129 Alcohols and polyols 0.989591 0.012490 0.205097 0.200143 0.000000 0.0 0.175132 0.000000 ... -0.335728 0.000000 0.126342 0.0 0.751500 0.811593 0.0 0.0 0.509014 0.000000
3 CHEMONTID:0000254 Ethers 0.633811 -1.316435 0.610610 0.000000 0.000000 0.0 -0.305532 0.032231 ... -0.077757 0.000000 0.506525 0.0 1.788580 -0.423346 0.0 0.0 -0.365802 0.125469
4 CHEMONTID:0000278 Organonitrogen compounds 0.991915 0.402224 0.158624 0.023002 0.000000 0.0 0.357750 0.000000 ... 2.924045 0.000000 -0.656322 0.0 0.000000 0.000000 0.0 0.0 -0.289429 0.000000
5 CHEMONTID:0000282 Aminoglycosides 0.917567 0.000000 -0.954235 0.405878 0.000000 0.0 0.000000 0.000000 ... 2.614419 0.000000 0.528636 0.0 1.601960 0.309899 0.0 0.0 0.000000 -0.399308
6 CHEMONTID:0000323 Organooxygen compounds 0.999130 0.000000 1.600313 0.000000 0.000000 0.0 0.000000 0.000000 ... 0.000000 0.000000 0.574179 0.0 0.000000 1.336391 0.0 0.0 0.000000 0.400177
7 CHEMONTID:0000469 Monoalkylamines 0.971106 0.000000 0.251989 0.000000 0.000000 0.0 0.345969 0.000000 ... 1.820259 0.257724 -0.006440 0.0 -0.159208 0.480359 0.0 0.0 3.019452 -0.400878
8 CHEMONTID:0001292 Cyclic alcohols and derivatives 0.953354 0.000000 0.391831 0.000000 0.000000 0.0 -0.263514 0.000000 ... 0.000000 0.000000 1.055596 0.0 0.730062 0.634123 0.0 0.0 0.000000 0.000000
9 CHEMONTID:0001540 Monosaccharides 0.814167 0.964565 -0.310512 0.000000 0.000000 0.0 0.296987 0.000000 ... -0.105943 0.000000 0.000000 0.0 0.783637 0.032636 0.0 0.0 0.361088 0.000000
10 CHEMONTID:0001656 Acetals 0.633811 0.000000 -0.176495 0.300208 0.000000 0.0 -0.776526 0.000000 ... -0.128851 0.000000 1.226010 0.0 2.688506 0.004065 0.0 0.0 -0.534600 -0.278178
11 CHEMONTID:0001661 Secondary alcohols 0.969029 0.504000 0.590214 -0.358689 0.000000 0.0 0.000000 0.000000 ... -0.145315 0.161952 0.299064 0.0 1.714529 0.464526 0.0 0.0 -0.017224 -0.354166
12 CHEMONTID:0001670 Tertiary alcohols 0.516207 0.110185 0.536606 0.585148 0.000000 0.0 -0.033495 0.000000 ... 0.210159 0.263230 0.047262 0.0 0.000000 0.000000 0.0 0.0 1.445926 0.000000
13 CHEMONTID:0001675 Aminocyclitol glycosides 0.917567 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 ... 0.000000 0.000000 0.082557 0.0 2.953673 0.000000 0.0 0.0 0.000000 0.000000
14 CHEMONTID:0001897 1,2-aminoalcohols 0.654238 0.000000 -1.112867 0.000000 0.000000 0.0 0.000000 0.000000 ... 2.762594 0.000000 0.634128 0.0 0.591480 0.000000 0.0 0.0 1.675683 -0.469000
15 CHEMONTID:0002012 Oxanes 0.923556 -0.814555 -0.340204 0.000000 0.000000 0.0 -0.111479 0.000000 ... 0.768396 0.136129 0.592756 0.0 2.134330 -0.119041 0.0 0.0 1.654629 0.408684
16 CHEMONTID:0002105 Glycosyl compounds 0.743948 0.000000 -0.250118 0.000000 0.000000 0.0 0.000000 0.623561 ... -0.293854 -0.536409 0.738301 0.0 1.316402 0.028088 0.0 0.0 -1.606514 -0.024158
17 CHEMONTID:0002207 O-glycosyl compounds 0.743948 0.000000 -0.513293 0.000000 0.000000 0.0 0.000000 1.390645 ... -0.398967 -0.239261 0.720448 0.0 1.951098 0.261111 0.0 0.0 -1.034882 -0.063126
18 CHEMONTID:0002286 Polyols 0.721185 -0.764770 -0.285684 0.579471 0.000000 0.0 0.000000 0.000000 ... 0.000000 0.418039 0.560816 0.0 -0.309156 0.219549 0.0 0.0 -0.720628 0.492252
19 CHEMONTID:0002449 Amines 0.971106 0.000000 -0.371490 0.000000 0.000000 0.0 -0.018467 0.000000 ... 1.814750 0.000000 -0.440470 0.0 -0.219234 0.402016 0.0 0.0 1.993030 -0.110465
20 CHEMONTID:0002450 Primary amines 0.971106 0.000000 0.390477 0.000000 0.000000 0.0 0.164720 0.000000 ... 1.526317 0.000000 -0.144935 0.0 -0.624989 0.379272 0.0 0.0 2.951594 -0.321949
21 CHEMONTID:0002460 Alkanolamines 0.744554 0.000000 -0.226581 0.000000 0.000000 0.0 -0.057931 0.000000 ... 2.573815 0.000000 0.603845 0.0 0.520413 0.000000 0.0 0.0 2.175561 -0.357317
22 CHEMONTID:0002509 Cyclitols and derivatives 0.890671 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 ... 0.000000 0.000000 0.933805 0.0 1.868207 0.307503 0.0 0.0 0.000000 0.000000
23 CHEMONTID:0002510 Aminocyclitols and derivatives 0.876259 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.0 2.775093 0.000000 0.0 0.0 0.000000 0.000000
24 CHEMONTID:0002647 Cyclohexanols 0.905791 0.000000 0.000000 -0.203591 0.407918 0.0 0.000000 0.000000 ... 0.096246 0.000000 1.256786 0.0 1.880633 0.000000 0.0 0.0 0.000000 0.000000
25 CHEMONTID:0002674 Cyclohexylamines 0.876259 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.0 2.775093 0.000000 0.0 0.0 0.000000 0.000000
26 CHEMONTID:0003305 Aminosaccharides 0.917567 0.000000 -1.313901 0.000000 0.000000 0.0 0.000000 0.000000 ... 2.198156 0.000000 0.745322 0.0 1.692293 0.332242 0.0 0.0 0.023669 -0.285864
27 CHEMONTID:0004140 Oxacyclic compounds 0.907943 -0.060771 -0.075646 0.000000 0.000000 0.0 -0.027500 0.000000 ... 0.034954 0.000000 0.335201 0.0 1.168670 -0.142313 0.0 0.0 0.000000 0.364112
28 CHEMONTID:0004150 Hydrocarbon derivatives 0.997134 0.000000 0.559085 0.000000 0.000000 0.0 0.000000 0.000000 ... 0.000000 0.000000 0.126806 0.0 0.000000 0.260375 0.0 0.0 0.000000 0.000000
29 CHEMONTID:0004557 Organopnictogen compounds 0.708598 -0.316583 -0.193551 0.000000 0.000000 0.0 -0.866210 0.000000 ... 1.461274 0.006655 -0.305747 0.0 0.000000 0.209187 0.0 0.0 -0.915715 -0.019364
30 CHEMONTID:0004603 Organic oxygen compounds 0.999130 0.000000 1.294743 0.000000 0.000000 0.0 0.000000 0.000000 ... 0.000000 0.000000 0.625820 0.0 0.000000 1.106881 0.0 0.0 0.000000 0.190451
31 CHEMONTID:0004707 Organic nitrogen compounds 0.991915 0.402224 0.158624 0.023002 0.000000 0.0 0.357750 0.000000 ... 2.924045 0.000000 -0.656322 0.0 0.000000 0.000000 0.0 0.0 -0.289429 0.000000

32 rows × 32 columns

For instance, to see which genes contribute significantly to the prediction of CHEMONTID:0000282 (Aminoglycosides) for this BGC, we can extract the relevant row from the table and keep the genes with a weight of at least 2.0:

[11]:
w = table.set_index("class").loc["CHEMONTID:0000282"].drop(["name", "probability"])
w[w >= 2]
[11]:
CP970_06665    2.614419
CP970_06690    2.614419
Name: CHEMONTID:0000282, dtype: object

These two genes are actually DegT/DnrJ/EryC1/StrS-family aminotransferases, which are also found in the biosynthetic pathways of streptidine (one of the aminoglycoside moieties of streptomycin) or rifamycin B.
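
The same table can be read in the other direction, asking which gene carries the largest weight for each class. The sketch below uses only the standard library and a three-class, three-gene excerpt copied from the table above, so it runs without the tutorial's data files. The helper name top_gene_per_class is hypothetical, and the result only ranks the three gene columns included in the excerpt.

```python
import csv
import io

# Excerpt of the table above: three classes and three of the gene columns.
EXCERPT = (
    "class\tname\tprobability\tCP970_06690\tCP970_06710\tCP970_06730\n"
    "CHEMONTID:0000282\tAminoglycosides\t0.917567\t2.614419\t1.601960\t0.000000\n"
    "CHEMONTID:0000469\tMonoalkylamines\t0.971106\t1.820259\t-0.159208\t3.019452\n"
    "CHEMONTID:0001675\tAminocyclitol glycosides\t0.917567\t0.000000\t2.953673\t0.000000\n"
)

def top_gene_per_class(tsv_text):
    """Map each class name to the gene column with the largest weight."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    meta = {"class", "name", "probability"}  # non-gene columns
    result = {}
    for row in reader:
        weights = {gene: float(w) for gene, w in row.items() if gene not in meta}
        result[row["name"]] = max(weights, key=weights.get)
    return result

for name, gene in top_gene_per_class(EXCERPT).items():
    print(f"{name}: {gene}")
```

On the full table loaded with pandas, the equivalent would be to drop the name and probability columns and take the per-row maximum over the remaining gene columns.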