Basic functionalities#
The easiest way to get a prediction with CHAMOIS is to run the chamois predict command with a query BGC given as a GenBank record. For now, let’s use BGC0000703, the MIBiG BGC producing kanamycin in Streptomyces kanamyceticus. The record was pre-downloaded from MIBiG in GenBank format.
Note
This notebook calls the CHAMOIS CLI through the chamois.cli.run function. This is equivalent to calling the chamois command in your shell; it is only done here to integrate with the documentation generator. For instance, calling:
chamois.cli.run(["predict"])
is equivalent to running
$ chamois predict
in the console.
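To make this correspondence concrete, the sketch below turns an argument list of the kind passed to chamois.cli.run into the shell command it stands for. The helper name to_shell_command is hypothetical and not part of CHAMOIS; it relies only on shlex.join from the Python standard library.

```python
import shlex


def to_shell_command(argv, program="chamois"):
    """Return the shell command equivalent to an argument list.

    Hypothetical helper for illustration only; not part of the CHAMOIS API.
    """
    # shlex.join quotes any argument containing spaces or shell metacharacters
    return shlex.join([program, *argv])


print(to_shell_command(["predict", "-i", "data/BGC0000703.4.gbk", "-o", "data/BGC0000703.4.hdf5"]))
```

Arguments that contain spaces come out quoted, so the printed command can be pasted into a shell as-is.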
[1]:
import chamois.cli
chamois.__version__
[1]:
'0.2.2'
Running predictions#
Use the chamois predict command to run ChemOnt class predictions with CHAMOIS:
[2]:
# $ chamois predict -i data/BGC0000703.4.gbk -o data/BGC0000703.4.hdf5
chamois.cli.run(["predict", "-i", "data/BGC0000703.4.gbk", "-o", "data/BGC0000703.4.hdf5"])
Loading embedded model
Loading BGCs from 'data/BGC0000703.4.gbk'
Warning install "ipywidgets" for Jupyter support
Loaded 1 BGCs from 'data/BGC0000703.4.gbk'
Finding genes with Pyrodigal
Found 30 proteins in 1 clusters
Searching protein domains with HMMER
Found 60 domains under inclusion threshold in 30 proteins
Predicting chemical class probabilities
Computing information content of predictions
Saving result probabilities to 'data/BGC0000703.4.hdf5'
[2]:
0
The resulting HDF5 file can be opened with the anndata package for further analysis:
[3]:
import anndata
data = anndata.read_h5ad("data/BGC0000703.4.hdf5")
data
[3]:
AnnData object with n_obs × n_vars = 1 × 539
obs: 'source', 'length', 'genes', 'ic'
var: 'name', 'description', 'n_positives', 'information_accretion'
uns: 'chamois'
The observations (data.obs) store the metadata about the query BGCs:
[4]:
data.obs
[4]:
| | source | length | genes | ic |
|---|---|---|---|---|
| BGC0000703 | data/BGC0000703.4.gbk | 33678 | 30 | 37.269656 |
The variables (data.var) store the metadata about the chemical classes predicted by the CHAMOIS predictor.
[5]:
data.var
[5]:
| | name | description | n_positives | information_accretion |
|---|---|---|---|---|
| CHEMONTID:0000002 | Organoheterocyclic compounds | Compounds containing a ring with least one car... | 1325 | 0.000000 |
| CHEMONTID:0000004 | Organosulfur compounds | Organic compounds containing a carbon-sulfur b... | 170 | 0.000000 |
| CHEMONTID:0000007 | Amidines | Derivatives of oxoacids RnE(=O)OH in which the... | 35 | 5.197146 |
| CHEMONTID:0000011 | Carbohydrates and carbohydrate conjugates | Monosaccharides, disaccharides, oligosaccharid... | 341 | 2.203840 |
| CHEMONTID:0000012 | Lipids and lipid-like molecules | Fatty acids and their derivatives, and substan... | 679 | 0.000000 |
| ... | ... | ... | ... | ... |
| CHEMONTID:0004808 | Alpha-haloketones | Organic compounds contaning a halogen atom att... | 7 | 6.056831 |
| CHEMONTID:0004809 | Alpha-chloroketones | Organic compounds contaning a chlorine atom at... | 7 | -0.000000 |
| CHEMONTID:0004817 | 2-heteroaryl carboxamides | Compounds containing a heteroaromatic ring tha... | 53 | 3.881258 |
| CHEMONTID:0004830 | Dipeptides | Organic compounds containing a sequence of exa... | 62 | 2.403356 |
| CHEMONTID:0004831 | Oligopeptides | Organic compounds containing a sequence of bet... | 102 | 1.685127 |
539 rows × 4 columns
Visualizing results#
The resulting HDF5 file contains the class probabilities for each of the records in the input GenBank file. The CLI can be used to quickly inspect the predicted classes:
[6]:
# $ chamois render -i data/BGC0000703.4.hdf5
chamois.cli.run(["render", "-i", "data/BGC0000703.4.hdf5"])
Loading embedded model
Loading probability predictions from 'data/BGC0000703.4.hdf5'
── BGC0000703 ──
CHEMONTID:0000002 (Organoheterocyclic compounds): 0.924
├── CHEMONTID:0002012 (Oxanes): 0.924
└── CHEMONTID:0004140 (Oxacyclic compounds): 0.908
CHEMONTID:0004150 (Hydrocarbon derivatives): 0.997
CHEMONTID:0004557 (Organopnictogen compounds): 0.709
CHEMONTID:0004603 (Organic oxygen compounds): 0.999
└── CHEMONTID:0000323 (Organooxygen compounds): 0.999
    ├── CHEMONTID:0000011 (Carbohydrates and carbohydrate conjugates): 0.918
    │   ├── CHEMONTID:0001540 (Monosaccharides): 0.814
    │   ├── CHEMONTID:0002105 (Glycosyl compounds): 0.744
    │   │   └── CHEMONTID:0002207 (O-glycosyl compounds): 0.744
    │   └── CHEMONTID:0003305 (Aminosaccharides): 0.918
    │       └── CHEMONTID:0000282 (Aminoglycosides): 0.918
    │           └── CHEMONTID:0001675 (Aminocyclitol glycosides): 0.918
    ├── CHEMONTID:0000129 (Alcohols and polyols): 0.990
    │   ├── CHEMONTID:0001292 (Cyclic alcohols and derivatives): 0.953
    │   │   └── CHEMONTID:0002509 (Cyclitols and derivatives): 0.891
    │   │       └── CHEMONTID:0002510 (Aminocyclitols and derivatives): 0.876
    │   ├── CHEMONTID:0001661 (Secondary alcohols): 0.969
    │   │   └── CHEMONTID:0002647 (Cyclohexanols): 0.906
    │   ├── CHEMONTID:0001670 (Tertiary alcohols): 0.516
    │   └── CHEMONTID:0002286 (Polyols): 0.721
    └── CHEMONTID:0000254 (Ethers): 0.634
        └── CHEMONTID:0001656 (Acetals): 0.634
CHEMONTID:0004707 (Organic nitrogen compounds): 0.992
└── CHEMONTID:0000278 (Organonitrogen compounds): 0.992
    ├── CHEMONTID:0002449 (Amines): 0.971
    │   ├── CHEMONTID:0002450 (Primary amines): 0.971
    │   │   └── CHEMONTID:0000469 (Monoalkylamines): 0.971
    │   └── CHEMONTID:0002460 (Alkanolamines): 0.745
    │       └── CHEMONTID:0001897 (1,2-aminoalcohols): 0.654
    └── CHEMONTID:0002674 (Cyclohexylamines): 0.876
[6]:
0
Screening predictions#
Once predictions have been made, they can be screened with a particular query metabolite to see which BGC is the most likely to produce that metabolite. Let’s try with kanamycin as a sanity check. Molecules can be passed to chamois compare as either SMILES, InChI, or InChIKey.
Info
Passing a SMILES or an InChI requires the additional Python dependency rdkit to handle conversion to InChIKey.
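As a rough illustration of how the three query formats differ, the sketch below guesses the format of a query string. It is not how CHAMOIS itself detects formats: guess_query_kind is a hypothetical helper, and it relies only on the standard InChIKey layout (blocks of 14, 10 and 1 uppercase letters separated by hyphens) and on the InChI= prefix of InChI strings.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def guess_query_kind(query):
    """Guess whether a query is an InChIKey, an InChI or a SMILES string.

    Hypothetical helper for illustration; CHAMOIS may detect formats differently.
    """
    if INCHIKEY_RE.fullmatch(query):
        return "inchikey"
    if query.startswith("InChI="):
        return "inchi"
    # Anything else is assumed to be SMILES; no validation is attempted here
    return "smiles"


print(guess_query_kind("SBUJHOSQTJFQJX-NOAMYHISSA-N"))
```

Only the first case avoids the rdkit dependency mentioned above, since an InChIKey needs no conversion.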
[7]:
# $ chamois compare -i data/BGC0000703.4.hdf5 -q SBUJHOSQTJFQJX-NOAMYHISSA-N --render
chamois.cli.run(["compare", "-i", "data/BGC0000703.4.hdf5", "-q", "SBUJHOSQTJFQJX-NOAMYHISSA-N", "--render"])
Loading embedded model
Loading probability predictions from 'data/BGC0000703.4.hdf5'
Retrieving 1 ClassyFire results for SBUJHOSQTJFQJX-NOAMYHISSA-N
Traceback (most recent call last), abridged:
  urllib/request.py:1348 in do_open
  ...
  ssl.py:1382 in do_handshake
SSLError: [SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:1016)
During handling of the above exception, another exception occurred:
Traceback (most recent call last), abridged:
  chamois/cli/__init__.py:181 in run
  chamois/_meta.py:106 in newfunc
  chamois/cli/compare.py:212 in run
  chamois/classyfire.py:226 in fetch
  chamois/classyfire.py:216 in _get
  urllib/request.py:216 in urlopen
  ...
  urllib/request.py:1351 in do_open
URLError: <urlopen error [SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:1016)>
[7]:
1
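The exit code 1 above comes from a failed HTTPS request to ClassyFire while this page was being built, not from the prediction file. Because chamois.cli.run reports failures through its return value, one possible workaround for a network error that may be transient is to retry the call. The sketch below assumes nothing about CHAMOIS beyond that return-code convention; run_with_retries is a hypothetical helper, and flaky_run is a stand-in for chamois.cli.run so that the example runs without network access.

```python
import time


def run_with_retries(run, argv, attempts=3, delay=1.0):
    """Call run(argv) until it returns exit code 0 or attempts run out.

    Hypothetical helper for illustration; not part of CHAMOIS.
    """
    code = 1
    for attempt in range(1, attempts + 1):
        code = run(argv)
        if code == 0:
            break
        if attempt < attempts:
            # Wait a little longer after each failed attempt
            time.sleep(delay * attempt)
    return code


# Stand-in for chamois.cli.run that fails twice before succeeding
calls = []


def flaky_run(argv):
    calls.append(list(argv))
    return 0 if len(calls) >= 3 else 1


print(run_with_retries(flaky_run, ["compare", "--render"], delay=0.0))
```

With CHAMOIS installed, chamois.cli.run could be passed in place of flaky_run, together with the argument list used in the cell above.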
Searching a catalog#
Warning
This feature is experimental and has not been properly evaluated. Use with caution.
The predictions can be used to search a catalog of compounds encoded as a classes.hdf5 file, similar to what CHAMOIS uses for training. For instance, we can search which compound of MIBiG 3.1 is most similar to our prediction; hopefully we should get BGC0000703 among the top hits:
[8]:
# $ chamois search -i data/BGC0000703.4.hdf5 --catalog ../../data/datasets/mibig3.1/classes.hdf5 --render
chamois.cli.run(["search", "-i", "data/BGC0000703.4.hdf5", "--catalog", "../../data/datasets/mibig3.1/classes.hdf5", "--render"])
Loading embedded model
Loading probability predictions from 'data/BGC0000703.4.hdf5'
Loading compound catalog from '../../data/datasets/mibig3.1/classes.hdf5'
Computing pairwise distances and ranks
┏━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ BGC        ┃ Index      ┃ Compound  ┃ Distance ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ BGC0000703 │ BGC0000706 │ kanamycin │  0.40485 │
│            │ BGC0000704 │ kanamycin │  0.40485 │
│            │ BGC0000703 │ kanamycin │  0.40485 │
│            │ BGC0000702 │ kanamycin │  0.40485 │
│            │ BGC0000705 │ kanamycin │  0.40485 │
└────────────┴────────────┴───────────┴──────────┘
[8]:
0
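The metric that chamois search uses is not stated on this page. Purely as an illustration of ranking catalog entries against a vector of predicted probabilities, the sketch below uses a soft Jaccard distance, which is an assumption and may differ from what CHAMOIS computes. The function names, the probabilities and the catalog entries are all made up for the example.

```python
def soft_jaccard_distance(probas, classes):
    """Distance between predicted probabilities and a binary class vector.

    Assumed metric for illustration; not necessarily the one CHAMOIS uses.
    """
    inter = sum(min(p, c) for p, c in zip(probas, classes))
    union = sum(max(p, c) for p, c in zip(probas, classes))
    return 1.0 - inter / union if union else 0.0


def rank_catalog(probas, catalog):
    """Return (distance, name) pairs sorted from closest to farthest."""
    return sorted((soft_jaccard_distance(probas, v), name) for name, v in catalog.items())


# Made-up prediction over three classes and a two-entry catalog
probas = [0.9, 0.1, 0.8]
catalog = {"compound_a": [1, 0, 1], "compound_b": [0, 1, 0]}

for distance, name in rank_catalog(probas, catalog):
    print(f"{name}\t{distance:.5f}")
```

Under any metric of this kind, catalog entries annotated with identical class vectors receive identical distances, which would explain why the five kanamycin entries above are tied.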
Interpreting a prediction#
The chamois explain command provides additional information about a prediction made by CHAMOIS. It must be given the original sequences of the BGCs: it re-annotates the genes and inspects the model weights to break down the prediction into individual contributions from each gene, making it easier to understand the function of individual genes in the BGC. We call chamois explain with the --cds argument to ensure that the gene coordinates and identifiers are those already defined in the GenBank record:
[9]:
# $ chamois explain cluster --cds -i data/BGC0000703.4.gbk -o data/BGC0000703.4.tsv
chamois.cli.run(["explain", "cluster", "--cds", "-i", "data/BGC0000703.4.gbk", "-o", "data/BGC0000703.4.tsv"])
Loading embedded model
Loading BGCs from 'data/BGC0000703.4.gbk'
Warning install "ipywidgets" for Jupyter support
Loaded 1 BGCs from 'data/BGC0000703.4.gbk'
Extracting genes from CDS features
Found 29 proteins in 1 clusters
Searching protein domains with HMMER
Found 60 domains under inclusion threshold in 29 proteins
Predicting chemical class probabilities
Build gene contribution table
[9]:
0
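To give an intuition for what a per-gene breakdown means, the sketch below assumes a logistic-regression-style model in which gene contributions add up on the log-odds scale before being mapped to a probability. Whether CHAMOIS combines contributions in exactly this way, and which intercept it uses, is an assumption here; the function name, the gene names and the numbers are made up.

```python
import math


def probability_from_contributions(contributions, intercept=0.0):
    """Map summed per-gene contributions to a probability with a sigmoid.

    Assumes an additive log-odds model; illustrative only.
    """
    score = intercept + sum(contributions.values())
    return 1.0 / (1.0 + math.exp(-score))


# Made-up contributions: one supporting gene, one opposing, one neutral
contributions = {"gene_a": 2.0, "gene_b": -0.5, "gene_c": 0.0}
print(round(probability_from_contributions(contributions), 3))
```

In such a model, a positive weight pushes the class probability up, a negative weight pushes it down, and a weight of zero means the gene carries no domain the model uses for that class.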
The output is a table that shows the contribution of the genes of the BGC to each of the predicted classes. It can be easily loaded with pandas:
[10]:
import pandas
table = pandas.read_table("data/BGC0000703.4.tsv")
table
[10]:
| | class | name | probability | CP970_06595 | CP970_06600 | CP970_06605 | CP970_06610 | CP970_06615 | CP970_06620 | CP970_06625 | ... | CP970_06690 | CP970_06695 | CP970_06700 | CP970_06705 | CP970_06710 | CP970_06715 | CP970_06720 | CP970_06725 | CP970_06730 | CP970_06735 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CHEMONTID:0000002 | Organoheterocyclic compounds | 0.923556 | 0.000000 | 0.877091 | -0.112079 | 0.000000 | 0.0 | -0.206155 | 0.000000 | ... | 0.095444 | 0.130619 | -0.042749 | 0.0 | 0.415219 | 0.398794 | 0.0 | 0.0 | -1.146724 | 0.427507 |
| 1 | CHEMONTID:0000011 | Carbohydrates and carbohydrate conjugates | 0.917567 | -0.020241 | -0.666054 | -0.187588 | 0.000000 | 0.0 | 0.343486 | 0.000000 | ... | 0.000000 | 0.000000 | 0.118805 | 0.0 | 1.548473 | 0.225796 | 0.0 | 0.0 | -0.360273 | -0.095254 |
| 2 | CHEMONTID:0000129 | Alcohols and polyols | 0.989591 | 0.012490 | 0.205097 | 0.200143 | 0.000000 | 0.0 | 0.175132 | 0.000000 | ... | -0.335728 | 0.000000 | 0.126342 | 0.0 | 0.751500 | 0.811593 | 0.0 | 0.0 | 0.509014 | 0.000000 |
| 3 | CHEMONTID:0000254 | Ethers | 0.633811 | -1.316435 | 0.610610 | 0.000000 | 0.000000 | 0.0 | -0.305532 | 0.032231 | ... | -0.077757 | 0.000000 | 0.506525 | 0.0 | 1.788580 | -0.423346 | 0.0 | 0.0 | -0.365802 | 0.125469 |
| 4 | CHEMONTID:0000278 | Organonitrogen compounds | 0.991915 | 0.402224 | 0.158624 | 0.023002 | 0.000000 | 0.0 | 0.357750 | 0.000000 | ... | 2.924045 | 0.000000 | -0.656322 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | -0.289429 | 0.000000 |
| 5 | CHEMONTID:0000282 | Aminoglycosides | 0.917567 | 0.000000 | -0.954235 | 0.405878 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | 2.614419 | 0.000000 | 0.528636 | 0.0 | 1.601960 | 0.309899 | 0.0 | 0.0 | 0.000000 | -0.399308 |
| 6 | CHEMONTID:0000323 | Organooxygen compounds | 0.999130 | 0.000000 | 1.600313 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.574179 | 0.0 | 0.000000 | 1.336391 | 0.0 | 0.0 | 0.000000 | 0.400177 |
| 7 | CHEMONTID:0000469 | Monoalkylamines | 0.971106 | 0.000000 | 0.251989 | 0.000000 | 0.000000 | 0.0 | 0.345969 | 0.000000 | ... | 1.820259 | 0.257724 | -0.006440 | 0.0 | -0.159208 | 0.480359 | 0.0 | 0.0 | 3.019452 | -0.400878 |
| 8 | CHEMONTID:0001292 | Cyclic alcohols and derivatives | 0.953354 | 0.000000 | 0.391831 | 0.000000 | 0.000000 | 0.0 | -0.263514 | 0.000000 | ... | 0.000000 | 0.000000 | 1.055596 | 0.0 | 0.730062 | 0.634123 | 0.0 | 0.0 | 0.000000 | 0.000000 |
| 9 | CHEMONTID:0001540 | Monosaccharides | 0.814167 | 0.964565 | -0.310512 | 0.000000 | 0.000000 | 0.0 | 0.296987 | 0.000000 | ... | -0.105943 | 0.000000 | 0.000000 | 0.0 | 0.783637 | 0.032636 | 0.0 | 0.0 | 0.361088 | 0.000000 |
| 10 | CHEMONTID:0001656 | Acetals | 0.633811 | 0.000000 | -0.176495 | 0.300208 | 0.000000 | 0.0 | -0.776526 | 0.000000 | ... | -0.128851 | 0.000000 | 1.226010 | 0.0 | 2.688506 | 0.004065 | 0.0 | 0.0 | -0.534600 | -0.278178 |
| 11 | CHEMONTID:0001661 | Secondary alcohols | 0.969029 | 0.504000 | 0.590214 | -0.358689 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | -0.145315 | 0.161952 | 0.299064 | 0.0 | 1.714529 | 0.464526 | 0.0 | 0.0 | -0.017224 | -0.354166 |
| 12 | CHEMONTID:0001670 | Tertiary alcohols | 0.516207 | 0.110185 | 0.536606 | 0.585148 | 0.000000 | 0.0 | -0.033495 | 0.000000 | ... | 0.210159 | 0.263230 | 0.047262 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 1.445926 | 0.000000 |
| 13 | CHEMONTID:0001675 | Aminocyclitol glycosides | 0.917567 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.082557 | 0.0 | 2.953673 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 |
| 14 | CHEMONTID:0001897 | 1,2-aminoalcohols | 0.654238 | 0.000000 | -1.112867 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | 2.762594 | 0.000000 | 0.634128 | 0.0 | 0.591480 | 0.000000 | 0.0 | 0.0 | 1.675683 | -0.469000 |
| 15 | CHEMONTID:0002012 | Oxanes | 0.923556 | -0.814555 | -0.340204 | 0.000000 | 0.000000 | 0.0 | -0.111479 | 0.000000 | ... | 0.768396 | 0.136129 | 0.592756 | 0.0 | 2.134330 | -0.119041 | 0.0 | 0.0 | 1.654629 | 0.408684 |
| 16 | CHEMONTID:0002105 | Glycosyl compounds | 0.743948 | 0.000000 | -0.250118 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.623561 | ... | -0.293854 | -0.536409 | 0.738301 | 0.0 | 1.316402 | 0.028088 | 0.0 | 0.0 | -1.606514 | -0.024158 |
| 17 | CHEMONTID:0002207 | O-glycosyl compounds | 0.743948 | 0.000000 | -0.513293 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 1.390645 | ... | -0.398967 | -0.239261 | 0.720448 | 0.0 | 1.951098 | 0.261111 | 0.0 | 0.0 | -1.034882 | -0.063126 |
| 18 | CHEMONTID:0002286 | Polyols | 0.721185 | -0.764770 | -0.285684 | 0.579471 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.418039 | 0.560816 | 0.0 | -0.309156 | 0.219549 | 0.0 | 0.0 | -0.720628 | 0.492252 |
| 19 | CHEMONTID:0002449 | Amines | 0.971106 | 0.000000 | -0.371490 | 0.000000 | 0.000000 | 0.0 | -0.018467 | 0.000000 | ... | 1.814750 | 0.000000 | -0.440470 | 0.0 | -0.219234 | 0.402016 | 0.0 | 0.0 | 1.993030 | -0.110465 |
| 20 | CHEMONTID:0002450 | Primary amines | 0.971106 | 0.000000 | 0.390477 | 0.000000 | 0.000000 | 0.0 | 0.164720 | 0.000000 | ... | 1.526317 | 0.000000 | -0.144935 | 0.0 | -0.624989 | 0.379272 | 0.0 | 0.0 | 2.951594 | -0.321949 |
| 21 | CHEMONTID:0002460 | Alkanolamines | 0.744554 | 0.000000 | -0.226581 | 0.000000 | 0.000000 | 0.0 | -0.057931 | 0.000000 | ... | 2.573815 | 0.000000 | 0.603845 | 0.0 | 0.520413 | 0.000000 | 0.0 | 0.0 | 2.175561 | -0.357317 |
| 22 | CHEMONTID:0002509 | Cyclitols and derivatives | 0.890671 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.933805 | 0.0 | 1.868207 | 0.307503 | 0.0 | 0.0 | 0.000000 | 0.000000 |
| 23 | CHEMONTID:0002510 | Aminocyclitols and derivatives | 0.876259 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.0 | 2.775093 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 |
| 24 | CHEMONTID:0002647 | Cyclohexanols | 0.905791 | 0.000000 | 0.000000 | -0.203591 | 0.407918 | 0.0 | 0.000000 | 0.000000 | ... | 0.096246 | 0.000000 | 1.256786 | 0.0 | 1.880633 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 |
| 25 | CHEMONTID:0002674 | Cyclohexylamines | 0.876259 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.0 | 2.775093 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 |
| 26 | CHEMONTID:0003305 | Aminosaccharides | 0.917567 | 0.000000 | -1.313901 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | 2.198156 | 0.000000 | 0.745322 | 0.0 | 1.692293 | 0.332242 | 0.0 | 0.0 | 0.023669 | -0.285864 |
| 27 | CHEMONTID:0004140 | Oxacyclic compounds | 0.907943 | -0.060771 | -0.075646 | 0.000000 | 0.000000 | 0.0 | -0.027500 | 0.000000 | ... | 0.034954 | 0.000000 | 0.335201 | 0.0 | 1.168670 | -0.142313 | 0.0 | 0.0 | 0.000000 | 0.364112 |
| 28 | CHEMONTID:0004150 | Hydrocarbon derivatives | 0.997134 | 0.000000 | 0.559085 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.126806 | 0.0 | 0.000000 | 0.260375 | 0.0 | 0.0 | 0.000000 | 0.000000 |
| 29 | CHEMONTID:0004557 | Organopnictogen compounds | 0.708598 | -0.316583 | -0.193551 | 0.000000 | 0.000000 | 0.0 | -0.866210 | 0.000000 | ... | 1.461274 | 0.006655 | -0.305747 | 0.0 | 0.000000 | 0.209187 | 0.0 | 0.0 | -0.915715 | -0.019364 |
| 30 | CHEMONTID:0004603 | Organic oxygen compounds | 0.999130 | 0.000000 | 1.294743 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.625820 | 0.0 | 0.000000 | 1.106881 | 0.0 | 0.0 | 0.000000 | 0.190451 |
| 31 | CHEMONTID:0004707 | Organic nitrogen compounds | 0.991915 | 0.402224 | 0.158624 | 0.023002 | 0.000000 | 0.0 | 0.357750 | 0.000000 | ... | 2.924045 | 0.000000 | -0.656322 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | -0.289429 | 0.000000 |
32 rows × 32 columns
For instance, to see which genes contribute most to the prediction of CHEMONTID:0000282 (Aminoglycosides) for this BGC, we can extract the relevant row from the table and keep the genes with a weight of at least 2.0:
[11]:
w = table.set_index("class").loc["CHEMONTID:0000282"].drop(["name", "probability"])
w[w >= 2]
[11]:
CP970_06665 2.614419
CP970_06690 2.614419
Name: CHEMONTID:0000282, dtype: object
These two genes are actually DegT/DnrJ/EryC1/StrS-family aminotransferases, which are also found in the biosynthetic pathways of streptidine (one of the aminoglycoside moieties of streptomycin) and rifamycin B.
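The same kind of filtering can be done without pandas. The sketch below reads the table with the csv module from the standard library; it assumes the column layout shown above (class, name, probability, then one column per gene), and top_genes is a hypothetical helper. The embedded text is a short excerpt of the table, limited to one class and four gene columns, so only one of the two genes reported above appears in it.

```python
import csv
import io

# Excerpt of the table above: one class row and four of the gene columns
TSV = (
    "class\tname\tprobability\tCP970_06595\tCP970_06600\tCP970_06605\tCP970_06690\n"
    "CHEMONTID:0000282\tAminoglycosides\t0.917567\t0.000000\t-0.954235\t0.405878\t2.614419\n"
)


def top_genes(handle, class_id, threshold=2.0):
    """Return {gene: weight} for genes whose weight is at least threshold."""
    meta = {"class", "name", "probability"}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["class"] == class_id:
            return {
                gene: float(weight)
                for gene, weight in row.items()
                if gene not in meta and float(weight) >= threshold
            }
    raise KeyError(class_id)


print(top_genes(io.StringIO(TSV), "CHEMONTID:0000282"))
```

To run it on the real output, the io.StringIO object can be replaced with an open file handle on data/BGC0000703.4.tsv.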