ORF Finder#

class chamois.orf.ORFFinder#

An abstract base class to provide a generic ORF finder.

abstractmethod find_genes(clusters, progress=None)#

Find all genes from a DNA sequence.
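The abstract-base-class pattern described above can be sketched with the standard `abc` module. This is a minimal illustration, not the chamois source: `SketchORFFinder` and `StartCodonFinder` are hypothetical names, and plain strings stand in for cluster records.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Iterator, Optional


class SketchORFFinder(ABC):
    """Hypothetical stand-in for an abstract ORF finder (not chamois code)."""

    @abstractmethod
    def find_genes(
        self,
        clusters: Iterable[str],
        progress: Optional[Callable[[str, int], None]] = None,
    ) -> Iterator[str]:
        """Yield the genes found in each cluster."""


class StartCodonFinder(SketchORFFinder):
    """Toy subclass: reports each cluster that begins with ATG."""

    def find_genes(self, clusters, progress=None):
        clusters = list(clusters)
        for cluster in clusters:
            if cluster.startswith("ATG"):
                yield cluster
            # Report progress after each cluster, if a callback was given.
            if progress is not None:
                progress(cluster, len(clusters))


# The abstract class cannot be instantiated, but a concrete subclass can.
finder = StartCodonFinder()
found = list(finder.find_genes(["ATGAAATAA", "CCCGGG"]))
```

Any concrete finder only has to implement `find_genes`; code that consumes genes can then accept any such subclass interchangeably.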

class chamois.orf.PyrodigalFinder(ORFFinder)#

An ORFFinder that uses the pyrodigal bindings to Prodigal.

Prodigal is a fast and reliable protein-coding gene prediction tool for prokaryotic genomes, with support for draft genomes and metagenomes. Because BGCs are short sequences, only the “meta” mode can be used to detect genes in the input.

__init__(mask=False, cpus=None)#

Create a new PyrodigalFinder instance.

Parameters:
  • mask (bool) – Whether to mask genes running across regions containing unknown nucleotides. Defaults to False.

  • cpus (int) – The number of threads to use to run Pyrodigal in parallel. Pass 0 to use the number of CPUs on the machine.

find_genes(clusters, progress=None, *, pool_factory=<class 'multiprocessing.pool.ThreadPool'>)#

Find all genes contained in a sequence of DNA records.

Parameters:
  • clusters (iterable of ClusterSequence) – An iterable of raw cluster sequences in which to find genes.

  • progress (callable, optional) – A progress callback with signature progress(cluster, total), called every time a record has been processed successfully, where cluster is the ClusterSequence instance and total is the total number of records to process.

Keyword Arguments:

pool_factory (type) – The callable used to create pools. Defaults to multiprocessing.pool.ThreadPool; multiprocessing.pool.Pool is also supported.

Yields:

Protein – An iterator over all the genes found in the given records.
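To illustrate how the progress callback and pool_factory arguments fit together, here is a minimal sketch that mirrors the documented call signature. It is not the chamois implementation: `find_genes_sketch` is a hypothetical function, plain strings stand in for ClusterSequence records, and counting "ATG" occurrences stands in for real gene prediction.

```python
from multiprocessing.pool import ThreadPool


def find_genes_sketch(clusters, progress=None, *, pool_factory=ThreadPool):
    """Hypothetical stand-in mirroring the documented call signature."""
    clusters = list(clusters)
    total = len(clusters)

    def count_start_codons(cluster):
        # Placeholder for real gene prediction: count "ATG" occurrences.
        return cluster, cluster.count("ATG")

    # The factory is called to build the pool; imap keeps the input order.
    with pool_factory() as pool:
        for cluster, n in pool.imap(count_start_codons, clusters):
            # Called once per processed record, as progress(cluster, total).
            if progress is not None:
                progress(cluster, total)
            yield cluster, n


seen = []
results = list(
    find_genes_sketch(
        ["ATGCCATG", "TTT"],
        progress=lambda cluster, total: seen.append((cluster, total)),
    )
)
```

In this sketch the worker is a nested function, which works with a thread pool; a process-based pool such as `multiprocessing.pool.Pool` would need a picklable, module-level worker instead.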

class chamois.orf.CDSFinder(ORFFinder)#

An ORFFinder that simply extracts CDS annotations from records.

__init__(feature='CDS', translation_table=11, locus_tag='locus_tag')#

find_genes(clusters, progress=None)#

Find all genes contained in a sequence of DNA records.
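The constructor arguments are not documented above, so the following is only a guess at how annotation-based extraction might look. It assumes that `feature` names the feature type to keep and that `locus_tag` names the qualifier holding the gene identifier. `Feature` and `extract_cds` are hypothetical, and translation with `translation_table` is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical annotation record: a feature type plus its qualifiers."""
    kind: str
    qualifiers: dict = field(default_factory=dict)


def extract_cds(features, feature="CDS", locus_tag="locus_tag"):
    """Yield (identifier, feature) pairs for features of the requested type.

    Assumption: the identifier is read from the qualifier named by
    `locus_tag`, with a positional fallback when it is missing.
    """
    for index, feat in enumerate(features):
        if feat.kind != feature:
            continue
        identifier = feat.qualifiers.get(locus_tag, f"gene_{index}")
        yield identifier, feat


annotations = [
    Feature("gene", {"locus_tag": "abc_001"}),
    Feature("CDS", {"locus_tag": "abc_001"}),
    Feature("CDS", {}),
]
identifiers = [name for name, _ in extract_cds(annotations)]
```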