Compositions

chamois.compositions.build_observations(clusters, proteins=None)

Build an observation table from a list of cluster sequences.

Parameters:
  • clusters (list of ClusterSequence) – The cluster sequences to add to the observations table.

  • proteins (Iterable of Protein) – The proteins extracted from the clusters, or None. If given, the observations table will contain an additional column with the number of proteins per cluster.

Returns:

DataFrame – The data frame containing the cluster sequences and their metadata, to be used as the obs table of an AnnData object.
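
The optional protein-count column can be pictured with the minimal sketch below. It does not call chamois: ClusterStub, ProteinStub, sketch_observations and the "proteins" column name are hypothetical stand-ins invented for illustration, and a plain list of dictionaries takes the place of the DataFrame.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


# Hypothetical stand-ins: only the fields needed for this sketch are
# modeled, they are NOT the actual ClusterSequence / Protein classes.
@dataclass(frozen=True)
class ClusterStub:
    id: str
    length: int


@dataclass(frozen=True)
class ProteinStub:
    id: str
    cluster_id: str


def sketch_observations(
    clusters: List[ClusterStub],
    proteins: Optional[Iterable[ProteinStub]] = None,
) -> List[Dict[str, object]]:
    """Return one row per cluster, in input order."""
    rows: List[Dict[str, object]] = [
        {"id": c.id, "length": c.length} for c in clusters
    ]
    # The extra column is only added when proteins are supplied.
    if proteins is not None:
        counts = Counter(p.cluster_id for p in proteins)
        for row in rows:
            # Counter returns 0 for clusters without any protein.
            row["proteins"] = counts[row["id"]]
    return rows
```

Calling sketch_observations(clusters) without proteins yields rows with no count column, which mirrors the behaviour described for proteins=None.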

chamois.compositions.build_variables(domains)

Build a variable table from an iterable of domains.

The domain accessions are used if every domain has an accession set; otherwise the domain names are used (for compatibility with HMM libraries other than Pfam).

Parameters:

domains (Iterable of Domain) – The domains to add to the variables table.

Returns:

DataFrame – The data frame containing the sorted, deduplicated domains to be used as the var table of an AnnData object.
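
The accession-or-name rule and the sorted, deduplicated result can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: DomainStub and sketch_variable_index are hypothetical names, and the function returns a plain list of keys instead of a DataFrame.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


# Hypothetical stand-in for a domain annotation; only the name and the
# optional accession matter for this sketch.
@dataclass(frozen=True)
class DomainStub:
    name: str
    accession: Optional[str] = None


def sketch_variable_index(domains: Iterable[DomainStub]) -> List[str]:
    """Return the sorted, deduplicated keys identifying each domain."""
    domains = list(domains)
    # Accessions are used only if *every* domain has one; a single
    # missing accession switches the whole index to domain names.
    use_accession = all(d.accession is not None for d in domains)
    keys = {d.accession if use_accession else d.name for d in domains}
    return sorted(keys)
```

For example, a single domain coming from a custom HMM without an accession is enough to make the whole index fall back to names.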

chamois.compositions.build_compositions(domains, obs, var, uns=None)

Build a compositional matrix from the given domains.

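Parameters:
  • domains (Iterable of Domain) – The domains found in the clusters.

  • obs (DataFrame) – The observations table, such as the one returned by build_observations.

  • var (DataFrame) – The variables table, such as the one returned by build_variables.

  • uns – Optional unstructured metadata to attach to the resulting AnnData object, or None.

Returns: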

AnnData – The compositional matrix, encoding the presence of protein domains in each gene cluster as a binary indicator matrix stored in a csr_matrix.
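
To make the binary indicator layout concrete, the sketch below builds the three CSR arrays by hand from (cluster, domain) pairs. It uses only the standard library and is not the chamois implementation: sketch_csr_indicator and its arguments are hypothetical, with obs_index and var_index standing in for the row labels of the obs and var tables.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def sketch_csr_indicator(
    pairs: Iterable[Tuple[str, str]],
    obs_index: Sequence[str],
    var_index: Sequence[str],
) -> Tuple[List[int], List[int], List[int]]:
    """Return (data, indices, indptr) for a clusters x domains matrix."""
    row_of = {name: i for i, name in enumerate(obs_index)}
    col_of = {name: j for j, name in enumerate(var_index)}

    # One set of column positions per row: a domain found several times
    # in the same cluster still produces a single 1 (presence only).
    present: List[Set[int]] = [set() for _ in obs_index]
    for cluster_id, domain_key in pairs:
        present[row_of[cluster_id]].add(col_of[domain_key])

    data: List[int] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for cols in present:
        indices.extend(sorted(cols))
        data.extend([1] * len(cols))
        # indptr[i + 1] marks where row i ends in data / indices.
        indptr.append(len(indices))
    return data, indices, indptr
```

The returned triplet matches the (data, indices, indptr) form accepted by scipy.sparse.csr_matrix, so it could be passed to that constructor together with shape=(len(obs_index), len(var_index)).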