Compositions
- chamois.compositions.build_observations(clusters, proteins=None)
Build an observation table from a list of cluster sequences.
- Parameters:
  - clusters (list of ClusterSequence) – The cluster sequences to add to the observations table.
  - proteins (Iterable of Protein, optional) – The proteins extracted from the clusters, or None. If given, the observations table will contain an additional column with the number of proteins per cluster.
- Returns:
  DataFrame – The data frame containing the cluster sequences and their metadata, to be used as the obs table of an AnnData object.
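The optional protein-count column can be pictured with a small, stdlib-only sketch. It is not the chamois implementation: the `Cluster` and `Protein` dataclasses are hypothetical stand-ins for `ClusterSequence` and `Protein`, a list of dicts stands in for the pandas `DataFrame`, and the `length` and `proteins` column names are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Cluster:
    # Hypothetical stand-in for a cluster sequence: an identifier plus its DNA.
    id: str
    sequence: str


@dataclass(frozen=True)
class Protein:
    # Hypothetical stand-in for a protein: only the parent cluster matters here.
    id: str
    cluster_id: str


def sketch_observations(
    clusters: List[Cluster], proteins: Optional[Iterable[Protein]] = None
) -> List[dict]:
    """Return one row per cluster, plus a protein count column if proteins are given."""
    # Hypothetical metadata: the cluster identifier and its sequence length.
    rows = [{"id": c.id, "length": len(c.sequence)} for c in clusters]
    if proteins is not None:
        # Count proteins per parent cluster; a Counter yields 0 for clusters without any.
        counts = Counter(p.cluster_id for p in proteins)
        for row in rows:
            row["proteins"] = counts[row["id"]]
    return rows


clusters = [Cluster("BGC1", "ATGAAATAG"), Cluster("BGC2", "ATGTAA")]
proteins = [Protein("BGC1_1", "BGC1"), Protein("BGC1_2", "BGC1")]
print(sketch_observations(clusters, proteins))
```

When `proteins` is left as `None`, the rows simply lack the count column, mirroring the optional behaviour described above.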
- chamois.compositions.build_variables(domains)
Build a variable table from an iterable of domains.
Domain accessions are used if every domain has an accession set; otherwise, domain names are used instead (for compatibility with HMM libraries other than Pfam).
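The accession-or-name rule can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the chamois code: the `Domain` dataclass is a hypothetical stand-in, and returning the keys deduplicated and sorted is a choice made for this sketch only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Domain:
    # Hypothetical stand-in for a domain hit: a name and an optional accession.
    name: str
    accession: Optional[str] = None


def sketch_variable_keys(domains: Iterable[Domain]) -> List[str]:
    """Use accessions if every domain has one, otherwise fall back to names."""
    domains = list(domains)  # materialize so the iterable can be traversed twice
    if all(d.accession is not None for d in domains):
        keys = [d.accession for d in domains]
    else:
        keys = [d.name for d in domains]
    # Assumption of this sketch: one variable per distinct key, in sorted order.
    return sorted(set(keys))


pfam_only = [
    Domain("ketoacyl-synt", "PF00109"),
    Domain("Acyl_transf_1", "PF00698"),
    Domain("ketoacyl-synt", "PF00109"),
]
mixed = [Domain("ketoacyl-synt", "PF00109"), Domain("my_custom_hmm")]
print(sketch_variable_keys(pfam_only))
print(sketch_variable_keys(mixed))
```

A single domain without an accession (here the hypothetical `my_custom_hmm` profile) is enough to switch every variable over to names, which keeps the keys of one table consistent.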
- chamois.compositions.build_compositions(domains, obs, var, uns=None)
Build a compositional matrix from the given domains.
- Parameters:
  - domains (Iterable of Domain) – The domains found in the clusters, to turn into a binary indicator matrix.
  - obs (DataFrame) – The input clusters, given as an observation table (obtained with build_observations).
  - var (DataFrame) – The feature domains, given as a variable table (obtained with build_variables).
  - uns (Mapping of str to object, optional) – Additional unstructured metadata to add to the created AnnData object.
- Returns:
  AnnData – The compositional matrix, encoding the presence of protein domains in each gene cluster as a binary indicator matrix stored in a csr_matrix.
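To make the binary indicator encoding concrete, the sketch below builds the three arrays of a compressed sparse row (CSR) matrix from (cluster, domain) hits using only the standard library. The function name and the hit tuples are hypothetical, and the sketch does not claim to reproduce how chamois assembles its matrix. It only shows that repeated hits of the same domain in a cluster collapse to a single stored 1.

```python
from typing import Dict, Iterable, List, Set, Tuple


def sketch_binary_csr(
    hits: Iterable[Tuple[str, str]],
    cluster_ids: List[str],
    domain_keys: List[str],
) -> Tuple[List[int], List[int], List[int]]:
    """Encode (cluster, domain) hits as the data, indices and indptr arrays of a CSR matrix.

    Row i is cluster_ids[i] and column j is domain_keys[j]; a stored 1 means the
    domain occurs at least once in the cluster.
    """
    row_of: Dict[str, int] = {c: i for i, c in enumerate(cluster_ids)}
    col_of: Dict[str, int] = {d: j for j, d in enumerate(domain_keys)}
    # One set of column indices per row; the set drops repeated hits.
    present: List[Set[int]] = [set() for _ in cluster_ids]
    for cluster_id, domain_key in hits:
        present[row_of[cluster_id]].add(col_of[domain_key])
    data: List[int] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for cols in present:
        for j in sorted(cols):  # canonical CSR keeps column indices sorted per row
            indices.append(j)
            data.append(1)
        indptr.append(len(indices))  # end offset of this row in data/indices
    return data, indices, indptr


cluster_ids = ["BGC1", "BGC2"]
domain_keys = ["PF00109", "PF00698"]
hits = [("BGC1", "PF00109"), ("BGC1", "PF00109"), ("BGC1", "PF00698"), ("BGC2", "PF00698")]
print(sketch_binary_csr(hits, cluster_ids, domain_keys))
```

These three arrays match the `(data, indices, indptr)` constructor form accepted by `scipy.sparse.csr_matrix`, with `shape=(len(cluster_ids), len(domain_keys))`, so the result could be wrapped into the kind of sparse matrix described above.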