PyPop.DataTypes#
Data structures storing genotype and allele count data.
Classes#
Genotypes | Stores genotypes and caches basic genotype statistics.
AlleleCounts | Deprecated class to store information in allele count form.
Functions#
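checkIfSequenceData(matrix) | Heuristic check to determine whether we are analysing sequence.
getMetaLocus(locus, isSequenceData) | Get the overall locus that this sequence belongs to.
getLocusPairs(matrix, sequenceData) | Get locus pairs for a given matrix.
getLumpedDataLevels(genotypeData, locus, lumpLevels) | Get lumped data for a specific locus.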
Module Contents#
- class Genotypes(matrix=None, untypedAllele='****', unsequencedSite=None, allowSemiTyped=0, debug=0)#
Stores genotypes and caches basic genotype statistics.
- Parameters:
matrix (StringMatrix) – The StringMatrix to be converted into a Genotypes instance
untypedAllele (str) – The placeholder for an untyped allele site
unsequencedSite (bool) – The identifier used for an unsequenced site (only used for sequence data)
allowSemiTyped (int) – Whether or not to allow individuals that are typed at only one allele
debug (int) – Switch on debugging
- getLocusList()#
Get the list of loci.
Note
The returned list excludes every locus at which all individuals are untyped. The order of the returned list is fixed for the lifetime of the object.
- Returns:
The list of loci.
- Return type:
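The filtering rule in the note above can be sketched in plain Python. This is not PyPop code: the helper name `typed_loci` and the dictionary-of-pairs input are hypothetical, and treating a locus as typed when at least one allele differs from the placeholder is an assumption about how "untyped" is decided.

```python
def typed_loci(data, untyped="****"):
    """Return locus names that have at least one typed allele.

    `data` is a hypothetical stand-in for the genotype matrix: a dict
    mapping locus name to a list of (allele1, allele2) pairs.
    Dict insertion order is preserved, so the result order is stable.
    """
    return [
        locus
        for locus, pairs in data.items()
        if any(allele != untyped for pair in pairs for allele in pair)
    ]


sample = {
    "A": [("01", "02")],
    "B": [("****", "****")],   # every individual untyped: dropped
    "C": [("****", "03")],
}
loci = typed_loci(sample)
```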
- getAlleleCount()#
Allele count statistics for all loci.
- Returns:
A map of tuples, keyed by locus name. Each tuple is a triple consisting of a map keyed by allele containing counts, the total count at that locus, and the number of untyped individuals.
- Return type:
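To make the shape of the returned structure concrete, here is a minimal sketch that builds the same kind of triple from plain tuples. The helper `count_alleles` is hypothetical and not part of PyPop; how the real class counts partially typed individuals is not stated here, so skipping any individual with an untyped allele is an assumption.

```python
from collections import Counter


def count_alleles(pairs, untyped="****"):
    """Return (counts, total, n_untyped) for one locus.

    Assumption: an individual with any untyped allele is counted as
    untyped and contributes nothing to the allele counts.
    """
    counts = Counter()
    n_untyped = 0
    for a1, a2 in pairs:
        if untyped in (a1, a2):
            n_untyped += 1
            continue
        counts[a1] += 1
        counts[a2] += 1
    return dict(counts), sum(counts.values()), n_untyped


sample = {"A": [("01", "02"), ("02", "02"), ("****", "****")]}
allele_count = {locus: count_alleles(pairs) for locus, pairs in sample.items()}
```

Each value can then be unpacked as `counts, total, n_untyped = allele_count["A"]`.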
- getAlleleCountAt(locus, lumpValue=0)#
Get allele count for given locus.
- serializeSubclassMetadataTo(stream)#
Serialize subclass-specific metadata.
Specifically, total number of individuals and loci and population name.
- Parameters:
stream (TextOutputStream) – the stream used for output.
- serializeAlleleCountDataAt(stream, locus)#
Serialize locus count data for a specific locus.
Specifically, total number of individuals and loci and population name.
- Parameters:
stream (TextOutputStream) – the stream used for output
locus (str) – locus
- serializeAlleleCountDataTo(stream)#
Serialize allele count data for a specific locus.
- Parameters:
stream (TextOutputStream) – the stream used for output
- Returns:
Always returns 1
- Return type:
- getLocusDataAt(locus, lumpValue=0)#
Get the genotyped data for specified locus.
Note
The returned list excludes all individuals that are untyped at either chromosome. Data is sorted so that allele1 < allele2, alphabetically.
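The filtering and ordering described in the note can be illustrated with a short standalone sketch. The function name `sorted_typed_pairs` is hypothetical, and the input is a plain list of tuples rather than PyPop's own matrix type.

```python
def sorted_typed_pairs(pairs, untyped="****"):
    """Drop individuals untyped at either allele, then order each pair.

    Each surviving pair is sorted alphabetically, so the first allele
    never sorts after the second (homozygotes stay as they are).
    """
    result = []
    for a1, a2 in pairs:
        if untyped in (a1, a2):
            continue  # untyped at one or both chromosomes
        result.append(tuple(sorted((a1, a2))))
    return result


locus_data = sorted_typed_pairs([("02", "01"), ("****", "01"), ("03", "03")])
```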
- getLocusData()#
Get the genotyped data for all loci.
- Returns:
A map keyed by locus name, whose values are lists of 2-tuples as defined by getLocusDataAt()
- Return type:
- getIndividualsData()#
Get data for all individuals.
- Returns:
StringMatrix for all individuals
- Return type:
- class AlleleCounts(alleleTable=None, locusName=None, debug=0)#
Deprecated class to store information in allele count form.
Deprecated since version 0.6.0: this class is now obsolete; the Genotypes class now holds allele count data as a pseudo-genotype matrix.
- serializeSubclassMetadataTo(stream)#
Serialize subclass-specific metadata.
Specifically, total number of alleles and loci.
- serializeAlleleCountDataAt(stream, locus)#
- getAlleleCount()#
- getLocusName()#
- checkIfSequenceData(matrix)#
Heuristic check to determine whether we are analysing sequence.
Note
The regex matches loci of the form A_32 or A_-32
- Parameters:
matrix (StringMatrix) – matrix to check
- Returns:
If sequence, return 1, otherwise 0
- Return type:
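A heuristic of this kind might look like the sketch below. The pattern and the helper `looks_like_sequence` are assumptions written to match the two documented forms (A_32 and A_-32); the regex PyPop actually uses, and whether it requires every locus or only some to match, are not given here.

```python
import re

# Assumed pattern: a locus name, an underscore, then a position that
# may be negative, e.g. "A_32" or "A_-32".
SEQ_LOCUS = re.compile(r"^[A-Za-z0-9]+_-?\d+$")


def looks_like_sequence(locus_names):
    """Return 1 if every name looks like a sequence position, else 0.

    Requiring *all* names to match is an assumption of this sketch.
    """
    names = list(locus_names)
    if names and all(SEQ_LOCUS.match(name) for name in names):
        return 1
    return 0
```

For example, `looks_like_sequence(["A_32", "A_-32"])` gives 1, while a list of ordinary locus names such as `["A", "DRB1"]` gives 0.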
- getMetaLocus(locus, isSequenceData)#
Get the overall locus that this sequence belongs to.
- getLocusPairs(matrix, sequenceData)#
Get locus pairs for a given matrix.
- Parameters:
matrix (StringMatrix) – matrix
sequenceData (bool) – is this sequence data?
- Returns:
Returns a list of all pairs of loci from a given StringMatrix.
- Return type:
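Enumerating all pairs of loci can be sketched with the standard library. The helper `all_locus_pairs` is hypothetical and takes a plain list of locus names; the format in which the real function represents a pair, and how the sequenceData flag changes the result, are not documented here and are left out of this sketch.

```python
from itertools import combinations


def all_locus_pairs(loci):
    """Return every unordered pair of distinct loci, in input order."""
    return list(combinations(loci, 2))


pairs = all_locus_pairs(["A", "B", "DRB1"])
```

For n loci this yields n(n-1)/2 pairs, so three loci produce three pairs.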
- getLumpedDataLevels(genotypeData, locus, lumpLevels)#
Get lumped data for a specific locus.