PyPop.DataTypes

Data structures storing genotype and allele count data.

Classes

Genotypes

Stores genotypes and caches basic genotype statistics.

AlleleCounts

Deprecated class to store information in allele count form.

Functions

checkIfSequenceData(matrix)

Heuristic check to determine whether we are analysing sequence data.

getMetaLocus(locus, isSequenceData)

Get the overall locus that this sequence belongs to.

getLocusPairs(matrix, sequenceData)

Get locus pairs for a given matrix.

getLumpedDataLevels(genotypeData, locus, lumpLevels)

Get lumped data for a specific locus.

Module Contents

class Genotypes(matrix=None, untypedAllele='****', unsequencedSite=None, allowSemiTyped=0, debug=0)

Stores genotypes and caches basic genotype statistics.

Parameters:
  • matrix (StringMatrix) – The StringMatrix to be converted into a Genotypes instance

  • untypedAllele (str) – The placeholder for an untyped allele site

  • unsequencedSite (str) – The identifier used to mark an unsequenced site (only used for sequence data)

  • allowSemiTyped (int) – Whether to allow individuals that are typed at only one of their two alleles (semi-typed individuals)

  • debug (int) – Switch on debugging
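The following minimal sketch illustrates the distinction that allowSemiTyped controls. It is plain Python that does not import PyPop. The helpers classify_genotype and keep_individual are hypothetical. The sketch assumes the default untyped placeholder '****' and only shows the typed, semi-typed, and untyped categories. It is not how the Genotypes class implements its filtering.

```python
# Hypothetical helpers, not part of PyPop. Assumes '****' marks an untyped
# allele, matching the documented default for untypedAllele.
UNTYPED = "****"


def classify_genotype(allele1, allele2, untyped=UNTYPED):
    """Return 'typed', 'semi-typed' or 'untyped' for one individual's genotype."""
    missing = (allele1 == untyped) + (allele2 == untyped)  # bools add as ints
    if missing == 0:
        return "typed"
    if missing == 1:
        return "semi-typed"
    return "untyped"


def keep_individual(allele1, allele2, allow_semi_typed=0, untyped=UNTYPED):
    """Keep fully typed individuals; keep semi-typed ones only if the flag is set."""
    status = classify_genotype(allele1, allele2, untyped)
    return status == "typed" or (status == "semi-typed" and bool(allow_semi_typed))


print(classify_genotype("01:01", "****"))  # semi-typed
```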

getLocusList()

Get the list of loci.

Note

The returned list excludes every locus at which all individuals are untyped. The order of the returned list is fixed for the lifetime of the object.

Returns:

The list of loci.

Return type:

list
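The filtering described in the note can be sketched as follows. This is an illustration only. The function typed_loci and its dictionary input (locus name mapped to a list of allele pairs) are hypothetical stand-ins for the internal state of Genotypes. A locus is kept as soon as any individual has at least one typed allele, and insertion order is preserved.

```python
# Hypothetical sketch of the getLocusList() filtering rule; not PyPop code.
UNTYPED = "****"


def typed_loci(genotypes_by_locus, untyped=UNTYPED):
    """Return loci that have at least one typed allele, in their original order.

    genotypes_by_locus maps locus name -> list of (allele1, allele2) tuples.
    """
    return [
        locus
        for locus, genotypes in genotypes_by_locus.items()
        if any(allele != untyped for pair in genotypes for allele in pair)
    ]


data = {
    "A": [("01", "02"), ("****", "****")],
    "B": [("****", "****"), ("****", "****")],  # every individual untyped: dropped
    "C": [("03", "****")],
}
print(typed_loci(data))  # ['A', 'C']
```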

getAlleleCount()

Allele count statistics for all loci.

Returns:

A dictionary keyed by locus name. Each value is a 3-tuple consisting of a dictionary mapping each allele to its count, the total count at that locus, and the number of untyped individuals.

Return type:

dict
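The sketch below shows one way to consume a value shaped like the documented return of getAlleleCount(). It converts counts into per-locus allele frequencies. The dictionary is hand-built with made-up loci and counts, and allele_frequencies is a hypothetical helper. The sketch assumes that the total in each tuple equals the sum of the allele counts.

```python
# Hand-built value shaped like getAlleleCount()'s documented return:
# {locus: (counts_by_allele, total_count, n_untyped)}. Values are made up.
allele_count = {
    "A": ({"01": 3, "02": 1}, 4, 1),
    "B": ({"07": 2, "08": 2}, 4, 0),
}


def allele_frequencies(allele_count):
    """Hypothetical helper: divide each allele count by the locus total."""
    freqs = {}
    for locus, (counts, total, _n_untyped) in allele_count.items():
        # Guard against a locus with no counted alleles.
        freqs[locus] = {allele: n / total for allele, n in counts.items()} if total else {}
    return freqs


print(allele_frequencies(allele_count)["A"])  # {'01': 0.75, '02': 0.25}
```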

getAlleleCountAt(locus, lumpValue=0)

Get allele count for given locus.

Parameters:
  • locus (str) – the locus to query

  • lumpValue (int) – the specified amount of lumping (Default: 0)

Returns:

A tuple consisting of a dictionary mapping each allele to its count, the total count at that locus, and the number of untyped individuals.

Return type:

tuple
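The sketch below shows how a triple of this shape could be derived from a list of allele pairs at one locus. The function allele_count_at is hypothetical, and the rules it applies are assumptions rather than PyPop's documented behaviour. An individual with any untyped allele is counted as untyped and left out of the allele counts, and the total is the number of counted alleles. The lumpValue parameter is not modelled.

```python
from collections import Counter

UNTYPED = "****"


def allele_count_at(genotypes, untyped=UNTYPED):
    """Hypothetical sketch returning (counts, total, n_untyped) for one locus.

    Assumption: individuals with either allele untyped count as untyped and
    contribute nothing to the allele counts.
    """
    counts = Counter()
    n_untyped = 0
    for allele1, allele2 in genotypes:
        if untyped in (allele1, allele2):
            n_untyped += 1
            continue
        counts.update((allele1, allele2))  # count both alleles of the individual
    return dict(counts), sum(counts.values()), n_untyped


print(allele_count_at([("01", "02"), ("01", "01"), ("****", "****")]))
# ({'01': 3, '02': 1}, 4, 1)
```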

serializeSubclassMetadataTo(stream)

Serialize subclass-specific metadata.

Specifically, the total number of individuals, the number of loci, and the population name.

Parameters:

stream (TextOutputStream) – the stream used for output.

serializeAlleleCountDataAt(stream, locus)

Serialize allele count data for a specific locus.

Parameters:
  • stream (TextOutputStream) – the stream used for output

  • locus (str) – the locus whose allele count data is serialized

serializeAlleleCountDataTo(stream)

Serialize allele count data for all loci.

Parameters:

stream (TextOutputStream) – the stream used for output

Returns:

always returns 1

Return type:

int

getLocusDataAt(locus, lumpValue=0)

Get the genotyped data for specified locus.

Note

The returned list excludes all individuals who are untyped on either chromosome. Within each genotype, the alleles are ordered alphabetically so that allele1 < allele2.

Parameters:
  • locus (str) – locus to use

  • lumpValue (int) – the specified amount of lumping (Default: 0).

Returns:

A list of genotypes, each a 2-tuple containing the two alleles of one individual.

Return type:

list
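The two rules in the note can be sketched in plain Python: drop individuals untyped on either chromosome, then order each pair alphabetically. The function locus_data_at and its input list of allele pairs are hypothetical, the untyped placeholder is assumed to be '****', and lumpValue is not modelled.

```python
UNTYPED = "****"


def locus_data_at(genotypes, untyped=UNTYPED):
    """Hypothetical sketch: filter out partially untyped individuals, then sort each pair."""
    return [
        tuple(sorted(pair))  # alphabetical order, so allele1 < allele2
        for pair in genotypes
        if untyped not in pair  # untyped on either chromosome: excluded
    ]


print(locus_data_at([("02", "01"), ("01", "****"), ("03", "03")]))
# [('01', '02'), ('03', '03')]
```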

getLocusData()

Get the genotyped data for all loci.

Returns:

A dictionary keyed by locus name, where each value is a list of 2-tuples as returned by getLocusDataAt().

Return type:

dict

getIndividualsData()

Get data for all individuals.

Returns:

StringMatrix for all individuals

Return type:

StringMatrix

class AlleleCounts(alleleTable=None, locusName=None, debug=0)

Deprecated class to store information in allele count form.

Deprecated since version 0.6.0: this class is obsolete; the Genotypes class now holds allele count data as a pseudo-genotype matrix.

serializeSubclassMetadataTo(stream)

Serialize subclass-specific metadata.

Specifically, the total number of alleles and the number of loci.

serializeAlleleCountDataAt(stream, locus)

getAlleleCount()

getLocusName()

checkIfSequenceData(matrix)

Heuristic check to determine whether we are analysing sequence data.

Note

The regular expression matches loci of the form A_32 or A_-32.

Parameters:

matrix (StringMatrix) – matrix to check

Returns:

1 if the matrix contains sequence data, otherwise 0.

Return type:

int
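The following sketch approximates the heuristic in the note. The pattern below is an assumption that accepts the two documented forms, A_32 and A_-32; PyPop's actual regular expression may differ. The sketch takes a plain list of locus names instead of a StringMatrix. Treating a single matching locus as sufficient is also an assumption. The name check_if_sequence_data is hypothetical.

```python
import re

# Assumed pattern: locus name, underscore, optionally negative integer position.
SEQUENCE_LOCUS = re.compile(r"^[A-Za-z0-9]+_-?\d+$")


def check_if_sequence_data(locus_names):
    """Hypothetical sketch: 1 if any locus name looks like a sequence site, else 0."""
    return 1 if any(SEQUENCE_LOCUS.match(name) for name in locus_names) else 0


print(check_if_sequence_data(["A_32", "A_-32"]))  # 1
print(check_if_sequence_data(["A", "DRB1"]))      # 0
```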

getMetaLocus(locus, isSequenceData)

Get the overall locus that this sequence belongs to.

Parameters:
  • locus (str) – Locus of interest.

  • isSequenceData (bool) – whether this locus is sequence data

Returns:

The locus name, or None if not sequence data.

Return type:

str
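A minimal sketch of the documented contract follows. It assumes sequence loci follow the A_32 / A_-32 naming used by checkIfSequenceData and that the overall locus is the part before the last underscore. The function get_meta_locus and this splitting rule are hypothetical, not PyPop's implementation.

```python
def get_meta_locus(locus, is_sequence_data):
    """Hypothetical sketch: 'A_-32' -> 'A' for sequence data, None otherwise."""
    if not is_sequence_data:
        return None
    # rpartition splits on the last '_'; if there is none, fall back to the locus.
    meta, sep, _position = locus.rpartition("_")
    return meta if sep else locus


print(get_meta_locus("A_-32", True))  # A
```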

getLocusPairs(matrix, sequenceData)

Get locus pairs for a given matrix.

Parameters:
  • matrix (StringMatrix) – the matrix whose loci are paired

  • sequenceData (bool) – whether the matrix contains sequence data

Returns:

A list of all pairs of loci in the given StringMatrix.

Return type:

list
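Assuming "all pairs" means unordered pairs of distinct loci, the core of this can be sketched with itertools.combinations. The function locus_pairs is hypothetical and takes a plain list of locus names. The exact element format PyPop returns (tuples or joined strings) is not stated here, and the sequenceData handling is not modelled.

```python
from itertools import combinations


def locus_pairs(loci):
    """Hypothetical sketch: every unordered pair of distinct loci, in input order."""
    return list(combinations(loci, 2))


print(locus_pairs(["A", "B", "C"]))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

For n loci this yields n(n-1)/2 pairs.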

getLumpedDataLevels(genotypeData, locus, lumpLevels)

Get lumped data for a specific locus.

Parameters:
  • genotypeData (Genotypes) – genotype data to query

  • locus (str) – the locus

  • lumpLevels (list) – a list of integers representing lumping levels

Returns:

a dictionary of tuples:
  • locusData: keyed by locus

  • alleleCount:

Return type:

dict