PyPop.hardyweinberg#

Computing Hardy-Weinberg statistics on genotype data.

Attributes#

use_scipy

If True, use SciPy to compute the p-value instead of the internal pval function.

Classes#

`HardyWeinberg`	Calculate Hardy-Weinberg statistics for a single locus.
`HardyWeinbergGuoThompson`	Use Guo & Thompson (1992) algorithm for calculating statistics.
`HardyWeinbergEnumeration`	HW testing with Maldonado Torres' exact enumeration test.
`HardyWeinbergGuoThompsonArlequin`	Arlequin implementation of the Guo & Thompson algorithm.

Functions#

pval(chisq, dof)

Calculate the p-value for a chi-square statistic.

Module Contents#

use_scipy = False#: If True, use SciPy to compute the p-value instead of the internal pval function.

class HardyWeinberg(locusData=None, alleleCount=None, lumpBelow=5, flagChenTest=0)#

Calculate Hardy-Weinberg statistics for a single locus.

Given the observed genotypes for a locus, calculate the expected genotype counts under Hardy-Weinberg proportions for each genotype, and test the observed counts for fit.

Parameters:

locusData (list) – list of tuples of genotype (allele1, allele2)
alleleCount (tuple) – a tuple consisting of a dictionary of counts, total count and number of untyped individuals as returned by PyPop.DataTypes.Genotypes.getLocusDataAt()
lumpBelow (int, optional) – lump alleles with frequency less than this threshold as if they were in the same class (Default: 5)
flagChenTest (int, optional) – if enabled (1), compute Chen’s chi-square-based “corrected” p-value (Default: 0, disabled)
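
As a rough illustration of the calculation described above, the following standalone sketch derives expected genotype counts from allele frequencies and sums a chi-square goodness-of-fit statistic. It does not call PyPop: the function name hw_expected_and_chisq is hypothetical, alleles are assumed to be strings, and no lumping of rare classes (as controlled by lumpBelow) is attempted.

```python
from collections import Counter


def hw_expected_and_chisq(genotypes):
    """Return (expected genotype counts, chi-square) for one locus.

    `genotypes` is a list of (allele1, allele2) tuples. Hypothetical
    helper for illustration only; it is not part of PyPop.
    """
    n = len(genotypes)
    # Treat (a, b) and (b, a) as the same unordered genotype.
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    alleles = sorted(allele_counts)

    expected = {}
    for i, a in enumerate(alleles):
        p = allele_counts[a] / (2 * n)
        for b in alleles[i:]:
            q = allele_counts[b] / (2 * n)
            # Homozygotes: n * p^2; heterozygotes: 2 * n * p * q.
            expected[(a, b)] = n * p * p if a == b else 2 * n * p * q

    chisq = sum((observed.get(g, 0) - e) ** 2 / e for g, e in expected.items())
    return expected, chisq
```

For a sample of 25 A/A, 50 A/B and 25 B/B individuals, the expected counts equal the observed counts and the statistic is zero.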

serializeTo(stream, allelelump=0)#

Serialize output to specified XML stream.

Parameters:

stream (XMLOutputStream) – write to specified XML stream (generally a file)
allelelump (int) – record the allele lumping value

serializeXMLTableTo(stream)#

Serialize the genotype table.

Parameters:

stream (XMLOutputStream) – XML stream

class HardyWeinbergGuoThompson(locusData=None, alleleCount=None, runMCMCTest=0, runPlainMCTest=0, dememorizationSteps=2000, samplingNum=1000, samplingSize=1000, maxMatrixSize=250, monteCarloSteps=1000000, testing=False, **kw)#

Bases: HardyWeinberg

Use Guo & Thompson (1992) algorithm for calculating statistics.

This Python class wraps the functionality of the Guo & Thompson program gthwe. In addition to the arguments of the base class, it accepts the following keywords:

Parameters:

locusData (list) – list of tuples of genotype (allele1, allele2)
alleleCount (tuple) – a tuple consisting of a dictionary of counts, total count and number of untyped individuals as returned by PyPop.DataTypes.Genotypes.getLocusDataAt()
runMCMCTest (int) – if enabled (1), run the Markov chain Monte Carlo (MCMC) version of the test (what is normally referred to as “Guo & Thompson”); default disabled (0)
runPlainMCTest (int) – if enabled (1), run a plain Monte Carlo/randomization version of the test without the Markov chain (this is also described in the original Guo & Thompson Biometrics paper, but was not in their original program); default disabled (0)
dememorizationSteps (int) – number of initial “dememorization” (burn-in) steps for the Markov chain (default 2000).
samplingNum (int) – number of chunks to sample (default 1000).
samplingSize (int) – size of each chunk (default 1000).
maxMatrixSize (int) – maximum size of the “flattened” lower-triangular matrix of observed alleles (default 250).
monteCarloSteps (int) – number of steps for the plain Monte Carlo randomization test (without Markov chain) (default 1000000).
testing (bool) – testing mode, default False
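
To make the idea behind runPlainMCTest more concrete, here is a minimal, standalone sketch of a plain Monte Carlo randomization test for Hardy-Weinberg proportions. It is not the gthwe code and does not reflect how this class is implemented: the function names are hypothetical, alleles are assumed to be strings, and steps only loosely plays the role of monteCarloSteps. Each step shuffles the pooled alleles, pairs them into genotypes, and compares the table's probability (up to a constant) with that of the observed table.

```python
import math
import random
from collections import Counter


def log_table_stat(genotypes):
    """Log of a quantity proportional to the probability of a genotype
    table given fixed allele counts: 2^H / prod(n_ij!), with H the
    number of heterozygotes. Hypothetical helper for illustration.
    """
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    hets = sum(c for (a, b), c in counts.items() if a != b)
    return hets * math.log(2) - sum(math.lgamma(c + 1) for c in counts.values())


def plain_mc_hw_pvalue(genotypes, steps=10000, seed=0):
    """Estimate the exact-test p-value by random re-pairing of alleles."""
    rng = random.Random(seed)
    observed = log_table_stat(genotypes)
    alleles = [a for g in genotypes for a in g]
    hits = 0
    for _ in range(steps):
        rng.shuffle(alleles)
        # Pair consecutive alleles into simulated genotypes.
        sample = list(zip(alleles[0::2], alleles[1::2]))
        # Count tables no more probable than the observed one
        # (small tolerance guards against floating-point ties).
        if log_table_stat(sample) <= observed + 1e-9:
            hits += 1
    return hits / steps
```

The MCMC variant differs in that it moves between neighbouring tables along a Markov chain rather than drawing each table independently, which is why it needs the dememorization and chunking parameters.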

generateFlattenedMatrix()#: Generate a flattened version of the genotype matrix.

dumpTable(locusName, stream, allelelump=0)#

Output table to stream.

Parameters:

locusName (str) – name of the locus whose table is output
stream (XMLOutputStream) – XML stream to write to
allelelump (int) – record allele lumping level (default 0)

Returns:

if an empty tag

Return type:

None

class HardyWeinbergEnumeration(locusData=None, alleleCount=None, doOverall=0, **kw)#

Bases: HardyWeinbergGuoThompson

HW testing with Maldonado Torres’ exact enumeration test.

Warning

This requires the Enumeration C code to be compiled as a module using SWIG. By default this is currently disabled.

Parameters:

locusData (list) – list of tuples of genotype (allele1, allele2)
alleleCount (tuple) – a tuple consisting of a dictionary of counts, total count and number of untyped individuals as returned by PyPop.DataTypes.Genotypes.getLocusDataAt()
doOverall (int) – if set to true (1), do the overall p-value test; default is false (0)

serializeTo(stream, allelelump=0)#

Serialize enumeration test output to stream.

Parameters:

stream (XMLOutputStream) – XML stream to use
allelelump (int) – record allele lumping level (default 0)

class HardyWeinbergGuoThompsonArlequin(matrix=None, locusName=None, arlequinExec='arlecore.exe', markovChainStepsHW=100000, markovChainDememorisationStepsHW=1000, untypedAllele='****')#

Arlequin implementation of the Guo & Thompson algorithm.

Deprecated since version 1.0.0.

This class extracts the Hardy-Weinberg (HW) statistics using the Arlequin implementation of the HW exact test, in the following steps:

a subdirectory arlequinRuns is created, in which all the Arlequin-specific files are generated;
the specified Arlequin executable is run, generating the Arlequin output HTML files (*.htm);
the Arlequin output is parsed for the relevant statistics;
lastly, the arlequinRuns directory is removed.

Because the directory name arlequinRuns is currently hardcoded, this class cannot be invoked concurrently.
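
The create/run/parse/remove cycle above, and the reason a fixed directory name prevents concurrent use, can be sketched generically. The following is an illustration only, not this class's code: run_in_scratch_dir, run_tool and parse_output are hypothetical names, and the sketch uses a uniquely named temporary directory (which this class does not do) to show how the collision could be avoided.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_scratch_dir(run_tool, parse_output):
    """Run an external-tool step in a private scratch directory.

    `run_tool(workdir)` is assumed to write its output files into
    `workdir`; `parse_output(workdir)` is assumed to read them and
    return the statistics. Both callables are hypothetical.
    """
    # mkdtemp picks a unique name, so parallel runs do not collide
    # the way they would with a single hardcoded directory.
    workdir = Path(tempfile.mkdtemp(prefix="arlequinRuns-"))
    try:
        run_tool(workdir)
        return parse_output(workdir)
    finally:
        # Always clean up, even if the tool or the parser fails.
        shutil.rmtree(workdir, ignore_errors=True)
```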

Parameters:

matrix (StringMatrix) – matrix to extract locus from
locusName (str) – locus to use
arlequinExec (str) – name of Arlequin executable
markovChainStepsHW (int) – number of steps to use in Markov chain (default: 100000).
markovChainDememorisationStepsHW (int) – “Burn-in” time for Markov chain (default: 1000).
untypedAllele (str) – untyped allele identifier

serializeTo(stream)#

Serialize output to stream.

Parameters:

stream (XMLOutputStream) – stream to serialize to

pval(chisq, dof)#

Calculate the p-value for a chi-square statistic.

Parameters:

chisq (float) – Chi-square value
dof (int) – degrees of freedom

Returns:

p-value

Return type:

float
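
For readers who want to see what such a calculation involves, the sketch below computes the upper-tail probability of a chi-square statistic for integer degrees of freedom using only the standard library. It is not the internal pval implementation: chi2_sf is a hypothetical name, and the recurrence on the regularized incomplete gamma function is one of several possible approaches. With SciPy installed, scipy.stats.chi2.sf(chisq, dof) computes the same quantity; whether that is the exact call made when use_scipy is True is not stated here.

```python
import math


def chi2_sf(chisq, dof):
    """Upper-tail probability P(X >= chisq) for X ~ chi-square(dof).

    Hypothetical helper; assumes `dof` is a positive integer.
    """
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if chisq <= 0:
        return 1.0
    x = chisq / 2.0
    # Base cases: Q(1/2, x) = erfc(sqrt(x)) and Q(1, x) = exp(-x).
    if dof % 2:
        a, q = 0.5, math.erfc(math.sqrt(x))
    else:
        a, q = 1.0, math.exp(-x)
    # Recurrence: Q(a + 1, x) = Q(a, x) + x^a * exp(-x) / Gamma(a + 1).
    while a < dof / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)
```

As a sanity check, a statistic of about 3.84 with one degree of freedom gives a p-value close to 0.05.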