PyPop.hardyweinberg#

Computing Hardy-Weinberg statistics on genotype data.

Attributes#

use_scipy

If True, use SciPy to compute the p-value rather than the internal pval function.

Classes#

HardyWeinberg

Calculate Hardy-Weinberg statistics for a single locus.

HardyWeinbergGuoThompson

Use Guo & Thompson (1992) algorithm for calculating statistics.

HardyWeinbergEnumeration

HW testing with Maldonado Torres' exact enumeration test.

HardyWeinbergGuoThompsonArlequin

Arlequin implementation of the Guo & Thompson algorithm.

Functions#

pval(chisq, dof)

Calculate p-value.

Module Contents#

use_scipy = False#

If True, use SciPy to compute the p-value rather than the internal pval function.

class HardyWeinberg(locusData=None, alleleCount=None, lumpBelow=5, flagChenTest=0)#

Calculate Hardy-Weinberg statistics for a single locus.

Given the observed genotypes for a locus, calculate the expected count of each genotype under Hardy-Weinberg proportions, and test the observed counts for goodness of fit.

Parameters:
  • locusData (list) – list of tuples of genotype (allele1, allele2)

  • alleleCount (tuple) – a tuple consisting of a dictionary of counts, total count and number of untyped individuals as returned by PyPop.DataTypes.Genotypes.getLocusDataAt()

  • lumpBelow (int, optional) – lump alleles with a frequency below this threshold together, as if they were in the same class (Default: 5)

  • flagChenTest (int, optional) – if enabled (1), compute Chen’s chi-square-based “corrected” p-value (Default: 0, disabled)
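
As an illustration of the calculation described above, the following is a minimal sketch of expected genotype counts and the resulting chi-square statistic. It is not PyPop's implementation: the function names hw_expected_counts and hw_chi_square are hypothetical, untyped individuals are assumed to have been removed already, and the lumping controlled by lumpBelow is omitted.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hw_expected_counts(locus_data):
    """Expected genotype counts under Hardy-Weinberg proportions.

    locus_data is a list of (allele1, allele2) tuples, one per individual.
    """
    n = len(locus_data)
    allele_counts = Counter(a for genotype in locus_data for a in genotype)
    total = 2 * n
    freq = {a: c / total for a, c in allele_counts.items()}

    expected = {}
    for a, b in combinations_with_replacement(sorted(freq), 2):
        p = freq[a] * freq[b]
        # Homozygotes: n * p_a^2; heterozygotes: n * 2 * p_a * p_b
        expected[(a, b)] = n * (p if a == b else 2 * p)
    return expected


def hw_chi_square(locus_data):
    """Pearson chi-square statistic over all possible genotypes (no lumping)."""
    observed = Counter(tuple(sorted(g)) for g in locus_data)
    expected = hw_expected_counts(locus_data)
    return sum((observed.get(g, 0) - e) ** 2 / e for g, e in expected.items())
```

For example, 10 A/A, 20 A/B and 10 B/B individuals match the expected counts exactly, so the statistic is zero; 20 A/A and 20 B/B individuals with no heterozygotes give a statistic of 40.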

serializeTo(stream, allelelump=0)#

Serialize output to specified XML stream.

Parameters:
  • stream (XMLOutputStream) – write to specified XML stream (generally a file)

  • allelelump (int) – record the allele lumping value

serializeXMLTableTo(stream)#

Serialize the genotype table.

Parameters:
  • stream (XMLOutputStream) – XML stream

class HardyWeinbergGuoThompson(locusData=None, alleleCount=None, runMCMCTest=0, runPlainMCTest=0, dememorizationSteps=2000, samplingNum=1000, samplingSize=1000, maxMatrixSize=250, monteCarloSteps=1000000, testing=False, **kw)#

Bases: HardyWeinberg

Use Guo & Thompson (1992) algorithm for calculating statistics.

This Python class wraps the Guo & Thompson program gthwe. In addition to the arguments of the base class, it accepts the following keywords:

Parameters:
  • locusData (list) – list of tuples of genotype (allele1, allele2)

  • alleleCount (tuple) – a tuple consisting of a dictionary of counts, total count and number of untyped individuals as returned by PyPop.DataTypes.Genotypes.getLocusDataAt()

  • runMCMCTest (int) – if enabled (1), run the Markov chain Monte Carlo (MCMC) version of the test, which is what is normally referred to as “Guo & Thompson” (default: 0, disabled)

  • runPlainMCTest (int) – if enabled (1), run a plain Monte Carlo randomization version of the test, without the Markov chain; this is also described in the original Guo & Thompson Biometrics paper, but was not in their original program (default: 0, disabled)

  • dememorizationSteps (int) – number of “dememorization” initial steps for random number generator (default 2000).

  • samplingNum (int) – the number of chunks for random number generator (default 1000).

  • samplingSize (int) – size of each chunk (default 1000).

  • maxMatrixSize (int) – maximum size of the “flattened” lower-triangular matrix of observed alleles (default 250).

  • monteCarloSteps (int) – number of steps for the plain Monte Carlo randomization test, without the Markov chain (default 1000000).

  • testing (bool) – testing mode, default False
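
To make the plain Monte Carlo option more concrete, here is a minimal sketch of a randomization test of this kind. It is an illustration under stated assumptions, not the gthwe code that this class wraps: the name plain_mc_hw_pvalue and its steps and seed arguments are hypothetical, and dememorization and chunked sampling are not modelled. Alleles are shuffled and re-paired at random, and the p-value is estimated as the fraction of shuffled tables that are no more probable than the observed one.

```python
import math
import random
from collections import Counter


def _log_table_weight(genotypes):
    """Log of the part of the conditional table probability that varies.

    Given fixed allele counts, P(table) is proportional to
    2**H / prod(n_ij!), where H is the number of heterozygotes.
    """
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    het = sum(c for (a, b), c in counts.items() if a != b)
    return het * math.log(2) - sum(math.lgamma(c + 1) for c in counts.values())


def plain_mc_hw_pvalue(locus_data, steps=10000, seed=0):
    """Estimate an exact-test p-value by randomly re-pairing alleles."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    alleles = [a for genotype in locus_data for a in genotype]
    observed = _log_table_weight(locus_data)

    hits = 0
    for _ in range(steps):
        rng.shuffle(alleles)
        sample = list(zip(alleles[0::2], alleles[1::2]))
        # Small tolerance guards against floating-point ties
        if _log_table_weight(sample) <= observed + 1e-9:
            hits += 1
    return hits / steps
```

The MCMC version differs in how tables are generated: instead of drawing independent shuffles, it takes small steps from one table to the next, which is why it needs the dememorization steps listed above.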

generateFlattenedMatrix()#

Generate a flattened version of the genotype matrix.
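
One common way to flatten a lower-triangular genotype matrix is row by row, so that the cell for alleles i and j (with i >= j) lands at index i*(i+1)/2 + j. The sketch below assumes that layout purely for illustration; the ordering actually used by this method is not documented here, and the function name flatten_lower_triangle is hypothetical.

```python
def flatten_lower_triangle(locus_data):
    """Count genotypes into a row-major flattened lower-triangular matrix.

    Returns the sorted allele list and the flat list of counts. The layout
    (row-major, diagonal included) is an assumption made for this sketch.
    """
    alleles = sorted({a for genotype in locus_data for a in genotype})
    index = {a: i for i, a in enumerate(alleles)}
    k = len(alleles)

    flat = [0] * (k * (k + 1) // 2)
    for a, b in locus_data:
        # Order the pair so that i >= j (lower triangle)
        i, j = sorted((index[a], index[b]), reverse=True)
        flat[i * (i + 1) // 2 + j] += 1
    return alleles, flat
```

Under this layout a locus with k alleles needs k*(k+1)/2 cells.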

dumpTable(locusName, stream, allelelump=0)#

Output table to stream.

Parameters:
  • locusName (str) – name of the locus whose table is output

  • stream (XMLOutputStream) – XML stream to write to

  • allelelump (int) – record allele lumping level (default 0)

Returns:

if an empty tag

Return type:

None

class HardyWeinbergEnumeration(locusData=None, alleleCount=None, doOverall=0, **kw)#

Bases: HardyWeinbergGuoThompson

HW testing with Maldonado Torres’ exact enumeration test.

Warning

This requires the Enumeration C code to be compiled as a module using SWIG. By default this is currently disabled.

Parameters:
  • locusData (list) – list of tuples of genotype (allele1, allele2)

  • alleleCount (tuple) – a tuple consisting of a dictionary of counts, total count and number of untyped individuals as returned by PyPop.DataTypes.Genotypes.getLocusDataAt()

  • doOverall (int) – if set to true (1), run the overall p-value test (default: 0, false)

serializeTo(stream, allelelump=0)#

Serialize enumeration test output to stream.

Parameters:
  • stream (XMLOutputStream) – XML stream to use

  • allelelump (int) – record allele lumping level (default 0)

class HardyWeinbergGuoThompsonArlequin(matrix=None, locusName=None, arlequinExec='arlecore.exe', markovChainStepsHW=100000, markovChainDememorisationStepsHW=1000, untypedAllele='****')#

Arlequin implementation of the Guo & Thompson algorithm.

Deprecated since version 1.0.0.

This class extracts the Hardy-Weinberg (HW) statistics using the Arlequin implementation of the HW exact test, in the following steps:

  1. a subdirectory arlequinRuns is created, in which all the Arlequin-specific files are generated;

  2. the specified Arlequin executable is run, generating the Arlequin output HTML files (*.htm);

  3. the Arlequin output is parsed for the relevant statistics;

  4. lastly, the arlequinRuns directory is removed.

Because the directory name arlequinRuns is currently hardcoded, this class cannot be invoked concurrently.

Parameters:
  • matrix (StringMatrix) – matrix to extract locus from

  • locusName (str) – locus to use

  • arlequinExec (str) – name of the Arlequin executable (default: 'arlecore.exe')

  • markovChainStepsHW (int) – number of steps to use in Markov chain (default: 100000).

  • markovChainDememorisationStepsHW (int) – “Burn-in” time for Markov chain (default: 1000).

  • untypedAllele (str) – untyped allele identifier (default: '****')

serializeTo(stream)#

Serialize output to stream.

Parameters:
  • stream (XMLOutputStream) – stream to serialize to

pval(chisq, dof)#

Calculate p-value.

Parameters:
  • chisq (float) – Chi-square value

  • dof (int) – degrees of freedom

Returns:

p-value

Return type:

float
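
For reference, the p-value of a chi-square statistic is the upper-tail probability of the chi-square distribution with the given degrees of freedom. The sketch below computes it using only the standard library; it is not PyPop's internal implementation, and the name chi2_pvalue is hypothetical. It relies on the recurrence Q(a + 1, z) = Q(a, z) + z^a e^(-z) / Γ(a + 1) for the regularized upper incomplete gamma function, starting from Q(1/2, z) = erfc(√z) for odd degrees of freedom or Q(1, z) = e^(-z) for even ones.

```python
import math


def chi2_pvalue(chisq, dof):
    """Upper-tail probability of a chi-square statistic (dof >= 1)."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if chisq <= 0:
        return 1.0

    z = chisq / 2.0
    if dof % 2:
        # Odd dof: start from Q(1/2, z) = erfc(sqrt(z))
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        # Even dof: start from Q(1, z) = exp(-z)
        a, q = 1.0, math.exp(-z)

    # Step a up to dof / 2 using Q(a + 1, z) = Q(a, z) + z**a * e**-z / Gamma(a + 1)
    while a < dof / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q
```

With SciPy installed, scipy.stats.chi2.sf(chisq, dof) computes the same quantity.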