PyPop.hardyweinberg#
Computing Hardy-Weinberg statistics on genotype data.
Attributes#
use_scipy | If True, use scipy to compute the p-value, rather than the internal pval. |
Classes#
HardyWeinberg | Calculate Hardy-Weinberg statistics for a single locus. |
HardyWeinbergGuoThompson | Use Guo & Thompson (1992) algorithm for calculating statistics. |
HardyWeinbergEnumeration | HW testing with Maldonado Torres' exact enumeration test. |
HardyWeinbergGuoThompsonArlequin | Arlequin implementation of the Guo & Thompson algorithm. |
Functions#
|
Calculate p-value. |
Module Contents#
- use_scipy = False#
If True, use scipy to compute the p-value, rather than the internal pval.
- class HardyWeinberg(locusData=None, alleleCount=None, lumpBelow=5, flagChenTest=0)#
Calculate Hardy-Weinberg statistics for a single locus.
Given the observed genotypes for a locus, calculate the expected count of each genotype under Hardy-Weinberg proportions, and test for fit.
- Parameters:
locusData (list) – list of tuples of genotype (allele1, allele2)
alleleCount (tuple) – a tuple consisting of a dictionary of counts, total count and number of untyped individuals as returned by PyPop.DataTypes.Genotypes.getLocusDataAt()
lumpBelow (int, optional) – lump alleles with frequency less than this threshold as if they were in same class (Default: 5)
flagChenTest (int, optional) – if enabled (1) do Chen’s chi-square-based “corrected” p-value (Default: 0, disabled)
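The calculation described above can be illustrated with a minimal standalone sketch. It does not call PyPop; the helper names (hw_expected_counts, chi_square, chi2_pvalue_1df) are hypothetical, and the p-value shortcut assumes a two-allele locus, where the chi-square test has one degree of freedom.

```python
from collections import Counter
from math import erfc, sqrt


def hw_expected_counts(genotypes):
    """Return (observed, expected) genotype counts under Hardy-Weinberg proportions."""
    n = len(genotypes)
    # Count each allele across both chromosomes of every individual.
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    # Treat (a, b) and (b, a) as the same unordered genotype.
    observed = Counter(tuple(sorted(g)) for g in genotypes)

    expected = {}
    alleles = sorted(freqs)
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            p = freqs[a] * freqs[b]
            # Homozygotes: n * p_a^2; heterozygotes: n * 2 * p_a * p_b.
            expected[(a, b)] = n * (p if a == b else 2 * p)
    return observed, expected


def chi_square(observed, expected):
    """Pearson chi-square statistic over all genotype classes."""
    return sum((observed.get(g, 0) - e) ** 2 / e for g, e in expected.items())


def chi2_pvalue_1df(x):
    """Upper-tail chi-square p-value, valid for one degree of freedom only."""
    return erfc(sqrt(x / 2))


# Illustrative data: 30 AA, 40 AB and 30 BB individuals.
genotypes = [("A", "A")] * 30 + [("A", "B")] * 40 + [("B", "B")] * 30
observed, expected = hw_expected_counts(genotypes)
statistic = chi_square(observed, expected)
p_value = chi2_pvalue_1df(statistic)
```

With these illustrative counts both allele frequencies are 0.5, so the expected counts are 25, 50 and 25. For loci with more than two alleles the degrees of freedom differ, and a general chi-square routine would be needed in place of chi2_pvalue_1df.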
- serializeTo(stream, allelelump=0)#
Serialize output to specified XML stream.
- Parameters:
stream (XMLOutputStream) – write to specified XML stream (generally a file)
allelelump (int) – record the allele lumping value
- serializeXMLTableTo(stream)#
Serialize the genotype table.
- Parameters:
stream (XMLOutputStream) – XML stream
- class HardyWeinbergGuoThompson(locusData=None, alleleCount=None, runMCMCTest=0, runPlainMCTest=0, dememorizationSteps=2000, samplingNum=1000, samplingSize=1000, maxMatrixSize=250, monteCarloSteps=1000000, testing=False, **kw)#
Bases: HardyWeinberg
Use Guo & Thompson (1992) algorithm for calculating statistics.
This Python class wraps the functionality of the Guo & Thompson program gthwe. In addition to the arguments for the base class, this class accepts the following additional keywords:
- Parameters:
locusData (list) – list of tuples of genotype (allele1, allele2)
alleleCount (tuple) – a tuple consisting of a dictionary of counts, total count and number of untyped individuals as returned by PyPop.DataTypes.Genotypes.getLocusDataAt()
runMCMCTest (int) – if enabled (1), run the Monte Carlo-Markov chain (MCMC) version of the test (what is normally referred to as “Guo & Thompson”) (default 0, disabled)
runPlainMCTest (int) – if enabled (1), run a plain Monte Carlo/randomization test without the Markov chain (this is also described in the original Guo & Thompson Biometrics paper, but was not in their original program)
dememorizationSteps (int) – number of initial “dememorization” steps for the random number generator (default 2000)
samplingNum (int) – the number of chunks for the random number generator (default 1000)
samplingSize (int) – size of each chunk (default 1000)
maxMatrixSize (int) – maximum size of the “flattened” lower-triangular matrix of observed alleles (default 250)
monteCarloSteps (int) – number of steps for the plain Monte Carlo randomization test (without Markov chain)
testing (bool) – testing mode (default False)
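To illustrate what a plain Monte Carlo randomization test (the idea behind runPlainMCTest and monteCarloSteps) does, here is a minimal pure-Python sketch. It is not the gthwe code that this class wraps. The function names are hypothetical, and the steps argument only plays a role similar to monteCarloSteps. Each step shuffles the pooled alleles, pairs them into genotypes, and checks whether the resulting table is no more probable than the observed one.

```python
import random
from collections import Counter
from math import lgamma, log


def log_prob_stat(genotypes):
    """Log of the part of the conditional HW probability that varies between tables."""
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    n_het = sum(c for (a, b), c in counts.items() if a != b)
    # With allele counts fixed, P(table) is proportional to
    # 2**(number of heterozygotes) / product(genotype_count!).
    return n_het * log(2) - sum(lgamma(c + 1) for c in counts.values())


def plain_mc_hw_pvalue(genotypes, steps=10000, seed=None):
    """Estimate the exact-test p-value by randomly re-pairing alleles."""
    rng = random.Random(seed)
    alleles = [a for g in genotypes for a in g]
    observed = log_prob_stat(genotypes)
    hits = 0
    for _ in range(steps):
        rng.shuffle(alleles)
        # Pair consecutive alleles to form a random sample of genotypes.
        sample = zip(alleles[0::2], alleles[1::2])
        # Small tolerance so tables tied with the observed one are counted.
        if log_prob_stat(sample) <= observed + 1e-9:
            hits += 1
    return hits / steps


# Illustrative data: no heterozygotes at all versus a balanced sample.
extreme = [("A", "A")] * 20 + [("B", "B")] * 20
balanced = [("A", "A")] * 10 + [("A", "B")] * 20 + [("B", "B")] * 10
p_extreme = plain_mc_hw_pvalue(extreme, steps=2000, seed=1)
p_balanced = plain_mc_hw_pvalue(balanced, steps=2000, seed=1)
```

The sample with no heterozygotes should give a p-value near zero, while the balanced sample should give a large one. The MCMC variant explores the same space of tables through a Markov chain instead of independent shuffles, which is why it needs the dememorization (burn-in) steps.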
- generateFlattenedMatrix()#
Generate a flattened version of the genotype matrix.
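As a rough illustration of what a flattened lower-triangular genotype matrix looks like, the sketch below lays out genotype counts row by row (row i holds the counts of allele i paired with alleles 0 to i). The function name is hypothetical and the row-by-row ordering is an assumption; the ordering PyPop actually passes to gthwe may differ.

```python
from collections import Counter


def flatten_genotype_matrix(genotypes):
    """Return (alleles, flat), where flat lists lower-triangular counts row by row."""
    alleles = sorted({a for g in genotypes for a in g})
    index = {a: i for i, a in enumerate(alleles)}
    counts = Counter()
    for a, b in genotypes:
        # Store each genotype in the lower triangle: row >= column.
        i, j = sorted((index[a], index[b]), reverse=True)
        counts[(i, j)] += 1
    flat = [counts[(i, j)] for i in range(len(alleles)) for j in range(i + 1)]
    return alleles, flat


# Illustrative data with three alleles; ("B", "A") counts as an A/B genotype.
genotypes = (
    [("A", "A")] * 3
    + [("A", "B"), ("B", "A")]
    + [("B", "B")]
    + [("A", "C")]
)
alleles, flat = flatten_genotype_matrix(genotypes)
```

For k alleles the flattened list has k(k+1)/2 entries, which is the quantity a limit such as maxMatrixSize would constrain.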
- dumpTable(locusName, stream, allelelump=0)#
Output table to stream.
- Parameters:
locusName (str) – locus to output table
stream (XMLOutputStream) – name of XML stream
allelelump (int) – record allele lumping level (default 0)
- Returns:
if an empty tag
- Return type:
None
- class HardyWeinbergEnumeration(locusData=None, alleleCount=None, doOverall=0, **kw)#
Bases: HardyWeinbergGuoThompson
HW testing with Maldonado Torres’ exact enumeration test.
Warning
This requires the Enumeration C code to be compiled as a module using SWIG. By default this is currently disabled.
- Parameters:
locusData (list) – list of tuples of genotype (allele1, allele2)
alleleCount (tuple) – a tuple consisting of a dictionary of counts, total count and number of untyped individuals as returned by PyPop.DataTypes.Genotypes.getLocusDataAt()
doOverall (int) – if set to true (1), run the overall p-value test (default 0, false)
- serializeTo(stream, allelelump=0)#
Serialize enumeration test output to stream.
- Parameters:
stream (XMLOutputStream) – XML stream to use
allelelump (int) – record allele lumping level (default 0)
- class HardyWeinbergGuoThompsonArlequin(matrix=None, locusName=None, arlequinExec='arlecore.exe', markovChainStepsHW=100000, markovChainDememorisationStepsHW=1000, untypedAllele='****')#
Arlequin implementation of the Guo & Thompson algorithm.
Deprecated since version 1.0.0.
This class extracts the Hardy-Weinberg (HW) statistics using the Arlequin implementation of the HW exact test, as follows:
1. creates a subdirectory arlequinRuns in which all the Arlequin-specific files are generated;
2. runs the specified Arlequin executable, generating the Arlequin output HTML files (*.htm);
3. parses the Arlequin output for the relevant statistics;
4. lastly, removes the arlequinRuns directory.
Since the directory name arlequinRuns is currently hardcoded, this class cannot be invoked concurrently.
- Parameters:
matrix (StringMatrix) – matrix to extract locus from
locusName (str) – locus to use
arlequinExec (str) – name of Arlequin executable
markovChainStepsHW (int) – number of steps to use in Markov chain (default: 100000)
markovChainDememorisationStepsHW (int) – “burn-in” time for Markov chain (default: 1000)
untypedAllele (str) – untyped allele identifier
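The create, run, parse and clean-up sequence described for this class can be sketched generically as below. This is not the class's implementation: run_in_scratch_dir, the result.txt file name and the stand-in command are all hypothetical, and the sketch deliberately uses a uniquely named temporary directory rather than the fixed arlequinRuns name, to show one way the concurrency limitation could be avoided.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_scratch_dir(command, parse_output, output_name="result.txt"):
    """Run a command in a fresh directory, parse its output file, then clean up."""
    # A unique directory per call, removed automatically when the block exits.
    with tempfile.TemporaryDirectory(prefix="arlequinRuns-") as scratch:
        subprocess.run(command, cwd=scratch, check=True)
        text = (Path(scratch) / output_name).read_text()
        return parse_output(text)


# Stand-in for an external executable: a Python one-liner that writes a result file.
fake_command = [
    sys.executable,
    "-c",
    "from pathlib import Path; Path('result.txt').write_text('p-value: 0.25')",
]
p_value = run_in_scratch_dir(
    fake_command,
    parse_output=lambda text: float(text.split(":")[1]),
)
```

Because each call gets its own directory, two runs started at the same time would not overwrite each other's files.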
- serializeTo(stream)#
Serialize output to stream.
- Parameters:
stream (XMLOutputStream) – stream to serialize to