PyPop.haplo
Module for estimating haplotypes and linkage disequilibrium measures.
Currently there are two implementations: Emhaplofreq and
Haplostats.
Classes
| Haplo | Estimating haplotypes given genotype data. |
| Emhaplofreq | Haplotype and linkage disequilibrium (LD) estimation via emhaplofreq. |
| Haplostats | Haplotype and LD estimation implemented via haplo.stats. |
| HaploArlequin | Performs haplotype estimation via Arlequin. |
Module Contents
- class Haplo
Estimating haplotypes given genotype data.
This is an abstract stub class (it currently has no methods).
- class Emhaplofreq(locusData, untypedAllele='****', stream=None, testMode=False)
Bases: Haplo
Haplotype and linkage disequilibrium (LD) estimation via emhaplofreq.
This is essentially a wrapper around a Python extension built on top of the emhaplofreq command-line program. It will refuse to estimate haplotypes longer than the maximum length defined by emhaplofreq.
- Parameters:
locusData (StringMatrix) – a StringMatrix
untypedAllele (str) – defaults to ****
stream (TextOutputStream) – output file
testMode (bool) – default is False
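The program name emhaplofreq refers to expectation-maximization (EM) estimation of haplotype frequencies. To illustrate the idea, here is a minimal two-locus EM in pure Python. It is not emhaplofreq's code, which handles more loci and its own convergence control. The functions `phase_pairs` and `em_two_locus` are hypothetical, and so is the genotype layout, where each individual is a pair of unphased two-allele genotypes.

```python
from collections import defaultdict


def phase_pairs(g1, g2):
    """Return the distinct haplotype pairs consistent with two unphased genotypes."""
    (a1, a2), (b1, b2) = g1, g2
    # Sorting each pair lets identical phasings (e.g. at a homozygous locus) collapse.
    pairs = {
        tuple(sorted([(a1, b1), (a2, b2)])),
        tuple(sorted([(a1, b2), (a2, b1)])),
    }
    return list(pairs)


def em_two_locus(genotypes, n_iter=100, tol=1e-10):
    """Estimate two-locus haplotype frequencies with a basic EM loop."""
    candidates = [phase_pairs(g1, g2) for g1, g2 in genotypes]
    haplos = {h for pairs in candidates for pair in pairs for h in pair}
    freqs = {h: 1.0 / len(haplos) for h in haplos}  # uniform starting point
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: weight each phasing by its probability under current freqs.
            weights = []
            for h1, h2 in pairs:
                w = freqs[h1] * freqs[h2]
                if h1 != h2:
                    w *= 2  # heterozygous pair can arise in two orders
                weights.append(w)
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new = {h: counts[h] / n_chrom for h in haplos}
        delta = max(abs(new[h] - freqs[h]) for h in haplos)
        freqs = new
        if delta < tol:
            break
    return freqs


genotypes = [
    (("A", "A"), ("B", "B")),
    (("A", "A"), ("B", "B")),
    (("a", "a"), ("b", "b")),
    (("A", "a"), ("B", "b")),  # double heterozygote: phase is ambiguous
]
freqs = em_two_locus(genotypes)
```

With these toy genotypes, the double heterozygote is resolved toward the A-B/a-b phasing. A-B approaches 0.5, a-b approaches 0.3, and A-b and a-B shrink toward zero.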
- serializeStart()
Serialize the start of XML output to the currently defined XML stream.
See also
Must be paired with a subsequent Emhaplofreq.serializeEnd().
- serializeEnd()
Serialize the end of XML output to the currently defined XML stream.
See also
Must be paired with a previous Emhaplofreq.serializeStart().
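serializeStart() and serializeEnd() must be called in pairs. A context manager can guarantee that pairing even when estimation raises an exception. The `xml_section` helper and the `RecordingStub` class below are hypothetical and not part of PyPop. With a real estimator, you would pass an Emhaplofreq (or Haplostats) instance instead of the stub.

```python
from contextlib import contextmanager


@contextmanager
def xml_section(estimator):
    """Hypothetical helper: always pair serializeStart() with serializeEnd()."""
    estimator.serializeStart()
    try:
        yield estimator
    finally:
        # Runs even if the body raises, so the XML element is always closed.
        estimator.serializeEnd()


class RecordingStub:
    """Stand-in for an estimator that just records which methods were called."""

    def __init__(self):
        self.calls = []

    def serializeStart(self):
        self.calls.append("start")

    def serializeEnd(self):
        self.calls.append("end")


stub = RecordingStub()
with xml_section(stub) as est:
    # With a real Emhaplofreq instance, estHaplotypes(...) would go here.
    est.calls.append("estimate")
print(stub.calls)  # ['start', 'estimate', 'end']
```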
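- estHaplotypes(locusKeys=None, numInitCond=None)
Estimate haplotypes for the loci listed in locusKeys.
- Parameters:
locusKeys (str) – groups of loci to estimate; groups are separated by commas, and loci within a group by colons (see the example below)
numInitCond (int) – number of initial conditions (default None)
Example
*DQA1:*DPB1,*DRB1:*DQB1 means: estimate haplotypes for the DQA1 and DPB1 loci, then estimate haplotypes for the DRB1 and DQB1 loci.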
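The locusKeys string encodes a list of locus groups. This short sketch shows how such a string splits into groups, assuming only the separators described above: commas between groups, colons within a group. `parse_locus_keys` is a hypothetical helper, not a PyPop function, and it keeps locus names exactly as written, including any leading `*`.

```python
def parse_locus_keys(locus_keys):
    """Split a locusKeys string into groups of locus names.

    Groups are separated by ',' and loci within a group by ':'.
    Names are kept as written; surrounding whitespace is stripped.
    """
    return [
        [name.strip() for name in group.split(":")]
        for group in locus_keys.split(",")
        if group.strip()
    ]


groups = parse_locus_keys("*DQA1:*DPB1,*DRB1:*DQB1")
print(groups)  # [['*DQA1', '*DPB1'], ['*DRB1', '*DQB1']]
```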
- estLinkageDisequilibrium(locusKeys=None, permutationPrintFlag=0, numInitCond=None, numPermutations=None, numPermuInitCond=None)
Estimate linkage disequilibrium (LD) for the listed loci.
- Parameters:
locusKeys (str) – see estHaplotypes()
permutationPrintFlag (int) – print all permutations (default 0)
numInitCond (int) – number of initial conditions (default None)
numPermutations (int) – number of permutations (default None)
numPermuInitCond (int) – number of initial conditions for each permutation (default None)
Example
See estHaplotypes() for an example; the same locusKeys string passed here estimates LD for each listed group of loci.
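For a single pair of biallelic loci, the standard LD measures follow from one haplotype frequency and the two allele frequencies:

- D = p_AB - p_A p_B
- D' = D / D_max, where D_max = min(p_A(1 - p_B), (1 - p_A) p_B) when D >= 0, and min(p_A p_B, (1 - p_A)(1 - p_B)) otherwise
- r² = D² / (p_A(1 - p_A) p_B(1 - p_B))

This is a textbook sketch, not emhaplofreq's code. The multi-allelic summaries PyPop reports may be computed differently. `biallelic_ld` is a hypothetical helper.

```python
def biallelic_ld(p_ab, p_a, p_b):
    """Return (D, D', r^2) for alleles A and B at two biallelic loci.

    p_ab: frequency of haplotype A-B; p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b
    # Lewontin's normalization: the bound on D depends on its sign.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Complete association: only A-B and a-b haplotypes, each at 0.5.
print(biallelic_ld(0.5, 0.5, 0.5))  # (0.25, 1.0, 1.0)
```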
- allPairwise(permutationPrintFlag=0, numInitCond=None, numPermutations=None, numPermuInitCond=None, haploSuppressFlag=None, haplosToShow=None, mode=None)
Estimate pairwise statistics for a given set of loci.
Depending on the flags passed, this can estimate both LD (linkage disequilibrium) and HF (haplotype frequencies); an optional permutation test on LD can also be run.
- Parameters:
permutationPrintFlag (int) – sets whether the results of the permutation run are included in the output XML. Default: 0 (disabled).
numInitCond (int) – sets the number of initial conditions before performing the permutation test. Default: None.
numPermutations (int) – sets the number of permutations to perform. Default: None.
numPermuInitCond (int) – sets the number of initial conditions tried per permutation. Default: None.
haploSuppressFlag (int) – sets whether haplotype information is generated in the output. Default: None.
haplosToShow (list) – list of haplotypes to show in the output.
mode (str) – mode for haplotype output.
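allPairwise() works over pairs of loci. For n loci there are n(n - 1)/2 unordered pairs. The sketch below writes those pairs in the locusKeys notation used by estHaplotypes(). `pairwise_locus_keys` is a hypothetical helper, and whether allPairwise() visits pairs in this order is an assumption.

```python
from itertools import combinations


def pairwise_locus_keys(loci):
    """Build a locusKeys-style string listing every unordered pair of loci."""
    # combinations() yields pairs in input order, each pair exactly once.
    return ",".join(f"{a}:{b}" for a, b in combinations(loci, 2))


print(pairwise_locus_keys(["A", "B", "C"]))  # A:B,A:C,B:C
```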
- class Haplostats(locusData, untypedAllele='****', stream=None, testMode=False)
Bases: Haplo
Haplotype and LD estimation implemented via haplo.stats.
This is a wrapper around a portion of the haplo.stats R package.
- Parameters:
locusData (StringMatrix) – a StringMatrix
untypedAllele (str) – defaults to ****
stream (TextOutputStream) – output file
testMode (bool) – default is False
- serializeStart()
Serialize the start of XML output to the currently defined XML stream.
See also
Must be paired with a subsequent Haplostats.serializeEnd().
- serializeEnd()
Serialize the end of XML output to the currently defined XML stream.
See also
Must be paired with a previous Haplostats.serializeStart().
- estHaplotypes(locusKeys=None, weight=None, control=None, numInitCond=10, testMode=False)
Estimate haplotypes for the loci listed in locusKeys.
If locusKeys is None, the entire matrix is used. LD is also estimated if locusKeys consists of only two loci.
Warning
FIXME: this does not yet remove missing data before haplotype estimation.
- Parameters:
locusKeys (str) – see Emhaplofreq.estHaplotypes() for the format
weight (list) – set weights (default None, which sets all weights equal)
control (dict) – a dictionary of control parameters
numInitCond (int) – number of initial conditions (default 10)
testMode (bool) – run in test mode (default False)
- Returns:
multiple statistics
- Return type:
- allPairwise(weight=None, control=None, numInitCond=10)
Estimate pairwise statistics for all pairs of loci.
- Parameters:
weight (list) – see Haplostats.estHaplotypes()
control (dict) – see Haplostats.estHaplotypes()
numInitCond (int) – see Haplostats.estHaplotypes()
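The Warning on Haplostats.estHaplotypes() notes that missing data is not yet removed before estimation. One simple workaround is listwise deletion: drop every individual with an untyped allele at any locus before estimating. `drop_untyped` is hypothetical, and so is the row layout, which uses one flat list of allele strings per individual. The `'****'` default mirrors the untypedAllele default documented above. Listwise deletion discards information and is not necessarily how a fix in PyPop would handle missing data.

```python
def drop_untyped(rows, untyped_allele="****"):
    """Keep only individuals with no untyped allele at any locus.

    rows: list of per-individual lists of allele strings (hypothetical layout).
    """
    return [row for row in rows if untyped_allele not in row]


rows = [
    ["01", "02", "03", "04"],
    ["01", "****", "03", "04"],  # untyped at one locus: dropped
    ["05", "06", "07", "08"],
]
print(drop_untyped(rows))  # [['01', '02', '03', '04'], ['05', '06', '07', '08']]
```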
- class HaploArlequin(arpFilename, idCol, prefixCols, suffixCols, windowSize, mapOrder=None, untypedAllele='0', arlequinPrefix='arl_run')
Bases: Haplo
Performs haplotype estimation via Arlequin.
Deprecated since version 1.0.0.
Outputs Arlequin-format data files and run-time information, then runs Arlequin and parses its output so the results are available programmatically to the rest of the Python framework.
Delegates all calls to Arlequin to an internally instantiated ArlequinBatch Python object called ‘batch’.
- Parameters:
arpFilename (str) – Arlequin filename (must have a .arp file extension)
idCol (str) – column in the input file that contains the individual id
prefixCols (int) – number of columns to ignore before the allele data starts
suffixCols (int) – number of columns to ignore after the allele data stops
windowSize (int) – size of the sliding window
mapOrder (list) – order of columns, if different from the column order in the file (defaults to the order in the file)
untypedAllele (str) – defaults to 0
arlequinPrefix (str) – prefix for all Arlequin run-time files (defaults to arl_run)
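windowSize sets the size of the sliding window over loci. The sketch below assumes the window advances one locus at a time; the step size is not documented here. `sliding_windows` is a hypothetical helper, not a HaploArlequin method.

```python
def sliding_windows(loci, window_size):
    """Return each run of window_size consecutive loci, stepping by one locus."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # A window larger than the locus list yields no windows at all.
    return [loci[i:i + window_size] for i in range(len(loci) - window_size + 1)]


print(sliding_windows(["A", "B", "C", "D"], 2))  # [['A', 'B'], ['B', 'C'], ['C', 'D']]
```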
- outputArlequin(data)
Outputs the specified .arp sample file.
- Parameters:
data (list) – list of strings containing the .arp sample file
- runArlequin()
Run the Arlequin haplotyping program.
Generates the expected .txt set-up files for Arlequin, then forks a copy of arlecore.exe (which must be on PATH) to generate the haplotype estimates from the generated .arp file.
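runArlequin() requires arlecore.exe to be on PATH. Here is a minimal sketch of that lookup using only the standard library: `shutil.which` resolves the executable, and `subprocess.run` launches it. This is not how HaploArlequin is implemented. `run_arlecore` is hypothetical, and the command-line arguments arlecore.exe expects are not documented here, so `args` is left to the caller.

```python
import shutil
import subprocess


def run_arlecore(args, executable="arlecore.exe"):
    """Locate executable on PATH and run it; raise if it cannot be found."""
    path = shutil.which(executable)
    if path is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    # check=True raises CalledProcessError if the program exits non-zero.
    return subprocess.run([path, *args], check=True)
```

Checking PATH up front turns a missing installation into a clear error message instead of a failure deep inside the parsing step.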
- genHaplotypes()
Parses Arlequin output to retrieve estimated haplotypes.
- Returns:
a list of the sliding windows, each of which is a tuple consisting of:
freqs (dict): dictionary of haplotype-frequency key-value pairs
popName (str): population name (original .arp file prefix)
sampleCount (int): sample count (number of samples for that window)
lociList (list): ordered list of loci considered
- Return type:
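Because genHaplotypes() returns plain tuples, each window can be unpacked directly. The sketch below picks the most frequent haplotype in each window. `most_frequent_per_window` is hypothetical. The toy input only mimics the documented tuple layout; the real haplotype key format is not documented here.

```python
def most_frequent_per_window(windows):
    """Summarize each (freqs, popName, sampleCount, lociList) window tuple.

    Returns (popName, sampleCount, loci, top_haplotype, top_frequency) per window,
    skipping windows with no haplotypes.
    """
    summary = []
    for freqs, pop_name, sample_count, loci in windows:
        if not freqs:
            continue
        haplotype, freq = max(freqs.items(), key=lambda item: item[1])
        summary.append((pop_name, sample_count, tuple(loci), haplotype, freq))
    return summary


# Toy values shaped like one genHaplotypes() window (not real output).
windows = [({"01:02": 0.6, "01:03": 0.4}, "pop1", 50, ["A", "B"])]
print(most_frequent_per_window(windows))  # [('pop1', 50, ('A', 'B'), '01:02', 0.6)]
```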