PyPop.Haplo#
Module for estimating haplotypes and linkage disequilibrium measures.
Currently there are two implementations: Emhaplofreq and Haplostats.
Classes#
Haplo | Estimating haplotypes given genotype data.
Emhaplofreq | Haplotype and linkage disequilibrium (LD) estimation via emhaplofreq.
Haplostats | Haplotype and LD estimation implemented via haplo.stats.
HaploArlequin | Performs haplotype estimation via Arlequin.
Module Contents#
- class Haplo#
Estimating haplotypes given genotype data.
This is an abstract stub class (it currently has no methods).
- class Emhaplofreq(locusData, debug=0, untypedAllele='****', stream=None, testMode=False)#
Bases: Haplo
Haplotype and linkage disequilibrium (LD) estimation via emhaplofreq.
This is essentially a wrapper to a Python extension built on top of the emhaplofreq command-line program. Will refuse to estimate haplotypes longer than that defined by emhaplofreq.
- Parameters:
locusData (StringMatrix) – a StringMatrix
debug (int) – defaults to 0 (off)
untypedAllele (str) – defaults to ****
stream (TextOutputStream) – output file
testMode (bool) – default is False
- serializeStart()#
Serialize start of XML output to the currently defined XML stream.
See also
Must be paired with a subsequent Emhaplofreq.serializeEnd().
- serializeEnd()#
Serialize end of XML output to the currently defined XML stream.
See also
Must be paired with a previous Emhaplofreq.serializeStart().
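Because serializeStart() and serializeEnd() must be paired, one way to guarantee the pairing is a small context manager. The sketch below is not part of PyPop: serialized and the stand-in RecordingEstimator class are hypothetical names, used only so the example runs without the compiled extension. Only the method names serializeStart(), serializeEnd() and estHaplotypes() come from this page.

```python
from contextlib import contextmanager


@contextmanager
def serialized(estimator):
    """Hypothetical helper: pair serializeStart() with serializeEnd()."""
    estimator.serializeStart()
    try:
        yield estimator
    finally:
        # Runs even if estimation raises, so the XML block is always closed.
        estimator.serializeEnd()


class RecordingEstimator:
    """Stand-in for Emhaplofreq that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def serializeStart(self):
        self.calls.append("serializeStart")

    def serializeEnd(self):
        self.calls.append("serializeEnd")

    def estHaplotypes(self, locusKeys=None):
        self.calls.append(("estHaplotypes", locusKeys))


est = RecordingEstimator()
with serialized(est) as e:
    e.estHaplotypes(locusKeys="*DQA1:*DPB1")
print(est.calls)
```

The same wrapper should apply to any object exposing the two serialize methods, such as Haplostats, assuming its methods behave as documented here.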
- estHaplotypes(locusKeys=None, numInitCond=None)#
Estimate haplotypes for listed loci in locusKeys.
- Parameters:
Example
*DQA1:*DPB1,*DRB1:*DQB1 means to estimate haplotypes for the DQA1 and DPB1 loci, followed by estimation of haplotypes for the DRB1 and DQB1 loci.
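To make the locusKeys format concrete, the following sketch splits such a string into its locus groups. It is an illustration only: parse_locus_keys is a hypothetical helper, not a PyPop function, and stripping the leading * from each name is an assumption made for readability.

```python
def parse_locus_keys(locus_keys):
    """Split a locusKeys string into a list of locus groups.

    Commas separate groups; colons separate loci within a group.
    """
    groups = []
    for group in locus_keys.split(","):
        # Assumption: the leading '*' is a column-name prefix, not part of the locus name.
        loci = [name.lstrip("*") for name in group.split(":") if name]
        if loci:
            groups.append(loci)
    return groups


groups = parse_locus_keys("*DQA1:*DPB1,*DRB1:*DQB1")
print(groups)
```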
- estLinkageDisequilibrium(locusKeys=None, permutationPrintFlag=0, numInitCond=None, numPermutations=None, numPermuInitCond=None)#
Estimate linkage disequilibrium (LD) for listed loci.
- Parameters:
locusKeys (str) – see estHaplotypes()
permutationPrintFlag (int) – print all permutations (default 0)
numInitCond (int) – number of initial conditions (default None)
numPermutations (int) – number of permutations (default None)
numPermuInitCond (int) – number of initial conditions for each permutation (default None)
Example
See estHaplotypes() for an example that estimates LD.
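For readers unfamiliar with LD, the sketch below computes the textbook pairwise disequilibrium coefficient, D_ij = p_ij - p_i * q_j, from two-locus haplotype frequencies. This is the standard definition and is not claimed to be the exact set of statistics that emhaplofreq reports; pairwise_disequilibrium is a hypothetical name and the frequencies are illustrative values, not real data.

```python
from collections import defaultdict


def pairwise_disequilibrium(hap_freqs):
    """Return D_ij = p_ij - p_i * q_j for each two-locus haplotype.

    hap_freqs maps (allele_at_locus_1, allele_at_locus_2) to a frequency;
    the frequencies are assumed to sum to 1.
    """
    p = defaultdict(float)  # marginal allele frequencies at locus 1
    q = defaultdict(float)  # marginal allele frequencies at locus 2
    for (a, b), freq in hap_freqs.items():
        p[a] += freq
        q[b] += freq
    return {(a, b): freq - p[a] * q[b] for (a, b), freq in hap_freqs.items()}


# Illustrative frequencies only, not real data.
freqs = {
    ("A1", "B1"): 0.4,
    ("A1", "B2"): 0.1,
    ("A2", "B1"): 0.1,
    ("A2", "B2"): 0.4,
}
d = pairwise_disequilibrium(freqs)
```

A positive value means the haplotype is more common than expected under independence of the two loci; a negative value means it is rarer.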
- allPairwise(permutationPrintFlag=0, numInitCond=None, numPermutations=None, numPermuInitCond=None, haploSuppressFlag=None, haplosToShow=None, mode=None)#
Estimate pairwise statistics for a given set of loci.
Depending on the flags passed, this can be used to estimate both LD (linkage disequilibrium) and HF (haplotype frequencies). An optional permutation test on LD can also be run.
- Parameters:
permutationPrintFlag (int) – sets whether the results of the permutation run will be included in the output XML. Default: 0 (disabled).
numInitCond (int) – sets number of initial conditions before performing the permutation test. Default: None.
numPermutations (int) – sets number of permutations that will be performed. Default: None.
numPermuInitCond (int) – sets number of initial conditions tried per permutation. Default: None.
haploSuppressFlag (int) – sets whether haplotype information is generated in the output. Default: None.
haplosToShow (list) – list of haplotypes to show in output
mode (str) – mode for haplotype output
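As a rough picture of what "pairwise" covers, the sketch below enumerates every unordered pair of loci and formats the result in the locusKeys style shown for estHaplotypes(). Whether allPairwise() enumerates pairs in this way or in this order is an assumption; pairwise_locus_keys is a hypothetical helper and the locus names are placeholders.

```python
from itertools import combinations


def pairwise_locus_keys(loci):
    """Build a locusKeys-style string covering every unordered pair of loci."""
    # Assumption: each locus name takes a leading '*', as in the estHaplotypes() example.
    return ",".join(f"*{a}:*{b}" for a, b in combinations(loci, 2))


keys = pairwise_locus_keys(["A", "B", "C"])
print(keys)
```

For n loci this yields n * (n - 1) / 2 pairs, so the number of estimations grows quadratically with the number of loci.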
- class Haplostats(locusData, debug=0, untypedAllele='****', stream=None, testMode=False)#
Bases: Haplo
Haplotype and LD estimation implemented via haplo.stats.
This is a wrapper to a portion of the haplo.stats R package.
- Parameters:
locusData (StringMatrix) – a StringMatrix
debug (int) – defaults to 0 (off)
untypedAllele (str) – defaults to ****
stream (TextOutputStream) – output file
testMode (bool) – default is False
- serializeStart()#
Serialize start of XML output to the currently defined XML stream.
See also
Must be paired with a subsequent Haplostats.serializeEnd().
- serializeEnd()#
Serialize end of XML output to the currently defined XML stream.
See also
Must be paired with a previous Haplostats.serializeStart().
- estHaplotypes(locusKeys=None, weight=None, control=None, numInitCond=10, testMode=False)#
Estimate haplotypes for listed loci in locusKeys.
If locusKeys is None, assume entire matrix. LD is also estimated if there are locusKeys consisting of only two loci.
Warning
FIXME: this does not yet remove missing data before haplotype estimations
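Given the warning above, a caller may want to drop individuals with untyped alleles before estimation. The sketch below shows the idea on plain tuples rather than on a StringMatrix; drop_untyped is a hypothetical helper, the allele values are illustrative, and the default marker **** is taken from the constructor signature.

```python
def drop_untyped(rows, untyped_allele="****"):
    """Keep only individuals with no untyped allele at any locus."""
    return [row for row in rows if untyped_allele not in row]


# Each row holds one individual's alleles (two per locus); illustrative values only.
genotypes = [
    ("01:01", "02:01", "03:01", "04:01"),
    ("01:01", "****", "03:01", "04:01"),
    ("****", "****", "03:02", "04:02"),
]
complete = drop_untyped(genotypes)
print(len(complete))
```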
- Parameters:
locusKeys (str) – see Emhaplofreq.estHaplotypes() for format
weight (list) – set weights (default None, which sets all weights equal)
control (dict) – a dictionary of control parameters
numInitCond (int) – number of initial conditions (default 10)
testMode (bool) – run in test mode (default is False)
- Returns:
multiple statistics
- Return type:
- allPairwise(weight=None, control=None, numInitCond=10)#
Estimate pairwise statistics for all pairs of loci.
- Parameters:
weight (list) – see Haplostats.estHaplotypes()
control (dict) – see Haplostats.estHaplotypes()
numInitCond (int) – see Haplostats.estHaplotypes()
- class HaploArlequin(arpFilename, idCol, prefixCols, suffixCols, windowSize, mapOrder=None, untypedAllele='0', arlequinPrefix='arl_run', debug=0)#
Bases: Haplo
Performs haplotype estimation via Arlequin.
Deprecated since version 1.0.0.
Outputs Arlequin format data files and runtime info; also runs Arlequin and parses the resulting output so it can be made available programmatically to the rest of the Python framework.
Delegates all calls to Arlequin to an internally instantiated ArlequinBatch Python object called ‘batch’.
- Parameters:
arpFilename (str) – Arlequin filename (must have .arp file extension)
idCol (str) – column in input file that contains the individual id.
prefixCols (int) – number of columns to ignore before allele data starts
suffixCols (int) – number of columns to ignore after allele data stops
windowSize (int) – size of sliding window
mapOrder (list) – list order of columns if different to column order in file (defaults to order in file)
untypedAllele (str) – (defaults to 0)
arlequinPrefix (str) – prefix for all Arlequin run-time files (defaults to arl_run).
debug (int) – (defaults to 0, i.e. OFF)
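The windowSize parameter can be pictured with the sketch below, which lists the windows produced by sliding a fixed-size window over an ordered list of loci. It assumes the window advances one locus at a time, which this page does not state; sliding_windows is a hypothetical helper and the locus names are placeholders.

```python
def sliding_windows(loci, window_size):
    """Return every run of window_size consecutive loci, in order."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # Assumption: the window moves forward by exactly one locus per step.
    return [loci[i:i + window_size] for i in range(len(loci) - window_size + 1)]


windows = sliding_windows(["A", "B", "C", "D"], 2)
print(windows)
```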
- outputArlequin(data)#
Outputs the specified .arp sample file.
- Parameters:
data (list) – list of strings containing the .arp sample file
- runArlequin()#
Run the Arlequin haplotyping program.
Generates the expected .txt set-up files for Arlequin, then forks a copy of arlecore.exe, which must be on PATH, to actually generate the haplotype estimates from the generated .arp file.
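Since arlecore.exe must be on PATH, a caller can check for it before invoking runArlequin(). The sketch below uses the standard-library shutil.which for that check; arlequin_available is a hypothetical helper name and is not part of PyPop.

```python
import shutil


def arlequin_available(executable="arlecore.exe"):
    """Return True if the executable can be found on PATH."""
    return shutil.which(executable) is not None


if not arlequin_available():
    print("arlecore.exe not found on PATH; runArlequin() would be unable to start it")
```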
- genHaplotypes()#
Parses Arlequin output to retrieve estimated haplotypes.
- Returns:
a list of the sliding windows, which consists of tuples. Each tuple consists of:
freqs (dict): dictionary of haplotype-frequency key-value pairs
popName (str): population name (original .arp file prefix)
sampleCount (int): sample count (number of samples for that window)
lociList (list): ordered list of loci considered
- Return type:
list
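The following sketch shows one way to consume a result shaped like the value described under Returns for genHaplotypes(). It assumes the tuple elements appear in the order listed there (freqs, popName, sampleCount, lociList); most_frequent_haplotypes is a hypothetical helper, and the haplotype keys and numbers are placeholders, not real Arlequin output.

```python
def most_frequent_haplotypes(windows):
    """For each window tuple, return (lociList, top haplotype, its frequency)."""
    summary = []
    for freqs, pop_name, sample_count, loci_list in windows:
        if not freqs:
            continue  # nothing was estimated for this window
        top = max(freqs, key=freqs.get)
        summary.append((loci_list, top, freqs[top]))
    return summary


# Placeholder values shaped like the documented return value; not real output.
windows = [
    ({"h1": 0.6, "h2": 0.4}, "pop", 50, ["locusA", "locusB"]),
    ({"h3": 0.7, "h4": 0.3}, "pop", 48, ["locusB", "locusC"]),
]
summary = most_frequent_haplotypes(windows)
print(summary)
```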