PyPop.Haplo#

Module for estimating haplotypes and linkage disequilibrium measures.

Currently there are two implementations: Emhaplofreq and Haplostats.

Classes#

Haplo

Estimating haplotypes given genotype data.

Emhaplofreq

Haplotype and linkage disequilibrium (LD) estimation via emhaplofreq.

Haplostats

Haplotype and LD estimation implemented via haplo.stats.

HaploArlequin

Performs haplotype estimation via Arlequin.

Module Contents#

class Haplo#

Estimating haplotypes given genotype data.

This is an abstract stub class (it currently has no methods).

class Emhaplofreq(locusData, debug=0, untypedAllele='****', stream=None, testMode=False)#

Bases: Haplo

Haplotype and linkage disequilibrium (LD) estimation via emhaplofreq.

This is essentially a wrapper around a Python extension built on top of the emhaplofreq command-line program. It refuses to estimate haplotypes longer than the maximum length defined by emhaplofreq.

Parameters:
  • locusData (StringMatrix) – a StringMatrix

  • debug (int) – defaults to 0 (off)

  • untypedAllele (str) – defaults to ****

  • stream (TextOutputStream) – output file

  • testMode (bool) – default is False

serializeStart()#

Serialize start of XML output to the currently defined XML stream.

See also

must be paired with a subsequent Emhaplofreq.serializeEnd()

serializeEnd()#

Serialize end of XML output to the currently defined XML stream.

See also

must be paired with a previous Emhaplofreq.serializeStart()
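Because serializeStart() and serializeEnd() must be paired, one way to keep them together is a small context manager. The following is a minimal sketch, not part of PyPop: the `serialized` helper and the `FakeEstimator` stand-in are hypothetical names introduced here, and the stand-in only records calls so that the example runs without PyPop installed.

```python
from contextlib import contextmanager


@contextmanager
def serialized(estimator):
    """Call serializeStart() on entry and serializeEnd() on exit.

    Hypothetical helper; `estimator` is any object exposing both methods.
    """
    estimator.serializeStart()
    try:
        yield estimator
    finally:
        # Always emit the closing call so the pairing is preserved.
        estimator.serializeEnd()


class FakeEstimator:
    """Stand-in that only records calls; not a PyPop class."""

    def __init__(self):
        self.calls = []

    def serializeStart(self):
        self.calls.append("start")

    def serializeEnd(self):
        self.calls.append("end")

    def estHaplotypes(self, locusKeys=None):
        self.calls.append("estHaplotypes:%s" % locusKeys)


fake = FakeEstimator()
with serialized(fake) as est:
    est.estHaplotypes(locusKeys="*DQA1:*DPB1")
print(fake.calls)
```

Calling serializeEnd() in a `finally` clause is a design choice of this sketch: it closes the output even when an estimation step raises. Whether that is desirable for partially written XML depends on the caller.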

estHaplotypes(locusKeys=None, numInitCond=None)#

Estimate haplotypes for listed loci in locusKeys.

Parameters:
  • locusKeys (str) –

    format is a string consisting of

    • comma-separated (,) haplotype blocks for which to estimate haplotypes

    • within each “block”, each locus is separated by colons ( : )

  • numInitCond (int) – number of initial conditions to use

Example

*DQA1:*DPB1,*DRB1:*DQB1 means: estimate haplotypes for the DQA1 and DPB1 loci, then estimate haplotypes for the DRB1 and DQB1 loci.
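To make the locusKeys format concrete, here is a minimal sketch that splits such a string into blocks and loci. The function `parse_locus_keys` is a hypothetical helper written for illustration; it is not part of PyPop and does not claim to mirror how Emhaplofreq parses the string internally.

```python
def parse_locus_keys(locus_keys):
    """Split a locusKeys string into a list of blocks (lists of locus names).

    Hypothetical helper for illustration only.
    """
    blocks = []
    for block in locus_keys.split(","):
        block = block.strip()
        if not block:
            continue  # tolerate a trailing comma or stray whitespace
        blocks.append([locus.strip() for locus in block.split(":")])
    return blocks


for block in parse_locus_keys("*DQA1:*DPB1,*DRB1:*DQB1"):
    print(" + ".join(block))
```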

estLinkageDisequilibrium(locusKeys=None, permutationPrintFlag=0, numInitCond=None, numPermutations=None, numPermuInitCond=None)#

Estimate linkage disequilibrium (LD) for listed loci.

Parameters:
  • locusKeys (str) – see estHaplotypes()

  • permutationPrintFlag (int) – print all permutations (default 0)

  • numInitCond (int) – number of initial conditions (default None)

  • numPermutations (int) – number of permutations (default None)

  • numPermuInitCond (int) – number of initial conditions for each permutation (default None)

Example

See estHaplotypes() for an example of the locusKeys format.

allPairwise(permutationPrintFlag=0, numInitCond=None, numPermutations=None, numPermuInitCond=None, haploSuppressFlag=None, haplosToShow=None, mode=None)#

Estimate pairwise statistics for a given set of loci.

Depending on the flags passed, this can be used to estimate both LD (linkage disequilibrium) and HF (haplotype frequencies). An optional permutation test on LD can also be run.

Parameters:
  • permutationPrintFlag (int) – sets whether the results of the permutation run are included in the output XML. Default: 0 (disabled).

  • numInitCond (int) – sets number of initial conditions before performing the permutation test. Default: None.

  • numPermutations (int) – sets number of permutations that will be performed. Default: None.

  • numPermuInitCond (int) – sets number of initial conditions tried per-permutation. Default: None.

  • haploSuppressFlag (int) – sets whether haplotype information is generated in the output. Default: None

  • haplosToShow (list) – list of haplotypes to show in output

  • mode (str) – mode for haplotype output
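As an illustration of what "all pairs" means for a set of loci, the sketch below enumerates every unordered pair and formats the result in the locusKeys style described for estHaplotypes(). The helper `pairwise_locus_keys` and the locus names are assumptions made for this example; the sketch does not reproduce how allPairwise() selects or orders pairs internally.

```python
from itertools import combinations


def pairwise_locus_keys(loci):
    """Build a locusKeys-style string covering every unordered pair of loci.

    Hypothetical helper; n loci yield n * (n - 1) / 2 pairs.
    """
    return ",".join(":".join(pair) for pair in combinations(loci, 2))


loci = ["*A", "*B", "*DRB1"]
print(pairwise_locus_keys(loci))
```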

class Haplostats(locusData, debug=0, untypedAllele='****', stream=None, testMode=False)#

Bases: Haplo

Haplotype and LD estimation implemented via haplo.stats.

This is a wrapper to a portion of the haplo.stats R package.

Parameters:
  • locusData (StringMatrix) – a StringMatrix

  • debug (int) – defaults to 0 (off)

  • untypedAllele (str) – defaults to ****

  • stream (TextOutputStream) – output file

  • testMode (bool) – default is False

serializeStart()#

Serialize start of XML output to the currently defined XML stream.

See also

must be paired with a subsequent Haplostats.serializeEnd()

serializeEnd()#

Serialize end of XML output to the currently defined XML stream.

See also

must be paired with a previous Haplostats.serializeStart()

estHaplotypes(locusKeys=None, weight=None, control=None, numInitCond=10, testMode=False)#

Estimate haplotypes for listed loci in locusKeys.

If locusKeys is None, assume entire matrix. LD is also estimated if there are locusKeys consisting of only two loci.

Warning

FIXME: this does not yet remove missing data before haplotype estimation.

Parameters:
  • locusKeys (str) – see Emhaplofreq.estHaplotypes() for format

  • weight (list) – set weights (default None, which sets all weights equal)

  • control (dict) – a dictionary of control parameters

  • numInitCond (int) – number of initial conditions (default 10)

  • testMode (bool) – run in test mode (default False)

Returns:

multiple statistics

Return type:

tuple

allPairwise(weight=None, control=None, numInitCond=10)#

Estimate pairwise statistics for all pairs of loci.

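Parameters:
  • weight (list) – see estHaplotypes()

  • control (dict) – see estHaplotypes()

  • numInitCond (int) – number of initial conditions (default 10)

class HaploArlequin(arpFilename, idCol, prefixCols, suffixCols, windowSize, mapOrder=None, untypedAllele='0', arlequinPrefix='arl_run', debug=0)#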

Bases: Haplo

Performs haplotype estimation via Arlequin.

Deprecated since version 1.0.0.

Outputs Arlequin-format data files and runtime info. Also runs Arlequin and parses the resulting output so that it is available programmatically to the rest of the Python framework.

Delegates all calls to Arlequin to an internally instantiated ArlequinBatch Python object called ‘batch’.

Parameters:
  • arpFilename (str) – Arlequin filename (must have .arp file extension)

  • idCol (str) – column in input file that contains the individual id.

  • prefixCols (int) – number of columns to ignore before allele data starts

  • suffixCols (int) – number of columns to ignore after allele data stops

  • windowSize (int) – size of sliding window

  • mapOrder (list) – order of columns, if different from the column order in the file (defaults to the order in the file)

  • untypedAllele (str) – designator for an untyped allele (defaults to 0)

  • arlequinPrefix (str) – prefix for all Arlequin run-time files (defaults to arl_run).

  • debug (int) – debug level (defaults to 0, i.e. off)
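The windowSize parameter can be pictured with a generic sliding window over an ordered list of loci. This is a minimal sketch under the assumption that consecutive windows advance by one locus; the function `sliding_windows` and the locus names are hypothetical, and HaploArlequin's actual windowing may differ.

```python
def sliding_windows(loci, window_size):
    """Return consecutive windows of `window_size` loci, stepping by one.

    Hypothetical helper; assumes a step of one locus between windows.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    last_start = len(loci) - window_size
    # range() is empty when the window is larger than the list of loci.
    return [loci[i:i + window_size] for i in range(last_start + 1)]


for window in sliding_windows(["A", "C", "B", "DRB1"], 2):
    print(window)
```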

outputArlequin(data)#

Outputs the specified .arp sample file.

Parameters:

data (list) – list of strings containing the .arp sample file

runArlequin()#

Run the Arlequin haplotyping program.

Generates the expected .txt set-up files for Arlequin, then forks a copy of arlecore.exe (which must be on PATH) to generate the haplotype estimates from the generated .arp file.

genHaplotypes()#

Parses Arlequin output to retrieve estimated haplotypes.

Returns:

a list with one tuple per sliding window. Each tuple consists of:

  • freqs (dict): dictionary of haplotype-frequency key-value pairs.

  • popName (str): population name (original .arp file prefix)

  • sampleCount (int): sample count (number of samples for that window)

  • lociList (list): ordered list of loci considered

Return type:

list
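The documented return value can be consumed by unpacking each tuple in order. The sketch below uses made-up data shaped like that description; the haplotype key strings, population name, and counts are invented for illustration, and `most_frequent_haplotypes` is a hypothetical helper rather than a PyPop function.

```python
def most_frequent_haplotypes(windows):
    """For each window, return (lociList, haplotype, frequency) of the top haplotype.

    Hypothetical helper; expects tuples of (freqs, popName, sampleCount, lociList).
    """
    result = []
    for freqs, pop_name, sample_count, loci_list in windows:
        if not freqs:
            continue  # skip windows with no estimated haplotypes
        top = max(freqs, key=freqs.get)
        result.append((loci_list, top, freqs[top]))
    return result


# Made-up data; the key format of the haplotypes is an assumption.
windows = [
    ({"0101_0201": 0.6, "0102_0301": 0.4}, "pop1", 50, ["A", "B"]),
    ({"0201_0401": 0.3, "0301_0402": 0.7}, "pop1", 48, ["B", "C"]),
]

for loci_list, haplotype, freq in most_frequent_haplotypes(windows):
    print(loci_list, haplotype, freq)
```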