PyPop.utils#

Module for common utility classes and functions.

Contains convenience classes for output of text and XML files.

Attributes#

GENOTYPE_SEPARATOR

Separator between genotypes

GENOTYPE_TERMINATOR

Terminator of genotypes

Classes#

TextOutputStream

Output stream for writing text files.

XMLOutputStream

Output stream for writing XML files.

StringMatrix

Matrix of strings and other metadata from input file to PyPop.

Group

Group list or sequence into non-overlapping chunks.

OrderedDict

A dictionary class with ordered pairs.

Index

Returns an Index object for OrderedDict.

Functions#

critical_exit(message, *args)

Log a CRITICAL message and exit with status 1.

getStreamType(stream)

Get the type of stream.

glob_with_pathlib(pattern)

Use globbing with pathlib.

natural_sort_key(s[, _nsre])

Generate a key for natural (human-friendly) sorting.

unique_elements(li)

Gets the unique elements in a list.

appendTo2dList(aList[, appendStr])

Append a string to each element in a list.

convertLineEndings(file, mode)

Convert line endings based on platform.

fixForPlatform(filename[, txt_ext])

Fix for some Windows/MS-DOS platforms.

copyfileCustomPlatform(src, dest[, txt_ext])

Copy file to file with fixes.

copyCustomPlatform(file, dist_dir[, txt_ext])

Copy file to directory with fixes.

checkXSLFile(xslFilename[, path, subdir, abort, msg])

Check XSL filename and return full path.

getUserFilenameInput(prompt, filename)

Get user filename input.

splitIntoNGroups(alist[, n])

Divides a list into parcels of size n (the last parcel holds whatever is left over).

Module Contents#

GENOTYPE_SEPARATOR = '~'#

Separator between genotypes

Example

In a haplotype 01:01~13:01~04:02

GENOTYPE_TERMINATOR = '~'#

Terminator of genotypes

Example

02:01:01:01~
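
As a rough illustration of how these two constants might be used, the sketch below splits a haplotype string and strips a trailing terminator. The constants are redefined locally so the snippet runs without PyPop installed; the variable names are illustrative only.

```python
# Local stand-ins for the module constants documented above.
GENOTYPE_SEPARATOR = "~"
GENOTYPE_TERMINATOR = "~"

haplotype = "01:01~13:01~04:02"

# Split the haplotype into its individual alleles.
alleles = haplotype.split(GENOTYPE_SEPARATOR)
print(alleles)

# Remove a trailing terminator from a terminated allele string.
terminated = "02:01:01:01~"
allele = terminated.rstrip(GENOTYPE_TERMINATOR)
print(allele)
```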

class TextOutputStream(file)#

Output stream for writing text files.

Parameters:

file (file) – file handle

write(str)#

Write to stream.

Parameters:

str (str) – string to write

writeln(str='\n')#

Write a newline to stream.

Parameters:

str (str, optional) – defaults to newline

close()#

Close stream.

flush()#

Flush to disk.
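
The following is a minimal stand-in, not PyPop's own implementation, that mirrors the documented methods so the call pattern can be tried without PyPop installed. The class name TextStreamSketch is hypothetical, and it assumes writeln() writes exactly the string it is given, which defaults to a newline.

```python
import io


class TextStreamSketch:
    """Hypothetical stand-in mirroring the documented TextOutputStream methods."""

    def __init__(self, file):
        self.f = file

    def write(self, str):
        self.f.write(str)

    def writeln(self, str="\n"):
        # Assumption: writes `str` itself, which defaults to a newline.
        self.f.write(str)

    def flush(self):
        self.f.flush()

    def close(self):
        self.f.close()


buffer = io.StringIO()  # in-memory file handle for the demonstration
stream = TextStreamSketch(buffer)
stream.write("Locus: A")
stream.writeln()
stream.flush()
result = buffer.getvalue()
print(repr(result))
```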

class XMLOutputStream(file)#

Bases: TextOutputStream

Output stream for writing XML files.

opentag(tagname, **kw)#

Write an open XML tag to stream.

Tag attributes passed as optional named keyword arguments.

Example

opentag('tagname', role='something', id='else')

produces the result:

<tagname role="something" id="else">

Attributes and values are optional:

opentag('tagname')

Produces:

<tagname>

See also

Must be followed by a closetag().

Parameters:

tagname (str) – name of XML tag

emptytag(tagname, **kw)#

Write an empty XML tag to stream.

This follows the same syntax as opentag(). The tag has no XML content, but it can carry attributes.

Example

emptytag('tagname', attr='val')

produces:

<tagname attr="val"/>

Parameters:

tagname (str) – name of XML tag

closetag(tagname)#

Write a closing XML tag to stream.

Example

closetag('tagname')

generates a tag of the form:

</tagname>

See also

Must be preceded by an opentag().

Parameters:

tagname (str) – name of XML tag

tagContents(tagname, content, **kw)#

Write XML tags around contents to a stream.

Example

tagContents('tagname', 'foo bar')

produces:

<tagname>foo bar</tagname>

Parameters:
  • tagname (str) – name of XML tag

  • content (str) – must only be a string. &, < and > are converted into valid XML equivalents.
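
The tag formats shown above can be reproduced with a few standalone helper functions. This is only a sketch under stated assumptions: the function names are hypothetical, the helpers return strings instead of writing to a stream, and the escaping relies on the standard library's xml.sax.saxutils rather than on whatever XMLOutputStream does internally.

```python
from xml.sax.saxutils import escape, quoteattr


def _attrs(kw):
    # Render keyword arguments as ' name="value"' pairs in call order.
    return "".join(f" {name}={quoteattr(str(value))}" for name, value in kw.items())


def open_tag(tagname, **kw):
    return f"<{tagname}{_attrs(kw)}>"


def empty_tag(tagname, **kw):
    return f"<{tagname}{_attrs(kw)}/>"


def close_tag(tagname):
    return f"</{tagname}>"


def tag_contents(tagname, content, **kw):
    # escape() converts &, < and > into their XML entity equivalents.
    return open_tag(tagname, **kw) + escape(content) + close_tag(tagname)


print(open_tag("tagname", role="something", id="else"))
print(empty_tag("tagname", attr="val"))
print(tag_contents("tagname", "foo bar"))
print(tag_contents("note", "a < b & c"))
```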

class StringMatrix(rowCount=None, colList=None, extraList=None, colSep='\t', headerLines=None)#

Bases: numpy.lib.user_array.container

Matrix of strings and other metadata from input file to PyPop.

StringMatrix is a subclass of NumPy’s numpy.lib.user_array.container class. It stores the data in an efficient array format and supports NumPy-style access.

Parameters:
  • rowCount (int) – number of rows in matrix

  • colList (list) – list of locus keys in a specified order

  • extraList (list) – other non-matrix metadata

  • colSep (str) – column separator

  • headerLines (list) – list of lines in the header of original file

dump(locus=None, stream=sys.stdout)#

Write file to a stream in original format.

Parameters:
  • locus (optional) – defaults to None

  • stream (optional) – defaults to sys.stdout

copy()#

Make a (deep) copy.

Returns:

a deep copy of the current object

Return type:

StringMatrix

getNewStringMatrix(key)#

Create new StringMatrix containing specified loci.

Note

The format of the keys is identical to __getitem__(), except that this method returns a full StringMatrix instance that includes all metadata.

Parameters:

key (str) – a string representing the loci, using the locus1:locus2 format

Returns:

full instance

Return type:

StringMatrix

Raises:

KeyError – if a locus cannot be found.

getUniqueAlleles(key)#

Get naturally sorted list of unique alleles.

Parameters:

key (str) – loci to get

Returns:

list of unique integers sorted by allele name using natural sort

Return type:

list

convertToInts()#

Convert the matrix to integers.

Note

This function is used by the PyPop.Haplo.Haplostats class. Integers start at 1 for compatibility with the haplo-stats module.

Returns:

matrix where the original allele names are now represented by integers

Return type:

StringMatrix

countPairs()#

Count all possible pairs of haplotypes for each matrix row.

Warning

This does not do any involved handling of missing data as per geno.count.pairs from R haplo.stats module.

Returns:

each element is the number of pairs in row order

Return type:

list

flattenCols()#

Flatten columns into a single list.

Important

Currently assumes entries are integers.

Returns:

all alleles, the two genotype columns concatenated for each locus

Return type:

list

filterOut(key, blankDesignator)#

Get matrix rows filtered by a designator.

Parameters:
  • key (str) – locus to filter

  • blankDesignator (str) – string to exclude

Returns:

the rows of the matrix that do not contain blankDesignator

Return type:

list
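
The filtering idea can be sketched on plain nested lists. This is not the StringMatrix implementation: the function name is hypothetical, the "****" designator is only an example value, and the sketch assumes an exact string match against blankDesignator.

```python
def filter_out_sketch(rows, blank_designator):
    # Keep only the rows in which no entry equals the blank designator.
    return [row for row in rows if blank_designator not in row]


rows = [
    ["A01", "A02"],
    ["****", "A03"],  # contains the designator, so it is dropped
    ["A04", "A05"],
]
filtered = filter_out_sketch(rows, "****")
print(filtered)
```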

getSuperType(key)#

Get a matrix grouped by specified key.

Example

Returns a new matrix in which the alleles of the specified loci are concatenated within each genotype column, like so:

>>> matrix = StringMatrix(2, ["A", "B"])
>>> matrix[0, "A"] = ("A01", "A02")
>>> matrix[1, "A"] = ("A11", "A12")
>>> matrix[0, "B"] = ("B01", "B02")
>>> matrix[1, "B"] = ("B11", "B12")
>>> print(matrix)
StringMatrix([['A01', 'A02', 'B01', 'B02'],
       ['A11', 'A12', 'B11', 'B12']], dtype=object)
>>> matrix.getSuperType("A:B")
StringMatrix([['A01:B01', 'A02:B02'],
       ['A11:B11', 'A12:B12']], dtype=object)

Parameters:

key (str) – loci to group

Returns:

a new matrix with the columns concatenated

Return type:

StringMatrix
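
To make the column grouping concrete without PyPop installed, the sketch below reproduces the result of the example above on plain lists. It is a hypothetical stand-in (the function name and the n_loci argument are not part of PyPop) and assumes each row stores two alleles per locus, locus by locus.

```python
def super_type_sketch(rows, n_loci, sep=":"):
    """Join the first alleles of every locus, then the second alleles."""
    grouped = []
    for row in rows:
        first = sep.join(row[2 * i] for i in range(n_loci))
        second = sep.join(row[2 * i + 1] for i in range(n_loci))
        grouped.append([first, second])
    return grouped


rows = [
    ["A01", "A02", "B01", "B02"],
    ["A11", "A12", "B11", "B12"],
]
result = super_type_sketch(rows, n_loci=2)
print(result)
```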

class Group(li, size)#

Group list or sequence into non-overlapping chunks.

Example

>>> for pair in Group('aabbccddee', 2):
...    print(pair)
...
aa
bb
cc
dd
ee
>>> a = Group('aabbccddee', 2)
>>> a[0]
'aa'
>>> a[3]
'dd'

Parameters:
  • li (str|list) – string or list

  • size (int) – size of grouping
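
The chunking behaviour in the example can be approximated with a small class that slices on item access. This is a sketch, not the Group source: the class name GroupSketch is hypothetical, and negative indices are not handled.

```python
class GroupSketch:
    """Hypothetical stand-in that exposes a sequence in fixed-size chunks."""

    def __init__(self, li, size):
        self.li = li
        self.size = size

    def __getitem__(self, group):
        start = group * self.size
        if start >= len(self.li):
            # IndexError also ends a for loop that iterates via __getitem__.
            raise IndexError("group index out of range")
        return self.li[start:start + self.size]


pairs = list(GroupSketch("aabbccddee", 2))
print(pairs)
print(GroupSketch("aabbccddee", 2)[3])
```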

class OrderedDict(hash=None)#

A dictionary class with ordered pairs.

Deprecated since version 1.3.1: Will be removed in a later release and replaced by Python’s built-in equivalent.

Creates an ordered dict.

index(key)#

Returns position of key in dict.

keys()#

Returns list of keys in dict.

values()#

Returns list of values in dict.

items()#

Returns list of tuples of keys and values.

insert(i, key, value)#

Inserts a key-value pair at a given index.

remove(i)#

Removes a key-value pair from the dict.

reverse()#

Reverses the order of the key-value pairs.

sort(cmp=0)#

Sorts the dict (allows for sort algorithm).

clear()#

Clears all the entries in the dict.

copy()#

Makes a copy of the dict, also of the OrderedDict class.

get(key)#

Returns the value of a key.

has_key(key)#

Looks for existence of key in dict.

update(dict)#

Updates entries in a dict based on another.

count(key)#

Finds occurrences of a key in a dict (0/1).
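
Because this class is deprecated, it may help to see how a few of its operations could be expressed with Python's built-in dict, which preserves insertion order since Python 3.7. This is only a sketch of possible equivalents, not a migration path prescribed by PyPop.

```python
d = {"A": 1, "B": 2}
d["C"] = 3  # new keys are appended in insertion order

# Position of a key, comparable to index(key).
position = list(d).index("B")

# Inserting at a given position, comparable to insert(i, key, value):
# rebuild the dict from an edited list of items.
items = list(d.items())
items.insert(1, ("X", 9))
d2 = dict(items)

# Reversing the order, comparable to reverse().
d3 = dict(reversed(list(d.items())))

print(position, list(d2), list(d3))
```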

class Index(i=0)#

Returns an Index object for OrderedDict.

Deprecated since version 1.3.1: Will be removed in a later release and replaced by Python’s built-in equivalent.

critical_exit(message, *args)#

Log a CRITICAL message and exit with status 1.

Added in version 1.4.0.

Parameters:

message (str) – Logging format string.
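
The described behaviour (log at CRITICAL level, then exit with status 1) can be sketched with the standard logging and sys modules. The function and logger names below are hypothetical, and which logger PyPop actually uses is not stated here.

```python
import logging
import sys

logger = logging.getLogger("pypop_sketch")  # hypothetical logger name


def critical_exit_sketch(message, *args):
    # `message` is a logging format string; `args` fill its placeholders.
    logger.critical(message, *args)
    sys.exit(1)


# sys.exit raises SystemExit, so the exit status can be inspected here.
try:
    critical_exit_sketch("Cannot open %s", "config.ini")
except SystemExit as exc:
    exit_status = exc.code

print(exit_status)
```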

getStreamType(stream)#

Get the type of stream.

Parameters:

stream (TextOutputStream|XMLOutputStream) – stream to check

Returns:

either xml or text.

Return type:

string
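
One plausible way to distinguish the two stream types is an isinstance check. The sketch below uses hypothetical stand-in classes and tests the subclass first, since XMLOutputStream derives from TextOutputStream; the actual PyPop logic may differ.

```python
class TextStream:
    """Stand-in for TextOutputStream."""


class XMLStream(TextStream):
    """Stand-in for XMLOutputStream, a subclass of the text stream."""


def stream_type_sketch(stream):
    # Check the subclass first: an XML stream is also a text stream.
    if isinstance(stream, XMLStream):
        return "xml"
    return "text"


print(stream_type_sketch(XMLStream()))
print(stream_type_sketch(TextStream()))
```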

glob_with_pathlib(pattern)#

Use globbing with pathlib.

Parameters:

pattern (str) – globbing pattern

Returns:

list of pathlib paths matching the pattern

Return type:

list
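
Globbing through pathlib might look like the sketch below. The helper name is hypothetical, and splitting the pattern into a parent directory and a final component is an assumption made for this example, not necessarily how glob_with_pathlib treats its pattern.

```python
import tempfile
from pathlib import Path


def glob_sketch(pattern):
    path = Path(pattern)
    # Glob inside the pattern's parent directory using its last component.
    return sorted(path.parent.glob(path.name))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pop", "b.pop", "notes.txt"):
        (Path(tmp) / name).touch()
    matches = [p.name for p in glob_sketch(str(Path(tmp) / "*.pop"))]

print(matches)
```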

natural_sort_key(s, _nsre=re.compile('([0-9]+)'))#

Generate a key for natural (human-friendly) sorting.

This function splits a string into text and number components so that numbers are compared by value instead of lexicographically. It is intended for use as the key function in list.sort() or sorted().

Example

>>> items = ["item2", "item10", "item1"]
>>> sorted(items, key=natural_sort_key)
['item1', 'item2', 'item10']

Parameters:
  • s (str) – The string to split into text and number components.

  • _nsre (Pattern) – Precompiled regular expression used internally to split the string into digit and non-digit chunks. This is not intended to be overridden in normal use.

Returns:

A list of strings and integers to be used as a sort key.

Return type:

list
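
The splitting step described above can be sketched as follows. This follows a common natural-sort recipe and reuses the documented default pattern; it is not guaranteed to match PyPop's implementation in every detail (for example, in its handling of letter case).

```python
import re


def natural_key_sketch(s, _nsre=re.compile("([0-9]+)")):
    # Split into digit and non-digit chunks; digit chunks become integers
    # so that 10 sorts after 2 instead of before it.
    return [int(chunk) if chunk.isdigit() else chunk for chunk in _nsre.split(s)]


key = natural_key_sketch("item10")
ordered = sorted(["A*10", "A*2", "A*1"], key=natural_key_sketch)
print(key)
print(ordered)
```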

unique_elements(li)#

Gets the unique elements in a list.

Parameters:

li (list) – a list

Returns:

unique elements

Return type:

list
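
A compact way to collect unique elements is shown below. The function name is hypothetical, and preserving first-seen order is a choice made for this sketch; the documentation above does not say which order unique_elements returns.

```python
def unique_sketch(li):
    # dict.fromkeys keeps the first occurrence of each element, in order.
    return list(dict.fromkeys(li))


unique = unique_sketch(["A01", "A02", "A01", "B01", "A02"])
print(unique)
```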

appendTo2dList(aList, appendStr=':')#

Append a string to each element in a list.

Parameters:
  • aList (list) – list to append to

  • appendStr (str) – string to append

Returns:

a list with string appended to each element

Return type:

list
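
Assuming, as the function name suggests, that aList is a two-dimensional (nested) list, the operation could be sketched with a nested comprehension. The function name below is hypothetical.

```python
def append_to_2d_sketch(a_list, append_str=":"):
    # Append the string to every cell of every row, returning a new list.
    return [[f"{cell}{append_str}" for cell in row] for row in a_list]


appended = append_to_2d_sketch([["A01", "A02"], ["B01"]])
print(appended)
```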

convertLineEndings(file, mode)#

Convert line endings based on platform.

Parameters:
  • file (str) – file name to convert

  • mode (int) –

    Conversion mode, one of

    • 1 Unix to Mac

    • 2 Unix to DOS
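
The two conversion modes can be illustrated on an in-memory string. This sketch is not the PyPop function, which takes a file name: the helper name is hypothetical, and the input is assumed to use Unix line endings only.

```python
def convert_text_sketch(text, mode):
    if mode == 1:
        return text.replace("\n", "\r")  # Unix to (classic) Mac
    if mode == 2:
        return text.replace("\n", "\r\n")  # Unix to DOS
    raise ValueError(f"unsupported mode: {mode}")


unix_text = "line one\nline two\n"
mac_text = convert_text_sketch(unix_text, 1)
dos_text = convert_text_sketch(unix_text, 2)
print(repr(mac_text))
print(repr(dos_text))
```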

fixForPlatform(filename, txt_ext=0)#

Fix for some Windows/MS-DOS platforms.

Parameters:
  • filename (str) – path to file

  • txt_ext (int, optional) – if enabled (1) add a .txt extension

copyfileCustomPlatform(src, dest, txt_ext=0)#

Copy file to file with fixes.

Parameters:
  • src (str) – source file

  • dest (str) – destination file

  • txt_ext (int, optional) – if enabled (1) add a .txt extension

copyCustomPlatform(file, dist_dir, txt_ext=0)#

Copy file to directory with fixes.

Parameters:
  • file (str) – source file

  • dist_dir (str) – destination directory

  • txt_ext (int, optional) – if enabled (1) add a .txt extension

checkXSLFile(xslFilename, path='', subdir='', abort=False, msg='')#

Check XSL filename and return full path.

Parameters:
  • xslFilename (str) – name of the XSL file

  • path (str) – root path to check

  • subdir (str) – subdirectory under path to check

  • abort (bool) – if enabled (True) and the file isn’t found, exit with an error. Default is False

  • msg (str) – output message on abort

Returns:

checked and validated path

Return type:

str

getUserFilenameInput(prompt, filename)#

Get user filename input.

Read user input for a filename, check its existence, continue requesting input until a valid filename is entered.

Parameters:
  • prompt (str) – description of file

  • filename (str) – default filename

Returns:

name of file eventually selected

Return type:

str

splitIntoNGroups(alist, n=1)#

Divides a list into parcels of size n (the last parcel holds whatever is left over).

Example

>>> a = ['A', 'B', 'C', 'D', 'E']
>>> splitIntoNGroups(a, 2)
[['A', 'B'], ['C', 'D'], ['E']]

Parameters:
  • alist (list) – list to divide up

  • n (int) – parcel size

Returns:

list of lists

Return type:

list
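
The behaviour in the example above, where n is the parcel size, can be reproduced with list slicing. This is a minimal sketch with a hypothetical function name, not the PyPop source.

```python
def split_into_groups_sketch(alist, n=1):
    # Take consecutive slices of length n; the last slice may be shorter.
    return [alist[i:i + n] for i in range(0, len(alist), n)]


groups = split_into_groups_sketch(["A", "B", "C", "D", "E"], 2)
print(groups)
```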