API

textdata.lines(source, noblanks=True, dedent=True, lstrip=False, rstrip=True, expandtabs=False, cstrip=True, join=False)

Grab lines from a string. Discard initial and final lines if blank.

Parameters:
  • source (str|lines) – Text (or list of text lines) to be processed
  • dedent (bool) – strip a common leading prefix from each line (default True)
  • noblanks (bool) – remove all blank lines (default True)
  • lstrip (bool) – strip all leading whitespace from each line (default False); dedent and lstrip are mutually exclusive
  • rstrip (bool) – strip all trailing whitespace from each line (default True)
  • expandtabs (Union[bool,int]) – whether to expand tabs; if an int, the tab width to use (default False)
  • cstrip (bool) – strip comments running from # to the end of each line, as Python itself does (default True)
  • join (bool|str) – if False, no effect; otherwise a string used to join the lines (default False)
Returns:

a list of strings

Return type:

list
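
The default cleanups described above can be approximated with the standard library alone. The following is a minimal sketch, not textdata's implementation: sketch_lines is a hypothetical name, and its comment stripping is deliberately naive (it does not protect a # that appears inside quotes).

```python
import textwrap

def sketch_lines(source, noblanks=True, dedent=True, rstrip=True, cstrip=True):
    """Hypothetical stand-in for the default cleanups; not the textdata code."""
    text = textwrap.dedent(source) if dedent else source
    result = []
    for line in text.splitlines():
        if cstrip:
            # Naive comment removal: drops everything from the first '#'
            line = line.split("#", 1)[0]
        if rstrip:
            line = line.rstrip()
        result.append(line)
    # Discard initial and final lines if blank
    while result and not result[0].strip():
        result.pop(0)
    while result and not result[-1].strip():
        result.pop()
    if noblanks:
        result = [ln for ln in result if ln.strip()]
    return result

sample = """
    alpha   # first item
    beta

    gamma
"""
cleaned = sketch_lines(sample)
with_blanks = sketch_lines(sample, noblanks=False)
```

Here cleaned holds only the three non-blank lines, while with_blanks keeps the interior blank line but still drops the blank first and last lines.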

textdata.text(source, **kwargs)

Like lines(), but returns result as unified text. Useful primarily because of the nice cleanups lines() does.

Parameters:
  • source (str|lines) – Text (or list of text lines) to be processed
  • join (str) – String to join lines with. Typically a newline for line-oriented text, but change to " " for a single continuous line.
Returns:

the cleaned string

Return type:

str

textdata.textline(source, cstrip=True)

Like text(), but returns the result as a single string that is not line-oriented. Really a special case of text().

Parameters:
  • source (str|list) – Text (or list of text lines) to be processed
  • cstrip (bool) – Should comments be stripped? (default: True)
Returns:

the cleaned string

Return type:

str
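
The difference between line-oriented text and a single continuous line comes down to the joining string. A minimal illustration in plain Python, assuming the lines have already been cleaned (the list below is made-up sample data, and the single-space join for the textline-style result is an assumption based on the join parameter described above):

```python
# Sample data standing in for already-cleaned lines
cleaned_lines = ["Dear reader,", "this text spans", "several lines."]

# Line-oriented result: joined with newlines, as text() is described
as_text = "\n".join(cleaned_lines)

# Not line-oriented: joined with single spaces (assumed textline-style result)
as_textline = " ".join(cleaned_lines)
```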

textdata.words(source, cstrip=True, sep=None)

Returns a sequence of words, like qw() in Perl. Similar to s.split(), except that it respects quoted spans for the occasional word (really, phrase) with spaces included. If the sep argument is provided, words are split on that separator instead (rather like str.split(sep)). Use either the default whitespace splitting with possibly-quoted words, or an explicit separator, but not both; the two modes don't cooperate well.

Like lines, removes comment strings by default.

Parameters:
  • source (str|list) – Text (or list of text lines) to gather words from
  • cstrip (bool) – Should comments be stripped? (default: True)
  • sep (Optional[str]) – Optional explicit separator.
Returns:

list of words/phrases

Return type:

list
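
The two splitting modes can be sketched with the standard library's shlex module. This is an approximation for illustration only: sketch_words is a hypothetical name, shlex follows shell quoting rules that may differ from textdata's (for example, it rejects unbalanced quotes), and trimming the separated pieces is an assumption.

```python
import shlex

def sketch_words(source, cstrip=True, sep=None):
    """Hypothetical approximation of the two modes; not the textdata code."""
    if sep is not None:
        # Explicit separator mode: quotes get no special treatment here.
        # Trimming whitespace around each piece is an assumption.
        return [piece.strip() for piece in source.split(sep)]
    # Whitespace mode: quoted spans survive as single words, and
    # comments=True drops everything from '#' to the end of the line.
    return shlex.split(source, comments=cstrip)

quoted = sketch_words('red green "sky blue"  # colors')
separated = sketch_words("red, sky blue, green", sep=",")
```

In the first call the quoted phrase stays together as one item; in the second, the comma alone decides where words begin and end.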

textdata.paras(source, keep_blanks=False, join=False, cstrip=True)

Given a string or list of text lines, return a list of lists where each sublist is a paragraph (a list of non-blank lines). If the source is a string, lines is used to split it into lines. Optionally, the runs of blank lines can be kept, and/or the lines in each paragraph can be joined with a desired separator (likely a newline if you want to preserve multi-line structure in the resulting string, or " " if you don't). Like words, lines, and textline, it also strips comments by default.

Parameters:
  • source (str|list) – Text (or list of text lines) from which paras are to be gathered
  • keep_blanks (bool) – Should internal blank lines be retained? (default: False)
  • join (bool|str) – Should paras be joined into a string? (default: False).
  • cstrip (bool) – Should comments be stripped? (default: True)
Returns:

list of paragraphs (each a list of lines, or a single string if join is given)

Return type:

list
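
Grouping lines into paragraphs is essentially a matter of splitting on runs of blank lines. A minimal sketch using itertools.groupby, assuming the input is already a list of lines (sketch_paras is a hypothetical name and omits the keep_blanks and cstrip options):

```python
from itertools import groupby

def sketch_paras(source_lines, join=False):
    """Hypothetical paragraph grouper; not the textdata implementation."""
    # Consecutive lines share a group while their blank/non-blank status matches
    groups = groupby(source_lines, key=lambda ln: bool(ln.strip()))
    paragraphs = [list(group) for nonblank, group in groups if nonblank]
    if join is False:
        return paragraphs
    # With a join string, each paragraph collapses to one string
    return [join.join(para) for para in paragraphs]

lines_in = ["first line", "second line", "", "", "third line"]
as_lists = sketch_paras(lines_in)
as_strings = sketch_paras(lines_in, join=" ")
```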

textdata.attrs(source, evaluate='natural', dict=<type 'dict'>, cstrip=True)

Parse attribute strings into a dict (or other mapping type). By default, evaluates literals as is natural in Python, e.g. turning what look like numbers into real int and float instances, not just strings. Quoted values are always treated as strings, never evaluated.

Parameters:
  • source (Union[str, List[str]]) – Text to parse (as string or list of lines)
  • evaluate (Union[str, bool]) – How to evaluate resulting values
  • dict (type) – Type of mapping to return
  • cstrip (bool) – Remove comments from string before interpretation?
  • astyle – Deprecated. Use dict parameter instead.
  • literal – Deprecated. Use evaluate parameter instead.
Returns:

dict (or given dict type)
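
The "natural" evaluation rule can be illustrated with ast.literal_eval. The sketch below is not textdata's parser: the natural helper is a hypothetical name, and the whitespace-separated key=value input form is assumed purely for the sake of the example.

```python
import ast

def natural(value):
    """Hypothetical helper: evaluate a raw value the way 'natural' is described."""
    # Quoted values are always strings, never evaluated
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    try:
        return ast.literal_eval(value)   # "10" -> 10, "2.5" -> 2.5
    except (ValueError, SyntaxError):
        return value                     # anything else stays a string

# Assumed input form: whitespace-separated key=value pairs
raw = "width=10 ratio=2.5 name=box code='007'"
parsed = {key: natural(val)
          for key, val in (pair.split("=", 1) for pair in raw.split())}
```

Note that code stays the string '007' because it was quoted, while width and ratio become real numbers.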

class textdata.Dict(*args, **kwargs)

Attribute-accessible dict subclass. Does whatever dict does, but its keys are also accessible via .attribute notation. Provided as a convenience. In future, the inherently ordered items.Item will be used instead. It is more robust and complete, though it only supports Python 3 at the moment. If you're on Python 3, items.Item is recommended over Dict.
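
The general idea of an attribute-accessible dict can be sketched in a few lines. AttrDict below is a hypothetical illustration of the common idiom, not the source of textdata.Dict:

```python
class AttrDict(dict):
    """Hypothetical attribute-accessible dict; illustrates the idiom only."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict
        # methods such as keys() and items() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute assignment stores a dict item instead
        self[name] = value

point = AttrDict(x=1, y=2)
point.z = 3
total = point.x + point["y"] + point.z
```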

textdata.table(source, header=None, evaluate=True, cstrip=True)

Return a list of lists representing a table.

Parameters:
  • source (Union[str, List[str]]) – Text to parse (as string or list of lines)
  • header (Union[str, List, None]) – Header for the table
  • evaluate (Union[str, function, None]) – Indicates how to post-process table cells. The default, True or 'natural', evaluates cells as Python literals. Other options are False or 'minimal' (just string trimming), or None or 'none'. A custom function can also be provided.
  • cstrip (bool) – strip comments?
Returns:

List of lists, where each inner list represents a row.

textdata.records(source, dict=<class 'textdata.attrs.Dict'>, keyclean=<function keyclean>, **kwargs)

Alternate table parser. Returns not a list of lists, but a list of attribute-accessible Dict instances (Dict is a dict subclass).

Parameters:
  • source (Union[str, List[str]]) – Text to parse (as string or list of lines)
  • dict (type) – dictionary subtype in which to return results
  • keyclean (Union[Function, None]) – function to clean table headers into more suitable dictionary keys
  • **kwargs – All other kw args passed to textdata.table
Returns:

list of dictionaries, one per non-header row
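
The relationship between the list-of-lists form and the list-of-dictionaries form can be sketched as follows. This is an illustration under stated assumptions, not textdata's code: sketch_records and sketch_keyclean are hypothetical names, the first row is assumed to be the header, and the key-cleaning rule (lowercase, underscores for spaces) is invented for the example.

```python
def sketch_keyclean(key):
    """Hypothetical header cleaner: lowercase, spaces to underscores."""
    return key.strip().lower().replace(" ", "_")

def sketch_records(rows, keyclean=sketch_keyclean):
    """Turn [header, row, row, ...] into one dict per non-header row."""
    header, *body = rows
    keys = [keyclean(h) if keyclean else h for h in header]
    return [dict(zip(keys, row)) for row in body]

# Sample list-of-lists table, header row first (assumed layout)
rows = [
    ["Name", "Unit Price"],
    ["apple", 0.5],
    ["pear", 0.75],
]
recs = sketch_records(rows)
raw_keys = sketch_records(rows, keyclean=None)
```

Cleaning the headers makes keys such as unit_price usable as identifiers, which is what makes attribute-style access on each record practical.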