API
-
textdata.
lines
(source, noblanks=True, dedent=True, lstrip=False, rstrip=True, expandtabs=False, cstrip=True, join=False) Grab lines from a string. Discard initial and final lines if blank.
Parameters: - source (str|lines) – Text (or list of text lines) to be processed
- dedent (bool) – strip the common leading prefix from each line (default True)
- noblanks (bool) – remove all blank lines (default True)
- lstrip (bool) – strip all leading whitespace from each line (default False); dedent and lstrip are mutually exclusive
- rstrip (bool) – strip all trailing whitespace from each line (default True)
- expandtabs (Union[bool,int]) – whether to expand tabs (default False); if an int, the tab size to expand to
- cstrip (bool) – strip comments running from # to the end of each line, as Python itself does (default True)
- join (bool|str) – if False, no effect; otherwise a string used to join the lines
Returns: a list of strings
Return type: list
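The default cleanups described above can be approximated with the standard library alone. The following is a minimal sketch, not textdata's implementation: sketch_lines is a hypothetical name, it covers only dedent, rstrip, cstrip and noblanks, and its comment stripping is deliberately naive.

```python
import textwrap

def sketch_lines(source, noblanks=True, cstrip=True):
    """Hypothetical stand-in for the documented defaults; not textdata's code."""
    text = textwrap.dedent(source)            # dedent=True: drop the common indent
    result = []
    for line in text.splitlines():
        if cstrip:
            line = line.split("#", 1)[0]      # naive: also cuts a '#' inside quotes
        result.append(line.rstrip())          # rstrip=True
    if noblanks:
        return [line for line in result if line]
    # Otherwise drop only the blank lines at the very start and end.
    while result and not result[0]:
        result.pop(0)
    while result and not result[-1]:
        result.pop()
    return result

sample = """
    alpha   # first item
    beta

      gamma
"""
print(sketch_lines(sample))
print(sketch_lines(sample, noblanks=False))
```

Relative indentation survives (gamma keeps its two extra spaces) because only the prefix common to all lines is removed.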
-
textdata.
text
(source, **kwargs) Like lines(), but returns the result as unified text. Useful primarily because of the cleanups lines() does.
Parameters: - source (str|lines) – Text (or list of text lines) to be processed
- join (str) – String to join lines with. Typically a newline for line-oriented text; change to " " for a single continuous line.
Returns: the cleaned string
Return type: str
-
textdata.
textline
(source, cstrip=True) Like text(), but returns the result as a unified string that is not line-oriented. Really a special case of text().
Parameters: - source (str|list) – Text (or list of text lines) to be processed
- cstrip (bool) – Should comments be stripped? (default: True)
Returns: the cleaned string
Return type: str
-
textdata.
words
(source, cstrip=True) Returns a sequence of words, like qw() in Perl. Similar to s.split(), except that it respects quoted spans for the occasional word (really, phrase) that includes spaces. Like lines, removes comment strings by default.
Parameters: - source (str|list) – Text (or list of text lines) to gather words from
- cstrip (bool) – Should comments be stripped? (default: True)
Returns: list of words/phrases
Return type: list
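The difference between plain s.split() and quote-aware splitting can be illustrated with the standard library's shlex module. This is an analogy only: shlex is not what textdata uses as far as this page states, and it follows shell quoting rules (so a bare apostrophe, as in don't, would raise an error).

```python
import shlex

source = 'Billy Bobby "Mr. Smith" "Mrs. Jones"  # trailing note'

# Plain split breaks quoted phrases apart and keeps the comment text.
naive = source.split()

# shlex keeps each quoted span together; comments=True drops text after '#'.
quoted = shlex.split(source, comments=True)

print(naive[:4])
print(quoted)
```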
-
textdata.
paras
(source, keep_blanks=False, join=False, cstrip=True) Given a string or list of text lines, return a list of lists where each sublist is a paragraph (a list of non-blank lines). If the source is a string, use lines to split it into lines. Optionally can also keep the runs of blanks, and/or join the lines in each paragraph with a desired separator (likely a newline if you want to preserve multi-line structure in the resulting string, or " " if you don't). Like words, lines, and textlines, will also strip comments by default.
Parameters: - source (str|list) – Text (or list of text lines) from which paras are to be gathered
- keep_blanks – Should internal blank lines be retained? (default: False)
- join (bool|str) – Should paras be joined into a string? (default: False)
- cstrip (bool) – Should comments be stripped? (default: True)
Returns: list of strings (each a paragraph)
Return type: list
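The grouping described above, with runs of non-blank lines forming paragraphs, can be sketched with itertools.groupby. The name sketch_paras is hypothetical, and the sketch ignores keep_blanks and cstrip; it is meant only to show the shape of the result with and without join.

```python
from itertools import groupby

def sketch_paras(text_lines, join=False):
    """Hypothetical illustration: group consecutive non-blank lines."""
    groups = groupby(text_lines, key=lambda line: bool(line.strip()))
    paragraphs = [list(group) for nonblank, group in groups if nonblank]
    if join is False:
        return paragraphs                      # list of lists of lines
    return [join.join(p) for p in paragraphs]  # list of joined strings

rows = ["first line", "second line", "", "", "third line"]
print(sketch_paras(rows))
print(sketch_paras(rows, join=" "))
```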
-
textdata.
attrs
(source, evaluate='natural', dict=<type 'dict'>, cstrip=True) Parse attribute strings into a dict (or other mapping type). By default, evaluates literals as natural to Python, e.g. turning what look like numbers into real int and float instances, not just strings. Quoted values are always treated as strings, never evaluated.
Parameters: - source (Union[str, List[str]]) – Text to parse (as string or list of lines)
- evaluate (Union[str, bool]) – How to evaluate resulting values
- dict (type) – Type of mapping to return
- cstrip (bool) – Remove comments from string before interpretation?
- astyle – Deprecated. Use dict parameter instead.
- literal – Deprecated. Use evaluate parameter instead.
Returns: dict (or given dict type)
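The "natural" evaluation rule, where unquoted values become Python literals when possible and quoted values always stay strings, can be sketched as follows. This is an assumption-laden illustration: sketch_attrs, natural and the key=value regular expression are hypothetical, and the real parser may accept other syntaxes and separators.

```python
import ast
import re

# Hypothetical pattern: key=value, where value is quoted or a run of non-spaces.
PAIR = re.compile(r"""(\w+)\s*=\s*("[^"]*"|'[^']*'|\S+)""")

def natural(raw):
    """Quoted values stay strings; others become literals when they parse."""
    if raw[0] in "\"'":
        return raw[1:-1]                 # strip the quotes, never evaluate
    try:
        return ast.literal_eval(raw)     # ints, floats, True/False/None, ...
    except (ValueError, SyntaxError):
        return raw                       # not a literal: keep the text as-is

def sketch_attrs(source, mapping=dict):
    return mapping((key, natural(raw)) for key, raw in PAIR.findall(source))

print(sketch_attrs('a=1 b=2.5 c="3" d=hello'))
```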
-
class
textdata.
Dict
(*args, **kwargs) Attribute-accessible dict subclass. Does whatever dict does, but its keys are also accessible via .attribute notation. Provided as a convenience. In future, will use the inherently ordered items.Item instead. It is more robust and complete, though it only supports Python 3 at the moment. If you're on Python 3, Item is recommended over Dict.
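What "attribute-accessible" means can be shown with a minimal dict subclass. AttrDict below is a hypothetical stand-in written for illustration; it is not the source of textdata.Dict.

```python
class AttrDict(dict):
    """Hypothetical sketch: a dict whose keys can be read as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

d = AttrDict(name="Joe", age=33)
d.city = "Boston"                 # stored as an ordinary key
print(d.name, d["age"], d["city"])
```

Raising AttributeError rather than KeyError keeps hasattr() and getattr() with a default working as expected.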
-
textdata.
table
(source, header=None, evaluate=True, cstrip=True) Return a list of lists representing a table.
Parameters: - source (Union[str, List[str]]) – Text to parse (as string or list of lines)
- header (Union[str, List, None]) – Header for the table
- evaluate (Union[str, function, None]) – Indicates how to post-process table cells. The default, True (or 'natural'), evaluates cells as Python literals. Other options are False or 'minimal' (just string trimming), or None or 'none'. A custom function can also be provided.
- cstrip (bool) – strip comments?
Returns: List of lists, where each inner list represents a row.
-
textdata.
records
(source, dict=<class 'textdata.attrs.Dict'>, keyclean=<function keyclean>, **kwargs) Alternate table parser. Returns not a list of lists, but a list of attribute-accessible Dict instances (Dict is a dict subclass).
Parameters: - source (Union[str, List[str]]) – Text to parse (as string or list of lines)
- dict (type) – dictionary subtype in which to return results
- keyclean (Union[Function, None]) – function to clean table headers into more suitable dictionary keys
- **kwargs – All other kw args passed to textdata.table
Returns: list of dictionaries, one per non-header row
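The relationship between a list-of-lists table and a list of records can be sketched as below. It assumes the first row is the header. sketch_records and snake are hypothetical names, and snake only guesses at the kind of cleanup a keyclean function might perform.

```python
def snake(header_cell):
    """Hypothetical key cleaner: lowercase, spaces to underscores."""
    return header_cell.strip().lower().replace(" ", "_")

def sketch_records(table_rows, keyclean=snake):
    """Turn [header, row, row, ...] into one dict per non-header row."""
    header, *body = table_rows
    if keyclean is not None:
        header = [keyclean(cell) for cell in header]
    return [dict(zip(header, row)) for row in body]

rows = [
    ["Name", "Home Town", "Age"],
    ["Joe", "Boston", 33],
    ["Ann", "Denver", 41],
]
for record in sketch_records(rows):
    print(record)
```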