Tables¶
Much data comes in tabular format. The table()
and records()
functions help you extract it in convenient ways…either as a list
of lists, or as a list of dictionaries.
>>> text = """
... name age strengths
... ---- --- ---------------
... Joe 12 woodworking
... Jill 12 slingshot
... Meg 13 snark, snapchat
... """
>>> table(text)
[['name', 'age', 'strengths'],
['Joe', 12, 'woodworking'],
['Jill', 12, 'slingshot'],
['Meg', 13, 'snark, snapchat']]
>>> records(text)
[{'name': 'Joe', 'age': 12, 'strengths': 'woodworking'},
{'name': 'Jill', 'age': 12, 'strengths': 'slingshot'},
{'name': 'Meg', 'age': 13, 'strengths': 'snark, snapchat'}]
The table()
function returns a list of lists, while the records()
function uses the table header as keys and returns a list of dictionaries.
table()
and records()
work even if you have a lot of extra fluff:
>>> fancy = """
... +------+-----+-----------------+
... | name | age | strengths |
... +------+-----+-----------------+
... | Joe | 12 | woodworking |
... | Jill | 12 | slingshot |
... | Meg | 13 | snark, snapchat |
... +------+-----+-----------------+
... """
>>> assert table(text) == table(fancy)
>>> assert records(text) == records(fancy)
The parsing algorithm is heuristic, but works well with tables formatted in a variety of conventional ways including Markdown, RST, ANSI/Unicode line drawing characters, plain text columns and borders, …. See the table tests for dozens of samples of formats that work.
What constitutes table columns are contiguous bits of text, without intervening
whitespace. Typographic “rivers” of whitespace define column breaks. For this
reason, it’s recommended that every table column have a separator line, usually
consistng of '-'
, '='
, or Unicode box drawing characters, to control
column width.
If there are '#'
characters in your table data, best to pass
cstrip=False
so that they will not be erroneously interpreted as comments.
Records and Keys¶
Records depends on there being a header row available.
Many tables use natural language headers, such as First Name
and Item Price
.
When retrieving records (dicts), this is not impossible, but it’s often also not
entirely convenient–especially for attribute-accessible dictionary keys. So records()
provides a keyclean
feature that passes each key through a cleanup function.
By default whitespace at the start and end of the key are removed, multiple interior
whitespace characters are collapsed and replaced with underscore characters (_).
You can provide your own custom keyclean function if you like, or None
if you
like your keys as-is.