This grate repository will unbrielievably parse your CSV string and return a Python dictionary with headers and rows of data. You cheddar believe I put some gouda tests in here.
All cheesiness aside, the goal of this project is to allow the parsing of CSV files to stand alone as a separate service and remove the burden from the frontend code of Datasmith and later Wordsmith. This project will most likely expand to include other data formats.
Python 3.6+ is required. From repo root, run `pip install -r requirements.txt`
From repo root, run `pytest`
From repo root, run `pylint parsemesan tests`
From repo root, run `coverage run -m pytest && coverage html`
parse_data
- {str}: `source_type`, the type of data source (`arrays`, `csv`, `unicode`)
- {*}: `input_data`, CSV {bytes} or Python {list} or Python {str} (which is Unicode)
- {dict}: Python object with the following spec:
{ "errors": [ <dict> ], "result": { "headers": <list of {str}>, "rows": <list of lists of {str}> } }
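To make the spec concrete, here is a minimal sketch of a value in that shape together with a small checker. The helper name `looks_like_parse_result` and the sample values are hypothetical and not part of this repository; the contents of each error dict are not specified above, so the checker only requires that they are dicts.

```python
from typing import Any


def looks_like_parse_result(obj: Any) -> bool:
    """Return True if obj has the documented "errors" / "result" shape."""
    if not isinstance(obj, dict) or not {"errors", "result"} <= set(obj):
        return False
    errors, result = obj["errors"], obj["result"]
    # "errors" is a list of dicts (their keys are not specified here).
    if not isinstance(errors, list) or not all(isinstance(e, dict) for e in errors):
        return False
    if not isinstance(result, dict):
        return False
    headers, rows = result.get("headers"), result.get("rows")
    # "headers" is a list of str.
    if not isinstance(headers, list) or not all(isinstance(h, str) for h in headers):
        return False
    # "rows" is a list of lists of str.
    if not isinstance(rows, list):
        return False
    return all(
        isinstance(row, list) and all(isinstance(cell, str) for cell in row)
        for row in rows
    )


# Illustrative value only; real output depends on the input data.
example = {
    "errors": [],
    "result": {
        "headers": ["name", "age"],
        "rows": [["Ada", "36"], ["Grace", "45"]],
    },
}
```

Note that every cell is a string: numeric-looking values such as "36" are not converted.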
get_valid_formats
- No inputs required.
- {dict}: Python object with the following spec:
{ "formats": <list of {str}> }
- `byte_string`: a raw Python `bytes` array. Does not apply to `source_type='arrays'`. `EncodingError` raised if not valid.
- `stream`: a Python object that reads the data sequentially. `FileTypeError` raised if not valid.
- `data`: a Python dictionary with keys `headers` and `rows` (see below). `DataError` raised if not valid.
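As a rough illustration of the three intermediate objects and their errors, the sketch below defines the exception names mentioned above and one simplified check per object. Only the exception names come from this README; their base class, the `check_*` function names, the default encoding, and the exact conditions are assumptions, and the real validators may be stricter.

```python
import io


class EncodingError(Exception):
    """Raised when a byte_string cannot be decoded (base class is assumed)."""


class FileTypeError(Exception):
    """Raised when a stream is not acceptable (base class is assumed)."""


class DataError(Exception):
    """Raised when data lacks 'headers' or 'rows' (base class is assumed)."""


def check_byte_string(byte_string, encoding="utf-8"):
    """Hypothetical check: the bytes must decode with the given encoding."""
    if not isinstance(byte_string, bytes):
        raise EncodingError("expected a bytes object")
    try:
        byte_string.decode(encoding)
    except (UnicodeDecodeError, LookupError) as exc:
        raise EncodingError(str(exc)) from exc
    return byte_string


def check_stream(stream):
    """Hypothetical check: only confirms the object can be read from.

    Content-based file type checks are not shown here.
    """
    if not callable(getattr(stream, "read", None)):
        raise FileTypeError("expected a readable stream")
    return stream


def check_data(data):
    """Hypothetical check: data must be a dict with 'headers' and 'rows'."""
    if not isinstance(data, dict) or not {"headers", "rows"} <= set(data):
        raise DataError("expected a dict with 'headers' and 'rows'")
    return data
```

Each check returns its argument unchanged on success, so the checks can be chained between parsing steps.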
Each pipeline combines a parser and at least one validator, as shown in the `pipelines/` directory.
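One way to picture this combination is as a chain of steps, each taking a value and returning the value for the next step. The sketch below is an assumption about the general idea, not the code in `pipelines/`; `make_pipeline`, the toy parser `parse_pairs`, and the validator `require_unique_headers` are hypothetical names invented for the example.

```python
def make_pipeline(*steps):
    """Compose parser and validator steps into one callable."""
    def run(value):
        for step in steps:
            # Parsers return a new value; validators return their input or raise.
            value = step(value)
        return value
    return run


def parse_pairs(text):
    """Toy parser: 'a=1;b=2' becomes one row keyed by the names before '='."""
    pairs = [item.partition("=") for item in text.split(";") if item]
    return {
        "headers": [key for key, _, _ in pairs],
        "rows": [[value for _, _, value in pairs]],
    }


def require_unique_headers(data):
    """Toy validator: reject repeated header names."""
    headers = data["headers"]
    if len(set(headers)) != len(headers):
        raise ValueError("duplicate header")
    return data


run = make_pipeline(parse_pairs, require_unique_headers)
```

Because validators pass their input through, they can be placed before, between, or after parsing steps.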
The `input_data` is a Python {list} of {lists} containing the rows of data.

- Parse or reorganize the object into `data`.
- Validate the `data`'s rows and headers as proper tabular data.

`data` is returned as described above.
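The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the first inner list is treated as the header row, cells are converted with `str`, and "proper tabular" is taken to mean every row has as many cells as there are headers. The function names are hypothetical, and built-in exceptions stand in for the project's own error classes.

```python
def arrays_to_data(input_data):
    """Reorganize a list of lists into {'headers': [...], 'rows': [...]}."""
    if not isinstance(input_data, list) or not all(
        isinstance(row, list) for row in input_data
    ):
        raise TypeError("input_data must be a list of lists")
    if not input_data:
        raise ValueError("input_data has no header row")
    # Assumption: the first inner list holds the headers.
    headers = [str(cell) for cell in input_data[0]]
    rows = [[str(cell) for cell in row] for row in input_data[1:]]
    return {"headers": headers, "rows": rows}


def validate_tabular(data):
    """Check that every row has exactly one cell per header."""
    width = len(data["headers"])
    for index, row in enumerate(data["rows"]):
        if len(row) != width:
            raise ValueError(
                f"row {index} has {len(row)} cells, expected {width}"
            )
    return data
```

For example, `validate_tabular(arrays_to_data([["name", "age"], ["Ada", 36]]))` yields a single row whose cells are both strings.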
The `input_data` is a Python {bytes} object containing an encoded CSV table.

- Detect the `input_data` encoding using the `chardet` module.
- Validate the `byte_string`'s encoding by calling the native `.decode(encoding)` method.
- Convert it to a `stream` with the native `io.StringIO` class.
- Validate the `stream`'s file type by eliminating other common possibilities of files (`.html`, `.xml`).
- Parse the stream into `data` using the native `csv` module.
- Validate the `data`'s rows and headers as proper tabular data.

`data` is returned as described above.
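A minimal sketch of the detect, decode, stream, and parse steps is shown below. It is not the project's implementation: the function names are hypothetical, the UTF-8 fallback when `chardet` is missing or undecided is an assumption, the first CSV record is assumed to be the header row, and the file type and tabular validations are omitted for brevity.

```python
import csv
import io

try:
    import chardet  # third-party detector named in the steps above
except ImportError:  # fallback is an assumption so the sketch runs without it
    chardet = None


def detect_encoding(byte_string, default="utf-8"):
    """Guess the encoding with chardet, or fall back to a default."""
    if chardet is not None:
        guess = chardet.detect(byte_string)["encoding"]
        if guess:
            return guess
    return default


def csv_bytes_to_data(byte_string):
    """Decode CSV bytes and parse them into headers and rows."""
    encoding = detect_encoding(byte_string)
    # Decoding doubles as validation: a wrong guess raises UnicodeDecodeError.
    text = byte_string.decode(encoding)
    # newline="" leaves line endings untouched so csv can handle them itself.
    stream = io.StringIO(text, newline="")
    table = list(csv.reader(stream))
    headers, rows = (table[0], table[1:]) if table else ([], [])
    return {"headers": headers, "rows": rows}
```

Detection on very short inputs can be unreliable, which is one reason the decode step is kept as a separate validation.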
The `input_data` is a Python {str} object (which is Unicode in Python 3) containing a CSV table.

- Convert the string to a `stream` with the native `io.StringIO` class.
- Validate the `stream`'s file type by eliminating other common possibilities of files (`.html`, `.xml`).
- Parse the stream into `data` using the native `csv` module.
- Validate the `data`'s rows and headers as proper tabular data.

`data` is returned as described above.
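The file type step can be approximated by peeking at the start of the stream for markup. The sketch below is one possible heuristic, not the project's validator: `looks_like_markup`, `unicode_to_data`, the 1024-character sample size, and the list of prefixes are all assumptions, and a built-in `ValueError` stands in for `FileTypeError`.

```python
import csv
import io


def looks_like_markup(stream, sample_size=1024):
    """Return True if the stream appears to start with HTML or XML."""
    position = stream.tell()
    sample = stream.read(sample_size)
    stream.seek(position)  # restore the position so parsing starts at the top
    head = sample.lstrip().lower()
    return head.startswith(("<?xml", "<!doctype", "<html"))


def unicode_to_data(text):
    """Wrap a str in a stream, rule out markup, then parse it as CSV."""
    stream = io.StringIO(text, newline="")
    if looks_like_markup(stream):
        raise ValueError("input looks like HTML or XML, not CSV")
    table = list(csv.reader(stream))
    headers, rows = (table[0], table[1:]) if table else ([], [])
    return {"headers": headers, "rows": rows}
```

A prefix check like this rules out only the most common non-CSV inputs; a CSV file whose first cell happens to begin with `<html` would be rejected.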