Expose document_stream interface #71
Comments
Note that as of the upcoming 1.0 release, the On Demand API will support ndjson/stream parsing (sequences of JSON documents).
How stable is the on-demand API now? I've avoided it so far due to a few issues:
|
The benefit of the On Demand approach is that you bypass the C++ DOM entirely and build up your own data structure directly. So instead of doing … It is not a replacement for the DOM approach. I expect we will continue to support the DOM approach indefinitely. I am not urging you to use On Demand, but I do think it is something you should seriously examine eventually. If you do, you will get help from us.
There were a few issues, yes. They were all fixed quickly. The current code appears quite robust. It is a new approach, so it required a lot more work to get it to a solid state. It is also more challenging for fundamental reasons, because the user has more power. But I think we are there (hence the 1.0 status). We will have an R wrapper. And we have extensive tests.
I am not sure that this is true. In both instances, we support streaming JSON data (streams of JSON documents, ndjson). Running the indexing phase (stage 1) in chunks would be easier with the DOM API, since we are in full control (from within simdjson). It is more of a challenge with On Demand, since the user is in control: you can move through the document (including moving back), and you can imagine the problems that could emerge.
You can use On Demand in a DOM-like manner:

```cpp
#include <iostream>
#include "simdjson.h"

using namespace std;
using namespace simdjson;

void recursive_print_json(ondemand::value element) {
  bool add_comma;
  switch (element.type()) {
  case ondemand::json_type::array:
    cout << "[";
    add_comma = false;
    for (auto child : element.get_array()) {
      if (add_comma) {
        cout << ",";
      }
      // We need the call to value() to get
      // an ondemand::value type.
      recursive_print_json(child.value());
      add_comma = true;
    }
    cout << "]";
    break;
  case ondemand::json_type::object:
    cout << "{";
    add_comma = false;
    for (auto field : element.get_object()) {
      if (add_comma) {
        cout << ",";
      }
      // key() returns the key as it appears in the raw
      // JSON document; if we want the unescaped key,
      // we should use field.unescaped_key().
      cout << "\"" << field.key() << "\": ";
      recursive_print_json(field.value());
      add_comma = true;
    }
    cout << "}\n";
    break;
  case ondemand::json_type::number:
    // assume it fits in a double
    cout << element.get_double();
    break;
  case ondemand::json_type::string:
    // get_string() would return the unescaped string, but
    // we are happy with the raw (still escaped) JSON string.
    cout << "\"" << element.get_raw_json_string() << "\"";
    break;
  case ondemand::json_type::boolean:
    cout << element.get_bool();
    break;
  case ondemand::json_type::null:
    cout << "null";
    break;
  }
}

void basics_treewalk() {
  padded_string json = R"( [
    { "make": "Toyota", "model": "Camry", "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
    { "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
    { "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
  ] )"_padded;
  ondemand::parser parser;
  ondemand::document doc = parser.iterate(json);
  ondemand::value val = doc;
  recursive_print_json(val);
  std::cout << std::endl;
}
```
@TkTech Let us rule out a scenario: the Python programmer uses the On Demand API directly. Though that is possible, I suspect it would not be performant, since each call would have to cross the language barrier.
The prerequisite to unblock this was done in #110.
The pysimdjson library could support our document_stream interface (the parse_many function). It is well tested as of release 0.7 (with fuzz testing) and works well today. It supports streams of indefinite size. See https://github.com/simdjson/simdjson/blob/master/doc/parse_many.md
Related to #70
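For readers unfamiliar with the kind of input a document stream consumes, here is a minimal standard-library sketch of walking a buffer of newline-delimited JSON (ndjson) one document at a time. It is not simdjson's `parse_many` and does no JSON parsing or validation. The helper name `split_ndjson` is hypothetical and exists only for illustration. It assumes each document sits on its own line.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of simdjson or pysimdjson.
// Splits a buffer of newline-delimited JSON into one string_view per
// document. The views point into the original buffer, so nothing is copied.
// The JSON itself is neither parsed nor validated here.
std::vector<std::string_view> split_ndjson(std::string_view buffer) {
  std::vector<std::string_view> documents;
  std::size_t start = 0;
  while (start < buffer.size()) {
    std::size_t end = buffer.find('\n', start);
    if (end == std::string_view::npos) {
      end = buffer.size();  // last document has no trailing newline
    }
    std::string_view line = buffer.substr(start, end - start);
    // Drop a trailing carriage return so CRLF input works too.
    if (!line.empty() && line.back() == '\r') {
      line.remove_suffix(1);
    }
    // Skip blank lines between documents.
    if (!line.empty()) {
      documents.push_back(line);
    }
    start = end + 1;
  }
  return documents;
}
```

A wrapper such as pysimdjson would presumably hand each document to the parser in turn rather than collect views like this. The sketch only shows the shape of the interface: one buffer in, a sequence of documents out. The linked parse_many.md describes what the real function does.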