Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unclear documentation for getstring #119

Open
KristofferC opened this issue Jun 14, 2022 · 6 comments
Open

Unclear documentation for getstring #119

KristofferC opened this issue Jun 14, 2022 · 6 comments

Comments

@KristofferC
Copy link
Contributor

Here is a part of the docs for getstring

If the actual parsed String is needed, however, you can pass your source and the res.val::PosLen to Parsers.getstring to get the actual parsed String value.

But getstring takes 3 arguments, the last one which is called e. I don't see any description of e in the docstring and how it should be used. Looking at the tests, it seems that 0x00 is passed in.

@nickrobinson251
Copy link
Collaborator

I think it is the "escapechar", so usually you'd pass opts::Options.e

here's how Options documents that:

* `escapechar='"'`: an ascii character used to "escape" a `closequotechar` within a quoted field

As you say, we should improve the docs!

@KristofferC
Copy link
Contributor Author

I think it is the "escapechar", so usually you'd pass opts::Options.e

Hm, I have no Options struct. I just parsed as:

julia> str = "\"foo\" # comment"
"\"foo\" # comment"

julia> res = Parsers.xparse(String, str)
Parsers.Result{PosLen}(-32603, 15, PosLen(0x0000000000200003))

julia> Parsers.getstring(str, res.val, 0x00) # what should go as third arg?
"foo"

@nickrobinson251
Copy link
Collaborator

ah, i see. I'd strongly recommend explicitly creating an Options and passing it around if calling xparse multiple times, otherwise a new Options object gets constructed on each call.

Anyway, I believe for getstring to return the expected result (if an escapechar was parsed) you need to pass the same e (third arg) as was used by the xparse call. I suppose tests use 0x00 since any value is fine if you know no escapechar was parsed.

The default for xparse/Options is e=UInt8('"'), so when not specifying a different escapechar, i'd say pass UInt8('"') to getstring.

Arguably the e argument of getstring should have a default matching the xparse/Options default. But i don't think the 2-arg xparse function is really recommended, which is probably why no corresponding 2-arg getstring function exists.

Also, is "foo" the result you expected here? That xparse call is using quoted=true. If you wanted "\"foo\" # comment" you may want to use quoted=false

@KristofferC
Copy link
Contributor Author

Also, is "foo" the result you expected here? That xparse call is using quoted=true. If you wanted ""foo" # comment" you may want to use quoted=false

Actually, that is not correct, the default is quoted=false. So this parsed until the end of line and returns an error code.

julia> Parsers.Options().quoted
false

julia> res = Parsers.xparse(String, str)
Parsers.Result{PosLen}(-32603, 15, PosLen(0x0000000000200003))

julia> Parsers.text(res.code)
"encountered an opening quote character, initial value parsing succeeded, reached EOF, invalid delimiter"

julia> res = Parsers.xparse(String, str, 1, sizeof(str), Parsers.Options(; quoted=true))
Parsers.Result{PosLen}(5, 6, PosLen(0x0000000000200003))

julia> Parsers.text(res.code)
"encountered an opening quote character, initial value parsing succeeded"

@nickrobinson251
Copy link
Collaborator

nickrobinson251 commented Jun 15, 2022

yikes, we have too many different xparse methods that set different defaults.

julia> Parsers.xparse(String, str)  # == Parsers.xparse(String, str; quoted=true)
options.quoted = true
Parsers.Result{PosLen}(-32603, 15, PosLen(0x0000000000200003))

julia> Parsers.xparse(String, str, 1, sizeof(str))
options.quoted = true
Parsers.Result{PosLen}(-32603, 15, PosLen(0x0000000000200003))

julia> Parsers.xparse(String, str, 1, sizeof(str), Parsers.Options())
options.quoted = false
Parsers.Result{PosLen}(33, 15, PosLen(0x000000000010000f))

julia> Parsers.xparse(String, str, 1, sizeof(str), Parsers.Options(; quoted=true))
options.quoted = true
Parsers.Result{PosLen}(5, 6, PosLen(0x0000000000200003))
  1. The first hits this, which passes quoted::Bool=true:

    Parsers.jl/src/Parsers.jl

    Lines 211 to 212 in e2259a6

    # for testing purposes only, it's much too slow to dynamically create Options for every xparse call
    function xparse(::Type{T}, buf::Union{AbstractVector{UInt8}, AbstractString, IO}; pos::Integer=1, len::Integer=buf isa IO ? 0 : sizeof(buf), sentinel=nothing, wh1::Union{UInt8, Char}=UInt8(' '), wh2::Union{UInt8, Char}=UInt8('\t'), quoted::Bool=true, openquotechar::Union{UInt8, Char}=UInt8('"'), closequotechar::Union{UInt8, Char}=UInt8('"'), escapechar::Union{UInt8, Char}=UInt8('"'), ignorerepeated::Bool=false, ignoreemptylines::Bool=false, delim::Union{UInt8, Char, PtrLen, AbstractString, Nothing}=UInt8(','), decimal::Union{UInt8, Char}=UInt8('.'), comment=nothing, trues=nothing, falses=nothing, dateformat::Union{Nothing, String, Dates.DateFormat}=nothing, debug::Bool=false, stripwhitespace::Bool=false, stripquoted::Bool=false) where {T}

  2. The second hits this, which uses Parsers.XOPTIONS

    Parsers.jl/src/Parsers.jl

    Lines 217 to 218 in e2259a6

    xparse(::Type{T}, buf::AbstractString, pos, len, options=Parsers.XOPTIONS) where {T} =
    xparse(T, codeunits(buf), pos, len, options)

    • and XOPTIONS has quoted=true
      const XOPTIONS = Options(missing, UInt8(' '), UInt8('\t'), UInt8('"'), UInt8('"'), UInt8('"'), UInt8(','), UInt8('.'), nothing, nothing, nothing, false, false, nothing, true, false, false)
    • (aside: Why does XOPTIONS exist?)
  3. The third hits the same method as 2, but passing in Parsers.Options() which has quoted=false:

    quoted::Bool=false,

  4. The fourth hits the same method as 2/3, but passes in Parsers.Options(; quoted=true) to set that explicitly ...but returns a different answer to 1 (xparse(String, str; quoted=true)), because Options() defaults to delim=nothing whereas 1 sets delim=UInt8(',')

    delim::Union{Nothing, UInt8, Char, String}=nothing,

This is now very off-topic from your original issue (sorry), so can move it to a new issue, but i think perhaps we could simplify the xparse interface to make this whole thing a little less confusing / more explicit.

I think i'd be in favour of requiring a user-given ::Options argument.

cc @quinnj


EDIT: opened #124

@quinnj
Copy link
Member

quinnj commented Jun 15, 2022

0x00 is fine to use as the 3rd argument if you're not dealing with quoted/escaped characters at all. Basically any byte that shouldn't occur in your string. Parsers.getstring just checks if the PosLen is escaped and, if so, will do the unescaping for you based on that 3rd argument, so if you know the PosLen will never be escaped (i.e. you're not using quote or escape chars), then it's fine to just ignore all of that.

@nickrobinson251, yeah, the xparse methods have kind of grown into a bit of a spaghetti mess; they were each born from different use-cases/motivations, but I agree it's gotten a bit intractable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants