fread(logicalYN=TRUE) could avoid detecting 'y'/'n' in header detection? #6643

Follow-up to #4564. There we set `logicalYN=FALSE` by default for backward compatibility, because header auto-detection currently interacts badly with `logicalYN`. `fread` finds `y` in the header row, decides "that's non-character data, namely `TRUE`", and therefore concludes `header=FALSE`. See the tests in that PR.

Ideally we would disable this parser when typing the header row, but I don't think we currently have any logic for restricting the set of valid parsers for the header row only.
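To make the interaction concrete, here is a minimal Python sketch of a deliberately simplified header rule: row 1 is treated as a header only if every non-empty field in it types as a string. This is not `fread`'s actual C logic. The parser table, the helper names (`looks_like_header`, `header_parsers`, etc.) and the set of literals accepted under `logicalYN` (assumed here to be `Y`/`y`/`N`/`n`) are all hypothetical. `header_parsers` models the header-only parser subset described above.

```python
# Hypothetical, simplified model of header auto-detection; NOT data.table's
# C implementation. Modelled rule: row 1 is a header only if every non-empty
# field in it types as a string.

def is_int(field):
    """True if the field parses as an integer."""
    try:
        int(field)
        return True
    except ValueError:
        return False


def is_yn(field):
    """Assumed logicalYN literals; the real accepted set may differ."""
    return field in ("Y", "y", "N", "n")


INT_PARSER = ("integer", is_int)
YN_PARSER = ("logical", is_yn)


def infer_type(field, parsers):
    """Return the name of the first parser that accepts `field`, else 'string'."""
    for name, accepts in parsers:
        if accepts(field):
            return name
    return "string"


def looks_like_header(rows, parsers, header_parsers=None):
    """Decide header=TRUE/FALSE from row 1 alone.

    `header_parsers` is the hypothetical header-only parser subset; when None,
    row 1 is typed with the same parsers as the data.
    """
    if header_parsers is None:
        header_parsers = parsers
    fields = [f for f in rows[0] if f != ""]
    return all(infer_type(f, header_parsers) == "string" for f in fields)


rows = [["id", "y"], ["1", "n"], ["2", "y"]]
all_parsers = [INT_PARSER, YN_PARSER]

print(looks_like_header(rows, [INT_PARSER]))  # True  (like logicalYN=FALSE)
print(looks_like_header(rows, all_parsers))   # False (header 'y' typed as logical)
print(looks_like_header(rows, all_parsers, header_parsers=[INT_PARSER]))  # True

# Pathological case: a headerless file containing only y/n columns.
headerless = [["y", "n"], ["n", "y"]]
print(looks_like_header(headerless, all_parsers, header_parsers=[INT_PARSER]))  # True (wrong)
```

With `y`/`n` typing enabled for row 1, the column name `y` flips the decision to `header=FALSE`. Excluding that parser for row 1 restores `header=TRUE` for the realistic file, but misreads a headerless file made up only of `y`/`n` columns.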

We'd also want to think through whether doing so creates any undesirable edge cases. The only one I can think of is pathological: a genuinely headerless file (true `header=FALSE`) whose columns contain _only_ `y`/`n` values. With the `y`/`n` parser disabled for row 1, that row would look like all-string data, so we might incorrectly detect `header=TRUE`. AFAICT any realistic file that also has other columns would be handled fine. But perhaps we could also revisit the header-detection logic itself; something like "run type detection on row 1, run type detection on samples of rows 2..N, and compare the inferred types" might work.
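The comparison idea can be sketched the same way. This hypothetical Python version types row 1 and the remaining rows separately, with the `y`/`n` parser enabled everywhere, and reports a header only when some column types as a string in row 1 but not in the body. The function names, the small type set, and the `y`/`n` literals are assumptions, not `fread` internals; the issue suggests sampling rows 2..N, whereas this sketch simply uses all of them.

```python
# Hypothetical sketch of the "compare row 1 vs. body types" idea; NOT
# data.table's implementation. The y/n parser is enabled for every row.

def infer_type(field):
    """Type a single field: integer, logical (assumed y/n literals), or string."""
    try:
        int(field)
        return "integer"
    except ValueError:
        pass
    if field in ("Y", "y", "N", "n"):  # assumed logicalYN literals
        return "logical"
    return "string"


def column_types(rows):
    """One type per column over `rows`; mixed types fall back to 'string'."""
    types = []
    for column in zip(*rows):
        seen = {infer_type(f) for f in column}
        types.append(seen.pop() if len(seen) == 1 else "string")
    return types


def looks_like_header(rows):
    """Header if some column is a string in row 1 but non-string in the body."""
    first = [infer_type(f) for f in rows[0]]
    body = column_types(rows[1:])  # the issue suggests a sample of rows 2..N
    return any(h == "string" and b != "string" for h, b in zip(first, body))


print(looks_like_header([["id", "y"], ["1", "n"], ["2", "y"]]))  # True: 'id' vs integer
print(looks_like_header([["y", "n"], ["n", "y"], ["y", "y"]]))  # False: logical in both
```

Because a `y` column name matches the type of its `y`/`n` data, it no longer counts against `header=TRUE`, and the headerless all-`y`/`n` file stays `header=FALSE`. Two cases remain undecidable from types alone: a header row that is literally `y`/`n` over `y`/`n` data, and an all-string file. Both come out header-less here, which is an arbitrary choice of this sketch.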

Anyway, if we can eliminate these `y --> TRUE` false positives in header detection, I think we could switch to `logicalYN=TRUE` by default.
