Description
Follow-up to #4564. There we set `logicalYN=FALSE` by default for back-compatibility, because auto-detection of the header currently has a bad interaction with `logicalYN`: `fread` finds `y` in the header row and thinks "that's non-character data, namely `TRUE`", thereby concluding `header=FALSE`. See the tests in that PR.
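To make the failure mode concrete, here is a minimal Python sketch. It is not `fread`'s actual parser: `infer_type` and `guess_header` are hypothetical helpers, and the rule "header is TRUE only if every non-empty field in row 1 is character" is a simplified stand-in for the real detection logic.

```python
def infer_type(field, logical_yn=False):
    """Coarse type of one raw text field (hypothetical, simplified helper)."""
    text = field.strip()
    lowered = text.lower()
    if lowered in ("true", "false"):
        return "logical"
    # Assumption: with logical_yn on, a bare y/n field is read as logical.
    if logical_yn and lowered in ("y", "n"):
        return "logical"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        return "character"


def guess_header(first_row, logical_yn=False):
    """Simplified rule: header only if every non-empty field is character."""
    types = [infer_type(f, logical_yn) for f in first_row if f.strip()]
    return all(t == "character" for t in types)


first_row = ["id", "y"]  # a genuine header that happens to contain a column named y
print(guess_header(first_row, logical_yn=False))  # True
print(guess_header(first_row, logical_yn=True))   # False: "y" was typed as logical
```

The column name `y` is the only thing that changes between the two calls, which is why the default was left off in the earlier PR.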
Ideally we could disable this parser when reading the header row, but I don't think we have any logic for restricting the set of valid parsers for the header only.
We'd also want to think through whether doing so would create any undesirable edge cases. The only thing I can think of is something pathological like a file with no header (so `header=FALSE` is correct) whose fields are all `y` or `n`: we might then incorrectly detect `header=TRUE` based on the string-only data in the first row. AFAICT any realistic example where there are other columns would do fine. But perhaps we can also revisit the header detection logic. I think something like "do type detection on row 1; then do type detection on samples of rows 2...N, and compare the inferred types" would be fine.
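One possible reading of the "compare the inferred types" idea, sketched in Python. Everything here is an assumption made for illustration: the helper names, the type ranking, and the tie-break that falls back to an "all character in row 1" rule when the comparison is inconclusive. It also ignores empty fields and says nothing about how `fread` samples rows.

```python
TYPE_RANK = {"logical": 0, "integer": 1, "double": 2, "character": 3}


def infer_type(field, logical_yn=True):
    """Coarse type of one raw text field (hypothetical, simplified helper)."""
    text = field.strip()
    lowered = text.lower()
    if lowered in ("true", "false") or (logical_yn and lowered in ("y", "n")):
        return "logical"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        return "character"


def row_types(row, logical_yn=True):
    return [infer_type(f, logical_yn) for f in row]


def column_types(rows, logical_yn=True):
    """Widest type seen in each column across the sampled rows."""
    per_row = [row_types(r, logical_yn) for r in rows]
    return [max(col, key=TYPE_RANK.__getitem__) for col in zip(*per_row)]


def guess_header_by_comparison(rows, logical_yn=True):
    """Type row 1 and a sample of rows 2...N separately, then compare."""
    first_types = row_types(rows[0], logical_yn)
    sample = rows[1:]
    if sample:
        body_types = column_types(sample, logical_yn)
        # Row 1 is text in a column whose data is not text: treat it as a header.
        if any(f == "character" and b != "character"
               for f, b in zip(first_types, body_types)):
            return True
    # Inconclusive: fall back to the simplified "all character" rule.
    return all(t == "character" for t in first_types)


print(guess_header_by_comparison([["id", "y"], ["1", "n"], ["2", "y"]]))    # True
print(guess_header_by_comparison([["y", "n"], ["n", "y"], ["y", "y"]]))    # False
```

In this sketch the headerless all-`y`/`n` file is no longer mistaken for one with a header, because row 1 has the same types as the rows below it. A file whose real column names are only `y` and `n` would still be misread, so the pathological case moves rather than disappears.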
Anyway, if we can remove these false positives of detecting `y` --> `TRUE`, I think we could switch to `logicalYN=TRUE` by default.