
fread(logicalYN=TRUE) could avoid detecting 'y'/'n' in header detection? #6643

Open
@MichaelChirico


Follow-up to #4564. There we set logicalYN=FALSE by default for backward compatibility, because auto-detection of the header currently interacts badly with logicalYN: fread finds y in the header row and thinks "that's non-character data, namely TRUE", thereby concluding header=FALSE. See the tests in that PR.
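To make the failure mode concrete, here is a minimal Python sketch of the interaction described above. It is a simplified stand-in, not fread's actual C implementation: the rule "any non-character field in the first row means header=FALSE" is taken from the description in this issue, and the names `is_non_character` and `guess_header` are hypothetical.

```python
# Simplified stand-in for the header heuristic described above.
# Not fread's real code; function names here are hypothetical.

def is_non_character(field, logical_yn):
    """Return True if the field parses as something other than a plain string."""
    if logical_yn and field in ("y", "n"):
        return True  # the y/n parser claims the field as a logical value
    try:
        float(field)
        return True  # numeric data
    except ValueError:
        return False  # nothing matched, so it stays a string


def guess_header(first_row, logical_yn):
    """Conclude header=FALSE as soon as any first-row field looks like data."""
    return not any(is_non_character(f, logical_yn) for f in first_row)


first_row = ["x", "y", "z"]  # an ordinary header that happens to contain "y"
print(guess_header(first_row, logical_yn=False))  # True
print(guess_header(first_row, logical_yn=True))   # False
```

With the y/n parser off, the row is all strings and is treated as a header. With it on, the column name y is read as data, so the heuristic gives up on the header.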

Ideally we can disable this parser for the header, but I don't think we have any logic for subsetting the valid parsers for the header only.

We'd also want to think through whether doing so creates any undesirable edge cases. The only one I can think of is something pathological: a file with header=FALSE whose columns all contain only y or n values. With the y/n parser disabled for the first row, that row looks like string-only data, so we might incorrectly detect header=TRUE. AFAICT any realistic example where there are other columns would do fine.

But perhaps we can also revisit the header detection logic. I think something like "do type detection on row 1; then do type detection on samples of rows 2...N, and compare the inferred types" would be fine.
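The comparison idea can be sketched roughly as follows, in Python for illustration only. This is one possible reading of the proposal, not existing fread logic: the y/n parser is disabled for row 1 and enabled for the sampled data rows, and the header is accepted when row 1 is all strings and its types differ from the sampled column types. All function names, the three-type lattice, and the choice to return False for a single-row file are assumptions.

```python
# Illustrative sketch of "compare row 1 types against sampled data types".
# Not fread's real code; names and the type set are hypothetical.

def infer_type(field, allow_yn):
    """Classify one field as 'logical', 'numeric', or 'string'."""
    if allow_yn and field in ("y", "n"):
        return "logical"
    try:
        float(field)
        return "numeric"
    except ValueError:
        return "string"


def column_type(fields, allow_yn):
    """Single type shared by every field in the column, else 'string'."""
    types = {infer_type(f, allow_yn) for f in fields}
    return types.pop() if len(types) == 1 else "string"


def guess_header(rows):
    """Row 1 is a header if it is all strings and the data rows are typed differently."""
    first, rest = rows[0], rows[1:]
    if not rest:
        return False  # assumption: nothing to compare against, so no header
    # The y/n parser is switched off for the candidate header row only.
    first_types = [infer_type(f, allow_yn=False) for f in first]
    sample_types = [column_type(col, allow_yn=True) for col in zip(*rest)]
    return all(t == "string" for t in first_types) and first_types != sample_types


with_header = [["id", "y"], ["1", "y"], ["2", "n"]]
no_header = [["1", "y"], ["2", "n"]]
pathological = [["y", "n"], ["n", "y"], ["y", "y"]]  # headerless, y/n columns only

print(guess_header(with_header))   # True
print(guess_header(no_header))     # False
print(guess_header(pathological))  # True (the false positive discussed above)
```

The last case reproduces the pathological file: because row 1 is never parsed as y/n, it reads as strings while the remaining rows read as logical, so the sketch wrongly reports a header. The headerless file with a numeric column is handled correctly.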

Anyway, if we can remove these false positives of detecting y --> TRUE, I think we could switch to logicalYN=TRUE by default.
