You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Follow-up to #4564. There we set logicalYN=FALSE by default for back-compatibility because currently auto-detection of header has a bad interaction with logicalYN -- fread finds y in the header row and thinks "that's non-character data, namely, TRUE", thereby concluding header=FALSE. See the tests in that PR.
Ideally we can disable this parser for the header, but I don't think we have any logic for subsetting the valid parsers for the header only.
We'd also want to think through if doing so will create any undesirable edge cases. The only thing I can think of is something pathological like a file where header=FALSE that consists only of columns named y or n, then we might incorrectly detect header=TRUE based on the string-only data in the first row. AFAICT any realistic example where there are other columns would do fine. But perhaps we can also revisit the header detection logic -- I might think something like "do type detection on row 1; then do type detection on samples of rows 2...N, and compare the inferred types" would be fine.
Anyway, if we can remove these false positives of detecting y --> TRUE, I think we could switch to logicalYN=TRUE by default.
The text was updated successfully, but these errors were encountered:
19.IntheNEWSfor v1.11.0 (May2018), section"NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES"item2, westatedtheintentiontoeventuallychange`logical01`tobe`TRUE`bydefault.Aftersomeconsideration, reflection, andcommunityinput, wehavedecidednottomoveforwardwiththisplan, i.e., `logical01`willremain`FALSE`bydefaultinboth`fread()`and`fwrite()`.Seediscussionin#5856; most importantly, we think changing the default would be a major disruption to reading "sharded" CSVs where data with the same schema is split into many files, some of which could be converted to `logical` while others remain `integer`. We will retain the option `datatable.logical01` for users who wish to use a different default -- for example, if you are doing input/output on tables with a large number of logical columns, where writing '0'/'1' to the CSV many millions of times is preferable to writing 'TRUE'/'FALSE'.
So we probably shouldn't plan to switch the default. Though we still have some vestiges of that discussion to remove from the docs...
Follow-up to #4564. There we set
logicalYN=FALSE
by default for back-compatibility because currently auto-detection of header has a bad interaction withlogicalYN
--fread
findsy
in the header row and thinks "that's non-character data, namely,TRUE
", thereby concludingheader=FALSE
. See the tests in that PR.Ideally we can disable this parser for the header, but I don't think we have any logic for subsetting the valid parsers for the header only.
We'd also want to think through if doing so will create any undesirable edge cases. The only thing I can think of is something pathological like a file where
header=FALSE
that consists only of columns namedy
orn
, then we might incorrectly detectheader=TRUE
based on the string-only data in the first row. AFAICT any realistic example where there are other columns would do fine. But perhaps we can also revisit the header detection logic -- I might think something like "do type detection on row 1; then do type detection on samples of rows 2...N, and compare the inferred types" would be fine.Anyway, if we can remove these false positives of detecting
y --> TRUE
, I think we could switch tologicalYN=TRUE
by default.The text was updated successfully, but these errors were encountered: