Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DateTime incoherent value #649

Open
eJJonatah opened this issue Aug 18, 2023 · 9 comments · May be fixed by #650
Open

DateTime incoherent value #649

eJJonatah opened this issue Aug 18, 2023 · 9 comments · May be fixed by #650
Labels
Milestone

Comments

@eJJonatah
Copy link

eJJonatah commented Aug 18, 2023

I'm using the ExcelDataReader library to read an excel file that comes from an API web request

i read the content as Stream and pass it into the ExcelReaderFactory.CreateReader.
All works really well but, in the target table there is a date column that is being read with the wrong value.
The column is in the positon 14 of the table and is the only date column in that table so it cannot be mispositioning.
All The values from that column start in year 2023 and represent deadlines for orders. For some reason, whenever i try to
read the value in the IExcelDataReader i allways receive the value 31/08/2019, allways. I tried debugging into the library but
i couldn't find the reason why this is happening in this case. I have a copy of that table and can ensure that none of the orders have their deadline prior to 2021.

	PendingInfo[] pendings;
        using (pendingTitlesReport )// the httpresponse message
        using (var reportbin = await pendingTitlesReport.Content.ReadAsStreamAsync())
        using (var filereader = ExcelReaderFactory.CreateReader(reportbin))
        {
            pendings = new PendingInfo[filereader.RowCount - 1];
            filereader.Read(); // start the reading
            for (int index = 0; filereader.Read();)
            {
                pendings[index++] = new PendingInfo
                {
                    EndOfCteCode = (string)filereader[EndCTEColumn],
                    BruteValue = (int)Convert.ChangeType(filereader[BruteValueColumn], typeof(int)),
                    ExpiringDate = DateOnly.FromDateTime(((DateTime)filereader[ExpiringDateColumn]))
                };
            }
        }

debugging:
image
logreport
dates

any idea what could be causing this problem?

@andersnm
Copy link
Collaborator

Hi @eJJonatah, can you share a file having this problem?

(If not, and its xlsx, can you unzip it and paste some examples what those dates look like in the sheet xml?)
(If not, and its xls or xlsb, its going to be difficult to help without a file to repro)

@eJJonatah
Copy link
Author

eJJonatah commented Aug 19, 2023

Hi @andersnm, thank you for your response.
I downloaded the same file and read it in my pc and got everything right
image
I'm thinking that something can be messed up because its a web request, i can test transfering those bytes to a file then reading the file but only at monday did that kind of problem ever occured?

@eJJonatah
Copy link
Author

eJJonatah commented Aug 21, 2023

when i tested in my computer the dates returned right, what I realized is in order to transfer this file to my computer I had to save it using Excel.

The stream from the request has ~2079 bytes while the (saved by Excel) file has ~5569 so I compared the byte structure in both files in notepad++ and tested a lot to ensure that the original version in fact never returns the dates right (every other data is correct) but the modified one is perfectly fine. I tested this behavior in 3 machines and the results where the same.

The request stream is probably from a sql functions that converts a query into an excel report. Because the end of the file is signed as "Implementation by Anton Scheffer"

an excel save solve its all. I will try to edit the binaries to remove sensitive data and provide it and provide the raw file here, you can access it through excel but, cannot save.

@andersnm
Copy link
Collaborator

@eJJonatah It might be premature to close this. If the original file opens correctly in Excel, then one could argue it should read in ExcelDataReader too. The key facts so far are:

  • the excel file contains the binary string "Implementation by Anton Scheffer"
  • the date discrepancy is exactly the result if the "1904 Date" flag was missing or wrong

Does not seem to be any Excel writer repos in https://github.com/antonscheffer but found this in a different repo
https://github.com/sokolsaiti/as_xlsx/blob/master/as_xlsx.sql

The problem can likely be reproduced using the original library, whatever language it is (PL/SQL for Oracle??). It sounds like ExcelDataReader cannot read its output "date1904" workbook flag for some reason:

https://github.com/sokolsaiti/as_xlsx/blob/f268ec3bce9cd44aac600c6f183fc9bc4568fdbb/as_xlsx.sql#L1742

@andersnm
Copy link
Collaborator

Well wouldn't you know. Here's the code that parses the 1904 flag:

bool date1904 = Reader.GetAttribute("date1904") == "1";
yield return new WorkbookPrRecord(date1904);

vs the code that generates the 1904 flag:

https://github.com/sokolsaiti/as_xlsx/blob/f268ec3bce9cd44aac600c6f183fc9bc4568fdbb/as_xlsx.sql#L1742

<workbookPr date1904="true" defaultThemeVersion="124226"/>

Took a while to notice, but indeed 1 is not equal to true, so there's the bug.

@andersnm andersnm reopened this Aug 22, 2023
@eJJonatah
Copy link
Author

so the function by schaffer sets the flag to a true literall and the library was expecting a numeric representation of the boolean, this explains why the excel save solves it, since it would probably overwrite the "true" to the default from excel probably "1"

@eJJonatah
Copy link
Author

eJJonatah commented Aug 22, 2023

A possible solution for this would be to use C# 7 Pattern matching mdsn: source

bool date1904 = Reader.GetAttribute("date1904") is "1" or "true" or "True";
date1904 = Reader.GetAttribute("date1904").Trim().ToLower() is "1" or "true";
// if the literals are strongly patternized then
date1904 = Reader.GetAttribute("date1904") is "1" or "true";

does the library platform supports this feature? I'm not sure but i think its faster than calling equals

@andersnm
Copy link
Collaborator

A PR for this is welcome :) It'd be good to craft a couple .xlsx with different values for date1904 and reverse more precisely how Excel interprets the value, f.ex are only "1" and "true" valid options or can it be "yes" or "blawhabhah"; what is the default for invalid values, etc. And include the findings as tests and .xlsx in the PR. (I'm not familiar with pattern matching syntax so cannot vouch for the suggestions - the third one looks good as a quick fix, keep in mind the value can be null, so no unvalidated .Trim() etc )

@eJJonatah
Copy link
Author

eJJonatah commented Aug 22, 2023

   <workbookPr date1904="1" defaultThemeVersion="124226" />
   <!-- tested:
      <workbookPr date1904="1" defaultThemeVersion="124226" /> //works
      <workbookPr date1904="true" defaultThemeVersion="124226" /> //works
      <workbookPr date1904="True" defaultThemeVersion="124226" /> //doesn't work.
      <workbookPr date1904="TRUE" defaultThemeVersion="124226" /> //doens't work.
      <workbookPr date1904="blawhabhah" defaultThemeVersion="124226" /> //doens't work.
   -->

I created a test file to test the possibilities and found out that the only values that works are 1 and true
to activate the Date1904 behavior, so i cloned with a branch to apply the third suggestion in the OpenXmlReader
to handle this text boolean possibility.

eJJonatah added a commit to eJJonatah/ExcelDataReader that referenced this issue Aug 22, 2023
…rom xl/worbook.xml. The only two possible values that excel recognises are "true" and "1"

also, pattern matching is more readable than equals "==" aldo this can be easily changed if breaks the project pattern. ExcelDataReader#649 third solve suggestion
@appel1 appel1 added the bug label Aug 23, 2023
@appel1 appel1 added this to the 3.7 milestone Aug 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants