-
Notifications
You must be signed in to change notification settings - Fork 953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get OA dates as double instead of converted to DateTime/TimeSpan? #545
Comments
Hi, This is something I've wanted myself for the ExcelNumberFormat project in order to properly support those invalid dates that Excel supports (f.ex January 0 1900 and February 29 1900). Although to be useful, the reader must also expose the "Is1904" flag how to interpret the double. However, in your case, is there any reason you cannot just use |
Yes, the I have some reasons for wanting the raw OA date. I have some tools built around this library and everything is converted to strings with the overridden and convenient Ok, so, admittedly the above situation does not preclude me from specifically detecting a Also, according to something a friend found, it appears that certain OA dates converted to So, hopefully, I have adequately described the reasons why simply retrieving the raw OA date and avoiding "filtering" it through a |
I still don't quite get it. It sounds like all your concerns are related to the default formatting provided by the .NET runtime. You would see the same behavior whether you get a raw double from ExcelDataReader or from ToOADate(). |
Have you looked at this section: https://github.com/ExcelDataReader/ExcelDataReader#formatting If you want to print a value as date-only, or time-only, you cannot rely on the raw values in the file. Neither Excel nor .NET knows how to print those without a number format. Thus you cannot use |
As per my previous post, the main takeaway should have been is that "filtering" the raw OA date double through a
Sure I can. And do correct me if my simple assumption here is wrong somehow. Inspect the double. Whole number part only? Pure-date. Fractional part only with leading The following data, code, and output, illustrates my point: data.xlsx:(I do not seem to be able to attach things so here are the three raw OA date doubles in the file)
Code:using ExcelDataReader;
using System;
using System.IO;
using System.Text;
namespace ExcelDataReaderTest
{
static class Program
{
static void Main(string[] args)
{
Console.WriteLine("Hello, ExcelDataReaderTest!");
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
var filePath = "..\\..\\..\\data\\data.xlsx";
var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
IExcelDataReader reader = ExcelReaderFactory.CreateReader(stream);
Console.WriteLine("Column | FieldType");
do
{
while (reader.Read())
{
for(var i = 0; i < reader.FieldCount; ++i)
{
var fieldType = reader.GetFieldType(i);
Console.WriteLine($"{i} {fieldType}");
if (fieldType.Name == "DateTime")
{
var dt = reader.GetDateTime(i);
var oa = dt.ToOADate();
Console.WriteLine($" {dt}");
Console.WriteLine($" {oa}");
}
}
}
} while (reader.NextResult());
reader.Dispose();
stream.Dispose();
}
}
} Ouput:
Do you see the issue? After "filtering" through the |
It is the number format that decides how values are printed in Excel. You can apply identical logic to a DateTime object: Time is midnight? Pure-date. Date-part is 1899-12-31? Pure-time. Both? Yes. Neither is correct, but gives identical results and kind of works. Regarding your netcore inconsistencies, here's some info about changes to formatting in .NET Core 3.0: As far as I can tell, having the OA date might simplify things for you, but it's not a blocker. |
I am sorry, but "kind of works" is not really good enough. Sure, looking for "1899-12-31" would be ok, but any recent date at exactly midnight, while improbable, is still very valid and therefore is inappropriate for doing a pure-date check. Thank you for the .NET Core 3.0 format changes info. Explains things, but I still have to deal with the aftermath. And dealing with that would be easier with access to the OA date. Look. You have made a great library here and it has made my life easier. Props. But I respectfully disagree that this is not blocking. It is for me. I am sure most users will use and appreciate the convenient internal conversion from OA date to |
There is no difference between a DateTime at midnight and a whole number. Technically they are both just numbers that require additional information through a number format to tell if its date-only, time-only or both. That's why I'm saying both approaches "kind of works" - but neither is correct. You must use the number format if correctness is a requirement. A raw OA date suffers exactly the same issue as a DateTime. If you want to contribute a PR for an option that returns OA dates from ExcelDataReader then that's ok. But it won't help to determine if a value is date-only, time-only or both in a 100% way. |
I don't think any of us is against providing this as a feature somehow. But just to make sure I understand. You want to differentiate between date and time by how it is stored as an OADate and not by how it is formatted to look in Excel? So if the OADate is |
Are you sure about that? I feel like the example I provided proves that you can indeed tell the difference with the OA date only, no additional number formats are required. |
Some time in the future I'd love to expose this as the new DateOnly and TimeOnly structs that will be introduced in .NET 6 instead. Perhaps something to take into consideration if/when we decide how to expose the raw OADates doubles. |
@appel1 Those Heck, while I would advocate for providing access to the double regardless, a |
Here's an example of the same values repeated, but with different number formats for date-only, time-only or both in each row: If we were to implement OA dates in ExcelDataReader, then all rows would return identical doubles in columns 2-4. You couldn't tell the "OA date" doubles apart from the "General" doubles without examining the number format. The underlying values are still "just doubles" with no intrinsic information about presentation. That's all in the number format. |
Oh, I misunderstood your original post about that. Good point. Another value in a enum DateTimeType
{
None, // NotADateTime
DateAndTime,
DateOnly,
TimeOnly
} |
@jbrockerville It is important to understand how number formats work when your requirement is to produce output similar to Excel. Feels like we're missing the full picture and only discussing details out of incorrect assumptions. You can derive a Let me just stress that the |
Hi,
Incredibly useful library! However, like some other users, I have a date-time issue. The ones I looked at, and mine, seem to stem from the fact that the worksheet implementations detect and convert OA dates to DateTime or TimeSpan objects. For example, the XlsxWorksheet. From there, the problems seem like issues with system region formatting and Excel formatting and their interplay. This was my issue too until I figured out that the library was doing a conversion before the system region formatting came into play. I worked through it by understanding it. Cool. But I have found instances where I simply do not want any conversion whatsoever. I want to take Excel's and the system's formatting completely out of the equation and work with the raw double. Unfortunately, there does not seem to be an option to simply not do this and return the raw OA date double value, or at least none that I could find. Can you do that? Just return the OA date double? If not, could you please add the option to do so?
Thanks!
-jb
The text was updated successfully, but these errors were encountered: