Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cell::fromValue() incorrectly detect a formula #132

Open
geoffroyp opened this issue Jan 6, 2023 · 3 comments
Open

Cell::fromValue() incorrectly detect a formula #132

geoffroyp opened this issue Jan 6, 2023 · 3 comments
Labels
question Further information is requested

Comments

@geoffroyp
Copy link

geoffroyp commented Jan 6, 2023

Greetings beautiful people who made this bundle!

I've found today that Cell::fromValue() automatically consider any string starting by = as a formula.
While I totally understand why you did this, I believe it's not the way to handle it. I think that if people want to create a cell for a formula, they should use new FormulaCell() instead, since it's a more specific ude case.

Most people will use Cell::fromValue() with "basic" database data (string, int, date...). And an = sign could be used in many of "normal" field. For instance, in a phone number field, if someone miss the shift key to make a "+", it turns it into a "=" (yes, it actually happened to me).

But also in a "comment" field. Some people sometimes start their comment with a "=", for example when they explain something, or give the full meaning of an acronym, etc...

@Slamdunk Slamdunk added the question Further information is requested label Jan 9, 2023
@Slamdunk
Copy link
Contributor

Slamdunk commented Jan 9, 2023

Hi, thank you for raising this topic, but I disagree with your PoV on the topic: = is a well enstablished standard for starting a formula and I'm pretty sure 100% of the user base of this library are aware of that.

It would be a PITA for everyone having to explicitely call FormulaCell for formula cells: explicitness must comes the other way around, for non formulas starting with =

@geoffroyp
Copy link
Author

geoffroyp commented Jan 9, 2023

I agree with you that = is a well established standard, but what I am trying to say is that you never can control what users insert in any field, especially string fields. Therefore, they can start their string with = for any reason, whether because they choose to, or because they made a typo.

This would later lead to unforeseen bugs.

Consider the following case (which happened to me, hence this topic):

  • You create a "export" page on your website where users can export their own data in various format: CSV, XLSX and ODS. Data is huge, thousands of rows, possibly million.
  • When you test it with of few users, everything works fine because in their data, none of them started by an =. Thus, you believe your code works perfectly
  • Then suddently a user complain it doesn't work when he tries to export as CSV
  • You have to dig through thousands of line, which take a lot of time to you and you computer. A real PITA ;)
  • You finally find out it crashes because there's an = starting a phone field because the man wanted to make a + but failed...Thus, openspout believe it's a formula, but since CSV Writer does not accept formulas, it crashes
  • You fix the data butit still doesn't work. You dig again in the data (again, PITA) to find out he also wrote a comment one day starting with =

So, bottom line is that I do understand your point, and you are probably right about it. But the current situation is also painful for other use case.
Maybe that one reasonable fix could be that CSV writer accepts FormulaCell, but simply convert them to strings?
...Or maybe that there should not be any "Cell" class which guess the cell type, forcing every users to use the real cell types, avoiding any mistakes?
At the very least, there should be a big warning in the documentation that using Cell class could lead to unwanted results, and that using typed cells is recommended.

Additionally, I find it regretable that Cell class is abstract (and thus, can be extended) but fromValue method is final (thus, cannot be overloaded), which would help individually solving this case by adapting it to our own use case

@Slamdunk
Copy link
Contributor

Slamdunk commented Jan 9, 2023

what I am trying to say is that you never can control what users insert in any field

That's very wrong: you should indeed control what your users input, or at least how that inputs are treated during export.
If you have any systematic export tool, you should never rely on Cell::fromValue and instead instantiate a each CSV/XSLX/ODS field (new StringCell, new NumericCell...) by hand so you don't end up with unexpected bugs

Maybe that one reasonable fix could be that CSV writer accepts FormulaCell, but simply convert them to strings?

I'd accept a PR with such change

Or maybe that there should not be any "Cell" class which guess the cell type, forcing every users to use the real cell types, avoiding any mistakes?

You are not forced to use Cell::fromValue, you can skip it altogether if you don't like it

At the very least, there should be a big warning in the documentation that using Cell class could lead to unwanted results, and that using typed cells is recommended.

I'd accept a PR for this too 👍

Additionally, I find it regrettable that Cell class is abstract (and thus, can be extended) but fromValue method is final (thus, cannot be overloaded), which would help individually solving this case by adapting it to our own use case

That's a defensive architectural choice: it leads to much fewer bugs, or at least much more obvious bugs.
Cell::fromValue and Row::fromValues are just factory methods, you are free to ignore them if you need

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants