Testcase "[PASS] Floating point numbers are compared with a 1e-6 tolerance 1-4/4" #78

Open
hesrah opened this issue Sep 12, 2022 · 25 comments · May be fixed by #304
Labels: Critical Discussion, documentation


hesrah commented Sep 12, 2022

[PASS] Floating point numbers are compared with a 1e-6 tolerance 1/4
`#23=IFCPROPERTYSINGLEVALUE('Foo',$,IFCREAL(42.000042),$);` is PASS with a 1e-4 tolerance but not with a 1e-6 tolerance?

Ditto for [PASS] Floating point numbers are compared with a 1e-6 tolerance 2/4
`#23=IFCPROPERTYSINGLEVALUE('Foo',$,IFCREAL(41.999958),$);` -> 41.99999958

And maybe in [FAIL] Floating point numbers are compared with a 1e-6 tolerance 3/4
you wanted to write `#23=IFCPROPERTYSINGLEVALUE('Foo',$,IFCREAL(42.00000084),$);`? (It is a FAIL either way.)

And analogously in [FAIL] Floating point numbers are compared with a 1e-6 tolerance 4/4.

Moult commented Sep 12, 2022

For this one I used the 1 ± 1e-6 equation provided by @aothms - can you confirm?

Moult commented Sep 12, 2022

#36 (comment) is the comment by @aothms which informed this testcase. Happy to modify but I'll need an expert to confirm the approach :)

aothms commented Sep 13, 2022

Exactly. We have to be adaptive to both large and small values, especially since we only use SI units, so the range from $mm^2$ ( $1 mm^2 = 0.000001 m^2$ ) to $km$ needs to be covered by one tolerance mechanism. If we used a fixed tolerance of the $|a - b| < \epsilon$ kind, it would mean that you can't really constrain the square millimetre of a wire, and that for a railway length you're essentially comparing without tolerance, because $a + \epsilon = a$ in floating point math for any sufficiently large $a$ and small $\epsilon$.

@Moult maybe it would be a bit more illustrative if we use large and small values in the test case, but obviously 42 will also do fine.. ;)
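
A minimal Python sketch of that failure mode (the helper and values here are illustrative, not the agreed IDS algorithm):

```python
eps = 1e-6

# With a fixed epsilon, large values are effectively compared without tolerance:
a = 1e12                     # e.g. a long length expressed in small SI units
print(a + eps == a)          # True: eps vanishes next to a in IEEE 754 doubles

# A relative (scaled) epsilon keeps working at both extremes:
def almost_equal(a, b, eps=1e-6):
    return abs(a - b) <= eps * max(abs(a), abs(b))

print(almost_equal(1e12, 1e12 * (1 + 1e-7)))  # True: within relative tolerance
print(almost_equal(1e-6, 1.5e-6))             # False: 50% relative difference
```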

hesrah commented Sep 13, 2022

Ok, "1e-6 tolerance" means "6 significant digits", if I understand aothom's statement correctly.
This is a difficult topic. I think, we can't give a global significance over all measures. A significance belongs to a unit. But don't ask me how to solve this in IDS. :-D

I would say 6 significant digits are not enough. E.g. values from Swiss coordinate system has 10 significant digits for millimeter - and then we don't talk about round problems with float numbers yet. Or one centimeter in degress for geografic coordinate systems is something like 1E-9 °. So we need more significant digits.

hesrah commented Sep 13, 2022

Oh, I see @aothms answered in the meantime, but my statements seem to remain valid.
I think I understand the mathematical problem, but I still think we can't give a general rule across all units/measures ... ?

aothms commented Sep 13, 2022

I would say 6 significant digits are not enough

That's definitely a valid comment. 6 is rather pessimistic; it's based on the number of significant digits that can safely survive a round trip as a 32-bit float (generally this is 7, but there are a handful of cases where only 6 can be preserved).

We can also say that all implementations should use 64-bit (I'm not aware of applications using 32-bit; you'd generally get into tolerance issues on the geometry side). In that case we can choose any value <= 15.

See https://en.cppreference.com/w/cpp/types/numeric_limits/digits10
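
To illustrate the round-trip loss behind `digits10`, a small Python sketch (the helper is hypothetical):

```python
import struct

def roundtrip32(x: float) -> float:
    """Force x through IEEE 754 binary32 and back to a Python double."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(roundtrip32(0.1))           # 0.10000000149011612: the 8th digit is noise
print(f"{roundtrip32(0.1):.6g}")  # 0.1: six significant digits always survive
```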

aothms commented Sep 13, 2022

But I still think we can't give a general rule across all units/measures

Also note, we're not trying to solve actual construction tolerances; users can do that using minInclusive/maxInclusive. We're just trying to fight rounding errors.

@lasselindqvist

I would say it would be good if you could define your own tolerances anyway, even if there were a default one.
Maybe with something like:

```xml
<property measure="IfcReal" minOccurs="1" maxOccurs="1">
  <propertySet>
    <simpleValue>Foo_Bar</simpleValue>
  </propertySet>
  <name>
    <simpleValue>Foo</simpleValue>
  </name>
  <value>
    <simpleValue>42.</simpleValue>
    <tolerance>0.00001</tolerance>
  </value>
</property>
```

or maybe in some other way. A value plus a tolerance is in some ways a nicer representation than min/max limits.
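
If such a `<tolerance>` facet existed (it is not part of the current schema; this is purely a sketch of the proposal above), checking would presumably reduce to:

```python
def satisfies(actual: float, target: float, tolerance: float) -> bool:
    # <simpleValue>42.</simpleValue> with <tolerance>0.00001</tolerance>
    # would accept any actual value in [target - tolerance, target + tolerance].
    return abs(actual - target) <= tolerance

print(satisfies(42.000004, 42.0, 0.00001))  # True
print(satisfies(42.0001, 42.0, 0.00001))    # False
```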

@pasi-paasiala

One suggestion from @CBenghi is to add reasonable tolerances to the measures table.

CBenghi commented Jun 16, 2023

One suggestion from @CBenghi is to add reasonable tolerances to the measures table.

An objection to this proposal: a closed list of tolerances per measure does not work well for all contexts (e.g. when an IfcLengthMeasure is used in the context of a rail length or a screw thread, it would not be appropriate to use the same tolerance).

Speaking with @aothms, I now understand that the proposed solution is more subtle than what I initially understood, and works as follows:

The actual tolerance has two components:

  1. an absolute component (i.e. 1e-6)
  2. a variable component proportional to the value provided in the IDS constraint (i.e. value × 1e-6)

The second one is significant for large values, while the first one kicks in for values around 0, as the following table demonstrates:

| IDS value | Fixed | Proportional | Tolerance | Range Min | Range Max |
| --- | --- | --- | --- | --- | --- |
| 10000 | 0.000001 | 0.010000 | 0.010001 | 9999.99 | 10000.01 |
| 1 | 0.000001 | 0.000001 | 0.000002 | 0.9999980 | 1.0000020 |
| 0 | 0.000001 | 0.000000 | 0.000001 | -0.0000010 | 0.0000010 |
| 0.1 | 0.000001 | 0.000000 | 0.0000011 | 0.0999989 | 0.1000011 |
| 0.00001 | 0.000001 | 0.000000 | 1.00001E-06 | 0.0000090 | 0.0000110 |

Maybe I was the only one who did not understand this, but I leave it here for future discussion.

This solution seems sensible to me, but we must develop a reasonable spread of use cases to make sure that the two components work across the range of IDS use cases.
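
A sketch of that two-component tolerance (the function name is mine; the table values above are non-negative, and negative values are discussed further down):

```python
EPS = 1e-6

def tolerance(v: float) -> float:
    # absolute component + component proportional to the IDS value
    return EPS + v * EPS

for v in (10000, 1, 0, 0.1, 0.00001):
    t = tolerance(v)
    print(f"{v}: tolerance={t:.6e}, range=[{v - t}, {v + t}]")
```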

aothms commented Jun 16, 2023

I think some detail was lost in the way the solution was presented.

$x = v$ implies $(v \times (1 - ε) - ε) < x < (v \times (1 + ε) + ε)$
with $ε = 10^{-6}$

So there is a relative component and a fixed component (the latter for values around 0).

This means:

| v | lower bound | upper bound |
| --- | --- | --- |
| 0.001 | 0.000998999 | 0.0010010009999999998 |
| 0.0 | -1e-06 | 1e-06 |
| 1.0 | 0.9999979999999999 | 1.0000019999999998 |
| 1000.0 | 999.998999 | 1000.0010009999999 |
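
A direct Python transcription of the rule reproduces this table (a sketch, not a normative implementation):

```python
EPS = 1e-6

def bounds(v: float) -> tuple[float, float]:
    # x == v is accepted iff  v*(1 - eps) - eps < x < v*(1 + eps) + eps
    return v * (1.0 - EPS) - EPS, v * (1.0 + EPS) + EPS

for v in (0.001, 0.0, 1.0, 1000.0):
    lo, hi = bounds(v)
    print(v, lo, hi)
```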

reasonable tolerances to the measures table.

I don't think this can work. A length measure can still be the thickness of a screw thread or the length of a railway track.

I would say it would be good if you could define your own tolerances anyway, even if there was a default one.

I don't disagree, but I think that can wait for a next version if there is end-user demand for it.

gverduci commented Mar 4, 2024

I think some detail was lost in the way the solution was presented.

x=v implies (v×(1.−ε)−ε)<x<(v×(1.+ε)+ε) with ε=1.0e−6

Using the solution presented by @aothms, all the 0.9.7 tolerance tests pass.

andyward commented Mar 6, 2024

I think some detail was lost in the way the solution was presented.

x=v implies (v×(1.−ε)−ε)<x<(v×(1.+ε)+ε) with ε=1.0e−6

So there is a relative component and fixed component (for 0.)

This means:

| v | lower bound | upper bound |
| --- | --- | --- |
| 0.001 | 0.000998999 | 0.0010010009999999998 |
| 0.0 | -1e-06 | 1e-06 |
| 1.0 | 0.9999979999999999 | 1.0000019999999998 |
| 1000.0 | 999.998999 | 1000.0010009999999 |

An interesting quirk of this approach is that as v approaches -1 the upper and lower bounds converge (at v = -1 both equal exactly -1), leaving essentially no spread of values. Elsewhere, until the inputs get large in magnitude, the spread (i.e. the range of FP values we treat as acceptable/equivalent) is consistently around 2e-6. But there is this anomaly. A bit of an example using ε = 1.0e-6:

[images: worked example of the bounds collapsing around v = -1]

Fortunately this falls on -1, not on something more useful like 0, so I suspect it's unlikely to affect many use cases. But worth noting.

aothms commented Mar 21, 2024

@andyward Sharp! Excellent that you bring this up. But I think there is a bigger problem hiding underneath this: for values under -1, the lower bound is actually above the upper bound. So the problem, I'd say, is not so much that the absolute difference between the bounds touches 0, but that the signed difference between upper and lower bound is a linear function that crosses the x axis and becomes negative.

[image: plot of the signed difference between the bounds, crossing zero at v = -1]

`[(v - abs(v)*eps - eps), (v + abs(v)*eps + eps)]`

^ What do you think of this?
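
A quick numerical check of both the crossing and the abs() variant (a sketch):

```python
EPS = 1e-6

def bounds_linear(v):  # original rule: v*(1 ± eps) ± eps
    return v * (1 - EPS) - EPS, v * (1 + EPS) + EPS

def bounds_abs(v):     # proposed fix: v ∓ (abs(v)*eps + eps)
    tol = abs(v) * EPS + EPS
    return v - tol, v + tol

for v in (-2.0, -1.0, 0.0, 1.0):
    print(v, bounds_linear(v), bounds_abs(v))
# bounds_linear(-2.0) gives (-1.999999..., -2.000001...): lower > upper, so
# nothing can ever match. bounds_abs keeps a width of 2*(abs(v)*eps + eps),
# which is strictly positive for every v.
```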

andyward commented Mar 21, 2024

I think we ended up just inverting the logic when testing -ve numbers, at least when testing for equality: XIDS Code

Will have a look at whether your Abs() approach makes more sense. It will probably be clearer

It gets more complex when you get into RangeConstraints, as we found you need to account for whether it's the Min or the Max you're applying the range tolerance to. I'm not even sure if this is sensible, but given a requirement that:

`<maxInclusive value="100"/>` i.e. x <= 100

... obviously this will be satisfied by (100 - 1e-7), but should we expect it to be satisfied by (100 + 1e-7)?

Now what about `<minInclusive value="0"/>` (i.e. x >= 0): should that be satisfied by -(1e-7), i.e. a real number just under zero?

Then you get similar fun and games with boundaries when testing -ve value ranges.

I can see there may be real-world use cases where a user has range requirements and expects real values with these precision artefacts to be handled. E.g. "LCA CO2e/kg/m2 should be <= 600": 600.000001 would probably satisfy, as should a net -ve value. Do we need to consider this for IDS v1.0?

We had a stab at it. Our initial assumption on RangeConstraints is that applying any precision tolerance on Reals is only sensible when MinInclusive = true: 100.000001 can never be < 100. But this opens up the question of whether 99.999999998 is really < 100, bearing in mind it is within the 1e-6 tolerance of 100.

Bottom line, adding tolerance to Range checks opens up all kinds of interesting edge cases. Some tests and notes are here in XIDS Tests.

Just one example from the tests:

```csharp
// TODO: Consider semantic paradox where:
// 41.999958d satisfies being in the range 42-50 inclusive, but also satisfies being < 42 exclusive
```
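
The quoted paradox is easy to reproduce with the abs-based tolerance from above (a sketch):

```python
EPS = 1e-6
def tol(v): return abs(v) * EPS + EPS

x = 41.999958
# Stretched inclusive lower bound: "x >= 42" becomes "x >= 42 - tol(42)"
print(x >= 42 - tol(42))  # True: x counts as inside the range 42..50 inclusive
# Naive exclusive upper bound: "x < 42" taken literally
print(x < 42)             # True: x also counts as below 42 exclusive - the paradox
```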

I guess the main question is: Before we finesse the exact tolerance algorithm, do we need to consider tolerance on ranges as well?

aothms commented Mar 21, 2024

do we need to consider tolerance on ranges as well?

An additional consideration there could be that maybe a range has been used by the end user to actually incorporate precision on a single value, and then we do it again in the IDS implementation.

Also, in general the inclusive/exclusive distinction really just doesn't work well on floating point, because the difference between inclusive and exclusive is either infinitesimally small (on the real number line), really really tiny (one IEEE 754 float value) or somewhat significant (our tolerance padding). And then there are corner cases like negative and positive zero, where we have multiple IEEE 754 float values that compare equal.

My inclination would be:

  • No distinction between the interpretation of inclusive/exclusive when operating on reals; they just mean the same thing
  • Apply the tolerance padding as for single values; it slightly stretches the domain of accepted values. Better to be a bit on the permissive side here than on the restrictive side.

@andyward

Apply the tolerance padding as for single values; it slightly stretches the domain of accepted values. Better to be a bit on the permissive side here than on the restrictive side.

Yes, this is what we're thinking: we stretch the upper/lower bounds slightly on ranges when the value being tested is Real.

But if we do apply this to exclusive ranges, don't we risk breaking some pretty fundamental user assumptions? Use case: "The value must be greater than 0 [and less than...]", i.e. `<minExclusive value="0"/>`

If we stretch the bounds, that would accept float values > -0.00001 (i.e. negative), as well as 0 itself. I suspect some folks would be 🤯! They may also raise eyebrows that 0.00000001 would satisfy the requirement - but that's more understandable.

I'm now thinking that exclusive ranges should actually be shrunk, not stretched, to match the semantic intent of the requirement. So > 0 actually becomes > 1e-6, while inclusive ranges are stretched: >= 0 becomes >= -1e-6 (using your proposed algorithm). A sketch follows below.
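
A minimal sketch of that stretch/shrink rule for lower bounds (helper names are mine, not from any agreed spec):

```python
EPS = 1e-6
def tol(v): return abs(v) * EPS + EPS

def satisfies_min(x: float, v: float, exclusive: bool) -> bool:
    if exclusive:
        return x > v + tol(v)   # shrink: "> 0" effectively becomes "> 1e-6"
    return x >= v - tol(v)      # stretch: ">= 0" effectively becomes ">= -1e-6"

print(satisfies_min(-1e-7, 0.0, exclusive=False))  # True: inside stretched bound
print(satisfies_min(-1e-7, 0.0, exclusive=True))   # False: negative, rejected
print(satisfies_min(5e-7, 0.0, exclusive=True))    # False: "barely above 0" rejected too
```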

An additional consideration there could be that maybe a range has been used by the end user to actually incorporate precision on a single value, and then we do it again in the IDS implementation.

My assumption is that ranges are meant for real-world tolerances, e.g. measures, costs, product performance characteristics, where a spread of values is permissible but there is some real-world minimum or maximum that requires enforcing.

This double-handling of precision feels like a separate edge case that should be handled by documentation & auditing rules. If we end up with lots of rules specifying 0.999999 < x < 1.000001, we've failed here - or the author is compensating unnecessarily for exactly the FP issue we're discussing hiding from the spec writer.

E.g. if a numeric RangeConstraint's spread is < [agreed calculation], we should raise a warning in the audit tool, @CBenghi?

aothms commented Mar 21, 2024 via email

andyward commented Mar 21, 2024

Agreed. I'd probably make the argument that to a typical end user a value of 0.00000000001 is to all intents and purposes 'zero'. We know that 999.9+ times in 1000 it's just an artificial artifact introduced by tooling that uses floats for numbers when it probably should have been using fixed-precision decimals, or it's a derived measure based off FP numbers/geometry. Either way, the requirement intent was for a meaningful positive number.

andyward added a commit to xBimTeam/Xbim.IDS.Validator that referenced this issue Apr 11, 2024
CBenghi commented May 3, 2024

Following the conversation in the call on 3rd May 2024, we are inviting all interested parties to produce a succinct proposal for the management of rounding that can cover a variety of use cases.

The proposal should address as many of the following criteria as possible:

  • clarity of interpretation from end users (people specifying requirements)
  • ability to specify the desired tolerance (or no tolerance)
  • ease of use for specification writers
  • ease of use in the export of models
  • compliance with unit transformation of measures
  • ability to ensure adequate georeferencing constraints

This should be accomplished with no changes to the current XSD schema; however, it is possible to take into consideration different behaviours depending on contextual information such as:

  • different behaviours for ids:simpleValue vs xs:restriction
  • different behaviours when using min/max restrictions vs exact values in the xs:restriction facet
  • contextual information on the dataType

Please write your proposal in a documentation markdown file and propose a PR targeting this issue. We will use the GitHub review mechanism to come to a consensus in the coming weeks.

Thanks for your contribution.

CBenghi pinned this issue May 4, 2024
giuseppeverduciALMA pushed a commit to giuseppeverduciALMA/IDS that referenced this issue May 9, 2024
The commit adds a new file "tolerance.md" to the Documentation directory. This file summarizes the discussions and comments made in issues buildingSMART#78 and buildingSMART#36 regarding tolerance in IDS
@SteezoTheEngineer

Hi all

Interesting discussion so far. I encountered a scenario where the tolerance factor is necessary, as shown in the image below. The IFC file was exported by an authoring application, and the saved value is -8.744999999999999.
[image: screenshot of the failing check against the stored value -8.744999999999999]
I also want to point out that the test failed not because the values don't exactly match, but because the Python library currently cannot handle negative values.
I believe the solution proposed by @aothms is a good one, as the tolerance is relative to the actual value. What's great is that it can handle negative values as well.

Another approach would be to add and subtract a very small number. However, this approach also presents the issue of deciding how small the number should be. Regardless, I would greatly appreciate it if you could set the tolerance in the IDS file and ensure that it can handle negative values.

@andyward

Another approach would be to add and subtract a very small number. However, this approach also presents the issue of deciding how small the number should be. Regardless, I would greatly appreciate it if you could set the tolerance in the IDS file and ensure that it can handle negative values.

I think the principle is that the IDS author doesn't need to specify a tolerance at all for this kind of precision issue - it needs an implementers' agreement rather than an IDS schema change. I.e. the software doing the testing must allow for tolerance of these small deviations occurring on FP numbers. That exact agreement/algorithm is what we need to agree on here, e.g. in #294.

Moult commented May 23, 2024

Here's another proposal: what if the number were rounded to the same number of decimal places as the IDS author specifies?

E.g. if the author asks for a length of 1m, then round(value, 0 decimal places) will be compared. So 0.7, 0.9, 0.99, 0.9999, 1.0000001, 1.49999999 all pass. If they want 1m to the nearest mm, then they can specify a length of 1.000m.
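
A possible transcription of this proposal (the helper is hypothetical; it keys the rounding off the number of decimals in the authored string):

```python
from decimal import Decimal

def matches_author_precision(actual: float, specified: str) -> bool:
    # "1" has 0 decimal places, "1.000" has 3; round the actual value to
    # the author's written precision before comparing.
    places = max(0, -Decimal(specified).as_tuple().exponent)
    return round(actual, places) == float(Decimal(specified))

print(matches_author_precision(1.49999999, "1"))     # True: rounds to 1
print(matches_author_precision(0.7, "1"))            # True: rounds to 1
print(matches_author_precision(1.0000001, "1.000"))  # True: rounds to 1.000
print(matches_author_precision(1.001, "1.000"))      # False
```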

@TLiebich

I would not agree to this proposal - it would be very difficult to explain to end users and potentially conflicts with standard implementations. In my understanding it would be best to keep IDS V1.0 as it is (1e-6 tolerance to avoid computational rounding errors in BIM authoring software) and add proper tolerance handling, e.g. using percentages or construction tolerances as fixed values (like +/- 2cm), in a future version.

@andyward

Here's another proposal: what if the number were rounded to the same number of decimal places as the IDS author specifies?

While I can see the initial logic, it feels like this is going to raise as many problems as it solves:

  1. I'd imagine a user being surprised when finding 0.5m = 1m because they forgot to specify decimals
  2. IDS authoring and validation software would need to understand that there's a difference between a simpleValue of 1, 1.0 and 1.00 etc. and preserve that semantic all through the application layers. I.e. we now have the issue that parseFloat("1.0000").toString() => "1" any time we round-trip a numeric type from a string value; we'd need to preserve the decimal precision alongside the value (see the sketch below)
  3. We'd have to account for non-SI units: an imperial specification using 1" will store 0.0254m. Edge cases like planar angles being in radians (i.e. irrational numbers) will also create challenges
  4. We get all the same edge cases on ranges as in the current proposal, but magnified: presumably 0.5 would satisfy a range restriction of minInclusive = "1"

Sounds like a big can of worms 😄
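
Point 2 is real in most languages; a small Python illustration of why the authored precision has to be carried as a decimal rather than a float (a generic demo, not IDS-specific code):

```python
from decimal import Decimal

# A float round trip silently drops the author's written precision:
print(str(float("1.0000")))       # "1.0": the intended four decimals are gone

# Decimal preserves it, so "1" and "1.0000" remain distinguishable:
d = Decimal("1.0000")
print(d, -d.as_tuple().exponent)  # 1.0000 4
```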

atomczak linked a pull request May 24, 2024 that will close this issue