982 Rewrite of scan-left and scan-right #1296

michaelhkay · 2024-06-23T22:57:59Z

The "equivalent expression" is replaced with one that is much shorter and hopefully easier to understand, though hopelessly inefficient as an actual implementation.
The result no longer includes the zero value. This seems simpler, and is consistent with other expositions I have read, e.g. of the Scala functions.
The signature of scan-left and scan-right is now identical to fold-left and fold-right, which apart from having the virtue of consistency, makes it much easier to specify one in terms of the other. The change is that the callback function now allows a position argument.

dnovatchev · 2024-06-24T01:30:43Z

First of all, thank you for finding the minor typo in the example. I have submitted a pull request (1297) that corrects this.

As explained in the "xpath-ng" channel of slack (https://xmlcom.slack.com/archives/C01GVC3JLHE/p1719185521735999?thread_ts=1719181439.481479&cid=C01GVC3JLHE), the results in the spec are correct if the mistyped expression:

  scan-right(1 to 5, 0, op('+'))

is corrected to what was actually intended:

  scan-right(1 to 10, 0, op('+'))

Again, as explained in the same thread (https://xmlcom.slack.com/archives/C01GVC3JLHE/p1719188848301309?thread_ts=1719181439.481479&cid=C01GVC3JLHE), the correct result is provided:

[ 55 ], [ 54 ], [ 52 ], [ 49 ], [ 45 ], [ 40 ], [ 34 ], [ 27 ], [ 19 ], [ 10 ], [ 0 ]

Please, read the explanation there, and I would be glad to answer any additional questions.

As for whether the initial accumulator value should be included: this follows how the function is defined in Haskell, and the initial accumulator value definitely needs to be included when the input is the empty sequence.
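To make the Haskell convention concrete, here is a minimal Python sketch that models `scanr`. The name `scan_right` and the list-based result are assumptions made for illustration only; this is not the spec's definition. It reproduces the result quoted above for `1 to 10` and shows what is returned for empty input.

```python
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def scan_right(items: Iterable[A], zero: B, action: Callable[[A, B], B]) -> List[B]:
    """Model of Haskell's scanr: every intermediate result, zero included."""
    results = [zero]
    acc = zero
    # Walk the input from its last item to its first.
    for item in reversed(list(items)):
        acc = action(item, acc)  # item on the left, accumulator on the right
        results.append(acc)
    results.reverse()  # the result of the complete fold comes first
    return results


print(scan_right(range(1, 11), 0, lambda x, acc: x + acc))
# [55, 54, 52, 49, 45, 40, 34, 27, 19, 10, 0]
print(scan_right([], 0, lambda x, acc: x + acc))
# [0]
```

With empty input the only thing left to report is the initial accumulator value, which is the case the argument above relies on.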

As the pull request #1297 fixes the typo and the questions raised have been answered, can we close this pull request (#1296) now?

dnovatchev · 2024-06-24T01:34:13Z

specifications/xpath-functions-40/src/function-catalog.xml

@@ -31345,7 +31345,7 @@ path with an explicit <code>file:</code> scheme.</p>
         <fos:proto name="scan-left" return-type="array(*)*">
            <fos:arg name="input" type="item()*" usage ="navigation"/>           
            <fos:arg name="zero" type="item()*"/>           
-            <fos:arg name="action" type="fn(item()*, item()) as item()*" usage="inspection"/>   
+            <fos:arg name="action" type="fn(item()*, item(), xs:integer) as item()*" usage="inspection"/>   


A position argument is not needed and is an artificial burden on the user's cognitive, learning and understanding processes.

dnovatchev · 2024-06-24T01:36:55Z

specifications/xpath-functions-40/src/function-catalog.xml

@@ -31354,11 +31354,16 @@ path with an explicit <code>file:</code> scheme.</p>
         <fos:property>focus-independent</fos:property>
      </fos:properties>
      <fos:summary>
-        <p>Produces the complete (ordered) sequence of all partial results from every new value 
-        the accumulator is assigned to during the evaluation of fn:fold-left.</p>
+        <p>Produces the complete (ordered) sequence of all intermediate 


More precisely:

<p>Produces the complete (ordered) sequence of all intermediate results of an evaluation of fn:fold-left immediately after an item in the sequence has been processed.</p>

dnovatchev · 2024-06-24T01:42:29Z

specifications/xpath-functions-40/src/function-catalog.xml

-         <p>The function is equivalent to the following implementation in XPath (return clause added in comments for completeness):</p>
+         <p>The result of the function is the value of the expression:</p>
+         <eg>(1 to count($input)) ! 
+   array{ slice($input, end := .) => fold-left($zero, $action) }</eg>


It isn't obvious at all whether this expression is equivalent to the current XPath 3.1 executable code.
Are there any pressing reasons to replace the XPath 3.1 executable code with this one?
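One way to probe the equivalence question is to model both formulations outside XPath and compare them. The Python sketch below is an illustration only: `scan_left_by_prefixes` imitates the proposed expression (fold every prefix separately), `scan_left_incremental` imitates a single-pass scan with the leading zero dropped, and both names are hypothetical.

```python
from functools import reduce
from itertools import accumulate
from operator import add


def scan_left_by_prefixes(items, zero, action):
    """Fold each non-empty prefix on its own (quadratic work, mirrors the proposed expression)."""
    items = list(items)
    return [reduce(action, items[:n], zero) for n in range(1, len(items) + 1)]


def scan_left_incremental(items, zero, action):
    """Single left-to-right pass; the leading zero is dropped so the two results are comparable."""
    return list(accumulate(items, action, initial=zero))[1:]


print(scan_left_by_prefixes(range(1, 6), 0, add))
# [1, 3, 6, 10, 15]
print(scan_left_incremental(range(1, 6), 0, add))
# [1, 3, 6, 10, 15]
```

Agreement on a few inputs is of course not a proof, but a non-commutative `action` is a quick way to catch argument-order mistakes in either formulation.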

dnovatchev · 2024-06-24T01:51:30Z

specifications/xpath-functions-40/src/function-catalog.xml

+           given in the rules above is provided purely for formal specification
+           purposes.</p>
+           <p>Each intermediate result is placed in a separate array. The number of arrays
+           in the result is the same as the number of items in <code>$input</code>.</p>


There must be an array containing only the initial accumulator value - exactly as done in Haskell's scanl function.

This is necessary for the case when the input is the empty sequence. If nothing is produced, the function fails to report the one processing step that does occur: the decision to return the initial accumulator value as the final result.

dnovatchev · 2024-06-24T01:54:23Z

specifications/xpath-functions-40/src/function-catalog.xml

        </fos:notes>                
      <fos:examples>
         <fos:example>
            <fos:test>
               <fos:expression><eg>scan-left(1 to 5, 0, op('+'))</eg></fos:expression>
-               <fos:result>[ 0 ], [ 1 ], [ 3 ], [ 6 ], [ 10 ], [ 15 ]</fos:result>
+               <fos:result>[ 1 ], [ 3 ], [ 6 ], [ 10 ], [ 15 ]</fos:result>


As pointed out above, we do need the initial-accumulator-value included.

dnovatchev · 2024-06-24T01:57:17Z

specifications/xpath-functions-40/src/function-catalog.xml

-        <p>Produces the complete (ordered) sequence of all partial results from every new value 
-        the accumulator is assigned to during the evaluation of fn:fold-right.</p>
+        <p>Produces the complete (ordered) sequence of all intermediate 
+        results of an evaluation of <code>fn:fold-right</code>.</p>


More precisely:

<p>Produces the complete (ordered) sequence of all intermediate results of an evaluation of `fn:fold-right` immediately after an item in the sequence has been processed. The intermediate result of processing the k-th item of the input sequence is immediately followed by the intermediate result of processing the (k+1)-th item in the input sequence.</p>

dnovatchev · 2024-06-24T01:59:53Z

specifications/xpath-functions-40/src/function-catalog.xml

      <fos:examples>
         <fos:example>
            <fos:test>
               <fos:expression><eg>scan-right(1 to 5, 0, op('+'))</eg></fos:expression>
-               <fos:result><eg>[ 55 ], [ 54 ], [ 52 ], [ 49 ], [ 45 ],
-[ 40 ], [ 34 ], [ 27 ], [ 19 ], [ 10 ], [ 0 ]</eg></fos:result>
+               <fos:result><eg>[ 5 ], [ 9 ], [ 12 ], [ 14 ], [ 15 ]</eg></fos:result>


No, this is wrong.

The correct result (also produced by Haskell) is:

[15], [14], [12], [9], [5], [0]

The original example may be better in showing the details of how scan-right works:

scan-right(1 to 10, 0, op('+'))

produces:

[ 55 ], [ 54 ], [ 52 ], [ 49 ], [ 45 ], [ 40 ], [ 34 ], [ 27 ], [ 19 ], [ 10 ], [ 0 ]

dnovatchev · 2024-06-24T02:02:05Z

specifications/xpath-functions-40/src/function-catalog.xml

+
+         <eg>reverse(1 to count($input)) ! 
+   array{ slice($input, start := .) => fold-right($zero, $action) }</eg>
+         <!--<p>The function is equivalent to the following implementation in XPath (return clause in comments added for completeness):</p>


As pointed out below, this produces the wrong result.
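The ordering difference can be seen in a small Python model. This is a sketch under assumptions: `fold_right`, `scan_right_by_suffixes` and `scan_right_conventional` are hypothetical names, and list slicing stands in for `slice($input, start := .)`.

```python
from functools import reduce


def fold_right(items, zero, action):
    """Right fold: action(item, accumulator), applied from the last item to the first."""
    return reduce(lambda acc, item: action(item, acc), reversed(list(items)), zero)


def scan_right_by_suffixes(items, zero, action):
    """Imitates: reverse(1 to count($input)) ! array{ slice($input, start := .) => fold-right(...) }"""
    items = list(items)
    starts = range(len(items), 0, -1)  # n, n-1, ..., 1 (1-based start positions)
    return [fold_right(items[s - 1:], zero, action) for s in starts]


def scan_right_conventional(items, zero, action):
    """Haskell-style order: longest suffix first, bare zero last."""
    items = list(items)
    return [fold_right(items[s:], zero, action) for s in range(len(items) + 1)]


plus = lambda x, acc: x + acc
print(scan_right_by_suffixes(range(1, 6), 0, plus))
# [5, 9, 12, 14, 15]
print(scan_right_conventional(range(1, 6), 0, plus))
# [15, 14, 12, 9, 5, 0]
```

In this model, removing the `reverse` and adding the zero-only entry is enough to recover the conventional ordering.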

dnovatchev · 2024-06-24T02:02:38Z

specifications/xpath-functions-40/src/function-catalog.xml

@@ -31381,36 +31386,34 @@ let $scan-left := fn(
  $scan-left-inner($input, $zero, $action, $scan-left-inner)
 }
 (: return $scan-left(1 to 10, 0, op('+'))  :)    
-]]></eg>         
+]]></eg>-->         


Are there any pressing reasons to comment out the XPath 3.1 executable code?

dnovatchev · 2024-06-24T02:04:19Z

specifications/xpath-functions-40/src/function-catalog.xml

+
+         <eg>reverse(1 to count($input)) ! 
+   array{ slice($input, start := .) => fold-right($zero, $action) }</eg>
+         <!--<p>The function is equivalent to the following implementation in XPath (return clause in comments added for completeness):</p>


It isn't obvious at all whether this expression is equivalent to the current XPath 3.1 executable code.
Are there any pressing reasons to replace the XPath 3.1 executable code with this one?

dnovatchev · 2024-06-24T02:04:38Z

specifications/xpath-functions-40/src/function-catalog.xml

@@ -31477,37 +31484,34 @@ let $scan-right := function(
  $scan-right-inner($input, $zero, $f, $scan-right-inner)
 }        
 (: return $scan-right(1 to 10, 0, op('+')) :)  
-]]></eg>         
+]]></eg>         -->


Are there any pressing reasons to replace the XPath 3.1 executable code with this one?

dnovatchev · 2024-06-24T02:08:57Z

specifications/xpath-functions-40/src/function-catalog.xml

-            the first argument is any item in the sequence <code>$input</code>, and the second is either
-            the value of <code>$zero</code> or the result of a previous application of
-            <code>$action</code>.</p>
+         <p>See <code>fn:fold-left</code>: errors are raised in the same situations.</p>      


Did you mean:

See fn:fold-right: errors are raised in the same situations.

dnovatchev · 2024-06-24T02:11:16Z

specifications/xpath-functions-40/src/function-catalog.xml

+           purposes.</p>
+           <p>Each intermediate result is placed in a separate array. The number of arrays
+           in the result is the same as the number of items in <code>$input</code>.</p>
+           <p>The fact that the function has the same signature as <code>fn:fold-right</code>


There must be an array containing only the initial accumulator value - exactly as done in Haskell's scanr function.

This is necessary for the case when the input is the empty sequence. If nothing is produced, the function fails to report the one processing step that does occur: the decision to return the initial accumulator value as the final result.

dnovatchev · 2024-06-24T02:16:02Z

specifications/xpath-functions-40/src/function-catalog.xml

+           <p>The fact that the function has the same signature as <code>fn:fold-right</code>
+           means that this function can conveniently be used to study the behavior of
+           an call on <code>fn:fold-right</code> with the same arguments, perhaps for
+           diagnostic purposes.</p>


A more common use-case is producing the running totals of a continuous computation.

dnovatchev · 2024-06-24T02:18:58Z

specifications/xpath-functions-40/src/function-catalog.xml

-                </olist>
-        </fos:notes>                
+           <p>A practical implementation might be expected to evaluate the result
+           incrementally in a single right-to-left pass of the input; the equivalent expression


"a single right-to-left pass of the input" is imprecise and could be misleading.

There is initially a "left-to-right" pass in which the items are fixed and waiting for the remaining computation on the right to be performed.

dnovatchev · 2024-06-24T02:25:08Z

specifications/xpath-functions-40/src/function-catalog.xml

-               <fos:expression><eg>scan-right(1 to 3, 0, op('-'))</eg></fos:expression>
-               <fos:result>[ 2 ], [ -1 ], [ 3 ], [ 0 ]</fos:result>
+               <fos:expression><eg>scan-right(1 to 5, 0, op('-'))</eg></fos:expression>
+               <fos:result>[ 5 ], [ -1 ], [ 4 ], [ -2 ], [ 3 ]</fos:result>


This is wrong. The correct result is:

[3], [-2], [4], [-1], [5], [0]
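Because subtraction is not commutative, this example is a useful check of argument order. The recursive Python sketch below follows the usual textbook shape of `scanr`; the function name and the use of Python lists are assumptions for illustration.

```python
from operator import sub


def scan_right(items, zero, action):
    """Recursive model of scanr: the head result is action(first item, head of the rest)."""
    items = list(items)
    if not items:
        return [zero]
    rest = scan_right(items[1:], zero, action)
    return [action(items[0], rest[0])] + rest


print(scan_right(range(1, 4), 0, sub))
# [2, -1, 3, 0]
print(scan_right(range(1, 6), 0, sub))
# [3, -2, 4, -1, 5, 0]
```

The first call matches the result of the original `1 to 3` example in the diff, the second the corrected `1 to 5` result given above.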

michaelhkay · 2024-06-24T11:36:54Z

Thanks for review. I'll pick up your corrections to the examples.

I think that leaves three substantive points.

Firstly, the question of whether to include the zero value in the result. It does seem to be a convention that the zero value is included, though I have to say I can't see why. But I'll back out that change and include it since it's conventional.

Secondly the position argument. If it were just this function I wouldn't have a strong argument to make for it. But consistency is absolutely essential. If we have this argument on the fold functions, then we must have it on the scan functions. It's completely unacceptable to make them different. We made a decision to include a position argument uniformly on all the item-by-item callbacks and unless we want to reverse that decision, we should respect it.

Thirdly the style of specification. I agree with you that it's hard to verify that my formulation is equivalent to yours. But the reason that it's hard to verify is that your formulation is very hard to reason about. The way I've written it (at least for scan-left) it's very clear that the formal exposition corresponds directly to the informal explanation: In fact, we could express it clearly in narrative prose as "the function returns a sequence of arrays in which the value of the Nth array is the result of applying fold-left, with the same $zero and $action arguments, to the subsequence of $input starting at the first item and ending at the Nth item". By contrast, working out what the function actually does from the current 20 lines of recursive code essentially involves executing the algorithm in your head.

The recursive exposition would be a lot clearer if we wrote it in XQuery, as we do with other functions:

declare function fn:scan-left($input as item()*, $zero as item()*, $action as fn(....)) {
   [$zero],
   if (exists($input)) {
       fn:scan-left(tail($input), $action($zero, head($input)), $action)
   }
}

Perhaps we should go with that one. (Though adding the position argument would require reintroduction of a helper function, as is done with fold-left.)
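For comparison, the same recursion can be modelled in Python. This is only a sketch: it assumes the intended recursive call passes `$action($zero, head($input))` as the new zero and `$action` as the third argument, it ignores the position argument, and it uses plain values where the XQuery wraps each result in an array.

```python
from operator import add


def scan_left(items, zero, action):
    """Emit the current accumulator, then recurse on the tail with the updated accumulator."""
    items = list(items)
    if not items:
        return [zero]
    return [zero] + scan_left(items[1:], action(zero, items[0]), action)


print(scan_left(range(1, 6), 0, add))
# [0, 1, 3, 6, 10, 15]
print(scan_left([], 0, add))
# [0]
```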

Finally, I notice that I introduced a change to the form of the output. For the third example, your code (ignoring the stray ']' at the end) returns

[ () ], [ 2 ], [ (2, 4) ], [ (2, 4, 6) ]

whereas mine returns

[], [ 2 ], [2, 4], [2, 4, 6]

Either would work, but I feel that the second result is more usable (for example, it can be rendered as JSON). This can be achieved simply by changing [$zero] to array{$zero}.

michaelhkay · 2024-06-24T12:13:21Z

I'm inclined to propose dropping the position argument for both fold and scan. It complicates the specification and the use cases are unconvincing. I believe it has been incorrectly specified (for fold-left, the first time $action is called, the value supplied for $pos is 2, whereas for fold-right it is count($input)-1; and the "Error conditions" section talks of $action being applied to 2 arguments). For the -right forms in particular, the semantics are mind-bending enough without introducing this complication.

dnovatchev · 2024-06-24T14:31:54Z

Thanks for review. I'll pick up your corrections to the examples.

I think that leaves three substantive points.

Firstly, the question of whether to include the zero value in the result. It does seem to be a convention that the zero value is included, though I have to say I can't see why. But I'll back out that change and include it since it's conventional.
👍 👍 👍

Secondly the position argument. If it were just this function I wouldn't have a strong argument to make for it. But consistency is absolutely essential. If we have this argument on the fold functions, then we must have it on the scan functions. It's completely unacceptable to make them different. We made a decision to include a position argument uniformly on all the item-by-item callbacks and unless we want to reverse that decision, we should respect it.

I greatly support the suggestion in your next comment to drop this position-argument from both folds and scans!

💯 💯 💯

Thirdly the style of specification. I agree with you that it's hard to verify that my formulation is equivalent to yours. But the reason that it's hard to verify is that your formulation is very hard to reason about. The way I've written it (at least for scan-left) it's very clear that the formal exposition corresponds directly to the informal explanation: In fact, we could express it clearly in narrative prose as "the function returns a sequence of arrays in which the value of the Nth array is the result of applying fold-left, with the same $zero and $action arguments, to the subsequence of $input starting at the first item and ending at the Nth item". By contrast, working out what the function actually does from the current 20 lines of recursive code essentially involves executing the algorithm in your head.

The recursive exposition would be a lot clearer if we wrote it in XQuery, as we do with other functions:
declare function fn:scan-left($input as item()*, $zero as item()*, $action as fn(....)) {
   [$zero],
   if (exists($input)) {
       fn:scan-left(tail($input), $action($zero, head($input)), $action)
   }
}
Perhaps we should go with that one.

OK, why not provide both? The pure XPath code is intended for people who need to use the functions right now and are not willing to wait two more years until they get them.

Finally, I notice that I introduced a change to the form of the output. For the third example, your code (ignoring the stray ']' at the end) returns

[ () ], [ 2 ], [ (2, 4) ], [ (2, 4, 6) ]

whereas mine returns

[], [ 2 ], [2, 4], [2, 4, 6]

Either would work, but I feel that the second result is more usable (for example, it can be rendered as JSON). This can be achieved simply by changing [$zero] to array{$zero}.

We could go either way. My understanding is that on each step we get a single result of type item()*, and this is a single sequence that we put in an array in order to model "a sequence of sequences".

dnovatchev · 2024-06-24T14:34:15Z

I'm inclined to propose dropping the position argument for both fold and scan. It complicates the specification and the use cases are unconvincing. I believe it has been incorrectly specified (for fold-left, the first time $action is called, the value supplied for $pos is 2, whereas for fold-right it is count($input)-1; and the "Error conditions" section talks of $action being applied to 2 arguments). For the -right forms in particular, the semantics are mind-bending enough without introducing this complication.

A great decision, which I absolutely support.

This allows me to skip writing my own proposal for removing the position-argument from the folds!

💯 💯 💯

dnovatchev · 2024-06-25T20:45:17Z

specifications/xpath-functions-40/src/function-catalog.xml

-(: return $scan-left(1 to 10, 0, op('+'))  :)    
-]]></eg>         
+) as item()* {
+  array{$zero},


Wouldn't it be better to construct the array as:

[$zero]

This is more understandable, because we treat the $action function as producing a single result - even when this result is a sequence.

michaelhkay · 2024-06-27T08:03:02Z

In light of the fact that we have discovered a use-case for the positional fold-left within our own code, and in the light of Christian's comments, I have reinstated the 3-argument callback for the fold and scan functions. I found a new way to formulate the semantics that I hope is conceptually clearer and less error-prone, by defining the arity-3 version in each case in terms of the arity-2 version. Certainly for fold-right and scan-right, I think it makes it much clearer how the position argument works.

dnovatchev · 2024-06-27T18:12:48Z

The fact that it required so much effort and imagination just to construct a single use-case only further shows that adding a position argument to folds and scans is absolutely unnecessary and, with good will, can be avoided.

Let's adhere to the KISS principle in our design work and try to learn from other great sources, such as Microsoft.

In case someone so badly needs the position-argument, this could perfectly be added as a separate, new function, keeping clean the original ideas behind folds and scans.

As for the "uniformity" ...

"Where all think alike, no one thinks very much."

Walter Lippmann

ChristianGruen · 2024-06-27T20:09:43Z

I feared this would cause new controversy.

No one is forced to use the index argument, it’s completely optional.
The index argument is also available in JavaScript.
Kotlin, which is loved for its conciseness and consistency, provides a dedicated reduceIndexed function for that purpose. Adding an extra function would obviously be contrary to our approach to add the position argument to existing HOF functions.

dnovatchev · 2024-06-28T15:25:26Z

I feared this would cause new controversy.

* No one is forced to use the index argument, it’s completely optional.

* The index argument is also available in JavaScript.

* Kotlin, which is loved for its conciseness and consistency, provides a dedicated `reduceIndexed` function for that purpose. Adding an extra function would obviously be contrary to our approach to add the position argument to existing HOF functions.

I am not replying, because I have already said everything on this subject, and none of the above arguments hold. I wouldn't be discussing the X-languages if I were a strict JavaScript fan.

And I will hate the waste of time, confusion and possible errors due to this, having to read longer than necessary documentation describing an argument that I would never use and that I should never use.

The ridiculous attempts to find use-cases for having this argument in folds are good evidence that we don't need to waste our time arguing further.

dnovatchev · 2024-06-29T00:24:40Z

I feared this would cause new controversy.
* No one is forced to use the index argument, it’s completely optional.

OK, following this logic let us then add a dozen more meaningless and unnecessary arguments - they will be completely optional and no one would be forced to use them.

ChristianGruen · 2024-07-11T16:14:52Z

Here’s a (simplified) example for using the positional argument in folds that I just came across today. The code creates intersections of multiple ID sequences:

let $id-arrays := [
  (1, 3, 5, 10, 15, 21),
  (4, 7, 10, 15),
  (8, 10, 15, 18)
]
return array:fold-left($id-arrays, (), fn($result, $ids, $pos) {
  if($pos = 1) then $ids else $ids[. = $result]
})

Of course, it will only have a chance to convince those who are not immune to open discussions in principle.

dnovatchev · 2024-07-11T20:04:12Z

Of course, it will only have a chance to convince those who are not immune to open discussions in principle.

Alas, this is another contrived example.

Here is a proper and, do note, simpler fold solution without any positional argument:

let $id-arrays := [
  (1, 3, 5, 10, 15, 21),
  (4, 7, 10, 15),
  (8, 10, 15, 18)
]
return
  array:fold-left($id-arrays, $id-arrays(1), fn($result, $ids) {
    $ids[. = $result]
  })
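The two folds in this exchange can be compared in a small Python model. It is a sketch, not either author's code: `fold_left_indexed` is a hypothetical helper, positions are assumed to be 1-based (as the `$pos = 1` test in the earlier example suggests), and Python lists stand in for the array members.

```python
from functools import reduce

id_lists = [
    [1, 3, 5, 10, 15, 21],
    [4, 7, 10, 15],
    [8, 10, 15, 18],
]


def fold_left_indexed(items, zero, action):
    """Left fold whose callback also receives the 1-based position of the current item."""
    acc = zero
    for pos, item in enumerate(items, start=1):
        acc = action(acc, item, pos)
    return acc


# Variant with a position argument: the first member replaces the empty seed.
positional = fold_left_indexed(
    id_lists,
    [],
    lambda result, ids, pos: ids if pos == 1 else [i for i in ids if i in result],
)

# Variant without a position argument: the fold is seeded with the first member.
seeded = reduce(
    lambda result, ids: [i for i in ids if i in result],
    id_lists,
    id_lists[0],
)

print(positional)  # [10, 15]
print(seeded)      # [10, 15]
```

Both variants give the same intersection here. The seeded variant has to name the input twice, and the positional variant carries a conditional in the callback.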

@ChristianGruen So, who is immune or not to open discussions?

ChristianGruen · 2024-07-11T20:39:35Z

Here is a proper and, do note, simpler fold solution without any positional argument:

In the original use case, it’s not that easy as more operations are performed on the input sequence in the HOF body before the final intersection is created. Next, your solution requires the input to be bound to a variable. You can’t do things like fold-left(EXPR, ...).

@ChristianGruen So, who is immune or not to open discussions?

In general, people who have already decided what’s right or wrong, no matter what others will say.

dnovatchev · 2024-07-11T21:01:26Z

Here is a proper and, do note, simpler fold solution without any positional argument:

In the original use case, it’s not that easy as more operations are performed on the input sequence in the HOF body before the final intersection is created.

There was no "original use case" specified, just an artificial example that unnecessarily uses a positional argument for the sake of showing that the positional argument is useful, when in fact it is not at all.

Next, your solution requires the input to be bound to a variable. You can’t do things like fold-left(EXPR, ...).

Hmm... ??? This is not "my solution". This is exactly the solution that was originally provided by @ChristianGruen, optimized by just throwing out the unneeded 3-argument action function and replacing it with a simpler 2-argument action function that performs half as many operations.

As for:

Next, your solution requires the input to be bound to a variable. You can’t do things like fold-left(EXPR, ...).

This statement is completely wrong. We can have

fold-left(ANY_EXPR_PRODUCING_SEQUENCE, ...)

As for performing "more operations", the 2-argument action function performs half as many operations, because the whole conditional has been eliminated: it was indeed not needed and was superfluously added.

@ChristianGruen So, who is immune or not to open discussions?

In general, people who have already decided what’s right or wrong, no matter what others will say.

It does not matter at all who says what. It is a matter of facts and evidence. The fact is that for years not a single example has been given that shows the necessity of a 3-argument action function, and that so many wise people could not provide any non-contrived example over a significantly long period.

Thus everyone can make the conclusion just based on the facts.

ChristianGruen · 2024-07-11T21:10:01Z

Thus everyone can make the conclusion just based on the facts.

Let’s stop it; we should let others speak. Everything else is a waste of time and energy.

dnovatchev · 2024-07-11T21:28:27Z

Thus everyone can make the conclusion just based on the facts.

Let’s stop it; we should let others speak. Everything else is a waste of time and energy.

Absolutely agreed!

The facts are there - let everyone make their conclusion.

benibela · 2024-07-13T17:04:48Z

An index argument is extremely useful

I need it all the time

But too many variables make the code hard to read. And the implementation becomes slow when it has to handle too many arguments, especially with function coercion adding further type checks.

XPath had the best solution with the position() function. You get the index if you need it, and if you do not need it, you do not have to deal with it.

michaelhkay · 2024-07-15T16:19:15Z

I can't see why people are getting so emotional about this. It's a simple design trade-off, of the kind that software designers have to make all the time: do we design this function to handle only the use cases that we can identify, or do we make it consistent with other functions? There are arguments for both approaches and the decision is a subjective one.

In general I'm a strong believer in orthogonality as a language design principle: that means making different constructs work the same way, even if the effect is to provide functionality for which there is no known use case. The arguments in favour of that are that it's easier for users to remember how things work if everything works the same way, and it's often easier for implementors as well because they can reuse the same code and don't have to deal with so many special cases. In addition, it's often the case that when the language designers can't think of a use case, the users will soon find one; we don't have the benefit of having a complete requirements statement from our users.

There are exceptions where a non-orthogonal design decision is appropriate. For example, in general we want all numeric functions to operate on all numeric types, but we made a decision that trigonometric functions should only operate on double-precision floating point, not on single-precision or decimals. That decision is justified both (a) by the absence of use cases, and (b) by considerations of implementation complexity (libraries for trigonometric functions on decimals are not readily available).

As a group, we can't make decisions on a matter like this unless everyone starts by recognising that there are valid arguments for and against both options; we need to cut out the kind of comment that asserts that either option is obviously correct or obviously wrong. We should also (and this is much more difficult) avoid spending more time on it than the matter deserves. We could go either way without upsetting many users - we need a bit more of a sense of perspective.

michaelhkay · 2024-07-15T16:29:13Z

XPath had the best solution with the position() function. You get the index if you need it, and if you do not need it, you do not have to deal with it.

If only that were true! At least in XSLT, it's not possible to determine by static analysis whether position() and last() will be used on a particular sequence, so they have to be maintained "just in case". The XQuery solution of binding a position variable explicitly is much better from that point of view.

michaelhkay · 2024-07-18T12:12:28Z

A nice simple use case for positional arguments on fold-left is to implement something like string-join() or intersperse() that uses separators:

declare function fn:intersperse($input as item()*, $separator as item()*) as item()* {
   fold-left($input, (), fn($accum, $next, $pos){$accum, if($pos gt 1) {$separator}, $next})
}

Of course, that doesn't prove that the positional fold is a must-have, but it perhaps illustrates how it might be useful.
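A Python model of this positional `intersperse` is sketched below. The helper `fold_left_indexed` is hypothetical, positions are assumed to start at 1, and both the accumulator and the separator are represented as lists so that a separator can hold zero or more items.

```python
def fold_left_indexed(items, zero, action):
    """Left fold whose callback also receives the 1-based position of the current item."""
    acc = zero
    for pos, item in enumerate(items, start=1):
        acc = action(acc, item, pos)
    return acc


def intersperse(items, separator):
    """Insert the separator items before every item except the first."""
    return fold_left_indexed(
        items,
        [],
        lambda acc, nxt, pos: acc + (separator if pos > 1 else []) + [nxt],
    )


print(intersperse(["a", "b", "c"], ["-"]))
# ['a', '-', 'b', '-', 'c']
print(intersperse([], ["-"]))
# []
```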

ChristianGruen · 2024-07-18T12:55:02Z

In principle, I see three types of use cases:

1. Special treatment of first input

Expressed with Michael’s example, and the one before for intersecting sequences. Code like…

let $input := 1 to 5
return fold-left(tail($input), head($input), fn($a, $b) { $a, '-', $b })

…can instead be written as:

fold-left(1 to 5, (), fn($a, $b, $p) { $a, '-'[$p > 1], $b }),

The input must only be specified once here, which is e.g. helpful when chaining function calls.

2. Positional computations

The availability of the position can be helpful when it contributes to the result:

fold-left((10, 1, 12, 2, 13, 3), 0, fn($acc, $n, $p) {
  $acc + (if($p mod 2) then $n else -$n)
})

It also allows us to write irregular join patterns:

fold-left(1 to 5, (), fn($a, $b, $p) { $a, '-'[$p mod 5 = 0], $b }),

3. Debugging

When developing code, or when writing logs, it can be helpful to output intermediate results along with the number of folds that have already been performed:

fold-left(1 to 5, 0, fn($a, $b, $p) { trace($a + $b, $p || '. ') })

As so often, real use cases are too complex to be represented with simple examples. I experienced that’s a general challenge when teaching folds to newcomers.

All examples could also be written with an additional argument in a recursive function. All folds can be written recursively in general.

dnovatchev · 2024-07-18T15:43:30Z

A nice simple use case for positional arguments on fold-left is to implement something like string-join() or intersperse() that uses separators:
declare function fn:intersperse($input as item()*, $separator as item()*) as item()* {
   fold-left($input, (), fn($accum, $next, $pos){$accum, if($pos gt 1) {$separator}, $next})
}
Of course, that doesn't prove that the positional fold is a must-have, but it perhaps illustrates how it might be useful.

Of course, this is more simply written as a call to the classic left fold with a simpler, 2-argument action function:

 fold-left(tail($input), $input[1], fn($accum, $next){$accum, $separator, $next})

And this call is noticeably shorter and simpler than the one that uses a 3-argument action function.

No conditions, no braces, no need to make it more complex.

So, how can we say that "it might be useful" to enable the user to produce more complex and difficult to understand solutions?

dnovatchev · 2024-07-18T16:04:09Z

@ChristianGruen, Thank you for the provided classification.

Please, see my previous comment to the example by @michaelhkay that falls into group 1. To repeat it here:

The rewrite using just the 2-argument action function is noticeably shorter and simpler than the one that uses a 3-argument action function.

No conditions, no braces, no need to make it more complex.

So, how can we say that "it might be useful" to enable the user to produce more complex and difficult to understand solutions?

For the other 2 cases, we have a general and extremely useful solution provided by @michaelhkay (thanks Mike!). This is a 2-step solution. The first step produces, for each item of the input, a map of the form {"item": $input[$pos], "pos": $pos}.

In the second step the resulting sequence of maps is passed as the $input argument of a call to fold-left that uses just a 2-argument action function.

I strongly support this great way of specifying "positional computation" - it is more readable and understandable. It is less complex.
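As a rough illustration of the 2-step approach, here is a sketch in Python rather than XQuery. The helper names `with_positions` and `weighted_sum` are hypothetical, and the position-weighted sum is an assumed example computation that does not come from this thread.

```python
from functools import reduce


def with_positions(items):
    # Step 1: pair every item with its 1-based position.
    return [{"item": item, "pos": pos} for pos, item in enumerate(items, start=1)]


def weighted_sum(items):
    # Step 2: an ordinary 2-argument fold over the annotated entries.
    return reduce(
        lambda accum, entry: accum + entry["item"] * entry["pos"],
        with_positions(items),
        0,
    )
```

The action function here only ever sees the accumulator and one entry; the position travels inside the entry instead of being a third argument.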

This is a very helpful discussion that has strongly convinced me even more that Microsoft was right not to add any overloads with a 3-argument action-function to methods of the Enumerable class that are folds or results of executing folds: Aggregate, Any, All, Min, Max, MaxBy, Average.

Was the specification of these 7 methods (without an action function that takes a position-argument) an accidental omission? Obviously not!

ChristianGruen · 2024-07-18T16:41:39Z

This is a very helpful discussion that has strongly convinced me even more

I agree it’s helpful. We had the chance to read a lot about different personal preferences and convictions. Personally, I have learnt to appreciate the enhanced positional flexibility in the past weeks. This is my own practical experience; it’s obviously not up to me to judge the perception of anyone else. I also know I have no chance to deliver arguments that are convincing enough for everyone reading this.

Similarly, we have no objective criteria to decide whether Microsoft or JavaScript/Kotlin have got it right. I rather think a majority vote may help us here.

What I think we all can do is gather more opinions and interview other users (ideally, without bias). At least for me, it has always been constructive to get to know which recently added features are embraced, rejected and ignored by our user community. The judgements didn’t necessarily reflect my own opinion. As I have already mentioned, the positional argument for fn:for-each has already been appreciated and utilized by someone who writes production code with our implementation (and I was surprised to learn that fn:for-each is used at all, as I always use for or !).

dnovatchev · 2024-07-18T17:04:13Z

Just a reminder that I am not against a function that accepts as an argument a position-sensitive action-function.

I would be happy if we have such a function in case this is a new, different function from the well-established folds.

Having such a separate function will give us a better visibility into its actual frequency of usage - thus we will have really objective data.

ChristianGruen · 2024-07-18T17:13:01Z

Just a reminder that I am not against a function that accepts as an argument a position-sensitive action-function.

Currently, this is the consistent solution, as the other HOF functions have been enhanced with position arguments as well. In this matter, Microsoft and JavaScript have a commonality: They have both made a consistent design choice.

If we believe that the addition of position arguments should be discussed again, the most productive solution could be to discuss it in a separate issue.

dnovatchev · 2024-07-18T17:20:35Z

Yes, as I demonstrated at our meeting this week, the fold definitions either produce totally wrong results or even compile-time errors. (Maybe because of the attempt to add too much complexity and new "functionality"?)

And yes, this is a problem concerning all folds and therefore scans. Folds are the base for scans, thus we need this resolved first for folds. Discussing this in isolation just for scans is not very meaningful.

ChristianGruen · 2024-07-18T17:40:24Z

Yes, as I demonstrated at our meeting this week, the fold definitions either produce totally wrong results or even compile-time errors. (Maybe because of the attempt to add too much complexity and new "functionality"?)

This was definitely an editorial lack of attention on my part. I remember I did not spend enough time on it. Sorry for that.

And yes, this is a problem concerning all folds and therefore scans.

Not just folds indeed: It affects the full range of sequence functions (more than 10) that have function parameters. It was a while ago that we included it in the spec, after the proposal was accepted in the weekly meeting, but you could certainly create a new issue to discuss it anew.

dnovatchev · 2024-07-18T18:12:35Z

Hasn't this discussion been going on for the last 2-3 weeks? Aren't we really in the midst of this discussion?

And yes, folds are some specific group of functions, more differentiated than the general group of all sequence functions.

ChristianGruen · 2024-07-18T18:15:27Z

Hasn't this discussion been going on for the last 2-3 weeks? Aren't we really in the midst of this discussion?

If I got it right, the discussion in this PR is not about positional arguments in general. I believe that moving it into a separate issue would be more productive.

Sorry for closing the PR in between (the Close button is too easy to reach).

michaelhkay requested a review from dnovatchev June 23, 2024 22:57

michaelhkay added the Enhancement, Tests Needed, and XQFO labels Jun 23, 2024

dnovatchev reviewed Jun 24, 2024

michaelhkay added 3 commits June 25, 2024 21:03

Rewrite of scan-left and scan-right

411427d

Updated proposal in the light of comments and discussion

4f01b0a

Rebased; minor edits

2b68b96

michaelhkay force-pushed the 982-scan-left-right branch from 1209589 to 2b68b96 Compare June 25, 2024 20:17

dnovatchev reviewed Jun 25, 2024

michaelhkay removed the Blocked label Jun 27, 2024

ChristianGruen closed this Jul 18, 2024

ChristianGruen reopened this Jul 18, 2024

dnovatchev mentioned this pull request Jul 22, 2024

Remove the $position argument from the $action function passed to folds #1341

Open

ChristianGruen added the Blocked label Sep 6, 2024
