Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doctest allow int/line-number prefix #9430

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

S0AndS0
Copy link

@S0AndS0 S0AndS0 commented Feb 12, 2025

⚠️ nested case does not spark joy and RegExp might be a better choice for this feature request!

This patch aims to formally begin the discussion here about feature parity between Erlang OTP doctest, and the library of the same name maintained by williamthome

Specifically this patch targets > vs 1> syntax for declaring the start of a testable code line or set of lines, ex;

Erlang OTP doctest

-doc """
This test is how we do with `erlang/otp` doctests

```
> true.
true
> totallyOkay.
totallyOkay
```
"""
woot() -> ok.

williamthome doctest

-doc """
This test is how we do with `williamthome/doctest` (prior to version `0.10.0`)

```erlang
1> false.
false
2> thisIsFine.
thisIsFine
```
"""
muchJoy() -> ok.

@CLAassistant
Copy link

CLAassistant commented Feb 12, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

github-actions bot commented Feb 12, 2025

CT Test Results

    2 files     97 suites   1h 8m 53s ⏱️
2 195 tests 2 146 ✅ 47 💤 2 ❌
2 562 runs  2 511 ✅ 49 💤 2 ❌

For more details on these failures, see this check.

Results for commit 9ed5b46.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@S0AndS0
Copy link
Author

S0AndS0 commented Feb 12, 2025

Here are some results of manual testing, because I'm not yet smort enough to run only doctests from the CLI;

  • Mixing of > and INT> seem fine, so I've not broken anything... yet
    > parse_tests([<<"> SomeCode.">>, <<"419> OtherCode.">>], []).
    [{test,[<<"SomeCode.">>,10,<<"OtherCode.">>],"_"}]
  • Multi-line code seems to work fine too
    > parse_tests([<<"68> ReallyLongLine,">>, <<" ConinuedOnAnotherLine.">>], []).
    [{test,[<<"ReallyLongLine,">>,10,
            <<"ConinuedOnAnotherLine.">>],
           "_"}]
    > parse_tests([<<"> ReallyLongLine,">>, <<" ConinuedOnAnotherLine.">>], []).
    [{test,[<<"ReallyLongLine,">>,10,
            <<"ConinuedOnAnotherLine.">>],
           "_"}]
  • Lines that contain > but not prefixed by an integer are ignored as usual
    > parse_tests([<<"Something > OtherThang.">>], []).
    [{test,[],[<<"Something > OtherThang.">>]}]
  • 🐛 Lines prefixed by any number of blank space are considered code
    > parse_tests([<<" wat.">>], []).
    [{test,[<<"wat.">>],"_"}]

    Note; double-checking against pre-PR code this is true there too 🤷

I still be learning the ways of Erlang, and least-bad practices, so any tips towards reducing future costs to maintenance are very much appreciated!

CC: @garazdawi I figured it'd be wise to nudge things specific to OTP parity from williamthome/doctest -- Issue 47 to here ;-)

@garazdawi
Copy link
Contributor

One of the things I like a lot about doctests is that it adds uniformity to the examples. So I think that I would like it to be very oppionionated about how you write examples. So I would like for the examples to either use > or 1>, not a mix. The 1> adds one extra unnecessary number in front, but it makes the examples closer to what you normally see in an Erlang shell.

Which one do you prefer and why?

@garazdawi
Copy link
Contributor

Here are some results of manual testing, because I'm not yet smort enough to run only doctests from the CLI;

1> shell_docs:test(ModuleToTest, []).

@S0AndS0
Copy link
Author

S0AndS0 commented Feb 13, 2025

Which one do you prefer and why?

For compatibility with preexisting doctest tooling and tests I prefer number prefixed code lines, also it should hopefully mean fewer examples need edited. Plus having an incrementing number makes it obvious to readers, and copy/pasters, of examples that there likely is an intended order. Ex;

1> State = "World".
2> Greeter = fun(State) -> "Hello " ++ State end.
3> Result = Greeter(State).
"Hello World"

At the same time I prefer >, without a line number prefix, because it could be used in doc-comments/tests where asserts are independent of one another, ex;

> true.
true
> false.
false

True having both is a more overhead to maintain and parse, and I'm but one opinionated noob ;-)

@williamthome
Copy link
Contributor

williamthome commented Feb 13, 2025

I think just the > is fine and less tedious. Elixir uses the iex> prefix and optionally allows the line numbers. See examples.

Maybe it would be nice to use erl> as the prefix to distinguish between Erlang and Elixir examples when they are on the same documentation, but I'm for the simplicity of the > prefix.


@garazdawi Sorry for the off-topic question below:

Is it possible for the shell to capture pasted content (Ctrl + v)? If so, it would be great if the shell could recognize the first character, and if it is a >, paste all the lines starting with the > in sequence. For instance:

```erlang
> A = 1.
1
> B = 1.
> A + B.
2
```

The shell would then print:

  1. A = 1.
  2. Enter
  3. B = 1.
  4. Enter
  5. A + B.
  6. Enter

It would be nice to have this feature to experiment with copied examples from the documentation.

@michalmuskala
Copy link
Contributor

For the second point, the way it works with ex_doc, is that when you copy the example, the iex> parts are not actually copied. I'm sure something similar could work for Erlang examples
Screenshot 2025-02-13 at 16 27 19

@S0AndS0
Copy link
Author

S0AndS0 commented Feb 14, 2025

Ooof, so seems to be lots of variants that could be supported, but could don't mean should... 🤔 still I'm game to be nerd-sniped for a bit ;-)

TLDR RE = "^(\\w*(?:\\(\\d+\\))?)(>)" and re:split seem to cover all conditions! We either get a list of length 4 with last element being the code, or list of length 1 which should be a different branch's responsibility.

Here's a synopsis of what I could find of variants;

  • Elixer ExUnit v1.12.3
    • iex> starter with ...> code continuation
    • iex(1)> starter with ...(1)> code continuation
  • Erlang williamthome/doctest:v0.9.3
    • 1> starter with .. code continuation
  • Erlang williamthome/doctest:v0.10.2
    • 1> starter with .. code continuation
    • > starter with .. code continuation
  • Erlang OTP 29138f5dd1:lib/stdlib/src/shell_docs_test.erl
    • > starter with code continuation

... Double-checking dates it seems williamthome/doctest is about nine-months old, so perhaps my thinking about backwards compatibility and smooth transition should be adjusted, because the code continuation differences would need tackled too.

Simplicity is nice, and usually I'm all for it, but I like the idea of broader support for documentation tests too 🤷

@williamthome
Copy link
Contributor

Just checked and the iex> prefix in Elixir is because of the Elixir shell (v1.17):

iex(1)> a = ~s"
...(1)> foo
...(1)> "
"\nfoo\n"
iex(2)> b =
...(2)> 1 +
...(2)> 1
2
iex(3)> c = :foo
:foo

The line numbers can be omitted in Elixir doctest:

iex> a = ~s"
...> foo
...> "
"\nfoo\n"
iex> b =
...> 1 +
...> 1
2
iex> c = :foo
:foo

In Erlang shell (OTP-28) it is:

1> A = ~"
   foo
   ".
<<"\nfoo\n">>
2> B =
   1 +
   1.
2
3> C = foo.
foo

Omitting line numbers in Erlang doctest:

> A = ~"
  foo
  ".
<<"\nfoo\n">>
> B =
  1 +
  1.
2
> C = foo.
foo

Erlang should not always adhere to what's implemented in Elixir, but maybe the optional line count support is a good idea. I do not have a strong opinion of the "why", only because of the shell ;)

@S0AndS0
Copy link
Author

S0AndS0 commented Feb 15, 2025

I do not have a strong opinion of the "why", only because of the shell ;)

Having noodled it about in my brain, and skimmed bits of doc-comments within various Erlang OTP sources, I believe the strongest argument I can think of for supporting line-numbers in doctests is reducing Git Diff churn.

Here be but two examples I could find that should just work were we to support both ways;

  • erlang/otp/lib/stdlib/src/re.erl
    1> re:run("aaaaaaaaaaaaaz","(a+)*z").
    {match,[{0,14},{0,13}]}
    2> re:run("aaaaaaaaaaaaaz","(a+)*z",[{match_limit_recursion,5}]).
    nomatch
    3> re:run("aaaaaaaaaaaaaz","(a+)*z",[{match_limit_recursion,5},report_errors]).
    {error,match_limit_recursion}
  • erlang/otp/lib/stdlib/src/binary.erl
    > binary:bin_to_list(<<"erlang">>, {1,3}).
    "rla"
    %% or [114,108,97] in list notation.

And though glancing elsewhere I found a few instances where either would require some touch-work, ex;

  • erlang/otp/lib/stdlib/src/argparse.erl
    1> argparse:parse([], #{arguments => [#{name => myarg, short => $m}]}).
    
    {ok,#{}, ...
    2> argparse:parse([], #{arguments => [#{name => myarg, short => $m, default => "def"}]}).
    
    {ok,#{myarg => "def"}, ...

... I've got a hunch it'll be less touch-work than picking but one style.

@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Feb 17, 2025
@garazdawi
Copy link
Contributor

I do not have a strong opinion of the "why", only because of the shell ;)

Lets do what the shell does, so that we get a similar look and feel as the shell. That is the regexp to match a test start is [0-9]+> . It will cause a bit of churn to change all the modules already using the new doctests, but better now than later. Will you do that change @S0AndS0 ?

For the second point, the way it works with ex_doc, is that when you copy the example, the iex> parts are not actually copied. I'm sure something similar could work for Erlang examples.

@michalmuskala it already works like this :)

@S0AndS0
Copy link
Author

S0AndS0 commented Feb 21, 2025

@garazdawi
Lets do what the shell does... That is the regexp to match a test start is [0-9]+> ... Will you do that change @S0AndS0?

Sure thing! I'll try to get something prettier (more maintainable than my current questionable code suggestions) sorted and submitted for public comment soon™

@S0AndS0
Copy link
Author

S0AndS0 commented Feb 22, 2025

TLDR update I'm blocked by something upstream of shell_docs_test:run_test/2 filtering-out doctest blocks that include line numbers, and though RegExp mostly works as intended something within run_tests(Test, Bindings) or downstream of it does not like the format of {test, ...} tuples edits have returned.

A hint, or two, would be much appreciated!

Unabridged update

I think I won the battle with RegExp, and by doing so also flattened that pesky case statement out of existence!

Like the most recent commit message says, local/isolated testing of shell_docs_test:parse_tests/2 shows changes to be behaving themselves. However, I'm still a bit touched by "skill issues" for getting inputs injected into the changes correctly. Near as I can tell, after adding some io:format/2 sanity checks, the shell_docs_test:test/2 and/or shell_docs_test:run_test/2 functions are now needing attention 🤔

If we get doctest input similar to;

-module(wat).

-export([woot/0]).

-doc """
```erlang
> wat:woot().
false
```
""".
woot() -> woot.

... Then I can observe the run_test(Code, InitialBindings) function gets the doctest lines, but requested RegExp changes means > wat:woot() is filtered out. And adding a line number shows shell_docs_test:run_test/2 never gets any doctest lines, ex;

 -doc """
 ```erlang
-> wat:woot().
+1> wat:woot().
 false
 ```
 """.
 woot() -> woot.

Altering the RegExp to allow for optional line-numbers, (?'line_number'[0-9]+)(?'line_number'(?:[0-9]+)?), and not using line-numbers in doctests, ex...

--define(RE_CAPTURE, "(?'indent'^\s*)(((?'line_number'[0-9]+)(?'prefix'>\s|\s))|(?'prefix'%))(?'content'[^%]*)(?'comment'(?:.*%.*)?)").
+-define(RE_CAPTURE, "(?'indent'^\s*)(((?'line_number'(?:[0-9]+)?)(?'prefix'>\s|\s))|(?'prefix'%))(?'content'[^%]*)(?'comment'(?:.*%.*)?)").
 -define(RE_OPTIONS, [ {capture, [indent, line_number, prefix, content, comment] ,binary}, dupnames ]).
 parse_tests([], []) ->

... does show things are mostly parsed as they use-to. I.E. [{test, ...}] filters back up to shell_docs_test:run_test/2 but then I'm not observing expected failure of woot =:= false comparison.


Edit 🤦 test({pre,[],[{code,Attrs,[<<">",_/binary>> = Code]}]}, Bindings) is what be pre-filtering

@garazdawi
Copy link
Contributor

Let me know when you are ready for a review!

The examples in lists, binary, maps, zstd and shell_docs_test also need to be updated to follow the new style.

@S0AndS0
Copy link
Author

S0AndS0 commented Feb 26, 2025

Thanks for being patient with me!

I think I've got most of the {match, ...} things sorted, and will be happy to update doctests that were previously functional too.

For single line test/match blocks it seems to be mostly doing what it should... mostly... but for reasons I've yet to figure-out it ain't at all happy with mutli-line stuff and I either may need a bit of help or more time to resolve my skill issues x-)

@S0AndS0
Copy link
Author

S0AndS0 commented Mar 1, 2025

I think (hope) changes are ready for review!

Turns out I was over complicating things with the RegExp and stuff... no big surprise there (-x

Updated doctest comments to include line numbers and adjusted indentation too for

  • lib/stdlib/src/binary.erl
  • lib/stdlib/src/lists.erl
  • lib/stdlib/src/maps.erl
  • [¿?] lib/stdlib/src/shell_docs_test.erl please check 49765b5d1e commit message
  • lib/stdlib/src/zstd.erl

Please do let me know if I've broken anything, that wasn't already misbehaving, and I'll do my best to make corrections!

@garazdawi
Copy link
Contributor

We did some changes in the lists module, would you mind rebasing on latest master?

I’ll have a closer look on Monday.

S0AndS0 added 10 commits March 1, 2025 13:49
⚠️ nested `case` does not spark joy and RegExp might be a better
choice for this feature request!

This patch aims to formally begin the discussion here about feature
parity between Erlang OTP doctest, and the library of the same name
maintained by [williamthome](https://github.com/williamthome/doctest/)

Specifically this patch targets `>` vs `1>` syntax for declaring the
start of a testable code line or set of lines, ex;

**Erlang OTP doctest**

````erlang
-doc """
This test is how we do with `erlang/otp` doctests

```
> true.
true
> totallyOkay.
totallyOkay
```
"""
woot() -> ok.
````

**williamthome doctest**

````erlang
-doc """
This test is how we do with `williamthome/doctest`

```erlang
1> false.
false
2> thisIsFine.
thisIsFine
```
"""
muchJoy() -> ok.
````
⚠️ Local testing of changes, in isolation, shows good behavior!
But additional edits, and testing, will be necessary for me to have
confidence in changes.

Change summary between `c23347df2c` and here:

- define RegExp capture and options
- remove `parse_match/2` completely
- add `parse_tests/2` RegExp pattern matching within function prams
- remove `parse_tests/2` nested `case` in favor of recursion
🤦 Now I get what `Match` means in `parse_tests/2` and why for
the calls to `parse_match/2` too!

This _should_ now collect code lines, like it once did, and expected
match line(s)...  Though it seems the last line and/or empty lines now
need ignored or skipped 🤔
Attempting to match in-line comments was a bad idea from my past self,
as was trying to dedent 🤦

⚠️ Still need to sort-out multi-line `Cmd` and `Match`
Still gotta sort-out that issue of multi-line `Code`/`Match` 🤔
⚠️ If we need to support `.. ` command continuation in the future
that'll require touching the `RE_CAPTURE` pattern and adding a few more
variants for `parse_tests/2` too
Spotting the following error/stack-trace popping for;

````erlang
-doc """

```
1> Prebound.
hello
```
""".
````

...  Here's some `io:format` output and resulting error;

```
run_test(Code, InitialBindings)
 | Code -> <<"1> Prebound.\nhello\n">>
 | InitialBindings -> []
 | ReLines -> [ {match, [<<"1> ">>, <<"> ">>, <<"Prebound.">>]},
                {match, [<<>>, <<>>, <<"hello">>]},
                {match, [<<>>, <<>>, <<>>]} ]
 | Tests -> [{test, [<<"Prebound.">>], [<<"hello">>]}]
[{match,1,{atom,1,hello},{block,1,[{var,1,'Prebound'}]}}]
** exception error: {unbound_var,'Prebound'}
     in function  erl_eval:exprs/2 (erl_eval.erl, line 217)
     in call from shell_docs_test:run_tests/2 (shell_docs_test.erl, line 202)
     in call from lists:foldl/3 (lists.erl, line 2146)
     in call from shell_docs_test:test/2 (shell_docs_test.erl, line 153)
     in call from shell_docs_test:test/2 (shell_docs_test.erl, line 153)
     in call from shell_docs_test:parse_and_run/3 (shell_docs_test.erl, line 137)
     in call from shell_docs_test:'-parse_and_run/3-lc$^0/1-0-'/3 (shell_docs_test.erl, line 133)
     in call from shell_docs_test:'-module/2-lc$^0/1-0-'/3 (shell_docs_test.erl, line 122)
```
S0AndS0 added 3 commits March 1, 2025 13:49
Tip, use the following command to filter out this commit's churn;

```bash
git diff --ignore-space-change
```
Usually doing something like;

```bash
git pull --rebase -X theirs origin master
```

...  is fine, but this time Git lost the plot in a few places.

Fortunately it was only one file, so doing a selective show via;

```bash
git show origin/master:lib/stdlib/src/lists.erl > lib/stdlib/src/lists.erl
```

...  And then manually going through to re-add doctest line numbers, and
adjust indentation, wasn't too much of a chore.
For _reasons_ Git felt it important to (re-)introduce trailing spaces
@S0AndS0 S0AndS0 force-pushed the add-doctest-int-prefix branch from 2d96680 to 9ed5b46 Compare March 1, 2025 22:20
@S0AndS0
Copy link
Author

S0AndS0 commented Mar 1, 2025

I don't mind at all, and ya can consider the rebasing done :-)

Looking forward to Monday's review I will, and 'ntil then I'll be wishing y'all a wonderful weekend!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants