Design note: Explicit discard

Herb Sutter edited this page Jul 7, 2023 · 26 revisions

Design summary

Note: This first short section is a brief summary of the current design. The rest of this page provides more detail about requirements and rationale.

Stakes in the ground: Data flow is important, and data-lossy operations should be explicit (certainly never default). In this case, the current design is to always make explicit discards mandatory.

When authoring a function in Cpp2: A Cpp2 function's outputs (its return value(s), and arguments to its inout and out parameters) are always "not discardable," and even when called from Cpp1 code its return values are not discardable. There is no "discardable" opt-out.

When calling a function using x.f() unified function call syntax (UFCS): The return value is always treated as "not discardable", even when calling an existing Cpp1 function that is not declared [[nodiscard]].

To explicitly ignore a value: Write _ = to assign it to the "don't care" wildcard. For example: _ = vec.emplace_back(1, 2, 3);. This works for both return values and arguments to inout and out parameters (see examples below).
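For readers coming from Cpp1, the closest analog is a [[nodiscard]] function whose result the caller either uses or explicitly casts to void. The following is only a rough sketch for comparison: count_chars and use_and_discard are hypothetical names invented for illustration, not anything in cppfront.

```cpp
#include <string>

// Hypothetical function (illustration only): its result is the whole point of the call.
[[nodiscard]] inline int count_chars(const std::string& text) {
    return static_cast<int>(text.size());
}

// Hypothetical caller showing a used result and an explicitly discarded one.
inline int use_and_discard() {
    int used = count_chars("abc");            // ok: the value is used
    // count_chars("abc");                    // compilers typically warn: nodiscard result ignored
    static_cast<void>(count_chars("abcd"));   // explicit discard, the Cpp1 counterpart of `_ =`
    return used;
}
```

In Cpp1 the author has to opt in with [[nodiscard]] on each function; in the Cpp2 design described here that is simply the behavior for every function, and `_ =` is the single spelling of the discard.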

Design rationale

Requirements

When I considered how to design discarding values, I had the following key requirements.

For consistency:

  • All "out"puts of a function should be treated consistently, which means not only return values but also arguments to inout and out parameters.

For function authors:

  • The default must be "not discardable." Cpp2 has always done this.
  • If there is a way of marking a function return value or inout or out parameter as "discardable," then that syntax should ideally have some symmetry with the call site syntax that discards values (e.g., a hypothetical f: () -> discardable int = ... could perhaps be called with discard f();).

For function callers:

  • There must be a way to explicitly discard a function output.
  • That must work consistently for both unused return values and unused arguments to inout or out parameters.

Rationale for the current design

The design I've chosen to try for Cpp2 is the most elegant I can think of:

  • Stake in the ground: Data flow is important, and so it should always be explicit.
  • Corollary: All outputs are important, so there should be no way for a function to say "discardable." This also simplifies the design by avoiding adding a modifier on a return value, and especially avoids adding a modifier on an inout or out parameter which would add a complication to parameter passing.
  • There is one way to discard: Assign to the _ "don't care" wildcard. For example: _ = func();. This is natural because Cpp2 already uses _ as a "don't-care" wildcard everywhere, and even if I didn't support assigning to the wildcard, users would try to write it.

That's it.

Q: Does this also cover the inout and out parameter cases?

A: Yes. Given:

increment: (inout x) = { x++; }

Consider this call site:

test: () = {
    value := 42;
    increment(value);     // A
    std::cout << value;   // B: ok, prints 43
}

This is okay because there is another subsequent use of value in line B. But what if line B wasn't there? Then line A would be a definite last use, and lo and behold:

test2: () = {
    value := 42;
    increment(value);     // A: error, cannot pass rvalue to 'increment'
}

This is an error. What's going on? Cpp2 recognizes that line A is the last use of value, and therefore automatically treats it as an rvalue... which cannot bind to an inout T (Cpp1 T&) parameter. This is a feature, not a bug -- Cpp2 is correctly diagnosing that a last use of value is not compatible with a parameter that has an out component, because it means no later code looks at the output value.
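The underlying Cpp1 rule can be checked directly. This is a minimal sketch that assumes increment is lowered to a function taking int& (the real Cpp2 function is generic, so fixing the type to int is a simplification), and increment_then_read is a hypothetical helper added only for the demonstration.

```cpp
#include <type_traits>

// Cpp1 spelling of `increment: (inout x)`, fixed to int for simplicity (an assumption).
inline void increment(int& x) { ++x; }

using increment_t = decltype(&increment);

// An lvalue argument binds to int&...
static_assert(std::is_invocable_v<increment_t, int&>);
// ...but an rvalue, which is what a definite last use is treated as, does not.
static_assert(!std::is_invocable_v<increment_t, int&&>);

// Hypothetical helper: the later read of `value` keeps the call from being the last use.
inline int increment_then_read() {
    int value = 42;
    increment(value);   // lvalue argument: ok
    return value;       // a later use observes the output
}
```

The second static_assert is the Cpp1-level reason the test2 call is rejected: once the argument is treated as an rvalue, there is no int& for it to bind to.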

So, in the current Cpp2 design, what's the answer? Just write the same _ = as usual, on a new line:

test2: () = {
    value := 42;
    increment(value);     // A: now ok
    _ = value;            // B: ok, explicit discard of argument to `inout`
}

So the _ = discard syntax works naturally hand-in-hand with (a) Cpp2 dataflow-direction-based parameter passing and (b) Cpp2 definite last use.

Note: All of the above is equally true if the increment(value) calls are written value.increment() instead, using UFCS. It's all the same.

Experience in C++

Please see this comment about [[nodiscard]] by Jonathan Wakely, ISO C++ committee Library working group chair.

Quoting the highlights:

the Right Thing is to use it widely ...

IMHO compilers should warn about discarded values for all comparison operators, all begin and end accessors, nearly all const member functions, etc. etc. ...

tl;dr do not assume from the absence of [[nodiscard]] attributes in the standard specification that the committee thinks it should be used sparingly. The opposite is true.
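As a small illustration of the kind of function the quote has in mind, here is a sketch of a type whose comparison operator and const accessor are marked [[nodiscard]]. Meters, is_zero, and same_length are hypothetical names made up for this example.

```cpp
// Hypothetical value type (illustration only).
struct Meters {
    int value = 0;

    // A comparison whose result is ignored is almost always a bug.
    [[nodiscard]] constexpr bool operator==(const Meters& other) const noexcept {
        return value == other.value;
    }

    // A const member function typically exists only for its return value.
    [[nodiscard]] constexpr bool is_zero() const noexcept { return value == 0; }
};

constexpr bool same_length(Meters a, Meters b) {
    // a == b;       // compilers typically warn here: the comparison result is ignored
    return a == b;   // ok: the result is used
}

static_assert(same_length(Meters{3}, Meters{3}));
static_assert(!Meters{3}.is_zero());
```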

Experience in other languages

I spoke with the designers of C#, F#, and Python about their experience.

C# supports explicit discarding with _ = func(); syntax. I asked Mads Torgersen, who currently leads C# language design, and he reports that the C# language doesn't require using that, and that it's common in C# code to implicitly drop a return value. However, C# also ships with an opt-in analyzer rule IDE0058 that does flag all implicitly dropped return values. C# does not offer a way for a function author to mark a return value as either "discardable" or "not discardable."

F# is the most like what I'm trying in Cpp2: F# never allows implicitly discarding return values; to discard a value you must write |> ignore explicitly. Importantly, however, F# also has to interop nicely with existing APIs that were never designed for that, and still requires explicit discard when using those APIs. I asked Don Syme, designer of F#, whether this has been a significant pain point for F#, and he reports that it has not. That's an encouraging real-world data point. F# does not offer a way for a function author to mark a return value as "discardable."

Python's only special language support for _ is in pattern matching. It does allow writing _ as an ordinary identifier to designate unused elements when doing tuple unpacking, but that's only a programmer convention. I asked Guido van Rossum, designer of Python, about Python's experience with discarding (non-unpacking cases of) return values, and he reported two cases: (1) It occasionally would be useful to flag such things in examples like f.close # forgot to call it!. (2) Python users do regularly complain about calling something that returns a Future (or "awaitable") and forgetting to use await on that result (e.g., data = sock.recv() # should be 'await sock.recv()'!); here the problem is that the Python compiler doesn't know types, and while type checkers (mypy, pyright) do know types they've been reluctant to special-case this because type checkers prefer to focus on checking types rather than linting. Python doesn't have a way for function authors to mark a return value as "not discardable."

This experience in other languages tends to support that:

  • implicitly discarding function outputs is a pitfall at least sometimes in all languages that allow it;
  • allowing "discardable" function outputs is not necessary; and
  • mandatory explicit discarding is not a usability impediment in practice even when using APIs that were not designed for mandatory explicit discard.

Alternatives considered

For some of the interim design discussion, see the Issue #231 comment thread and the Issue #305 comment thread.

Among other things:

I considered allowing a "discardable" opt-out, under various syntaxes. It was hard to find a good name that worked well for inout and out parameters and that was consistent and nice to use at all sites too. It probably could have been done, but it was a real breakthrough when I instead tried the path of "what if there's no discardable opt-out at all?"

I considered various "discard" syntax alternatives, including among others:

  • A language statement or qualifier, such as discard f();.

  • Writing a global helper discard: (_) = {}, which thanks to UFCS could be used as either discard(f()); or the fluent-style f().discard() (the latter similar to F# |> ignore).
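For comparison, the helper-function alternative has a direct Cpp1 analog. The sketch below is illustrative only: discard here is a hypothetical template that accepts any value and does nothing, and next_id and skip_one_id are invented for the demonstration. Cpp1 has no UFCS, so only the discard(f()) spelling is available there.

```cpp
// Hypothetical helper (illustration only): accepts any value and does nothing with it.
template <typename T>
constexpr void discard(T&&) noexcept {}

// Hypothetical function whose result must not be silently dropped.
[[nodiscard]] inline int next_id() {
    static int id = 0;
    return ++id;
}

inline int skip_one_id() {
    discard(next_id());   // explicitly drop the first id
    return next_id();     // the second id is used
}
```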

In the end, since Cpp2 already had _ as a don't-care wildcard including for local variables that don't need a name (e.g., lock guards), the obvious and discoverable discard syntax just kept naturally gravitating toward _ =. (If I had used a discard f(); or f().discard() style, I would still have recognized the _ = style because it's so natural, but emitted an error with a helpful "use ((this other syntax)) instead" diagnostic. Since the motive for that was that _ = would be natural, eventually I stopped fighting it and embraced the natural notation.)

Initial usage experience

When I made this change in cppfront, I had to use explicit discard myself in my own Cpp2 code. Here is the initial experience with how often code had to write discards, first in reflect.h2, which is currently the self-hosted part of cppfront, and then in the regression tests. Both use the existing C++ standard library pretty heavily, and that library was not designed for mandatory nodiscard.

reflect.h2 discard statistics

#lines	767
#hits	7
rate	1 per 110 LOC

All the hits were calls to the same function:

7	STL container emplace_back

I actually think this added value: Before this, I hadn't noticed that C++17 had added a return value to emplace_back. My personal taste is that I like having to explicitly say I don't want to use its return value (which, thanks to this, I now know about).
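To make that return value concrete, here is a small Cpp1 sketch that assumes a standard library in which emplace_back returns a reference to the newly constructed element. demo_emplace_back is a hypothetical function written only for this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical demo (illustration only): one call uses the returned reference, one discards it.
inline std::string demo_emplace_back() {
    std::vector<std::string> names;

    // Use the returned reference to the new element right away.
    std::string& first = names.emplace_back("ab");
    first += "c";   // note: `first` may be invalidated by any later reallocation

    // Explicitly say the returned reference is not wanted (Cpp1 counterpart of `_ =`).
    static_cast<void>(names.emplace_back(3, 'x'));

    return names[0] + names[1];
}
```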

If mandatory discard had been a lot noisier (required a lot more often) I might feel less charitable. But this hit rate of roughly 1 hit per 110 LOC didn't feel bad when writing new code.

On to the regression test cases... there's a lot more regression test code, so will the experience be worse? I was surprised to see the result:

Regression test discard statistics

#lines	2,929
#hits	14
rate	1 per 209 LOC

It was actually better, and half of those results were a single function in a somewhat-artificial/"toy" UFCS test harness:

7	calling a local ufcs() test case support function
 		(in pure2-ufcs-member-access-and-chaining.cpp2)
3	STL variant emplace
2	fprintf
1	STL container insert
1	fclose

In most of the last seven cases (everything other than the ufcs() helper calls), in my opinion the code readability was better or at least no worse than the previous implicit-discard version. My general reaction even to cases like fprintf was, "oh, right, that has a return value, doesn't it? it just feels right to (easily) explicitly say I'm not going to look at that return value."

So, for my taste at least with the initial ~4 KLOC of examples, this feels like a good experience to me so far. We'll see how the experiment develops as we write more code with mandatory discard...