diff --git a/docs/Commentary/Code-Generation.rst b/docs/Commentary/Implementation/Code-Generation.rst similarity index 100% rename from docs/Commentary/Code-Generation.rst rename to docs/Commentary/Implementation/Code-Generation.rst diff --git a/docs/Commentary/Compiler.rst b/docs/Commentary/Implementation/Compiler.rst similarity index 86% rename from docs/Commentary/Compiler.rst rename to docs/Commentary/Implementation/Compiler.rst index 8256200..32f78c8 100644 --- a/docs/Commentary/Compiler.rst +++ b/docs/Commentary/Implementation/Compiler.rst @@ -165,17 +165,111 @@ Profile data consists of several collection of info: * detailed: instruction-level counts, several invocation/backends counts with timestamp, data on branches, call receiver types, typechecks (checkcast, instanceof, aastore). but collecting it adds 35% overhead over just per-method counters +12B. Separately translated units may be assembled into operational systems. It shall be possible for a separately translated unit to reference exported definitions of other units. All language imposed restrictions shall be enforced across such interfaces. Separate translation shall not change the semantics of a correct program. Whole-Program Compilation - all code must be available at compile-time. This allows several optimizations -• Enables monomorphization which increases inlining opportunities and avoids the need to box primitives. -• Enables aggressive dead code elimination and tree shaking which significantly reduces code size. -• Enables cross namespace/module optimizations. + +* Enables monomorphization which increases inlining opportunities and avoids the need to box primitives. +* Enables aggressive dead code elimination and tree shaking which significantly reduces code size. +* Enables cross namespace/module optimizations. In the past, requiring access to the entire source code of a program may been impractical. Today, systems are sufficiently performant that JavaScript, Python, PHP, and Rust have ecosystems where there is no separate compilation, and arguably Java pioneered this model with JIT compilation not paying any attention to module boundaries. Similarly Google and Facebook use monolithic repositories of source code, but have caching optimizations so that developers may use the cloud. +future JIT compiler + +specializing adaptive interpreter +collecting runtime profiling data for user code +generating the interpreter cases, the main `switch`, from a domain specific language - error handling, reference counts, adding counters, stats +generate multiple interpreters from a single source of truth +generate lots of metadata by analyzing that single source of truth +internal pipeline for detecting, optimizing, and executing hot code paths +find hot loops in your code, build a trace of that loop, and break it up into a more efficient representation, do some light optimization – and execute it in a second interpreter + +Being able to break big complicated bytecode instructions down into more simple atomic steps is only possible because of specialization and defining bytecode instructions in terms of smaller steps + + +Example: Fibonacci function +def fibonacci(n) + a, b = 0, 1 + for _ in range(n): # inner loop + a, b = b, a + b # update a and b by adding them together + return a + +The bytecode for the loop is something like this: +FOR_ITER +STORE_FAST +LOAD_FAST_LOAD_FAST +LOAD_FAST +BINARY_OP +STORE_FAST_STORE_FAST +JUMP_BACKWARD + +these are generic operations like FOR_ITER and BINARY_OP which have been around for years. 
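+
+You can inspect this yourself with CPython's ``dis`` module (a minimal sketch; ``adaptive=True`` needs CPython 3.11+ and the exact opcode names vary between versions; the function is restated so the snippet is self-contained)::
+
+  import dis
+
+  def fibonacci(n):
+      a, b = 0, 1
+      for _ in range(n):
+          a, b = b, a + b
+      return a
+
+  dis.dis(fibonacci)                 # generic FOR_ITER, BINARY_OP, ...
+  for _ in range(1000):              # warm it up so the adaptive interpreter specializes
+      fibonacci(100)
+  dis.dis(fibonacci, adaptive=True)  # should now show forms like FOR_ITER_RANGE, BINARY_OP_ADD_INT
+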
But we can specialize these at runtime - like FOR_ITER into FOR_ITER_RANGE and BINARY_OP into BINARY_OP_ADD_INT. Then we build micro-op traces - smaller, more atomic steps that each individual instructions is broken up into. + +FOR_ITER_RANGE - _SET_IP, _ITER_CHECK_RANGE, _IS_ITER_EXHAUSTED_RANGE, _POP_JUMP_IF_TRUE, _ITER_NEXT_RANGE +STORE_FAST - _SET_IP, STORE_FAST +LOAD_FAST_LOAD_FAST - _SET_IP, LOAD_FAST, LOAD_FAST +LOAD_FAST - _SET_IP, LOAD_FAST +BINARY_OP_ADD_INT - _SET_IP, _GUARD_BOTH_INT, _BINARY_OP_ADD_INT +STORE_FAST_STORE_FAST - _SET_IP, STORE_FAST, STORE_FAST +JUMP_BACKWARD - _SET_IP, _JUMP_TO_TOP + +So a small instruction expands to two micro-ops, but the more complicated ones may have several different parts. Then we can optimize this - remove redundant frame pointer updates (only needed for JUMP_BACKWARD), remove range checks, remove int guard. And what's left is basically the bare minimum amount of work required to actually execute this hot inner loop. So now that it's translated and optimized, we have to do just-in-time code generation. + + +burning in - encode constants, caches, arguments directly into the machine code, e.g. immediate arguments +move data off of frames and into registers - eliminate intermediate reads and writes to memory +eliminate jumps back into the interpreter + +options: +copy-and-patch compilation +WebAssembly baseline compiler (Liftoff) +LLVM toolchain (LLVM -O0) +LuaJIT + +see paper for benchmarks, of course multiple tiers are better, but tl;dr is copy-and-patch is a nice middle tier. It is a template JIT compiler. In particular, it works by copying over a static pre-compiled machine code "template" into executable memory, and then going through that machine code and patching up instructions that need to have runtime data encoded in them. This is sort of like the relocation phase of linking/loading an ELF file. And actually we can use LLVM to build an ELF object file and generate our templates. For example: + +extern int MAGICALLY_INSERT_THE_OPARG; +extern int MAGICALLY_CONTINUE_EXECUTION(_PyInterpreterFrame *frame, PyObject **stack_pointer); +int load_fast(_PyInterpreterFrame *frame, PyObject **stack_pointer) +{ + int oparg = &MAGICALLY_INSERT_THE_OPARG; + PyObject *value = frame->localsplus[oparg]; + Py_INCREF(value); + *stack_pointer++ = value; + __attribute__((musttail)) return MAGICALLY_CONTINUE_EXECUTION(frame, stack_pointer); +} + +So there are extern placeholders for inserting the oparg and continuing execution. +For the oparg, we use the address of the extern for our oparg. This generates more efficient code because the relocation inserts the constant directly, instead of needing to dereference the address. +And for continuing execution, we use LLVM's `musttail` so we get a single jump to the next opcode, and even better, if that jump happens to be of length zero, we can just skip the jump entirely. So, the object file that we get out of this looks like this: + +.static +00: 48 b8 00 00 00 00 00 00 00 00 movabsq $0x0, %rax +0a: 48 98 cltq +0c: 49 8b 44 c5 48 movq 0x48(%r13,%rax,8), %rax +11: 8b 08 movl (%rax), %ecx +13: ff c1 incl %ecx +15: 74 02 je 0x19 +17: 89 08 movl %ecx, (%rax) +19: 48 89 45 00 movq %rax, (%rbp) +1d: 48 83 c5 08 addq $0x8, %rbp +21: e9 00 00 00 00 jmp 0x26 +.reloc +02: R_X86_64_64 MAGICALLY_INSERT_THE_OPARG +22: R_X86_64_PLT32 MAGICALLY_CONTINUE_EXECUTION - 0x4 + +We have the machine code, and the relocations, and we know the calling convention. 
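+
+The runtime half of copy-and-patch is then just copying the template bytes and writing values at the recorded relocation offsets. A rough Python sketch, mirroring the two relocations in the dump above (the ``Reloc`` record and the ``emit`` signature are invented for illustration, not the actual Tools/jit data layout)::
+
+  import struct
+  from collections import namedtuple
+
+  Reloc = namedtuple("Reloc", "offset kind symbol addend")
+
+  def emit(template_body, relocs, oparg, continuation_addr, base_addr):
+      """Copy the pre-compiled template into place, then patch its holes."""
+      code = bytearray(template_body)
+      for r in relocs:
+          if r.kind == "R_X86_64_64":
+              # absolute 64-bit hole: burn the oparg in directly
+              struct.pack_into("<Q", code, r.offset, oparg)
+          elif r.kind == "R_X86_64_PLT32":
+              # PC-relative 32-bit displacement to the next opcode's machine code
+              struct.pack_into("<i", code, r.offset,
+                               continuation_addr + r.addend - (base_addr + r.offset))
+      return bytes(code)
+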
And so we can take this, parse it out and put it in static header files as data, and then we can implement copy and patch for real. There is python code https://github.com/brandtbucher/cpython/tree/justin/Tools/jit (c4904e44167de6d3f7a1f985697710fd8219b3b2) that handles actually extracting all the cases, compiling each one, parsing out the ELF (by dumping with LLVM to JSON), and then generating the header files. Then the final build has no LLVM dependency and is a self-contained JIT. And because clang/LLVM is portable, you can cross-compile for all platforms from Linux, or do whatever. + +and you can play with the templates, like compiling super-instructions for common pairs or triples, or adding more oparg holes. and it mixes well with handwritten assembly or a more aggressive compilation strategy, you just special-case the opcode / basic block and say "use this assembly instead". + +debugging: can we re-use unwind the ELF tables or DWARF in the JITted code? Also look at Java for how they dealt with debugging + Optimization ============ +11F. Programs may advise translators on the optimization criteria to be used in a scope. It shall be possible in programs to specify whether minimum translation costs or minimum execution costs are more important, and whether execution time or memory space is to be given preference. All such specifications shall be optional. Except for the amount of time and space required during execution, approximate values beyond the specified precision, the order in which exceptions are detected, and the occurrence of side effects within an expression, optimization shall not alter the semantics of correct programs, (e.g., the semantics of parameters will be unaffected by the choice between open and closed calls). + All software consumes resources: time, memory, compute, power, binary size, and so on. The software's performance may be defined as the rate of resource consumption. There are various measurements of different aspects of performance. E.g. at compile time, one may measure execution time, memory usage, power usage. Similarly at runtime, there are more metrics: execution time, power usage, memory usage, executable size, throughput (work/time), latency (time from request to response). Pretty much anything that can be estimated or measured is fair game. Overall, there is a set of resource constraints and the goal is to maximize performance within those constraints. This performance goal can be a flexible one like "as fast as possible" or a hard one like "cost to operate must be within our budget". Performance is influential and can be a competitive advantage. Many projects track performance metrics and wish to get the best performance possible subject to limitations on manpower. It would be great to support optimizing the code for any objective function based on some combination of these criteria. But that's hard. So let's look at some use cases: diff --git a/docs/Commentary/CompilerOutput.rst b/docs/Commentary/Implementation/CompilerOutput.rst similarity index 98% rename from docs/Commentary/CompilerOutput.rst rename to docs/Commentary/Implementation/CompilerOutput.rst index bbf45a4..46880ba 100644 --- a/docs/Commentary/CompilerOutput.rst +++ b/docs/Commentary/Implementation/CompilerOutput.rst @@ -15,7 +15,7 @@ The simplest compiler writes out a file like: Except in the proper object format. -Then the rest is optimizing/specializing this to run more efficiently. +Then the rest is optimizing/specializing this to run more efficiently. 
Particularly, there is `tree shaking `__. Steelman 1D defines this as "unused and constant portions of programs will not add to execution costs. Execution time support packages of the language shall not be included in object code unless they are called." Formats ======= diff --git a/docs/Commentary/CoreSyntax.rst b/docs/Commentary/Implementation/CoreSyntax.rst similarity index 100% rename from docs/Commentary/CoreSyntax.rst rename to docs/Commentary/Implementation/CoreSyntax.rst diff --git a/docs/Commentary/Errors.rst b/docs/Commentary/Implementation/Errors.rst similarity index 94% rename from docs/Commentary/Errors.rst rename to docs/Commentary/Implementation/Errors.rst index 5436f37..1090668 100644 --- a/docs/Commentary/Errors.rst +++ b/docs/Commentary/Implementation/Errors.rst @@ -1,6 +1,8 @@ Errors ###### +13D. Translators shall be responsible for reporting errors that are detectable during translation and for optimizing object code. Translators shall be responsible for the integrity of object code in affected translation units when any separately translated unit is modified, and shall ensure that shared definitions have compatible representations in all translation units. Translators shall do full syntax and type checking, shall check that all language imposed restrictions are met, and should provide warnings where constructs will be dangerous or unusually expensive in execution and shall attempt to detect exceptions during translation. If the translator determines that a call on a routine will not terminate normally, the exception shall be reported as a translation error at the point of call. + The compiler gives clear and helpful error messages that make it easier for developers to identify and fix problems in their code. Error levels @@ -35,6 +37,11 @@ W2 is fine and will not cause any problems upgrading. But W1 and W3 will break t We can formalize this process with two presets: a 'before' preset warns rather than fatally erroring on any issues that the old compiler didn't bother about, and an an 'after' preset that fatally errors on such issues. Of course this doesn't help with downgrading or cases where the fatal error cannot be turned into a warning. +Automatic fixing +---------------- + +Steelman 1B says "Translators shall produce explanatory diagnostic and warning messages, but shall not attempt to correct programming errors." I think the part about not correcting programming errors is stupid - Java IDE's have offered "quick fix" options for ages. Now maybe there is some room for making the process interactive, so that the compiler does not completely mess up the program without prompting, but neither does this mean that auto-fix isn't a goal. + Error types =========== diff --git a/docs/Commentary/Fastest.rst b/docs/Commentary/Implementation/Fastest.rst similarity index 97% rename from docs/Commentary/Fastest.rst rename to docs/Commentary/Implementation/Fastest.rst index 5be41e5..af2b350 100644 --- a/docs/Commentary/Fastest.rst +++ b/docs/Commentary/Implementation/Fastest.rst @@ -3,7 +3,7 @@ As fast as C How do you prove that Stroscot is "as fast as C"? Well, we must show that every C program has a natural Stroscot translation, that performs at least as fast. Since Stroscot is an expressive language there may be many natural translations, and all must perform as fast. -Stroscot should also have fast performance on programs that aren't natural translations of C programs, but this is secondary to the claim. 
Even the lambda calculus, after years of study, has no optimal implementation on modern machines. So long as the implementation is reasonable the actual performance on non-C-like programs doesn't matter - worst-case, the feature becomes a footgun for people are performance sensitive. +Stroscot should also have fast performance on programs that aren't natural translations of C programs, but this is secondary to the claim. Even the lambda calculus, after years of study, has no optimal implementation on modern machines. So long as the implementation is reasonable, the actual performance on non-C-like programs doesn't matter - worst-case, the feature becomes a footgun for people who are performance sensitive. Benchmarks ========== diff --git a/docs/Commentary/IR.rst b/docs/Commentary/Implementation/IR.rst similarity index 71% rename from docs/Commentary/IR.rst rename to docs/Commentary/Implementation/IR.rst index 147bf77..3f10893 100644 --- a/docs/Commentary/IR.rst +++ b/docs/Commentary/Implementation/IR.rst @@ -188,6 +188,80 @@ More on IR Expanding machine code instructions into unpack, mathematical operations, round/repack means that there is a lot of overhead in code generation recognizing patterns of operations as instructions. On the other hand it allows writing a fewer number of more generic and powerful optimizations, instead of many processor-specific instruction patterns. So this choice favors ahead-of-time compilation at the expense of interpretation and JITing. +Lambda lifting/dropping +----------------------- + +Turner writes (https://www.cs.kent.ac.uk/people/staff/dat/miranda/manual/30.html): If your script consists of six functions, one of which solves a problem and the other five of which are auxiliary to it, it is probably not a good style to put the five subsidiary functions inside a where clause of the main one. It is usually better to make all six top level definitions, with the important one written first, say. There are several reasons for this: + +1. easier to read - six separate chunks of information rather than one big one +2. easier to debug - each of its functions can be exercised separately, on appropriate test data, within a Miranda session +3. more robust for future development - for example if we later wish to add a second `main' function that solves a different problem by using the same five auxiliary functions in another way, we can do so without having to restructure any existing code. +4. in the current implementation, functions defined inside a "where" clause cannot have their types explicitly specified + +In practice, programmers tend to use fewer than ten parameters and little nesting, with a mixture of parameter lifting/dropping and block floating/sinking. + +Turner also writes that nested blocks are “slower to compile”, but provides no evidence. Empirically, Danvy observed the opposite. Across a wide variety of functional languages (Miranda, Standard ML, Moscow ML, Chez Scheme, GHC, SCM), block-structured programs were faster to compile and run than the corresponding recursive equations. The number of parameters is a bottleneck both in the compiler's processing and at runtime as procedure-calling conventions have a limited number of registers. This is especially noticeable for automatically produced recursive equations, which can have dozens of parameters. + +Lambda lifting transforms a block-structured (let/lambda) program into a program consisting only of global (potentially recursive) combinator definitions / rewrite rules.
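+
+For example, in Python notation (an illustrative sketch, not Stroscot syntax)::
+
+  # Block-structured: g is local to rescale and has a free variable, scale.
+  def rescale(scale, xs):
+      def g(x):
+          return x * scale            # scale occurs free in g
+      return [g(x) for x in xs]
+
+  # Lambda-lifted: the free variable is passed explicitly and g becomes a
+  # global definition (renamed here only so both versions can coexist).
+  def g_lifted(scale, x):
+      return x * scale
+
+  def rescale_lifted(scale, xs):
+      return [g_lifted(scale, x) for x in xs]
+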
This global definition style maps more closely to practical machine code than the Y combinator. The main complications are handling free variables, i.e. nested variable scope, and preserving function applications with known (constant) functions. + +Algorithm: + +1. Use unique fresh identifiers to avoid name capture. +2. Assign a name to each lambda expression, like \x y. e --> let f = \x y. e in f. +3. For each local definition f, let Ef denote the set of free identifiers that have to be abstracted out of the body of f. The function names themselves are treated as constants, and need not be abstracted out. +4. Set up a system of equations involving Ef, Eg etc., like Ef = {a} union Eg, Eg = {b} union Ef. +5. Find the least solution of this (transitive closure / repeated substitution, O(n^3)), which produces the values Ef for each f. +6. In the definition of f: add the free variables Ef as arguments to the definition of f. +7. For each occurrence of f: change the use f to f applied to its arguments Ef. +8. Then there are no free variables in any definition body and they can all be lifted (floated) out to global scope, producing a set of recursive equations. + +Lambda-dropping transforms a set of recursive equations into one with block structure and lexical scope. This can be divided into block sinking, placing each block at its lowest point of use, and parameter dropping, using lexical scope to avoid passing parameters explicitly. + +What really matters is the parameter passing and def-use paths. In a fully parameter-lifted program, all def-use paths are made explicit - recursive equations must carry around all the parameters that any of their callees might possibly need. In a fully parameter-dropped program, all def-use paths are made implicit - any variable that can be made a free variable is not passed explicitly. + +To summarize, there are 3 intermediate forms: + +- block-structured: uses lexical scope and block nesting (let/letrec/module scope) freely +- scope-insensitive program: all blocks have no free variables, but may be nested +- recursive equations: no nested blocks, all live variables explicitly passed as actual parameters. + +The general pattern is parameter lift -> block float -> block sink -> parameter drop. This is a proper source-to-source transformation and corresponds to transforming a program into the functional representation of its optimal SSA form. + +Parameter lifting makes a block-structured program "scope-insensitive" or "parameter-lifted", so that no function has any free variables. With first-order programs this is done by passing extra variables to each function to account for variables occurring free further down the call path. With higher-order functions there is a little currying trick. +Block floating eliminates block structure by moving local functions to the enclosing scope level whenever they do not use locally defined functions and do not refer to local free variables. After block floating, each function either is global, contains free variables (defined by an enclosing function), or contains free functions (defined in the same block). If the source program is parameter-lifted, then a block-floated program degenerates to recursive equations; otherwise, there will be SCCs that cannot float above their enclosing abstractions. +Block sinking restores block structure by localizing (strongly connected) groups of equations in the call graph. Any set of functions that is referenced by a single function only is made local to this function, by moving the set inside the definition of the function, as sketched below.
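+
+A small illustration of block sinking and parameter dropping, again in Python notation as a sketch::
+
+  # Recursive equations: go is global and f is threaded through every call,
+  # even though it never changes (a static parameter).
+  def go(f, acc, xs):
+      if not xs:
+          return acc
+      return go(f, f(acc, xs[0]), xs[1:])
+
+  def fold(f, init, xs):
+      return go(f, init, xs)
+
+  # After block sinking (go is only called from fold) and parameter dropping
+  # (f is lexically visible inside fold, so it need not be passed):
+  def fold_dropped(f, init, xs):
+      def go(acc, xs):
+          if not xs:
+              return acc
+          return go(f(acc, xs[0]), xs[1:])
+      return go(init, xs)
+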
More precisely, localization stops at the closest common context of all the uses. Going any further would entail code duplication. +Parameter dropping exploits scope. If every invocation of a function with a formal parameter uses the same argument value, and the argument value is lexically visible at the definition site of the function, that function parameter can be dropped. + +Both stages can be computed formally using the call graph. A function f that calls some other function g depends on g. A function g that is dominated by some other function f in the call graph can only be called from the definition of f. The dominator tree of the call graph is thus a tree that induces a new block structure into the program. In the new program, each function is declared locally to its predecessor. + +For higher-order programs, it requires complex analysis in the general case; see P. Sestoft, Replacing function parameters by global variables, and O. Shivers, Control-flow analysis of higher-order languages or taming lambda. + +In an imperative program in SSA form, each variable is assigned only at a single point in the program. +Values from different branches in the flow graph of a program are merged using “phi-nodes”. +A naive algorithm for transforming an imperative program into SSA form creates phi-nodes for all variables, at all merge points in the program. +Minimizing the number of phi-nodes yields a program in optimal SSA form. + +In his textbook on compiler construction [3], Appel connects SSA and functional programming: an imperative SSA flow graph corresponds to a block-structured functional program, via block-sinking, and optimizing SSA corresponds to parameter-dropping. + +Together, lambda-lifting, lambda-dropping and control-flow analysis allow one to convert Curry’s fixpoint operator into Turing’s fixpoint operator. + +Instead of lifting only free variables, one can lift maximally free expressions. + +In both closure-converted and lambda-lifted programs, lambda abstractions are named. Closure conversion differs from lambda-lifting for the following two reasons: + +* In a closure-converted program, free variables are passed only when the name is defined. In a lambda-lifted program, free variables are passed each time the name is used. +* Closure conversion only considers the free variables of a lambda-abstraction. Lambda-lifting also considers those of the callees of this lambda-abstraction. + +In the latter sense, lambda-lifting can be seen as the transitive closure of closure conversion. + +With its inverse translation defined, lambda-lifting acts like CPS. The CPS transformation reveals control-flow information, while lambda-dropping reveals scope information. Similarly to A transformations / A normal form and Sequent IR, is there a happy medium between lambda-lifted and lambda-dropped programs? + + Sequent Core ============ @@ -282,3 +356,63 @@ Values ====== Since all values are representable in memory, we could use bytes in the IR for values. But this would lose the type information. So instead we must support all the value types listed in :ref:`Values`. + +Thorin +====== + +higher-order functions - lambda mangling transformation +non-explicit scope nesting via a dependency graph - control-flow restructuring program transformations are cumbersome to implement otherwise + +closure: A record that contains the pointer to a function and value bindings of its free variables +functions and closures need to be first-class citizens + +In Thorin, each definition is a node.
Every use is an edge to this node. Each value is a node in a graph and every reference to a value is an edge to this node. + +Names are solely present for readability. They have no semantic meaning. Thorin is graph-based with no named variables, therefore does not require explicit scope nesting. + +Partial inlining is the crucial step in closure elimination. + +control flow form (CFF): Thorin programs in CFF can be translated to control flow graphs (CFGs) without dynamically-allocated closures, and CFF-convertible programs are optimizable to CFF using lambda mangling. This superset includes typical higher-order programming idioms like map, fold, range, and generators. + +Nameless: The β-reduction λa.(λx.(λa.x)) a ⇒ λa.λb.a requires renaming one of the a's. Without this capture-avoiding substitution, naïve reduction λa.(λx.(λa.x)) a ⇒ λa.λa.a incorrectly does "name capture" of a. The inliner of GHC uses "the rapier" strategy of renaming as little as necessary, but it still adds significant complexity and overhead. Thorin's solution is to use a graph instead of names. + +Blockless: Consider adding a conditional branch, +a (x: int, ret: fn(int)) = ret x +a (x: int, ret: fn(int)) = let {c = ret x} in branch (x = 0, c, c) +Similarly consider inlining: +let f a b = (let g x = x+1 in a + (g b)) in f 1 2 +let {g x = x+1; f a b = a + (g b)} in f 1 2 +let {g x = x+1; f a b = a + (g b)} in 1 + (g 2) +Languages with block/let nesting require applying "let floating" / "block sinking" normalization to repair the nesting. Although it is called normalization, there is generally not a canonical or minimal IR under these operations - cf. :cite:`kennedyCompilingContinuationsContinued2007`'s discussion of commuting conversions and conditional expressions. Thorin represents nesting implicitly - if a function f uses a variable defined in another function g, f is implicitly nested in g. Therefore it does not need any floating or sinking normalization - we say Thorin is blockless. + +Memory operations or side-effects are explicitly tracked by a functional store [27, 28]. + +Liveness: We need to know all direct and indirect dependencies from the view of a function. We call all functions which are live from the view of another function f the scope of f. We call f the entry function of its scope. + +Lambda Mangling: Lambda mangling is a combination of lambda lifting and dropping. We take a map M, then each parameter is cloned to a fresh identifier and rewritten to its looked-up form. + +for constant parameters we drop the parameter +then we do local optimizations (constant propagation, common subexpression elimination) +to eliminate function dependencies we parameter-lift the return address + +Recursion + +In a CPS program, all calls are in tail-call form, like ``foo (...)``, but some occurrences of functions are not, e.g. they capture state in closures. +Thorin defines recursion as follows: +1. Create a node for each Thorin function +2. For each function f, draw an edge to all functions which occur in f's body. +A strongly connected component (SCC) in this graph forms a recursion.
+We classify: +recursive call: Any call within an SCC +simple recursive call: A recursive call within the scope of the callee +mutual recursive call: All other recursive calls +static parameter: A parameter which does not change its value within an SCC +first-order recursive call: A recursive call which only uses static parameters as higher-order arguments +loop: An SCC formed by recursive calls that are only of first order + +we classify Thorin functions in the following way: +basic block (BB)-like function: A first-order function. +returning function: A second-order function with exactly one first-order parameter. +top-level function: A function whose scope does not contain free variables. +bad function: A function that is not BB-like, top-level, or returning. +control flow form: A scope that does not use bad functions. diff --git a/docs/Commentary/Implementation/Implementation.rst b/docs/Commentary/Implementation/Implementation.rst new file mode 100644 index 0000000..aba5a7d --- /dev/null +++ b/docs/Commentary/Implementation/Implementation.rst @@ -0,0 +1,10 @@ +Implementation +############## + +Steelman 1F. "The language shall be composed from features that [...] can be implemented. [...] To the extent that it does not interfere with other requirements, the language shall facilitate the production of translators that are easy to implement and are efficient during translation." + +13C. Translators shall implement the standard definition. Every translator shall be able to process any syntactically correct program. Every feature that is available to the user shall be defined in the standard, in an accessible library, or in the source program. + +13E. Translators for the language will be written in the language and will be able to produce code for a variety of object machines. The machine independent parts of translators should be separate from code generators. Although it is desirable, translators need not be able to execute on every object machine. The internal characteristics of the translator (i.e., the translation method) shall not be specified by the language definition or standards. + +13F. Translators shall fail to translate otherwise correct programs only when the program requires more resources during translation than are available on the host machine or when the program calls for resources that are unavailable in the specified object system configuration. Neither the language nor its translators shall impose arbitrary restrictions on language features. For example, they shall not impose restrictions on the number of array dimensions, on the number of identifiers, on the length of identifiers, or on the number of nested parentheses levels. 
diff --git a/docs/Commentary/Profiling.rst b/docs/Commentary/Implementation/Profiling.rst similarity index 100% rename from docs/Commentary/Profiling.rst rename to docs/Commentary/Implementation/Profiling.rst diff --git a/docs/Commentary/Reduction-Example.rst b/docs/Commentary/Implementation/Reduction-Example.rst similarity index 100% rename from docs/Commentary/Reduction-Example.rst rename to docs/Commentary/Implementation/Reduction-Example.rst diff --git a/docs/Commentary/Reduction.rst b/docs/Commentary/Implementation/Reduction.rst similarity index 100% rename from docs/Commentary/Reduction.rst rename to docs/Commentary/Implementation/Reduction.rst diff --git a/docs/Commentary/Implementation/TermRewriting.rst b/docs/Commentary/Implementation/TermRewriting.rst new file mode 100644 index 0000000..2583d94 --- /dev/null +++ b/docs/Commentary/Implementation/TermRewriting.rst @@ -0,0 +1,39 @@ +Term rewriting +############## + +Cycle detection +=============== + +So the first strategy is to prove the system is acyclic. Per :cite:`ketemaViciousCirclesRewriting2005` this includes orthogonal weakly head normalizing higher-order TRSs. + +For general-purpose detection there are SCC computation algorithms; Wikipedia has a `list `__. The DFS algorithms seem most appropriate as they can naturally be maintained during the reduction graph search; finding the normal forms of a term essentially already is a DFS. Kosaraju's algorithm is not appropriate as computing the transpose / converse of the reduction relation is not easy. Comparing Tarjan and the path-based algorithms, Tarjan uses a second index (pointer) while the path-based uses a stack. The stack manipulation of the path-based algorithm is simpler to understand than the invariants of Tarjan; the Wikipedia page for Tarjan is constantly vandalized with people who do not understand it. So I would say the path-based algorithm is better. + +For associativity and commutativity there are special unification algorithms, where we represent terms as lists or bags rather than trees. There are some PhD theses and so on for this, Maude has references. I would say these are optimizations and for now acyclic detection plus general-purpose cycle handling is sufficient. + +Nondeterminism +============== + +This sounds a bit tricky to implement but it is not too bad. We can split into two steps: + +* Find a value: Evaluate the expression to a value, any value including exceptional values. Because of the meaningless term reduction, every expression will evaluate to some kind of value. The search should be biased towards finding non-exception values but it does not need to be perfect, for example there are reduction strategies such as parallel outermost that are guaranteed to be normalizing for some classes of TRS. This is where cycle detection and infinite value handling come in. +* Non-determinism check: We can analyze confluence and the reduction sequence of the value to see if the expression can evaluate to anything else. If there are no other values or all other values are exceptions, we are done; otherwise, we handle the non-determinism appropriately, such as replacing an exception with a non-exception, erroring on multiple non-exception values at the top-level, collecting and returning the multiple values if inside a logical combinator such as ``allOf``, or skipping the non-determinism check entirely for ``oneOf``. + +Check for non-exception values: Here is where we have to analyze all reduction sequences for the possibility of avoiding exceptions. 
, so if these apply and the expression evaluates to an exception we can rule out non-exception values. Also, +* Nondeterminism check: If , collect all of the possible values; skip the check; if at the top-level, analyze confluence to see if the expression can evaluate to multiple distinct non-exception values, and error if multiple values. +* Find exceptional value: There may be multiple exceptional values but we don't care; the implementation is just allowed to pick one arbitrarily. + +The common notions of an ARS carry over to infinitary reductions: :cite:`endrullisInfinitaryTermRewriting2014` + +* transitive reduction: irreflexive kernel of reduction closure +* normal form: irreducible term +* strongly normalizing (terminating): every infinite reduction sequence has a limit +* nonterminating reduction: infinite reduction sequence with no limit or that does not reduce to its limit +* weakly normalizing (normalizing): every term has a reduction to a normal form +* confluence: if t reduces to t1 and t2, then there is a common term s such that t1 and t2 reduce to s. +* Church-Rosser: if t1 is equivalent to t2, then there is a common term s such that t1 and t2 reduce to s. +* normal form property w.r.t. reduction:: if u reduces to t and s, and s is a normal form, then t reduces to s +* normal form property: if t is equivalent to s and s is a normal form, then t reduces to s +* unique normalization w.r.t. reduction: if t reduces to t1 and t2, and t1, t2 are normal forms, then t1=t2 +* unique normalization: if t1 is equivalent to t2, and t1, t2 are normal forms, then t1=t2 + +However common theorems such as Newman's lemma do not, so it is not clear how useful these are. diff --git a/docs/Commentary/Tools.rst b/docs/Commentary/Implementation/Tools.rst similarity index 96% rename from docs/Commentary/Tools.rst rename to docs/Commentary/Implementation/Tools.rst index d60b25e..572df0e 100644 --- a/docs/Commentary/Tools.rst +++ b/docs/Commentary/Implementation/Tools.rst @@ -1,6 +1,8 @@ Tools ##### +13G. The language should be designed to work in conjunction with a variety of useful software tools and application support packages. These will be developed as early as possible and will include editors, interpreters, diagnostic aids, program analyzers, documentation aids, testing aids, software maintenance tools, optimizers, and application libraries. There will be a consistent user interface for these tools. Where practical software tools and aids will be written in the language. Support for the design, implementation, distribution, and maintenance of translators, software tools and aids, and application libraries will be provided independently of the individual projects that use them. + Nowadays every language comes with a tool suite, because a compiler alone is not enough. As Walter Bright points out, if it's a separate download, what happens is people never use it. A third-party tool is never installed, or it's the wrong version. Users do not want to read the manual for a third party tool and will never consider using it. But when it's built into the language, they magically start using it. And there's synergies, like you can integrate the documentation with the style checker and stuff like that. The tools can all be integrated with the compiler and use the same parser, ABI, etc. so that they are always compatible. Bright says this is transformative - it's like a night and day difference. And the other advantage is, there's only one. 
If you leave it to third parties, you get an endless number of tools, all different. Two teams each pick a different documentation generator and now they can't be merged or they can't work together and then you'll have a big fight and what a waste of time. The built-in tool may not be the best, but it's always there. By building it in you sort of browbeat people into using it. diff --git a/docs/Commentary/Implementation/index.rst b/docs/Commentary/Implementation/index.rst new file mode 100644 index 0000000..af88d7b --- /dev/null +++ b/docs/Commentary/Implementation/index.rst @@ -0,0 +1,10 @@ +Implementation +############## + +This section is focused on implementation details of the language - what algorithms and concepts are needed to efficiently run the semantics. + +.. toctree:: + :maxdepth: 2 + :glob: + + * \ No newline at end of file diff --git a/docs/Commentary/sv.md b/docs/Commentary/Implementation/sv.md similarity index 100% rename from docs/Commentary/sv.md rename to docs/Commentary/Implementation/sv.md diff --git a/docs/Commentary/Aspects.rst b/docs/Commentary/Language/Aspects.rst similarity index 100% rename from docs/Commentary/Aspects.rst rename to docs/Commentary/Language/Aspects.rst diff --git a/docs/Commentary/Assembly.rst b/docs/Commentary/Language/Assembly.rst similarity index 97% rename from docs/Commentary/Assembly.rst rename to docs/Commentary/Language/Assembly.rst index 9ddeb09..7722843 100644 --- a/docs/Commentary/Assembly.rst +++ b/docs/Commentary/Language/Assembly.rst @@ -1,6 +1,12 @@ Assembly ######## +8D. The language shall not require the presence of an operating system. [Note that on many machines it will be necessary to provide run-time procedures to implement some features of the language.] + +8E. There shall be a few low level operations to interrogate and control physical resources (e.g., memory or processors) that are managed (e.g., allocated or scheduled) by built-in features of the language. + +11E. There shall be a machine independent interface to other programming languages including assembly languages. Any program element that is referenced in both the source language program and foreign code must be identified in the interface. The source language of the foreign code must also be identified. + Creating a high-level programming language with absolutely no runtime overhead is a challenging task. For example C has runtime overhead: the generated instruction sequences may not be optimal, memory management with malloc/free may be slower than a custom allocator, data type layouts may not be optimal, function calls follow the C calling convention which is not necessarily the fastest, and the standard library may be slower than hand-written routines. So writing code in assembly can be advantageous. It is best to think of a high-level programming language as a macro assembler with particularly complex macros - this begs the question of why most languages make it so hard to use inline assembly. Even in a JIT-style execution model, some amount of code is compiled to assembly. The difference is there are also jumps into the interpreter, which sort of "glues together" regions of compiled code. 
diff --git a/docs/Commentary/Concurrency.rst b/docs/Commentary/Language/Concurrency.rst similarity index 88% rename from docs/Commentary/Concurrency.rst rename to docs/Commentary/Language/Concurrency.rst index 360d454..f267919 100644 --- a/docs/Commentary/Concurrency.rst +++ b/docs/Commentary/Language/Concurrency.rst @@ -1,11 +1,17 @@ Concurrency ########### + + As Go says, the rise of multicore CPUs means that a language should provide first-class support for concurrency and parallelism. But concurrency and multi-threaded programming have over time developed a reputation for difficulty. So let's get it right. Model ===== +9A. It shall be possible to define parallel processes. Processes (i.e., activation instances of such a definition) may be initiated at any point within the scope of the definition. Each process (activation) must have a name. It shall not be possible to exit the scope of a process name unless the process is terminated (or uninitiated). +9B. The parallel processing facility shall be designed to minimize execution time and space. Processes shall have consistent semantics whether implemented on multicomputers, multiprocessors, or with interleaved execution on a single processor. +9D. The semantics of the built-in scheduling algorithm shall be first-in-first-out within priorities. A process may alter its own priority. If the language provides a default priority for new processes it shall be the priority of its initiating process. The built-in scheduling algorithm shall not require that simultaneously executed processes on different processors have the same priority. [Note that this rule gives maximum scheduling control to the user without loss of efficiency. Note also that priority specification does not impose a specific execution order among parallel paths and thus does not provide a means for mutual exclusion.] + The general idea with concurrency is there are multiple threads of execution. But practically there are several types of threads: * An OS thread is the basic unit to which the operating system allocates processor time. A thread can execute any part of the process code, including parts currently being executed by another thread. A thread may be bound to a core or have that decided by the OS. There is thread-local storage but generally fiber-local storage should be preferred. (`1 `__, :cite:`nishanovFibersMagnifyingGlass2018` section 2.3.1) @@ -26,6 +32,12 @@ As far as storage, all immutable data exists in ambient space and is transparent Concurrent operations ===================== +9C. It shall be possible efficiently to perform mutual exclusion in programs. The language shall not require any use of mutual exclusion. +9E. On any control path, it shall be possible to delay until at least a specified time before continuing execution. +9H. It shall be possible to pass data between processes that do not share variables. It shall be possible to delay such data transfers until both the sending and receiving processes have requested the transfer. +9I. It shall be possible to set a signal (without waiting), and to wait for a signal (without delay, if it is already set). Setting a signal, that is not already set, shall cause exactly one waiting path to continue. +9J. It shall be possible to wait for, determine, and act upon the first completed of several wait operations (including those used for data passing, signalling, and real time). + At the lowest level, a thread uses a combination of hardware and OS operations. 
The hardware operations are read/write on mutable shared memory, and various memory barrier/fence instructions. The OS syscalls are a mess but can be roughly tagged as files, networking, process, (shared) memory, permissions, system clock, futex, System V IPC, and signals. In the cloud, a thread uses mostly network-centric operations, message-passing between nodes, while on a node the operations are mostly shared memory operations due to current machine architectures. There are many higher-level I/O concurrency abstractions: mutexes, condition variables, channels, MVars, transactional memory. These high-level interfaces are conceptually much simpler to reason about than mutable shared memory, in particular avoiding aliasing, even if there is still mutable shared memory under the covers. But mutable shared memory is a key feature of modern C++ concurrency implementations and it would significantly reduce expressiveness to forbid it from Stroscot. Sharing is caring. diff --git a/docs/Commentary/Dispatch.rst b/docs/Commentary/Language/Dispatch.rst similarity index 91% rename from docs/Commentary/Dispatch.rst rename to docs/Commentary/Language/Dispatch.rst index ad590b0..c850367 100644 --- a/docs/Commentary/Dispatch.rst +++ b/docs/Commentary/Language/Dispatch.rst @@ -222,6 +222,8 @@ A lambda or builder works for anything tricky. Implicit arguments ------------------ +12D. Functions, procedures, types, and encapsulations may have generic parameters. Generic parameters shall be instantiated during translation and shall be interpreted in the context of the instantiation. An actual generic parameter may be any defined identifier (including those for variables, functions, procedures, processes, and types) or the value of any expression. + These behave similarly to arguments in languages with dynamical scoping. Positional arguments can be passed implicitly, but only if the function is used without applying any positional arguments. If the LHS contains positional arguments only that number of positional arguments are consumed and they are not passed implicitly. :: @@ -298,6 +300,10 @@ Infix operators can accept implicit arguments just like prefix functions: Output arguments ---------------- +7F. There shall be three classes of formal data parameters: (a) input parameters, which act as constants that are initialized to the value of corresponding actual parameters at the time of call, (b) input-output parameters, which enable access and assignment to the corresponding actual parameters, either throughout execution or only upon call and prior to any exit, and (c) output parameters, whose values are transferred to the corresponding actual parameter only at the time of normal exit. In the latter two cases the corresponding actual parameter shall be determined at time of call and must be a variable or an assignable component of a composite type. + +7I. Aliasing shall not be permitted between output parameters nor between an input-output parameter and a nonlocal variable. Unintended aliasing shall not be permitted between input-output parameters. A restriction limiting actual input-output parameters to variables that are nowhere referenced as nonlocals within a function or routine, is not prohibited. + :: b = out {a:3}; 2 @@ -437,7 +443,13 @@ Guards are handled by calling next-method on failure: Overloading =========== -Overloading is having methods with the same name but different type signatures. A special case is operator overloading, so you can add both floats and ints with ``+``. 
Even `the Go FAQ `__, an opponent of overloading, admits that overloading is useful and convenient. Overloading enables ad-hoc polymorphism and solves the expression problem. It is incredibly useful. +Overloading is having methods with the same name but different type signatures. A special case is operator overloading. Even `the Go FAQ `__, an opponent of overloading, admits that overloading is useful and convenient. Per :cite:`ichbiahRationaleDesignADA1979` there are several reasons: + +* The careful use of overloading can be a valuable asset for readability since it offers an additional degree of freedom for the choice of suggestive identifiers that convey the properties of the entities they denote. +* It is natural to use the same name for an output operation such as `print`` over many different types, or a numeric operation such as + over floats and ints. (ad-hoc polymorphism) +* [edited] The writers of a package can certainly not foresee all possible uses. In particular they may not (and should not) be aware of the fact that their package is used with overloaded operations at a different type than the authors envisioned. Nonetheless extending a package to a novel user-defined type is in general quite valuable. The developer should be able to use suggestive names so as to reuse code with no changes necessary. Overloading achieves this reuse. + +To these I would add that overloading solves the expression problem: if all functions are open, then naturally it is easy to add both new operations and novel instances of operations at new types. Although in many cases using separate function names can lead to a clearer API, supporting overloading does not prevent this. And there are cases where overloading is the right choice. Suppose you are trying to wrap a Java library that makes heavy use of overloading. Name mangling using simple rules will give relatively long names like ``divide_int_int``. More complicated rules will give shorter name like ``div_ii``, but the names will be hard to remember. Either way, overloading means no mangling is needed in the API at all, a strictly better alternative. Similarly the Go API is full of families of functions like ``FindAll : [Byte] -> [[Byte]], FindAllString : String -> [String]`` which differ only by their type. These additional name parts cannot be guessed easily and require looking up the API docs to determine the appropriate version. Overloading means only the base name needs to be remembered. @@ -491,6 +503,12 @@ Also, this dispatch applies to implicit parameters. So when a function takes an This context sensitivity does make moving code between files more difficult. Generally, you must copy all the imports of the file, as well as the code, and fix any ambiguity errors. Actually, that happens already in other languages. But it's really terrible if there are implicit keywords in use - then you will to figure it out in debugging. But the IDE can probably assist here. +Steelman 1C says "[The language] shall require explicit specification of programmer decisions and shall provide defaults only for instances where the default is stated in the language definition, is always meaningful, reflects the most frequent usage in programs, and may be explicitly overridden." The decisions part seems suspect - programmers make lots of decisions, like what to eat for lunch, and these should not all be specified. For the defaults, the method of choosing a specific overload is stated in the language definition. 
It will always be a user-written definition, so it is of course meaningful. "Most frequent usage" is sort of a cop out as not all usages are common yet programmers nonetheless expect them to work. But it is certainly the case with the nondeterminism solution described later that only correct usages will work, and thus those will be the most frequent. + +The explicit overrides part does make sense, and so there is a syntax for referring to clauses within a specific file and even a specific clause within a file. + +There is also Steelman 2D "Multiple occurrences of a language defined symbol appearing in the same context shall not have essentially different meanings." This we get around by making all overloaded symbols library-defined. + Return type overloading ======================= @@ -568,6 +586,8 @@ Despite the naysayers I still like my original idea of using nondeterminism: an Implicit conversions ==================== +Steelman 3B: "There shall be no implicit conversions between types." + Implicit conversions per Carbon are defined as "When an expression appears in a context in which an expression of a specific type is expected, the expression is implicitly converted to that type if possible." So for example you have ``foo : i64 -> String`` and ``i : u32`` and you can write ``foo i`` and it is equivalent to ``foo (convert_u32_to_i64 i)``. (Assume here that ``i64`` and ``u32`` are disjoint) There are some issues with implementing this in Stroscot. First is that Stroscot has "sets" rather than "types". Second is that Stroscot has no expectations, only runtime exceptions. So revising this definition it is something like "When an expression appears in a context which throws an exception on certain values, the expression is implicitly converted to not be one of those exception-causing values." So for example we have ``foo (i : i64) = ...`` and ``i = u32 1``. ``foo i`` evaluates to an no-matching-clause exception, so again the expression is implicitly converted by inserting ``foo (convert i)``. Notice we have lost some information - we do not know that ``foo`` expects an ``i64`` or the type of ``i``. In ``convert``, we have the actual value of ``i``, which is for our purposes probably better than its type. But to produce a ``u64`` it is the return-type-overloading problem above, and we will have to follow the same solution patterns of a blob value or nondeterministically producing all implicit conversions. If there are multiple valid implicit conversions though, then it gets very complex to prove they are equivalent (and maybe they aren't - like a print command, it will of course print the full value and give different results for different values). @@ -603,7 +623,7 @@ The way Stroscot optimizes dispatch is: The standard vtable implementation of Java/C++ arises naturally as a table dispatch on a method name. It looks like 'load klass pointer from object; load method from klass-vtable (fixed offset from klass pointer); load execution address from method (allows you to swap execution strategies, like interp-vs-C1-vs-C2, or multiple copies of C2 jit'd depending on context); jump'. But usually we can do better by building a custom table and fast-pathing the hot cases. -Cliff says what we do for overloaded calls doesn't matter so long as in practice, >90% of calls are statically resolved to one clause. +Cliff says what we do for overloaded calls doesn't matter so long as in practice, >90% of calls are statically resolved to one clause. 
As Steelman 1D puts it, we must "avoid execution costs for available generality where it is not needed". THe inline-cache hack observes that, while in theory many different things might get called here, in practice there's one highly dominant choice. Same for big switches: you huffman-encode by execution frequency. diff --git a/docs/Commentary/Evaluation-Strategy.rst b/docs/Commentary/Language/Evaluation-Strategy.rst similarity index 99% rename from docs/Commentary/Evaluation-Strategy.rst rename to docs/Commentary/Language/Evaluation-Strategy.rst index eb03c1a..92bd9bc 100644 --- a/docs/Commentary/Evaluation-Strategy.rst +++ b/docs/Commentary/Language/Evaluation-Strategy.rst @@ -15,7 +15,6 @@ This page summarizes the arguments for different types of evaluation strategies: Traditionally a function is only defined on values, but lazy evaluation allows functions to produce useful behavior for non-values as well. - Pure vs impure ============== diff --git a/docs/Commentary/Exceptions.rst b/docs/Commentary/Language/Exceptions.rst similarity index 95% rename from docs/Commentary/Exceptions.rst rename to docs/Commentary/Language/Exceptions.rst index d6d5887..dd9b1aa 100644 --- a/docs/Commentary/Exceptions.rst +++ b/docs/Commentary/Language/Exceptions.rst @@ -1,11 +1,19 @@ Exceptions ########## +10A. There shall be an exception handling mechanism for responding to unplanned error situations detected in declarations and statements during execution. + Exceptions allow us to focus on the normal (non-exceptional) case when performing operations that might fail. The downside is that it is more tricky to reason about how the failure cases are handled, because the handling code may be many levels of function calls removed from the operation. Exception menagerie =================== +10A. The exception situations shall include errors detected by hardware, software errors detected during execution, error situations in built-in operations, and user defined exceptions. + +10B. The errors detectable during execution shall include exceeding the specified range of an array subscript, exceeding the specified range of a variable, exceeding the implemented range of a variable, attempting to access an uninitialized variable, attempting to access a field of a variant that is not present, requesting a resource (such as stack or heap storage) when an insufficient quantity remains, and failing to satisfy a program specified assertion. [Note that some are very expensive to detect unless aided by special hardware, and consequently their detection will often be suppressed (see 10G).] + +10C. Exceptions that can be raised by built-in operations shall be given in the language definition. + The first question is what an exception is. Let us try to classify all the exceptions, everything that can use an exception-like semantic. The set of exceptions is unquestionably large so this list is probably incomplete. Failure @@ -138,6 +146,8 @@ These are thrown in the runtime or core standard libraries when safety invariant Cancellation ------------ +9G. It shall be possible to terminate another process. The terminated process may designate the sequence of statements it will execute in response to the induced termination. + SIGKILL/SIGSTOP cannot be blocked or handled by the program, so aren't exceptions. Similarly C's ``exit`` function and the Linux ``exit`` syscall always shut the program down and don't return. 
But Haskell provides a ``ProcessCancelled`` exception that propagates normally and does a hard process exit when it reaches the top level. Often processes are too coarse and one wishes to gracefully cancel a thread, so there is also a ``ThreadCancelled`` exception. Even finer is a ``TaskCancelled`` exception for a task runner library. Generally with a cancellation exception you should only do cleanup. Cancellation is a message from outside of your current execution saying “you must die as soon as possible.” If you swallow the exception, you break the very nature of the cancellation mechanism. Similarly cleanup in response to cancellation should be minimal, avoiding long pauses, to ensure quick cancellation. @@ -266,6 +276,8 @@ The most common behavior is unwinding, followed by containment or termination. R Traces ====== +10A. Exception identifiers shall have a scope. + A trace is built by keeping track of the exception as it propagates. The semantics are a little different with lazy evaluation because the propagation is demand-driven, but should be close enough. E.g. ``case {}.x of 1 -> ...`` produces ``MissingCaseException { trace = NoSuchAttributeException {...}, ...}``. With fancy formatting the nested exceptions will look like a stacktrace. Space considerations limit the depth and detail of stack traces. For example if you accumulate over an infinite list, traces are theoretically infinite, but properly the trace display should compress this somehow. Similarly tail calls mean entries may be added or missing. So the trace is a best-effort guess subject to compiler whims - it has no formal contract. Traces are mainly useful as a light reminder to the programmer of where to look in the code in a large codebase. Alas, building a trace is expensive. Throwing an exception should be cheap. What do? @@ -282,6 +294,8 @@ But the main solution IMO is to determine that the exception is caught by a hand Top level ========= +10C. Exceptions that are not handled within a process shall terminate the process. + There is always a top-level catch-all exception handler, which is guaranteed to not throw during handling. So exceptions never pop off the whole stack. For a stateful function, the top-level handler has to figure out what to do in the case of an exception. For the most part the exceptions are known in advance, so this simply means running the exception or failure continuation in the ``Task`` instead of the success continuation. The failure continuation will in turn most likely retrieve the exception from the state and return it to the program continuation as an exception value. But the failure continuation could also stop the program or do something completely separate from the main program. @@ -289,6 +303,8 @@ For a stateful function, the top-level handler has to figure out what to do in t Automatic propagation ===================== +10C. Raising an exception shall cause transfer of control to the most local enclosing exception handler for that exception without completing execution of the current statement or declaration, but shall not of itself cause transfer out of a function, procedure, or process. Exceptions that are not handled within a function or procedure shall be raised again at the point of call in their callers. + The `Swift error handling rationale `__ tries to say that there are multiple dimensions of error handling. 
But considering propagation, there are really only two choices: * Manual, marked, explicit: The language requires all potential exception sites to be syntactically marked with a visible operator or control structure. @@ -308,10 +324,6 @@ One example is the case where you are in a call chain several levels deep, in a Another example is quick scripting. Let's say you're doing some data analysis: you open the file, read the CSV, add some numbers, and write out another file. Almost every operation here does some OS syscall or invokes some assembly instruction and can fail. Particularly, a syscall can throw tons of different exceptions, so properly handling all exceptions would require more code than the original script. With automatic propagation, you just write the "happy path" and all exceptions propagate implicitly and crash the program. With manual propagation, you would at least have to mark each OS call, and probably mark the addition as assert-non-overflowing too. -How does automatic propagation work? Well, the closest analogue is that it unwinds the stack similarly to Java's unchecked exceptions. Vaguely, it decorates every expression with an early return of the form ``case expr of (e : Exception) -> return e; x -> x``. For non-exception-aware functions ``foo``, ``foo ... Exception ...`` reduces to ``Exception``. But this definition doesn't specify which exception gets returned, e.g. from ``throw a + throw b``. The more correct (operational) semantics is that as soon as an exception is evaluated (thrown) it is immediately propagated to the nearest applicable exception-catching context and the remaining part of the expression is discarded. This exposes the evaluation strategy of the language implementation. Java says left-to-right, but this prevents many optimizations. Stroscot's answer is that the exception returned is a deterministic function of the expression and compiler version. However the compiler's evaluation strategy is not exposed to the static verification system, so type signatures must be written as if either exception could be returned. - -:cite:`peytonjonesSemanticsImpreciseExceptions1999` says that because automatic propagation is "nondeterministic", ``catch`` should be an operation of the I/O monad - but in fact nothing in their semantics makes use of the I/O monad. ``getException`` is just ``return`` and pattern matching (section 4.4, page 9). Their approach merely uses the I/O monad as a "sin bin" for nondeterminism. Stroscot's choice to expose the nondeterminism allows more concise and flexible pure exception handling. But since the verification system models the set of exceptions and ``catch`` as randomly picking one, it robustly checks all evaluation strategies, including strange possibilities such as ``let x = throw 1 + throw 2 in try x == try x`` evaluating to false. (CBN expansion duplicates x, then try/catch picks different branches) - According to `Joel `__ automatic propagation sucks because the early returns mean magic gotos are invisibly sprinkled throughout your code. It does take some training to learn to read code as if every line, expression, and subexpression could throw an exception and to use finalizers appropriately. But automatic propagation gives streamlined syntax. With automatic propagation it does not require any changes to a call chain to throw an exception and catch it several layers higher up. Generally, it is easy to quickly write code for the happy path using automatic propagation because you don't mark any error paths. 
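To make the contrast concrete, here is a small Haskell sketch of that kind of script; the file names and CSV shape are invented for the example. The first version leans on automatic propagation of exceptions and is just the happy path; the second marks each OS call by hand and is already several times longer, while still not handling the parse step.

.. code-block:: haskell

   import Control.Exception (IOException, try)

   -- Happy path only: every step below can fail (missing file, unparsable
   -- line, full disk), and with automatic propagation any failure simply
   -- crashes the script.
   happyPath :: IO ()
   happyPath = do
     txt <- readFile "data.csv"
     let total = sum (map read (lines txt)) :: Integer
     writeFile "total.txt" (show total)

   -- Manual propagation: each fallible OS call is wrapped and matched on.
   manual :: IO ()
   manual = do
     r <- try (readFile "data.csv") :: IO (Either IOException String)
     case r of
       Left e    -> putStrLn ("read failed: " ++ show e)
       Right txt -> do
         let total = sum (map read (lines txt)) :: Integer
         w <- try (writeFile "total.txt" (show total)) :: IO (Either IOException ())
         case w of
           Left e  -> putStrLn ("write failed: " ++ show e)
           Right _ -> pure ()

   main :: IO ()
   main = happyPath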
The correctness of code using automatic propagation is hard to judge. An exception code path may unwind too soon and not restore its state properly, but this may not be obvious. There are a few mutable state strategies that are easy to check: @@ -322,19 +334,22 @@ The correctness of code using automatic propagation is hard to judge. An excepti But in general, these cannot guarantee that the returned state is correct. So to satisfy the Joels, Stroscot uses manual handling by default, allows opting into automatic propagation on a per-exception value basis, and out on a per-file/function basis via a warning. Stroscot also allows manual handling all the time, regardless of whether or not automatic propagation is enabled. +Exception order +--------------- +10E. The order in which exceptions in different parts of an expression are detected shall not be guaranteed by the language or by the translator. - With a variant type like ``a -> b|Exception`` a function returns either a value or an exception. So just use the normal ``return`` keyword to return exceptions. Then to respond to specific exceptions programmatically, returned exception-or-values can be pattern-matched like any other return value: - -The case handling syntax seems easy and clear, and it's possible to locally reason about and decide how best to react to exceptions. -But a Quorum-style study should check on what's clearest to beginners. Limiting ``return`` to normal values and using ``throw`` for ``Exception`` values is also a possibility. - -Just because there is shared syntax doesn't mean exceptions don't propagate, exceptions still unwind if they aren't caught by the case statement. They can be wrapped up in a Result type though to prevent propagation. +How does automatic propagation work? Well, the closest analogue is that it unwinds the stack similarly to Java's unchecked exceptions. Vaguely, it decorates every expression with an early return of the form ``case expr of (e : Exception) -> return e; x -> x``. For non-exception-aware functions ``foo``, ``foo ... Exception ...`` reduces to ``Exception``. But this definition doesn't specify which exception gets returned, e.g. from ``throw a + throw b``. The more correct (operational) semantics is that as soon as an exception is evaluated (thrown) it is immediately propagated to the nearest applicable exception-catching context and the remaining part of the expression is discarded. This exposes the evaluation strategy of the language implementation. Java says left-to-right, but this prevents many optimizations. Stroscot's answer is that the exception returned is a deterministic function of the expression and compiler version. However the compiler's evaluation strategy is not exposed to the static verification system, so type signatures must be written as if either exception could be returned. +:cite:`peytonjonesSemanticsImpreciseExceptions1999` says that because automatic propagation is "nondeterministic", ``catch`` should be an operation of the I/O monad - but in fact nothing in their semantics makes use of the I/O monad. ``getException`` is just ``return`` and pattern matching (section 4.4, page 9). Their approach merely uses the I/O monad as a "sin bin" for nondeterminism. Stroscot's choice to expose the nondeterminism allows more concise and flexible pure exception handling. 
But since the verification system models the set of exceptions and ``catch`` as randomly picking one, it robustly checks all evaluation strategies, including strange possibilities such as ``let x = throw 1 + throw 2 in try x == try x`` evaluating to false. (CBN expansion duplicates x, then try/catch picks different branches) Syntax ====== +10C. There shall be an operation that raises an exception. + +10D. There shall be a control structure for discriminating among the exceptions that can occur in a specified statement sequence. The user may supply a single control path for all exceptions not otherwise mentioned in such a discrimination. It shall be possible to raise the exception that selected the current handler when exiting the handler. + ``throw`` / ``catch`` are the common syntax, like in Java: .. code-block:: java @@ -375,6 +390,13 @@ Swift: try X else catch - wraps into Either type, an exception value (failure) or a normal value (success) try X else Y - presubstitute Y on exception +With a variant type like ``a -> b|Exception`` a function returns either a value or an exception. So just use the normal ``return`` keyword to return exceptions. Then to respond to specific exceptions programmatically, returned exception-or-values can be pattern-matched like any other return value: + +The case handling syntax seems easy and clear, and it's possible to locally reason about and decide how best to react to exceptions. +But a Quorum-style study should check on what's clearest to beginners. Limiting ``return`` to normal values and using ``throw`` for ``Exception`` values is also a possibility. + +Just because there is shared syntax doesn't mean exceptions don't propagate, exceptions still unwind if they aren't caught by the case statement. They can be wrapped up in a Result type though to prevent propagation. + Finally ======= @@ -390,6 +412,10 @@ no-throw The standard C++ ecosystem uses exceptions. But the extra paths introduced by exceptions `add measurable overhead `__. So people create forks of existing libraries that eradicate exceptions. The Windows kernel, for instance, has its own fork of the STL that doesn't use exceptions. This bifurcation of the ecosystem is neither pleasant nor practical to sustain. +10G. It shall be possible during translation to suppress individually the execution time detection of exceptions within a given scope. The language shall not guarantee the integrity of the values produced when a suppressed exception occurs. [Note that suppression of an exception is not an assertion that the corresponding error will not occur.] + +This is sort of a goal. I disagree though that suppression is not an assertion. IMO the "correct" behavior of a suppression when the suppressed exception occurs is for the program to crash. An assertion failure similarly crashes the program. Similarly, if you can prove that the suppressed exception can never occur, then the suppression is a no-op, exactly how an assertion is a no-op if it never fails. The difference is solely in the case where the compiler cannot verify the assertion / lack of exceptions: with the assertion, the compiler outputs a warning, while with the suppression, the compiler assumes the programmer knows what they're doing and it is only an informational diagnostic. + Exception API ============= @@ -702,17 +728,15 @@ There is also a "closed-world" style enforced by ``upper``, where the code is no Composability ------------- -Function types which cannot generate exceptions are subtypes of function types which can. 
To use higher order functions like map, polymorphic types like ``forall a b. (a -> b) -> t a -> t b`` should allow ``a`` and ``b`` to contain exceptions. To reason about this properly ``upper`` or ``precise`` types are needed. There doesn't seem to be any downside to deferring exception propagation and treating exceptions as values when applying higher-order functions. Higher-order exception handling constructs are possible, but in practice most Haskell code seems to stick with catch or try. The only really complex construct is ``bracket`` but that's handled in Stroscot with finalizers. +Function types which cannot generate exceptions are subtypes of function types which can. To use higher order functions like map, polymorphic types like ``forall a b. (a -> b) -> t a -> t b`` should allow ``a`` and ``b`` to contain exceptions. To reason about this properly ``upper`` or ``precise`` types are needed. There doesn't seem to be any downside to deferring exception propagation and treating exceptions as values when applying higher-order functions. Higher-order exception handling constructs are possible, but in practice most Haskell code seems to stick with catch or try. The only really complex construct is ``bracket`` but that's handled in Stroscot with destructors. It is quite useful to know the domain for which a function cannot generate exceptions. So usually a function will have two signatures, a "narrow" type for which the function doesn't throw exceptions and a "wide" type for which it does, e.g. ``(/) : Int -> (Int\{0}) -> Int`` and ``(/) : Int -> Int -> Int|DivisionByZero``. Ideally the compiler can prove that the narrow type is appropriate and specialize code to not use exceptions. This can be ensured by specifying a signature at the usage site that excludes the exceptions. -Stroscot's sets allow unions, e.g. you can express throwing ``MyException`` or ``HisException`` as ``x|MyException|HisException``. This makes combining libraries and their exception types fairly straightforward. This is impossible in many languages. Java's workaround is to instead use superclass catch-all types such as IOException and ReflectiveOperationException. It's not clear how useful these superclasses are - Swift claims reacting to an arbitrary IOException is difficult. IOExceptions can at least use an operation failure path that for example retries the operation a couple times, while Exceptions are so general that retrying may not make sense. But Storscot's subsets allow fine-grained definition so are much more expressive. +Stroscot's sets allow unions, e.g. you can express throwing ``MyException`` or ``HisException`` as ``x|MyException|HisException``. This makes combining libraries and their exception types fairly straightforward. This is impossible in many languages. Java's workaround is to instead use superclass catch-all types such as IOException and ReflectiveOperationException. It's not clear how useful these superclasses are - Swift claims reacting to an arbitrary IOException is difficult. IOExceptions can at least use an operation failure path that for example retries the operation a couple times, while Exceptions are so general that retrying may not make sense. But Stroscot's subsets allow fine-grained definition so are much more expressive. Swift has recently added `typed throws `__ which are similar but only allow specifying a single exception type. 
From the discussion, it seems unions were requested many times but the authors decided it would be better to propose sum/union types as a separate feature. Snoyman `discusses `__ using a ``Text`` type - it avoids the need for a real exception type, but means all exceptions are unstructured and can't be handled appropriately. His preferred approach is the constraint ``MonadThrow m``, but this throws ``Exception`` and isn't fine-grained. We could generalize by adding a type parameter to ``MonadThrow``, ``(MonadThrows m MyException, MonadThrows m HisException) => String -> m Int``, but now it's clear that this is the `existential typeclass antipattern `__ and ``String -> Int|MyException|HisException`` is much clearer. -Another note is that Stroscot's signatures are independent - they all are checked against the implementation, rather than a type interface. For example the following: - -:: +Another note is that Stroscot's signatures are independent - they all are checked against the implementation, rather than a type interface. For example, in the following:: a : Int -> Int|Exception a x = if x > 0 then x else NegativeException @@ -728,7 +752,7 @@ Another note is that Stroscot's signatures are independent - they all are checke Implementation ============== -The implementation needs to transfer control from throw to catch, and run finalizers. +The implementation needs to transfer control from throw to catch, and run destructors. To illustrate take a simple example: @@ -818,11 +842,13 @@ Microsoft's implementation of C++ exceptions allocates on the stack, and delays Zero overhead ------------- -Whatever marketing you have heard about zero-overhead C++ exceptions is misleading. Per `measurements `__ (also in Herb Sutter's thing IIRC), just turning on exception handling support in a C++ project previously compiled without exception support, not throwing any exceptions at all, gives a 15-52% binary size increase. The overhead arises from jump tables, additional stack space per thread (e.g., a 1K reservation, to save a dynamic allocation) and additional thread-local storage. For this reason many C++ projects disable exceptions. +10A. Exceptions should add to the execution time of programs only if they are raised. + +Whatever marketing you have heard about zero-overhead C++ exceptions is misleading. Per `measurements `__ (also in Herb Sutter's thing IIRC), just turning on exception handling support in a C++ project previously compiled without exception support, not throwing any exceptions at all, gives a 15-52% binary size increase. The overhead arises from jump tables, additional stack space per thread (e.g., a 1K reservation, to save a dynamic allocation) and additional thread-local storage. This binary size increase comes with an accompanying cost in program load time and cache misses during execution (not measured, varies widely). For this reason many C++ projects disable exceptions. With a proper design, a flag to turn off exceptions should not be needed. The compiler should be able to prove that code cannot throw exceptions and optimize away catch handlers accordingly. Only assertions that a function cannot throw are needed, to help the compiler with its proofs. -As far as strategy I'm thinking to switch between return codes and continuations depending on how hot the exception path is. Exception propagation for common exceptions (above 30%) should be handled by return codes. It would be good to bias the implementation somewhat towards the hot path (exception or not), e.g. 
by moving cleanup code on the cold path to the end of the function, out of the hot code path, but missing this isn't enough to seriously compromise performance. Continuation-based unwinding should be reserved for really rare exceptions, 1 in 100 or less, where cache misses predominate. +As far as strategy I'm thinking to switch between return codes and continuations depending on how hot the exception path is, based on `this post `__. Exception propagation for common exceptions (above 30%) should be handled by return codes. It would be good to bias the implementation somewhat towards the hot path (exception or not), e.g. by moving cleanup code on the cold path to the end of the function, out of the hot code path, but missing this isn't enough to seriously compromise performance. Continuation-based unwinding should be reserved for really rare exceptions, 1 in 100 or less, where cache misses predominate. Interruptible cleanup ===================== diff --git a/docs/Commentary/FunctionalLogic.rst b/docs/Commentary/Language/FunctionalLogic.rst similarity index 100% rename from docs/Commentary/FunctionalLogic.rst rename to docs/Commentary/Language/FunctionalLogic.rst diff --git a/docs/Commentary/Logic.rst b/docs/Commentary/Language/Logic.rst similarity index 99% rename from docs/Commentary/Logic.rst rename to docs/Commentary/Language/Logic.rst index fc3842f..15bbd28 100644 --- a/docs/Commentary/Logic.rst +++ b/docs/Commentary/Language/Logic.rst @@ -368,6 +368,8 @@ Proof of the substitution property: For :math:`𝕁^-` we use the right rule to Recursion and infinite structures ================================= +Steelman 7B. It shall be possible to call functions and procedures recursively. + There is the question of representing recursion, as standard derivations only allow finite (well-founded) proofs. Sequent Core :cite:`downenSequentCalculusCompiler2016` introduces Let/MultiCut and Rec, which "serve two purposes: to give a shared name to the result of some computation, and to express (mutual) recursion." .. math:: diff --git a/docs/Commentary/LogicProgramming.rst b/docs/Commentary/Language/LogicProgramming.rst similarity index 100% rename from docs/Commentary/LogicProgramming.rst rename to docs/Commentary/Language/LogicProgramming.rst diff --git a/docs/Commentary/Macros.rst b/docs/Commentary/Language/Macros.rst similarity index 97% rename from docs/Commentary/Macros.rst rename to docs/Commentary/Language/Macros.rst index 53bc754..77730b9 100644 --- a/docs/Commentary/Macros.rst +++ b/docs/Commentary/Language/Macros.rst @@ -1,8 +1,8 @@ Macros ###### -code generation: code generation, automatically generate code based on templates or other input. -metaprogramming: metaprogramming, write code that can manipulate and generate other code at runtime. +* code generation: automatically generate code based on templates or other input. +* metaprogramming: write code that can manipulate and generate other code at runtime. Macros are a facility that allows defining syntactic extensions to a programming language. Macros implement the essence of Lisp: code can be used as data, and data can be used as code. Although in Lisp the transformation of code into data (specifically, an AST) is obvious as the S-expression syntax is just nested lists, in Stroscot we choose to have more complex syntax and accompanying mental overhead. It should be noted that even McCarthy, the inventor of Lisp, intended to use an M-expression (mathematical expression) syntax and S-expressions were only a matter of convenience.
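Since Stroscot's macro syntax is not shown here, the code-as-data principle itself can be sketched in plain Haskell: the "code" is an ordinary algebraic data type, and a "macro" is just a function that rewrites that data before an interpreter gives it meaning. The expression language and the constant-folding pass are invented for the illustration.

.. code-block:: haskell

   -- "Code as data": a tiny expression language represented as a value.
   data Expr
     = Lit Int
     | Add Expr Expr
     | Mul Expr Expr
     deriving Show

   -- A macro-like pass: rewrite the code (constant-fold additions)
   -- before it is ever evaluated.
   simplify :: Expr -> Expr
   simplify (Add a b) =
     case (simplify a, simplify b) of
       (Lit x, Lit y) -> Lit (x + y)
       (a', b')       -> Add a' b'
   simplify (Mul a b) = Mul (simplify a) (simplify b)
   simplify e         = e

   -- "Data as code": an interpreter turns the value back into behavior.
   eval :: Expr -> Int
   eval (Lit n)   = n
   eval (Add a b) = eval a + eval b
   eval (Mul a b) = eval a * eval b

   main :: IO ()
   main = do
     let prog = Mul (Add (Lit 1) (Lit 2)) (Lit 10)
     print (simplify prog)  -- Mul (Lit 3) (Lit 10)
     print (eval prog)      -- 30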
diff --git a/docs/Commentary/Memory-Management.rst b/docs/Commentary/Language/Memory-Management.rst similarity index 81% rename from docs/Commentary/Memory-Management.rst rename to docs/Commentary/Language/Memory-Management.rst index 5dd6e7c..864b4cf 100644 --- a/docs/Commentary/Memory-Management.rst +++ b/docs/Commentary/Language/Memory-Management.rst @@ -6,6 +6,10 @@ The language should have automatic memory management. Manual memory management i Memory models ============= +3-3I. It shall be possible to define types whose elements are indirectly accessed. Elements of such types may have components of their own type, may have substructure that can be altered during execution, and may be distinct while having identical component values. Such types shall be distinguishable from other composite types in their definitions. An element of an indirect type shall remain allocated as long as it can be referenced by the program. [Note that indirect types require pointers and sometimes heap storage in their implementation.] + +3-3J. Each execution of the constructor operation for an indirect type shall create a distinct element of the type. An operation that distinguishes between different elements, an operation that replaces all of the component values of an element without altering the element's identity, and an operation that produces a new element having the same component values as its argument, shall be automatically defined for each indirect type. + Per :cite:`kangFormalMemoryModel2015` there are pretty much two models of memory, pointers and references. Pointers model memory as an integer-indexed array of 2^32 or 2^64 words, accessed by the OS/hardware APIs. References model memory as an associative array from symbolic "references" (potentially infinite in number) to "cells", values (stored in some unspecified format, but with lossless storage). Kang describes how combinations of these can be made, for example the "quasi-concrete model" which uses a data type that starts out containing a reference, implements various arithmetic operations symbolically, but switches to a pointer once an integer address is requested. You can also imagine the other direction, a pointer that masquerades as a reference but errors when attempting to store a value larger than the allocation. But references and pointers are the fundamental ideas and serve to implement all other possibilities. @@ -28,6 +32,13 @@ As far as using a reference as a pointer, as long as we don't want to do pointer I guess it is possible that someone will have two versions of a function, one that implements it in pure Stroscot via references and one that calls out a C library with pointers. But externally, I think both of them should interact with the rest of the code using references. Using pointers with the C code might avoid a few conversion calls, but references are a lot cleaner to use, e.g. avoiding the use of callbacks, and there is the guaranteed optimization that you can use a reference as a pointer with zero-cost. So I don't think this poses an issue. Even if the C wrapper did use pointers because it was easier than converting to/from references all the time, that's a judgement call on the part of the library author and I don't think there is a solution that would let everyone standardize on one universal type. The best a "pointerOrRef" type can support, even restricted to a small type like ``int8``, is get/set like a regular reference. +Aliasing +-------- + +Steelman 7I. 
The language shall attempt to prevent aliasing (i.e., multiple access paths to the same variable or record component) that is not intended, but shall not prohibit all aliasing. [...] All aliasing of components of elements of an indirect type shall be considered intentional. + +The language is convoluted and hard to understand, but the way I read this is that anyone who uses an indirection expects aliasing and the language should not do anything to prevent it. Certainly, if you don't need aliasing, you could just use a constant directly. + Pointers ======== @@ -82,7 +93,7 @@ Eliminating a pointer write requires proving that the address is never read befo CHERI ----- -CHERI pointers are 129-bit, consisting of a 1-bit validity tag, bounds, permissions, object type, and actual pointer. Valid pointers may only be materialized in a register or memory by transforming an initial unbounded pointer obtained from the OS. This means that the simple model of pointers as integers is no longer valid. Instead, a pointer is the combination of an integer address and a capability. The `CHERI C/C++ API `__ represents capabilities as ``void*`` and addresses as ``vaddr_t``. +CHERI pointers are 129-bit, consisting of a 1-bit validity tag, bounds, permissions, object type, and actual pointer. Valid pointers may only be materialized in a register or memory by transforming an initial unbounded pointer obtained from the OS. This means that the simple model of pointers as integers is no longer valid. Instead, a pointer is the combination of an integer address and a capability. The `CHERI C/C++ API `__ represents the address+capability value as ``void*`` and addresses as ``vaddr_t``; there doesn't seem to be a way to refer to a capability without an address. I tried to read further, but the model is complicated, essentially implementing a GC to avoid dangling pointers, so I am not sure it will ever become mainstream. @@ -94,11 +105,16 @@ The pointer API, assembly wrapping, and OS calls cover using persistent memory v References ========== +5B. Each variable must be declared explicitly. Variables may be of any type. The type of each variable must be specified as part of its declaration and must be determinable during translation. [Note, "variable" throughout this document refers not only to simple variables but also to composite variables and to components of arrays and records.] +5E. There shall be no default initial-values for variables. +5F. Assignment and an implicit value access operation shall be automatically defined for each variable. +9C. It shall be possible to mark variables that are shared among parallel processes. An unmarked variable that is assigned on one path and used on another shall cause a warning. + A reference is a symbolic index into a global associative array of objects, ``Map Reference Object``. The array allows allocating new references, deleting them, and reading/writing the reference. Reference symbols can be compared for equality, hashed to an integer, and packed/unpacked to/from an integer. The packing and hashing require a little explanation. Packing the same reference always returns the same value during a program execution, and the packed value is distinct from the packed value of any other reference. But the exact value is internal to the memory system - it is an "adversarial" model similar to pointers where if the program's behavior depends on the choice of packed value it is incorrect.
The hashing is similar to packing, it is again the same value for the same reference, it is just that there is no distinctiveness constraint (so the program must have the same behavior even if all references hash to 0), and also no way to unhash the value, so there is no need to worry about resolving unpack invocations. -There are higher-level types like immutable references and reference wrappers, but those all translate away to normal references or pointer access and don't need involvement from the compiler. +There are higher-level types like immutable references and reference wrappers, but those all translate away to normal references or pointer access and don't need involvement from the compiler. Per :cite:`ichbiahRationaleDesignADA1979` we should provide a "freeze" operation which recursively removes all reference indirections and turns a reference-containing value into a truly immutable/constant object, as this is "the most useful and should be retained as the unique meaning of constancy". Pointer conversion ------------------ @@ -109,161 +125,6 @@ Foreign operations like OS calls require a pointer to a memory address, because But, in the fallback case of storing a few words, where a memory allocation is appropriate, the reference translates directly to a pointer allocation. The memory is configured to trap on stray user-level access, so that only the compiler-generated code has access. Even in this case, though, the reference's internal value is not the pointer itself, rather there is a more complex strategy of using a "handle" identifier that allows moving the data around after it is allocated. -.. _finalizers: - -Finalizers -========== - -So far we have just seen manual memory management. For both pointers and references, you allocate the memory, and then "when you are done with it", you have to free the pointer or delete the reference. Many other resources work the same way, like thread handles, file handles, and sockets. But of course, forgetting to free resources (leaking) is a common error. Finalizers are inspired by ASAP and allow the prompt freeing of allocated memory and resources, RAII-style. They assume there is exactly one best place to free the resource, "immediately after the last use of the operation". The compiler determines this location and automatically inserts the free operation there. Such an analysis is more precise than traditional GC, because GC looks at what references are "in scope" and cannot free an unused reference embedded in a structure whose other parts are in use. But semantics-wise, finalization is a static, completely solvable problem. It is just of a high complexity :math:`\Sigma^0_1`. So if we are willing to accept potentially long compile times, we can eliminate memory errors. - -Formal definition ------------------ - -More formally, a finalizer is a magic value created with the one-argument function ``newFinalizer : (free : Command) -> Op Finalizer``. It supports equality, hashing, and command ``use : Finalizer -> Command`` and ``useForever : Finalizer -> Op Command``. The semantics is that ``free`` will be called as soon as it is known that ``use`` and ``useForever`` will not be called. Calling ``use`` delays finalization until after the ``use``, and ``useForever`` cancels the finalizer and returns the free operation. 
The general transformation: - -:: - - reduce (NewFinalizer free c) = - f = freshSymbol - transform (c f) {free,f} - reduce (Use f c) = c - reduce (UseForever f c) = c free - - transform : Task -> Task - transform c = - if !(could_call (Use f) c || could_call (UseForever f) c) - reduce (free {continuation = c}) - else if will_definitely_call (UseForever f) c - reduce c - else - if not (will_definitely_call (Use f) c) - info("Delaying finalizer due to conditional usage") - let c' = continuation c - reduce (c { continuation = transform c' }) - - -Non-prompt finalization ------------------------ - -Finalizers do not really free memory "immediately after the last use", as the `info("Delaying finalizer due to conditional usage")`` message points out. This situation happens when the location to free depends on further input: - -:: - - af = print "a" - a = newFinalizer af - b = input Bool - if b then - print "b" - exit - else - print "c" - use a - print "d" - exit - -Because ``a`` might be used in the else branch, it cannot be freed between the ``newFinalizer af`` and ``input Bool`` statements, even though this would be the earliest place to free for a "true" input. Instead, ``a`` is freed as soon as it is known it will (unconditionally) not be used, hence this program is equivalent to: - -:: - - af = print "a" - b = input Bool - if b then - af - print "b" - exit - else - print "c" - af - print "d" - exit - -Non-prompt finalization can be made into an error/warning if prompt memory management is desired. I was calling prompt finalizers "destructors" for a while, because they behave a lot more like C++-style RAII. The API was slightly different, instead of a ``free`` operation that gets called at random times, destructors had this operation ``lastUse : Destructor -> Op Bool``, that returns false for every call except the last. The pattern for using this API is to make each use check if it's the ``lastUse``, like ``use (PromptFinalizer free d) = { l = lastUse d; if l { free } }``. But eventually I proved that this destructor API is equivalent to just using finalizers with the error enabled: the ``use`` function I presented lets you wrap a destructor as a finalizer, and destructors can be implemented using finalizers by:: - - isFinalized = mut false - f = newFinalizer { isFinalized := true } - lastUse = - use f - read isFinalized - -This implementation of destructors works just fine in programs where the warning is not triggered. But with non-prompt finalization, delaying until known, the contract is not valid because ``lastUse`` could return false even though it is the last use (delaying the finalizer after the read). - -Subsuming manual memory management ----------------------------------- - -By construction, finalizers without the warning are as prompt as finalizers with the warning, on the programs where the warning does not trigger. In particular, finalizers subsume prompt finalizers subsume manual memory management. Taking a program written with standard ``malloc/free``, we can gradually change it: - -1. ``malloc`` is wrapped to return a tuple with ``newFinalizer``, ``free`` is replaced with ``use`` -2. every operation is modified to call ``use`` -3. the finalizer warning is turned off -4. The ``use`` calls corresponding to ``free`` are removed - -Up until step 4, the finalizer program compiles identically to the original. 
It's step 4 that's a bit fragile - the lifetime of the finalizer could be shortened and, depending on the program structure, the point at which ``free`` should be called may be hard to compute. But hopefully the analysis will be fairly robust and able to handle most cases. At worst, the programmer will have to provide additional help to the finalizer analysis in the form of inserting the ``use`` statements corresponding to ``free``. Either way, since all operations call ``use``, the program behavior is not changed, only its resource management. - -Finalization order ------------------- - -If multiple finalizers simultaneously become able to call ``free``, then finalizer instruction insertions are run in the order of creation, first created first. This means the free calls will execute most recent first. - -:: - - a = newFinalizer (print "a") - b = newFinalizer (print "b") - - if randomBool then - print "c" - exit - else - print "c" - use a - use b - exit - - # when bool is false: cab - # when bool is true: bac - -Freed on exit -------------- - -Many resources are automatically freed by the OS on exit: memory, file handles, etc. This automatic freeing is generally more efficient than releasing each resource one by one. So as an optimization, one would like to *not* free these resources earlier, but rather hold on to them until the program exits and the OS frees them itself. So what we need is an analysis that determines at what point in the program there are sufficient spare resources that any further allocation can be satisfied without deallocation. This measure "the remaining amount of additional memory the program might use" will not necessarily be compared against the remaining memory amount of free physical memory actually available, but more likely a configurable parameter like 2MB. Once this point is determined the compiler can insert ``useForever`` calls to mark all the in-use resources as not needing manual finalization. - -Sloppy frees ------------- - -GC is more composable and it can also be faster than manual memory management :cite:`appelGarbageCollectionCan1987`. As Appel points out, even if freeing an individual object is a single machine instruction, such as a stack pop, freeing a lot of objects still has significant overhead compared to copying out a small amount of useful data and just marking a whole region of objects as free. In a similar vein, sometimes we do not actually want the finalizer to run as promptly as possible, but rather batch it with other allocations and free it all in one go. The opportunities for batching are hard to detect and even harder to implement by hand. Setting some "slop factor" of memory that can be delayed-freed is quite useful - the only downside is that if the program is pushing the limits of memory maybe it will crash at 1.9GB instead of 2GB. - -Really, we are distinguishing "unused" or "dead" memory from memory that is released back to the OS or the rest of the program. - -Evaluation order ----------------- - -There are also "space leaks" where memory could be freed earlier by evaluating expressions in a specific order but some other order is chosen. Certainly there is some evaluation order that results in minimum RAM usage, but maybe a less compact order is more time-efficient. So there is some amount of time-space tradeoff for this category. Finalizers kind of skirt this issue by being completely imperative, but with unsafePerformIO this becomes relevant again. 
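The evaluation-order point has a standard illustration in Haskell (assuming GHC and a lazy left fold, with no strictness from optimization): one evaluation order retains memory that another would free as it goes.

.. code-block:: haskell

   import Data.List (foldl')

   -- Lazy foldl builds a chain of unevaluated (+) thunks, so memory that
   -- "could" be freed as the list is consumed is instead retained until the
   -- very end, a space leak caused purely by evaluation order. Forcing it
   -- needs space proportional to the length of the list.
   leaky :: Integer
   leaky = foldl (+) 0 [1 .. 10000000]

   -- foldl' forces the accumulator at each step and runs in constant space.
   eager :: Integer
   eager = foldl' (+) 0 [1 .. 10000000]

   main :: IO ()
   main = print eager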
- -On borrowing ------------- - -Rust has gotten much attention with the borrow checker, documented in :cite:`weissOxideEssenceRust2019`. Similar to finalizers, Rust also has a concept of the "lifetime" of each reference. But, whereas in Stroscot the lifetime is simply the set of program states during which the reference is not dead, in Rust a lifetime is a *region* consisting of annotating each program point with the set of *loans* of the reference, where each loan is either unique or shared. At each point, a reference may have no loans, one unique loan, or many shared loans - no other possibilities are allowed. This restrictive set of allowed access patterns means that Rust does not allow simple cyclic pointer patterns such as doubly-linked lists. - -Similarly, Val's `mutable value semantics `__ is even more restrictive than Rust, dropping references altogether and simply using the function parameter annotation ``inout``. But it once again cannot represent any sort of cyclic pointer structure. It is really just the trick for representing state as the type ``s -> (a,s)``, and doesn't handle memory management at all. - -In practice, Rust developers have a variety of escapes from the borrow checker. code frequently switches to the ``Rc`` reference counted type, which besides cycles has the semantics of GC. There is even a `library `__ for a ``Gc`` type that does intrusive scanning. - -Per :cite:`proustASAPStaticPossible2017`, finalizers and the "free after last use" criterion subsume both region-based memory management and reference counting. :cite:`corbynPracticalStaticMemory2020` implemented a buggy incomplete version and showed even that version is comparable to Rust. - -Timeliness ----------- - -If doing automatic static memory management is so easy, why hasn't it been tried before? Well, it has. For example, :cite:`guyerFreeMeStaticAnalysis2006` has a similar notion of automatically inserting frees, and they report good results. But that paper focused on reachability, rather than lack of use, and their analysis was local to function blocks, rather than global. So it didn't see much adoption. - -:cite:`proustASAPStaticPossible2017` presented the theory and the formulation of the problem fairly well, but he fell into the trap of thinking that since the complexity of determining "waste blocks" was :math:`\Sigma_0^1`, any analysis had to be approximate. There are techniques for solving such high-complexity problems precisely, as evidenced in the TERMCOMP termination analysis competition, but such techniques really only got started in 2007 or so. From his citations list, Proust didn't really get into this area of the literature. - -So the answer is, it seems novel to try to apply techniques from formal verification to memory management, and that's the only technique that seems powerful enough to implement finalizers in the way presented here, where the point of finalization is guaranteed. All previous approaches have focused on approximate analyses that aren't powerful enough to subsume manual memory management. - -Certainly there is some risk involved in implementing a novel analysis. But it doesn't seem like a `"cursed problem" `__ where even trying to solve it is a waste of time - :cite:`corbynPracticalStaticMemory2020` got decent results with just 8 months or so of part-time work. I'd rather be spending a lot of effort on solving the right problem, even if it's hard, than getting sidetracked solving the wrong easy problem. 
- -Design ------ -Java's finalizers have inherent problems because they are associated with GC. In particular, because the GC may not run, Java's finalizers have no guarantee of timeliness, and hence cannot be used to free resources. In contrast Stroscot's finalizers free as soon as it is statically known that they are no longer used. Java's finalizers have no ordering; Stroscot's run in the order defined. Java's finalizers do not report exceptions; Stroscot's finalizer methods are inserted into the program at the point the finalizer is run and can report exceptions. But like Java, the finalizer is considered done regardless of whether it throws an exception. Stroscot's finalizers are functions and are not directly associated with objects, so there is no possibility of resurrection like in Java. @@ -299,6 +160,8 @@ Portable mmap: Representation ============== +11A. The language shall permit but not require programs to specify a single physical representation for the elements of a type. These specifications shall be separate from the logical descriptions. Physical representation shall include object representation of enumeration elements, order of fields, width of fields, presence of "don't care" fields, positions of word boundaries, and object machine addresses. In particular, the facility shall be sufficient to specify the physical representation of any record whose format is determined by considerations that are entirely external to the program, translator, and language. The language and its translators shall not guarantee any particular choice for those aspects of physical representation that are unspecified by the program. It shall be possible to specify the association of physical resources (e.g., interrupts) to program elements (e.g., exceptions or signals). + A lot of languages have a fixed or default memory representation for values, e.g. a C struct, a Haskell ADT, and a Python object are always laid out in pretty much the same way. The more systems-level languages allow controlling the layout with flags, for example Rust has `type layout `__ and also C compatibility. Layout is then defined by its size, alignment, padding/stride, and field offsets. Now it's great to have a compact representation of the memory layout - but only if you can actually write the memory layout you want using these features. But these flags aren't really that powerful. Here are some examples of what can't generally be done with the current memory DSLs: * specify the in-memory order of fields differently from their logical order diff --git a/docs/Commentary/Modules.rst b/docs/Commentary/Language/Modules.rst similarity index 81% rename from docs/Commentary/Modules.rst rename to docs/Commentary/Language/Modules.rst index ae82423..b2fe7ad 100644 --- a/docs/Commentary/Modules.rst +++ b/docs/Commentary/Language/Modules.rst @@ -3,19 +3,27 @@ Modules Developers can add new language features and functionality through libraries and modules. +3-5A. It shall be possible to encapsulate definitions. An encapsulation may contain declarations of anything (including the data elements and operations comprising a type) that is definable in programs. The language shall permit multiple explicit instantiations of an encapsulation. + +3-5B. An encapsulation may be used to inhibit external access to implementation properties of the definition.
In particular, it shall be possible to prevent external reference to any declaration within the encapsulation including automatically defined operations such as type conversions and equality. Definitions that are made within an encapsulation and are externally accessible may be renamed before use outside the encapsulation. + +3-5C. Variables declared within an encapsulation, but not within a function, procedure, or process of the encapsulation, shall remain allocated and retain their values throughout the scope in which the encapsulation is instantiated. + Terminology =========== -* an *entity* is a value, such as a function or constant. It is generally referred to by an associated name path. For example, in ``stdlib.getTime stdlib.clocks.SystemClock``, ``stdlib.getTime`` is resolved to an entity which is an I/O action and ``stdlib.clocks.SystemClock`` is resolved to an entity which is a symbol. +(based on Carbon) + * a *name path* is a dot-separated list of identifiers or symbols. For example, ``stdlib.getTime`` is a name path and ``stdlib`` and ``getTime`` are identifiers. A bare identifier ``getTime`` is a trivial name path; generally name paths can be shortened to bare identifiers using imports. -* *libraries* are considered an output format in Stroscot. Similarly to building an executable, you can build a shared or static library. The nomenclature of "library" is not otherwise used in programming Stroscot. -* a *module* or namespace is a convenience grouping of entities, more formally a map from names to entities. Modules may be nested, recursively/cyclically, and an entity may appear in multiple modules. The dot-notation and name paths are closely associated with accessing a module's entities. -* a *package* is a group of files, and is the standard unit for distribution (i.e., a package is typically downloaded as a zip file or such). A package may contain many modules or none at all as-is (providing only top-level entities), but with careful importing any package may be encapsulated as a module and most are. +* an *entity* is a value, such as a function or constant. It is generally referred to by an associated name path. For example, in ``stdlib.getTime stdlib.clocks.SystemClock``, ``stdlib.getTime`` is resolved to an entity which is an I/O action and ``stdlib.clocks.SystemClock`` is resolved to an entity which is a symbol. Multiple name paths may refer to the same entity. +* *libraries* are considered an output format in Stroscot. Similarly to building an executable, you can build a shared or static library. The nomenclature of "library" is not otherwise used in programming Stroscot as it is ambiguous. +* a *module* or namespace is a convenience grouping of entities, more formally a map from names to entities. Modules may be nested, recursively/cyclically, and an entity may appear in multiple modules. The dot-notation and name path syntax is closely associated with accessing a module's entities. +* a *package* is a group of files, and is the standard unit for distribution (i.e., a package is typically downloaded as a zip file or such). A package may contain many modules or none at all (providing only top-level entities). Generally however a package is imported and encapsulated as a module. Module ====== -Modules are reusable components containing code that are themselves first-class objects, replacing more simplistic ways of namespacing and encapsulating units of code such as structs of methods. Modules allow code reuse and abstraction. 
For example, one can create an "abstract data type" by defining a type in a module ``set C = { Csym a b c | a : Int, b : Float, c : Double }`` and not exporting its underlying constructor symbol ``Csym``. This is not really the mathematical notion of an ADT, where one takes the "free object" defined by the operations and constraints, but since Stroscot supports term rewriting it is possible to define the mathematical model as a series of rewrite rules and then export the symbols, fulfilling the concept of ADT in both theory and practice. +Modules are reusable components containing code that are themselves first-class objects, replacing more simplistic ways of namespacing and encapsulating units of code such as structs of methods. Modules allow code reuse and abstraction. For example, one can create an "abstract data type" by defining a type in a module ``set C = { Csym a b c | a : Int, b : Float, c : Double }`` and not exporting its underlying constructor symbol ``Csym``. This is not really the mathematical notion of an ADT, where one takes the "free object" defined by the operations and constraints, but since Stroscot supports term rewriting it is possible to define the mathematical model as a series of rewrite rules and then export the symbols, fulfilling the concept of ADT in both theory and practice. Stroscot's module system is based on F#, OCaml, Standard ML, Agda, 1ML, MixML, Backpack, and Newspeak's module systems. (TODO: reread the systems to see if I missed anything. But they're mostly type systems, rather than functionality) @@ -35,6 +43,10 @@ In practical programs most code is contained in functors. For example every pack Lexical scoping =============== +5C. Everything (including operators) declared in a program shall have a scope (i.e., a portion of the program in which it can be referenced). Scopes shall be determinable during translation. Scopes may be nested (i.e., lexically embedded). A declaration may be made in any scope. Anything other than a variable shall be accessable within any nested scope of its definition. + +7C. A reference to an identifier that is not declared in the most local scope shall refer to a program element that is lexically global, rather than to one that is global through the dynamic calling structure. + An identifier is visible from the point of its declaration to the end of the immediately surrounding block. There is only one effective scope at any point in the code (for instance, variables and types are looked up in the same scope), which pulls in variables from various lexical scopes. Uniform scoping rules make programs easier to read and to understand. @@ -132,7 +144,7 @@ But it is less convenient - programs that "should" compile don't. One's brain na For Stroscot, we are doing whole-program analysis anyway for optimization purposes, so it is fairly straightforward to allow recursive imports. The incremental build system is designed to handle recursive imports and other such oddities. Pike's concerns about understandability can be addressed by a lint analysis that finds recursive imports and suggests ways to resolve them. -I don't agree though with Carbon's second point - I think there is not much advantage to specifically distinguishing API vs implementation, versus a general module separation. +Now moving on to Carbon's second point, letting the build system distinguish between the dependencies of the API itself and its underlying implementation. Steelman 1B similarly says "The language shall require some redundant specifications in programs." 
This contradicts the DRY best practice ("Don't repeat yourself"), because now there are multiple points of truth - first you write the implementation, then you are required to write the specification on top as a duplication of the implementation. These are coupled and now when you change one you have to change the other. In Stroscot, this redundancy is avoided by making the implementation the only source of truth. Type signatures (specifications) do not affect the runtime behavior at all, but rather are compiler-checked assertions about the behavior of the implementation. These assertions are not redundant because it could indeed be the case that the implementation does not satisfy them. But such assertions are not required, because the implementation itself is the sole source of behavior. Because they are not required, there are no restrictions on where they appear - they may appear next to the implementation, or near the site of API use, or anywhere else in the program. Module versioning ================= diff --git a/docs/Commentary/Objects.rst b/docs/Commentary/Language/Objects.rst similarity index 100% rename from docs/Commentary/Objects.rst rename to docs/Commentary/Language/Objects.rst diff --git a/docs/Commentary/OpPrims.rst b/docs/Commentary/Language/OpPrims.rst similarity index 92% rename from docs/Commentary/OpPrims.rst rename to docs/Commentary/Language/OpPrims.rst index bce5514..957e395 100644 --- a/docs/Commentary/OpPrims.rst +++ b/docs/Commentary/Language/OpPrims.rst @@ -1,10 +1,18 @@ Operational primitives ###################### +8A. There shall be a few low level input-output operations that send and receive control information to and from physical channels and devices. The low level operations shall be chosen to insure that all user level input-output operations can be defined within the language. + As used in Stroscot, operational primitives refer to the stateful operations that form the building blocks of imperative programs. Examples include memory operations (read, write), foreign function calls, compiler intrinsics, and OS system calls. It would also be possible to call them "primitive operations", but this term is less precise and could be read as including elements of the runtime such as lambda reduction and term rewriting. There is an operational interpretation of every expression in Stroscot. For example, the operational interpretation of a value is returning that value. The operational interpretation of addition on two machine integers consists of storing the integers to memory or fixed registers, executing the ``add`` assembly instruction, appropriately handling any error conditions or traps, packaging up the result as a value, and returning it. And so on. Generally, a program may be viewed as assembly instruction sequences interleaved together with higher-level "glue" code. During optimization, one goal is to convert and reduce as much of this "glue" code as possible into assembly. Each switch into "glue" code corresponds to a jump back into the interpreter, with associated overhead. +Steelman 1G says "There shall be a facility for defining those portions of programs that are dependent on the object machine configuration and for conditionally compiling programs depending on the actual configuration." Stroscot follows this fully, exposing the full machine configuration at runtime and allowing code to use this information in conditions or other control structures. + +11C. 
To aid conditional compilation, it shall be possible to interrogate properties that are known during translation including characteristics of the object configuration, of function and procedure calling environments, and of actual parameters. For example, it shall be possible to determine whether the caller has suppressed a given exception, the callers optimization criteria, whether an actual parameter is a translation time expression, the type of actual generic parameters, and the values of constraints characterizing the subtype of actual parameters. + +11D. The object system configuration must be explicitly specified in each separately translated unit. Such specifications must include the object machine model, the operating system if present, peripheral equipment, and the device configuration, and may include special hardware options and memory size. The translator will use such specifications when generating object code. [Note that programs that depend on the specific characteristics of the object machine, may be made more portable by enclosing those portions in branches of conditionals on the object machine configuration.] + Assembly ======== diff --git a/docs/Commentary/Posets.rst b/docs/Commentary/Language/Posets.rst similarity index 100% rename from docs/Commentary/Posets.rst rename to docs/Commentary/Language/Posets.rst diff --git a/docs/Commentary/Language/Resource-Management.rst b/docs/Commentary/Language/Resource-Management.rst new file mode 100644 index 0000000..5406398 --- /dev/null +++ b/docs/Commentary/Language/Resource-Management.rst @@ -0,0 +1,464 @@ +Resource management +################### + +Resources are things that can be acquired and released, and are available in limited quantities due to OS or physical limitations. Examples include memory allocations, file handles, internet sockets, mutexes/locks, process table entries, and process identifiers (PIDs). A resource leak happens if the program does not promptly release a resource it has acquired after the program is finished with the resource. A resource management technique prevents resource leaks by releasing resources promptly. + +It is possible to manage resources by hand, but it is quite error-prone and tedious. Automatic resource management aims to provide a concise and safe replacement. Stroscot's solution is called "destructors" or "finalizers" or something like that. + +Manual approach +=============== + +The manual approach is to release sequentially after acquiring, for example:: + + f1 = open "file.txt" + f2 = open "file2.txt" + do_something f1 f2 + close f1 + do_something_2 f2 + close f2 + +There are several issues: + +* Early exit leak: There is a resource leak if do_something contains a return, exception, goto, break, or continue out of the do_something block. `Raymond Chen `__ complains about C++ macros hiding return and goto statements in the function body. +* Failed acquire: if open fails, do_something and close will be called on an invalid value. But with automatic exception propagation this will be fine since ``close InvalidHandle = InvalidHandle``. +* Adjacency: If do_something is long, the release is far from the acquisition, so manual inspection cannot easily identify if the corresponding release function is actually called. +* Encapsulation: the release requirement exposes a detail of the value, namely that it is a resource. This means it cannot be stored in a data structure or returned from a function without special care to call release later. 
+* Interleaving: Not a problem here, but many solutions impose the requirement that f2 is freed before f1, a LIFO nesting. + +try-finally +=========== + +The try-finally modifies the basic approach by marking the do_something and release blocks explicitly. There are two patterns in Java 6, `standard pattern `__ and open-in-try (`here `__):: + + f = open "file.txt" + try: + do_something f + finally: + close f + + f = var InvalidHandle + try: + f := open "file.txt" + do_something f + finally: + if(f != InvalidHandle) + close f + +The unwinding protection solves the early exit leak. The failed acquire again doesn't really matter. This pattern does not solve adjacency, encapsulation, or interleaving, meaning that it is still awkward and verbose. For adjacency, throws act as go-downs and you have to scan down almost the whole function to the finally block to see what cleanups are waiting and if an allocation has a matching clean-up. Also there is an indentation pyramid with nested try-finally. + +try-with-resources +================== + +The try-with-resources solves adjacency by making ``close`` a Java built-in interface, but it still can't do encapsulation or interleaving:: + + try(f = open "file.txt") + do_something f + +In particular `chained resources `__ are awkward. + +C# has a similar ``using (f = newThing) { ... }`` syntax, and Python has ``with``. + +Bracket +======= + +Haskell has a ``bracket acquire release action`` combinator which functions similarly to try-with-resources. It similarly doesn't support interleaving or encapsulation. + +The `D guy `__ claims you need an even larger combinator than bracket, which goes as follows: + +:: + + if acquire then + when !next + rollback + release + +With masking this looks like: + +:: + + generalBracket acquire next rollback release = + mask $ \unmasked -> do + resource <- acquire + b <- unmasked (next resource) `catch` \e -> do + _ <- rollback resource + _ <- release resource + throw e + c <- release resource + return (b, c) + +acquire has to be masked because there could be an async exception between the action and running next. + +Scope guard +=========== + +This approach involves a "deferred release", which is called when the scope is exited. For example:: + + f = open "file.txt" + defer (close f) + do_something f + +It is available in C++ as the ScopeGuard class, in Go and Zig as the defer statement, and in D via the `scope `__ keyword, where there are ``scope(exit)``, ``scope(success)``, and ``scope(failure)`` (referring to exiting the scope unconditionally as with ``defer``, without exception, or by exception respectively). + +It does solve early exit. + +The pattern nests in the sense that ``defer``-ed statements are run latest-defined to earliest-defined on exit, but again doesn't allow interleaving. + +It solves adjacency by putting the cleanup before the code that uses the resource, right after the code that's being cleaned up after. It's easy to verify that allocations match up with their nearby cleanups, but the clean-up is delayed to the scope's exit. The syntax is very compact, just specify a cleanup function, but is a statement rather than an expression, meaning that the acquire must also be a statement. + +``defer`` and ``scope`` do not allow returning a resource from a function and skipping the cleanup. They also introduce memory allocation questions since the deferred expressions can capture local variables. So no encapsulation. 
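+A rough Haskell sketch of the same idea, built on ``finally`` from ``Control.Exception`` (the names ``withScope`` and ``defer`` are invented here for illustration, not a standard API). Deferred actions are collected in an ``IORef`` and run latest-registered-first when the scope exits, even if an exception escapes; like ``defer``, it nests but still gives no encapsulation or interleaving::
+
+   import Control.Exception (finally)
+   import Data.IORef (modifyIORef, newIORef, readIORef)
+   import System.IO (IOMode (ReadMode), hClose, hGetLine, openFile)
+
+   -- Run 'body', handing it a 'defer' action; on exit (normal or
+   -- exceptional) run the deferred cleanups in reverse order (LIFO).
+   withScope :: ((IO () -> IO ()) -> IO a) -> IO a
+   withScope body = do
+     deferred <- newIORef []
+     let defer action = modifyIORef deferred (action :)
+     body defer `finally` (readIORef deferred >>= sequence_)
+
+   main :: IO ()
+   main = withScope $ \defer -> do
+     h <- openFile "file.txt" ReadMode
+     defer (hClose h)   -- cleanup is declared right next to the acquire
+     line <- hGetLine h
+     putStrLn line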
+ +Goto cleanup +============ + +C has this pattern:: + + f = open "file" + if (f == InvalidHandle) + return + if (isException (do_something f)) + goto cleanup + if (isException (do_something_2 f)) + goto cleanup + cleanup: + close f + +This suffers from early return but addresses adjacency in that the cleanup label is present and can handle interleaving with conditionals. It is not encapsulated though and it is easy to mess up handling a failed acquire. + +Garbage collection +================== + +Java's GC almost solves the problem by making closing automatic. It looks like:: + + f = open "file.txt" + do_something f + +It solves all the issues of early exit, adjacency (implicit in the open), nesting, and encapsulation (the GC is based on global program flow). Unfortunately, Java's GC does not guarantee it will promptly call the finalizer, so in practice the semantics are not usable. + +RAII +==== + +C++ uses RAII, which looks like GC, but uses deterministic memory management, such as stack allocation or reference counting. It ensures you cannot insert any failure points between the allocation and the start of the cleanup's scope. A resource can even be allocated with its corresponding cleanup in the middle of an expression. But defining a new class for every cleanup operation is tedious - fortunately in C++0x it is possible to define one "RAII lambda" class and be done. C++'s semantics define lambda memory handling so there is no allocation issue. + +With C++17 it is apparently possible to copy/move RAII types and store them in a data structure or return them from a function, but it seems fragile. In case of a complicated object graph, such as multiple objects sharing a resource, RAII falls down because the available strategies for deterministic memory management are insufficient. + +Destructors +=========== + +Design +------ + +Stroscot's destructors (or finalizers, or whatever name) are inspired by C++'s RAII and Java's GC. I am still waffling on the name because they behave a lot more like C++-style RAII, but they are also not quite as prompt, closer to Java's GC finalizers. Java's finalizers have inherent problems because they are associated with GC. In particular, because the GC may not run, Java's finalizers have no guarantee of timeliness or ordering, and hence cannot be used to free resources. In contrast, with the "automatic static" analysis of :cite:`proustASAPStaticPossible2017`, Stroscot's destructors free as soon as it is statically known that they are no longer used, so they are much closer conceptually to the stack or reference counting discipline of RAII, even though they have the syntactic cleanliness of a GC solution. + +It is not really a big assumption that there is exactly one best place to release the resource, "immediately after the last use of the operation". Semantics-wise, identifying the best location for the release operation is a static, completely solvable problem. It is just of a high complexity :math:`\Sigma^0_1`. So if we are willing to accept potentially long compile times, we can eliminate resource management errors. Such an analysis is more precise than traditional GC, because GC looks at what references are "in scope" and cannot free an unused reference embedded in a structure whose other parts are in use. + +Compared to Java's finalizers and C++'s RAII, I made destructors functions, not directly associated with any objects. 
This avoids issues with resurrection and so on - the destructor is just a lambda so it can do whatever and still be freed without issues. + +Similarly to ``defer``, the cleanup is placed at the top, near or within the acquire call. + +Destructors directly implement D's ``scope(exit)``, and with an extra flag variable they can implement ``scope(success)`` and ``scope(failure)``: + +:: + + scope_failure rollback = + flag = mut true + f = newDestructor (if flag then { rollback } else {}) + ScopeFailure flag f + + use (ScopeFailure flag f) = + flag := false + use f + + + s <- scope_failure rollback + code + use s + +Similarly to ``finally``, there is a marker for the end of the destructor, ``use``. `Herb Sutter `__ claims that the extra ``use`` is "tedious and fragile" and forces the programmer to think about the placement of failure determination. One can define a proper ``scope_failure rollback { ... }`` block structure, and even redefine return/continue/break to not run the rollback. But personally I think the destructors are more powerful because they allow interleaving scopes. The combinators only allow nesting which isn't as expressive. In Haskell land Snoyman created `ResourceT `__ even though there was ``bracket``, exactly because of this. + +Destructors allow nesting, in the natural way ``newDestructor a; newDestructor b; ...; use b; use a``, and interleaving, by reordering to ``use a; use b``. Destructors also support encapsulation such as returning the allocated resource. It is also possible to embed the destructor in a resource handle and use RAII style programming - each operation calls ``use`` and extends the destructor's life. + +Formal definition +----------------- + +More formally, a destructor is a magic value created with the one-argument function ``newDestructor : (free : Command) -> Op Destructor``. It supports equality, hashing, and command ``use : Destructor -> Command`` and ``useForever : Destructor -> Op Command``. The semantics is that ``free`` will be called as soon as it is known that ``use`` and ``useForever`` will not be called. Calling ``use`` delays finalization until after the ``use``, and ``useForever`` cancels the destructor and returns the free operation. The general transformation: + +:: + + reduce (NewDestructor free c) = + f = freshSymbol + transform (c f) {free,f} + reduce (Use f c) = c + reduce (UseForever f c) = c free + + transform : Task -> Task + transform c = + if !(could_call (Use f) c || could_call (UseForever f) c) + reduce (free {continuation = c}) + else if will_definitely_call (UseForever f) c + reduce c + else + if not (will_definitely_call (Use f) c) + info("Delaying destructor due to conditional usage") + let c' = continuation c + reduce (c { continuation = transform c' }) + + +Non-prompt finalization +----------------------- + +Destructors do not really free memory "immediately after the last use", as the `info("Delaying destructor due to conditional usage")`` message points out. Rather they free "immediately before the first point of non-use". This distinction is clear when the location to free depends on further input: + +:: + + af = print "a" + a = newDestructor af + b = input Bool + if b then + print "b" + exit + else + print "c" + use a + print "d" + exit + +Because ``a`` might be used in the else branch, it cannot be freed between the ``newDestructor af`` and ``input Bool`` statements, even though this would be the earliest place to free for a "true" input. 
Instead, ``a`` is freed as soon as it is known it will (unconditionally) not be used, hence this program is equivalent to: + +:: + + af = print "a" + b = input Bool + if b then + af + print "b" + exit + else + print "c" + af + print "d" + exit + +Non-prompt finalization can be made into an error/warning if prompt memory management is desired. + +I also started out with a different API, instead of a ``free`` operation that gets called at random times, destructors had this operation ``lastUse : Destructor -> Op Bool``, that returns false for every call except the last. The pattern for using this API is to make each use check if it's the ``lastUse``, like ``use (PromptDestructor free d) = { l = lastUse d; if l { free } }``. But eventually I proved that this API is equivalent to the current API: the use function works to implement the current API with the old (minus the non-prompt finalization), and you can implement the old API with the current one:: + + isFinalized = mut false + f = newDestructor { isFinalized := true } + lastUse = + use f + read isFinalized + +This implementation of destructors works just fine in programs where the warning is not triggered. But with non-prompt finalization, delaying until known, the contract is not valid because ``lastUse`` could return false even though it is the last use (delaying the destructor after the read). + +Subsuming manual memory management +---------------------------------- + +By construction, destructors without the warning are as prompt as destructors with the warning, on the programs where the warning does not trigger. In particular, destructors subsume prompt destructors subsume manual memory management. Taking a program written with standard ``malloc/free``, we can gradually change it: + +1. ``malloc`` is wrapped to return a tuple with ``newDestructor``, ``free`` is replaced with ``use`` +2. every operation is modified to call ``use`` +3. the destructor warning is turned off +4. The ``use`` calls corresponding to ``free`` are removed + +Up until step 4, the destructor program compiles identically to the original. It's step 4 that's a bit fragile - the lifetime of the destructor could be shortened and, depending on the program structure, the point at which ``free`` should be called may become much harder to compute. But hopefully the analysis will be fairly robust and able to handle most cases. At worst, the programmer will have to provide additional help to the destructor analysis in the form of inserting the ``use`` statements corresponding to ``free``. Either way, since all operations call ``use``, the program behavior is not changed, only its resource management. + +Destructor order +---------------- + +If multiple destructors simultaneously become able to call ``free``, then destructor instruction insertions are run in the order of creation, first created first. This means the free calls will execute most recent first, e.g. if there is an exception. + +:: + + a = newDestructor (print "a") + b = newDestructor (print "b") + + if randomBool then + print "c" + exit + else + print "c" + use a + use b + exit + + # when bool is false: cab + # when bool is true: bac + +Freed on exit +------------- + +Many resources are automatically freed by the OS on exit: memory, file handles, etc. This automatic freeing is generally more efficient than releasing each resource one by one. So as an optimization, one would like to *not* free these resources earlier, but rather hold on to them until the program exits and the OS frees them itself. 
So what we need is an analysis that determines at what point in the program there are sufficient spare resources that any further allocation can be satisfied without deallocation. This measure, "the remaining amount of additional memory the program might use", will not necessarily be compared against the amount of free physical memory actually available, but more likely against a configurable parameter like 2MB. Once this point is determined the compiler can insert ``useForever`` calls to mark all the in-use resources as not needing manual finalization. + +Sloppy frees +------------ + +GC is more composable and it can also be faster than manual memory management :cite:`appelGarbageCollectionCan1987`. As Appel points out, even if freeing an individual object is a single machine instruction, such as a stack pop, freeing a lot of objects still has significant overhead compared to copying out a small amount of useful data and just marking a whole region of objects as free. In a similar vein, sometimes we do not actually want the destructor to run as promptly as possible, but rather to batch it with other deallocations and free it all in one go. The opportunities for batching are hard to detect and even harder to implement by hand. Setting some "slop factor" of memory that can be delayed-freed is quite useful - the only downside is that if the program is pushing the limits of memory it may crash at 1.9GB instead of 2GB. + +Really, we are distinguishing "unused" or "dead" memory from memory that is released back to the OS or the rest of the program. + +Evaluation order +---------------- + +There are also "space leaks" where memory could be freed earlier by evaluating expressions in a specific order but some other order is chosen. Certainly there is some evaluation order that results in minimum RAM usage, but maybe a less compact order is more time-efficient. So there is some amount of time-space tradeoff in this category. Destructors mostly skirt this issue by being completely imperative, but with unsafePerformIO it becomes relevant again. + +On borrowing +------------ + +Rust has gotten much attention with the borrow checker, documented in :cite:`weissOxideEssenceRust2019`. Similar to destructors, Rust also has a concept of the "lifetime" of each reference. But, whereas in Stroscot the lifetime is simply the set of program states during which the reference is not dead, in Rust a lifetime is a *region*, built by annotating each program point with the set of *loans* of the reference, where each loan is either unique or shared. At each point, a reference may have no loans, one unique loan, or many shared loans - no other possibilities are allowed. This restrictive set of allowed access patterns means that Rust does not allow simple cyclic pointer patterns such as doubly-linked lists. + +Similarly, Val's `mutable value semantics `__ is even more restrictive than Rust, dropping references altogether and simply using the function parameter annotation ``inout``. But it once again cannot represent any sort of cyclic pointer structure. It is really just the trick of representing state as the type ``s -> (a,s)``, and doesn't handle memory management at all. + +In practice, Rust developers have a variety of escapes from the borrow checker. Code frequently switches to the ``Rc`` reference counted type, which (cycles aside) has the semantics of GC. There is even a `library `__ for a ``Gc`` type that does intrusive scanning.
+ +Per :cite:`proustASAPStaticPossible2017`, destructors and the "free after last use" criterion subsume both region-based memory management and reference counting. :cite:`corbynPracticalStaticMemory2020` implemented a buggy incomplete version and showed even that version is comparable to Rust. + +Implementability +---------------- + +If doing automatic static memory management is so easy, why hasn't it been tried before? Well, it has. For example, :cite:`guyerFreeMeStaticAnalysis2006` has a similar notion of automatically inserting frees, and they report good results. But that paper focused on reachability, rather than lack of use, and their analysis was local to function blocks, rather than global. So it didn't see much adoption. + +:cite:`proustASAPStaticPossible2017` presented the theory and the formulation of the problem fairly well, but he fell into the trap of thinking that since the complexity of determining "waste blocks" was :math:`\Sigma_0^1`, any analysis had to be approximate. There are techniques for solving such high-complexity problems precisely, as evidenced in the TERMCOMP termination analysis competition, but such techniques really only got started in 2007 or so. From his citations list, Proust didn't really get into this area of the literature. + +So the answer is, it seems novel to try to apply techniques from formal verification to memory management, and that's the only technique that seems powerful enough to implement destructors in the way presented here, where the point of finalization is guaranteed. All previous approaches have focused on approximate analyses that aren't powerful enough to subsume manual memory management. + +Certainly there is some risk involved in implementing a novel analysis. But it doesn't seem like a `"cursed problem" `__ where even trying to solve it is a waste of time - :cite:`corbynPracticalStaticMemory2020` got decent results with just 8 months or so of part-time work. I'd rather be spending a lot of effort on solving the right problem, even if it's hard, than getting sidetracked solving the wrong easy problem. + +Exceptions in destructors +------------------------- + +What to do when throwing an exception from a destructor? Say we have + +:: + + x = + f = newDestructor (throw Bar) + throw Foo + use f + + y = x catch \case + Foo -> print "foo" + Bar -> print "bar" + +Per Stroscot's semantics, the destructor runs as soon as it is known that ``use`` will not be called - i.e. immediately after constructing the destructor, before the ``throw Foo`` (because ``use f`` is unreachable). So ``x`` is equivalent to ``throw Bar; throw Foo``. Then per normal exception semantics ``throw Bar`` wins since it is first and ``y`` outputs ``bar``. + +If we had an I/O operation ``writeOrThrow`` instead of ``throw Foo``, then ``use f`` is reachable. So then we have two cases: + +* if ``writeOrThrow`` throws, then it is known that ``use`` will not be called. The destructor will be delayed until just before the first I/O operation executed after the ``writeOrThrow``, in this case ``print "foo"``. So ``y`` will exit with a ``Bar`` error without printing anything. +* if ``writeOrThrow`` doesn't throw, then the destructor will execute after the last ``use``. Again ``y`` will exit with a ``Bar`` error without printing anything. + +As far as I can tell this is a logical choice. C++ instead decided to terminate the program on the ``throw Bar`` in the destructor. 
The justification seems to be that it slightly simplified the implementation of unwinding, and that the C++ STL wanted to "arbitrarily require that [destructors] may not throw." (`Herb Sutter `__, also Item 16 "Destructors That Throw and Why They're Evil" of his 1999 book Exceptional C++) `John Kalb and David Abraham `__ say "The reason we can’t have throwing destructors is that nobody worked out how to deal with multiple exceptions wanting to propagate through the same set of stack frames. [...] we think termination is a bit draconian. Frankly, we don’t think it’s so hard to nail down the final details of how this should work." + +Kalb proposes to "drop the second [destructor] exception on the floor and propagate the original one", but this is a bad idea. Since (per Sutter's convention) destructors generally don't fail, an exception during a destructor is going to be fairly serious, such as an OOM. Ignoring this exception in favor of a trivial logic exception is the wrong approach. + +Exception safety +================ + +Exception safe code is code that works correctly even when exceptions are thrown. The basic issue is in Rust/C++ almost everything is a resource. In Stroscot almost everything is a value, inert data - copy/clone is built into the language and can't fail. Taking away explicit memory management makes it much easier to ensure exception safety. Reasoning about invariants with pure values is straightforward, and fail-fast coding styles mean that the program doesn't live long. And when writing cleanups the programmer is already thinking about exception safety and restoring invariants, so will write an exception-safe cleanup. + +Still, what about exception safety for a mutable data structure? C++ has `levels of safety `__ for stateful functions based on what invariants are preserved. + +* no-throw means forward progress is guaranteed. It's nice when you can give this, but most functions can fail due to insufficient memory. +* Strong safety means that state transitions happen atomically and a failure will return to the old state. To ensure this one needs basic safety and to copy the relevant data beforehand and write it back afterwards. This is infeasible for even simple data structures in C++ due to overloaded assignment and copy operators being able to throw. +* Basic safety means that the final state will be valid, i.e. all invariants hold. You need to safeguard against each function call throwing. This requires adding handling code to each call and trusting the documentation for the list of thrown exceptions (or using no-throw). + +These levels only work for stateful data structures that call a small and easily auditable set of other functions. + +In Stroscot there are two steps to making code exception-safe: + +* identify invariants. These can be written as assertions using the pure read operations on stores. With this the static verification will identify the function and the exceptional control flow that breaks the invariant. 
* place exception cleanup handlers to restore broken invariants + +This code in Rust or C++ is not exception safe: (based on `this code `__ and `this code `__) + +:: + + push_ten_more : (v : Vec T) -> T -> Op { v : Vec (T|uninitialized) } + push_ten_more (this@(readRef -> Vector arr)) t = + new_arr = alloc (length arr + 10) + for (i in indexes arr) + copy arr[i] to new_arr[i] + delete arr[i] + this := Vector new_arr + + for i in 0..10 { + (ptr v) offset (len + i) := t.clone() + } + +The update to the Vector happens when the next 10 elements are uninitialized, and ``Vec`` has an internal invariant that its elements are safe to deallocate. So if ``t.clone`` throws then the initialization will not be called. Vec's destructor, which assumes the invariant, will then free uninitialized memory. + +In Stroscot, most values are passed directly, not by reference. So there is no need for copying, cloning, or explicitly calling destructors, and the whole operation can fail only with OOM or an async exception (which is common to most functions so it's left implicit). Similarly the array has no path where it is returned uninitialized. So here is the corresponding Stroscot code:: + + push_ten_more : Vec T -> T -> Op (Vec T) + push_ten_more arr t = + len = length arr + new_arr = alloc (len + 10) uninitialized + for (i in indexes arr) + new_arr[i] := arr[i] + for (i in 0..10) + new_arr[len + i] := t + new_arr + +Another example is ``TwoList``, which maintains two mutable linked lists with the invariant that they are the same. What does adding an element look like? + +:: + + add a (List l) = + head = read l + l := Cons a head + + -- not exception safe + add a (TwoLists l1 l2) = + add a l1 + add a l2 + +This is not exception safe for the invariant, because an async exception between the two adds may add ``a`` to ``l1`` but not ``l2``. So we fix it by (1) adding the assertion and (2) adding a cleanup handler (following the scope guard destructor pattern). + +:: + + -- (1) add the assertion + add : T -> { TwoList l1 l2 | freeze l1 == freeze l2 } -> { TwoList l3 l4 | freeze l3 == freeze l4 } + add a (TwoLists l1 l2) = + -- (2) add a cleanup handler + finishedFlag = mut false + l1_old = freeze l1 + l2_old = freeze l2 + f = newDestructor { + when not finishedFlag then + l1 := unfreeze l1_old + l2 := unfreeze l2_old + } + add a l1 + add a l2 + -- disarm cleanup handler + finishedFlag := true + use f + +Here we have strong safety, i.e. it restores the state if an exception is thrown during the add. If we didn't use the cleanup handler, the assertion would fail because we wouldn't necessarily have strong safety. + +Poisoning +--------- + +Rust has "poisoning" for taking locks, which is a safeguard against trying to use data that was corrupted due to an exception, e.g. a concurrent heap that doesn't have the heap property anymore. Poisoned data will tear down all connected threads. + +Essentially it uses the ``scope_failure`` cleanup to poison the lock on failure: + +:: + + getGuard mutex = + acquireLock mutex + flag = mut true + f = newDestructor { + if flag then + poisonLock mutex + releaseLock mutex + } + Guard flag f + + finishGuard (Guard flag f) = + flag := false + use f + +Once the lock is poisoned then locking the mutex will throw an exception. + +However, the invariant checking approach to exception safety discussed above will throw an exception anyway when trying to use a data structure operation on a corrupted data structure. It also covers the single-threaded case where no mutex is used but the recovery from an exception is incomplete.
So poisoning isn't really that useful. But a PoisoningMutex can at least be in the standard library for those who want the functionality. + diff --git a/docs/Commentary/Security.rst b/docs/Commentary/Language/Security.rst similarity index 73% rename from docs/Commentary/Security.rst rename to docs/Commentary/Language/Security.rst index 9b2a8fa..2e695b2 100644 --- a/docs/Commentary/Security.rst +++ b/docs/Commentary/Language/Security.rst @@ -50,3 +50,47 @@ Logging ------- Logging is important for security and a common logging approach should be used across the system. The logs should be detailed enough to identify attacks, including UTC timestamp information. They should avoid sensitive information if possible. Regardless, logs must be considered as assets, as an attacker will often wish to erase their tracks. + +Cryptography +============ + +timing attacks +most of the time our time is very precious and so we want to speed up things and do multiple operations at once +but with crypto the time may reveal information about the secret key +we can do clever tricks like making a fixed length loop +we have this drive for speed but optimization can also break the cryptographic properties + +valgrind or address sanitizer for checking for memory problems +it's really helpful to have tools that work on binary so you don't have to worry about getting into the whole compilation process +taint tracking + +processors themselves are unpredictable - 3 to 7 cycles for an add on an ARM cortex +micro-implementations for each platform and processor - multiplies surface area + + +formal methods - how to express properties/proofs succinctly? Coq etc. are a pain to use, most mathematicians don't use them +- specify program +- specify mathematical input-output relationship +- prove that you have that input-output relationship for that software + +nevertheless EverCrypt managed to make a formally verified crypto library - the code really has the maximum assurance of any code that we've seen for cryptography +it's written in C and Coq + +testing - find a bug, make tests that will catch that bug, add that to your regression test suite, and make sure it never happens again. Then you get e.g. test vectors which make sure no implementation has that bug. fuzzing - test a space rather than just hand-written cases. +but - in Falcon 2, all of the implementations that were released had the same test vectors and the same leaking bug. "this shows that the traditional development methodology (even when being super careful) has failed" +with even the most advanced fuzzing you're not testing all of the possible inputs - millions of security holes consist of the attacker finding some input which nobody thought of testing for. it's just some obscure kind of input where the attacker says, "haha - if I input exactly that length after setting off the following condition, then the following weird thing is going to happen and I can take advantage of it". + +so how do you deal with this? +- it's very low probability, so known answer tests are not going to find this +- proving correctness is tedious +- symbolic testing + +you build up a computation DAG showing how the arithmetic from these inputs gives you some output through a series of computations and then you analyze this graph and say "yes, this works correctly". + +angr - works on binaries, starts from valgrind.
it has this cool GUI called angr management +Manticore supposedly can do the same thing from binary and comes with a lot of the same kinds of analyses + +but if the SMT solvers aren't smart enough to see that the resulting code works, then you have to build some new tools + +compared to all the proving tools, doing symbolic testing/symbolic execution is actually fun + diff --git a/docs/Commentary/Sets.rst b/docs/Commentary/Language/Sets.rst similarity index 80% rename from docs/Commentary/Sets.rst rename to docs/Commentary/Language/Sets.rst index b4982ee..b023045 100644 --- a/docs/Commentary/Sets.rst +++ b/docs/Commentary/Language/Sets.rst @@ -6,6 +6,8 @@ Stroscot allows specifying properties about execution, which the compiler then a Sets ==== +Steelman 3C. "It shall be possible to define new data types in programs. A type may be defined as an enumeration, an array or record type, an indirect type, an existing type, or a subtype of an existing type. It shall be possible to process type definitions entirely during translation. An identifier may be associated with each type. No restriction shall be imposed on user defined types unless it is imposed on all types." + Sets in Stroscots are defined by a predicate ``a isElementOf S : Value -> Set -> Bool``, where ``Set = { s : Any | forall a : Any. (a isElemof s : Bool) }``. The predicate can describe any side effect free computation, so a set can describe any boolean function. As described in :ref:`paradoxes` Stroscot avoids the set-theoretic paradoxes by requiring every set definition to be well-founded. Sets don't contain null by default, you have to add it to the predicate as an allowed value. @@ -115,7 +117,11 @@ There are also fixed-point rational formats like ``x divided_by 2^5 : Scaled int Enumeration ----------- -An enumeration is a set of symbols but the order of the symbols is defined. +Steelman 3-2A. "There shall be types that are definable in programs by enumeration of their elements. The elements of an enumeration type may be identifiers or character literals. Each variable of an enumeration type may be restricted to a contiguous subsequence of the enumeration." + +Steelman 3-2B. "Equality, inequality, and the ordering operations shall be automatically defined between elements of each enumeration type. Sufficient additional operations shall be automatically defined so that the successor, predecessor, the position of any element, and the first and last element of the type may be computed." + +An enumeration is a set of symbols, but with various operations on the symbols automatically defined. :: @@ -128,6 +134,10 @@ It's a macro that defines the symbols, a comparison operator, conversion to/from ADTs ---- +3-3G. It shall be possible to define types with alternative record structures (i.e., variants). The structure of each variant shall be determinable during translation. + +3-3H. Each variant must have a nonassignable tag field (i.e., a component that can be used to discriminate among the variants during execution). It shall not be possible to alter a tag field without replacing the entire variant. + Abstract data types are sets containing trees of uninterpreted symbols. So a datatype declaration (from `here `__) :: @@ -149,16 +159,15 @@ is equivalent to :cite:`dolanPolymorphismSubtypingType2017` section 2.2 says the recursive appearance of ``Cxt Ty`` is interpreted using the least pre-fixed point and Bekić's theorem, but I think any fixed point will do.
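+For comparison, the Haskell analogue of the enumeration operations required by 3-2B (equality, ordering, successor, predecessor, position, first/last element) and of the tagged variants of 3-3G/H is a few lines of ``deriving``; this is only an illustration of the requirements, not the Stroscot design::
+
+   -- Enumeration: equality, ordering, successor/predecessor, position,
+   -- and first/last element all come from derived instances.
+   data Color = Red | Green | Blue
+     deriving (Eq, Ord, Enum, Bounded, Show)
+
+   -- succ Red == Green, pred Blue == Green, fromEnum Green == 1,
+   -- minBound == Red, maxBound == Blue
+
+   -- Variants: the constructor is the non-assignable tag, and pattern
+   -- matching discriminates between the alternatives at run time.
+   data Shape
+     = Circle Double        -- radius
+     | Rect Double Double   -- width, height
+     deriving (Show)
+
+   area :: Shape -> Double
+   area (Circle r) = pi * r * r
+   area (Rect w h) = w * h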
-Records -------- +Refinements +----------- -Record specifications can be closed, listing all properties, or open, allowing other associations not mentioned. For example ``{a: "s", b: 2} isElementOf OpenRec [("b",Int)]``. The fields can be ordered or unordered. Some fields can be omitted, i.e. they are optional. This is different from a ``Maybe`` type because ``{a: Nothing}`` is distinct from ``{}``. This can be accomplished by writing ``Rec [("b",Int,Optional)]``. +Steelman 3D. "The constraints that characterize subtypes shall include range, precision, scale, index ranges, and user defined constraints. The value of a subtype constraint for a variable may be specified when the variable is declared. The language should encourage such specifications. [Note that such specifications can aid the clarity, efficiency, maintainability, and provability of programs.]" -Rich Hickey seems to think values like ``("b",Int)`` are important and should have names like ``:b`` and ``:c``, so you can write ``{b: 2, c: 3} isElementOf Rec [:b,opt :c]``. I guess? It could be useful if you use the same fields a lot, but it doesn't save too much typing. The real optimization is defining collections of fields, so you can write ``Rec ([:b,:c] ++ standardfields)``. +:: -Clojure also has this thing about namespace qualification for records. XML had qualified names (namespaces), JSON doesn't. Everybody has switched from XML to JSON. So it seems like namespaces are overcomplicating the issue. Generally formats have a single domain of authority and don't need namespacing - the interpretation of a field is a property of the schema, not the value. This can be seen in the evolution of the ```` element from + RefinementType = { x : SomeType | someProperty x } -If you do have user-defined fields and need some simple thing to avoid key collisions you can use long key names like "org.mycompany.thing". This relies on the simple assumption that non-namespaced property names won’t have "." in them. But unlike a namespace mechanism this doesn't view unqualified names as incomplete qualified names, rather it sees qualified names as over-specialized names. "Over" is because you can't access it like ``obj.org.mycompany.thing``, you have to write ``obj["org.mycompany.thing"]``. Tracing ------- @@ -180,6 +189,8 @@ But basically this is an error message and error messages are hard. Annotations =========== +4B. It shall be possible to specify the type of any expression explicitly. The use of such specifications shall be required only where the type of the expression cannot be uniquely determined during translation from the context of its use (as might be the case with a literal). + Programmers can use annotations to say that a value is in a specific set. This restricts the possible range of values an identifier may take, allowing the compiler to optimize for a specific runtime representation. Set annotations are translated to assertions, and these assertions are statically checked, meaning values outside the set will give an assertion failure. @@ -193,6 +204,8 @@ Set annotations are translated to assertions, and these assertions are staticall Function annotations ==================== +7G. The type of each formal parameter must be explicitly specified in programs and shall be determinable during translation. Parameters may be of any type. The language shall not require user specification of subtype constraints for formal parameters. If such constraints are permitted they shall be interpreted as assertions and not as additional overloading. 
Corresponding formal and actual parameters must be of the same type. + The main function type declaration restricts the definition of the function so it is only applied on the type, i.e. without other definitions the function is not defined outside its type. You can define multiple restricted functions to obtain overloaded behavior on different types. The restriction shows up in documentation and IDE tooltips. :: diff --git a/docs/Commentary/State.rst b/docs/Commentary/Language/State.rst similarity index 95% rename from docs/Commentary/State.rst rename to docs/Commentary/Language/State.rst index 4d9ed9e..7cb3ccd 100644 --- a/docs/Commentary/State.rst +++ b/docs/Commentary/Language/State.rst @@ -8,16 +8,20 @@ Blocks One issue with this denotational semantics view is that most programs these days are expected to operate in an interactive manner, receiving input, producing output, and manipulating a persistent state. How do we handle this? Well, we extend the semantics to include more values. In particular, the output of the program now includes values representing procedures or blocks of statements. So for example we might have a simple "check password" block that inputs a string, exits with failure if it is not the string "42", and otherwise outputs the string "You got it" and exits with success. +Steelman 6B. There shall be a control mechanism for sequencing statements. + Following Haskell, what we need is do-notation. In Haskell, this is by default tied to monads, but with GHC's language extensions such as RebindableSyntax and QualifiedDo, there are really no restrictions on the types. The basic syntax is three productions:: simple statement {e} = e command {e;stmts} = e >> {stmts} operation {pat <- e; stmts} = e >>= \x -> case x of { pat -> {stmts}; _ -> fail "" } -There is the general principle that any statement may be replaced with a sequence of commands, operations, and statements. +Per :cite:`ichbiahRationaleDesignADA1979` there is the general principle that, in a block, any statement may be replaced with a sequence of commands, operations, and statements. But of course it does not make sense to end a block with an operation, as the assigned value cannot be used. In Haskell, ``e >> m = e >>= \_ -> m``, and the type is ``(>>) : IO a -> IO b -> IO b``. I actually consider this a mistake. First, ``IO a`` makes it too easy to ignore return values - it should at least be a fixed type like ``IO ()``. But even ``()`` has some use, e.g. to allow returning exceptions, so really ``()`` should be replaced with a specialized type like ``FixedReturnValue``. At that point it seems worth defining a separate type synonym. So we can really consider statements to be two disjoint categories, ``Command = IO FixedReturnValue`` and ``Operation a = IO (a \ FixedReturnValue)``. If we hide ``FixedReturnValue`` then we just get two opaque types ``Command`` and ``Operation a`` and a combination ``IO a = Command | Operation a``. At that point there is essentially no linkage and we can just take ``Command`` and ``Operation a`` as two distinct types to begin with. So really ``(>>)`` should not be hardcoded to its monadic definition, for example it is useful to take ``(>>)`` to be function composition or category composition. +Per :cite:`ichbiahRationaleDesignADA1979` one obvious distinction is that there is a no-op command, the empty block ``{}``, but this command is not an operation. I guess someone could define a ``nullBlock`` explicit command for it but that is something for the standard library. 
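+As a concrete Haskell illustration of the three productions, here is a small "check password" block written with do-notation and its desugaring; the Stroscot ``Command``/``Operation`` distinction is not modeled here, and the ``fail`` alternative only enters the picture for refutable patterns::
+
+   checkPassword :: IO Bool
+   checkPassword = do
+     line <- getLine               -- operation (pat <- e)
+     putStrLn "checking..."        -- command (e)
+     return (line == "42")         -- final simple statement
+
+   -- the same block after applying the productions
+   checkPassword' :: IO Bool
+   checkPassword' =
+     getLine >>= \line ->
+     putStrLn "checking..." >>
+     return (line == "42")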
+ There is also ``return``, which in Haskell is one of the monad operations. There is definitely a difference between writing ``f prompt = getLine prompt`` and ``f prompt = { return (getLine prompt) }``, and it is not clear how one would phrase the second version "return a block that reads a line with this prompt" any other way. Ensuring that adding or removing braces on a single statement is a no-op is a useful property; this property is clear based on the translation but may be confusing to some. So for example we would write ``f prompt = { getLine prompt }`` and ``f prompt = return (getLine prompt)``. The "last statement's value is returned" property is somewhat common in newer languages, the naked return less so. @@ -171,65 +175,11 @@ For ``mfix`` vs ``mdo``, there are many implicit laws promoted by the ``mdo`` no Unfortunately, the only known monad satisfying right shrinking is the lazy state monad ``s -> (a,s)`` (and its restrictions, the output monad ``Monoid w => (a,w)``, reader monad ``p -> a``, identity monad ``a``, and trivial monad ``Void``). Erkok has a proof 3.1.6 that the trivial monad ``const Void`` is the only monad satisfying left-strictness ``undefined >>= f = undefined``, right shrinking, strictness 2.1.1, and various other properties. The setup is the above right shrinking rule where ``f xs = return (1 : xs); g xs = case xs of [x] -> return x; _ -> return 1``. He says "It is easy to see that [the LHS mdo block] must yield bottom by the strictness property". Expanding on this, if we start with ``(z,w) = undefined``, then after one loop we get ``z=1:undefined`` and ``g z = undefined``, so the overall function returns ``undefined`` by the left-strictness property, so therefore by strictness 2.1.1 the overall ``mfix`` is also undefined. But of course, if we start with the fixed point of the RHS, ``(z,w) = (repeat 1,1)``, we get that back even in the LHS. So Erkok's proof relies on strictness and ``mfix`` producing the least fixed point. Using similar arguments about bottom, there is a proof that Haskell `cannot have a State monad `__. Really, the discussion should focus on the monad's behavior for defined values and total functions, and not discuss bottom or seq at all. I think it's best to suppose that the premises of the proof are incorrect, but the fact remains that the only known right-shrinkable monad is the lazy state monad. Absent other monads, it seems the lazy state monad is really the canonical implementation of ``MonadFix``, similar to how lists are the only non-trivial implementation of ``MonadZip`` and everything else is just lifting. -But Erkok had to write a thesis, so of course he can't just say "lazy state is MonadFix" and leave it there. Erkok proposes to leave out right shrinking and other properties to obtain a more general definition. The main issue with this is that ``mfix`` for the lazy state monad is no longer unique - there is a "shallow" mfix operations which simply apply the function to the bottom state. Erkok's definition of ``mfix`` for I/O is shallow in this sense. ``mfix`` cannot be implemented in general for ADT-style monads. (c.f. `GHC.Generics instances `__). 
These operations are kind of useful for - -data U1 p = U1 -- lifted version of () -data (:+:) f g p = L1 (f p) | R1 (g p) -- lifted version of Either -data (:*:) f g p = (f p) :*: (g p) -- lifted version of (,) -newtype K1 i c p = K1 { unK1 :: c } -- a container for a c -newtype M1 i t f p = M1 { unM1 :: f p } -- a wrapper -newtype Par1 p = Par1 { unPar1 :: p } -- gives access to parameter p -newtype Rec1 f p = Rec1 { unRec1 :: f p } -- a wrapper - -Consider f ⊥. If it is ⊥, mfix f must be ⊥ by strictness. If f ⊥ = Left l, then f must factor through Left by monotonicity, i.e., there must be a function h such that f = Left · h, or equivalently, h = unLeft · f. Then mfix f = mfix (Left · h) = Left $ mfix h = Left $ mfix (unLeft . f). Similarly Right r must factor as f = Right . h, h = unRight . f, and mfix f = Right $ mfix (unRight . f). So to summarize, we have: - - mfix f = case f ⊥ of - L1 _ -> L1 $ mfix (\x -> case f x of L1 y -> y) - R1 _ -> R1 $ mfix (\x -> case f x of R1 y -> y) - - (4.3) -Just - → return (fix (unJust · f )) -Note that we did not make any choices in constructing Equation 4.3; the behavior of -mfix is completely dictated by the properties that must be satisfied by all value recursion -operators. We leave it to the reader to show that Equations 4.2 and 4.3 are equivalent, -establishing uniqueness. +But Erkok had to write a thesis, so of course he can't just say "lazy state is MonadFix" and leave it there. Erkok proposes to leave out right shrinking and other properties to obtain a more general definition. The main issue with this is that ``mfix`` for the lazy state monad is no longer unique - there is a "shallow" mfix operations which simply apply the function to the bottom state. Erkok's definition of ``mfix`` for I/O is shallow in this sense. ``mfix`` cannot be implemented in general for ADT-style monads. (c.f. `GHC.Generics instances `__), specifically missing instances for ``U1 = ()``, ``:+: = Either``, ``Par1``, and ``K1``. And then, for continuations, Erkok has a type-theoretic argument for why no implement of mfix exists. For Codensity, there are `several potential implementations `__ of ``mfix``, based on I/O and state, but nobody has proven them correct - likely they only work for a subset of programs. +So really mfix is just a few operations (mfix state, fixIO, fix applied to a generic traversal of a strict product-like monad) that happened to be collected in an ad-hoc manner. The main issue with the "relaxed" mfix definition is that, because the instances satisfy barely any properties, the ``mdo`` notation is completely unintuitive - simply adding a non-recursive statement at the end can break the program. For this reason, ``mdo`` is pretty much deprecated and most recommendations are to use ``mfix`` directly or ``rec {}`` which is a lightweight macro for ``mfix`` with a variadic tuple argument. At that point, ``mfix`` is just an ordinary function, not really part of the monad syntax. -instance MonadFix Par1 where - mfix f = Par1 (fix (unPar1 . f)) - --- | @since 4.9.0.0 -instance MonadFix f => MonadFix (Rec1 f) where - mfix f = Rec1 (mfix (unRec1 . f)) - --- | @since 4.9.0.0 -instance MonadFix f => MonadFix (M1 i c f) where - mfix f = M1 (mfix (unM1. f)) - --- | @since 4.9.0.0 -instance (MonadFix f, MonadFix g) => MonadFix (f :*: g) where - mfix f = (mfix (fstP . f)) :*: (mfix (sndP . f)) - where - fstP (a :*: _) = a - sndP (_ :*: b) = b - - - - And then, for continuations, Erkok has a type-theoretic argument for why no implement of mfix exists. 
For codensity, there are `several potential implementations `__ of ``mfix``, based on I/O and state, but nobody has proven them correct. - -or just a few operations (mfix state, fixIO, fix applied to a generic traversal of a strict sum-like monad) that happened to be collected in an ad-hoc manner. The main issue with the "relaxed" mfix definition is that, because the instances satisfy barely any properties, the ``mdo`` notation is completely unintuitive - simply adding a non-recursive statement at the end can break the program. For this reason, ``mdo`` is pretty much deprecated and most recommendations are to use ``mfix`` directly or ``rec {}`` which is just a lightweight macro for ``mfix`` with a tuple argument. At that point, ``mfix`` is just an ordinary function, not really part of the monad syntax. - -``let`` allows recursive definitions, bare definitions are not recursive: - -:: - - x = x + 1 # defines a new x shadowing the old x - let x = (x : Int) * 2 # defines a fixed point with unique solution x=0 - - fact x = if x==0 then 0 else x * fact (x-1) # fails due to fact being an unbound symbol - let fact x = ... # proper definition - +But Erkok had to write a thesis, so of course he can't just say "lazy state is MonadFix" and leave it there. Erkok proposes to leave out right shrinking and other properties to obtain a more general definition. The main issue with this is that ``mfix`` for the lazy state monad is no longer unique - there is a "shallow" mfix operation which simply applies the function to the bottom state. Erkok's definition of ``mfix`` for I/O is shallow in this sense. ``mfix`` cannot be implemented in general for ADT-style monads. (c.f. `GHC.Generics instances `__), specifically the missing instances for ``U1 = ()``, ``:+: = Either``, and ``K1``. And then, for continuations, Erkok has a type-theoretic argument for why no implementation of mfix exists. For Codensity, there are `several potential implementations `__ of ``mfix``, based on I/O and state, but nobody has proven them correct - likely they only work for a subset of programs. +So really mfix is just a few operations (mfix state, fixIO, fix applied to a generic traversal of a strict product-like monad) that happened to be collected in an ad-hoc manner. The main issue with the "relaxed" mfix definition is that, because the instances satisfy barely any properties, the ``mdo`` notation is completely unintuitive - simply adding a non-recursive statement at the end can break the program. For this reason, ``mdo`` is pretty much deprecated and most recommendations are to use ``mfix`` directly or ``rec {}`` which is a lightweight macro for ``mfix`` with a variadic tuple argument. At that point, ``mfix`` is just an ordinary function, not really part of the monad syntax. + +So I am thinking to support recursive variable definitions specially (via the lazy state monad), and otherwise prohibit recursive definitions for more complex monads. Arrows ------ @@ -569,6 +519,8 @@ There is still a pure/impure dichotomy though. Regardless of syntax, impurity ca "Unsafe" I/O ============ +Steelman 4C. The language shall attempt to minimize side effects in expressions, but shall not prohibit all side effects. A side effect shall not be allowed if it would alter the value of a variable that can be accessed at the point of the expression. Side effects shall be limited to own variables of encapsulations. The language shall permit side effects that are necessary to instrument functions and to do storage management within functions. The order of side effects within an expression shall not be guaranteed. [Note that the latter implies that any program that depends on the order of side effects is erroneous.] + Haskell has ``runST`` and ``unsafePerformIO`` that allow turning impure computation into pure computations. These can be implemented by throwing a resumable exception that's caught in a top-level handler that does the I/O. ``runST`` scrutinizes its computation for impure behavior such as printing or returning allocated references, while ``unsafePerformIO`` does not and exposes the internal evaluation order. If one wants to understand the evaluation order or is dealing with commutative operations, these functions are quite useful, e.g. Debug.Trace.trace looks like a non-I/O function but actually outputs something on the console, and allocation can be done in any order. diff --git a/docs/Commentary/TermRewriting.rst b/docs/Commentary/Language/TermRewriting.rst similarity index 90% rename from docs/Commentary/TermRewriting.rst rename to docs/Commentary/Language/TermRewriting.rst index a932382..931089e 100644 --- a/docs/Commentary/TermRewriting.rst +++ b/docs/Commentary/Language/TermRewriting.rst @@ -8,7 +8,7 @@ The definition is vaguely based on :cite:`vanoostromConfluenceAbstractHigherOrde There is some question of whether context substitution is capture-avoiding, i.e. does ``(\x. □)[□ := x]`` not resolve to ``(\x. x)``. Terese says it captures. With Oostrom this substitution is forbidden since ``x`` is not a closed term. In our more liberal definition this resolves to ``(\y. x)`` by alpha-renaming the variable.
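+To make the liberal (capture-avoiding) reading concrete, here is a toy Haskell model of substitution on lambda terms that alpha-renames a binder when it would capture a free variable of the substituted term; the hole is modeled as an ordinary free variable named ``"hole"``, and none of this is the actual substitution-calculus machinery::
+
+   import Data.List (delete, nub)
+
+   data Term = Var String | App Term Term | Lam String Term
+     deriving (Eq, Show)
+
+   freeVars :: Term -> [String]
+   freeVars (Var x)   = [x]
+   freeVars (App a b) = nub (freeVars a ++ freeVars b)
+   freeVars (Lam x b) = delete x (freeVars b)
+
+   -- subst x s t computes t[x := s], renaming binders that would capture
+   subst :: String -> Term -> Term -> Term
+   subst x s (Var y) | x == y    = s
+                     | otherwise = Var y
+   subst x s (App a b)           = App (subst x s a) (subst x s b)
+   subst x s (Lam y b)
+     | y == x              = Lam y b            -- x is shadowed, stop
+     | y `elem` freeVars s = let y' = fresh y (freeVars s ++ freeVars b)
+                             in Lam y' (subst x s (subst y (Var y') b))
+     | otherwise           = Lam y (subst x s b)
+
+   fresh :: String -> [String] -> String
+   fresh y avoid = head [ c | i <- [1 :: Int ..], let c = y ++ show i, c `notElem` avoid ]
+
+   -- subst "hole" (Var "x") (Lam "x" (Var "hole"))  ==>  Lam "x1" (Var "x")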
-An example substitution calculus is the lambda calculus. The set of preterms is built from nullary symbols (variables, holes, symbols, and constants), applications of two preterms, and abstractions of a variable and a preterm. The substitution calculus rewriting rules are beta reduction and alpha renaming. Eta reduction can be added but makes the system only weakly orthogonal. :cite:`endrullisHighlightsInfinitaryRewriting2012` +An example substitution calculus is the lambda calculus. The set of preterms is built from nullary symbols (variables, holes, symbols, and constants), applications of two preterms, and abstractions of a variable and a preterm. The substitution calculus rewriting rules are beta reduction and alpha renaming. Eta reduction can be added too and makes the system only weakly orthogonal. :cite:`endrullisHighlightsInfinitaryRewriting2012` Currying ======== @@ -26,9 +26,9 @@ The PP system is an extension of the uncurried system, and is isomorphic to the Conditional rewriting ===================== -In a general CTRS the reduction rules are of the form ``t | P -> s``, where ``P`` is a predicate. The allowed rewrites of a conditional rule are the unconditional rewrites ``t -> s`` that satisfy the condition ``P``. We can add a form of logic programming by allowing conditions to use variables not in the LHS, e.g. ``precedes x z | exists y. precedes x y && precedes y z = true``. This further extends to allowing variables on the RHS not present on the LHS. A further extension allows LHSs that are a single variable but have a condition - some authors disallows variables LHSs. Some syntax like ``l | exists x. C = r`` and ``l = exists x. r`` might make this easier to follow. +In a general CTRS the reduction rules are of the form ``t | P -> s``, where ``P`` is a predicate. The allowed rewrites of a conditional rule are the unconditional rewrites ``t -> s`` that satisfy the condition ``P``. We can add a form of logic programming by allowing conditions to use variables not in the LHS, e.g. ``precedes x z | exists y. precedes x y && precedes y z = true``. This further extends to allowing variables on the RHS not present on the LHS. A further extension allows LHSs that are a single variable but have a condition - some authors disallow LHSs like this. Some syntax like ``l | exists x. C = r`` and ``l = exists x. r`` might make this easier to follow. -The definition of a CTRS is complicated by allowing predicates to refer to the rewrite relation ``->``. Ideally the rewrite relation would be defined as a fixed point of the rewrite rules. I.e. letting ``S`` be the system as a function of the rewrite relation ``->``, we would define ``->`` to be a relation ``R`` such that ``R = S(R)``. Terese presents the least fixed point. However, with badly-behaved conditions, no fixed point may exist, so instead we use the "optimal prefixedpoint", the intersection of the maximal prefixedpoints. I.e. we find the sets ``Pre = { R : R subseteq S(R) }, PreMax = { R in Pre : forall R' in Pre, R subseteq R' implies R= R' }, R = intersection PreMax``. We warn on all the reductions in ``S(R) \ R`` that make it not a fixed point. +The definition of a CTRS is complicated by allowing predicates to refer to the rewrite relation ``->``. Naively, the rewrite relation would be defined as a fixed point of the rewrite rules. I.e. letting ``S`` be the system as a function of the rewrite relation ``->``, we would define ``->`` to be a relation ``R`` such that ``R = S(R)``. Terese presents the least fixed point. 
However, with badly-behaved conditions, no fixed point may exist, so instead we use the "optimal prefixedpoint", the intersection of the maximal prefixedpoints. I.e. we find the sets ``Pre = { R : R subseteq S(R) }, PreMax = { R in Pre : forall R' in Pre, R subseteq R' implies R= R' }, R = intersection PreMax``. We warn on all the reductions in ``S(R) \ R`` that make it not a fixed point. For example, with a system with the single rule ``a -> a | a is a normal form``, ``S({}) = {(a,a)}`` and ``S({a,a}) = {}``. There is no fixed point, so the naive definition doesn't work. The optimal prefixedpoint is ``{}`` so ``R = {}``. But we warn that the reduction ``a -> a`` is not included. As another example, take ``a -> b if not(c -> d); c -> d if not(a -> b)``. The maximal (pre)fixedpoints are ``{(a,b)}`` and ``{(c,d)}``, but their intersection is ``{}`` so the optimal prefixedpoint is ``R = {}``. We warn that ``a -> b`` and ``c -> d`` are not included (extra diagnostics could show they are mutually exclusive). Lastly, ``a -> b if a -> b`` allows the reduction ``a -> b`` as it is the maximal (pre)fixedpoint. @@ -39,13 +39,13 @@ As far as terminology, the literature uses "term rewriting" to refer to uncondit Cycles ====== -The untyped lambda calculus has cycles, e.g. ``Omega = let w=\x.x x in w w`` reduces to itself and :cite:`venturinizilliReductionGraphsLambda1984` shows a 6-cycle ``M M I``. Similarly commutativity ``a + b = b + a`` generates cycles. +The untyped lambda calculus has cycles, e.g. ``Omega = let w=\x.x x in w w`` reduces to itself and :cite:`venturinizilliReductionGraphsLambda1984` shows a 6-cycle ``M M I``. Similarly commutativity ``a + b = b + a`` generates cycles. Maude in fact has associative and commutative operators, declared with special syntax like ``op _;_ [assoc]``. But this is a bit specific; likely there are other equations we would want to have, like the Jacobi identity :math:`x\times (y\times z)\ =\ (x\times y)\times z\ +\ y\times (x\times z)` and so on. In general, we have a list of rewrite rules and it is not necessarily known a priori whether these may result in cycles, but we want to resolve any cycles that come up in a natural manner. -:cite:`dershowitzRewriteSystems1991`'s notion of a congruence-class rewriting system is helpful - the rewrite rules are split into rules R and reversible equations S, and we consider the system R/S (R mod S). A term where only S rules apply (no R rules apply) is considered a normal form. So similarly we consider the `condensation `__ of the rewrite graph, condensing each SCC to a single term. This does away with the "rewriting modulo equations" formalism while still maintaining its power. +In a lot of cases, the cycle may be due to reversible equations ``pat1 = pat2; pat2 = pat1``.:cite:`dershowitzRewriteSystems1991`'s notion of a congruence-class rewriting system is formulated along these lines - the rewrite rules are split into rules R and reversible equations S, and we consider the system R/S (R mod S). A term where only S rules apply (no R rules apply) is considered a normal form. But this "rewriting modulo equations" formalism is not sufficient - we see with Omega that even beta reduction, a single rule, and one not obviously intended to be cyclic, can result in cycles. -A term is a "condensed normal form" if it has no reduction out of its SCC. Hence ``Omega``, ``M M I`` and ``a + b`` would be condensed normal forms since their SCC contains themselves and they have no further reductions. 
We could further specify the normal form to be a canonical representative of the SCC, e.g. taking the smallest and lexicographically first element of the SCC, but leaving input unchanged seems better. +So instead we consider the `condensation `__ of the rewrite graph, condensing each SCC to a single term. This condensation step is similar to the quotient R/S but acts individually on each term rather than on the system. A term is a "condensed normal form" if it has no reduction out of its SCC. Hence ``Omega``, ``M M I`` and ``a + b`` would be condensed normal forms since their SCC contains themselves and they have no further reductions. We could further specify the normal form to be a canonical representative of the SCC, e.g. taking the smallest and lexicographically first element of the SCC, but leaving input unchanged and returning the first element of the SCC that is encountered seems better for debugging purposes. -Orthogonal higher-order TRSs that are weakly head normalizing are acyclic, per :cite:`ketemaViciousCirclesRewriting2005`, so the cycle condensation doesn't affect standard functional programming because condensing acyclic rewriting systems gives back the same system. So the cycle detector doesn't have to be that great, even supporting associativity/commutativity is going into PhD territory. +Orthogonal higher-order TRSs that are weakly head normalizing are acyclic, per :cite:`ketemaViciousCirclesRewriting2005`, so the cycle condensation doesn't affect standard functional programming - condensing acyclic rewriting systems gives back the same system. Thus cycle detection shouldn't have much of an impact on performance. Nondeterminism ============== @@ -80,9 +80,9 @@ Infinite reduction is useful because it is "smoother" than finite reduction - no The idea is to extend our set of terms to include infinite terms, defined as the `metric completion `__ of finite terms with a distance function :math:`2^{-n}` if the n-th level of the terms is the first level where a difference appears and 0 if the terms are equal. By convention the top level is level zero. This definition is equivalent to a co-inductive definition of terms, i.e. the largest set consisting of term-parts whose subterms are co-inductive terms. -This set, like the real numbers, is uncountably large and includes terms with no finite description. Actually we should be able to map the set of finite and infinite terms to the real interval [0,1]: do a breadth-first traversal of the term tree and encode the sequence of symbols using arithmetic encoding. Properly defined, the mapping should be a computable, continuous bijection that imposes a total order on terms. TODO: verify. There is then a canonical representation of each real number, such as `this continued fraction representation `__, so also a canonical representation of each infinite term. Although, again like the reals, it is computationally more tractable to use an unevaluated-expression-tree symbolic representation and than an infinite sequence of numbers. +This set, like the real numbers, is uncountably large and includes terms with no finite description. Actually we should be able to map the set of finite and infinite terms to the real interval [0,1]: do a breadth-first traversal of the term tree and encode the sequence of symbols using arithmetic encoding. Properly defined, the mapping should be a computable, continuous bijection that imposes a total order on terms. TODO: verify. 
There is then a canonical representation of each real number, such as `this continued fraction representation `__, so also a canonical representation of each infinite term. Although, again like the reals, it is computationally more tractable to use an unevaluated-expression-tree symbolic representation and special-case interesting classes of terms, than it is to use a generic representation as an infinite sequence. -There are various extensions of the transitive closure to infinitary reduction, so the question arises as to which one to use. :cite:`kahrsInfinitaryRewritingClosure2013` discusses several and provides an ordering so that each is mostly a proper subset of the next (not sure about P* subset bi-infinite). Many of these use the monotonic closure X*, which is the least fixedpoint of the function G defined as G(R) = X(R) union R, which by the (transfinite) Kleene fixed-point theorem exists and is the limit/union of the sequence :math:`X^0 = \emptyset, X^{n+1} = G(X^n), X^\delta = \bigcup_{\alpha < \delta} X^\alpha`. +There are various extensions of the transitive closure to infinitary reduction, so the question arises as to which one to use. :cite:`kahrsInfinitaryRewritingClosure2013` discusses several and provides an ordering so that each is mostly a proper subset of the next (not sure about P* subset bi-infinite). Many of these use the monotonic closure operator \*. Specifically X* is the least fixedpoint of the function G defined as G(R) = X(R) union R, which by the (transfinite) Kleene fixed-point theorem exists and is the limit/union of the sequence :math:`X^0 = \emptyset, X^{n+1} = G(X^n), X^\delta = \bigcup_{\alpha < \delta} X^\alpha`. * S*, the monotonic closure of strongly converging reduction sequences, "strong" being a requirement that the depth of the redexes contracted in successive steps tends to infinity. S=S* for "compatible" TRSs, ones where t R u imply C[t] R C[u] for any context C, which all iTRSs satisfy. * W*=A=A*, the monotonic closure of weakly converging reduction sequences, and also the `adherent points `__ of reduction sequences in the metric space. Weak convergence by itself is not transitively closed, e.g. ``a = b; f x a = f (g x) a`` has ``f c a -ω> f (g (g (g ...))) a -> f (g (g (g ...))) b`` :cite:`dershowitzRwriteRewriteRewrite1991` :cite:`simonsenWeakConvergenceUniform2010`, hence the need for closure. By definition of adherent point, each w-reduct is either an accumulation point, i.e. a appears arbitrarily close infinitely often in a reduction sequence, or an isolated point which can be reached in a finite number of reductions. @@ -90,27 +90,13 @@ There are various extensions of the transitive closure to infinitary reduction, * bi-infinite rewriting, defined in :cite:`endrullisCoinductiveFoundationsInfinitary2018` Section 6.2 as the greatest relation R such that R = the reflexive transitive closure of single-step rewriting union R lifted to apply to subterms. * T*: the monotonic closure of T, the topological closure of the reflexive transitive closure. T itself is not transitively closed, e.g. ``leq 0 x = true; leq (s x) (s y) = leq x y; inf = s inf`` has ``leq inf inf T leq (mu x. s x) (mu y. s y) T true`` (by topological closure of finite approximations of the S towers) but not ``leq inf inf T true`` (because the terms are of finite depth). Alternatively I have defined T* as the smallest relation M such that M is reflexively, transitively, and topologically closed and contains the single-step relation, which I think is equivalent. 
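To make the monotonic closure operator above a little more tangible, here is a hedged finite-state sketch in Haskell (the transfinite stages obviously do not apply): iterating ``G(R) = X(R) union R`` to its least fixed point, with X taken to be one step of relational composition, which yields the ordinary transitive closure. The names are illustrative only::

    import qualified Data.Set as Set
    import Data.Set (Set)

    type Rel a = Set (a, a)

    -- relational composition, used here as the operator X
    compose :: Ord a => Rel a -> Rel a -> Rel a
    compose r s = Set.fromList
      [ (x, z) | (x, y) <- Set.toList r, (y', z) <- Set.toList s, y == y' ]

    -- least fixed point of G(R) = step R `union` R; terminates on a finite carrier
    -- because G is monotone and each stage only adds pairs
    monotonicClosure :: Ord a => (Rel a -> Rel a) -> Rel a -> Rel a
    monotonicClosure step = go
      where
        go r | r' == r   = r
             | otherwise = go r'
          where r' = Set.union (step r) r

    transitiveClosure :: Ord a => Rel a -> Rel a
    transitiveClosure r = monotonicClosure (`compose` r) r

    -- transitiveClosure (Set.fromList [(1,2),(2,3),(3,4)])
    --   ==> fromList [(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]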
-S* is the standard in the literature but doesn't have much going for it besides that. If there is a reduction that switches heads, ``a X = b (c X); b X = a (c X)``, then S* says there are no w-reductions. W* has ``a e -w> a (mu x. c x)`` and ``a e -w> b (mu x. c x)``. TRSs are in general nondeterministic, so the "strong" definition that requires a single limit to exist is too strong. +S* is the standard in the literature but doesn't have much going for it besides that. If there is a reduction that switches heads, ``a X = b (c X); b X = a (c X)``, then S* says there are no w-reductions. W* has ``a e -w> a (mu x. c x)`` and ``a e -w> b (mu x. c x)``. TRSs are in general nondeterministic, so the "strongly converging" definition that requires a single limit to exist is too strong. For cycle condensation we would like to equate as many terms as possible to get large SCCs, and similarly a large reduction relation means there will be an escape from infinite regresses. As an example, with bi-infinite rewriting or T*, the hypercollapsing term ``mu x. C x`` with rule ``C x = x`` will reduce to every term (limit of approximations ``C^n f = f``), making it ambiguous, while with W* and P* the hypercollapsing term only reduces to itself hence is a condensed normal form. Similarly with ``C A = A`` where ``A`` is a constant, ``mu x. C x = A`` with bi-infinite/T* but W*/P* don't reduce at all. Bi-infinite and T* seem equally simple to formalize since they are both single fixed points, so it seems T* wins because it's larger. -Also conditional rewriting can interact with infinite reduction and cause unwanted behavior with a weak closure. For example consider the system ``ds x y | x == y = e`` and reducing the infinite term ``G = ds G G`` (in :cite:`klopCombinatoryReductionSystems1980` this is achieved by the system ``G = a = c a; c x = ds x (c x)``). Since ``e`` is a normal form hence equal to itself, all finite terms defined by ``T = { x : x == e or x in ds T T }`` reduce to ``e``. So using a bi-infinite closure, ``G`` uniquely reduces to ``e``. But with a weak closure ``X = ds e X`` is a normal form and the system becomes nondeterministic. Similarly with ``dk x y | x == y = e x`` and ``G = dk G G``, we should get ``e (e (e ...))`` as the unique result, but with a weak closure we don't. Another tricky system is ``c x | x == c x = e; b = c b`` - the obvious reduction is ``b = mu x. c x = e``, but this system has a hidden circularity of the form ``mu x. c x = e`` if ``mu x. c x = e``. +Also conditional rewriting can interact with infinite reduction and cause unwanted behavior with a weak closure. For example consider the system ``ds x y | x == y = e`` and reducing the infinite term ``G = ds G G`` (in :cite:`klopCombinatoryReductionSystems1980` this is achieved by the system ``G = a = c a; c x = ds x (c x)``). Since ``e`` is a normal form hence equal to itself, all finite terms defined by ``T = { x : x == e or x in ds T T }`` reduce to ``e``. So using a bi-infinite closure, ``G`` uniquely reduces to ``e``. But with a weak closure ``X = ds e X`` is a normal form and the system becomes nondeterministic. Similarly with ``dk x y | x == y = e x`` and ``G = dk G G``, we should get ``e (e (e ...))`` as the unique result, but with a weak closure we don't. Another tricky system is ``c x | x == c x = e; b = c b`` - the obvious reduction is ``b = mu x. c x = e``, but this system has a hidden circularity of the form ``mu x. c x = e`` if ``mu x. c x = e``. So again based on this we would like a bi-infinite or T* closure. 
-The common notions of an ARS carry over to infinitary reductions: :cite:`endrullisInfinitaryTermRewriting2014` - -* transitive reduction: irreflexive kernel of reduction closure -* normal form: irreducible term -* strongly normalizing (terminating): every infinite reduction sequence has a limit -* nonterminating reduction: infinite reduction sequence with no limit or that does not reduce to its limit -* weakly normalizing (normalizing): every term has a reduction to a normal form -* confluence: if t reduces to t1 and t2, then there is a common term s such that t1 and t2 reduce to s. -* Church-Rosser: if t1 is equivalent to t2, then there is a common term s such that t1 and t2 reduce to s. -* normal form property w.r.t. reduction:: if u reduces to t and s, and s is a normal form, then t reduces to s -* normal form property: if t is equivalent to s and s is a normal form, then t reduces to s -* unique normalization w.r.t. reduction: if t reduces to t1 and t2, and t1, t2 are normal forms, then t1=t2 -* unique normalization: if t1 is equivalent to t2, and t1, t2 are normal forms, then t1=t2 - -However common theorems such as Newman's lemma do not, so it is not clear how useful these are. +Overall I think T* is the simplest and easiest to understand, so that's what I picked. Meaningless terms ================= diff --git a/docs/Commentary/Types.rst b/docs/Commentary/Language/Types.rst similarity index 93% rename from docs/Commentary/Types.rst rename to docs/Commentary/Language/Types.rst index d436ac1..79e2773 100644 --- a/docs/Commentary/Types.rst +++ b/docs/Commentary/Language/Types.rst @@ -18,13 +18,19 @@ Are sets types? Academics seem to have gravitated to the syntactic definition because it's easier to publish meaningless gibberish. Similarly Haskell, Java, and C++ use a syntactic definition AFAICT. I don't think there are many languages that use representation as this would imply that all pairs of floats are the same type. The set of modes definition (Parnas et al.'s own) does not seem to have been adopted. -Stroscot follows Julia in using the "value space without behavior" definition. (At least, Julia's documentation `says `__ it follows the "set of values" definition) Castagna calls them "set-theoretic types with semantic subtyping". A related approach is :cite:`dolanAlgebraicSubtyping2016`, which uses syntactic labels forming a distributive lattice. Although distributive lattices are isomorphic to collections of sets, Dolan's approach IMO falls on the syntactic side. +Steelman 2B I think adopts the "value space and behavior" definition. Stroscot follows Julia in using the "value space without behavior" definition. (At least, Julia's documentation `says `__ it follows the "set of values" definition) This is because the definition of types as having "operations" breaks down once you have operations that take multiple types, like a ```concatenateNCopies : Int -> String -> String`` operation. Is this a String operation? an Int operation? + +Then there are subtypes, which Steelman 2B defines as "a subset of the elements of a type, that is characterized by further constraints." Stroscot follows Castagna's approach in having "set-theoretic types with semantic subtyping" - you can define any subset of a type (set) and that is also a type (set). A related approach is :cite:`dolanAlgebraicSubtyping2016`, which uses syntactic labels forming a distributive lattice. Although distributive lattices are isomorphic to collections of sets, Dolan's approach IMO falls on the syntactic side. 
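As a purely illustrative sketch of the "types are sets of values" reading (Haskell, with membership predicates standing in for sets; not how Stroscot implements it), any subset of a type is again a type, and a subtype claim is just set inclusion::

    -- a "type" modelled as a set of values, represented by its membership predicate
    type Ty a = a -> Bool

    int8Range :: Ty Integer
    int8Range n = -128 <= n && n <= 127

    evens :: Ty Integer
    evens = even

    -- any subset of a type is again a type: here, an intersection of two sets
    intersect :: Ty a -> Ty a -> Ty a
    intersect s t x = s x && t x

    -- membership is the only primitive judgement: 1 : {1}, 1 : Int8, 1 : Any
    singletonOne, anyValue :: Ty Integer
    singletonOne = (== 1)
    anyValue     = const True

    -- on a finite sample of the domain we can even test a semantic subtype claim
    subtypeOn :: [a] -> Ty a -> Ty a -> Bool
    subtypeOn sample s t = all (\x -> not (s x) || t x) sample

    -- subtypeOn [-200..200] (intersect int8Range evens) int8Range  ==>  True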
+ +Regarding representation, which Steelman simply defines as "implementation characteristics", there is a specification of how the memory layout of values works on the Memory Management page. Overall, while I would be justified in calling Stroscot's sets types, as it is one of the listed definitions, it's not perfectly consistent with the common usage today so it could invite a flamewar. It's easier to call them sets. Is Stroscot typed? ================== +Steelman 3A: "The language shall be strongly typed. The type of each variable, array and record component, expression, function, and parameter shall be determinable during translation." + By a "typed" programming language, someone generally means that the language has a type system or type discipline that ensures type safety. Wikipedia defines a type system as "a set of rules that assigns a property called a type to the various constructs of a computer program". This phrasing assumes each construct has exactly one "principal" type. But more complex type systems don't have this property. For example with a Haskell GADT ``data R where R : { unR : Int } -> R Int``, ``unR`` may have the type ``forall a. R a -> a`` or ``forall a. R a -> Int``, and these have no unifying type. Haskell just picks a type ``unR : R Int -> Int`` in this case. Really the mapping from expressions to types is many-to-many. Particularly, with Stroscot's definition of types as sets, any set containing the value is a valid type. For example, ``1 : Int``, but also ``1 : {1}`` and ``1 : Any``. So certainly Stroscot has some ambient notion of "the valid types for a value", but in Stroscot's case this is just set membership so it is a little vague in determining whether Stroscot has a type system. ChatGPT lists some features of type systems we can look for: type annotations, type errors, type compatibility, type conversion, type hierarchy, type safety. The purpose of type annotations is clear if we consult `Robert Harper `__. Per him, a type system should allow "stating and enforcing the invariant that the value at a particular program point must be a certain type", in the form of a type declaration. Stroscot's set membership assertions have a specific syntax for type declarations and type restrictions, that accomplish exactly this purpose. There are also annotations for declaring the type of a value, for example ``int8 1`` is different from ``int32 1``. @@ -55,6 +61,8 @@ So is Stroscot static? Yes - it supports all the static features, and more. The Is Stroscot dynamic? ==================== +Steelman 7D. The type of the result for each function must be specified in its declaration and shall be determinable during translation. The results of functions may be of any type. If a result is of a nonindirect array or record type then the number of its components must be determinable by the time of function call. + Dynamic programming languages allow flexible and adaptable coding practices. But dynamic languages vary widely in their features. We can identify some common features using ChatGPT: * Introspection: A mechanism to determine the type of some arbitrary data and retrieve its attributes or representation at runtime. This includes normal values, modules, objects, and functions. @@ -65,7 +73,7 @@ Dynamic programming languages allow flexible and adaptable coding practices. But * Metaprogramming, reflection and runtime modification: Writing macros (code that manipulates other code). Executing code with eval. Viewing, adding, and modifying methods and properties. 
Monkey patching. Module loading at runtime. * Good: As a rule of thumb, dynamic languages are good, all other things being equal. -Stroscot has these features, so can be considered dynamic. +Stroscot has these features, so can be considered dynamic. The "determinable type" of every function is something like ``Any -> Any``. Type specifications are separate from the actual "type" as specified by the functions dynamic behavior. Is Stroscot strong? =================== diff --git a/docs/Commentary/Values.rst b/docs/Commentary/Language/Values.rst similarity index 89% rename from docs/Commentary/Values.rst rename to docs/Commentary/Language/Values.rst index bcb4d99..8b74a5c 100644 --- a/docs/Commentary/Values.rst +++ b/docs/Commentary/Language/Values.rst @@ -18,7 +18,7 @@ Values are immutable (as `Rich Hickey says `__ show that no algorithm can evaluate a nontrivial property with perfect accuracy; there is an upper limit to accuracy (below 1) that no algorithm can surpass consistently. But these results do not prevent evaluating a nontrivial property with possible outputs Yes/No/IDK to a reasonable level of accuracy on practical programs, which have structure not found in random programs. Still, the verifier is developed on a "commercially reasonable effort" basis, where it is a bug in the compiler if the analysis returns IDK on a program, but the bug will only be fixed given sufficient funding and evidence of the usefulness of the program (e.g. that it works in other languages). So verification will likely be an area of development forever. There is no panacea, but we can provide options for the programmer if the analysis fails: +Rice's theorem and `halting problem approximation results `__ show that no algorithm can evaluate a nontrivial property with perfect accuracy; there is an upper limit to accuracy (below 1) that no algorithm can surpass consistently. But these results do not prevent evaluating a nontrivial property with possible outputs Yes/No/IDK to a reasonable level of accuracy on practical programs, which have structure not found in random programs. Still, the verifier is developed on a "commercially reasonable effort" basis, where it is a bug in the compiler if the analysis returns IDK on a program, but the bug will only be fixed given sufficient funding and evidence of the usefulness of the program (e.g. that it works in other languages). So verification will likely be an area of development forever. There is no panacea, but we can provide options for the programmer if the analysis fails (returns IDK): * reject: reject programs for which the algorithm fails as wrong (like ``-Werror``) * defer: insert runtime checks if needed and throw an error when the property fails (like Haskell's ``-fdefer-type-errors``) -* override: behave as if the algorithm gave answer A (could crash at runtime, but allows bypassing bugs). More complex is to make the verifier scriptable, so that tactics can be specified and the verifier does not have to search through exponentially large spaces. +* override: behave as if the algorithm gave answer A (could crash at runtime, but allows bypassing bugs). +* scripting: allow specifying tactics and proof sketches so that the verifier has hints for how to search through the state space. For definite bugs the options are similar, although the override option implies a somewhat foolish promise that "poison" inputs will be avoided, in exchange for a minimal speedup and the loss of safety. 
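To sketch what the "defer" option above could mean operationally (hypothetical names, not a committed design): a property the verifier returned IDK on is compiled into a runtime guard that raises an error at the use site instead of rejecting the program::

    import Control.Exception (Exception, throwIO)

    -- hypothetical error raised when a deferred (unverified) property fails at runtime
    newtype DeferredCheckFailed = DeferredCheckFailed String
      deriving Show
    instance Exception DeferredCheckFailed

    -- keep running, but guard the value with the property the analysis could not
    -- discharge, failing loudly at runtime instead of at compile time
    defer :: String -> (a -> Bool) -> a -> IO a
    defer label prop x
      | prop x    = pure x
      | otherwise = throwIO (DeferredCheckFailed label)

    main :: IO ()
    main = do
      -- stand-in for a value whose nonzero-ness the analysis could not prove
      d <- defer "divisor /= 0" (/= 0) (0 :: Int)
      print (10 `div` d)   -- never reached here; DeferredCheckFailed is thrown first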
@@ -53,7 +58,7 @@ The seL4 microkernel (8700 lines of C) has been successfully statically modeled Implementation ============== -The implementation in Stroscot will be "from scratch" - custom SAT/SMT solver, custom state space explorer, etc. The main reasons are to avoid the overhead present in existing tools of translating to/from the constraint specification languages such as SMTLIB or specialized XML formats, and to allow Stroscot to be self-hosted. But the implementation will use the techniques from various existing implementations. +The implementation in Stroscot will be "from scratch" - custom SAT/SMT solver, custom state space explorer, etc. The main reasons are to avoid the overhead present in existing tools of translating to/from the constraint specification languages such as SMTLIB or specialized XML formats, and to allow Stroscot to be self-hosted. But the implementation will use the techniques from various existing implementations such as Z3. The field of verification is quite large but is centralized around various subjects/competitions: @@ -205,7 +210,16 @@ Exceptions The main reachability analysis figures out which exceptions a piece of code may throw. Top-level unhandled exceptions are reported as warnings. -Assertions have a simple form ``assert expr`` that throws ``AssertionFailed``, equivalent to ``when expr (throw AssertionFailed)``. Java's complex form ``assert expr : exception`` that throws a specific ``exception`` on failure seems pointless - it's only a little less verbose than ``when expr (throw exception)``. +Assertions +~~~~~~~~~~ + +10F. It shall be possible to include assertions in programs. If an assertion is false when encountered during execution, it shall raise an exception. [Note that assertions can be used to aid optimization and maintenance.] + +Assertions have a simple form ``assert expr`` that throws ``AssertionFailed``, equivalent to ``when expr (throw AssertionFailed)``. Java's complex form ``assert expr : exception`` that throws a specific ``exception`` on failure seems pointless - it's only a little less verbose than ``when expr (throw exception)``. Could be worth it though, throw it to the standard library to decide. + +!0F. It shall also be possible to include assertions, such as the expected frequency for selection of a conditional path, that cannot be verified. + +This part we can ignore, Stroscot verifies everything. The example given is more like a pragma for optimization. Dead code ~~~~~~~~~ diff --git a/docs/Commentary/Language/index.rst b/docs/Commentary/Language/index.rst new file mode 100644 index 0000000..824c223 --- /dev/null +++ b/docs/Commentary/Language/index.rst @@ -0,0 +1,10 @@ +Language specification +###################### + +This section is focused on questions related to the specification of the language - why a particular concept was defined, why a feature is defined that way, and whether a concept is necessary for the core language. + +.. 
toctree:: + :maxdepth: 2 + :glob: + + * \ No newline at end of file diff --git a/docs/Commentary/BuildSystem.rst b/docs/Commentary/Libraries/BuildSystem.rst similarity index 100% rename from docs/Commentary/BuildSystem.rst rename to docs/Commentary/Libraries/BuildSystem.rst diff --git a/docs/Commentary/Compiler-Library.rst b/docs/Commentary/Libraries/Compiler-Library.rst similarity index 86% rename from docs/Commentary/Compiler-Library.rst rename to docs/Commentary/Libraries/Compiler-Library.rst index c7179f4..bcf1466 100644 --- a/docs/Commentary/Compiler-Library.rst +++ b/docs/Commentary/Libraries/Compiler-Library.rst @@ -5,6 +5,8 @@ For the language to be useful it must have an implementation, and ideally a self It is particularly important to have regression tests from the beginning, to catch near-infinite loop bugs and so on. +Steelman 1G says "The design of the language should strive for machine independence. It shall not dictate the characteristics of object machines or operating systems except to the extent that such characteristics are implied by the semantics of control structures and built-in operations. It shall attempt to avoid features whose semantics depend on characteristics of the object machine or of the object machine operating system." This is not a problem for the core language, which is dictated by mathematical and practical considerations and thus is already machine-independent, but the library is much more affected by these considerations. + Synthesizing ============ @@ -126,11 +128,13 @@ Then there are the libraries suggested by ChatGPT, libraries for: Booleans ======== +Steelman 3-2C. "There shall be a predefined type for Boolean values." + Booleans are complex. To summarize `Wikipedia `__: * Algol was the first to have an explicit boolean data type, in 1960 * Fortran added a boolean data type after ALGOL came out -* Languages with enumerated types use that for their boolean data types (Pascal, Haskell, etc.) +* Languages with enumerated types use that for their boolean data types (Pascal, Haskell, Ada, etc.) * Python, SQL, JS, Lua, Pl/I, and Java don't have enum types but have special bool types that are kind of enum-like * Perl, Rexx, Lisp, Tcl, Ruby, Forth doesn't have booleans, they use various values of other types as true/false * C didn't have enumerations initially, so used integers. But in C89 added enumerations and in C99 a boolean data type. But there is an implicit conversion from booleans to integers. C++/Objective-C are similar. @@ -147,14 +151,26 @@ But if one boolean type is great, what about more? In `YAML `__ mainly refers to natural numbers, integers, rationals, real numbers, and complex numbers (the "numeric tower"), but other mathematical structures like p-adics or the surreal numbers are also considered numbers. +Steelman 3-1A. The language shall provide distinct numeric types for exact and for approximate computation. Numeric operations and assignment that would cause the most significant digits of numeric values to be truncated (e.g., when overflow occurs) shall constitute an exception situation. + +Steelman 3-1C. The range of each numeric variable must be specified in programs and shall be determined by the time of its allocation. Such specifications shall be interpreted as the minimum range to be implemented and as the maximum range needed by the application. Explicit conversion operations shall not be required between numeric ranges. + +A decimal number consists of a sign and a list of digits, a decimal marker, and more digits. 
A floating-point number is similar but adds a base and a separately specified exponent. A rational numbers is a ratio of two integers. A complex number is a real and an imaginary component, or a polar angle and magnitude, in both cases real numbers. + Representation -------------- +3-3B. Range, precision, and scale specifications shall be required for each component of appropriate numeric type. + There are various ways to represent these numbers. Naturals are generally represented as a list of digits in some base (a decimal). Integers are naturals with a sign. Rationals may be written as a (possibly improper) fraction of integers, a terminating or infinitely repeating decimal, a "mixed number" an integer and a proper fraction, or a floating point of a decimal times an exponent 1/2^n. For the complete fields such as reals and p-adics there are even more representations: * Cauchy sequence of rationals @@ -169,6 +185,12 @@ Complex numbers have two main representations, rectangular (1+2i) and polar (sqr So far we have only considered the variety of mathematical forms. The representation on a computer also involves a certain amount of differentiation based on practicalities. There are arbitrary-precision bignums and symbolic representations that can represent almost all values, subject to memory and computability limits, which are great for those who don't care much about performance. But for reasons of efficiency, and also for faithfulness to standards etc. which specify a representation, many programs will want to use fixed-size types that restrict values to a certain range, precision, and bit representation, such as int8, uint16, or the IEEE floating point formats. +Steelman 3-1D. The precision (of the mantissa) of each expression result and variable in approximate computations must be specified in programs, and shall be determinable during translation. Precision specifications shall be required for each such variable. Such specifications shall be interpreted as the minimum accuracy (not significance) to be implemented. Approximate results shall be implicitly rounded to the implemented precision. Explicit conversions shall not be required between precisions. + +3-1F. Integer and fixed point numbers shall be treated as exact numeric values. There shall be no implicit truncation or rounding in integer and fixed point computations. + +3-1G. The scale or step size (i.e., the minimal representable difference between values) of each fixed point variable must be specified in programs and be determinable during translation. Scales shall not be restricted to powers of two. + So, how do we deal with this multitude of forms? Generally, programs are not representation-independent, and each algorithm or operation in a program will have a preferred representation that it works with for input and output, preferred for reasons of accuracy, speed, or convenience. We cannot reliably perform automatic conversion between formats, as they differ in ranges and so on; there will be unrepresentable value in one direction or the other, loss of precision in the case of floating-point, and the conversion itself adds nontrivial overhead. Thus, we must consider each representation of a mathematical value to be a distinct programmatic value. There are thus several sets relevant to, for example, the integers: * Int8, Int16, UInt16, etc.: the sets of integers representable in various fixed representations @@ -182,6 +204,8 @@ So, how do we deal with this multitude of forms? 
Generally, programs are not rep Syntax ------ +Steelman 2G: "There shall be built-in decimal literals. There shall be no implicit truncation or rounding of integer and fixed point literals." + Number syntax is mainly `Swift's `__. There is the integer literal ``4211``, extended to the decimal ``12.11``. Different bases are provided, indicated with a prefix - decimal ``1000``, hexadecimal ``0x3e8``, octal ``0o1750``, binary ``0b1111101000``. Exponential notation ``1.23e+3`` may be either integer or rational. Positive exponents with decimal (e) / hexadecimal (p) / binary (b) are allowed. Also there is a sign. Numbers can also have a suffix interpreted as the format. This expand to a term that specifies the format by applying it, e.g. ``123i8`` expands to ``int8 123``. Formats include IEE 754 float/double, signed and unsigned fixed bit-width integers, and fixed-point rationals. So the full syntax is sign, base, mantissa, exponent, format. Leadings 0's are significant - literals with leading zeros must be stored in a type that can hold the digits all replaced with their highest value, e.g. ``0001`` cannot be stored in a ``i8`` (type must be able to contain ``9999``). Parsing leading ``0`` as octal is widely acknowledged as a mistake and should not be done. On the other hand trailing 0's are not significant - the decimal point should never be the last character in numeric literals (e.g. 1. is invalid, and must be written as 1 or 1.0). @@ -210,6 +234,12 @@ Rebol uses comma/period for decimal point so quote was a logical choice. There d Operations ---------- +Steelman 3-1B. There shall be built-in operations (i.e., functions) for conversion between the numeric types. There shall be operations for addition, subtraction, multiplication, division, negation, absolute value, and exponentiation to integer powers for each numeric type. There shall be built-in equality (i.e., equal and unequal) and ordering operations (i.e., less than, greater than, less than or equal, and greater than or equal) between elements of each numeric type. Numeric values shall be equal if and only if they have exactly the same abstract value. + +Steelman 3-1E. Approximate arithmetic will be implemented using the actual precisions, radix, and exponent range available in the object machine. There shall be built-in operations to access the actual precision, radix, and exponent range of the implementation. + +Steelman 3-1H. There shall be integer and fixed point operations for modulo and integer division and for conversion between values with different scales. All built-in and predefined operations for exact arithmetic shall apply between arbitrary scales. Additional operations between arbitrary scales shall be definable within programs. + Considering the multitude of forms, and the fact that representations are often changed late in a project, it seems reasonable to expect that most code should be representation-agnostic. The library should support this by making the syntax "monotonous", in the sense of `Jef Raskin `__, meaning that there should be only one common way to accomplish an operation. For example, addition should have one syntax, ``a+b``, but this syntax should work on numerous forms. This avoids a profusion of operators such as ``+.`` for addition of floating-point in OCaml which is just noisy and hard to remember. Messing with the basic PEMDAS operations is a recipe for errors. 
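A rough Haskell sketch of the intent (the representation tags and primitive names below are placeholders, not Stroscot's actual primitives): one user-facing ``+`` spelling that dispatches to representation-specific primitives, so code does not have to change operators when a value's representation changes::

    import Data.Int (Int8, Int16)

    -- a value tagged with its representation
    data Number = I8 Int8 | I16 Int16 | F64 Double
      deriving Show

    -- representation-specific primitives (placeholders for the machine operations)
    addInt8  :: Int8  -> Int8  -> Int8
    addInt8  = (+)
    addInt16 :: Int16 -> Int16 -> Int16
    addInt16 = (+)
    addF64   :: Double -> Double -> Double
    addF64   = (+)

    -- the single exposed operation is the union of the primitives; each primitive
    -- is the restriction of |+| to operands of its own representation
    (|+|) :: Number -> Number -> Number
    I8  a |+| I8  b = I8  (addInt8  a b)
    I16 a |+| I16 b = I16 (addInt16 a b)
    F64 a |+| F64 b = F64 (addF64   a b)
    _     |+| _     = error "mixed representations: no implicit conversion here"

    -- I8 1 |+| I8 2   ==> I8 3
    -- F64 1 |+| F64 2 ==> F64 3.0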
Internally, each exposed operation is implemented as overloading the symbol for various more specific "primitive" operations, ``(+) = lub [add_int8, add_int16, ...]``. The compiler will be able to use profiling data to observe the forms of the numbers involved and select the appropriate primitive operation, so it should always be possible to replace a direct use of the primitive ``add`` with the normal ``+`` operation without significantly affecting performance. But for expressiveness purposes, it does seem worth exposing the primitives. Conceptually, since the primitives don't overlap, each primitive ``add`` operation is the restriction of the overloaded ``(+)`` to the domain of the specific primitive, so even if we didn't expose the primitives we could define them ourselves as ``add_int8 = (+) : Int8 -> Int8 -> Int8`` and so on. It makes sense to avoid this convolutedness and simply expose the primitives directly - in one stroke, we avoid any potential optimization problems, and we also ensure that the domains of the primitives are only defined in one place (DRY). Of course, such primitives are quite low-level and most likely will only be needed during optimization, as a sanity check that the representation expected is the representation in use. @@ -235,6 +265,8 @@ Suppose we are multiplying three matrices A, B, C. Since matrix multiplication i Strings ======= +A string is a list of bytes of a given length. Subtypes include C null-terminated strings and UTF-8 encoded strings. Also filenames and paths. + Text types:: Text = Text { bytes : ByteArray#, offset : Int, length : Int } -- sequence of bytes, integers are byte offsets @@ -245,8 +277,32 @@ Interpolation and internationalization are two things that have to work together Conversions: https://profpatsch.de/notes/rust-string-conversions -Filenames -========= +Steelman 2H: "There shall be a built-in facility for fixed length string literals. String literals shall be interpreted as one-dimensional character arrays." + +Steelman 3-2D: Character sets shall be definable as enumeration types. Character types may contain both printable and control characters. The ASCII character set shall be predefined. + +I agree with this insofar as that characters are a set. Specifically we can define a character as a Unicode grapheme cluster. But this definition has two problems: + +* Unicode is an evolving standard, so the set of characters may change over time. An enumeration is hard to change. +* The set of grapheme clusters is actually infinite, you can go all Zalgo and stack arbitrarily many modifiers. So the set is not an "enumeration" at all in the sense of being a finite set. + +As such my preferred definition of characters is as the set of UTF-8 encoded strings containing exactly one grapheme cluster. + +Read/show +--------- + +8F. There shall be predefined operations to convert between the symbolic and internal representation of all types that have literal forms in the language (e.g., strings of digits to integers, or an enumeration element to its symbolic form). These conversion operations shall have the same semantics as those specified for literals in programs. + +These are like Haskell read/show, I think. It would be good to have a single implementation that works on all values, instead of having to manually "derive" the implementation for each type. There are some issues with overloading though, how to print qualified names and so on. + +8C. 
User level input shall be restricted to data whose record representations are known to the translator (i.e., data that is created and written entirely within the program or data whose representation is explicitly specified in the program). + +This is an interesting requirement that suggests as well a "generic read" we should also have a family of type-limited reads, for example ``read Int``, ``read Float``, etc., so that the program does not get a data type it doesn't expect. + +Files +===== + +8B. The language shall specify (i.e., give calling format and general semantics) a recommended set of user level input-output operations. These shall include operations to create, delete, open, close, read, write, position, and interrogate both sequential and random access files and to alter the association between logical files and physical devices. There are different definitions of filenames on different platforms: @@ -256,6 +312,8 @@ There are different definitions of filenames on different platforms: Taking union, we have that a filename is always a byte sequence. Taking intersection, we have that NFC-normalized sequences of Unicode codepoints excluding ``<>:"/\|?*`` and the Windows reserved names are 1-1 transformable to filenames on all platforms. +I think C's file API is pretty good, although io_uring is also good for async I/O. Don't look at Swift, I tried and there is just too much gunk in Foundation to be usable. There is a `thread `__ where the simple advice is just "use C". Maybe Rust's file API is more Unicode-aware? + Poison values ============= @@ -313,11 +371,6 @@ ISO/IEC 11404 has a classification of values: Even ignoring the fact that the multiple definitions are all slightly different, these distinctions are also a matter of definition: we can define a 32-bit integer as one of 2^32 symbols, hence primitive and atomic, or as a list of boolean values of length 32, hence generated and aggregate. It seems easiest to avoid going down this rabbit hole and simply make a big list of all the sets of values, without attempting to create such a broad classification of the sets. -Dictionaries -============ - -Wikipedia calls these "associative arrays" and C++ and Haskell calls them maps. There is also the ISO/IEC 11404 "record" which only allows identifiers as keys and has a fixed key set. But dictionary seems to be the accepted term in the data structure textbooks, and it's about the right length as a word. - Tables ====== @@ -326,6 +379,17 @@ Tables such as those found in SQL are bags of records that all have the same fie Typed collections ================= +Steelman 3-3A. "Composite types shall include arrays (i.e., composite data with indexable components of homogeneous types)." + +3-3B. For elements of composite types, the type of each component (i.e., field) must be explicitly specified in programs and determinable during translation. Components may be of any type (including array and record types). + +3-3C. A value accessing operation shall be automatically defined for each component of composite data elements. Assignment shall be automatically defined for components that have alterable values. A constructor operation (i.e., an operation that constructs an element of a type from its constituent parts) shall be automatically defined for each composite type. An assignable component may be used anywhere in a program that a variable of the component's type is permitted. There shall be no automatically defined equivalence operations between values of elements of a composite type. + +3-3D. 
Arrays that differ in number of dimensions or in component type shall be of different types. The range of subscript values for each dimension must be specified in programs and may be determinable at the time of array allocation. The range of each subscript value must be restricted to a contiguous sequence of integers or to a contiguous sequence from an enumeration type. + +3-3E. There shall be built-in operations for value access, assignment, and catenation of contiguous sections of one-dimensional arrays of the same component type. The results of such access and catenation operations may be used as actual input parameter. + + A straightforward collection implementation produces a heterogeneous collection that can contain anything. So for example a linked list ``mkList [x,...y] = Cons x (mkList y); mkList [] = Nil``. We can type these lists by a set that contains all the elements, in particular defining ``List t = Cons t (List t) | Nil``. The type of all lists is ``List Any`` We can infer a good type for a list value with ``contents (l : List Any) = { e | e elementOf l }; type (l : UList) = List (contents l)`` - we have ``forall (l : List t). contents l subset t`` so this is the lower bound / principal type. :: @@ -434,61 +498,6 @@ They also have generator comprehensions and big operator syntax, but the descrip The problem with this design is you can accidentally store the ``next`` operation and re-use it. With ``next : Iterator -> Op (Done | Yield a)`` the similar pattern ``let y = next iter in { y; y}`` just results in calling ``next`` twice and does not corrupt the iterator state. -Control structures -================== - - https://www.ccs.neu.edu/home/shivers/papers/loop.pdf / https://www.youtube.com/watch?v=PCzNwWmQdb0 - - - see also Reference/Syntax, a lot of potential control structures - -Goto/Break/continue -=================== - -`Core `__ proposes to drop break and continue due to implementation complexity and mental complexity. He argues that it is clearer to use an extra boolean variable and only slightly clunkier. Per the `structured program theorem `__ it is possible to compute any computable function with three control structures, semicolon, if, and while (and no break/continue). There are drawbacks in that the theorem usually must introduce additional local variables and duplicate code. For example consider `this program `__:: - - start = state0 - state0 | a0 = halt - | a1 = p01; state1 - | a2 = p02; state2 - state1 | a1 = halt - | a0 = p10; state0 - | a2 = p12; state2 - state2 | a2 = halt - | a1 = p21; state1 - | a2 = p20; - -A translation into structured programming loosely based on the paper:: - - state = mut 0 - halt = mut false - while !halt - if state == 0 then - if α1 then - p01; state := 1 - else if α2 then - p02; state := 2 - else halt := true - else if state == 1 then - if α2 then - p12; state := 2 - else if α0 then - p10; state := 0 - else halt := true - else - assert (state == 2) //must be state 2 - if α0 then - p20; state := 0 - else if α1 then - p21; state := 1 - else halt := true - -Notice this is longer than the original description using recursion, mainly due to the extra variables. S. Rao Kosaraju proved that with arbitrary-depth, multi-level breaks from loops it's possible to avoid adding additional variables in structured programming, but known algorithms still duplicate code. In common cases the duplication can be avoided by clever structuring though. 
- -Per https://hal.inria.fr/inria-00072269/document Table 5, the most common flow-affecting constructs in Java were (as a percentage of methods) return (65.5%), short-circuit operators (13.2%), single-level break (3.6%), single-level continue (0.3%), and labelled break/continue (0.13%). A `different comparison `__ (Table IV) finds that continue and goto are about equal in frequency in C, that synchronized and continue are about equal in frequency in Java, and break is about half as common as try/catch/throw in Java. - -In Stroscot, it's not too hard to provide break/continue/goto within the continuation-based I/O model, and many C programmers will expect these operators, so they are planned to be implemented. They will be provided as functions, rather than as keywords, so will be imported and not steal syntax by default. - Work stealing task queues ========================= @@ -505,18 +514,11 @@ Properties Partial orders are good, no reason not to have them. The orders defined with posets should be usable dynamically. Similarly they should be in a set ``TotalOrder`` if appropriate. Similarly ``Commutative``, ``Associative`` for binary operators. -Arrays -====== - -In Stroscot the only mutable thing is a reference. So mutable arrays could mean two things: a fixed-size immutable array containing mutable values, or a mutable variable storing an immutable array. The second seems more similar to Java's ArrayList or C++ std::vector so is probably what is meant. - -The key here for efficient performance is in-place (destructive) update, so that the array re-uses its storage instead of copying on every operation. There is a paper :cite:`hudakAggregateUpdateProblem1985` on how to do it for lazy programming - basically you perform reads eagerly, and delay array update operations as long as possible, until it is clear if you can do in-place update or will have to copy. - -https://aplwiki.com/wiki/Leading_axis_theory - Conversion ========== +Steelman 3B: "Explicit conversion operations shall be automatically defined between types that are characterized by the same logical properties." + There is a function ``convert : (T : Set) -> Any -> T|Exception`` in a module in the core library. Conversion is intended to produce equivalent values, so these modified equivalence relation properties should hold: * Reflexive: ``convert T a = a``, if ``a : T`` @@ -529,7 +531,7 @@ Conversion is only a partial function, hence these properties may not hold due t Conversion for unions is often undefined, because if ``convert T2 (a : T) = b``, and ``a != b``, then by reflexivity we have ``convert (T|T2) a = a``. and by assumption and reflexivity we have ``convert (T|T2) (convert T2 a) = convert (T|T2) b = b``, violating transitivity. Hence ``convert (T|T2)`` on at least one of ``a`` or ``b`` must be undefined. -Also, it is generally too much work (quadratic) to define all conversions explicitly. Conversion thus relies on an A* search through the conversion graph for the minimum cost conversion. The conversion graph is specified via some functions: +Also, it is generally too much work (quadratic) to define all conversions explicitly, so as Steelman suggests conversions are automatically derived from a hub-and-spoke skeleton graph of conversions. In particular, conversion relies on an A* search through the conversion graph for the minimum cost conversion. 
The conversion graph is specified via some functions: :: @@ -587,6 +589,8 @@ The counter idiom ``x = (x | 0) + 1`` seems to be hardly used, probably not wort Equality and comparison ======================= +Tinman B2 The source language will have a built-in operation which can be used to compare any two data objects (regardless of type) for identity. + The comparison function itself is discussed in Posets. Basically it is a single function ``comp : Any -> Any -> {LessThan,Equal,GreaterThan,Incomparable}`` satisfying the requirements of a "partial comparison operation". Loose comparison will perform a type conversion when comparing two things. In particular in JS it will convert objects to strings, booleans and strings to numbers, and numbers to bigints, and transitive chains of these. Loose comparison is considered a confusing mistake; equality should not do type conversion. Almost all JS programs do not use this feature, either via ``===`` or by avoiding cases that invoke conversion. :cite:`pradelGoodBadUgly2015` @@ -648,35 +652,6 @@ Data structures Copy Python's, they've been optimized and should be as efficient as anything I'll write. -List flattening -=============== - -Lists don't automatically flatten, e.g. ``[a,[b,c]] != [a,b,c]``. Instead you can use a flatten function in the standard library, ``flatten [a,[b,c]] = [a,b,c]``. MATLAB's justification for flattening is that ``[A B]`` is the concatenated matrix with ``A`` left of ``B`` and ``[A;B]`` the concatenation with ``A`` above ``B``. This seems hard to remember and infix operators ``A horcat B`` and ``A vertcat B`` are just as clear. - -List homomorphisms -================== - -List concatenation is an associative binary operation, as such we can represent repeatedly applying an associative operation (a semigroup) as applying an operation to a (nonempty) list. - -:: - - combine op list = foldl1 op list - sum = combine (+) - product = combine (*) - - sum [1,2,3] - product [2,3,4] - -If the empty list is a possibility we need a monoid, i.e. specifying an identity element for the operation - -:: - - combine monoid list = foldMap monoid.op monoid.identity list - sum = combine { op = (+), identity = 0 } - product = combine { op = (*), identity = 1 } - -This all works because the set of lists/nonempty lists under concatenation is isomorphic to the free monoid / free semigroup. - Serialization ============= @@ -706,13 +681,10 @@ Haskell has function composition ``(.)`` and Julia has the "pipe" operator ``(|> According to `YSAGuide `__ pipelines like ``a . b . c`` are bad style and one should instead use intermediate results, ``\x -> { a1 = a x; b1 = b a1; c1 = c b1; return b1 }``, except with better named variables than ``x,a1,b1,c1``. The reason given is that debugging composed functions in the REPL is hard and clutters stacktraces. This sounds like a debugger problem - function pipelines are shorter and easier to read. -Values -====== - -These are derived values, so they don't belong on the Values page. - Lists ------ +===== + +Steelman 3-3A. "It shall be possible to define types that are Cartesian products of other types." A list is a function ``Nat -> Value|EndOfList``, with the proviso that if ``list i = EndOfList`` then ``list (i+k) = EndOfList`` for all ``k>=0``. Basic list syntax is the usual syntactic sugar for list values. @@ -721,27 +693,63 @@ A list is a function ``Nat -> Value|EndOfList``, with the proviso that if ``list [] // empty list arr = [a, b, c] -Heterogeneous lists are possible, ``[1,"x",int32 3]``. 
+Heterogeneous lists are possible, ``[1,"x",int32 3]``. Really these are homogeneous lists of element type ``Any``, but you can define a tuple type like ``(Int,String,Int32)`` which is a restriction. In Stroscot tuple is synonymous with list. + +List flattening +--------------- + +MATLAB has automatic list flattenning, e.g. ``[a,[b,c]] == [a,b,c]``. MATLAB's justification for flattening is that ``[A B]`` is the concatenated matrix with ``A`` left of ``B`` and ``[A;B]`` the concatenation with ``A`` above ``B``. This seems hard to remember and flattening infix operators ``A beside B`` and ``A atop B`` for matrices are just as clear. In Stroscot list construction does not automatically flatten and instead you can use a flatten function in the standard library, ``flatten [a,[b,c]] = [a,b,c]``. + +List homomorphisms +------------------ + +List concatenation is an associative binary operation, as such we can represent repeatedly applying an associative operation (a semigroup) as applying an operation to a (nonempty) list. + +:: + + combine op list = foldl1 op list + sum = combine (+) + product = combine (*) -In Stroscot tuple is synonymous with list. + sum [1,2,3] + product [2,3,4] +If the empty list is a possibility we need a monoid, i.e. specifying an identity element for the operation + +:: + + combine monoid list = foldMap monoid.op monoid.identity list + sum = combine { op = (+), identity = 0 } + product = combine { op = (*), identity = 1 } + +This all works because the set of lists/nonempty lists under concatenation is isomorphic to the free monoid / free semigroup. Arrays ------ +7H. The number of dimensions for formal array parameters must be specified in programs and shall be determinable during translation. Determination of the subscript range for formal array parameters may be delayed until invocation and may vary from call to call. Subscript ranges shall be accessible within function and procedure bodies without being passed as explicit parameters. + An indexing scheme is a function ``IndexValue -> [0..length]``. An array is a term ``array scheme list``. A typed array also specifies an element type, ``typed_array type scheme list``. Tensors are arrays that use compound values as indexes. Sparse arrays use indexing schemes that map many indices to one value. As well as being constructed from a list and indexing scheme, arrays have a non-positional syntax ``{17..20 => 100, 11..16 & 1..6 => 0, 7..10 => 100}``, inspired by Ada, similar to record notation. -Strings -------- +Mutable arrays +-------------- -A string is a list of bytes of a given length. Subtypes include C null-terminated strings, UTF-8 encoded strings, and characters (UTF-8 encoded strings containing exactly one grapheme cluster). Also filenames and paths. +In Stroscot the only mutable thing is a reference. So mutable arrays could mean two things: a fixed-size immutable array containing mutable values, or a mutable variable storing an immutable array. The second seems more similar to Java's ArrayList or C++ std::vector so is probably what is meant. + +The key here for efficient performance is in-place (destructive) update, so that the array re-uses its storage instead of copying on every operation. There is a paper :cite:`hudakAggregateUpdateProblem1985` on how to do it for lazy programming - basically you perform reads eagerly, and delay array update operations as long as possible, until it is clear if you can do in-place update or will have to copy. 
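To make the copy-versus-in-place distinction concrete, here is a small illustration with Haskell's ``vector`` package (an analogue only, not the proposed implementation): a persistent update conceptually copies the array, while a batched ``modify`` runs against a mutable buffer and is free to work in place when the original is known not to be shared::

    import qualified Data.Vector as V
    import qualified Data.Vector.Mutable as MV

    main :: IO ()
    main = do
      let xs = V.fromList [0, 1, 2, 3 :: Int]

          -- persistent update: conceptually copies, the old version stays valid
          ys = xs V.// [(2, 20)]

          -- batched destructive update: runs against a mutable buffer and can be
          -- performed in place when the runtime knows xs is not shared elsewhere
          zs = V.modify (\m -> do MV.write m 0 100
                                  MV.write m 3 300) xs

      print xs  -- [0,1,2,3]
      print ys  -- [0,1,20,3]
      print zs  -- [100,1,2,300]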
+ +https://aplwiki.com/wiki/Leading_axis_theory Bitvectors ---------- -A bitvector is the symbol ``bits`` applied to a list of bits, ``bits [1,0,1]``. +3-4A. It shall be possible to define types whose elements are one-dimensional Boolean arrays represented in maximally packed form (i.e., whose elements are bit strings). + +3-4B. Set construction, membership (i.e., subscription), set equivalence and nonequivalence, and also complement, intersection, union, and symmetric difference (i.e., component-by-component negation, conjunction, inclusive disjunction, and exclusive disjunction respectively) operations shall be defined automatically for each bit string type. + +A bitvector is the symbol ``bits`` applied to a list of bits, ``bits [1,0,1]``. "Bitvector" is a bit clearer than "bit string" according to ChatGPT. Date/time ========= @@ -760,10 +768,27 @@ Various types of complete and incomplete date/time values. C.f. ISO 8601, `Wikip * partial date: a collection of time fields (including timescale) that has the semantics of replacing or overriding the fields of an instant with those of the partial date's * recurrence: RRule as documented in the iCalendar RFC. (https://dateutil.readthedocs.io/en/stable/rrule.html) -Records -======= +Dictionaries +============ + +Wikipedia calls these "associative arrays" and C++ and Haskell call them maps. There is also the ISO/IEC 11404 "record" which only allows identifiers as keys and has a fixed key set. But dictionary seems to be the accepted term in the data structure textbooks, and it's about the right length as a word. + +Steelman 3-3A. "Composite types shall include [...] records (i.e., composite data with labeled components of heterogeneous type)." + +3-3F. It shall be possible to declare constants and (unary) functions that may be thought of as record components and may be referenced using the same notation as for accessing record components. Assignment shall not be permitted to such components. -A record is an ordered list of key-value pairs, ``record {a = 1, b = 2, c = 3}``. +A dictionary is an ordered list of key-value pairs, ``dictionary {a = 1, b = 2, c = 3}``. + +Sets +---- + +Record specifications can be closed, listing all properties, or open, allowing other associations not mentioned. For example ``{a: "s", b: 2} isElementOf OpenRec [("b",Int)]``. The fields can be ordered or unordered. Some fields can be omitted, i.e. they are optional. This is different from a ``Maybe`` type because ``{a: Nothing}`` is distinct from ``{}``. This can be accomplished by writing ``Rec [("b",Int,Optional)]``. + +Rich Hickey seems to think values like ``("b",Int)`` are important and should have names like ``:b`` and ``:c``, so you can write ``{b: 2, c: 3} isElementOf Rec [:b,opt :c]``. I guess? It could be useful if you use the same fields a lot, but it doesn't save too much typing. The real optimization is defining collections of fields, so you can write ``Rec ([:b,:c] ++ standardfields)``. + +Clojure also has this thing about namespace qualification for records. XML had qualified names (namespaces), JSON doesn't. Everybody has switched from XML to JSON. So it seems like namespaces are overcomplicating the issue. Generally formats have a single domain of authority and don't need namespacing - the interpretation of a field is a property of the schema, not the value. This can be seen in the evolution of the ```` element from
This can be seen in the evolution of the ```` element from + +If you do have user-defined fields and need some simple thing to avoid key collisions you can use long key names like "org.mycompany.thing". This relies on the simple assumption that non-namespaced property names won’t have "." in them. But unlike a namespace mechanism this doesn't view unqualified names as incomplete qualified names, rather it sees qualified names as over-specialized names. "Over" is because you can't access it like ``obj.org.mycompany.thing``, you have to write ``obj["org.mycompany.thing"]``. Maps ---- @@ -792,11 +817,6 @@ References References are a term ``ref r123``, where ``r123`` is a symbol. Most symbols are autogenerated inside the reference creation operation ``ref``, but there is no technical restriction on which symbols are allowed. -Numbers -======= - -A decimal number consists of a sign and a list of digits, a decimal marker, and more digits. A floating-point number is similar but adds a base and a separately specified exponent. A rational numbers is a ratio of two integers. A complex number is a real and an imaginary component, or a polar angle and magnitude, in both cases real numbers. - Mathematical structures ======================= @@ -815,10 +835,20 @@ Piecewise/interpolated functions are terms that contain the data as a list. An optimization problem is a set (the search space, usually R^n or Z^n or so on with some constraints) together with an objective function mapping each possibility to a real number, that is either minimized or maximized. Intervals -========= +--------- An interval has a start value, an end value, and two flags for whether each side is open or closed. There is also a "ball" representation as the middle and distance to each side. The values can be numbers but also other things like date/times. +Finite-state machine +-------------------- + +A FSM is represented as some rewrite rules, each state/input rewrites to some new state. + +Logical propositions +-------------------- + +These are just formulas, terms and binders built using the logical connectives (and, or, not, forall, exists, etc.) With sequent calculus we also allow the turnstile as a symbol. + Units ===== @@ -854,11 +884,6 @@ Regular expression A regular expression is a term with symbols for each formalism, like "or", concatenation, capture groups, quantifiers, the wildcard character, character classes, and so on. -Finite-state machine -==================== - -A FSM is represented as some rewrite rules, each state/input rewrites to some new state. - GIS === @@ -869,10 +894,6 @@ Quantum computing Similarly to how a classical computation maps a RAM of n bits to another RAM of n bits, a quantum computation (gate) over n qubits is a unitary matrix of size 2^n by 2^n. A more structured representation involves forming quantum circuits out of qubit wires and quantum gates. There are also measure and set operations to interact with classical computations. -Logical propositions -==================== - -These are just formulas, terms and binders built using the logical connectives (and, or, not, forall, exists, etc.) With sequent calculus we also allow the turnstile as a symbol. 
Infinite values =============== diff --git a/docs/Commentary/PackageManager.rst b/docs/Commentary/Libraries/PackageManager.rst similarity index 100% rename from docs/Commentary/PackageManager.rst rename to docs/Commentary/Libraries/PackageManager.rst diff --git a/docs/Commentary/Parsing.rst b/docs/Commentary/Libraries/Parsing.rst similarity index 95% rename from docs/Commentary/Parsing.rst rename to docs/Commentary/Libraries/Parsing.rst index 6601293..8801010 100644 --- a/docs/Commentary/Parsing.rst +++ b/docs/Commentary/Libraries/Parsing.rst @@ -3,6 +3,10 @@ Parsing Source code by itself is rather difficult to work with. To do anything useful, one would prefer to work with the AST or IR. Extensible syntax complicates the issue because there is no fixed grammar for the language. So we need a parser that can take arbitrary rules. And of course once we have a good parser, there is no harm in exposing it in the language, similar to Raku's inline `grammars `__. `Per Vognsen `__ says lexers (and consequently parsers) can be useful for searching through code and indexing it. And also, parser combinator libraries in general are pretty popular, so I'm sure people will write parsers for HTML, JSON, XML, and other common data formats. +There is also Steelman 2B, "The language should have a simple, uniform, and easily parsed grammar and lexical structure." I interpret this requirement loosely as that the language shall have a parser, and this parser shall allow easy access to grammatical and lexical productions. + +Similarly, for Steelman 2E, "The user shall not be able to modify the source language syntax. In particular the user shall not be able to introduce new precedence rules or to define new syntactic forms.", we take the opposite: the user shall be able to modify syntax, introduce new precedence rules, and new syntactic forms. There is some justification under Tinman H2. My reply is as follows: the argument is predicated on a distinction between coining new words and moving to another natural language. But in programming, no such distinction exists: all languages can be roughly specified as context-free grammars. The distinction between XML and JSON is much less than that of English and French; in fact, there are xml2json tools which automatically and losslessly convert from one to the other, whereas English-French translations are still described as `imperfect `__ and subjective. Similarly with HTML, one has a mixture of CSS, JavaScript, and pseudo-XML that is even more of a mishmash than "Franglais", yet no one would claim this undermines the basic understanding of HTML. Rather, different modes of expression suited for different roles are mixed in a complementary manner, guided by standards and formal specifications that are stronger than the "commonalities" found in natural language. The fact that one can write in JS or CSS or so on within an HTML document does not change the basic character or understanding of HTML. The ability to modify the source language syntax simply increases the power of the language in the relevant application areas, and has no drawbacks besides that the new syntax must be learned before it is used. 
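As a purely hypothetical illustration of what this opposite position could look like in practice (none of these declaration forms are settled, and the keywords ``infixl`` and ``syntax`` are invented here), a user-level extension might read:

::

    -- introduce a new operator at a chosen precedence level
    infixl 6 <+>
    a <+> b = merge a b

    -- introduce a new syntactic form and give its desugaring
    syntax "repeat" block "until" cond =
      { block; while (not cond) block }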
+ Scannerless generalized parsing =============================== diff --git a/docs/Commentary/Programs.rst b/docs/Commentary/Libraries/Programs.rst similarity index 100% rename from docs/Commentary/Programs.rst rename to docs/Commentary/Libraries/Programs.rst diff --git a/docs/Commentary/Standard-Library.rst b/docs/Commentary/Libraries/Standard-Library.rst similarity index 97% rename from docs/Commentary/Standard-Library.rst rename to docs/Commentary/Libraries/Standard-Library.rst index 23a08d0..4dc7008 100644 --- a/docs/Commentary/Standard-Library.rst +++ b/docs/Commentary/Libraries/Standard-Library.rst @@ -1,6 +1,11 @@ Standard library ################ +12A. There shall be an easily accessible library of generic definitions and separately translated units. All predefined definitions shall be in the library. Library entries may include those used as input-output packages, common pools of shared declarations, application oriented software packages, encapsulations, and machine configuration specifications. The library shall be structured to allow entries to be associated with particular applications, projects, and users. + +13B. There will be a standard definition of the language. Procedures will be established for standards control and for certification that translators meet the standard. + + What is a good standard library? Well, in Stroscot, the goal of the standard library is to simplify the question "For this task, how do I use a 3rd party library?". Obviously, writing your own code is fun and all. But life is short, and reinventing the wheel is a waste of man‑hours that can be better spent elsewhere. Still, there is a lot of due diligence necessary to use a 3rd party library. The standard library is the "opinionated and supported path" to build programs. (`Spotify `__) The library team has already done the due diligence, identified the "golden path", and any caveats will be noted in the documentation. diff --git a/docs/Commentary/Symbolic.rst b/docs/Commentary/Libraries/Symbolic.rst similarity index 96% rename from docs/Commentary/Symbolic.rst rename to docs/Commentary/Libraries/Symbolic.rst index c676876..e54203e 100644 --- a/docs/Commentary/Symbolic.rst +++ b/docs/Commentary/Libraries/Symbolic.rst @@ -15,7 +15,7 @@ Then we need algorithms and operations on these values. For the numbers, it is m Design ====== -Following Stroscot's philosophy of "minimal core", as little as possible of the implementation should be in the language core. We will need to represent symbolic expressions - the easiest thing is to make these syntax trees, so that algorithms are macros that operate on syntax. But of course specific algorithms will operate on restricted subsets of syntax trees. +Following Stroscot's philosophy of "minimal core", as little as possible of the implementation should be in the language core. We will need to represent symbolic expressions - the easiest thing is to make these terms. So then specific algorithms will operate on restricted subsets of terms. Maybe we want some syntax sugar on top so there are also macros. But in terms of core features, it is just terms and sets and macros, nothing we haven't already seen. So symbolic computation is a library problem. Oscar Benjamin has been working on speeding up SymPy and has written about `how he would redesign SymPy `__. It is certainly worth taking his design points into consideration. 
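Before getting to those design points, here is a minimal sketch of the term representation just described; the constructors and the ``simplify`` rules are invented for illustration:

::

    -- a symbolic expression is an ordinary term
    e = add (mul (sin x) (sin x)) (mul (cos x) (cos x))

    -- algorithms are rewrite rules over restricted subsets of terms
    simplify (add (mul (sin a) (sin a)) (mul (cos a) (cos a))) = 1
    simplify (mul 1 a) = simplify a
    simplify t = t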
According to Benjamin, fast symbolic computation depends on implementing three things efficiently: robust numerical evaluation, symbolic manipulation, and computational algebra. diff --git a/docs/Commentary/Syntax.rst b/docs/Commentary/Libraries/Syntax.rst similarity index 83% rename from docs/Commentary/Syntax.rst rename to docs/Commentary/Libraries/Syntax.rst index 76f0f22..bc7252a 100644 --- a/docs/Commentary/Syntax.rst +++ b/docs/Commentary/Libraries/Syntax.rst @@ -3,14 +3,14 @@ Syntax Stroscot's syntax should be clear, readable, friendly, and consistent. It should be easy to learn and use for those who are new to programming. The language should be accessible to a wide range of users, including those with disabilities. Furthermore the syntax should be flexible, extensible, and customizable, so that developers can tailor aspects of the language to their specific needs and preferences. -For now S-expressions are probably sufficient, there is not a huge need for nice syntax. Syntax is one of the most boring aspects of building a language - there's not a lot you can innovate upon (except maybe using a large language model for parsing) +S-expressions are sufficient for the core language, there is not a huge need for nice syntax. Syntax is one of the most boring aspects of building a language - there's not a lot you can innovate upon (except maybe using a large language model for parsing). So in Stroscot the syntax is defined as a compiler library. Customization ============= Lots of old crappy languages are still in use, terrible design choices or not. Most people can get used to almost any syntax, and some will even come to like it, even if that syntax performs terribly vs other options in controlled studies. So in that sense, most PL syntax decisions do not really have a "bad" option - whatever syntax we end up on will be the syntax used, each individual choice will probably not impact language adoption that much, and even if there are a lot of complaints about the syntax, there probably will not be a consensus so it will stay unchanged for many years. Since the syntax doesn't really matter, it makes sense to make it pluggable and customizable - switch out the standard library, switch out the syntax. This allows supporting use cases like non-English coding and demonstrations of new/changed syntax. Most likely common uses will only be simple formatting customizations like indentation size, tabs vs spaces, line length, and layout vs braces, but once there is a syntax definition file it makes sense to allow putting more things in it. One specific issue is allowing per-checkout customization, so someone can write a git smudge/clean filter that runs the formatter on checkout/commit. -On the other hand, some choice of syntax is necessary for writing the compiler, standard libraries, and documentation examples. Let's call this the "default syntax", since it makes sense to use the same syntax for all of these. Although the impact of individual choices may be minor, overall the default syntax is quite important - it defines the "feel" of the language and is probably the biggest factor of whether people try out the language or close the homepage. As a rough sketch, using numbers from https://blog.mozilla.org/metrics/2009/08/11/how-many-firefox-users-customize-their-browser/, about 1/3 of people will bother to use some amount of customization, while 2/3 will use the default settings. The 1/3 that customize obviously have their reasons for wanting the customization. 
But with a million users, 2/3 of a million is quite a lot. Syntax is decided by usage, so the default syntax will most likely become the most popular. But it could also be that as the language becomes popular, an alternative syntax emerges and takes over, ending up in an evolution of the default syntax. So none of the choices of the default syntax are final, but neither should they be arbitrary. It's similar to the discussion of IDEs, where I settled on VSCode - of course I'm not going to force a choice of IDE, but neither am I going to spend much effort on supporting other IDEs, at least until the language gets popular and contributors start working on IDE support themselves. +On the other hand, some choice of syntax is necessary for writing the compiler, standard libraries, and documentation examples. Let's call this the "default syntax", since it makes sense to use the same syntax for all of these. Although the impact of individual choices may be minor, overall the default syntax is quite important - it defines the "feel" of the language and is probably the biggest factor of whether people try out the language or close the homepage. As a rough sketch, using numbers from `Firefox `__, about 1/3 of people will bother to use some amount of customization, while 2/3 will use the default settings. The 1/3 that customize obviously have their reasons for wanting the customization. But with a million users, 2/3 of a million is quite a lot. Syntax is decided by usage, so the default syntax will most likely become the most popular. But it could also be that as the language becomes popular, an alternative syntax emerges and takes over, ending up in an evolution of the default syntax. So none of the choices of the default syntax are final, but neither should they be arbitrary. It's similar to the discussion of IDEs, where I settled on VSCode - of course I'm not going to force a choice of IDE, but neither am I going to spend much effort on supporting other IDEs, at least until the language gets popular and contributors start working on IDE support themselves. Some studies use a "Randomo" language which randomizes design choices. It would be useful to implement syntax randomization as part of the customization, so choices could be compared and tested. Basically this uses the per-checkout customization and stores code in the repository as a non-readable base64 or Lisp-like AST dump. Then people get an initial randomized style, but can customize it to their liking, and once we reach 100 users we collect styles/opinions and decide on a final choice as default. @@ -20,9 +20,9 @@ Principles Syntax is essentially the UI of the language. As such UI guidelines apply. One such book is `The Humane Interface `__, we can go through the principles: * Modelessness – in a programming context, this means the same program always produces the same output. It is a bit harder to interpret this principle in the context of language because language is naturally ambiguous with homophones and contextual meanings. -* Monotony of design – there should be only one way to accomplish a certain atomic task in an application, as opposed to toolbar button, menu dropdown, and keyboard shortcut. 
This principle makes sense when applied to programming design, one syntax for a given language feature, but is perhaps a bit subtle in the definition of "atomic" - for example Haskell has ``let`` and ``where`` which are almost identical syntaxes, except one puts the main expression above the definitions while the other puts the main expression below. +* Monotony of design – there should be only one way to accomplish a certain atomic task in an application, as opposed to toolbar button, menu dropdown, and keyboard shortcut. This principle makes sense when applied to programming design, one syntax for a given language feature, but is perhaps a bit subtle in the definition of "atomic" - for example Haskell has ``let`` and ``where`` which are almost identical syntaxes, except one puts the main expression above the definitions while the other puts the main expression below. A similar principle is Steelman 1E "The language should have uniform syntactic conventions and should not provide several notations for the same concept." * Universal undo/redo – in programming undo/redo is handled by a combination of the IDE and VCS; the IDE does a limited undo/redo for small changes, and the VCS saves changes persistently and allows full control. Certainly a universal system for programming would be a better UI but it would most likely require making a new IDE. That will have to wait. For debugging there is also reversible debugging that allows undo/redo of execution. -* Elimination of warning screens – modern software applications often ask the user "are you sure?" before some potentially harmful action. In the context of programming, I think this is most similar to Rust's "unsafe" blocks or Kotlin's "experimental API opt-in" annotations. I agree with Raskin's argument that they are unhelpful because users tend to ignore them out of habit - in an IDE it just adds a monotonous "auto-fix all the stupid compiler issues" step. +* Elimination of warning screens – modern software applications often ask the user "are you sure?" before some potentially harmful action. In the context of programming, I think this is most similar to Rust's "unsafe" blocks or Kotlin's "experimental API opt-in" annotations. I agree with Raskin's argument that they are unhelpful because users tend to ignore them out of habit - in an IDE it just adds a monotonous "auto-fix all the stupid compiler issues" step. Similarly Steelman 1E "No arbitrary restriction should be imposed on a language feature." * Universal use of text – this is discussed more in the `Text-based`_ section, certainly it seems that an icon-based language would be less usable Another set of principles is Constantine and Lockwood's principles of usage-centered design: @@ -48,6 +48,15 @@ Defaults should match typical usage patterns. * Everything implicit: verbosity adds cognitive overhead and does not significantly reduce the amount of context the reader must remember * do not try to prevent developers from misusing features +There is also Steelman 2B: "The language shall have free form syntax and should use familiar notations where such use does not conflict with other goals." As discussed next, I am not sure what familiar notations are usable besides maybe arithmetic. + +https://www.youtube.com/watch?v=f3rP3JRq7Mw 15:08-15:59 Avoid adding special cases in your syntax just to save a few bytes. It doesn't really matter how long your code is. 
For example, Lua has a syntax for writing down a literal key-value table, and between each key-value pair, you put either a comma or a semicolon. It's very simple and straightforward. One guy worked out that in many cases you could avoid having to put this comma or semicolon in. It wouldn't always work, but sometimes you could do it, and it could save 200-300 bytes in his program. But it didn't work all the time, and if you omitted the separator in the wrong place you might get something behaving completely differently. So in the end special cases just make code more difficult to write and understand. + +Steelman 4A. The parsing of correct expressions shall not depend on the types of their operands or on whether the types of the operands are built into the language. + +4F. The precedence levels (i.e., binding strengths) of all (prefix and infix) operators shall be specified in the language definition, shall not be alterable by the user, shall be few in number, and shall not depend on the types of the operands. + +4G. If present, explicit parentheses shall dictate the association of operands with operators. The language shall specify where explicit parentheses are required and shall attempt to minimize the psychological ambiguity in expressions. [Note that this might be accomplished by requiring explicit parentheses to resolve the operator-operand association whenever a nonassociative operator appears to the left of an operator of the same precedence at the least-binding precedence level of any subexpression.] Design procedure ================ @@ -88,9 +97,6 @@ Survey validity According to :cite:`tewFCS1LanguageIndependent2011` there are two important classes of validity. First is content: establishing the topics to be surveyed, and ensuring they have reasonable coverage of the subject area. IMO Stefik failed on this point - he just picked some basic Java-style keywords and constructs. Hence his research was quite limited - he didn't systematically go through every design choice possible in a programming language. :cite:`tewDevelopingValidatedAssessment2010` went through "widely adopted" introductory textbooks to select a set of CS1 topics, but ended up with more than 400 topics - they pruned them to 29 by limiting to concepts missing from at most one textbook. And with the focus on "wide adoption" they ended up including OO but not FP. The fact that Tew tested their exam only on Java, Matlab, and Python programmers is telling. Since Stroscot is a functional logic programming language, it will likely have some different fundamental concepts, and "wide adoption" is not necessarily the right inclusion criteria. ChatGPT might help here - it knows the basic concept clusters used in programming. The PL tasks should be a mixture of basic tasks common to all languages (operations, control, data structures) and Stroscot-specific tasks that showcase its unique features. But it is quite important to pick representative tasks first, identify their semantics, break them down into constructs and "how do I do this task", and only then apply the principle of "form follows function" to magic up a syntax. But there is also some requirement to differentiate the tasks and avoid overlap - putting two questions with no significant differences on the survey will likely end up with identical responses, or even worse, variant syntaxes for writing the same thing. -15:08-15:59 Avoid adding special cases in your syntax just to save a few bytes. It doesn't really matter how long your code is. 
For example, Lua has a syntax for writing down a literal key-value table, and between each key-value pair, you put either a comma or a semicolon. It's very simple and straightforward. One guy worked out that in many cases you could avoid having to put this comma or semicolon in. It wouldn't always work, but sometimes you could do it, and it could save 200-300 bytes in his program. But it didn't work all the time, and if you omitted the separator in the wrong place you might get something behaving completely differently. So in the end special cases just make code more difficult to write and understand. - - A secondary form of validity is construct validity. This ensures that the survey is actually measuring what it is designed to measure, rather than something else. Without some procedures in place, it is easy to write bad questions. They can be unclear, resulting in participants answering the wrong question. They can be biased with "leading questions", resulting in canned answers rather than useful data. Unfortunately, some amount of priming is necessary, because novices do not know what the basic syntactic constructs of a language are. If you give novices a blank page and ask them to design a programming language, you will most likely get a simple language with glaring deficiencies. But similarly if you ask a novice "What syntax should be used for the if-else statement?" there is not much leeway in the question - most likely they will use the if and else keywords. So the wording of a question can be quite important. ChatGPT can probably help here a lot by devising neutral wording that avoids prompting with too much of the answer. For further verifying construct validity, there are various sanity checks to do: @@ -128,7 +134,8 @@ The first question is the medium. Most code will be read on a computer screen. C But let's go through the findings. -* Many sources mention in passing that consistency improves readability. In particular, reading is disrupted when unconventional layouts, colors, or fonts are used, or when different values of such attributes are mixed in unconventional ways. :cite:`beierTypefaceLegibilityDefining2009` This seems generally applicable. The IDE dictates fonts / spacing / coloring so we should ensure good IDEs are used for Stroscot. +* Many sources mention in passing that consistency improves readability. In particular, reading is disrupted when unconventional layouts, colors, or fonts are used, or when different values of such attributes are mixed in unconventional ways. :cite:`beierTypefaceLegibilityDefining2009` This seems generally applicable. The IDE dictates fonts / spacing / coloring so we should ensure good IDEs are used for Stroscot. :cite:`walkerFontTuningReview2008` found it is 2% faster (15ms) to read a word if the preceding word is in the same font. Similarly there is also a short-term memory "font tuning" effect where one adapts to the glyphs of a specific font and slows down if a glyph from a random readable font is inserted. :cite:`beierHowDoesTypeface2013` found that this "font tuning" adaptation increases with exposure, so that reading the same font for 20 minutes improves the speed of reading paragraphs by 6.3% (0.43 paragraphs or 11wpm). Even after significant font exposure there is also a "familiarity" effect where common fonts such as Times and Helvetica do 9.4% better (0.64 paragraphs or 16wpm) than novel fonts. This could be due either to a quality effect, where common fonts are simply better, or to an exposure effect on longer timescales, e.g. 
that exposure to a font still increases reading speed in that font weeks or years later. The 20-minute exposure effect is the longest tested in the literature. Since long-term memory is anything longer than 30 seconds and observers can fairly accurately pick out a photo exposed for 3 seconds, even hours or days later, any training effect can probably be entirely explained as a function of the cumulative exposure/learning of the visual patterns of letters. +* Even so, just letting people pick a style they like does not work. Wallace found that people's preferred font (as found by asking "which font is this paragraph more readable in?") has no correlation with their fastest font among 5 fonts, and that the best font among 16 tested varied randomly per individual. So an actual timed test is really the only option to obtain optimal performance. Wallace found a 14% improvement in WPM reading speed over the subjectively preferred font and 35% over the worst font. Unfortunately Wallace's study did not control for exposure so they may just be measuring noise. Exposure accounted for 30ish wpm in Beier's study and in Wallace's study the difference between fastest and preferred was 39wpm. * The general rule for spacing is that it should be significantly larger than its next-smallest spacing unit to clearly identify the boundary it marks. * Intra-letter spacing should be significantly smaller than inter-word spacing to allow identifying words. @@ -185,7 +192,6 @@ Another study :cite:`buseMetricSoftwareReadability2008` identified factors for r They constructed several models using these factors, mainly a Bayesian classifier, all of which predicted average readability scores better than the original human raters. But the model is not public. An Ada guideline is "language constructs which do not express similar ideas should not look similar". - Proportional fonts ------------------ @@ -209,7 +215,7 @@ The advantage of tablike spaces over elastic tabstops is that the underlying tex DSLs ---- -Stroscot aims to be a "pluggable" language, where you can write syntax, type checking, etc. for a DSL and have it embed seamlessly into the main language. This may be possible due to the fexpr semantics, which allows pattern-matching the AST of any expression, like ``javascript (1 + "abc" { 234 })``, or may need more work to also do character-level embedding or seamless integration of parsers / escape sequences. +Stroscot aims to be a "pluggable" language, where you can write syntax, type checking, etc. for a DSL and have it embed seamlessly into the main language. This may be possible due to macros, which allows pattern-matching the AST of any expression, like ``javascript (1 + "abc" { 234 })``, or may need special handling in the parser to also do character-level embedding or seamless integration of parsers / escape sequences. Example DSLs: @@ -252,7 +258,7 @@ Per `Coding Horror `__, syntactic sugar is a shorthand for an operation canonically expressed in a more verbose form. I would say that syntactic sugar can be clearly identified by being able to write the translation as an unconditional function or macro, like the parser combinator ``some x = x <|> some x`` or the for loop: @@ -335,7 +341,7 @@ People aren't omniglots, so using multiple languages will cause library fragment Per `Wikipedia `__ English has the most speakers, 1.452 billion, while the next, Standard Chinese, has 1.118 billion, and the next (Hindi) less than half English. 
If we count "second language" liberally, English is as high as 2 billion while Standard Chinese is only 1.5 billion, so the gap only increases slightly. And calculating growth rates from `2021 `__ and earlier, English increased by 7.7%-9.8%/year while Chinese has remained mostly steady at -0.1% to 3.3%/year. Per `this WP page `__ English websites are 61.1% English, 1.7% Chinese, while internet users are 25.9% English, 19.4% Chinese. The number of Chinese websites is probably skewed low because most Chinese content is on social sites rather than independent sites, and the firewall makes it hard to index. Still though, across all of these statistics, there is a clear pattern of English being first. -Choosing Standard Chinese also has political problems since the speakers are mainly "native" speakers in China that have been indictrinated via the CCP systematically targeting ethnic minorities and forcing them to learn Standard Chinese in place of their original dialect. In contrast English is mainly a second language - its speakers are spread across many countries, and for the most part learn it as a course in school supplemented with additional voluntary self-education. +Choosing Standard Chinese also has political problems since the speakers are mainly "native" speakers in China that have been indoctrinated via the CCP systematically targeting ethnic minorities and forcing them to learn Standard Chinese in place of their original dialect. In contrast English is mainly a second language - its speakers are spread across many countries, and for the most part learn it as a course in school supplemented with additional voluntary self-education. Also Chinese is `just plain hard `__ to learn and remember. Per that article it takes 7-8 years to learn 3000 Chinese characters but half that time to learn a comparable number of French or Spanish words. Then there is the `character amnesia `__ problem where people can read the character just fine but forget how to write it by hand, only remembering the pinyin Latin-based transcription. @@ -581,6 +587,28 @@ Language syntax does not generally allow bidi overrides, but they can show up in The solution presented in the paper is to ban unterminated bidi override characters within string literals and comments. This prevents reordering across string and comment boundaries, but poses no restrictions on well-terminated uses of bidi reordering. There are more restrictive solutions like completely banning bidi overrides, but the paper's solution is sufficient to prevent the attack, so seems better. +Reduced representation +---------------------- + +Per Steelman 2A "The full set of character graphics that may be used in source programs shall be given in the language definition." We can already state this is the set of Unicode characters, pretty much. + +Steelman 2A has this idea of a reduced representation. "Every source program shall also have a representation that uses only the following 55 character subset of the ASCII graphics: [...] Each additional graphic (i.e., one in the full set but not in the 55 character set) may be replaced by a sequence of (one or more) characters from the 55 character set without altering the semantics of the program. The replacement sequence shall be specified in the language definition." 55 is too small, it doesn't even include lowercase letters. But there remains the idea of using a small number of characters found on every keyboard. 
Per ChatGPT this list is:: + + A-Z + a-z + 0-9 + space + .,;: + ()[]{} + +-*/ + =_!?#&%^|\~<> + "' + `@$ + +That's a total of 26+26+10+1+4+6+4+13+2+3=95 characters. + +So then we need some sort of compatibility syntax so every program / module using Unicode can also be expressed / interacted with using this reduced character set. I am not sure how to do this in a readable manner; there are short aliases defined for emoji, and HTML entities such as lceil, but they don't cover the whole Unicode space. So for example the only way to input 〖 seems to be ``\u3016`` which of course gives no advice to the reader, or else the full Unicode name "Left White Lenticular Bracket" which is a bit long to use. The full name does have the advantage of being ASCII and so on, so maybe it could work as an interim solution. Perhaps ChatGPT or another language model can create decent aliases for all of Unicode. + Natural language ================ @@ -675,22 +703,12 @@ There is a general convention for the standard library to use lowercase for pote Recursive definitions --------------------- -We want to support mutually recursive definitions, like so: +We want to support recursive definitions, like so: :: a = 1 : a - b = 1 : c - c = 1 : b - -And also sequential execution, like so: - -:: - - a = openFile "a.txt" - b = openFile "b.txt" - And also shadowing variables, like so: :: @@ -699,16 +717,34 @@ And also shadowing variables, like so: -- interpreted as a_new = a_old + 1 -In the recursive version ``c`` can be in scope in the body of ``b`` even though it is defined later. Presumably it isn't in scope in the sequential version. -In the recursive version ``a`` is in scope in its own body. In the shadowing version ``a`` is not. - -Resolving this probably means a special syntax for something. Choices: -* ``a <- openFile "a.txt"`` for sequenced actions -* ``a <- a + 1`` for shadowing +It is a bit tricky as we could also interpret ``a = 1 : a`` as a shadowing definition ``a_new = 1 : a_old``. The clever solution is to say it shadows if the variable is already in scope, otherwise it is recursive, but I think it is too clever and will cause confusion. So resolving this probably means a special syntax for something. Choices: +* ``new a = a + 1`` for shadowing * ``rec { a = 1 : a }`` for recursive definitions -Generally sequential blocks do not use recursion. But recursion is used all the time at the module/function level. +The ``rec`` I think is too noisy as we would similarly need it for mutually recursive variables:: + + rec { + b = 1 : c + c = 1 : b + } + +In the recursive version ``c`` can be in scope in the body of ``b`` even though it is defined later. It is just a weird syntax as it is not obvious that b/c are in scope outside the block. OTOH, per the discussion in State, MonadFix is only applicable in certain circumstances so it would make sense to have a noisier syntax so that the error messages are clearer. Also it alerts that ``rec`` is expensive, like ``rec {x = x * 2}`` will solve algebraically to get ``x=0`` rather than just doing the least fixed point. + +OTOH recursive functions are really convenient, it would be a pain if ``fact x = if x==0 then 1 else x * fact (x-1)`` needed an extra keyword like ``rec``. + +Meanwhile shadowing is banned in Coffeescript, Zig, and Elm, and frowned upon in `C++ `__. In `Rust `__ it is common but the assignment syntax uses ``let``. I think requiring a keyword is fine, it is pretty uncommon. 
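For concreteness, a sketch of how the proposed keywords would read side by side (not final syntax):

::

    count = 0
    new count = count + 1          -- shadowing: the right-hand side sees the old binding

    rec { ones = 1 : ones }        -- explicitly recursive value definition

    fact x = if x == 0 then 1 else x * fact (x-1)   -- functions recurse without a keyword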
+ +Then there is sequential execution, with the monad stuff, like + +:: + + a = openFile "a.txt" + b = openFile "b.txt" + +Here it would be an error if the operation for ``a`` tried to refer to ``b`` (because there is no MonadFix continuation instance). + +There is also the question of how module declarations work - technically these should be assignments somehow. Type declarations ================= @@ -741,6 +777,8 @@ Comments allow writing documentation inline with the code. This speeds up develo Tool support can be incomplete because there is a lot of freedom in comments. People can put comments in random places and they can be attached to whatever and indented strangely. With such freedom the reformatter will likely mangle some comments, but probably people will just learn not to do that. +The compiler should always ignore comments; they should have no effect on the semantics of the program. + Shebangs -------- @@ -788,16 +826,13 @@ Using two characters to start a comment helps prevent the accidental starting of Multiline block comments have the issue of forgetting the end terminator and matching some other end terminator. Some languages only have EOL comments, presumably to avoid this problem. Nesting solves this because there will be an unterminated comment. Similarly forbidding block start indicators from appearing in a block comment will work. The compiler can also check each line and see if it looks like code, although this prevents commenting out code. -Identifiers -=========== - -Whitespace in identifiers... this doesn't work well with Haskell syntax. With whitespace ``do something = ...``` would define the identifier ``do something``, but in Haskell it's a clause ``do _`` that binds ``something``. +Linebreaks +========== -OTOH using a string works fine: ``"do something" = ...`` +Steelman 2D has this intriguing requirement "Lexical units (i.e., identifiers, reserved words, single and multicharacter symbols, numeric and string literals, and comments) may not cross line boundaries of a source program." This is in contrast to other languages, where backslashes can be used to continue lines. -You could also make something an atom, then you can write ``do something`` in code but the clause definition is ``do ^something = ...``. The semantics are similar to a single identifier but different enough that I don't think it counts. +Certainly the days of 80-character terminals are gone, and most editors support soft wrapping. Also for comments, it seems pretty stupid to not allow multiline comments. Tinman H5 clarifies that the requirement has almost no effect as a lexical unit may be composed from multiple sub-units spanning multiple lines. -The alternatives are snake case, camel case, kebab-case, and so on. I think all should be allowed. Indentation =========== @@ -826,7 +861,7 @@ Lattner's opinion seems representative; on that Twitter thread and in the Google Layout ====== -Indentation-sensitivity (IS) like Python and Haskell seems great. Drawing from `Reddit `__, the advantages: +The arrangement of characters in the document is primarily to assist the human reader. However, as Python and Haskell have shown, making the interpretation of the program indentation-sensitive (IS) can avoid mismatches between the compiler and the reader. A brace format, in contrast, can lead to confusing situations where the compiler interprets a missing brace unexpectedly. Drawing from `Reddit `__, the advantages: * IS requires less typing.
All modern languages are presented with indentation, so IS is just omitting the hard-to-type curly braces or begin-end markers. * IS avoids the issue of braces mismatching indentation. In "An Empirical Investigation into Programming Language Syntax", misplaced end tokens were the most common error. It also avoids confusion or arguments about where to put the braces - `WP `__ lists 8 different styles. @@ -889,8 +924,47 @@ because renaming ``unstable`` will require reindenting the rest of the expressio +The most obvious instance of layout is the module declaration list, but blocks, lists, and records can start layout as well. For readability, clauses may span multiple lines, so some way of distinguishing the start / end of clauses must be defined. This amounts to adding braces and semicolons so as to make it layout-insensitive. The inserted braces are "virtual": they don't grammatically match with braces explicitly written in the file, but are otherwise semantically identical. +:: + assertEqual + { + a + b + c + d + } + { a {b; c}; d} + +Behavior of a new line depends on its indentation level, relative to the indentation of the previous line. Indentation level is taken to be the sequence of whitespace characters related by "is prefix of", so "space, tab, space" is less than "space, tab, space, em space", more than "space, tab", and incomparable to "tab, space, space". Incomparable levels are an error. + +* If it is indented more, it's a sequence given as an argument to the previous line, so a virtual open brace is inserted +* If it is at the same level, another item in the sequence, so a (virtual) semicolon is inserted +* If there is a line at lower indentation (or EOF), the sequence is ended as it's a new declaration (`off-side rule `__). A virtual close brace is inserted at the start of the line. + +Layout handling is complicated by the presence of grammar rules without layout that allow free choice of indentation, e.g. + +:: + + assertEqual + a + + b + + c + a {+ b; + c} + a + (b + c) + +It should be possible to handle these with a fixup phase. + +Also, closed operators (e.g. parentheses) inhibit layout; this amounts to skipping whitespace layout when inside an explicit delimiter pair. But of course constructs inside the delimiter pair can start another layout. Finally for constructs that usually use layout we still want to parse 1-line things without braces: + +:: + + assertEqual + let a = b in c + let { a = b } in c + +Steelman 2D says "All key word forms that contain declarations or statements shall be bracketed (i.e., shall have a closing as well as an opening key word)." This was written before indentation sensitive languages were a thing, so can be discounted. @@ -937,8 +1011,8 @@ Another option for blind people is the Braille display, but it is expensive and `emacspeak `__ has speech-enabled python-mode and `per ML thread `__ reads things like "closes block " on dedent. But it seems like it is hard to install and not really that popular. -Braces and brackets -=================== +Delimiters +========== Haskell uses parentheses for most grouping, ``{}`` for replacing whitespace with explicit syntax, ``[]`` for lists, and has no special meaning for angle brackets. @@ -948,11 +1022,29 @@ Seems a bit weird, he cites Python as an example but Python still uses list lite Julia doesn't require parens around conditions in ``if`` and ``while``, ``if a == b`` instead of ``if (a == b)``. -Ada uses round brackets for array component references.
use round rather than square brackets "The advantage of this is a uniform notation within an expression for reference to array components and for function calls. The same argument of uniform referents justifies using the same syntax for component selection of records." +Ada uses round brackets for array component references. "The advantage of this is a uniform notation within an expression for reference to array components and for function calls. The same argument of uniform referents justifies using the same syntax for component selection of records." + +For brevity, trailing delimiters can be omitted: + +:: + + assertEqual + 3+1/(7+1/(15+1/1 + 355/113 + +This goes directly against Steelman 2D "Programs may not contain unmatched brackets of any kind", but Tinman H8 admits that it makes programs easier to write; it's a useful convenience for saving writing extra parentheses. To address concerns about correctness, you can set Stroscot to warn/error/autofix unmatched delimiters. This is one of the innovations Ada doesn't have, the ability to have multiple language dialects under a flag. + +6B. The language shall not impose arbitrary restrictions on programming style, such as the choice between statement terminators and statement separators, unless the restriction makes programming errors less likely. + +According to the Green rationale, a study found that terminators were less error-prone than separators. Function syntax =============== +Steelman 7A. Functions (which return values to expressions) and procedures (which can be called as statements) shall be definable in programs. Functions or procedures that differ in the number or types of their parameters may be denoted by the same identifier or operator (i.e., overloading shall be permitted). [Note that redefinition, as opposed to overloading, of an existing function or procedure is often error prone.] + +Steelman 5G. The language shall distinguish between open scopes (i.e., those that are automatically included in the scope of more globally declared variables) and closed scopes (i.e., those in which nonlocal variables must be explicitly imported). Bodies of functions, procedures, and processes shall be closed scopes. + Stroscot has first-class functions with lexically scoped name binding. Lambdas are defined using whatever syntax. The ``\x.y`` style is closest to the mathematical notation (barring Unicode), Haskell uses ``\x -> y``, Cliff likes ``{x => y}``. @@ -1124,6 +1216,13 @@ The condition can be split between a common discriminator and individual cases. Assignment ========== +Steelman 5A. It shall be possible to declare constants of any type. Such constants shall include both those whose values are determined during translation and those whose value cannot be determined until allocation. Programs may not assign to constants. + +Steelman 5D. Procedures, functions, types, labels, exception situations, and statements shall not be assignable to variables, be computable as values of expressions, or be usable as nongeneric parameters to procedures or functions. + +Stroscot does allow sets to be assigned to variables, also exceptions and labels and statements (blocks). Procedures and functions are a bit strange; you can store the symbol of the procedure/function, and you can store a lambda, but the rewrite rule itself is stored in a module and you can't really access it alone except as an AST. + + `Discussion `__.
Stroscot's assignment syntax is complicated because I want separate initialization (declarative assignment) and reassignment (mutating assignment). .. list-table:: Comparison @@ -1215,10 +1314,14 @@ would have an unused return value. Maybe this value could be marked as optional Conditionals ============ +Steelman 6C. There shall be conditional control structures that permit selection among alternative control paths. The selected path may depend on the value of a Boolean expression, on a computed choice among labeled alternatives, or on the true condition in a set of conditions. The language shall define the control action for all values of the discriminating condition that are not specified by the program. The user may supply a single control path to be used when no other path is selected. Only the selected branch shall be compiled when the discriminating condition is a translation time expression. + There is a "unified" syntax by `Simon `__, also a very similar paper on the "ultimate" syntax :cite:`parreauxUltimateConditionalSyntax`. Interesting idea, but still has to be tested real-world somehow. Per the internet, Ruby's ``unless-else`` is unintuitive. Only support ``if-else`` and ``unless`` without the else. Also ``if not`` is a possible replacement for ``unless``. +Per :cite:`ichbiahRationaleDesignADA1979` there is the `dangling else `__ ambiguity. I like Haskell's solution, there is a separate keyword for if-then, ``when () {}``, and if can only be used as ``if-then-else``. Else-if chains are pretty much necessary. + Tuples and records ================== @@ -1236,7 +1339,7 @@ Precedence Julia has juxtaposed multiplication. `Jump `__ recommends limiting to cases where RHS is a symbol, like ``2 pi``. -`This post `__ describes a mistake in C-style precedence: it should be ``&&, ==, &`` but is instead ``&&, & ==``, causing a footgun. "Swift, Go, Ruby and Python get it right." +`This post `__ describes a mistake in C-style precedence: it should be ``&&, ==, &`` but is instead ``&&, &, ==``, causing a footgun. "Swift, Go, Ruby and Python get it right." Per Tremblay, Experiments in language design (Gannon, 1975) have indicated that APL's "natural" right-to-left rule is in fact harder to understand than simple priority rules. Gannon, J. D.: "Language Design to Enhance Programming Reliability," Report CSRG-47,Computer Systems Research Group, University of Toronto, Toronto, Canada, January 1975. @@ -1268,7 +1371,7 @@ For example, reversing a list:: % [2,1] -In practice Prolog syntax is pretty bad; programs are heavy on meaningless intermediate variables such as ``R`` in the above. +In practice Prolog syntax is pretty bad, due to being so uniform; programs are heavy on meaningless intermediate variables such as ``R`` in the above. Functions --------- @@ -1402,91 +1505,52 @@ Scala and JS have support for native XML literals. Scala had one syntax in 2 and All these support interpolation, so that's clearly a necessary feature. -Identifier conventions -====================== - -Per Binkley, a study of real-world code found identifiers were 43.8% 1 part, 29.5% 2 parts, 16.4% 3 parts, 6.2% 4 parts, and then we can calculate that 5+ parts were 4.1%. Longer identifiers require more time to read, due to requiring more fixations - Binkley found that going from 2 to 3 identifier parts increased the cloud word recognition task time by 0.981 seconds. 
For identifiers of one part there is only uppercase vs lowercase (ignoring stUDly CaPs), but for more parts we can choose how to join them: snake case (underscore style), camel case, kebab case (hyphen style), or Unicode characters like a shaded box ░ or half box ▆. - -* The shaded box performed well in Epelboim's study, being only 4% harder to read than normal text in conditions such as "░this░ ░is░" and "░░an░░example░░". Unfortunately the pattern of interest "some░filler" was untested, but just reading it with my eyes it doesn't seem too hard to read. Visually, a shaded box should allow easily distinguishing the extent of an identifier. Of course typing the shaded box character is somewhat difficult as it's not on standard keyboards. - -* The underscore style provides clear separation between parts when looking at the top, but may require a little work to see the underscore and identify the extent of an identifier. Sharif's figure 4 of underscore fixations suggests that underscore-trained programmers do not need any extra fixations to see the underscores, but that the last identifier part may have a longer fixation. There is also a "bumpy case" variation that uses double underscores, I don't think this improves readability. - -* Kebab case moves the line upwards, closer to the centers of the fixations, and may thus make it easier to determine identifier extents. The half box could provide a similar benefit. Searching Google Scholar for kebab case produced no academic studies on readability. - -* Camel case requires identifying parts by letter height variation, which is visually difficult and requires longer or multiple fixations. Camel case does however provide good extent identification. In long lines of identifiers it is probably as bad as text with no spaces, which takes 44% mean percent longer to read. Binkley and Sharif found 13.5% and 20% longer times with camel case over underscores for the cloud task. 2-part identifiers take approximately the same time as underscores - it is 3-part identifiers that lose big. - -* Whitespace - there are ways to allow true multi-word identifiers using the space character, but these suffer greatly from extent identification because there is visual indication at all. - -When reading there is an accuracy vs speed tradeoff. If something is really easy to read, one tends to just skim it and gloss over small mispellings. Because camel case is harder to read, in Binkley's studies the participants took longer to answer the questions and had slightly improved accuracy overall compared to snake case. There was however one contrary question: ``attackerHealth/attackedHealth`` was easily confused in camel case but in snake case ``attacker_health/attacked_health`` it was more distinguishable due to the stem of the d. - - so far have not had enough statistical power to state any effects. Binkley found, if there are effects, in his studies they were most likely that training on an identifier style increased the speed of reading that style but made it harder to read other styles. Training on a style may also make it harder to read normal prose that had been randomly chunked into identifiers of that style. +Identifiers +=========== -:cite:`walkerFontTuningReview2008` found it is 2% faster (15ms) to read a word if the preceding word is in the same font. :cite:`beierHowDoesTypeface2013` found that this "font tuning" adaptation increases, so that reading the same font for 20 minutes improves the speed of reading paragraphs by 6.3% (0.43 paragraphs or 11wpm). 
Even after significant font exposure there is also a "familiarity" effect where common fonts such as Times and Helvetica do 9.4% better (0.64 paragraphs or 16wpm) than novel fonts. +Let's go through Steelman 2E. -There may also be an effect on longer timescales, where exposure to a font increases reading speed in that font weeks or years later, but the 20-minute exposure effect was only +"Mnemonically significant identifiers shall be allowed." The mnemonically significant identifiers makes a lot of sense. I'm not sure what a language without them would look like - symbols like ``gen872060``? Even random numbers have some significance though. Maybe an esolang like Whitespace or Brainfuck that simply doesn't allow user-defined identifiers. Strange considerations aside, Stroscot allows defining identifiers so this requirement is satisfied. - There is probably a similar effect for 20-minute exposure to a given identifier convention (TODO: conifrm with formal study). Since long-term memory is anything longer than 30 seconds and observers can fairly accurately pick out a photo exposed for 3 seconds, even hours or days later, any training effect can probably be entirely explained as a function of the exposure to the visual patterns of letters. +The actual names chosen do affect readability. Binkley found that the phrases likely to be found in programs were answered incorrectly more often than prose phrases. This is likely due to the phrases being less memorable, e.g. it seems like it would be easier to confuse "full pathname" with "full pathnum" than "river bank" with "river tank" (unfortunately the most common distractor is not reported). +"There shall be a break character for use within identifiers." This is a bit trickier. There are many choices: whitespace in identifiers, snake case (underscore_style), camel case (CapsStyle), kebab-case, or Unicode characters like a shaded box ░ or half box ▆. In contrast for identifiers of one part there are really only 3 styles, Uppercase vs lowercase vs ALLCAPS (ignoring stUDly CaPs). Per Binkley, a study of real-world code found identifiers were 43.8% 1 part, 29.5% 2 parts, 16.4% 3 parts, 6.2% 4 parts, and then we can calculate that 5+ parts were 4.1%. Longer identifiers require more time to read, due to requiring more fixations - Binkley found that going from 2 to 3 identifier parts increased the cloud word recognition task time by 0.981 seconds. -There is also a short-term memory "font tuning" effect where one adapts to the glyphs of a specific font and slows down if a glyph from a random readable font is inserted, but this mainly means to avoid a mixture of identifier styles. It's worth checking if there is a tuning effect only for mixing identifiers of the same syntactic category or if having module names in camel case is slower if variables are in snake case vs camel case. +* The shaded box performed well in Epelboim's study, being only 4% harder to read than normal text in conditions such as "░this░ ░is░" and "░░an░░example░░". Unfortunately the pattern of my interest "some░filler" was untested, but just reading it with my eyes it doesn't seem too hard to read. Visually, a shaded box should allow easily distinguishing the extent of an identifier. Of course typing the shaded box character is somewhat difficult as it's not on standard keyboards. -The actual names also affect readability. Binkley found that the phrases likely to be found in programs were answered incorrectly more often than prose phrases. 
This is likely due to the phrases being less memorable, e.g. it seems like it would be easier to confuse "full pathname" with "full pathnum" than "river bank" with "river tank" (unfortunately the most common distractor is not reported). +* The underscore style provides clear separation between parts when looking at the top, but may require a little work to see the underscore and identify the extent of an identifier. Sharif's figure 4 of underscore fixations suggests that underscore-trained programmers do not need any extra fixations to see the underscores, but that the last identifier part may have a longer fixation. There is also a "bumpy case" variation that uses double underscores, I don't think this improves readability. -Theoretically, the best identifier style could vary by individual and depend on external factors such as font or eye correction. It's unlikely, because the eye-tracking data says that camel case needs more eye fixations, and that's a physical constraint of the human visual system that probably doesn't change by individual. But, automatically translating between different identifier styles is possible, so identifier style could be yet another option in the code formatter, along with a compile option to translate the style of libraries. +* Kebab case moves the line upwards, closer to the centers of the fixations, and thus most likely makes it easier to determine identifier extents than underscore style, although perhaps still performing worse than the shaded box ░. The half box ▆ could provide a similar benefit, probably again performing between underscore and shaded box. Searching Google Scholar for kebab case produced no academic studies on readability, but Lisp aficionados on Google say they prefer kebab to snake. -Even so, just letting people pick a style would not work. Wallace found that people's preferred font (as found by asking "which font is this paragraph more readable in?") has no correlation with their fastest font among 5 fonts, and that the best font among 16 tested varied randomly per individual. So an actual timed test is really the only option to obtain optimal performance. Wallace found a 14% improvement in WPM reading speed over the subjectively preferred font and 35% over the worst font. Unfortunately Wallace's study did not control for exposure so they may just be measuring noise. Exposure accounted for 30ish wpm in Beier's study and in Wallace's study the difference between fastest and preferred was 39wpm. +* Camel case requires identifying parts by letter height variation, which is visually difficult and requires longer or multiple fixations. Camel case does however provide good extent identification. In long lines of identifiers it is probably as bad as text with no spaces, which takes 44% mean percent longer to read. Binkley and Sharif found 13.5% and 20% longer times with camel case over underscores for the cloud task. 2-part identifiers take approximately the same time as underscores - it is 3-part identifiers that lose big. +* Whitespace - there are ways to allow true multi-word identifiers using the space character, but these suffer greatly from extent identification because there is no visual differentiation between within-identifier and between-identifier spacing. For example with Haskell syntax, ``do something = ...``` can either define the identifier ``do something``, or be a clause ``do _`` that binds ``something``. 
Even using a string, ``"do a really complicated thing" = ...``, it is a bit easy to miss the quotes at first glance and imagine it is a pattern. A solution like ``do ^something = ...`` is a bit tricky to evaluate - it matches the 2-atom ``do something``, so "do something" is not parsed as a single identifier but rather a term of two symbols. This allows some flexibility but also is distinct enough from an identifier-with-breaks that I don't think it counts. -This doesn't solve the issue of picking a default style for the standard library though. Non-programmers prefer the underscore style, and it's more readable, so that's probably what common variables should use. Since that puts us in the Python camp and Python is the most popular language we should probably just follow Python's identifier conventions uniformly. +When reading there is an accuracy vs speed tradeoff. If something is really easy to read, one tends to just skim it and gloss over small mispellings. Because camel case is harder to read, in Binkley's studies the participants took longer to answer the questions and had slightly improved accuracy overall compared to snake case. There was however one contrary question: ``attackerHealth/attackedHealth`` was easily confused in camel case but in snake case ``attacker_health/attacked_health`` it was more distinguishable due to the stem of the d. +So far the effects of training have not had enough statistical power to state any effects. Binkley found, if there are effects, in his studies they were most likely that training on an identifier style increased the speed of reading that style but made it harder to read other styles. Training on a style may also make it harder to read normal prose that had been randomly chunked into identifiers of that style. I would say, based on the studies of font tuning, there is likely is a tuning effect, and it is best to standardize on a single identifier style for common usage so as to encourage developers to tune their visual system to that style. But again based on the font tuning studies, this tuning effect is relatively small, less than 10%, and a developer will mostly be adapted within 20 minutes, so it is not a huge deal. -Layout -====== +One question I'd like addressed is if using different identifier part styles for different semantic categories also is subject to the tuning effect. For example, if variables are in snake case, does having module names in camel case significantly slow down skimming a module vs. having module names in upper snake case? I think a developer can get used to having a mixed set of identifier styles but maybe it is slower. -The arrangement of characters in the document is primarily to assist the human reader. However, as Python and Haskell have shown, making the interpretation of the program indentation-sensitive can avoid mismatches between the compiler and the reader. A free format, in constrast, can lead to confusing situations where the compiler interprets a missing brace unexpectedly. +Overall I think there are trade-offs for every break character, I don't see any reason to mandate the use of a particular style, other than forbidding whitespace due to implementation difficulties and lack of readability. For the standard library, kebabs should be the preferred style, as based on the limited evidence they are the most readable. The half box seems attractive for users that have access to full Unicode input. 
Another choice is the underscore style, as that puts us in the Python camp and Python is the most popular language currently. -The most obvious instance of layout is the module declaration list, but blocks, lists, and records can start layout as well. For readability, clauses may span multiple lines, so some way of distingishing the start / end of clauses must be defined. This amounts to adding braces and semicolons so as to make it layout-insensitive. The inserted braces are "virtual": they don't grammatically match with braces explicitly written in the file, but are otherwise semantically identical. +Theoretically, the best identifier style could vary by individual and depend on external factors such as font or eye correction. It's unlikely, because the eye-tracking data says that camel case needs more eye fixations, and that's a physical constraint of the human visual system that probably doesn't change by individual. But, automatically translating between different identifier styles is possible, so identifier style should be yet another option in the code formatter, along with a compile option to translate the style of libraries. -:: +"The language and its translators shall not permit identifiers or reserved words to be abbreviated. (Note that this does not preclude reserved words that are abbreviations of natural language words.)" - assertEqual - { - a - b - c - d - } - { a {b; c}; d} +You might think the difference between ``pos`` and ``position`` is trivial but it's not - one is a relatively common English word, and one is programming jargon. I haven't looked for a study but I find I often end up "mispelling" an identifier by typing it out fully when it is abbreviated, or vice-versa. I think given the advent of large screens, gigabyte storage, and autocomplete, the best policy is zero abbreviations - neither in user identifiers nor in reserved / language identifiers. A policy that identifiers should always be fully spelled out is consistent and memorable. -Behavior of a new line depends on its indentation level, relative to the indentation of the previous line. Indentation level is taken to be the sequence of whitespace characters related by "is prefix of", so "space, tab, space" is less than "space, tab, space, em space", more than "space, tab", and incomparable to "tab, space, space". Incomparable levels are an error. +Now regarding that the language should not permit abbreviations within the source, I have to disagree for reasons similar to allowing unmatched delimiters - at a REPL prompt the abbreviations are helpful in shortening the input necessary. But of course again it is worth turning on the warn/error/fix for an IDE. Notably, with IDE autocomplete, it is most likely harder to input an abbreviation than to input the full name. -* If it is indented more, it's a sequence given as an argument to the previous line, so a virtual open brace is inserted -* If it is at the same level, another item in the sequence, so a (virtual) semicolon is inserted -* If there is a line at lower indentation (or EOF), the sequence is ended as it's a new declaration (`off-side rule `__). A virtual close brace is inserted at the start of the line. +Reserved words +============== -Layout handling is complicated by the presence of grammar rules without layout that allow free choice of indentation, e.g. +Steelman 2F: "The only reserved words shall be those that introduce special syntactic forms (such as control structures and declarations) or that are otherwise used as delimiters. 
Words that may be replaced by identifiers, shall not be reserved (e.g., names of functions, types, constants, and variables shall not be reserved). All reserved words shall be listed in the language definition." -:: - - assertEqual - a - + b - + c - a {+ b; + c} - a + (b + c) - -It should be possible to handle these with a fixup phase. - -Also, closed operators (e.g. parentheses) inhibit layout; this amounts to skipping whitespace layout when inside an explicit delimiter pair. But of course constructs inside the delimiter pair can start another layout. Finally for constructs that usually use layout we still want to parse 1-line things without braces: - -:: - - assertEqual - let a = b in c - let { a = b } in c +I've never felt comfortable with reserved words, they always seem like restrictions on the use of the language. Why can't I define a variable named ``if``? It's just an arbitrary parser restriction - ``if`` is lexed as a "reserved token" and therefore it is not in the class of "identifier tokens" so the assignment grammar production does not match. But any human can read some code like ``if = 1; print(if+2)`` and understand the intent. In Java, they work around this issue with ``Class klazz`` and other weird mispellings. IMO it is better to just hack the parser so that you can write ``Class class`` like the programmer intended. So Stroscot shall have no reserved words. Operators ---------- +========= New operators can be declared with `mix `__ `fix `__ semantics, e.g. @@ -1525,23 +1589,15 @@ Most operators are textual: Minus is both a unary prefix operator and a binary infix operator with special support to disambiguate the two. ``(-)`` denotes the binary minus operator and ``neg`` the unary minus operation. -Parentheses ------------ - -Parentheses can be used to group expressions and override parsing as usual. - -For brevity, trailing parentheses can be omitted: +Expressions +=========== -:: +Steelman 4D. Expressions of a given type shall be allowed wherever both constants and variables of the type are allowed. +4E. Expressions that can be evaluated during translation shall be permitted wherever literals of the type are permitted. Translation time expressions that include only literals and the use of translation time facilities (see 11C) shall be evaluated during translation. - assertEqual - 3+1/(7+1/(15+1/1 - 355/113 - -Although it parses, you can set Stroscot to warn or error on unmatched parentheses, or run the code formatter which will add them. Chained Comparison ------------------- +================== :: @@ -1562,8 +1618,8 @@ Identifiers cannot be directly reassigned; you can shadow, which generates a war Mutable assignment (``:=``) is completely distinct from name binding (``=``). They have distinct notation and mutable assignment cannot create new bindings. -Functions -========= +Clauses +======= Sequential matching: @@ -1838,6 +1894,28 @@ Control structures These are things that can show up in blocks and have blocks as arguments. +Steelman 6A. The (built-in) control mechanisms should be of minimal number and complexity. + +Well, Stroscot aims for functionality, so maximal number and complexity. + +6A (cont'd). Each shall provide a single capability and shall have a distinguishing syntax. Nesting of control structures shall be allowed. + +This part seems fine. + +6A (cont'd) There shall be no control definition facility. + +Nope nope nope, definitely you will be able to define control structures. 
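+
+As a throwaway sketch of what a user-defined control structure could look like (written in Haskell rather than Stroscot, since Stroscot's surface syntax is not settled, and ``whileM`` is a name invented for this example), a loop is just an ordinary higher-order function::
+
+  import Data.IORef
+
+  -- a user-defined "while": an ordinary recursive function over monadic
+  -- actions, needing no special support from the language
+  whileM :: Monad m => m Bool -> m () -> m ()
+  whileM cond body = do
+    c <- cond
+    if c then body >> whileM cond body else return ()
+
+  main :: IO ()
+  main = do
+    i <- newIORef (0 :: Int)
+    whileM ((< 3) <$> readIORef i) $ do
+      readIORef i >>= print
+      modifyIORef i (+ 1)
+
+The same recipe covers ``unless``, ``repeat-until``, and the like; the open question is only how pleasant the resulting call syntax can be made.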
+
+6A (cont'd). Local scopes shall be allowed within the bodies of control statements.
+
+5G. The language shall distinguish between open scopes (i.e., those that are automatically included in the scope of more globally declared variables) and closed scopes (i.e., those in which nonlocal variables must be explicitly imported). Bodies of classical control structures shall be open scopes.
+
+This is pretty interesting, sort of a Python-style thing where variables are in scope in the whole body of a procedure.
+
+6A (cont'd). Control structures shall have only one entry point and shall exit to a single point unless exited via an explicit transfer of control (where permitted, see goto), or the raising of an exception.
+
+This is pretty vague with regard to what constitutes a "point", but I interpret it as the condition enforced on imperative program threads that they execute one action at a time and that this action must be chosen deterministically. It seems that Stroscot's use of the continuation monad enforces this structured approach, so it isn't an issue.
+
 ::

   a = if true then 1 else 2 -- just a function
   if_then_else_ : Bool -> a -> a -> a
@@ -1885,6 +1963,58 @@ The do-block always executes, and the while-block only executes if the condition

     processChar(c);
   }

+Steelman 6E. There shall be an iterative control structure. The iterative control may be exited (without reentry) at an unrestricted number of places. A succession of values from an enumeration type or the integers may be associated with successive iterations and the value for the current iteration accessed as a constant throughout the loop body.
+
+ https://www.ccs.neu.edu/home/shivers/papers/loop.pdf / https://www.youtube.com/watch?v=PCzNwWmQdb0
+
+Goto/Break/continue
+-------------------
+
+Steelman 6G. There shall be a mechanism for control transfer (i.e., the go to). It shall not be possible to transfer out of closed scopes, into narrower scopes, or into control structures. It shall be possible to transfer out of classical control structures. There shall be no control transfer mechanisms in the form of switches, designational expressions, label variables, label parameters, or alter statements.
+
+`Core `__ proposes to drop break and continue due to implementation complexity and mental complexity. He argues that using an extra boolean variable instead is clearer and only slightly clunkier. Per the `structured program theorem `__ it is possible to compute any computable function with three control structures: semicolon, if, and while (and no break/continue). There are drawbacks in that the theorem usually must introduce additional local variables and duplicate code. For example consider `this program `__::
+
+  start = state0
+  state0 | a0 = halt
+         | a1 = p01; state1
+         | a2 = p02; state2
+  state1 | a1 = halt
+         | a0 = p10; state0
+         | a2 = p12; state2
+  state2 | a2 = halt
+         | a1 = p21; state1
+         | a0 = p20; state0
+
+A translation into structured programming loosely based on the paper::
+
+  state = mut 0
+  halt = mut false
+  while !halt
+    if state == 0 then
+      if a1 then
+        p01; state := 1
+      else if a2 then
+        p02; state := 2
+      else halt := true
+    else if state == 1 then
+      if a2 then
+        p12; state := 2
+      else if a0 then
+        p10; state := 0
+      else halt := true
+    else
+      assert (state == 2) -- must be state 2
+      if a0 then
+        p20; state := 0
+      else if a1 then
+        p21; state := 1
+      else halt := true
+
+Notice this is longer than the original description using recursion, mainly due to the extra variables. S. 
Rao Kosaraju proved that with arbitrary-depth, multi-level breaks from loops it's possible to avoid adding additional variables in structured programming, but known algorithms still duplicate code. In common cases the duplication can be avoided by clever structuring though. + +Per https://hal.inria.fr/inria-00072269/document Table 5, the most common flow-affecting constructs in Java were (as a percentage of methods) return (65.5%), short-circuit operators (13.2%), single-level break (3.6%), single-level continue (0.3%), and labelled break/continue (0.13%). A `different comparison `__ (Table IV) finds that continue and goto are about equal in frequency in C, that synchronized and continue are about equal in frequency in Java, and break is about half as common as try/catch/throw in Java. + +In Stroscot, it's not too hard to provide break/continue/goto within the continuation-based I/O model, and many C programmers will expect these operators, so they are planned to be implemented. They will be provided as functions, rather than as keywords, so will be imported and not steal syntax by default. Programs ======== @@ -1904,6 +2034,8 @@ So top-level statements and function calls are allowed. For example you can impl Comments ======== +Steelman 2I "The language shall permit comments that are introduced by a special (one or two character) symbol and terminated by the next line boundary of the source program." This is just the simplest EOL comment, but there are other types. + :: // comment diff --git a/docs/Commentary/Time.rst b/docs/Commentary/Libraries/Time.rst similarity index 97% rename from docs/Commentary/Time.rst rename to docs/Commentary/Libraries/Time.rst index 40e73b9..acedbba 100644 --- a/docs/Commentary/Time.rst +++ b/docs/Commentary/Libraries/Time.rst @@ -1,6 +1,8 @@ Time API ######## +9E. It shall be possible to access a real time clock. There shall be translation time constants to convert between the implementation units and the program units for real time. A process may have an accessible clock giving the cumulative processing time (i.e., CPU time) for that process. + Instant ======= diff --git a/docs/Commentary/Units.rst b/docs/Commentary/Libraries/Units.rst similarity index 100% rename from docs/Commentary/Units.rst rename to docs/Commentary/Libraries/Units.rst diff --git a/docs/Commentary/Libraries/index.rst b/docs/Commentary/Libraries/index.rst new file mode 100644 index 0000000..3cd1a92 --- /dev/null +++ b/docs/Commentary/Libraries/index.rst @@ -0,0 +1,10 @@ +Libraries +######### + +This section is focused on details of the standard library and syntax of the language. The core language is the current focus of development so this section is naturally a bit vague. + +.. toctree:: + :maxdepth: 2 + :glob: + + * \ No newline at end of file diff --git a/docs/Commentary/Code-of-conduct.rst b/docs/Commentary/Meta/Code-of-conduct.rst similarity index 100% rename from docs/Commentary/Code-of-conduct.rst rename to docs/Commentary/Meta/Code-of-conduct.rst diff --git a/docs/Commentary/Community.rst b/docs/Commentary/Meta/Community.rst similarity index 87% rename from docs/Commentary/Community.rst rename to docs/Commentary/Meta/Community.rst index 184aea3..b94758e 100644 --- a/docs/Commentary/Community.rst +++ b/docs/Commentary/Meta/Community.rst @@ -172,3 +172,12 @@ Culture policing ================ Something different is culture policing, ensuring communication stays on-tone in project channels. 
This too can be automated but it is a little more tricky because the bot should not censor speech, only fight speech with more speech. So somehow we have to get the AI to understand a concept like `Graham's hierarchy of disagreement `__ and make productive comments that encourage discussions to go up the hierarchy rather than down. Just calling out name-calling as name-calling is unlikely to dissuade a contributor from their path of unintelligent argument. Indeed calling their argument unintelligent is itself a form of name-calling, although when an emotionless language model says your argument is unintelligent it does carry a bit more weight. But if the bot can make level-6 arguments like "you said X is an idiot but actually X has won the Nobel prize", even if the information is hallucinated, it will still be funny enough to derail the conversation onto a more productive track. And if instead the bot starts an argument with a troll and the troll spends all their time arguing with the bot, that is in some sense a win too.
+
+Over-engineering
+================
+
+Discussions are a fundamental part of software development. They are how a project scales beyond one person and how multiple people manage to work together. One common discussion pattern is "over-engineering", where one or more engineers make a subjective assessment that a design is "more robust or complicated than is necessary". This can be an early warning sign of potential quality problems and missed milestones.
+
+Now there are ways to justify robustness and complexity based on design requirements. Maybe the robustness and complexity serve as a safety factor, or are performance-critical, or broad functionality is genuinely required by the design. But often, systems are overbuilt because it is felt that they will add "intrinsic value". The complexity is not tied to a design requirement but rather to speculation about future requirements. Or else it is based on a "design principle" such as DRY or MVC, with no investigation of where the principle comes from or whether it actually applies. Maybe there is a new framework that a developer wants to try out. These can be evaluated using https://xkcd.com/1205/ - the time to learn vs. the time saved (a back-of-the-envelope version is worked out at the end of this section). If we build a website once a year, then over xkcd's five-year horizon even the half hour it costs to look for a framework is not worth it unless the framework saves us more than 6 minutes per build. And since we will also have to spend time learning the framework, probably it is not worth it. But if we build 50 websites a day, we can spend a few days learning a framework and it will pay off even if it just saves a few seconds. And we are assuming that you have learned the basic JS API already - if you are starting from scratch, maybe there are more jQuery tutorials than basic DOM tutorials, so the framework is even more justifiable.
+
+So the design requirements themselves are not sufficient to determine over-engineering. The context of the software and the existing experience of the programmer must be taken into account. Knowing the design requirements is also necessary, e.g. to determine if a framework is applicable and can save time. However, even when the full picture is available, over-engineering is somewhat subjective in that building a better foundation may save time for maintainers in the future. 
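+
+A rough worked version of the xkcd 1205 arithmetic above, assuming its usual five-year horizon and, say, two 8-hour days of learning time (both numbers are only illustrative)::
+
+  1 website/year for 5 years           = 5 builds
+  30 min spent evaluating a framework  = must save 30/5 = 6 min per build to break even
+
+  50 websites/day for 5 years          = ~91,250 builds
+  16 h spent learning a framework      = must save 57,600 s / 91,250 = ~0.6 s per build to break even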
diff --git a/docs/Commentary/Documentation.rst b/docs/Commentary/Meta/Documentation.rst similarity index 91% rename from docs/Commentary/Documentation.rst rename to docs/Commentary/Meta/Documentation.rst index 08ddc31..9ff9a33 100644 --- a/docs/Commentary/Documentation.rst +++ b/docs/Commentary/Meta/Documentation.rst @@ -5,6 +5,11 @@ Documentation Dijkstra also says a clear separation of concerns in a helpful manner is "a (technical) conditio sine qua non" [an indispensable condition] for any complicated undertaking. Its absence/presence is one of the more telling indications for judging the expected (in)competence for the task at hand. +Steelman 1F says "The language shall be composed from features that are understood. The semantics of each feature should be sufficiently well specified and understandable that it will be possible to predict its interaction with other features." Steelman 1H says "The language shall be completely and unambiguously defined. To the extent that a formal definition assists in achieving the above goals (i.e., all of section 1), the language shall be formally defined." + +13A. The language shall have a complete and unambiguous defining document. It should be possible to predict the possible actions of any syntactically correct program from the language definition. The language documentation shall include the syntax, semantics, and appropriate examples of each built-in and predefined feature. A recommended set of translation diagnostic and warning messages shall be included in the language definition. + + Format ====== diff --git a/docs/Commentary/Funding.rst b/docs/Commentary/Meta/Funding.rst similarity index 100% rename from docs/Commentary/Funding.rst rename to docs/Commentary/Meta/Funding.rst diff --git a/docs/Commentary/Guidelines-locbylanguage.csv b/docs/Commentary/Meta/Guidelines-locbylanguage.csv similarity index 100% rename from docs/Commentary/Guidelines-locbylanguage.csv rename to docs/Commentary/Meta/Guidelines-locbylanguage.csv diff --git a/docs/Commentary/Guidelines.rst b/docs/Commentary/Meta/Guidelines.rst similarity index 88% rename from docs/Commentary/Guidelines.rst rename to docs/Commentary/Meta/Guidelines.rst index 1fc9b1f..3db992a 100644 --- a/docs/Commentary/Guidelines.rst +++ b/docs/Commentary/Meta/Guidelines.rst @@ -25,10 +25,14 @@ Principles * If it isn't documented, it doesn't exist. Not only does it have to be doc'd, but it has to be explained and taught and demonstrated. Do that, and people will be excited -- not about your documentation, but about your product. (`Mike Pope `__ via `Coding Horror `__). Corollary: There is no undefined behavior, only undocumented behavior. * Liking increases monotonically for both higher complexity and higher number of presentations. :cite:`madisonRepeatedListeningIncreases2017` (Originally this was "some complexity is desirable" from :cite:`normanLivingComplexity2010` page 13, but then I looked for sources and there was a stronger conclusion. Fig. 2 seems to have a liking dip for the highest complexity. Other studies used unnatural stimuli or did not control for familiarity, resulting in averaging experienced and inexperienced participants. :cite:`gucluturkDecomposingComplexityPreferences2019` There is a difference between objective vs. subjective complexity measures; using Fisher information to convert from objective to subjective measures of information produces the typical inverted U-shape for objective measures. 
:cite:`grzywaczDoesAmountInformation2022`) * Better depends on your customer's goodness metric. It is time for us to reject the simple-minded interpretation of the slogan "worse is better", and start putting out software that really is better (on the dimension of goodness that our customers have, not necessarily our own). (`Jim Waldo `__) +* "Good Design Is Easier to Change Than Bad Design". A thing is well designed if it adapts to the people who use it. For code, that means it must adapt by changing. So, a good design is Easier To Change (ETC). As far as we can tell, every design principle out there is a special case of ETC. Why is decoupling good? Because by isolating concerns we make each easier to change. Why is the single responsibility principle useful? Because a change in +requirements is mirrored by a change in just one module. Why is naming important? Because good names make code easier to read, and you have to read it to change it. (Pragmatic Programmer 2019 edition, page 28) Paradigms ========= +"General purpose" programming languages are "general purpose" in that you can write any system. But in Java 1.5 you couldn't do currying - no closures. You could do something with an interface and an anonymous class, maybe some weird decorator pattern to make it less verbose and a library for currying, but it's just not the same as writing ``foo 1`` and having it work. Paradigms strongly dictate how you structure your code. Semantics matters - it can change how we think about the problem that we're trying to solve. For example, concurrency is easier if all your values are immutable. Also performance of paradigms is a consideration - generally you can express the same algorithm both ways and it will get compiled the same, but with for example immutability it may not necessarily compile to in-place update. But performance is not automatically faster or slower with a given paradigm - there are generally examples of both speedups and slowdowns - whereas if a paradigm is suited for a given task it's really obvious. You can use inheritance to model any problem, but if it's modeling the lambda calculus, then using a functional paradigm with ADTs and built-in lambdas in Haskell is going to be a lot `easier `__ and less code than `Java `__. + The programming languages checklist has a few paradigms: functional, imperative, object-oriented, procedural, stack-based, "multi-paradigm". In linguistics, a paradigm is "a set of linguistic items that form mutually exclusive choices in particular syntactic roles," specifically "a table of all the inflected forms of a particular verb, noun, or adjective." This seems to be a usable definition of a PL paradigm - you have all related versions of a semantic entity. Unfortunately people seem to use paradigms as labels of entire languages, rather than to refer to individual syntactic features. Stroscot, like every other language, is "multi-paradigm" - even assembly is multi-paradigm since it is imperative (syscalls) and structured (conditional jump). So the adjectives "object-oriented", "functional", etc. are avoided outside of this page in favor of the specific semantic constructs, since "functional object-oriented language" sounds weird. Still, it's good to have a map from paradigms to constructs, and to know which constructs embed into which other constructs. This list is based on Wikipedia's `list of paradigms `__: @@ -207,6 +211,14 @@ Functionality Stroscot aims to be a `wide-spectrum language `__. 
That is, for every way to do X, Stroscot should also allow doing X in that way. The logic behind this is simple: If Stroscot can't do X, then people will choose to use another language that can do X. Practically, I have limited the domain of X to activities expressed in research publications and other programming languages, i.e., a systematic survey, so that the amount of functionality to consider is at least finite. I've mainly found novel ideas and techniques in obscure papers from decades ago, but there have also been a rare few published in the past few years. It is actually really hard to come up with better ideas than the old papers. And I'm not aware of any other programming languages that have tried to do a systematic search through the literature for features; academic languages are narrowly focused and practical languages do not innovate much. So Stroscot is at least somewhat innovative in its design by aiming for functionality in this way. +Motivation for this comes from :cite:`ichbiahRationaleDesignADA1979` (edited): + + We believe that the language designer should not forbid a facility. He should never take the attitude of the Newspeak [Or 50] designer: + + "Don’t you see that the whole aim of Newspeak is to narrow the range of thought? In the end we shall make thought-crime impossible, because there will be no words in which to express it." + + Rather, he should always strive to expand the expressive power of the language. + Many languages suffer from "idea envy", where they try to retrofit new ideas from other languages. For example C++ and Java have added lambdas relatively recently. When a programming language changes significantly in this way, it loses its identity - for example, Python 2 and Python 3 are effectively separate programming languages, as are Perl 5 and Raku (Perl 6). There are already projects that advertise themselves as "modern C++17" rather than simply "C++". A new language needs new tools and new libraries; in this case, a split between non-updated and updated C++ tools. Minimizing the number of new languages / breaking language changes is best. The source of these changes is clear: ideas that were missed out on in the initial design. The lambda calculus dates to the 1930s, and anonymous functions were included in Lisp in 1958, long before C++ was designed in the 1980s. The retrofitting in C++ is due to a shallow intellectual base. By instead preferring coverage of all functionality from the start, we ensure a future-proof design. Even if new ideas emerge after the initial design, they are generally small tweaks on old ideas. With sufficient research these old ideas can be uncovered and incorporated, making it a minimal change to accommodate the new ideas. You may point to INTERCAL's COMEFROM as something best avoided, but it's not hard to implement. The trickier parts are actually at the low level, interfacing memory management and calling conventions, and the value proposition there for a powerful interface should be clear. Providing a broad set of features will mean that the language is suitable for whatever project someone is thinking about. Another theory is that, even if Stroscot fails as a language, implementing lots of features will make people copy Stroscot's list of features. @@ -223,9 +235,13 @@ This is an Ecstasy principle. But it's misleading - going infinitely downward wo Minimal core ------------ +Tinman I5 "The source language will contain a simple, clearly identifiable base, or kernel, which houses all the power of the language. 
To the extent possible, the base will be minimal with each feature providing a single unique capability not otherwise duplicated in the base. The choice of the base will not detract from the efficiency, safety, or understandability of the language." + Minimalism is bad. If you build on an existing language but include no new features, then there’s no incentive to use your language. If your language only provides a minimal Turing-complete set of operations like Brainfuck, figuring out how to express programs in it will be difficult, and the resulting encoding most likely will be incomprehensible. Thus, minimalism must take second priority to functionality. But, given that we must provide all possible features, minimalism offers an approach to implementing them in a methodical, useful manner. -Certainly, there is the possibility of just implementing them all independently as some sort of hodgepodge, but I like GHC's structure of having a smallish "core" language (System FC), and translating the rest of the language down to it. In fact there is not much to Haskell besides System FC; the language proper is quite small, and most of the idioms of Haskell are implemented in libraries. Similarly, for Stroscot I would like to define a "core" language that provides only the basic, necessary abstractions and tools for defining more abstractions, such as macros and syntactic extensions. Then the compiler only has to focus on handling these core constructs well, but the standard library can implement all the parts that users interact with. With suitable abstraction facilities, this approach doesn't lose any expressiveness because we can still implement any language construct we can think of. We have not "surrender[ed] the adequate representation of a single datum of experience", but merely reduced the reducible elements. The surface language is still complex, modern, and slick. Developers can focus on learning the core language's general constructs, and then learn libraries by reading their source code, or they can follow more of a "learn by doing" approach where they learn the libraries they like from the documentation and examples, without understanding the implementation. +Certainly, there is the possibility of just implementing them all independently as some sort of hodgepodge, but I like GHC's structure of having a smallish "core" language (System FC), and translating the rest of the language down to it. In fact there is not much to Haskell besides System FC; the language proper is quite small, and most of the idioms of Haskell are implemented in libraries. Similarly, for Stroscot I would like to define a "core" language that provides only the basic, necessary abstractions and tools for defining more abstractions, such as macros and syntactic extensions. Then the compiler only has to focus on handling these core constructs well, but the standard library can implement all the parts that users interact with. With suitable abstraction facilities, this approach doesn't lose any expressiveness because we can still implement any language construct we can think of. We have not "surrender[ed] the adequate representation of a single datum of experience", but merely reduced the reducible elements. We can satisfy Steelman 1E: "The [core] language should not contain unnecessary complexity. It should have a consistent semantic structure that minimizes the number of underlying concepts. It should be as small as possible consistent with the needs of the intended applications. 
It should have few special cases and should be composed from features that are individually simple in their semantics." + +The surface language is still complex, modern, and slick. Developers can focus on learning the core language's general constructs, and then learn libraries by reading their source code, or they can follow more of a "learn by doing" approach where they learn the libraries they like from the documentation and examples, without understanding the implementation. So what defines the "core" language? Well, per Einstein, each element should be basic, simple, and irreducible, and there should be as few elements as possible. More formally, we can consider the "core" as an orthonormal basis in an inner product space, with vectors as programming elements. Then our "core" must satisfy the following conditions: @@ -270,7 +286,7 @@ Similarly, Stroscot aims for all four of these: * simple - Stroscot does aim to be "simple", in the etymological sense of "minimalism". Stroscot concentrates the language into a "core", a basis of features that can express all others and is as small as possible. * complex - The rest of the language, the standard libraries, user, libraries and user programs, braids the core features together and is "complex". Hickey argues that a project should not be complex (complected), but he is using a different metric, of how interleaved the conceptual parts of the program are, rather than its interleaving of language features. There is some benefit to ensuring a tree-structured call graph in a program, but I don't think this is a good argument to remove recursion. -* easy - Stroscot aims for its standard library to make things "easy". There's no downside, right? +* easy - Stroscot aims for its standard library to make things "easy", doable without much training and in few lines of code. There's no downside, right? * hard - Stroscot also aims to have a "strong", "powerful" standard library, that doesn't change often, in other words a "hard" standard library. Looking at this, despite my saying that Stroscot aims to be simple in the sense of minimality or mathematical elegance, it doesn't seem that the language can be marketed as simple; there are just too many mixed messages. The fault does not lie with Stroscot, but rather the inadequacy of these words to express all aspects of an intricate design. As `Edsger Dijkstra `__ put it, "complexity sells better". If you spend all this time hyping up a language, and then it turns out it's so simple the design fits on a postcard, your audience will feel cheated and dismiss the result as trivial. As measured by :cite:`yaofeichenEmpiricalStudyProgramming2005`, "simplicity" and "implementability" are both correlated with a lack of adoption as a developer's primary language, while "extensibility" and "generality" are preferred. Fortunately though, this is all in the marketing. For example, people seem to say that Haskell is `extremrly complex `__, but in the sense of Dijkstra, Haskell is "just syntax sugar" for System F, and has a simple theory. GHC Core is `9 constructors `__. It is "only" the libraries and syntax sugar that add in the complexity. @@ -324,9 +340,11 @@ Readability Humans interact with code in a variety of ways: skimming, reading, writing, understanding, designing, discussing, reviewing, and refactoring code, as well as learning and teaching how to code. Focusing on readability means leaving out other activities. 
Ergonomics covers all of these activities and is perhaps too broad, being difficult to connect directly with development costs. Humans have limitations in the domains of perception, memory, reasoning, and decision-making and the language should take these HCI factors into account. The design should aim for productivity, ergonomics, and comfort, reducing errors and fatigue and making the language accessible. -Using the literal definition, "ease of understanding code", readability is included as part of the edit-test cycle time principle. `Yue Yao `__ says "The shorter the [edit-test] cycle, the happier the programmer." Per `here `__, the cycle time can be broken down into 70% Understanding Code, 25% Modifying Existing Code, 5% Writing New Code. In particular we estimate that there is 14x as much read time as write time. But this estimate is probably only appropriate for application code - the true average varies depending on scenario. Per APL, if a language is quick to program in, it may be faster to write small programs from scratch than to read and understand another person's program. So the 70/25/5 may turn into something more like 50/20/30 in a scripting context, only a 1.6x read-write factor. On the other hand, common library functions may be read many times but only modified or added rarely, giving read/write factors of 100x, 1000x, or more. +Using the literal definition, "ease of understanding code", readability is measured as the edit-test cycle time. `Yue Yao `__ says "The shorter the 'edit-compile-run' cycle, the happier the programmer." Per `here `__, the cycle time can be broken down into 70% Understanding Code, 25% Modifying Existing Code, 5% Writing New Code. In particular we estimate that there is 14x as much read time as write time. But this estimate is probably only appropriate for application code - the true average varies depending on scenario. Per APL, if a language is quick to program in, it may be faster to write small programs from scratch than to read and understand another person's program. So the 70/25/5 may turn into something more like 50/20/30 in a scripting context, only a 1.6x read-write factor. On the other hand, common library functions may be read many times but only modified or added rarely, giving read/write factors of 100x, 1000x, or more. + +Steelman 1C lists "clarity, understandability, and modifiability of programs" as the meaning of readability. This also accords with cycle time - clarity involves skimming and debugging, understandability involves reading, and modifiability involves writing. Notably it does not accord with the intuitive understanding of readability as reading - since when has modification been part of readability? -Cycle time has the benefit of being empirical - just provide some code and an editing task, time it, and average across a pool of subjects. In contrast, readability is more subjective - the author of some code will most likely consider his code perfectly readable, particularly immediately after writing said code, even if an average programmer would not. Of course, in a week or a few years, depending on the author's memory, any domain-specific knowledge will fade away and the author will struggle with his code just as much as any average programmer, but waiting ages just to measure readability is not feasible. +Cycle time has the benefit of being empirically measurable - just provide some code and an editing task, time it, and average across a pool of subjects. 
In contrast, readability per se is more subjective - the author of some code will most likely consider their code perfectly readable, particularly immediately after writing said code, even if an average programmer would not. Of course, in a week or a few years, depending on the author's memory, any domain-specific knowledge will fade away and the author will struggle with their code just as much as any average programmer, but waiting ages just to convince someone of their code's innate (un)readability is not feasible. Most articles that discuss readability go on to describe "readable code", defined by various properties: @@ -342,7 +360,7 @@ These definitions are somewhat subjective and unreliable. What makes a name mean Cycle time also accounts for the aphorism "Perfect is the enemy of good". One could spend hours optimizing for readability by fixing spelling mistakes and other nits and not get anything useful done. In the time it takes to write a long descriptive comment or poll coworkers for a meaningful variable name, one could have skipped writing comments, used 1-letter names, run and debugged the code, and moved on to a new task. Perfect readability is not the goal - the code just has to be understandable enough that any further readability improvements would take more cycle time than they will save in the future. And with hyperbolic discounting, reducing future maintenance effort is generally not as important as shipping working code now. This calculation does flip though when considering the programming language syntax and standard library, where small readability improvements can save time for millions of programmers (assuming the language becomes popular, so there is again a discounting factor). -Not included in cycle time (or readability) is the time to initially write a program. Maintainance cost is much more important in the long run than the initial investment for most programs. +Not included in cycle time (or readability) is the time to initially write a program. Maintainance cost is much more important in the long run than the initial investment for most programs. This is borne out when Steelman 1C lists readability under maintenance. Terseness --------- @@ -410,6 +428,8 @@ There are not really any best practices to encourage adoption but :cite:`meyerov Performance ----------- +Steelman 1D: "The language design should aid the production of efficient object programs." Is this really a goal? How efficient do we need to be? + 2-10x speedups ~~~~~~~~~~~~~~ @@ -428,11 +448,17 @@ So, yes, performance matters. If you can't write fast code in the language, the Predicting performance ~~~~~~~~~~~~~~~~~~~~~~ -Predicting program performance (without running the code) is hard. For example, consider simple binary questions like whether a program is CPU-bound or I/O-bound, or which of two programs will execute faster. Certainly there are obvious cases, but when tested with a handpicked selection of tricky examples, even an experienced programmer's answers will be more like random guesses than any sort of knowledge. When the task is more realistic, like a huge, complex, confusing program written in an unfamiliar language, the situation is just as bad. Per :cite:`knuthStructuredProgrammingGo1974`, "the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail." 
The programmer feels confident - "I think these will be the hotspots, so I'm going to design the architecture around them, pick out some algorithms with good time complexity, and write them in low-level assembly style." But per `C2 `__, "this almost never works". Per ChatGPT "the task is inherently uncertain". +Predicting program performance (without running the code) is hard. For example, consider simple binary questions like whether a program is CPU-bound or I/O-bound, or which of two programs will execute faster. Certainly there are obvious cases, but when tested with a handpicked selection of tricky examples, even an experienced programmer's answers will be more like random guesses than any sort of knowledge. When the task is more realistic, like a huge, complex, confusing program written in an unfamiliar language, the situation is worse. Per :cite:`knuthStructuredProgrammingGo1974`, "the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail." The programmer feels confident - "I think these will be the hotspots, so I'm going to design the architecture around them, pick out some algorithms with good time complexity, and write them in low-level assembly style." But per `C2 `__, "this almost never works". Per ChatGPT "the task is inherently uncertain". So Steelman 1D "Constructs that have unexpectedly expensive implementations should be easily recognizable" is simply unimplementable. I would say performance is difficult to predict for several reasons. First, hardware has simply become too complex for a programmer to reason about in their head - there is register renaming, multiple levels of caching, out-of-order execution, and instruction-level parallelism. Even the most accurate timing model at present, uiCA :cite:`abelUiCAAccurateThroughput2021`, still has errors of around 1% compared to real measurements. If we use less accurate models, like LLVM's, then the errors are much higher, 10% or more. Certainly one can argue about what level of error is acceptable for what purpose but the fact remains that the errors are not 0% and the performance of even low-level assembly code is simply not predictable. Long gone are the days of `Mel `__ where programmers order their instructions by hand and take each cycle into account. -Another reason is that in the translation from source code to assembly, there is a lot of room for optimization to affect performance. For example, there is this post, `C is not a low-level language `__. It argues that the C abstract machine does not map in any understandable way to modern hardware abstractions. Spectre and Meltdown mitigations impose significant performance penalties not visible in the source code. Simple translation does not provide fast code. Optimizers must fight the C memory model, sequential execution model, and layout guarantees, using millions of lines of code to avoid performing "obvious" optimizations that are actually unsound when closely inspecting the C semantics. On the positive side, compilers have gotten quite good at working optimization magic, and these optimizations can transform algorithms beyond recognition and even improve asymptotic complexity. For example there are some papers on how partial evaluation of naive string matching algorithms can lead to optimal matching algorithms such as Knuth-Morris-Pratt and Boyer-Moore. 
Optimization can cause amazing and unexpected speedups when an optimization works, and equally unexpected and disappointing performance when an optimization fails to work. If even C's performance is unpredictable, then what programming language is?
+Another reason is that in the translation from source code to assembly, there is a lot of room for optimization to affect performance. For example, there is this post, `C is not a low-level language `__. It argues that the C abstract machine does not map in any understandable way to modern hardware abstractions. Spectre and Meltdown mitigations impose significant performance penalties not visible in the source code. Simple translation does not provide fast code. Optimizers must fight the C memory model, sequential execution model, and layout guarantees, using millions of lines of code to avoid performing "obvious" optimizations that are actually unsound when closely inspecting the C semantics. So C's performance is unpredictable and it does not map well to hardware. As C is probably the simplest language in common use today, Steelman 1D "Features should be chosen to have a simple and efficient implementation in many object machines" is simply impossible on account of the "simple" requirement. And as far as "efficient", that is evaluated on a program-by-program basis. A feature is not intrinsically efficient or inefficient; only a use of a feature in a specific program can be evaluated as efficient or inefficient. And the cost of that use's object code must be weighed against how much the feature reduces the burden of maintaining the code: how difficult the program would be to write without the feature, and how difficult it is to improve the performance of the feature by modifying either the program or the compiler.
+
+On the positive side, compilers have gotten quite good at working optimization magic, and these optimizations can transform algorithms beyond recognition and even improve asymptotic complexity. For example there are some papers on how partial evaluation of naive string matching algorithms can lead to optimal matching algorithms such as Knuth-Morris-Pratt and Boyer-Moore. Such optimizations do not follow much of a pattern, other than that programming with mathematical functions appears more suited to them. So per Steelman 1D "Features should be chosen [...] to maximize the number of safe optimizations available to translators", we could also choose functional logic programming as the base paradigm based on performance considerations.
+
+Optimization can cause amazing and unexpected speedups when an optimization works, and equally unexpected and disappointing performance when an optimization fails to work.
+
+Steelman 1D also asks that "unused and constant portions of programs will not add to execution costs. Execution time support packages of the language shall not be included in object code unless they are called."

A third reason is that measuring programming language performance is subjective and often based more on marketing than any hard evidence. Rust has claimed to be "blazing fast" since `2014 `__. But this claim is not backed up by an official benchmark suite or anything of the sort. In fact, in `an explicit test of this claim on Google `__, C was faster. The programming language benchmarks game is often criticized because it compares implementations using SIMD to those without, but it too has often shown that C is faster. 
Even if the benchmark suite was bulletproof, there will be some who point to expensive but expressive features of the language and say that it is slow. @@ -547,21 +573,23 @@ So in practice, languages come with libraries, and in fact a language is chosen Similarly we can look at C++'s multiple standard library forks for an example of what not to do. These forks exist for various reasons, such as exception handling/allocators (EASTL), backwards compatibility (Abseil), and missing functionality (Boost/folly), and people also still often handroll their own data structures. +Steelman 1C says "The language should encourage user documentation of programs." I think the main way to do this is comments, specifically documentation comments that are processed by a documentation generator. Literate programming is interesting but in the case of TeX actually made it harder to use TeX as a library; it is only with great effort that LuaTeX has de-webbed TeX and turned it back into C. When you have interleaved comments and documentation it is really just a matter of style whether it is code embedded in text or text embedded in code, and the consensus across languages is that text embedded in code is easier to understand. + Defect rate ~~~~~~~~~~~ -Jones says that the rate of defects per function point varies by programming language. In :cite:`jonesEstimatingSoftwareCosts2007` Table 17.8, we see that Smalltalk has the lowest number of errors per function point (0.14), most languages have under 2, C/C++ are pretty high, and assembly has the highest (5.01-9.28). Programming language syntax, notation, and tools can be designed to minimize the natural human tendency to make errors. Jones suggests regular syntax, minimal use of complex symbols, and garbage collection, among other facilities. Jones notes that individual factors cause 3x variations and so will matter more in practice except for particularly bug-prone languages. +Jones says that the rate of defects per function point varies by programming language. In :cite:`jonesEstimatingSoftwareCosts2007` Table 17.8, we see that Smalltalk has the lowest number of errors per function point (0.14), most languages have under 2, C/C++ are pretty high, and assembly has the highest (5.01-9.28). Programming language syntax, notation, and tools can be designed to minimize the natural human tendency to make errors. Jones suggests regular syntax, minimal use of complex symbols, and garbage collection, among other facilities. Jones notes that individual factors cause 3x variations and so will matter more in practice except for particularly bug-prone languages. Nonetheless, it does make sense as in Steelman 1B to "design the language to avoid error-prone features". A 2014 study analyzed the rate of bugfix keywords in Github commits by language. They found an additional 100-150 expected bug-fixing commits per 700 commits (20%) on average when moving from Clojure to C++ (best to worst). This magnitude of effect held up after two re-analyses, :cite:`bergerImpactProgrammingLanguages2019` and :cite:`furiaApplyingBayesianAnalysis2021`. But there was a lot of noise, something like a 2x average variation between projects. The rankings of languages by bug rate were not consistent between the analysis and re-analyses. Generally, functional programming was less error-prone, Javascript was a little above the middle, and C/C++/Objective-C were at the top (in that order). 
Notably this disagrees with Jones's table, where C is the worst and Objective-C has the lowest defect rate of the C family. The re-analyses mention that the regexes used to count bugfix commits were quite suspect; probably they also captured variations in commit message styles between languages such as NSError for Objective-C. -:cite:`zeiglerComparingDevelopmentCosts1995` lists $10.52 per line of C code and $6.62 per line of Ada. It notes that proportionally more of the Ada code was "specification" or "header" lines, which are not redundant but are somewhat boilerplate in order to make entities visible. These specification files also encourage more comments and whitespace (not counted in SLOC). Ada also takes more lines to implement a feature than C. The paper attributes the cheaper development cost per line of Ada to its specifications. Looking at the defect rates of .676 (C) and .096 (Ada) per SLOC, it seems this amount of SLOC weighting is not sufficient to explain the discrepancy and it could also be that C is simply more error-prone. Zeigler states they encountered many issues with C despite following as many development precautions as they could: "Customers were far more unhappy with features implemented in C than they were in Ada." +:cite:`zeiglerComparingDevelopmentCosts1995` lists $10.52 per line of C code and $6.62 per line of Ada. It notes that proportionally more of the Ada code was "specification" or "header" lines, which are not redundant but are somewhat boilerplate in order to make entities visible. Perhaps as a result, Ada takes more lines to implement a feature than C; counting these specification lines as half lines would make the line counts and cost per line closer. These specification files also encourage more comments and whitespace (not counted in SLOC). The paper attributes the cheaper development cost per line of Ada to its specification, but looking at the defect rates of 0.676 (C) and 0.096 (Ada) per SLOC, it seems SLOC weighting is not sufficient to explain the discrepancy and it really is that C is simply more error-prone. Zeigler states they encountered many issues with C despite following as many development precautions as they could: "Customers were far more unhappy with features implemented in C than they were in Ada." -Jones is not a fan of cost per defect. In :cite:`jonesSoftwareEconomicsFunction2017` Jones mentions time-motion studies and says that actual defect repair costs are flat and do not change very much, roughly 6 hours. The reason there are "urban legends" that post-release defects cost 100 times as much as early defects is that defects become scarcer while test cases and other fixed costs of defect finding remain. So for example, let's say you're in the initial stages of a project, writing unit tests. Finding a bug is random chance, fixing it costs $380, and fixed costs are $2000. Let's say you find 50 bugs, then you spend 50*$380+$2000 and the cost per defect is $420. Then in the late stages of a project, finding a bug is still random chance, fixing it costs $380, and fixed costs are $2000, but you've found most of the bugs already, so you only find one bug so the cost per defect is $2380. If you didn't discover any defects, then the cost per defect is infinite. The definition of defect varies and also the reporting period for defects after release, so this can really happen. Jones recommends accounting using cost of defects per function point instead, and for releases the proportion of bugs found before/after a release. 
All hours spent testing and debugging a function point can be totalled up; it is generally a significant fraction of the overall effort spent on that function point. +Jones is not a fan of cost per defect. In :cite:`jonesSoftwareEconomicsFunction2017` Jones mentions time-motion studies and says that actual defect repair costs are flat and do not change very much, with a mode of roughly 6 hours. The reason there are "urban legends" that post-release defects cost 100 times as much as early defects is that defects become scarcer while test cases and other fixed costs of defect finding remain. So for example, let's say you're in the initial stages of a project, writing unit tests. Finding a bug is random chance, fixing it costs $380, and fixed costs are $2000. Let's say you find 50 bugs; then you spend 50*$380+$2000 and the cost per defect is $420. Then in the late stages of a project, finding a bug is still random chance, fixing it costs $380, and fixed costs are $2000, but you've found most of the bugs already, so you only find one bug and the cost per defect is $2380. If you didn't discover any defects, then the cost per defect is infinite. The definition of a defect varies, as does the reporting period for defects after release, so this can really happen. Jones recommends accounting using cost of defects per function point instead, and for releases the proportion of bugs found before/after a release. All hours spent testing and debugging a function point can be totalled up; it is generally a significant fraction of the overall effort spent on that function point. Maintenance ~~~~~~~~~~~ -Typically a new language only has a compiler available. However, a more concise language and a full suite of programming tools for a language makes maintaining a language easier. Per Jones, with assembly, a maintainer can only maintain about 250 function points. With Java, a maintainer can keep up with updates for 750 function points. With best-in-class tools for automated complexity analysis, code restructuring, re-engineering, reverse-engineering, configuration control, and defect tracking, a maintainer may oversee 3500 function points without difficulty. +Typically a new language only has a compiler available. However, a more concise language and a full suite of programming tools make programs written in the language easier to maintain. Per Jones, with assembly, a maintainer can only maintain about 250 function points. With Java, a maintainer can keep up with updates for 750 function points. With best-in-class tools for automated complexity analysis, code restructuring, re-engineering, reverse-engineering, configuration control, and defect tracking, a maintainer may oversee 3500 function points without difficulty. Thus, when Steelman 1C says "the language should promote ease of program maintenance", this can be interpreted as saying that the language should have a low burden of maintenance per function point, and the way to lower this burden is to provide more tooling and also to lower the lines of code per function point (increase functionality / expressiveness). Fixed costs ~~~~~~~~~~~ @@ -587,3 +615,20 @@ Best practices * Design to do the hard things every day. Take all the common daily tasks considered to be painful and hard, and figure out a design that will allow each task to be accomplished efficiently in a few seconds of actual work. It is unacceptable to require detailed pre-planning or coordination for each task. The biggest overheads should be testing the result and writing documentation.
(Linus on `why Git works so well `__) * What we need, then, are more interfaces that are durable, non-leaky, and beautiful (or, at least, extremely workable). A durable interface lasts - [it] will run on many years’ worth of systems. A non-leaky interface reveals little or nothing about the interface’s implementation. Beautiful interfaces do not expose too much or too little functionality, nor are they more complex than they need to be. A lot of work and a lot of iteration is required to create a beautiful interface — so this conflicts somewhat with durability. The worthwhile research problem, then, is to create interfaces having these properties in order that the code living in between interface layers can be developed, understood, debugged, and maintained in a truly modular fashion. To some extent this is a pipe dream since the “non-leaky” requirement requires both correctness and performance opacity: both extremely hard problems. Another problem with this idea — from the research point of view — is that a grant proposal “to create durable, non-leaky, beautiful interfaces” is completely untenable, nor is it even clear that most of this kind of work belongs in academia. On the other hand, it seems clear that we don’t want to just admit defeat either. If we disregard the people who are purely chasing performance and those who are purely chasing correctness, a substantial sub-theme in computer systems research can be found where people are chasing beautiful interfaces. `John Regehr `__ + +* Most people assume that maintenance begins when an application is released, that maintenance means fixing bugs and enhancing features. We think these people are wrong. Programmers are constantly in maintenance mode. Our understanding changes day by day. New requirements arrive as we're designing or coding. Perhaps the environment changes. Whatever the reason, maintenance is not a discrete activity, but a routine part of the entire development process. + + When we perform maintenance, we have to find and change the representation of things - those capsules of knowledge embedded in the application. The problem is that it's easy to duplicate knowledge in the specifications, processes, and programs that we develop, and when we do so, we invite a maintenance nightmare - one that starts well before the application ships. + + We feel that the only way to develop software reliably, and to make our developments easier to understand and maintain, is to follow what we call the DRY principle: "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system." + + Why do we call it DRY? It's an acronym - Don't Repeat Yourself. + + The alternative is to have the same thing expressed in two or more places. If you change one, you have to remember to change the others, or, like the alien computers, your program will be brought to its knees by a contradiction. It isn't a question of whether you'll remember: it's a question of when you'll forget. (The Pragmatic Programmer 1st edition) + + Now other people have added clarifications: + + * There is a minimum size at which the language's mechanisms for sharing operate, and below this level it is not appropriate to try to avoid repetition because such attempts will actually make the code longer or more confusing. For example, "In the heart of the forest, the forest seemed to whisper secrets only the forest knew."
- trying to shorten this sentence by using "it" instead of "the forest" will only make it more confusing as there is also the noun "heart" which "it" could refer to. Generally, saving a dozen lines of code is worthwhile, saving 1-2 is not. Sometimes, copy-and-pasting that 12-line snippet is exactly the way to proceed, because you will modify it beyond recognition. Refactoring a 12-line function into two 8-line functions or four 5-line functions is not an improvement; neither is factoring two 6-line functions into 3 5-line functions. (`Tamir `__) There are clone detection tools which have logic and filters for which duplications are significant vs. not. + + * Sometimes duplication is inherent. For example, a formula in the business logic may have repeated subformulas. Two test cases for the application domain may have identical validation logic. A database may not be in 3NF and have repeated columns for performance reasons. A package or library may be deliberately forked to iterate fast, adapt to new requirements, and avoid coupling or version constraints. Code may be written to follow the documentation and copy most of its structure and semantics. These duplications are "natural" and avoiding them is often unworkable or infeasible. Generally, it is best to document these duplications, and the history/reasoning behind them, so that the developer can get an idea of which version is authoritative for which purpose. Now, in practice, with distributed DVCS development, no version is completely authoritative. But, we can divide them into "active" forks, "temporary" forks, and "abandoned" forks. Generally there are only 1-2 active forks, and these are what matter. But this is an empirical observation rather than a best practice - it may happen that there really are 10 active forks or libraries that do substantially the same thing. (`Nordmann `__) For code contracts, the versions can be maintained in sync by duplicating the contract assertions at both sides. + diff --git a/docs/Commentary/Learning.rst b/docs/Commentary/Meta/Learning.rst similarity index 100% rename from docs/Commentary/Learning.rst rename to docs/Commentary/Meta/Learning.rst diff --git a/docs/Commentary/Meta.rst b/docs/Commentary/Meta/Meta.rst similarity index 100% rename from docs/Commentary/Meta.rst rename to docs/Commentary/Meta/Meta.rst diff --git a/docs/Commentary/OtherPL.rst b/docs/Commentary/Meta/OtherPL.rst similarity index 95% rename from docs/Commentary/OtherPL.rst rename to docs/Commentary/Meta/OtherPL.rst index 2f0bde6..0e10243 100644 --- a/docs/Commentary/OtherPL.rst +++ b/docs/Commentary/Meta/OtherPL.rst @@ -157,9 +157,25 @@ PyPL index (top 28) 17. Ada * Still in use in aviation and DoD applications -* Considered somewhat legacy, but has many useful features +* Considered somewhat legacy, but has many useful features ("C++98 with a design review") * SPARK language is a dialect which extends contract support -* interesting design process (`Wikipedia `__): committee gathered requirements (resulting in Steelman report). 4 contractors put forward proposals - after two rounds, one was selected. The reference manual was written, and comments and corrections were received. The specification was then frozen and implementations were designed and validated. +* interesting design process (`Wikipedia `__): committee gathered requirements and revised them several times (resulting in the Steelman report). 4 contractors put forward proposals - after two rounds, one was selected. 
The reference manual was written, and comments and corrections were received. The specification was then frozen and implementations were designed and validated. So far I have gone through + + * `Steelman report `__ + * `Tinman report `__ + * GREEN rationale + * :cite:`fisherCommonProgrammingLanguage1976` + + Eventually I would also like to go through: + + * Full set of earlier requirements such as Strawman, Woodenman, Ironman, and Revised Ironman, linked `here `__ under "History" + * Stoneman support requirements + * RED rationale + * 83, 95, 05, 2012 rationales + * 83/95 style guides + * Dijkstra's comments on the requirements and designs + * Annotated 2012 reference manual + * paradigm: imperative 18. Dart diff --git a/docs/Commentary/Meta/index.rst b/docs/Commentary/Meta/index.rst new file mode 100644 index 0000000..96ded3b --- /dev/null +++ b/docs/Commentary/Meta/index.rst @@ -0,0 +1,10 @@ +Meta +#### + +This section is focused on non-technical details of the language - goals, project structure, and so on. + +.. toctree:: + :maxdepth: 2 + :glob: + + * \ No newline at end of file diff --git a/docs/Commentary/Resource-Management.rst b/docs/Commentary/Resource-Management.rst deleted file mode 100644 index 874dcef..0000000 --- a/docs/Commentary/Resource-Management.rst +++ /dev/null @@ -1,293 +0,0 @@ -Resource management -################### - -Resources are things that can be acquired and released, and are available in limited quantities due to OS or physical limitations. Examples include memory allocations, file handles, internet sockets, mutexes/locks, process table entries, and process identifiers (PIDs). A resource leak happens if the program does not promptly release a resource it has acquired after the program is finished with the resource. A resource management technique prevents resource leaks by releasing resources promptly. - -It is possible to manage resources by hand, but it is quite error-prone and tedious. Automatic resource management aims to provide a concise and safe replacement. - -Manual approach -=============== - -The manual approach is to release sequentially after acquiring, for example:: - - f1 = open "file.txt" - f2 = open "file2.txt" - do_something f1 f2 - close f1 - do_something_2 f2 - close f2 - -There are several issues: - -* Early exit leak: There is a resource leak if do_something contains a return, exception, goto, break, or continue out of the do_something block. `Raymond Chen `__ complains about C++ macros hiding return and goto statements in the function body. -* Failed acquire: if open fails, do_something and close will be called on an invalid value. But with automatic exception propagation this will be fine since ``close InvalidHandle = InvalidHandle``. -* Adjacency: If do_something is long, the release is far from the acquisition, so manual inspection cannot easily identify if the corresponding release function is actually called. -* Encapsulation: the release requirement exposes a detail of the value, namely that it is a resource. This means it cannot be stored in a data structure or returned from a function without special care to call release later. -* Interleaving: Not a problem here, but many solutions impose the requirement that f2 is freed before f1, a LIFO nesting. - -try-finally -=========== - -The try-finally modifies the basic approach by marking the do_something and release blocks explicitly. 
There are two patterns in Java 6, `standard pattern `__ and open-in-try (`here `__):: - - f = open "file.txt" - try: - do_something f - finally: - close f - - f = var InvalidHandle - try: - f := open "file.txt" - do_something f - finally: - if(f != InvalidHandle) - close f - -The unwinding protection solves the early exit leak. The failed acquire again doesn't really matter. This pattern does not solve adjacency, encapsulation, or interleaving, meaning that it is still awkward and verbose. For adjacency, throws act as go-downs and you have to scan down almost the whole function to the finally block to see what cleanups are waiting and if an allocation has a matching clean-up. Also there is an indentation pyramid with nested try-finally. - - -try-with-resources -================== - -The try-with-resources solves adjacency by making ``close`` a Java built-in interface, but it still can't do encapsulation or interleaving:: - - try(f = open "file.txt") - do_something f - -In particular `chained resources `__ are awkward. - -C# has a similar ``using (f = newThing) { ... }`` syntax, and Python has ``with``. - -Bracket -======= - -Haskell has a ``bracket acquire release action`` combinator which functions similarly to try-with-resources. It similarly doesn't support interleaving or encapsulation. - -The `D guy `__ claims you need an even larger combinator than bracket, which goes as follows: - -:: - - if acquire then - when !next - rollback - release - -With masking this looks like: - -:: - - generalBracket acquire next rollback release = - mask $ \unmasked -> do - resource <- acquire - b <- unmasked (next resource) `catch` \e -> do - _ <- rollback resource - _ <- release resource - throw e - c <- release resource - return (b, c) - -acquire has to be masked because there could be an async exception between the action and running next. - -Scope guard -=========== - -This approach involves a "deferred release", which is called when the scope is exited. For example:: - - f = open "file.txt" - defer (close f) - do_something f - -It is available in C++ as the ScopeGuard class, in Go and Zig as the defer statement, and in D via the `scope `__ keyword, where there are ``scope(exit)``, ``scope(success)``, and ``scope(failure)`` (referring to exiting the scope unconditionally as with ``defer``, without exception, or by exception respectively). - -It does solve early exit. - -The pattern nests in the sense that ``defer``-ed statements are run latest-defined to earliest-defined on exit, but again doesn't allow interleaving. - -It solves adjacency by putting the cleanup before the code that uses the resource, right after the code that's being cleaned up after. It's easy to verify that allocations match up with their nearby cleanups, but the clean-up is delayed to the scope's exit. The syntax is very compact, just specify a cleanup function, but is a statement rather than an expression, meaning that the acquire must also be a statement. - -``defer`` and ``scope`` do not allow returning a resource from a function and skipping the cleanup. They also introduce memory allocation questions since the deferred expressions can capture local variables. So no encapsulation. 
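As a point of comparison (a sketch of my own, not part of the original list of patterns), Python's ``contextlib.ExitStack`` behaves much like ``defer``: registered cleanups run LIFO when the scope exits, even on early return or exception, and its ``pop_all()`` method shows one way the encapsulation limitation can be partially worked around by handing the pending cleanups to the caller. The file names and ``do_something`` are placeholders. ::

    from contextlib import ExitStack

    def do_something(f1, f2):  # placeholder for real work
        return f1.read() + f2.read()

    def process():
        # Cleanups registered on the stack run latest-first when the
        # with-block exits - analogous to defer / scope(exit).
        with ExitStack() as stack:
            f1 = stack.enter_context(open("file.txt"))
            f2 = stack.enter_context(open("file2.txt"))
            stack.callback(print, "cleanup ran")  # arbitrary deferred action
            return do_something(f1, f2)

    def open_tracked():
        # pop_all() transfers the registered cleanups to a fresh ExitStack,
        # so a resource and its cleanup can escape the defining scope; the
        # caller becomes responsible for closing the returned stack.
        with ExitStack() as stack:
            f = stack.enter_context(open("file.txt"))
            stack.callback(print, "closing file.txt")
            return f, stack.pop_all()

Even so, the caller of ``open_tracked`` must remember to close the returned stack, so encapsulation is only partially recovered - the gap that the finalizers described below aim to close.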
- -Goto cleanup -============ - -C has this pattern:: - - f = open "file" - if (f == InvalidHandle) - return - if (isException (do_something f)) - goto cleanup - if (isException (do_something_2 f)) - goto cleanup - cleanup: - close f - -This suffers from early return but addresses adjacency in that the cleanup label is present and can handle interleaving with conditionals. It is not encapsulated though and it is easy to mess up handling a failed acquire. - -Garbage collection -================== - -GC almost solves the problem by making closing automatic. It looks like:: - - f = open "file.txt" - do_something f - -It solves all the issues of early exit, adjacency (implicit in the open), and encapsulation (the GC is based on global program flow). Unfortunately GC does not guarantee it will promptly call the finalizer, so in practice the semantics are not usable. - -RAII -==== - -C++ uses RAII, which looks like GC, but uses deterministic memory management, such as stack allocation or reference counting. It ensures you cannot insert any failure points between the allocation and the start of the cleanup's scope. A resource can even be allocated with its corresponding cleanup in the middle of an expression. But defining a new class for every cleanup operation is tedious - fortunately in C++0x it is possible to define one "RAII lambda" class and be done. C++'s semantics define lambda memory handling so there is no allocation issue. - -With C++17 it is apparently possible to copy/move RAII types and store them in a data structure or return them from a function, but it seems fragile. In case of a complicated object graph, such as multiple objects sharing a resource, RAII falls down because the available strategies for deterministic memory management are insufficient. - -Finalizers -========== - -Stroscot's finalizers are inspired by RAII and GC but free resources "promptly" as opposed to the "eventually" of GC or the stack or reference counting discipline of RAII. Similarly to ``defer`` the cleanup is placed near the acquire but similarly to ``finally`` there can be a marker for the end of the scope. The pattern is ``newFinalizer f; ... ; use f; ...; use f``. The finalizer will run after the last ``use`` if code flows normally and soon after raising an exception otherwise. Finalizers allow nesting, in the natural way ``newFinalizer a; newFinalizer b; ...; use b; use a``, and interleaving, by reordering to ``use a; use b``. But the execution order on an exception is latest-defined-first-run. Finalizers also support encapsulation such as returning the allocated resource. It is also possible to embed the finalizer in a resource handle and use RAII style programming - each operation calls ``use`` and extends the finalizer's life. - -Finalizers directly implement D's ``scope(exit)``, and with an extra flag variable they can implement ``scope(success)`` and ``scope(failure)``: - -:: - - scope_failure rollback = - flag = mut true - f = newFinalizer (if flag then { rollback } else {}) - ScopeFailure flag f - - use (ScopeFailure flag f) = - flag := false - use f - - - s <- scope_failure rollback - code - use s - -`Herb Sutter `__ claims that the extra ``use`` is "tedious and fragile" and forces the programmer to think about the placement of failure determination. One can define a proper ``scope_failure rollback { ... }`` block structure, and even redefine return/continue/break to not run the rollback. But personally I think the finalizers are more powerful because they allow interleaving scopes. 
The combinators only allow nesting which isn't as expressive. In Haskell land Snoyman created `ResourceT `__ even though there was ``bracket``, exactly because of this. - -Exceptions in finalizers -======================== - -What to do when throwing an exception from a finalizer? Say we have - -:: - - x = - f = newFinalizer (throw Bar) - throw Foo - use f - - y = x catch \case - Foo -> print "foo" - Bar -> print "bar" - -Per Stroscot's semantics, the finalizer runs as soon as it is known that ``use`` will not be called - i.e. immediately after constructing the finalizer, before the ``throw Foo`` (because ``use f`` is unreachable). So ``x`` is equivalent to ``throw Bar; throw Foo``. Then per normal exception semantics ``throw Bar`` wins since it is first and ``y`` outputs ``bar``. - -If we had an I/O operation ``writeOrThrow`` instead of ``throw Foo``, then ``use f`` is reachable. So then we have two cases: - -* if ``writeOrThrow`` throws, then it is known that ``use`` will not be called. The finalizer will be delayed until just before the first I/O operation executed after the ``writeOrThrow``, in this case ``print "foo"``. So ``y`` will exit with a ``Bar`` error without printing anything. -* if ``writeOrThrow`` doesn't throw, then the finalizer will execute after the last ``use``. Again ``y`` will exit with a ``Bar`` error without printing anything. - -As far as I can tell this is a logical choice. C++ instead decided to terminate the program on the ``throw Bar`` in the destructor. The justification seems to be that it slightly simplified the implementation of unwinding, and that the C++ STL wanted to "arbitrarily require that [destructors] may not throw." (`Herb Sutter `__, also Item 16 "Destructors That Throw and Why They're Evil" of his 1999 book Exceptional C++) `John Kalb and David Abraham `__ say "The reason we can’t have throwing destructors is that nobody worked out how to deal with multiple exceptions wanting to propagate through the same set of stack frames. [...] we think termination is a bit draconian. Frankly, we don’t think it’s so hard to nail down the final details of how this should work." - -Kalb proposes to "drop the second [destructor] exception on the floor and propagate the original one", but this is a bad idea. Since (per Sutter's convention) destructors generally don't fail, an exception during a destructor is going to be fairly serious, such as an OOM. Ignoring this exception in favor of a trivial logic exception is the wrong approach. - -Exception safety -================ - -Exception safe code is code that works correctly even when exceptions are thrown. The basic issue is in Rust/C++ almost everything is a resource. In Stroscot almost everything is a value, inert data - copy/clone is built into the language and can't fail. Taking away explicit memory management makes it much easier to ensure exception safety. Reasoning about invariants with pure values is straightforward, and fail-fast coding styles mean that the program doesn't live long. And when writing cleanups the programmer is already thinking about exception safety and restoring invariants, so will write an exception-safe cleanup. - -Still, what about exception safety for a mutable data structure? C++ has `levels of safety `__ for stateful functions based on what invariants are preserved. - -* no-throw means forward progress is guaranteed. It's nice when you can give this, but most functions can fail due to insufficient memory. 
-* Strong safety means that state transitions happen atomically and a failure will return to the old state. To ensure this one needs basic safety and to copy the relevant data beforehand and write it back afterwards. This is infeasible for even simple data structures in C++ due to overloaded assignment and copy operators being able to throw. -* Basic safety means that the final state will be valid, i.e. all invariants hold. You need to safeguard against each function call throwing. This requires adding handling code to each call and trusting the documentation for the list of thrown exceptions (or using no-throw). - -These levels only work for stateful data structures that call a small and easily auditable set of other functions. - -In Stroscot there are two steps to making code exception-safe: - -* identify invariants. These can be written as assertions using the pure read operations on stores. With this the static verification will identify the function and the exceptional control flow that breaks the invariant. -* place exception cleanup handlers to restore broken invariants - - This code in Rust or C++ is not exception safe: (based on `this code `__ and `this code `__) - -:: - - push_ten_more : (v : Vec T) -> T -> Op { v : Vec (T|uninitialized) } - push_ten_more (this@(readRef -> Vector arr)) t = - new_arr = alloc (length arr + 10) - for (i in indexes arr) - copy arr[i] to new_arr[i] - delete arr[i] - this := Vector new_arr - - for i in 0..10 { - (ptr v) offset (len + i) := t.clone() - } - } - -The update to the Vector happens when the next 10 elements are uninitialized, and ``Vec`` has an internal invariant that its elements are safe to deallocate. So if `t.clone` throws then the initialization will not be called. Vec's destructor that assumes the invariant will then free uninitialized memory. - -In this code the Rust/C++ semantics require calling a destructor ``delete`` on each element of a ``vec``, and copying values with ``copy_to_`` and ``clone`` operations that can fail - Similarly we wouldn't necessarily call any finalizers (``delete``) - the finalizer is called after the last use, and likely there are other copies and this is not the last use. Even if the Stroscot code was written to call an operation ``clone`` that could throw exceptions, the rest of the elements will be deallocated if needed, but otherwise not. In all cases memory is safe due to the finalizer semantics. - -Another issue is the uninitialized array. This means the array may be filled with ``uninitialized`` values (exceptions). The result type reflects this possibility. With careful rewriting, the code can provide the strong guarantee that the resulting vector only contains values of type T. This can be done by extending the array one element at a time or by saving the exception(s) thrown in a separate list and rethrowing at the end as an exception group. - -A simple example is ``TwoList``, which maintains two mutable linked lists with the invariant that they are the same. What does adding an element look like? - -:: - - add a (List l) = - head = read l - l := Cons a head - - add a (TwoLists l1 l2) = - add a l1 - add a l2 - -But this is not exception safe for the invariant, because an async exception between the two adds may add a to l1 but not l2. - -If add is no-throw we can fix this just by adding uninterruptibleMask. But add allocates so can throw out of memory. 
But remove is no-throw so we can use remove: - -:: - - add a (TwoLists l1 l2) = - mask_ $ - case try (add a l1) of - Err e -> throw e - Ok -> - case try (allowInterrupt >> add a l2) of - Ok -> return - Err e -> uninterruptibleMask (remove a l1) >> throw e - -Here add should have strong safety, i.e. it restores the state if an exception is thrown during the add. - -Poisoning ---------- - -Rust has "poisoning" for taking locks, which is a safeguard against trying to use data that was corrupted due to an exception, e.g. a concurrent heap that doesn't have the heap property anymore. Poisoned data will tear down all connected threads. - -Essentially it uses the scope_failure cleanup to poison the lock on failure: - -:: - - getGuard mutex = - acquireLock mutex - flag = mut true - f = newFinalizer { - if flag then - poisonLock mutex - releaseLock mutex - } - Guard flag f - - finishGuard (Guard flag f) = - flag := false - use f - -Once the lock is poisoned then locking the mutex will throw an exception. - -However, the invariant checking approach to exception safety discussed above will throw an exception anyway when trying to use a data structure operation on a corrupted data structure. It also covers the single-threaded case where no mutex is used but the recovery from an exception is incomplete. So poisoning isn't really that useful. But a PoisoningMutex can at least be in the standard library for those who want the functionality. - diff --git a/docs/Commentary/index.rst b/docs/Commentary/index.rst index 6e0aac0..999b2ce 100644 --- a/docs/Commentary/index.rst +++ b/docs/Commentary/index.rst @@ -1,7 +1,7 @@ Commentary ########## -Every compiler needs an explanation of all the decisions made and possible options considered. +Every language needs an explanation of all the decisions made and possible options considered. .. toctree:: :maxdepth: 2