Introduction

A programming language specifically designed such that ASTs generated by markov chain are likely to produce programs with meaningful effects

— mcc (@mcclure111) July 11, 2015

Xenoglossia is a text manipulation language inspired by languages such as sed, AWK, and jq. Xenoglossia is specifically designed to be:

  1. Fun to read, and
  2. Easy to assemble automatically via markov chains, randomization, &c.

Basics

A Xenoglossia program consists of one or more function calls, arranged in a sequence known as a series. A program is given a string as its input, which is passed to the first function in the series. That function's return value is then passed as the input to the next function in the series; this continues in a purely sequential manner until the program terminates, at which point the final function's output is returned as a string.
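The threading described above amounts to a left fold over the series. The following is a minimal Python sketch of that idea, not the reference implementation; `shout`, `mirror`, and `run_series` are hypothetical names invented for illustration and are not part of the Xenoglossia library.

```python
from functools import reduce

# Hypothetical stand-in functions; these names are not from the Xenoglossia library.
def shout(value):
    return value.upper()

def mirror(value):
    return value[::-1]

def run_series(series, program_input):
    """Thread the input through each function in order and return the last output."""
    return reduce(lambda value, function: function(value), series, program_input)

# "hello" goes into shout, and shout's output goes into mirror.
result = run_series([shout, mirror], "hello")
```

Here `result` is `"OLLEH"`: each function sees only the value produced by the one before it.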

Syntax

Xenoglossia programs consist of a series of function calls. The chaining between function calls is implicit; there is no special syntax to indicate the piping of a function's output into another function. For example, this sequence is a series which pipes the output of function one into function two, then that function's output into three:

one two three

Function arguments are passed into a function by following a function identifier with one or more string literals; the argument list is terminated when another identifier appears. For example, this sequence is a series which calls function one with the arguments "first" and "second", then passes the output of that function into function two:

one "first" "second" two

Note that there is no way to pass the output of other functions as arguments; arguments may only be literals.
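As an illustration of how little syntax there is to parse, here is a rough Python sketch of a parser for the rules above. It assumes the identifier and string literal rules given elsewhere in this document (a letter followed by one or more alphanumeric characters; single or double quotes with no escape sequences). The `parse` function and its output shape are assumptions made for this example only.

```python
import re

# One token is either a quoted string literal (no escape sequences) or an identifier.
TOKEN = re.compile(r"""\s*(?:(?P<string>"[^"]*"|'[^']*')|(?P<name>[A-Za-z][A-Za-z0-9]+))""")

def parse(source):
    """Return a list of (function_name, [arguments]) pairs, in series order."""
    calls = []
    position = 0
    source = source.rstrip()
    while position < len(source):
        match = TOKEN.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected text at offset {position}")
        position = match.end()
        if match.group("name"):
            # A new identifier ends the previous argument list and starts a new call.
            calls.append((match.group("name"), []))
        elif not calls:
            raise SyntaxError("string literal before any function identifier")
        else:
            # Strip the surrounding quotes and attach the literal to the current call.
            calls[-1][1].append(match.group("string")[1:-1])
    return calls

calls = parse('one "first" "second" two')
```

For the second example program above, `calls` is `[("one", ["first", "second"]), ("two", [])]`.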

Error-handling

There is no form of in-language error handling. Xenoglossia is designed to be as forgiving as possible; normal error conditions, such as calling functions with too few/many arguments or with the wrong type, should be silently handled by the interpreter without notifying the programmer.

The only fatal errors are:

  1. Reference to non-extant functions, and
  2. Invalid syntax.

If either occurs, the interpreter must immediately halt execution and notify the user.
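One way an interpreter might separate the fatal case from the forgiving one is sketched below in Python. Everything here is hypothetical: the `FatalError` class, the `resolve` and `run` helpers, and the `wrap` library function are invented for illustration, and resolving every name before running anything is only one possible reading of "immediately halt".

```python
class FatalError(Exception):
    """Raised for the two fatal conditions: unknown functions and invalid syntax."""

def resolve(calls, library):
    """Look up every function name before running anything."""
    resolved = []
    for name, arguments in calls:
        if name not in library:
            raise FatalError(f"unknown function: {name}")
        resolved.append((library[name], arguments))
    return resolved

def run(calls, library, program_input):
    value = program_input
    for function, arguments in resolve(calls, library):
        # Functions accept any number of arguments and decide how to cope.
        value = function(value, *arguments)
    return value

# Hypothetical library entry that tolerates missing or surplus arguments.
def wrap(value, *arguments):
    marker = arguments[0] if arguments else "*"
    return marker + value + marker

library = {"wrap": wrap}

no_arguments = run([("wrap", [])], library, "hi")
too_many = run([("wrap", ["-", "extra"])], library, "hi")
```

Neither call above raises: `no_arguments` is `"*hi*"` and `too_many` is `"-hi-"`. Only a name missing from `library` stops the program.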

Types

Xenoglossia has two types: strings and arrays. The input to the first function in a Xenoglossia program is always a string, and the output of a program will also be a string; beyond this, there is no restriction as to which type a function may define as its input or output.

Strings

Strings are collections of Unicode characters. Xenoglossia source code should be encoded using UTF-8, though implementations may use any external encoding they wish.

String literals are defined as a collection of Unicode characters enclosed within single or double quotes, similar to languages such as Python or Ruby. There is no difference between the two quoting styles, and programmers are encouraged to use whichever they prefer. Strings do not support any form of escape sequence.

Arrays

There are no array literals. Arrays can only be produced via operations on strings, or by coercing the input to a function.

Arrays are sequential lists of zero or more strings. Array indices are sequential integers beginning with zero. (Note that, despite this reference, there is no "integer" type available to programmers.)

Because there are no variables outside of function scope, it is not defined whether arrays are mutable. Whether or not the array returned by a function is the same object as the array passed into that function is considered an implementation detail.

If the terminating function in a series returns an array, then the array will be coerced into a string before the output of the program is returned.

Other types

Functions may use other types internally if this is convenient to the implementation, as long as no other types are exposed to the programmer. Function input, arguments, and return values can only be strings or arrays.

Coercion

Some functions expect to operate on strings, and others expect to operate on arrays. All functions must be able to accept an input in either type and convert it to the appropriate type before proceeding.

The specific manner of the conversion is function-specific. Most functions use the following coercions:

  • Split a string into an array of individual characters, for example "string" => ["s", "t", "r", "i", "n", "g"]
  • Join an array without any separators, for example ["s", "t", "r", "i", "n", "g"] => "string"

This is not a hard and fast rule, however. If using different coercion rules produces a more interesting result, do so.

The latter array-to-string conversion is used when returning the final output of a program.
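The two default coercions can be sketched in a few lines of Python. This is an illustrative sketch only: `to_array`, `to_string`, and `flip` are hypothetical names, and Python's notion of a character (a Unicode code point) is assumed to be an acceptable stand-in for "individual characters".

```python
def to_array(value):
    """Default coercion to an array: a string splits into its characters."""
    return list(value)

def to_string(value):
    """Default coercion to a string: an array is joined with no separator."""
    return value if isinstance(value, str) else "".join(value)

def flip(value):
    """Hypothetical array function: reverse the order of the elements."""
    return to_array(value)[::-1]

# flip receives a string, coerces it to an array, and returns an array;
# the final array is coerced back to a string for the program's output.
output = to_string(flip("string"))
```

In this sketch `output` is `"gnirts"`. A function with more interesting coercion rules would simply replace `to_array` or `to_string` with its own conversion.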

Variables

Xenoglossia does not have variables.

Functions

Programs cannot themselves contain function definitions; the only functions which exist are the functions located in the Xenoglossia standard library and in third-party libraries.

Function identifiers consist of a lower or uppercase letter, followed by one or more alphanumeric characters. Xenoglossia programs are case-sensitive.

Arguments

Every function takes an implicit input argument, which is the value threaded through every function in a chain. A given function may also take one or more extra arguments; this is optional.

A function must be able to work if it is passed too few or too many arguments. Functions should use good default arguments when one or more expected arguments are missing. While certain defaults may be "obvious", such as substituting empty strings, functions should be structured such that it is difficult for the programmer to end up with a boring result. For example, a string replacement function should try to avoid returning an empty string unless that is unambiguously what the programmer asked for. Functions should prefer returning interesting values over "correct" values if any doubt exists.
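To make the guidance above concrete, here is a hedged Python sketch of a replacement function that copes with any argument count. The function name `swap` and its particular fallbacks are invented for this example; they are one possible set of "interesting" defaults, not the behavior of any actual library function.

```python
def swap(value, *arguments):
    """Hypothetical replacement function that never fails on a bad argument count."""
    # Accept either input type by joining an array into a string first.
    if not isinstance(value, str):
        value = "".join(value)
    if not value:
        return value
    # Missing or empty target: fall back to the first character of the input.
    target = arguments[0] if len(arguments) >= 1 and arguments[0] else value[0]
    # Missing replacement: reuse the target, reversed and upper-cased, rather
    # than an empty string, so the result is never silently emptied.
    replacement = arguments[1] if len(arguments) >= 2 else target[::-1].upper()
    # Any arguments beyond the second are ignored.
    return value.replace(target, replacement)

both = swap("banana", "an", "AN")
one_missing = swap("banana", "an")
none_given = swap("banana")
surplus = swap("banana", "an", "AN", "ignored")
```

With these defaults, `both` and `surplus` are `"bANANa"`, `one_missing` is `"bNANAa"`, and `none_given` is `"Banana"`. An explicit empty second argument is still honored, since in that case deletion is what the programmer asked for.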

Naming

Xenoglossia programs should be surprising and entertaining to read in any possible permutation.

Prefer names that are evocative, easy to read, and obvious in retrospect. Avoid names a writer might guess on the first try. Encourage the joy of discovery, not the joy of least astonishment.

For example, for a function which splits a string into an array, here are a few examples of xenoglossy alternatives to a traditional name like "split":

  • burst
  • explode
  • separate

Note how these names differ from languages like C by being conversational and readable, but also differ from languages like Python by being more or less impossible to guess.

For functions which come in logically opposed pairs, such as split/join functions, consider naming one of the functions in an easily guessable way and the other function in an obscure way. For example, Xenoglossia's function for removing matching elements from an array is named "reject", which has precedent in other languages. Its counterpart, which removes non-matching elements from an array, is named "accept". This is hopefully without precedent.
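For readers unfamiliar with the pair, the following Python sketch shows the two behaviors side by side. It assumes that "matching" means "equal to one of the arguments", which this document does not specify, and it leaves out the interesting defaults a real implementation would need when no arguments are given.

```python
def reject(value, *arguments):
    """Remove elements that match any argument (matching assumed to mean equality)."""
    return [item for item in list(value) if item not in arguments]

def accept(value, *arguments):
    """Remove elements that do not match any argument."""
    return [item for item in list(value) if item in arguments]

# A string input is coerced into an array of characters by list().
rejected = reject("banana", "a")
accepted = accept("banana", "a")
```

Under that assumption, `rejected` is `["b", "n", "n"]` and `accepted` is `["a", "a", "a"]`.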