Commit 136dddd

Split tutorial and better part 1
1 parent 1feba29 commit 136dddd

6 files changed (+665 -568 lines changed)

src/2-0-tutorial.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# Tutorial: computing your taxes

Welcome to this tutorial, whose objective is to guide you through the features
of the Catala language and teach you how to annotate a simple legislative text
with it, in order to obtain an executable program that computes your taxes!

This tutorial does not cover the installation of Catala. For more information
about this, please refer to the [installation
section](./1-1-getting_started.md). If you want to follow this tutorial locally,
read the [section about creating your first program](./1-2-first.md) and simply
copy-paste the code snippets of the tutorial into your Catala program file.

At any point, please refer to [the Catala syntax cheat
sheet](https://catalalang.github.io/catala/syntax.pdf) or the [reference
guide](./5-catala.md) for an exhaustive view of the syntax and features of
Catala; this tutorial is rather designed to ease you into the language and its
common use patterns.

src/2-1-basic-blocks.md

Lines changed: 308 additions & 0 deletions
@@ -0,0 +1,308 @@
# Basic blocks of a Catala program

In this section, the tutorial introduces the basic blocks of a Catala program:
the difference between law and code, data structures, scopes, variables and
formulas. By the end of the section, you should be able to write a simple Catala
program equivalent to a single function with local variables whose definitions
can refer to one another.

## Mixing law and code

Catala is a language designed around the concept of *literate programming*, that
is, the mixing of computer code and its specification in a single
document. Why literate programming? Because it enables a fine-grained
correspondence between the specification and the code. Whenever the
specification is updated, knowing where to update the code is trivial with
literate programming. This is absolutely crucial for enabling long-term
maintenance of complex and high-assurance programs like tax or social benefits
computation.

Hence, a Catala source code file looks like a regular Markdown
document, with the specification written down and styled as Markdown text,
and with the Catala code only present in well-bounded Catala code blocks introduced
by a line with `` ```catala `` and ended by a line with `` ``` ``.

Before writing any Catala code, we must introduce the specification of the
code for this tutorial. This specification will be based on a fictional Tax Code
defining a simple income tax. But in general, anything can be used as a
specification for a Catala program: laws, executive orders, the reasoning of
court decisions, legal doctrine, internal instructions, technical specifications,
etc. These sources can be mixed to form a complete Catala program that
relies on these multiple sources. Concretely, incorporating a legal source
of specification into the Catala program amounts to copy-pasting the
text and formatting it in Markdown syntax inside the source code file.

Without further ado, let us introduce the first bit of specification for
our fictional income tax, Article 1 of the CTTC (Catala Tutorial Tax Code):

> #### Article 1
>
> The income tax for an individual is defined as a fixed percentage of the
> individual's income over a year.

The spirit of writing code in Catala is to stick to the specification at all
times in order to put the code snippets where they belong. Hence, we will
introduce below the Catala code snippets that translate Article 1, which
should be put just below Article 1 in the Catala source code file.

These code snippets should describe the program that computes the income tax,
and contain the rule defining it as the multiplication of the income by the
rate. It is time to dive into Catala as a programming language.

```catala
# We will soon learn what to write here in order to translate the meaning
# of Article 1 into Catala code.

# To create a block of Catala code in your file, bound it with Markdown-style
# "```catala" and "```" delimiters. You can write comments in Catala code blocks
# by prefixing lines with "#".
```

## Setting up data structures

The content of Article 1 assumes a lot of implicit context: there exists an
individual with an income, as well as an income tax that the individual has
to pay each year. Even if this implicit context is not verbatim in the law,
we have to make it explicit in the computer code, in the form of data structures
and function signatures.

Catala is a
[strongly-typed](https://blog.merigoux.ovh/en/2017/07/19/static-or-dynamic.html),
statically compiled language, so all data structures and function signatures
have to be explicitly declared. So, we begin by declaring the type information
for the individual, the taxpayer that will be the subject of the tax
computation. This individual has an income and a number of children, both of
which will be needed for tax purposes:

```catala
# The name of the structure, "Individual", must start with an
# uppercase letter: this is the CamelCase convention.
declaration structure Individual:
  # In this line, "income" is the name of the structure field and
  # "money" is the type of what is stored in that field.
  # Available types include: "integer", "decimal", "money", "date",
  # "duration", and any other structure or enumeration that you declare.
  data income content money
  # The field names "income" and "number_of_children" start with a lowercase
  # letter: they follow the snake_case convention.
  data number_of_children content integer
```

This structure contains two data fields, `income` and `number_of_children`.
Structures are useful to group together related data. Usually, you
get one structure per concrete object on which the law applies (like the
individual). It is up to you to decide how to group the data together, but we
advise you to optimize for code readability.
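
For illustration, a value of type `Individual` could be written as a structure
literal like the one below. This is a sketch that assumes the
`Name { -- field: value }` literal syntax (check it against the syntax cheat
sheet). The income and number of children are made-up values, and the snippet
is a bare expression rather than a complete program.

```catala
# Hypothetical value, not taken from the CTTC: each field is introduced by
# "--", followed by the field name, a colon and the value of the field.
Individual {
  -- income: $30,000
  -- number_of_children: 2
}
```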

Sometimes, the law gives an enumeration of different situations. These
enumerations are modeled in Catala using an enumeration type, like:

```catala
# The name "TaxCredit" is also written in CamelCase.
declaration enumeration TaxCredit:
  # The line below says that "TaxCredit" can be a "NoTaxCredit" situation.
  -- NoTaxCredit
  # The line below says that alternatively, "TaxCredit" can be a
  # "ChildrenTaxCredit" situation. This situation carries a content
  # of type integer corresponding to the number of children concerned
  # by the tax credit. This means that if you're in the "ChildrenTaxCredit"
  # situation, you will also have access to this number of children.
  -- ChildrenTaxCredit content integer
```

In computer science terms, such an enumeration is called a "sum type" or simply
an enum. The combination of structures and enumerations allows the Catala
programmer to declare all possible shapes of data, as they are equivalent to
the powerful notion of [algebraic data types](https://en.wikipedia.org/wiki/Algebraic_data_type).
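
The sketch below shows how the content carried by a case becomes accessible.
It pattern-matches on a `TaxCredit` value with Catala's `match ... with pattern`
construct. A value of the enumeration itself would be written `NoTaxCredit` or,
for instance, `ChildrenTaxCredit content 2`. The scope `TaxCreditExample` and the
per-child amount are hypothetical and not part of the CTTC. Scopes and
definitions are only explained in the sections that follow, so treat this as a
preview and double-check the syntax against the reference guide.

```catala
# Hypothetical scope and amounts, for illustration only.
declaration scope TaxCreditExample:
  input tax_credit content TaxCredit
  output credit_amount content money

scope TaxCreditExample:
  # In the "ChildrenTaxCredit" case, the integer it carries is bound to the
  # name "number_of_children"; in the "NoTaxCredit" case, no credit applies.
  definition credit_amount equals
    match tax_credit with pattern
    -- NoTaxCredit: $0
    -- ChildrenTaxCredit of number_of_children:
      $500 * (decimal of number_of_children)
```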

Notice that these data structures that we have declared cannot always be
attached naturally to a particular piece of the specification text. So, where
should these declarations go in your literate programming file? Since you will
often go back to these data structure declarations while programming, we
advise you to group them together in some sort of prelude in your source code
file. Concretely, this prelude section containing the data structure declarations
will be your one-stop shop when trying to understand the data manipulated by the
rules elsewhere in the source code file.

## Scopes as basic computation blocks

We've defined and typed the data that the program will manipulate. Now, we have
to define the logical context in which this data will evolve. Because Catala is
a [functional programming](https://en.wikipedia.org/wiki/Functional_programming)
language, all code exists within a function. And the equivalent to a function in
Catala is called a *scope*. A scope consists of:

* a name,
* input variables (similar to function arguments),
* internal variables (similar to local variables),
* output variables (that together form the return type of the function).

For instance, we can declare a scope for computing the income tax defined in
Article 1:

```catala
declaration scope IncomeTaxComputation:
  # Scope names use the CamelCase naming convention, like names of structs
  # or enums. Scope variables, on the other hand, use the snake_case naming
  # convention, like struct fields.
  # The line below declares an input variable of the scope, which is akin to
  # a function parameter in computer science terms. This is the piece of
  # data on which the scope will operate.
  input individual content Individual
  # Internal variable: defined and used only inside the scope.
  internal fixed_percentage content decimal
  # Output variable: computed by the scope and returned to the caller.
  output income_tax content money
```

The scope is the basic abstraction unit in Catala programs, and scopes
can be composed. Just as a function can call other functions, scopes can
call other scopes. We will see later how to do this, but first let us focus
on the inputs and outputs of scopes.

The declaration of the scope is akin to a function signature: it contains a list
of all the arguments along with their types. But in Catala, scope variables can
be `input`, `internal` or `output`. `input` means that the variable has to be
provided whenever the scope is called, and cannot be defined within the scope.
`internal` means that the variable is defined within the scope and cannot be
seen from outside the scope; it's not part of the return value of the scope.
`output` means that a caller can retrieve the computed value of the variable.
Note that a variable can also be simultaneously an input and an output of the
scope; in that case it should be annotated with `input output`.
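
The following sketch illustrates the `input output` annotation. It declares a
hypothetical variant of `IncomeTaxComputation` (the name
`IncomeTaxComputationVariant` is made up for illustration) in which the caller
provides the individual and can also read it back among the scope's results:

```catala
# Hypothetical scope, for illustration only.
declaration scope IncomeTaxComputationVariant:
  # Provided by the caller, and also part of the values returned to it.
  input output individual content Individual
  # Defined inside the scope, invisible from the outside.
  internal fixed_percentage content decimal
  # Defined inside the scope and retrievable by the caller.
  output income_tax content money
```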

Once the scope has been declared, we can use it to define our computation
rules and finally code up Article 1!

## Defining variables and formulas

Article 1 actually gives the formula to define the `income_tax` variable of
scope `IncomeTaxComputation`.

> #### Article 1
>
> The income tax for an individual is defined as a fixed percentage of the
> individual's income over a year.
>
> ```catala
> scope IncomeTaxComputation:
>   definition income_tax equals
>     individual.income * fixed_percentage
> ```

Let us unpack the code above. Each `definition` of a variable (here,
`income_tax`) is attached to a scope that declares it (here,
`IncomeTaxComputation`). After `equals`, we have the actual expression for the
variable: `individual.income * fixed_percentage`. The syntax for formulas uses
the classic arithmetic operators. Here, `*` means multiplying an amount of
`money` by a `decimal`, returning a new amount of `money`. The exact behavior of
each operator depends on the types of values it is applied to. For instance,
here, because a value of the `money` type is always an integer number of cents,
`*` rounds the result of the multiplication to the nearest cent to provide the
final value of type `money` (see [the FAQ](./4-1-design.md) for more information
about rounding in Catala). As for `individual.income`, the `.` notation
lets us access the `income` field of `individual`, which is a structure
of type `Individual`.

However, at this point we're still missing the definition of `fixed_percentage`.
This is a common pattern when coding the law: the definitions for various
variables are scattered in different articles. Fortunately, the Catala compiler
automatically collects all the definitions for each scope and puts them
in the right order. Here, even if we define `fixed_percentage` after
`income_tax` in our source code, the Catala compiler will switch the order
of the definitions internally because `fixed_percentage` is used in the
definition of `income_tax`. More generally, the order of top-level definitions
and declarations in Catala source code files does not matter, and you can
move code around freely without having to care about dependency order.

In this tutorial, we'll suppose that our fictional CTTC specification defines
the percentage in the next article. The Catala code below should not surprise
you at this point.

> #### Article 2
>
> The fixed percentage mentioned in Article 1 is equal to 20 %.
>
> ```catala
> scope IncomeTaxComputation:
>   # Writing 20% is just an alternative for the decimal "0.20".
>   definition fixed_percentage equals 20 %
> ```
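
Taken together, Articles 1 and 2 fully determine the income tax. The worked
example below assumes a hypothetical individual with a yearly income of
$1,234.56 (an illustrative figure, not from the CTTC). It also shows the
rounding to the nearest cent that applies when `money` is multiplied by a
`decimal`:

```math
\begin{aligned}
\texttt{income\_tax} &= \texttt{individual.income} \times \texttt{fixed\_percentage} \\
                     &= \$1{,}234.56 \times 0.20 \\
                     &= \$246.912 \\
                     &\approx \$246.91 \quad \text{(rounded to the nearest cent)}
\end{aligned}
```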

## Common values and computations in Catala

So far, we have seen values that have types like `decimal`, `money`, `integer`.
One could object that there is no point in distinguishing these three concepts,
as they are merely numbers. However, the philosophy of Catala is to make every
choice that affects the result of the computation explicit, and the
representation of numbers does affect the result of the computation. Indeed,
financial computations vary according to whether we consider money amounts as an
exact number of cents, or whether we store additional fractional digits after
the cent. Since the programs [Catala is designed for](./0-intro.md)
have heavy consequences for a lot of users, the language is quite strict
about how numbers are represented. The rule of thumb is that, in Catala,
numbers behave exactly according to the common mathematical semantics one
can associate with basic arithmetic computations (`+`, `-`, `*`, `/`).

In particular, that means that `integer` values are unbounded and can never
[overflow](https://en.wikipedia.org/wiki/Integer_overflow). Similarly, `decimal`
values can be arbitrarily precise (although they are always rational, belonging
to ℚ) and do not suffer from floating-point imprecision. For `money`, the
language makes an opinionated decision: a value of type `money` is always
an integer number of cents.

These choices have several consequences:

* `integer` divided by `integer` gives a `decimal`;
* `money` cannot be multiplied by `money` (instead, multiply `money` by `decimal`);
* `money` multiplied (or divided) by `decimal` rounds the result to the nearest cent;
* `money` divided by `money` gives a `decimal` (that is not rounded whatsoever).

Concretely, this gives:

```catala
# Illustrative results of each operation (not a complete Catala program)
10 / 3 = 3.333333333...
$10 / 3.0 = $3.33
$20 / 3.0 = $6.67
$10 / $3 = 3.333333333...
```

The Catala compiler will guide you into using the correct operations explicitly,
by reporting compiler errors when that is not the case. For instance, when
trying to add an `integer` and a `decimal`:

```text
┌─[ERROR]─

│ I don't know how to apply operator + on types integer and decimal

├─➤ tutorial_en.catala_en
│ │
│ │ definition x equals 1 + 2.0
│ │                     ‾‾‾‾‾‾‾

│ Type integer coming from expression:
├─➤ tutorial_en.catala_en
│ │
│ │ definition x equals 1 + 2.0
│ │                     ‾

│ Type decimal coming from expression:
├─➤ tutorial_en.catala_en
│ │
│ │ definition x equals 1 + 2.0
│ │                         ‾‾‾
└─
```

To fix this error, you need to use explicit casting, for instance by replacing
`1` by `decimal of 1`. Refer to the [language reference](./5-catala.md) for all
possible casts, operations and their associated semantics.
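
Applied to the error above, the fix could look like the following sketch. The
scope `Example` is hypothetical, since the error message only shows the faulty
definition of `x`. The cast is parenthesized to make the intended grouping
explicit:

```catala
# Hypothetical scope, for illustration only.
declaration scope Example:
  output x content decimal

scope Example:
  # "decimal of 1" casts the integer 1 to the decimal 1.0, so both operands
  # of "+" are decimals and x is defined as 3.0.
  definition x equals (decimal of 1) + 2.0
```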

## Checkpoint

This concludes the first section of the tutorial. Having seen how to set up data
structures with `structure` and `enumeration`, how to declare the types of
`scope` variables, and how to write the `definition` of formulas for these
variables, you should now be able to code in Catala the equivalent of
single-function programs that perform common arithmetic operations and define
local variables.
