| 1 | +# Basic blocks of a Catala program |
| 2 | + |
| 3 | +In this section, the tutorial introduces the basic blocks of a Catala |
| 4 | +program: the difference between law and code, data structures, scopes, |
| 5 | +variables and formulas. |
| 6 | + |
| 7 | +By the end of the section, you should be able to write a simple Catala |
| 8 | +program equivalent to a single function with local variables whose definitions |
| 9 | +can refer to one another. |
| 10 | + |
| 11 | +## Mixing law and code |
| 12 | + |
| 13 | +Catala is a language designed around the concept of *literate programming*, that |
| 14 | +is, the interleaving of computer code and its specification in a single |
| 15 | +document. Why literate programming? Because it enables a fine-grained |
| 16 | +correspondence between the specification and the code. Whenever the |
| 17 | +specification is updated, finding the code to update is straightforward with |
| 18 | +literate programming. This is absolutely crucial for enabling long-term |
| 19 | +maintenance of complex and high-assurance programs like tax or social benefits |
| 20 | +computation. |
| 21 | + |
| 22 | +Hence, a Catala source code file looks like a regular Markdown |
| 23 | +document, with the specification written down and styled as Markdown text, |
| 24 | +with the Catala code only present in well-bounded Catala code blocks introduced |
| 25 | +by a line with `` ```catala `` and ended by a line with `` ``` ``. |
| 26 | + |
| 27 | +Before writing any Catala code, we must introduce the specification of the |
| 28 | +code for this tutorial. This specification will be based on a fictional Tax Code |
| 29 | +defining a simple income tax. But in general, anything can be used as a |
| 30 | +specification for a Catala program: laws, executive orders, court case |
| 31 | +motivations, legal doctrine, internal instructions, technical specifications, |
| 32 | +etc. These sources can be mixed to form a complete Catala program that |
| 33 | +relies on these multiple sources. Concretely, incorporating a legal source |
| 34 | +of specification into the Catala program amounts to copy-pasting the |
| 35 | +text and formatting it in Markdown syntax inside the source code file. |
| 36 | + |
| 37 | +Without further ado, let us introduce the first bit of specification for |
| 38 | +our fictional income tax, Article 1 of the CTTC (Catala Tutorial Tax Code): |
| 39 | + |
| 40 | +> #### Article 1 |
| 41 | +> |
| 42 | +> The income tax for an individual is defined as a fixed percentage of the |
| 43 | +> individual's income over a year. |
| 44 | + |
| 45 | +The spirit of writing code in Catala is to stick to the specification at all |
| 46 | +times in order to put the code snippets where they belong. Hence, we will |
| 47 | +introduce below the Catala code snippets that translate Article 1, which |
| 48 | +should be put just below Article 1 in the Catala source code file. |
| 49 | + |
| 50 | +These code |
| 51 | +snippets should describe the program that computes the income tax, and contain |
| 52 | +the rule defining it as a multiplication of the income by the rate. It is time |
| 53 | +to dive into Catala as a programming language. |
| 54 | + |
| 55 | + |
| 56 | +```catala |
| 57 | +# We will soon learn what to write here in order to translate the meaning |
| 58 | +# of Article 1 into Catala code. |
| 59 | + |
| 60 | +# To create a block of Catala code in your file, bound it with Markdown-style |
| 61 | +# "```catala" and "```" delimiters. You can write comments in Catala code blocks |
| 62 | +# by prefixing lines with "#" |
| 63 | +``` |
| 64 | + |
| 65 | +## Setting up data structures |
| 66 | + |
| 67 | + |
| 68 | +The content of Article 1 assumes a lot of implicit context: there exists an |
| 69 | +individual with an income, as well as an income tax that the individual has |
| 70 | +to pay each year. Even if this implicit context is not stated verbatim in the law, |
| 71 | +we have to make it explicit in the computer code, in the form of data structures |
| 72 | +and function signatures. |
| 73 | + |
| 74 | +Catala is a |
| 75 | +[strongly-typed](https://blog.merigoux.ovh/en/2017/07/19/static-or-dynamic.html), |
| 76 | +statically compiled language, so all data structures and function signatures |
| 77 | +have to be explicitly declared. So, we begin by declaring the type information |
| 78 | +for the individual, the taxpayer that will be the subject of the tax |
| 79 | +computation. This individual has an income and a number of children, two pieces |
| 80 | +of information that will be needed for tax purposes: |
| 81 | + |
| 82 | +```catala |
| 83 | +# The name of the structure, "Individual", must start with an |
| 84 | +# uppercase letter: this is the CamelCase convention. |
| 85 | +declaration structure Individual: |
| 86 | +  # In this line, "income" is the name of the structure field and |
| 87 | +  # "money" is the type of what is stored in that field. |
| 88 | +  # Available types include: "integer", "decimal", "money", "date", |
| 89 | +  # "duration", and any other structure or enumeration that you declare. |
| 90 | +  data income content money |
| 91 | +  # The field names "income" and "number_of_children" start with a lowercase |
| 92 | +  # letter: they follow the snake_case convention. |
| 93 | +  data number_of_children content integer |
| 94 | +``` |
| 95 | + |
| 96 | +This structure contains two data fields, `income` and `number_of_children`. |
| 97 | +Structures are useful for grouping related data. Usually, you |
| 98 | +get one structure per concrete object to which the law applies (like the |
| 99 | +individual). It is up to you to decide how to group the data together, but we |
| 100 | +advise you to optimize for code readability. |
| 101 | + |
| 102 | +Sometimes, the law gives an enumeration of different situations. These |
| 103 | +enumerations are modeled in Catala using an enumeration type, like: |
| 104 | + |
| 105 | +```catala |
| 106 | +# The name "TaxCredit" is also written in CamelCase. |
| 107 | +declaration enumeration TaxCredit: |
| 108 | +  # The line below says that "TaxCredit" can be a "NoTaxCredit" situation. |
| 109 | +  -- NoTaxCredit |
| 110 | +  # The line below says that alternatively, "TaxCredit" can be a |
| 111 | +  # "ChildrenTaxCredit" situation. This situation carries a content |
| 112 | +  # of type integer corresponding to the number of children concerned |
| 113 | +  # by the tax credit. This means that if you're in the "ChildrenTaxCredit" |
| 114 | +  # situation, you will also have access to this number of children. |
| 115 | +  -- ChildrenTaxCredit content integer |
| 116 | +``` |
| 117 | + |
| 118 | +In computer science terms, such an enumeration is called a "sum type" or simply |
| 119 | +an enum. The combination of structures and enumerations allows the Catala |
| 120 | +programmer to declare all possible shapes of data, as they are equivalent to |
| 121 | +the powerful notion of [algebraic data types](https://en.wikipedia.org/wiki/Algebraic_data_type). |
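
To make these declarations more concrete, here is a minimal sketch of how values of both types could be written. The scope `DataExample` and the values in it are hypothetical: they only provide a place to put the two definitions. Scopes themselves are introduced later in this section.

```catala
# Hypothetical scope, not part of the CTTC: it only holds two example values.
declaration scope DataExample:
  output taxpayer content Individual
  output credit content TaxCredit

scope DataExample:
  # A structure value lists each field with its content.
  definition taxpayer equals
    Individual {
      -- income: $20,000
      -- number_of_children: 2
    }
  # An enumeration value names one situation, here with its integer content.
  definition credit equals ChildrenTaxCredit content 2
```

A value of type `TaxCredit` is either `NoTaxCredit` or `ChildrenTaxCredit` with an integer, never both, while a value of type `Individual` always carries both of its fields.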
| 122 | + |
| 123 | +Notice that the data structures we have declared cannot always be |
| 124 | +attached naturally to a particular piece of the specification text. So, where should |
| 125 | +you put these declarations in your literate programming file? Since you will |
| 126 | +often go back to these data structure declarations while programming, we |
| 127 | +advise you to group them together in some sort of prelude in your source code |
| 128 | +file. Concretely, this prelude section containing the data structure declarations |
| 129 | +will be your one-stop shop when trying to understand the data manipulated by the |
| 130 | +rules elsewhere in the source code file. |
| 131 | + |
| 132 | +## Scopes as basic computation blocks |
| 133 | + |
| 134 | +We've defined and typed the data that the program will manipulate. Now, we have |
| 135 | +to define the logical context in which this data will evolve. Because Catala is |
| 136 | +a [functional programming](https://en.wikipedia.org/wiki/Functional_programming) |
| 137 | +language, all code exists within a function, and the equivalent of a function in |
| 138 | +Catala is called a *scope*. A scope consists of: |
| 139 | +* a name, |
| 140 | +* input variables (similar to function arguments), |
| 141 | +* internal variables (similar to local variables), |
| 142 | +* output variables (that together form the return type of the function). |
| 143 | + |
| 144 | +For instance, Article 1 declares a scope for computing the income tax: |
| 145 | + |
| 146 | +```catala |
| 147 | +declaration scope IncomeTaxComputation: |
| 148 | +  # Scope names use the CamelCase naming convention, like names of structs |
| 149 | +  # or enums. Scope variables, on the other hand, use the snake_case naming |
| 150 | +  # convention, like struct fields. |
| 151 | +  input individual content Individual |
| 152 | +  # The line above declares an input variable of the scope, which is akin to |
| 153 | +  # a function parameter in computer science terms. This is the piece of |
| 154 | +  # data on which the scope will operate. |
| 155 | +  internal fixed_percentage content decimal |
| 156 | +  output income_tax content money |
| 157 | +``` |
| 158 | + |
| 159 | +The scope is the basic abstraction unit in Catala programs, and scopes |
| 160 | +can be composed. Just as a function can call other functions, a scope can |
| 161 | +call other scopes. We will see later how to do this, but first let us focus |
| 162 | +on the inputs and outputs of scopes. |
| 163 | + |
| 164 | +The declaration of the scope is akin to a function signature: it contains a list |
| 165 | +of all the arguments along with their types. But in Catala, scope variables can |
| 166 | +be `input`, `internal` or `output`. `input` means that the variable has to be |
| 167 | +provided whenever the scope is called, and cannot be defined within the scope. |
| 168 | +`internal` means that the variable is defined within the scope and cannot be |
| 169 | +seen from outside the scope; it's not part of the return value of the scope. |
| 170 | +`output` means that a caller can retrieve the computed value of the variable. |
| 171 | +Note that a variable can also be simultaneously an input and an output of the |
| 172 | +scope; in that case, it should be annotated with `input output`. |
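
As a sketch of the four annotations side by side, consider the declaration below. The scope `AnnotationsExample` and its variable names are hypothetical and do not belong to the CTTC; only the `Individual` and `TaxCredit` types come from the declarations above.

```catala
# Hypothetical scope: it only illustrates the possible variable annotations.
declaration scope AnnotationsExample:
  # Must be provided by the caller; cannot be defined inside the scope.
  input individual content Individual
  # Provided by the caller and also returned to it.
  input output tax_credit content TaxCredit
  # Defined inside the scope and invisible from outside.
  internal rate content decimal
  # Defined inside the scope and retrievable by the caller.
  output amount content money
```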
| 173 | + |
| 174 | +Once the scope has been declared, we can use it to define our computation |
| 175 | +rules and finally code up Article 1! |
| 176 | + |
| 177 | + |
| 178 | +## Defining variables and formulas |
| 179 | + |
| 180 | +Article 1 actually gives the formula to define the `income_tax` variable of |
| 181 | +scope `IncomeTaxComputation`. |
| 182 | + |
| 183 | +> #### Article 1 |
| 184 | +> |
| 185 | +> The income tax for an individual is defined as a fixed percentage of the |
| 186 | +> individual's income over a year. |
| 187 | +> |
| 188 | +> ```catala |
| 189 | +> scope IncomeTaxComputation: |
| 190 | +>   definition income_tax equals |
| 191 | +>     individual.income * fixed_percentage |
| 192 | +> ``` |
| 193 | + |
| 194 | +Let us unpack the code above. Each `definition` of a variable (here, |
| 195 | +`income_tax`) is attached to a scope that declares it (here, |
| 196 | +`IncomeTaxComputation`). After `equals`, we have the actual expression for the |
| 197 | +variable: `individual.income * fixed_percentage`. The syntax for formulas uses |
| 198 | +the classic arithmetic operators. Here, `*` means multiplying an amount of |
| 199 | +`money` by a `decimal`, returning a new amount of `money`. The exact behavior of |
| 200 | +each operator depends on the types of the values it is applied to. For instance, |
| 201 | +here, because a value of the `money` type is always an integer number of cents, |
| 202 | +`*` rounds the result of the multiplication to the nearest cent to provide the |
| 203 | +final value of type `money` (see [the FAQ](./4-1-design.md) for more information |
| 204 | +about rounding in Catala). As for `individual.income`, the `.` notation |
| 205 | +lets us access the `income` field of `individual`, which is a structure |
| 206 | +of type `Individual`. |
| 207 | + |
| 208 | +However, at this point we're still missing the definition of `fixed_percentage`. |
| 209 | +This is a common pattern when coding the law: the definitions for various |
| 210 | +variables are scattered in different articles. Fortunately, the Catala compiler |
| 211 | +automatically collects all the definitions for each scope and puts them |
| 212 | +in the right order. Here, even if we define `fixed_percentage` after |
| 213 | +`income_tax` in our source code, the Catala compiler will switch the order |
| 214 | +of the definitions internally because `fixed_percentage` is used in the |
| 215 | +definition of `income_tax`. More generally, the order of top-level definitions |
| 216 | +and declarations in Catala source code files does not matter, and you can |
| 217 | +move code around freely without having to care about dependency order. |
| 218 | + |
| 219 | +In this tutorial, we'll suppose that our fictional CTTC specification defines |
| 220 | +the percentage in the next article. The Catala code below should not surprise |
| 221 | +you at this point. |
| 222 | + |
| 223 | +> #### Article 2 |
| 224 | +> |
| 225 | +> The fixed percentage mentioned in Article 1 is equal to 20 %. |
| 226 | +> |
| 227 | +> ```catala |
| 228 | +> scope IncomeTaxComputation: |
| 229 | +>   # Writing 20% is just an alternative for the decimal "0.20". |
| 230 | +>   definition fixed_percentage equals 20 % |
| 231 | +> ``` |
| 232 | + |
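
With Articles 1 and 2 in place, the scope can in principle be exercised on a concrete taxpayer. The sketch below assumes a hypothetical scope named `Test` and a made-up income of $20,000, for which 20 % gives $4,000. It relies on scope-call and assertion syntax that this section has not introduced, so treat it as an illustration rather than as part of the CTTC translation.

```catala
# Hypothetical test scope: its name and the input values are assumptions.
declaration scope Test:
  output computation content IncomeTaxComputation

scope Test:
  # Call IncomeTaxComputation, providing its only input variable.
  definition computation equals
    output of IncomeTaxComputation with {
      -- individual:
        Individual {
          -- income: $20,000
          -- number_of_children: 0
        }
    }
  # $20,000 * 20 % is expected to be $4,000.
  assertion computation.income_tax = $4,000
```

Note that the definition of `fixed_percentage` from Article 2 is picked up here even though it appears after the definition of `income_tax` in the source file.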
| 233 | +## Common values and computations in Catala |
| 234 | + |
| 235 | +So far, we have seen values that have types like `decimal`, `money` and `integer`. |
| 236 | +One could object that there is no point in distinguishing these three concepts, |
| 237 | +as they are merely numbers. However, the philosophy of Catala is to make every |
| 238 | +choice that affects the result of the computation explicit, and the |
| 239 | +representation of numbers does affect the result of the computation. Indeed, |
| 240 | +financial computations vary according to whether we consider a money amount as an |
| 241 | +exact number of cents, or whether we store additional fractional digits after |
| 242 | +the cent. Since the programs [Catala is designed for](./0-intro.md) |
| 243 | +have serious consequences for a lot of users, the language is quite strict |
| 244 | +about how numbers are represented. The rule of thumb is that, in Catala, |
| 245 | +numbers behave exactly according to the common mathematical semantics one |
| 246 | +can associate with basic arithmetic computations (`+`, `-`, `*`, `/`). |
| 247 | + |
| 248 | +In particular, that means that `integer` values are unbounded and can never |
| 249 | +[overflow](https://en.wikipedia.org/wiki/Integer_overflow). Similarly, `decimal` |
| 250 | +values can be arbitrarily precise (although they are always rational, belonging |
| 251 | +to ℚ) and do not suffer from floating-point imprecision. For `money`, the |
| 252 | +language makes an opinionated decision: a value of type `money` is always |
| 253 | +an integer number of cents. |
| 254 | + |
| 255 | +These choices have several consequences: |
| 256 | +* `integer` divided by `integer` gives a `decimal`; |
| 257 | +* `money` cannot be multiplied by `money` (instead, multiply `money` by `decimal`); |
| 258 | +* `money` multiplied (or divided) by `decimal` rounds the result to the nearest cent; |
| 259 | +* `money` divided by `money` gives a `decimal` (that is not rounded whatsoever). |
| 260 | + |
| 261 | +Concretely, this gives: |
| 262 | + |
| 263 | +```catala |
| 264 | +10 / 3 = 3.333333333... |
| 265 | +$10 / 3.0 = $3.33 |
| 266 | +$20 / 3.0 = $6.67 |
| 267 | +$10 / $3 = 3.33333333... |
| 268 | +``` |
| 269 | + |
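
The lines above use an informal notation. As a sketch, assuming a hypothetical scope named `ArithmeticExamples` with made-up variable names, some of these computations could be written as actual Catala definitions, with an assertion stating the rounded result:

```catala
# Hypothetical scope: the names below are assumptions made for illustration.
declaration scope ArithmeticExamples:
  output third content decimal
  output share content money
  output ratio content decimal

scope ArithmeticExamples:
  # integer / integer gives an exact decimal (the rational 10/3).
  definition third equals 10 / 3
  # money / decimal gives money, rounded to the nearest cent.
  definition share equals $20 / 3.0
  # money / money gives a decimal that is not rounded.
  definition ratio equals $10 / $3
  assertion share = $6.67
```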
| 270 | +The Catala compiler will guide you into using the correct operations explicitly, |
| 271 | +by reporting compiler errors when that is not the case. For instance, when |
| 272 | +trying to add an `integer` and a `decimal`: |
| 273 | + |
| 274 | +```text |
| 275 | +┌─[ERROR]─ |
| 276 | +│ |
| 277 | +│ I don't know how to apply operator + on types integer and decimal |
| 278 | +│ |
| 279 | +├─➤ tutorial_en.catala_en |
| 280 | +│ │ |
| 281 | +│ │ definition x equals 1 + 2.0 |
| 282 | +│ │ ‾‾‾‾‾‾‾ |
| 283 | +│ |
| 284 | +│ Type integer coming from expression: |
| 285 | +├─➤ tutorial_en.catala_en |
| 286 | +│ │ |
| 287 | +│ │ definition x equals 1 + 2.0 |
| 288 | +│ │ ‾ |
| 289 | +│ |
| 290 | +│ Type decimal coming from expression: |
| 291 | +├─➤ tutorial_en.catala_en |
| 292 | +│ │ |
| 293 | +│ │ definition x equals 1 + 2.0 |
| 294 | +│ │ ‾‾‾ |
| 295 | +└─ |
| 296 | +``` |
| 297 | + |
| 298 | +To fix this error, you need to use explicit casting, for instance by replacing |
| 299 | +`1` by `decimal of 1`. Refer to the [language reference](./5-catala.md) for all |
| 300 | +possible casts, operations and their associated semantics. |
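
For instance, the failing definition from the error message could be repaired as sketched below. The scope `CastExample` is hypothetical, and the parentheses around the cast are added only to make the grouping explicit.

```catala
# Hypothetical scope wrapping the definition shown in the error message.
declaration scope CastExample:
  output x content decimal

scope CastExample:
  # "decimal of 1" converts the integer 1 into a decimal, so that both
  # operands of "+" have the same type.
  definition x equals (decimal of 1) + 2.0
  assertion x = 3.0
```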
| 301 | + |
| 302 | +## Checkpoint |
| 303 | + |
| 304 | +This concludes the first section of the tutorial. By declaring data structures |
| 305 | +with `structure` and `enumeration`, declaring the types of `scope` |
| 306 | +variables, and writing a `definition` with a formula for each of these variables, you should now be able to |
| 307 | +code in Catala the equivalent of single-function programs that perform common |
| 308 | +arithmetic operations and define local variables. |