Assembly Language Syntax
- Each line pertains to at most one instruction or label
- Whitespace is generally ignored except for the minimal amount required to separate parts of an instruction.
- Any characters after and including a semicolon `;` on any given line are considered to be comments.
Any time a numeric value is to be expressed, whether it is an immediate value or a memory address, it can be written in decimal, hex, or binary form as shown here:
Type | Syntax |
---|---|
Decimal | 124 |
Hex | $7C |
Hex | 0x7C |
Binary | b01111100 |
Binary | %01111100 |
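
To make the table concrete, here is a minimal Python sketch of how the five literal forms could be converted to an integer. It is an illustration only, not BespokeASM's actual parser; the function name `parse_numeric_literal` is hypothetical.

```python
def parse_numeric_literal(text: str) -> int:
    """Convert one numeric literal, in any of the forms tabulated above, to an int."""
    token = text.strip()
    if token.startswith("$"):                 # hex, e.g. $7C
        return int(token[1:], 16)
    if token[:2].lower() == "0x":             # hex, e.g. 0x7C
        return int(token[2:], 16)
    if token.startswith("%"):                 # binary, e.g. %01111100
        return int(token[1:], 2)
    if token.startswith("b") and token[1:] and set(token[1:]) <= {"0", "1"}:
        return int(token[1:], 2)              # binary, e.g. b01111100
    return int(token, 10)                     # decimal, e.g. 124


for literal in ("124", "$7C", "0x7C", "b01111100", "%01111100"):
    print(literal, "->", parse_numeric_literal(literal))
```

All five literals in the table denote the same value, so each call returns 124.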
Numeric expressions that can be resolved at compile time are supported. A numeric expression can be composed of any number of explicit numeric values, address labels, constant labels, or numeric operators. The supported operators are:
Operator | Description |
---|---|
`+` | Addition |
`-` | Subtraction |
`*` | Multiply |
`/` | Divide |
`&` | Bit-wise AND |
`\|` | Bit-wise OR |
`^` | Bit-wise XOR |
`(` and `)` | Expression grouping. Parentheses must be paired. |
Note that numeric expressions are not the same thing as an offset for a register indirect addressing mode, though the offset value can be expressed as a numeric expression.
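The following Python sketch shows one way such an expression could be resolved once label values are known. It is not BespokeASM's implementation. It borrows Python's operator precedence and treats `/` as integer division, both of which are assumptions, and it does not handle local (`.`-prefixed) labels or decimal literals with leading zeros. The names `evaluate` and `_walk` are hypothetical.

```python
import ast
import operator
import re

# Operators listed in the table above. Integer division for "/" is an assumption.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.floordiv,
    ast.BitAnd: operator.and_,
    ast.BitOr: operator.or_,
    ast.BitXor: operator.xor,
}


def evaluate(expression: str, labels: dict) -> int:
    """Resolve a numeric expression given a mapping of label name -> value."""
    # Rewrite the assembler's literal prefixes into Python's 0x / 0b forms.
    text = re.sub(r"\$([0-9A-Fa-f]+)", r"0x\1", expression)
    text = re.sub(r"%([01]+)", r"0b\1", text)
    text = re.sub(r"\bb([01]+)\b", r"0b\1", text)
    return _walk(ast.parse(text, mode="eval").body, labels)


def _walk(node, labels):
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.Name):
        return labels[node.id]            # address label or constant label
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_walk(node.left, labels),
                                   _walk(node.right, labels))
    raise ValueError("unsupported element in numeric expression")


print(evaluate("($7C + 4) * 2", {}))
print(evaluate("table_end - table_start",
               {"table_start": 0x8000, "table_end": 0x8010}))
```

Walking a parsed tree instead of calling `eval` keeps the sketch limited to the operators the table lists.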
A label is a string that can be resolved at compile time to a specific numeric value. Labels can be composed of alphanumeric characters and the underscore `_`, and cannot start with a number. All labels must be distinct. A label cannot have the same name as any directive, and non-register labels cannot have the same name as a register label.
An address label represents a specific address in the byte sequence being assembled. A label does not generate byte code on its own, but can be used as an instruction argument to specify a specific address value. A label's address value is implied by its relative location among the lines to be assembled.
An address label is represented by any alphanumeric character string that immediately precedes a colon `:`. Only one label is allowed per line. However, a label can be followed by either a directive or an instruction on the same line. For example, this is valid:
a_label: .byte $22 ; directive on same line as label
A constant is a special label that has an explicitly assigned numeric value. A constant can be placed anywhere in the assembly code, because its value is set only by the assignment and not by its position. Assignment uses the following syntax based on the `=` sign:
constant_var = 10204
Constants cannot be assigned a numeric expression; they must be assigned an explicit numeric value.
A register label is defined in the instruction set configuration file. It is used to represent hardware registers in operands of instructions. Note that address and constant labels cannot use a string that has been declared a register label.
Both address labels and constant labels can be defined to be applicable only in a given scope. A scope defines to what extent a label is visible and usable by other lines of code. The allowed scopes are:
- Global - The label is visible and usable by all lines of code.
- File - The label is visible and usable by only those lines of code sourced from the same file that the label is defined in. A label is made to be in the file scope when its name is prefixed by a `_` character.
- Local - The label is visible and usable by only those lines in between the same two non-local labels within the same source file that the local label is defined in between. A label is made to be in the local scope when its name is prefixed with a `.` character.
  - If the line of code precedes the first non-local label in a source file or is in between a `.org` directive and a non-local label, then local labels cannot be defined.
  - A local scope label is only definable and usable between two non-local labels in the same source file, or between a non-local label and a `.org` directive in the same source file, or between a non-local label and the end of file.
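
The scope rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not BespokeASM's implementation. The helper names and the `enclosing.local` naming used to make local labels unique are hypothetical.

```python
import re

# A label is an optional scope prefix, then an identifier, then a colon.
_LABEL = re.compile(r"^([._]?[A-Za-z_][A-Za-z0-9_]*)\s*:")


def label_scope(name: str) -> str:
    """Classify a label by its prefix character."""
    if name.startswith("."):
        return "local"
    if name.startswith("_"):
        return "file"
    return "global"


def qualify_labels(lines):
    """Return each defined label, with local labels tied to their enclosing label."""
    enclosing = None                         # most recent non-local label, if any
    result = []
    for line in lines:
        code = line.split(";", 1)[0].strip()  # drop the comment
        if code.split()[:1] == [".org"]:
            enclosing = None                 # .org ends the current local scope
            continue
        match = _LABEL.match(code)
        if match is None:
            continue
        name = match.group(1)
        if label_scope(name) == "local":
            if enclosing is None:
                raise ValueError(f"local label {name!r} has no enclosing non-local label")
            result.append(enclosing + name)
        else:
            enclosing = name
            result.append(name)
    return result


print(qualify_labels(["factorial:", ".end:", "multiply:", ".loop:", ".end:"]))
```

Because each local label is tied to the non-local label that precedes it, two subroutines in the same file can both define a `.end` label without colliding.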
Instructions are converted into byte code. An instruction is composed of a specific instruction mnemonic and an optional list of operands according to this format:
MNEMONIC [OPERAND1[, OPERAND2[...]]]
- Instruction operands are separated by a comma
- The operand types that an instruction supports are configured in the instruction set configuration file.
BespokeASM supports several addressing mode notations for instruction operands, though the precise meaning of each is defined by the instruction set configuration file and the hardware that the instruction set will run on. Explained here is the nominal application of each addressing mode notation.
Mode | Notation | Description | Hardware Expectations |
---|---|---|---|
Immediate | `numeric_expression` | A constant value to be used as an operand. The constant value is indicated by a numeric expression. | Values embedded in program byte code should be generally readable. |
Indirect | `[numeric_expression]` | A value that resides at a memory address indicated by a constant value. The constant value memory address is indicated by a numeric expression. | Ability to set a memory address register or similar. |
Deferred | `[[numeric_expression]]` | The numeric constant value indicated by a numeric expression represents an address that holds another address, and the value of interest resides at that second address. Basically, this is a doubly dereferenced memory address. Note the use of double square brackets in the notation. | Ability to follow a doubly dereferenced memory address. |
Register | `register_label` | The value in a specified register. The register is indicated by a register label. | Hardware registers that are generally accessible. |
Indirect Register | `[register_label + offset]` | The specified register contains the memory address where the value is. An offset can be provided, which is added to the value in the register to get the memory address where the desired value is. The register is indicated by a register label, and the offset is provided as a numeric expression that follows the register label with a `+` or `-` sign in between it and the register label. | Hardware registers that can set the memory address used to access memory devices. In order to support offsets, there should be the ability to produce a memory address by adding a value to the register value without necessarily changing the register value. |
Indirect Indexed Register | `[register_label + offset_operand]` | Similar to Indirect Register, except that the offset can be set by any other addressing mode operand. When the configured offset operand is a numeric type, this behaves the same as Indirect Register except that the offset can only be added (`+`) to the register, and there is no bounds checking on the value. The true value of this addressing mode is when the offset operand is configured to be a Register, Indirect Register, or Indirect value. | Similar hardware needs as Indirect Register, with the general ability to set the offset value from any configured offset operand source. |
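
As a rough illustration of how the notations in the table differ, the Python sketch below classifies an operand string by its brackets and by whether it names a register. It is not how BespokeASM parses operands: the real assembler matches operands against the instruction set configuration, and Indirect Indexed Register cannot be told apart from Indirect Register by notation alone. The function name and the register set are hypothetical.

```python
import re


def classify_operand(operand: str, registers: set) -> str:
    """Name the addressing mode notation that an operand string uses."""
    text = operand.strip()
    if text.startswith("[[") and text.endswith("]]"):
        return "deferred"
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1].strip()
        # The part before the first + or - is the candidate register label.
        base = re.split(r"[+-]", inner, maxsplit=1)[0].strip()
        return "indirect_register" if base in registers else "indirect"
    return "register" if text in registers else "immediate"


REGISTERS = {"a", "b", "i", "sp"}            # hypothetical register labels
for operand in ("$7C + 1", "[n_value]", "[[$8000]]", "a", "[sp+2]"):
    print(operand, "->", classify_operand(operand, REGISTERS))
```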
Directives tell the assembler to do specific things when creating the byte code. Directives start with a period `.` or a hash `#`.
There are a few addressing and byte code generation directives supported:
Directive | Description |
---|---|
`.org X` | Resets the location counter (address) the assembler is using to address `X` |
`.fill N, Y` | Fills the next `N` bytes with the byte value `Y` |
`.zero N` | Shorthand for `.fill N, 0` |
`.zerountil X` | Fills the next bytes up to and including address `X` with the value `0`. Will emit nothing if address `X` is less than the address location of this directive. |
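
The effect of these directives on the location counter can be modeled with a short Python sketch. This is a simplified model, not BespokeASM's code: memory is a plain dictionary from address to byte, the function name `apply_directive` is hypothetical, and masking the fill value to 8 bits is an assumption.

```python
def apply_directive(image: dict, counter: int, name: str, *args: int) -> int:
    """Apply one directive to `image` (address -> byte) and return the new location counter."""
    if name == ".org":
        return args[0]                       # move the counter, emit nothing
    if name == ".fill":
        count, value = args
        end = counter + count
    elif name == ".zero":
        (count,) = args
        value, end = 0, counter + count
    elif name == ".zerountil":
        value, end = 0, args[0] + 1          # address X itself is included
    else:
        raise ValueError(f"unknown directive {name}")
    for address in range(counter, end):      # empty range when X is behind the counter
        image[address] = value & 0xFF        # assumption: fill value is masked to a byte
    return max(counter, end)


image = {}
pc = apply_directive(image, 0, ".org", 0x10)
pc = apply_directive(image, pc, ".fill", 3, 0xFF)
pc = apply_directive(image, pc, ".zero", 2)
pc = apply_directive(image, pc, ".zerountil", 0x17)
print(hex(pc), sorted(hex(a) for a in image))
```

In this run the counter ends at `0x18`, because `.zerountil 0x17` writes address `0x17` as well.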
A data directive allows byte code to be set explicitly. Like an instruction, its relative position in the assembly code defines its memory address, but unlike an instruction, the byte code emitted is directly defined in the assembly code. When paired with a label, a data directive can be used to define variables and other memory blocks.
The data directives have several forms, each indicating how much data is being defined:
Directive | Data Value Size | Data Length | Endian |
---|---|---|---|
`.byte` | 1 byte | Variable | N/A |
`.2byte` | 2 bytes | Variable | Default |
`.4byte` | 4 bytes | Variable | Default |
`.cstr` | 1 byte | Variable | N/A |
The syntax of usage is simply the directive followed by the data values to be written. More than one value can be provided as a comma-separated list of values or labels/constants. The value assembled into the byte code will be masked by the data value size of the directive.
The `.byte` or `.cstr` directive can be used to define character strings delineated by a `"` or `'`. Quotes and apostrophes within the quoted string should be escaped. The data values generated will be the ASCII values for each character in the string. Python-style character escapes (e.g., `\t`, `\n`, `\x21`) can be used. The `.cstr` directive can be used only with strings and will append a null (zero) value byte at the end of the string.
For multi-byte types (`.2byte`, `.4byte`, etc.), the endian representation of each individual value uses the configured default endianness specified in the instruction set configuration file.
This example includes a label to be used to make the data's address usable elsewhere in the assembly code:
const_value = $BE
single_bytes:
.byte $DE
.byte $AD
.byte const_value
.byte $EF
byte_list:
.byte $DE, 0xAD, const_value, $EF
str_with_no_terminating_null:
.byte "It\'s a test string"
str_with_terminating_null:
.cstr "It\'s a test string"
int16_value:
.2byte $dead, $beef
int32_value:
.4byte $deadbeef
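
To show what the masking and endianness rules mean at the byte level, here is a small Python sketch that models the bytes a data directive would produce. It illustrates the documented behavior and is not taken from BespokeASM. The function names are hypothetical, and the `endian` argument stands in for the default endianness set in the instruction set configuration file.

```python
def emit_values(values, size: int, endian: str = "little") -> bytes:
    """Model .byte / .2byte / .4byte: mask each value to `size` bytes, then serialize."""
    mask = (1 << (8 * size)) - 1
    out = bytearray()
    for value in values:
        out += (value & mask).to_bytes(size, endian)
    return bytes(out)


def emit_cstr(text: str) -> bytes:
    """Model .cstr: ASCII bytes of the string followed by a null terminator."""
    return text.encode("ascii") + b"\x00"


print(emit_values([0xDE, 0xAD, 0xBE, 0xEF], 1).hex())      # four .byte values
print(emit_values([0xDEAD, 0xBEEF], 2, "little").hex())    # .2byte, little endian
print(emit_values([0xDEADBEEF], 4, "big").hex())           # .4byte, big endian
print(emit_cstr("It's a test string"))
```

A value too large for the directive is truncated by the mask, so `emit_values([0x1FF], 1)` yields the single byte `0xFF`.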
Additional assembly files other than the target file indicated in the command invocation can be included in the compilation. This is done with the `#include` preprocessor directive. The specific format is:
#include "filename.asm"
Where `filename.asm` is the name of the file to be included. BespokeASM will search the include directories to find a file with the indicated filename. The include directory list includes the directory that contains the target file identified on command invocation, and any additional include directories identified by arguments to the command invocation.
When an assembly file is included by this directive, it is functionally equivalent to the contents of the included file being present where the `#include` directive is. If `.org` directives are used in the included file, care should be taken that the addresses of instructions do not collide between source files. BespokeASM will error if it detects that two or more instructions occupy the same address.
The inclusion of assembly files can be nested. However, BespokeASM will error if any given file ends up being included more than once.
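The include behavior described above can be sketched in Python as a recursive text expansion. This is a simplified model and not BespokeASM's implementation: source files are held in an in-memory dictionary rather than found by searching include directories, and the function name `expand_includes` is hypothetical.

```python
import re

_INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"')


def expand_includes(name: str, sources: dict, seen=None) -> list:
    """Return the lines of `name` with every #include replaced by the included lines."""
    seen = set() if seen is None else seen
    if name in seen:
        raise ValueError(f"{name} is included more than once")
    seen.add(name)
    lines = []
    for line in sources[name].splitlines():
        match = _INCLUDE.match(line)
        if match:
            # Nested includes are expanded in place, sharing the same `seen` set.
            lines.extend(expand_includes(match.group(1), sources, seen))
        else:
            lines.append(line)
    return lines


sources = {
    "main.asm": '#include "defs.asm"\nstart:\n  hlt',
    "defs.asm": "one = 1",
}
print(expand_includes("main.asm", sources))
```

Sharing one `seen` set across the whole expansion is what makes a second inclusion of the same file an error, even when it is reached through a different nested path.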
An assembly source file can require a version check against the assembly language version identified in the `identifier` key of the General section of the assembly language configuration file being used for compilation. This is done using a `#require` preprocessor directive. The specific format is:
#require "language-id comparator version-string"
where:
- `language-id` is the language `name` value in the `identifier` block of the general configuration section.
- `comparator` is a comparison operator, such as `>=`, `>`, `==`, etc. The most common comparison operator will be `>=`.
- `version-string` is a semantic version string, e.g. `1.2.3`
The version check is done at the moment the line with the `#require` preprocessor directive is processed. This means any given code file can have multiple `#require` checks. This is useful if you want to enforce a version range. For example:
#require "test-lang >= 0.5.0"
#require "test-lang < 1.0.0"
This would require that the configuration file being used for compilation be for the language with the name `test-lang` and be a version between `0.5.0` inclusive and `1.0.0` exclusive.
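A minimal Python sketch of this check follows. It is an approximation, not BespokeASM's implementation: it compares plain `major.minor.patch` numbers and ignores semantic-version pre-release tags, it returns `False` where the assembler would report an error, and the `<=` comparator is assumed to be accepted. The function names are hypothetical.

```python
import operator

# ">=", ">", "==" and "<" appear in the text above; "<=" is an assumption.
_COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
    "<=": operator.le,
    "<": operator.lt,
}


def parse_version(text: str) -> tuple:
    """Turn '1.2.3' into (1, 2, 3) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(requirement: str, language_name: str, language_version: str) -> bool:
    """Check one '#require' string against the configured language name and version."""
    language_id, comparator, version = requirement.split()
    if language_id != language_name:
        return False
    compare = _COMPARATORS[comparator]
    return compare(parse_version(language_version), parse_version(version))


requirements = ["test-lang >= 0.5.0", "test-lang < 1.0.0"]
print(all(satisfies(r, "test-lang", "0.7.2") for r in requirements))
print(all(satisfies(r, "test-lang", "1.0.0") for r in requirements))
```

Comparing tuples of integers rather than raw strings matters for versions such as `0.10.0`, which sorts before `0.5.0` as text but after it numerically.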
The following example uses the instruction set for Ben Eater's SAP-1 Breadboard CPU.
; Count by Loop
;
; For the Ben Eater SAP-1 breadboard CPU
;
zero = 0 ; constant value for 0
one = 1 ; constant value for 1
start:
ldi zero ; load value of 0 into A
out ; display
add_loop:
add increment ; add current value at 0xF to A
jc increment_step ; increment the step if overflow
out ; display
jmp add_loop ; loop
increment_step:
lda increment ; load current increment value
add one_value ; add 1 to increment value
jc restart_loops ; if it overflows, just reset everything
sta increment ; save updated increment value
jmp start ; restart counting
restart_loops:
ldi one ; load the value of 1 into register A
sta increment ; reset the increment value to 1
jmp start ; restart counting
one_value:
.byte 1 ; 1 value needed for incrementing the increment value
increment:
.byte 1 ; storage for the current increment value
Here is an example that employs an instruction set that enables subroutines (`call`, `rts`), a stack (`push`, `pop`), and indirect addressing modes. It uses 16-bit addressing and little endian byte order. The example configuration file for this instruction set is here. It also assumes a memory map where `$0000` is the start of ROM and `$8000` is the start of RAM.
;
; Variables
;
.org $8000 ; variables should be in RAM
n_value:
.byte 5 ; N value to calculate factorial for
;
; Code
;
.org 0 ; code goes in ROM
start:
push [n_value] ; push the value at n_value onto the stack
call factorial ; jump to the factorial subroutine
out ; factorial results are in A register. display it
hlt ; done
; factorial subroutine
;
; Input:
; stack - function return pointer
; stack+2 - The input N value to calculate factorial. A single 8-bit value
;
; Output:
; A register - the results of the factorial calculation. A single 8-bit value
;
; Registers used: A
;
factorial:
mov [sp+2],a ; copy the N value to A register
je .end,1 ; jump to .end if A is 1
sub 1 ; subtract 1 from A to get (N-1)
push a ; put the n-1 value on the stack
call factorial ; recurse into factorial
pop ; remove the (N-1) value from stack
push [sp+2] ; push the N value on the stack
push a ; push the factorial(n-1) results on stack
call multiply ; call multiply subroutine
pop ; pop factorial(n-1) from stack
pop ; pop N-value from stack
.end: ; local-scope label indicating the end of the subroutine
rts ; return from subroutine. Register A contains factorial(N)
; multiply subroutine
;
; Input:
; stack - function return pointer
; stack+2 - A single 8-bit value to multiply
; stack+3 - A single 8-bit value to multiply
;
; Output:
; A register - the results of the multiply calculation. A single 8-bit value
;
; Registers used: A, B, I
;
multiply:
mov [sp+2],a ; copy the multiplicand to A
je .zero,0 ; jump to zero handler if multiplicand is 0
mov a,b ; copy multiplicand to B to set up for add loop
mov [sp+3],i ; copy multiplier to I
dec i ; decrement I for 0-based loop
jc .zero ; was multiplier zero? If so, carry was set on the dec so jump to .zero
.loop: ; local scope label indicating the start of the summation loop
jz .end ; jump to .end if multiplier counter is now zero
add b ; add b to a
dec i ; decrement multiplier counter
jmp .loop ; restart addition loop
.zero: ; local scope label indicating when a 0-multiplicand is handled
mov a,0 ; set the return value to zero
.end: ; local-scope label indicating the end of the subroutine
rts ; return from subroutine