Instruction Set Configuration File
The instruction set configuration file enables you to define the instruction set and assembly language features that will be used by BespokeASM to assemble byte code. This configuration file can be made using JSON or YAML.
The purpose of this configuration file is to control how machine code should be compiled for the instruction set. BespokeASM uses a fixed method for compiling machine code for any given instruction. The standard form of an instruction is:
MNEMONIC [OPERAND1[, OPERAND2[, ...]]]
Here, each instruction must be composed of at least a mnemonic, and can optionally have 1 or more operands.
The machine code generated for an instruction consists of the byte code followed by the argument values. The byte code is the value that the CPU's instruction register uses to identify which instruction is being executed. It is composed of a value specific to the mnemonic and, optionally, a value for each operand. The size of the packed byte code for a mnemonic and its operands should equal the instruction size of the hardware running this machine code. The argument values are the parameters used by that instruction. If more than one operand has an argument value to be placed in the machine code, the argument values are ordered in the same order as the operands. Both the mnemonic and the operands can contribute values to an instruction's byte code, while only operands can generate argument values.
As an illustrative example, consider this assembly instruction:
mov a,[$8000] ; copy value at address $8000 into register A
In this case, the mnemonic mov and the operands a (for register A) and [...] (for an indirect value) all generate values that are used to form the instruction's byte code. The $8000 numeric value is the argument to the [...] operand and follows the instruction byte code in the complete machine code. The diagram below illustrates this.
Byte 0      Byte 1     Byte 2
==========  ========   ========
01 001 110  00000000   10000000
-- --- ---  -------------------
|  |   |             |
|  |   |             +- The second operand's argument value of $8000 in little endian format
|  |   +------------- The byte code 110 indicating the second operand ([...])
|  +----------------- The byte code 001 indicating the first operand (register A)
+-------------------- The byte code 01 indicating the mov mnemonic
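The packing in the diagram above can be reproduced with a few bit shifts. This is a minimal Python sketch for illustration only, not BespokeASM code: it assumes the 2-bit, 3-bit and 3-bit field widths and the 16-bit little endian argument shown in the diagram, and the function name pack_mov_a_indirect is hypothetical.

```python
def pack_mov_a_indirect(address: int) -> bytes:
    """Assemble `mov a,[address]` using the field layout from the diagram (illustrative)."""
    mnemonic_bits = 0b01   # 2-bit byte code for the mov mnemonic
    operand1_bits = 0b001  # 3-bit byte code for register A
    operand2_bits = 0b110  # 3-bit byte code for the [...] indirect operand
    # Pack the fields most significant first into one 8-bit value: 01 001 110
    byte_code = (mnemonic_bits << 6) | (operand1_bits << 3) | operand2_bits
    # The 16-bit argument value follows the byte code, low byte first (little endian)
    argument = address.to_bytes(2, byteorder="little")
    return bytes([byte_code]) + argument


machine_code = pack_mov_a_indirect(0x8000)
print(machine_code.hex(" "))  # 4e 00 80
```

The byte code always comes first and the argument bytes follow, which matches the ordering rule described above; switching the endian option to big would correspond to byteorder="big" in this sketch.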
There are two types of machine code generated from any given instruction:
- byte code - Indicates which instruction is to be executed. Usually used to select which microcode sequence to run.
- argument values - The values that the instruction's microcode operates on. For example, the address that a jump instruction sets the program counter to is an argument value.
Throughout this documentation, "byte code" and "argument value" have the meanings above. In BespokeASM's model, argument values are emitted after the byte code for an instruction. Operands to an instruction can affect both the byte code and the argument values of an instruction.
The configuration has the following main sections: general, predefined, operand_sets, instructions, and macros.
The general section defines the general configuration of BespokeASM and various assembly language features. The general section is required. The supported options are:
Option Key | Value Type | Description |
---|---|---|
address_size | integer | The number of bits required to represent a memory address. |
page_size | integer | (Optional) The default memory page size in bytes to be used with the .page directive. Defaults to a value of 1. |
endian | string | (Optional) Defines the endianness of multibyte values. Allowed values are big and little. If not present, this option defaults to big. |
registers | list[string] | (Optional) A list of register labels that will be used in this instruction set. Anything declared as a register label cannot be used as a constant or address label, and anything not declared as a register label cannot be used as a register operand. If not present, no register labels are defined. |
min_version | string | (Optional) The minimum version of BespokeASM that this instruction set configuration file will work with. If provided, BespokeASM will also do a counter-minimum version check to make sure this instruction set configuration file has the schema it is expecting. |
identifier | dictionary | (Optional) Configures name and version information for the assembly language defined by this configuration file. This field is used both by language extension generation and by source code language requirements. Contains the following key/value items: |
origin | integer | (Optional) Defines the default starting origin address for byte code generated with this configuration file. This is an offset from the start of the GLOBAL memory zone. The starting origin defaults to an address of 0 if this option is not present. |
cstr_terminator | integer | (Optional) Defines the terminating character for byte sequences made with the .cstr data directive. Defaults to 0 if unset. |
allow_embedded_strings | boolean | (Optional) If set to true, the compiler will allow the embedded string feature. Defaults to false. |
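Because the configuration file can be written in JSON, a general section can be sketched as a plain dictionary. The Python below is a minimal illustration, assuming a hypothetical CPU with a 16-bit address bus and registers a, i and j; the helper with_general_defaults is hypothetical and only applies the defaults and the two constraints (required address_size, allowed endian values) listed in the table above.

```python
import copy
import json

# Documented defaults for the optional keys of the general section.
GENERAL_DEFAULTS = {
    "page_size": 1,
    "endian": "big",
    "registers": [],  # no register labels defined
    "origin": 0,
    "cstr_terminator": 0,
    "allow_embedded_strings": False,
}


def with_general_defaults(general: dict) -> dict:
    """Hypothetical helper: validate a general section and fill in the defaults."""
    if "address_size" not in general:
        raise ValueError("the general section requires address_size")
    if general.get("endian", "big") not in ("big", "little"):
        raise ValueError("endian must be 'big' or 'little'")
    # Deep-copy the defaults so callers never share the default registers list.
    return {**copy.deepcopy(GENERAL_DEFAULTS), **general}


# A hypothetical general section for a CPU with a 16-bit address bus.
general = {"address_size": 16, "endian": "little", "registers": ["a", "i", "j"]}
config_json = json.dumps({"general": general}, indent=2)  # JSON form of the section
resolved = with_general_defaults(general)
print(resolved["page_size"], resolved["endian"], resolved["origin"])  # 1 little 0
```

Keys that are present always win over the defaults, so omitting an optional key behaves as the table describes.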
Both compiler constants and memory blocks can be defined in the ISA configuration file, and the labels defined with these entities can be used in code compiled with the ISA configuration file. This section is identified with the predefined key and contains a dictionary with the following keys/values.
Define compiler constants for numerical values that are often used for the instruction set the configuration file pertains to.
This subsection of predefined is identified with the constants key, and contains a list of dictionaries with the following keys/values:
Option Key | Value Type | Description |
---|---|---|
name | string | The label string that this constant value will be assigned to. This case-sensitive label can then be used at compile time to reference the assigned integer value. |
value | integer | The integer value that will be assigned to this constant. |
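A predefined constants list maps case-sensitive names to integers. The Python below is a minimal sketch of that mapping, not BespokeASM's resolver; the constant names and the build_constant_table helper are hypothetical, and treating a repeated name as an error is an assumption of this sketch rather than documented behavior.

```python
from typing import Dict, List

# Hypothetical predefined constants list, shaped like the table above.
predefined_constants = [
    {"name": "STACK_TOP", "value": 0xFFFF},
    {"name": "stack_top", "value": 0x00FF},  # a distinct label: names are case sensitive
]


def build_constant_table(constants: List[dict]) -> Dict[str, int]:
    """Hypothetical helper: map each constant name to its integer value."""
    table: Dict[str, int] = {}
    for entry in constants:
        name, value = entry["name"], entry["value"]
        if not isinstance(value, int):
            raise TypeError(f"constant {name!r} must have an integer value")
        if name in table:
            # Treating a repeated name as an error is an assumption of this sketch.
            raise ValueError(f"constant {name!r} is defined more than once")
        table[name] = value
    return table


constants = build_constant_table(predefined_constants)
print(constants["STACK_TOP"], constants["stack_top"])  # 65535 255
```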
Data blocks can be predefined. These can be used to represent sections in memory that pertain to certain hardware features of the system that the instruction set pertains to, or can be used to reserve sections of memory for common uses, such as buffers. The point of these data blocks is to reserve a bit of memory and label the memory address of the data block. BespokeASM will generate an error if the addresses of compiled code or data should ever overlap with predefined memory blocks.
This subsection of predefined is identified with the data key, and contains a list of dictionaries with the following keys/values:
Option Key | Value Type | Description |
---|---|---|
name | string | The label string that the first address of this data block will be assigned to. This case-sensitive label can then be used at compile time to reference the assigned address value. |
address | integer | The start address of the data block. |
size | integer | The number of bytes in this data block. Should be at least 1. |
value | integer | (Optional) The byte value this data block will be filled with when BespokeASM generates a binary image for compiled code. If not present, the default value of 0 is used. |
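The two behaviors described for data blocks, filling the block with its value byte and rejecting code or data that overlaps it, can be sketched as simple range arithmetic. This is a minimal Python illustration, not BespokeASM's implementation; the DataBlock class, the overlaps and fill_block helpers, the io_buffer block, and an image indexed directly by absolute address are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataBlock:
    """Mirror of one predefined data entry (field names follow the table above)."""
    name: str
    address: int
    size: int
    value: int = 0  # fill byte; 0 when not present

    def end(self) -> int:
        """One past the last address occupied by the block."""
        return self.address + self.size


def overlaps(block: DataBlock, start: int, length: int) -> bool:
    """True if the address range [start, start + length) intersects the block."""
    return start < block.end() and block.address < start + length


def fill_block(image: bytearray, block: DataBlock) -> None:
    """Write the block's fill value into an image indexed by absolute address."""
    image[block.address:block.end()] = bytes([block.value]) * block.size


buffer = DataBlock(name="io_buffer", address=0x0200, size=16, value=0xFF)
image = bytearray(0x0400)
fill_block(image, buffer)
print(overlaps(buffer, 0x0208, 4), overlaps(buffer, 0x0210, 4))  # True False
```

Code assembled at $0208 would land inside the 16-byte block and should be reported, while code starting at $0210 begins just past it.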
Predefined memory zones can be defined in the instruction set configuration file. In the predefined section, a subsection named memory_zones can be defined. That section contains a list of dictionaries with the following keys:
Option Key | Value Type | Description |
---|---|---|
name | string | The name of the memory zone. |
start | integer | The start address of the memory zone. |
end | integer | The end address of the memory zone. |
The GLOBAL memory zone may be defined here by using the name GLOBAL. If defined, the origin value defined in the general settings will be interpreted as an offset from the GLOBAL zone's start address. If the GLOBAL memory zone is not explicitly defined, then the default GLOBAL zone is created.
Memory zones differ from data blocks in that a memory zone is a region into which the byte code for code and data gets assembled, while a data block is a preallocated block of byte code.
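The relationship between the general section's origin and an explicitly defined GLOBAL zone is a simple addition. The Python below is a minimal sketch of that rule, assuming a hypothetical GLOBAL zone from $8000 to $FFFF; the MemoryZone class and starting_address helper are hypothetical, and treating end as an inclusive address is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class MemoryZone:
    """Mirror of one predefined memory_zones entry (hypothetical class)."""
    name: str
    start: int
    end: int  # treated as inclusive in this sketch (an assumption)

    def contains(self, address: int) -> bool:
        return self.start <= address <= self.end


def starting_address(general_origin: int, global_zone: MemoryZone) -> int:
    """The general section's origin is an offset from the GLOBAL zone's start address."""
    address = global_zone.start + general_origin
    if not global_zone.contains(address):
        raise ValueError(f"origin offset {general_origin:#x} falls outside the GLOBAL zone")
    return address


# A hypothetical GLOBAL zone covering the upper half of a 16-bit address space.
global_zone = MemoryZone(name="GLOBAL", start=0x8000, end=0xFFFF)
print(hex(starting_address(0x0100, global_zone)))  # 0x8100
```

With this zone, an origin of 0 places the first assembled byte at $8000 rather than at address 0.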
Preprocessor macro symbols can be predefined in the instruction set configuration file. In the predefined section, a subsection named symbols can be defined. That section contains a list of dictionaries with the following keys:
Option Key | Value Type | Description |
---|---|---|
name | string | The name of the preprocessor macro symbol. |
value | string | (Optional) The string replacement value of the preprocessor macro symbol. If not provided, the empty string is assumed. |
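The only rule beyond the name/value pairing is the empty-string default for a missing value. This minimal Python sketch shows that default; the symbol names and the build_symbol_table helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical predefined symbols list, shaped like the table above.
predefined_symbols = [
    {"name": "CPU_HAS_STACK", "value": "1"},
    {"name": "DEBUG_BUILD"},  # no value: treated as the empty string
]


def build_symbol_table(symbols: List[dict]) -> Dict[str, str]:
    """Hypothetical helper: map each symbol name to its replacement string."""
    return {entry["name"]: entry.get("value", "") for entry in symbols}


symbols = build_symbol_table(predefined_symbols)
print(repr(symbols["DEBUG_BUILD"]))  # ''
```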
The operand_sets section allows you to define sets of operands for instructions. An operand set is intended to represent all of the possible operand values for a specific operand position, and defines the byte code and argument values that will be packed when forming the instruction's machine code. Operand sets are defined separately from the instructions so that an operand set can be used by more than one instruction. An operand set consists of 1 or more distinct operands.
The operand_sets section is a dictionary, where the dictionary key is the name of the operand set and the value is the configuration of that operand set. The name of the operand set is used only internally within this configuration file and does not directly affect the assembly language derived from this configuration file.
Each item listed in operand_sets consists of a single element titled operand_values, which contains a dictionary that configures each of the operand variants in this operand set.
The operand configuration dictionary is used to specify the assembly behavior of a specific operand value. In this dictionary, the key is the internal name of the operand value item used within this configuration file, and the value is a collection of operand configuration items defined in the table below:
Option Key | Value Type | Description |
---|---|---|
type | string | Specifies one of the operand types and operand addressing modes supported by BespokeASM. The allowed values are: |
bytecode | dictionary | (Optional) A dictionary that configures the byte code associated with this operand. If not present, this operand will not generate any byte code. This dictionary contains the following keys: |
argument | dictionary | Configures how the operand argument will be emitted into the machine code. Must be present for the numeric, indirect_numeric, enumeration, and numeric_enumeration operand types. Ignored for all other types. The dictionary contains the following keys: |
register | string | The assembly code representation of the register to be used for this operand. Must be one of the registers listed in the registers list of the general section. Must be present for the register, indirect_register, and indirect_indexed_register operand types; ignored for all other operand types. |
offset | dictionary | Configures the optional offset value for the indirect_register operand type. Ignored for all other types. If not present, no offset is enabled and no argument value will be emitted in the machine code. If offsets are enabled, this operand will generate an argument value in the machine code equal to the offset value specified in the assembly code. The compiler still permits omitting the offset for an indirect_register operand configured to enable offsets; in this case an offset of zero is implied and will be emitted as the argument value. The dictionary contains the following keys: |
index_operands | dictionary | Configures the allowed offset operands for the indexed_register and indirect_indexed_register operand types. Contains a dictionary where the key is an internal name for each offset operand option, and the value is an operand configuration formatted as described in this table. When compiling, BespokeASM will attempt to match one operand listed in index_operands. The byte code of the matched index operand is appended to this operand's configured byte code to form the overall byte code for this operand. If the matched index operand generates an argument, it is appended to this operand's arguments, if any. |
use_curly_braces | boolean | (Optional) Used only with the relative_address operand type. Determines whether the assembly notation for this operand should use curly braces {..} around the expression that indicates the target address. Defaults to FALSE if not present. |
offset_from_instruction_end | boolean | (Optional) Used only with the relative_address operand type. Indicates whether the relative offset should be calculated from the program counter value at the end of the instruction (TRUE) or at the beginning of the instruction (FALSE). Defaults to FALSE (beginning of instruction) if not present. |
decorator | dictionary | (Optional) Indicates whether this operand requires a decorator in order to match. Only supported by the register, indirect_register, and indirect_indexed_register operand types. The decorator configuration dictionary requires two keys: |
Note that this configuration dictionary is used both by the Operand Set configuration and the configuration of specific operands in various aspects of the configuration file.
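The offset_from_instruction_end option only changes the reference point used for a relative_address operand. The Python below is a minimal sketch of that arithmetic, not BespokeASM's implementation: it takes "end of the instruction" to mean the address just past its last byte, and the two's-complement packing in encode_signed is an assumed encoding for illustration. Both function names and the example branch are hypothetical.

```python
def relative_offset(target: int, instruction_address: int, instruction_length: int,
                    offset_from_instruction_end: bool = False) -> int:
    """Signed offset from the reference program counter value to the target address.

    False (the documented default) measures from the start of the instruction;
    True measures from the address just past the instruction's last byte.
    """
    reference = instruction_address
    if offset_from_instruction_end:
        reference += instruction_length
    return target - reference


def encode_signed(value: int, bits: int) -> int:
    """Pack a signed offset into a two's-complement field (an assumed encoding)."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"offset {value} does not fit in {bits} signed bits")
    return value & ((1 << bits) - 1)


# A hypothetical 2-byte branch at $0100 whose target is $00F0.
from_start = relative_offset(0x00F0, 0x0100, 2)
from_end = relative_offset(0x00F0, 0x0100, 2, offset_from_instruction_end=True)
print(from_start, from_end)  # -16 -18
print(hex(encode_signed(from_start, 8)), hex(encode_signed(from_end, 8)))  # 0xf0 0xee
```

The two settings differ by exactly the instruction length, so choosing the wrong one for a given CPU shifts every relative jump by that amount.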
The instructions section is where the supported instruction mnemonics are defined. An instruction definition is composed of three parts: the mnemonic, the instruction's operands, and the instruction byte code. This section is a key/value dictionary where the keys are the mnemonic string names of the instructions and the values are dictionaries that define each instruction's operand configuration and byte code.
Option Key | Value Type | Description |
---|---|---|
bytecode | dictionary | A dictionary that describes the base byte code that should be emitted to indicate this instruction. The keys and values that must be present are: |
operands | dictionary | A dictionary that configures the set of operands allowed for this instruction mnemonic. The keys and values used in this dictionary are described in the table below. If not present, the instruction mnemonic is assumed to have no operands. |
variants | list | (Optional) This option allows the specification of one or more alternative configurations for the mnemonic. This is useful when a different instruction byte code prefix should be emitted for a certain operand signature. The value of this key is a list, and each list element is another instruction configuration with bytecode and operands as specified above. Variant configurations are considered only if the operands do not match the main configuration; the variants are then tried in the order they appear in the list, and the first match is used to generate the byte code. |
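The lookup order described for variants is a first-match search: the main configuration, then each variant in list order. The Python below is a minimal sketch of that ordering only, not BespokeASM's matcher; modelling a configuration as a (label, matcher) pair, the is_numeric check, and the register names are hypothetical stand-ins for real operand matching.

```python
from typing import Callable, Optional, Sequence, Tuple

# A configuration is modelled as (label, matcher); matcher(operands) says whether
# that configuration accepts the operand list. Both names are hypothetical.
Matcher = Callable[[Sequence[str]], bool]
Config = Tuple[str, Matcher]


def select_configuration(main: Config, variants: Sequence[Config],
                         operands: Sequence[str]) -> Optional[str]:
    """Return the label of the first configuration that accepts the operands.

    The main configuration is tried first; the variants are tried only if it
    fails, in the order they appear in the list.
    """
    for label, accepts in (main, *variants):
        if accepts(operands):
            return label
    return None


def is_numeric(op: str) -> bool:
    return op.startswith("$")


main = ("main", lambda ops: len(ops) == 1 and is_numeric(ops[0]))
variants = [
    ("register variant", lambda ops: len(ops) == 1 and ops[0] in ("a", "i", "j")),
    ("catch-all variant", lambda ops: len(ops) == 1),
]

print(select_configuration(main, variants, ["$1000"]))  # main
print(select_configuration(main, variants, ["i"]))      # register variant
print(select_configuration(main, variants, ["[a]"]))    # catch-all variant
```

Because the search stops at the first match, a broad variant placed early in the list would hide any narrower variant listed after it.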
The operands configuration of a specific instruction requires at least one of the operand_sets or specific_operands configurations, and can also have both.
Option Key | Value Type | Description |
---|---|---|
count | integer | The number of operands this mnemonic must have. |
operand_sets | dictionary | (Optional) Present if operand sets are used to configure the operands of the mnemonic. Contains the following keys and values: |
specific_operands | dictionary | (Optional) A dictionary of specific operand combination configurations that are allowed when assembling this instruction. Takes precedence over the operand combinations allowed by the operand_sets configuration for this instruction when both configure the same operand combination. The keys of this dictionary are arbitrary strings used internally to identify a specific operand configuration, and the values are the corresponding operand configurations. Each operand configuration is a dictionary that contains the following keys and values: |
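The precedence rule between specific_operands and operand_sets can be sketched as two lookups tried in a fixed order after the count check. This minimal Python illustration is not BespokeASM's matcher: operands are compared as literal strings, operand_sets is modelled as one set of allowed operands per position (assumed to have count entries), specific_operands is keyed by operand tuples rather than by internal names, and resolve_operands and the a_to_a label are hypothetical.

```python
from typing import Dict, Optional, Sequence, Set, Tuple


def resolve_operands(operands: Sequence[str], count: int,
                     operand_sets: Optional[Sequence[Set[str]]] = None,
                     specific_operands: Optional[Dict[Tuple[str, ...], str]] = None,
                     ) -> Optional[str]:
    """Report which part of an operands configuration accepts `operands`.

    Operands are compared as literal strings here, a deliberate simplification.
    """
    if len(operands) != count:
        return None  # wrong number of operands for this mnemonic
    key = tuple(operands)
    # specific_operands takes precedence when both sources accept the combination.
    if specific_operands and key in specific_operands:
        return "specific_operands: " + specific_operands[key]
    # operand_sets: one set of allowed operands per position (assumed len == count).
    if operand_sets and all(op in allowed for op, allowed in zip(key, operand_sets)):
        return "operand_sets"
    return None


registers = {"a", "i", "j"}
sets = [registers, registers]
specific = {("a", "a"): "a_to_a"}
print(resolve_operands(["i", "j"], 2, sets, specific))  # operand_sets
print(resolve_operands(["a", "a"], 2, sets, specific))  # specific_operands: a_to_a
print(resolve_operands(["a"], 2, sets, specific))       # None
```

Here ("a", "a") is accepted by both sources, and the specific_operands entry wins, which is the documented precedence.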
Instruction macros are a way to make configurable sequences of instructions and then use a single instruction (the macro) to insert that instruction sequence into the byte code. For example, if the ISA of the computer only has a single-byte move instruction named mov, a two-byte move instruction (macro) named mov2 can be constructed from the following sequence of instructions:
mov [addr1],[addr2]
mov [addr1+1],[addr2+1]
The byte code that would be generated by this sequence of instructions can then be emitted through the single assembly instruction mov2 [addr1],[addr2].
BespokeASM allows instruction macros to be defined in the ISA configuration file. Once defined, the macro mnemonic can be used in the assembly code identically to native instruction mnemonics, the only noticeable difference being that instruction macros generate more byte code than native instructions. What BespokeASM does here is essentially run a pre-assembler that expands a macro instruction into the desired set of replacement instruction lines through a string parsing and replacement process. The constructed instruction lines are then assembled with all the other instruction lines from the assembly code to generate the machine code.
Macros are defined in the macros section of the configuration file. This section is structured similarly to the instructions section in that it is a dictionary where the keys are the macro mnemonics; however, each value is a list of distinct configurations for that macro. Each configuration in the list is a dictionary with two elements, operands and instructions.
The operands section is configured the same way as the operands section for instructions. However, since no byte code is emitted directly from a macro, any byte code configuration provided for a macro's operands is ignored. The purpose of the operands section for a macro is simply to define the allowed types of operands for a specific macro configuration.
The instructions section of a macro definition lists, in order, the instruction templates that will be used to compile the instruction sequence that the macro expands into. Each instruction is written as it is to be assembled; the macro mechanism essentially replaces the macro instruction in the assembly code with the assembly code listed in instructions. Before doing so, however, certain tokens that may be present in the instructions section are replaced with finalized values. The tokens are of the form @YYY(x), where YYY is the token label and x is an integer indicating which macro operand will be the source of its value. The first macro operand is represented by x being 0, the second by 1, and so on. The following macro tokens are supported:
- @OP(x) - Generates a value based on the whole string of macro operand x.
- @ARG(x) - Generates a value based on the argument numeric expression of macro operand x.
- @REG(x) - Generates a value based on the register label used in macro operand x.
The specific value emitted by each macro token depends on the operand type that macro operand x is configured to be in the operands section of this macro configuration. The following table lists what each macro token will generate for all supported operand types.
Operand Type | Operand argument @ARG(x) | Operand register @REG(x) | Entire operand @OP(x) |
---|---|---|---|
numeric | The original numeric expression | error | The original numeric expression |
indirect_numeric | The numeric expression of the indirect address | error | The entire operand, including the [ ] brackets |
deferred_numeric | The numeric expression of the indirect address | error | The entire operand, including the [[ ]] brackets |
register | error | The register | The register |
indirect_register | The offset expression applied to the register | The register | The entire operand, including the [ ] brackets |
indirect_indexed_register | ? | The base register | The entire operand, including the [ ] brackets |
enumeration | The string of the enumeration value | error | The string of the enumeration value |
numeric_enumeration | The numeric expression of the enumeration value | error | The numeric expression of the enumeration value |
numeric_bytecode | The original numeric expression | error | The original numeric expression |
empty | error | error | error |
To illustrate how to configure a macro, the following is a nominal configuration for the mov2 example discussed above:
...
macros:
  mov2:
    - operands:
        count: 2
        specific_operands:
          indirect_indirect:
            list:
              iaddr1:
                type: indirect_numeric
                argument:
                  size: 16
                  byte_align: true
              iaddr2:
                type: indirect_numeric
                argument:
                  size: 16
                  byte_align: true
      instructions:
        - "mov [@ARG(0)],[@ARG(1)]"
        - "mov [@ARG(0)+1],[@ARG(1)+1]"
...
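Given the mov2 configuration above, the string-replacement step for an input such as mov2 [$1000],[$2000] can be sketched as a regular-expression substitution over the instruction templates. This is a minimal Python illustration, not BespokeASM's pre-assembler: the MacroOperand class and expand_macro function are hypothetical, and the operand parts are filled in by hand following the indirect_numeric row of the token table (None marks an "error" cell).

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class MacroOperand:
    """Hypothetical parsed macro operand; None marks an 'error' cell in the token table."""
    op: str              # @OP(x): the entire operand text
    arg: Optional[str]   # @ARG(x): the argument expression
    reg: Optional[str]   # @REG(x): the register label


TOKEN = re.compile(r"@(OP|ARG|REG)\((\d+)\)")


def expand_macro(templates: Sequence[str], operands: Sequence[MacroOperand]) -> List[str]:
    """Replace every @YYY(x) token with the matching part of macro operand x."""
    def substitute(match: re.Match) -> str:
        token, index = match.group(1), int(match.group(2))
        value = getattr(operands[index], token.lower())
        if value is None:
            raise ValueError(f"@{token}({index}) is not supported for this operand type")
        return value
    return [TOKEN.sub(substitute, line) for line in templates]


# mov2 [$1000],[$2000]: both macro operands are indirect_numeric.
operands = [MacroOperand(op="[$1000]", arg="$1000", reg=None),
            MacroOperand(op="[$2000]", arg="$2000", reg=None)]
templates = ["mov [@ARG(0)],[@ARG(1)]", "mov [@ARG(0)+1],[@ARG(1)+1]"]
for line in expand_macro(templates, operands):
    print(line)
# mov [$1000],[$2000]
# mov [$1000+1],[$2000+1]
```

The expanded lines are ordinary assembly text, which is why they can then be assembled alongside the rest of the source.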
- Macro definitions cannot define labels or constants, and cannot make use of directives. However, macro definitions can make use of labels and constants in expressions. It is strongly advised that only predefined labels and constants be used in macro definitions.
- The instructions listed in the instructions section of a given macro configuration are tightly coupled to the operand types configured for the macro in the operands section. If the instructions do not match what the macro operands provide, errors will be generated during assembly. While operand_sets can be used to configure a macro's operands, care should be taken to ensure all operands listed in the operand set are consistent with each other in terms of how the macro instructions will use them. If operands are inconsistent, create a separate configuration for the macro in its list of configurations.
Example configuration files can be found in the examples directory of the BespokeASM repository.