Example: UART Loopback
The simplest possible UART Loopback is a single wire connecting the receive input side to the transmit output side:
One version of that code looks like:
#pragma MAIN main
uint1_t main(uint1_t uart_rx){
return uart_rx;
}
A similar version, where global wires uart_rx and uart_tx are exposed from inside a top.h header, looks like:
// Top level IO pins configured in top.h
#include "top.h"
#pragma MAIN main
void main(){
uart_tx = uart_rx;
}
Most development boards come with integrated USB-UART functionality. If yours does not, inexpensive USB-UART dongles are easy to find online.
For example, when a pico-ice dev board is plugged in, a /dev/ttyACM1 TTY device appears on the system by default. Using a terminal IO program like tio, you can test that the FPGA-based loopback is working by typing into the window and seeing your characters echoed to the screen.
$ sudo tio /dev/ttyACM1 -b 115200
[tio 23:54:17] tio v1.32
[tio 23:54:17] Press ctrl-t q to quit
[tio 23:54:17] Connected
YAY loopback works!
[tio 23:54:25] Disconnected
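The same check can be scripted rather than typed by hand. The following is a minimal host-side sketch in plain C using POSIX termios, assuming a Linux-like system and the /dev/ttyACM1 path and 115200 baud shown above. The function name uart_loopback_check and its return convention are hypothetical and are not taken from the PipelineC repository.

```c
#include <fcntl.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

// Hypothetical helper: write each byte of msg to the serial device and
// expect the same byte to be echoed back by the FPGA loopback.
// Returns 1 if every byte was echoed, 0 on mismatch or timeout,
// and -1 if the device could not be opened or configured.
static int uart_loopback_check(const char *dev, const char *msg, size_t len) {
    int fd = open(dev, O_RDWR | O_NOCTTY);
    if (fd < 0) return -1;

    struct termios tty;
    if (tcgetattr(fd, &tty) != 0) { close(fd); return -1; }

    // Raw 8N1: no line editing, no echo, no character translation
    tty.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);
    tty.c_oflag &= ~OPOST;
    tty.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    tty.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
    tty.c_cflag |= CS8 | CREAD | CLOCAL;
    tty.c_cc[VMIN] = 0;   // read() may return without data...
    tty.c_cc[VTIME] = 10; // ...after a 1 second timeout
    cfsetispeed(&tty, B115200);
    cfsetospeed(&tty, B115200);
    if (tcsetattr(fd, TCSANOW, &tty) != 0) { close(fd); return -1; }

    int ok = 1;
    for (size_t i = 0; i < len && ok; i++) {
        char echoed = 0;
        if (write(fd, &msg[i], 1) != 1) ok = 0;
        else if (read(fd, &echoed, 1) != 1) ok = 0;
        else if (echoed != msg[i]) ok = 0;
    }
    close(fd);
    return ok;
}
```

A call such as uart_loopback_check("/dev/ttyACM1", "YAY", 3) would be expected to return 1 when the loopback bitstream is loaded. Sending one byte at a time keeps the sketch simple and avoids depending on any buffering in the FPGA design.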
It is recommended to follow the design patterns shown in a series of examples designed for dev boards.
The UART loopback section of top.c is shown below:
// See top level IO pin config in top.h
#include "top.h"
MAIN_MHZ(uart_main, UART_CLK_MHZ)
void uart_main(){
uart_tx_mac_word_in = uart_rx_mac_word_out;
uart_rx_mac_out_ready = uart_tx_mac_in_ready;
}
Notice the include of top.h, where top level IO is configured.
For instance, inside top.h you might configure the UART MAC instance like so:
// Configure UART module
#define UART_TX_OUT_WIRE ice_25
#define UART_RX_IN_WIRE ice_27
#define UART_CLK_MHZ PLL_CLK_MHZ
#define UART_BAUD 115200
#include "uart/uart_mac.c"
These defines specify which top level wires to use, along with the rate constants.
top.h also references a board specific header inside of the board/ directory.
This design pattern of common code in top.c with board specifics configured in top.h allows the final generated VHDL to be dev board specific and easy to instantiate, while the internal PipelineC code can share a common interface with generic names, common helper libraries, etc.
Finally, notice the loopback is no longer a single wire:
uart_tx_mac_word_in = uart_rx_mac_word_out;
uart_rx_mac_out_ready = uart_tx_mac_in_ready;
Instead it is a data, valid, ready (DVR) handshake, one byte at a time, via a 'media access controller' (MAC).
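To make the handshake rule concrete, here is a minimal software sketch in plain C (not PipelineC) of one clock cycle of such a loopback. The type byte_stream_t and the functions transfer_occurs and loopback_cycle are hypothetical names for illustration and do not correspond to the stream() macro or any library function.

```c
#include <stdint.h>

// Hypothetical software stand-in for a data+valid byte stream
typedef struct {
    uint8_t data;
    uint8_t valid; // 1 when data is meaningful this cycle
} byte_stream_t;

// A byte moves from producer to consumer only in a cycle where
// the producer asserts valid AND the consumer asserts ready
static int transfer_occurs(byte_stream_t s, uint8_t ready) {
    return s.valid && ready;
}

// One cycle of loopback: the RX output stream feeds the TX input,
// and the TX ready is what the RX side sees as its ready.
// Returns 1 and writes *sent if a byte was handed over this cycle.
static int loopback_cycle(byte_stream_t rx_out, uint8_t tx_in_ready, uint8_t *sent) {
    if (transfer_occurs(rx_out, tx_in_ready)) {
        *sent = rx_out.data;
        return 1;
    }
    return 0;
}
```

If rx_out.valid is 1 while tx_in_ready is 0, no transfer happens in that cycle. What becomes of the byte in that case depends on the producer, which is covered in the receive section below.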
Inside uart_mac.c and uart_mac.h you will see reusable implementations of UART receive and transmit.
UART RX is a state machine that waits for the START bit, captures the 8 bits of data, followed by the STOP bit, and repeats.
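As a reference for what the receiver expects on the wire, the sketch below builds the sequence of line levels for one frame in plain C: a START bit, eight data bits sent least significant bit first, and a STOP bit. The constants mirror those used later on this page, while the function name uart_frame_bits is hypothetical.

```c
#include <stdint.h>

#define UART_IDLE 1
#define UART_START 0
#define UART_STOP UART_IDLE
#define UART_WORD_BITS 8
#define UART_FRAME_BITS (1 + UART_WORD_BITS + 1)

// Fill frame[] with the line level of each bit period for one byte:
// START, then data bits least significant bit first, then STOP
static void uart_frame_bits(uint8_t byte, uint8_t frame[UART_FRAME_BITS]) {
    frame[0] = UART_START;
    for (int i = 0; i < UART_WORD_BITS; i++) {
        frame[1 + i] = (byte >> i) & 1;
    }
    frame[UART_FRAME_BITS - 1] = UART_STOP;
}
```

For the byte 0x41 ('A') this produces the levels 0, 1, 0, 0, 0, 0, 0, 1, 0, 1 over ten bit periods.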
The function uart_rx_1b inside uart_mac.h takes the 1 bit physical wire input and outputs a DVR stream of the single bits, each sampled at the right time (the center of the UART bit period). Its signature looks like:
// Logic to receive a UART bit stream
typedef enum uart_rx_state_t
{
IDLE,
WAIT_START,
RECEIVE
}uart_rx_state_t;
typedef struct uart_rx_1b_t{
// Outputs
stream(uint1_t) bit_stream;
uint1_t overflow;
}uart_rx_1b_t;
uart_rx_1b_t uart_rx_1b(
// Inputs
uint1_t input_wire,
uint1_t ready_for_bit_stream
){
static uart_rx_state_t state;
}
Continuing to fill in the body of the uart_rx_1b function, counters are needed in addition to the state machine state:
// Static local registers
static uart_rx_state_t state;
static uart_clk_count_t clk_counter;
static uart_bit_count_t bit_counter;
uart_mac.h contains the UART rate constants and types like uart_clk_count_t, of proper width for counting out the bit period in clock cycles.
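The arithmetic behind those rate constants can be sketched in plain C as below. It follows the same truncating calculation as the UART_CLKS_PER_BIT and UART_CLKS_PER_BIT_DIV2 macros in the flattened listing at the end of this page. The function names are hypothetical, and the uint16_t return type assumes the result fits in 16 bits.

```c
#include <stdint.h>

// Whole clock cycles in one UART bit period, truncated to an integer
static uint16_t uart_clks_per_bit(double clk_mhz, double baud) {
    double clks_per_sec = clk_mhz * 1000000.0;
    return (uint16_t)(clks_per_sec / baud);
}

// Whole clock cycles in half a bit period, used to reach the bit center
static uint16_t uart_clks_per_bit_div2(double clk_mhz, double baud) {
    double clks_per_sec = clk_mhz * 1000000.0;
    return (uint16_t)((clks_per_sec / baud) / 2.0);
}
```

With a 100 MHz clock and 115200 baud this gives 868 clocks per bit and 434 clocks per half bit. A slow baud rate combined with a fast clock could exceed 16 bits, in which case a wider counter type would be needed.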
The first part of UART RX is the idle state, doing nothing:
// Output wires
uart_rx_1b_t o; // Default all zeros
// State machine for receiving
if(state==IDLE)
{
// Wait for line to be high, idle, powered, etc
if(input_wire==UART_IDLE)
{
// Then wait for the start bit
state = WAIT_START;
clk_counter = 0;
}
}
The output o stays all zeros during this state, which just waits for the HIGH idle level on the wire.
A UART frame begins with the START bit. In preparation for sampling data bits in the middle of the bit time, this state waits half way through the start bit before moving on:
else if(state==WAIT_START)
{
// Wait for the start bit=0
if(input_wire==UART_START)
{
// Wait half a bit period to align to center of bit period
clk_counter += 1;
if(clk_counter >= UART_CLKS_PER_BIT_DIV2)
{
// Begin loop of sampling each bit
state = RECEIVE;
clk_counter = 0;
bit_counter = 0;
}
}
}
The receive state waits one UART bit period at a time in order to sample each of the following eight data bits in the center of its bit period. This is a single sampling point per bit, with no averaging or voting to help resolve glitches on the wire.
else if(state==RECEIVE)
{
// Count a full bit period and then sample
clk_counter += 1;
if(clk_counter >= UART_CLKS_PER_BIT)
{
// Reset counter for next bit
clk_counter = 0;
// Output current data
o.bit_stream.data = input_wire;
o.bit_stream.valid = 1;
bit_counter += 1;
// Last bit of word?
if(bit_counter==UART_WORD_BITS)
{
// Back to idle waiting for next word
state = IDLE;
}
}
}
Notice the above code goes back to IDLE, which also handles the STOP bit since STOP is the same level as IDLE. The FSM could alternatively be written with an explicit stop state and no idle state.
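The three states can be exercised on a desktop with a small software model. The sketch below is plain C rather than PipelineC: the static registers become a struct, one call to rx_step stands for one clock cycle, and CLKS_PER_BIT is an assumed small value chosen only to keep the simulation short. The names rx_model_t, rx_step and rx_model_receive are hypothetical.

```c
#include <stdint.h>

#define UART_IDLE 1
#define UART_START 0
#define UART_WORD_BITS 8
#define CLKS_PER_BIT 16     // Assumed value for simulation only
#define CLKS_PER_BIT_DIV2 8

typedef enum { RX_IDLE, RX_WAIT_START, RX_RECEIVE } rx_state_t;
typedef struct { rx_state_t state; uint16_t clk_counter; uint8_t bit_counter; } rx_model_t;
typedef struct { uint8_t data; uint8_t valid; } rx_bit_t;

// Advance the receiver model by one clock cycle
static rx_bit_t rx_step(rx_model_t *m, uint8_t input_wire) {
    rx_bit_t o = {0, 0};
    if (m->state == RX_IDLE) {
        if (input_wire == UART_IDLE) { m->state = RX_WAIT_START; m->clk_counter = 0; }
    } else if (m->state == RX_WAIT_START) {
        if (input_wire == UART_START) {
            m->clk_counter += 1;
            if (m->clk_counter >= CLKS_PER_BIT_DIV2) {
                m->state = RX_RECEIVE; m->clk_counter = 0; m->bit_counter = 0;
            }
        }
    } else { // RX_RECEIVE
        m->clk_counter += 1;
        if (m->clk_counter >= CLKS_PER_BIT) {
            m->clk_counter = 0;
            o.data = input_wire; // Sampled near the center of the bit period
            o.valid = 1;
            m->bit_counter += 1;
            if (m->bit_counter == UART_WORD_BITS) m->state = RX_IDLE;
        }
    }
    return o;
}

// Drive one idle period plus one frame into the model, clock by clock,
// and reassemble the sampled bits (least significant bit first) into a byte
static uint8_t rx_model_receive(uint8_t byte) {
    rx_model_t m = {RX_IDLE, 0, 0};
    uint8_t result = 0, nbits = 0;
    for (int slot = 0; slot < 11; slot++) {
        uint8_t level;
        if (slot == 0 || slot == 10) level = UART_IDLE; // Idle before, STOP after
        else if (slot == 1) level = UART_START;
        else level = (byte >> (slot - 2)) & 1;
        for (int c = 0; c < CLKS_PER_BIT; c++) {
            rx_bit_t b = rx_step(&m, level);
            if (b.valid && nbits < UART_WORD_BITS) {
                result |= (uint8_t)(b.data << nbits);
                nbits++;
            }
        }
    }
    return result;
}
```

Calling rx_model_receive(0x41) returns 0x41. Since the model leaves RECEIVE right after the eighth sample, the STOP bit period is spent in the idle and wait-for-start states, matching the note above.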
There are two parts to UART RX: the counting above, which gets each bit off the wire (protocol/sampling), and storing the 8 bits to output as a byte (deserializing with a handshake).
Recall the signature of the receive function:
typedef struct uart_rx_1b_t{
stream(uint1_t) bit_stream;
uint1_t overflow;
}uart_rx_1b_t;
uart_rx_1b_t uart_rx_1b(
uint1_t input_wire,
uint1_t ready_for_bit_stream
);
The serialized, single bit at a time bit_stream output from that module needs to be assembled into an eight bit byte, i.e. deserialized.
Using some existing helper macros, the 1:8 deserializer is declared:
#include "stream/deserializer.h"
// Input 1 bit 8 times to get 1 byte out
deserializer(uart_deserializer, uint1_t, UART_WORD_BITS)
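What a 1:8 deserializer does can be sketched in plain C as follows. This is a hypothetical behavioral model and not the implementation behind the deserializer macro in stream/deserializer.h: it ignores output back pressure (the consumer is assumed always ready) and assumes the first bit received is the least significant bit.

```c
#include <stdint.h>

#define UART_WORD_BITS 8

// Hypothetical model state: bits collected so far
typedef struct {
    uint8_t bits[UART_WORD_BITS];
    uint8_t count;
} deser_model_t;

typedef struct { uint8_t data; uint8_t valid; } deser_out_t;

// Offer one bit to the model. Nothing happens unless bit_valid is set.
// After every eighth valid bit a full byte is output with valid=1.
static deser_out_t deser_push(deser_model_t *d, uint8_t bit, uint8_t bit_valid) {
    deser_out_t o = {0, 0};
    if (!bit_valid) return o;
    d->bits[d->count++] = bit & 1;
    if (d->count == UART_WORD_BITS) {
        for (int i = 0; i < UART_WORD_BITS; i++) {
            o.data |= (uint8_t)(d->bits[i] << i); // First bit in is bit 0
        }
        o.valid = 1;
        d->count = 0; // Start collecting the next byte
    }
    return o;
}
```

Pushing the bits 1, 0, 1, 0, 0, 1, 0, 1 in that order yields a single valid output of 0xA5 on the eighth push.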
When connected to the uart_rx_1b function, a stream of received UART bytes is produced:
Inside uart_mac.c an instance of uart_rx_1b is connected to an instance of the uart_deserializer. Both of these modules use valid-ready handshaking.
void uart_rx_mac()
{
// Receive bit stream from input wire
// Fake always ready since need to overflow
// on per byte level after deserializer
uint1_t ready_for_uart_rx_bit_stream = 1;
uart_rx_1b_t uart_rx_1b_out = uart_rx_1b(
uart_rx, // input physical bit
ready_for_uart_rx_bit_stream
);
// uart_rx_1b_out.overflow unused, never occurs
// Input 1 bit 8 times to get 1 byte out
uint1_t ready_for_uart_rx_byte_stream = 1;
uart_deserializer_o_t deser = uart_deserializer(
uart_rx_1b_out.bit_stream.data,
uart_rx_1b_out.bit_stream.valid,
ready_for_uart_rx_byte_stream
);
uart_rx_mac_word_out.data = uart_word_from_bits(deser.out_data);
uart_rx_mac_word_out.valid = deser.out_data_valid;
// Per byte overflow logic based on deser output handshake
uart_rx_mac_overflow =
uart_rx_mac_word_out.valid & ~uart_rx_mac_out_ready;
}
Notice the faked always-ready signals ready_for_uart_rx_bit_stream and ready_for_uart_rx_byte_stream, both hard coded to 1. Real readiness is checked via the uart_rx_mac_overflow flag: if the consumer is not ready when a byte is valid, the full UART byte is dropped.
It has proven convenient to expose globally visible wires as one way of composing a design.
For the above instances the following global wires of a data, valid, ready (DVR) handshake are declared:
// RX side
// Globally visible ports / wires
// Inputs
uint1_t uart_rx_mac_out_ready;
// Outputs
stream(uint8_t) uart_rx_mac_word_out;
uint1_t uart_rx_mac_overflow;
The ready in this case is just used to signal whether data was dropped (overflow) or not; it doesn't actually push back on the UART like CTS/RTS would.
Notice in the receive section above there were 'faked' ready signals; really only an overflow flag exists. For transmit we cannot do this and need real, functional back pressure: flow control ready signalling for our valid-ready stream handshaking. Transmitting is expected to 'block', and needs to signal 'not ready' for more input bytes while it outputs the current UART frame for a given byte onto the wire.
uart_mac.h has a function called uart_tx_1b where the majority of code for UART transmit lives. The signature for this function looks like:
typedef enum uart_tx_state_t
{
IDLE,
SEND_START,
TRANSMIT,
SEND_STOP
}uart_tx_state_t;
typedef struct uart_tx_1b_t{
uint1_t output_wire;
uint1_t ready_for_bit_stream;
}uart_tx_1b_t;
uart_tx_1b_t uart_tx_1b(
stream(uint1_t) bit_stream
){
static uart_tx_state_t state;
static uart_clk_count_t clk_counter;
static uart_bit_count_t bit_counter;
...
}
The transmitter idles, doing nothing, until it has bits to send. It then sends the START bit, the eight data bits, and finally the STOP bit.
The primary input to the uart_tx_1b module is a stream of bits, bit_stream (the contents of the byte to transmit). The primary output is the UART TX wire, output_wire.
Notice the additional output ready_for_bit_stream, which is new and must be handled for transmit.
The idle state drives the idle level and just waits for a bit (the first bit to transmit) to show up at the input. All eight bits of the byte are ready as soon as we see the first bit, since the entire byte will be buffered in a serializer (see next section).
// Default all zeros output
// (ex. ready_for_bit_stream=0)
uart_tx_1b_t o;
if(state==IDLE)
{
// Wait for valid bit(s) from serializer buffer
o.output_wire = UART_IDLE;
if(bit_stream.valid)
{
// Start transmitting start bit
state = SEND_START;
clk_counter = 0;
}
}
The start bit and stop bit sending states are similar and simple:
if(state==SEND_START)
{
// Output start bit for one bit period
o.output_wire = UART_START;
clk_counter += 1;
if(clk_counter >= UART_CLKS_PER_BIT)
{
// Then move onto transmitting word bits
state = TRANSMIT;
clk_counter = 0;
bit_counter = 0;
}
}
if(state==SEND_STOP)
{
// Output stop bit for one bit period
o.output_wire = UART_STOP;
clk_counter += 1;
if(clk_counter >= UART_CLKS_PER_BIT)
{
// Then back to idle
state = IDLE;
}
}
It's the middle TRANSMIT state that actually pulls the eight bits out of the serializer. The state looks similar to SEND_START and SEND_STOP, except the data put onto the output wire comes from the serializer bit stream input:
if(state==TRANSMIT)
{
// Output bit from serializer for one bit period
o.output_wire = bit_stream.data;
clk_counter += 1;
if(clk_counter >= UART_CLKS_PER_BIT)
{
// signal ready done with current bit now
// (next bit will be available next cycle)
o.ready_for_bit_stream = 1;
// Reset counter for next bit
clk_counter = 0;
bit_counter += 1;
// Last bit of word?
if(bit_counter==UART_WORD_BITS)
{
// Send the final stop bit
state = SEND_STOP;
clk_counter = 0;
}
}
}
Notice how ready_for_bit_stream is used to signal 'yes, done with the current bit', which is the mechanism that makes the next bit available.
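The transmit states can be checked with a plain C software model in the same spirit. In this sketch one call to tx_step stands for one clock cycle, CLKS_PER_BIT is an assumed tiny value so the waveform stays short, and the states after IDLE are chained with else-if as in the flattened listing at the end of this page. The names tx_model_t, tx_step and tx_model_send are hypothetical, and tx_model_send plays the role of the serializer by presenting the next bit whenever ready is signalled.

```c
#include <stdint.h>

#define UART_IDLE 1
#define UART_START 0
#define UART_STOP UART_IDLE
#define UART_WORD_BITS 8
#define CLKS_PER_BIT 4 // Assumed value for simulation only

typedef enum { TX_IDLE, TX_SEND_START, TX_TRANSMIT, TX_SEND_STOP } tx_state_t;
typedef struct { tx_state_t state; uint16_t clk_counter; uint8_t bit_counter; } tx_model_t;
typedef struct { uint8_t output_wire; uint8_t ready_for_bit; } tx_out_t;

// Advance the transmitter model by one clock cycle
static tx_out_t tx_step(tx_model_t *m, uint8_t bit_data, uint8_t bit_valid) {
    tx_out_t o = {UART_IDLE, 0};
    if (m->state == TX_IDLE) {
        if (bit_valid) { m->state = TX_SEND_START; m->clk_counter = 0; }
    }
    // Plain if: START begins in the same cycle the first bit is seen
    if (m->state == TX_SEND_START) {
        o.output_wire = UART_START;
        m->clk_counter += 1;
        if (m->clk_counter >= CLKS_PER_BIT) {
            m->state = TX_TRANSMIT; m->clk_counter = 0; m->bit_counter = 0;
        }
    } else if (m->state == TX_TRANSMIT) {
        o.output_wire = bit_data;
        m->clk_counter += 1;
        if (m->clk_counter >= CLKS_PER_BIT) {
            o.ready_for_bit = 1; // Done with the current bit
            m->clk_counter = 0;
            m->bit_counter += 1;
            if (m->bit_counter == UART_WORD_BITS) m->state = TX_SEND_STOP;
        }
    } else if (m->state == TX_SEND_STOP) {
        o.output_wire = UART_STOP;
        m->clk_counter += 1;
        if (m->clk_counter >= CLKS_PER_BIT) m->state = TX_IDLE;
    }
    return o;
}

// Send one byte and record the wire level of every clock into wave[].
// Returns the number of clocks recorded.
static int tx_model_send(uint8_t byte, uint8_t *wave, int max_clks) {
    tx_model_t m = {TX_IDLE, 0, 0};
    uint8_t bit_index = 0;
    int n = 0;
    while (n < max_clks) {
        uint8_t valid = bit_index < UART_WORD_BITS;
        uint8_t data = valid ? ((byte >> bit_index) & 1) : 0;
        tx_out_t o = tx_step(&m, data, valid);
        wave[n++] = o.output_wire;
        if (o.ready_for_bit && valid) bit_index++; // Present the next bit
        if (m.state == TX_IDLE && !valid) break;   // Frame finished
    }
    return n;
}
```

With CLKS_PER_BIT set to 4, sending one byte records 40 clocks: 4 of START, 32 of data and 4 of STOP.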
Recall the signature for the uart_tx_1b function:
typedef struct uart_tx_1b_t{
uint1_t output_wire;
uint1_t ready_for_bit_stream;
}uart_tx_1b_t;
uart_tx_1b_t uart_tx_1b(
stream(uint1_t) bit_stream
);
In order to transmit a byte of eight bits, it must be serialized into the single bit at a time bit_stream input to the transmitter:
Using some existing helper macros, the 8:1 serializer is declared:
#include "stream/serializer.h"
// Input 8 bits once to get 8 bits out 1 at a time
serializer(uart_serializer, uint1_t, UART_WORD_BITS)
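The behavior of an 8:1 serializer with valid-ready handshaking on both sides can be sketched in plain C as below. This is a hypothetical behavioral model and not the implementation behind the serializer macro in stream/serializer.h, so details such as latency and exactly when the input ready is asserted are assumptions.

```c
#include <stdint.h>

#define UART_WORD_BITS 8

// Hypothetical model state: buffered bits and how many are still unsent
typedef struct {
    uint8_t bits[UART_WORD_BITS];
    uint8_t remaining;
} ser_model_t;

typedef struct {
    uint8_t in_ready;  // Model can accept a new byte this cycle
    uint8_t out_data;  // Current bit presented to the consumer
    uint8_t out_valid; // out_data is meaningful this cycle
} ser_out_t;

// Advance the serializer model by one clock cycle
static ser_out_t ser_step(ser_model_t *s, uint8_t word, uint8_t word_valid, uint8_t out_ready) {
    ser_out_t o = {0, 0, 0};
    o.in_ready = (s->remaining == 0);
    if (s->remaining > 0) {
        // Present the oldest unsent bit and hold it until the consumer is ready
        o.out_data = s->bits[UART_WORD_BITS - s->remaining];
        o.out_valid = 1;
        if (out_ready) s->remaining--;
    } else if (word_valid) {
        // Buffer is empty: load all eight bits, least significant bit first
        for (int i = 0; i < UART_WORD_BITS; i++) {
            s->bits[i] = (word >> i) & 1;
        }
        s->remaining = UART_WORD_BITS;
    }
    return o;
}
```

While the consumer holds out_ready low the same bit stays on out_data, which is the back pressure behavior the transmitter relies on to stretch each bit over a full UART bit period.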
The output of the serializer is connected to the input of the uart_tx_1b function and ultimately drives the output TX UART wire:
Inside uart_mac.c an instance of uart_tx_1b is connected to an instance of the uart_serializer. Both of these modules use valid-ready handshaking.
void uart_tx_mac()
{
// Input one 8b word into serializer buffer and get eight single bits
uint1_t word_in[UART_WORD_BITS];
UINT_TO_BIT_ARRAY(word_in, UART_WORD_BITS, uart_tx_mac_word_in.data)
// Ready is FEEDBACK, doesn't get a value until later
uint1_t ready_for_bit_stream;
#pragma FEEDBACK ready_for_bit_stream
uart_serializer_o_t ser = uart_serializer(
word_in,
uart_tx_mac_word_in.valid,
ready_for_bit_stream
);
uart_tx_mac_in_ready = ser.in_data_ready;
stream(uint1_t) bit_stream;
bit_stream.data = ser.out_data;
bit_stream.valid = ser.out_data_valid;
// Transmit bit stream onto output wire
uart_tx_1b_t uart_tx_1b_out = uart_tx_1b(
bit_stream
);
uart_tx = uart_tx_1b_out.output_wire;
// Finally have FEEDBACK ready for serializer
ready_for_bit_stream = uart_tx_1b_out.ready_for_bit_stream;
}
Dealing with the feedback 'ready' back pressure part of a valid-ready handshake is likely the most subtle and potentially confusing aspect of the PipelineC code.
Generally, as you write out the C code of some dataflow, if you encounter a signal that you compute later but need as an input now, you can mark that variable as FEEDBACK to the compiler. The behavior is as if the variable's final, later-assigned value were available earlier in the code, even before it was assigned.
In this case ready_for_bit_stream is used as an input to uart_serializer, but its value is not given until the later assignment ready_for_bit_stream = uart_tx_1b_out.ready_for_bit_stream;.
It has proven convenient to expose globally visible wires as one way of composing a design.
For the above instances the following global wires of a data, valid, ready (DVR) handshake are declared:
// TX side
// Globally visible ports / wires
// Inputs
stream(uint8_t) uart_tx_mac_word_in;
// Outputs
uint1_t uart_tx_mac_in_ready;
Finally, notice again the loopback described at the start of this page:
uart_tx_mac_word_in = uart_rx_mac_word_out;
uart_rx_mac_out_ready = uart_tx_mac_in_ready;
It connects the global wires of the receive valid-ready handshake to those of the transmit valid-ready handshake.
More concise/hierarchical PipelineC source varieties for this design can be found here and here. A C test program for exercising the loopback can be found here.
Below is a flattened 'everything in uart loopback in one file' version of the code:
// Loopback UART with just enough buffering
// as to never overflow with balanced I/O bandwidth
#include "uintN_t.h"
// Each main function is a clock domain
// Only one clock in the design for now 'sys_clk' @ 100MHz
#define SYS_CLK_MHZ 100.0
#define CLKS_PER_SEC (SYS_CLK_MHZ*1000000.0)
#define SEC_PER_CLK (1.0/CLKS_PER_SEC)
#pragma MAIN_MHZ sys_clk_main 100.0
#pragma PART "xc7a35ticsg324-1l" // xc7a35ticsg324-1l = Arty, xcvu9p-flgb2104-2-i = AWS F1
// UART PHY?MAC?(de)serialize? logic
#define UART_BAUD 115200
#define UART_WORD_BITS 8
#define uart_word_t uint8_t
#define uart_bit_count_t uint4_t
#define uart_word_from_bits uint1_array8_le // PipelineC built in func
#define UART_SEC_PER_BIT (1.0/UART_BAUD)
#define UART_CLKS_PER_BIT_FLOAT (UART_SEC_PER_BIT/SEC_PER_CLK)
#define UART_CLKS_PER_BIT ((uart_clk_count_t)UART_CLKS_PER_BIT_FLOAT)
#define UART_CLKS_PER_BIT_DIV2 ((uart_clk_count_t)(UART_CLKS_PER_BIT_FLOAT/2.0))
#define uart_clk_count_t uint16_t
#define UART_IDLE 1
#define UART_START 0
#define UART_STOP UART_IDLE
// Convert framed async serial data to sync data+valid word stream
// rule of thumb name "_s" 'stream' if has .valid and .data
typedef struct uart_mac_s
{
uart_word_t data;
uint1_t valid;
}uart_mac_s;
// RX side
// Stateful regs
typedef enum uart_rx_mac_state_t
{
IDLE,
WAIT_START,
RECEIVE
}uart_rx_mac_state_t;
uart_rx_mac_state_t uart_rx_mac_state;
uart_clk_count_t uart_rx_clk_counter;
uart_bit_count_t uart_rx_bit_counter;
uint1_t uart_rx_bit_buffer[UART_WORD_BITS];
// RX logic
uart_mac_s uart_rx_mac(uint1_t data_in)
{
// Default no output
uart_mac_s output;
output.data = 0;
output.valid = 0;
// State machine for receiving
if(uart_rx_mac_state==IDLE)
{
// Wait for line to be high, idle, powered, etc
if(data_in==UART_IDLE)
{
// Then wait for the start bit
uart_rx_mac_state = WAIT_START;
uart_rx_clk_counter = 0;
}
}
else if(uart_rx_mac_state==WAIT_START)
{
// Wait for the start bit=0
if(data_in==UART_START)
{
// Wait half a bit period to align to center of bit period
uart_rx_clk_counter += 1;
if(uart_rx_clk_counter >= UART_CLKS_PER_BIT_DIV2)
{
// Begin loop of sampling each bit
uart_rx_mac_state = RECEIVE;
uart_rx_clk_counter = 0;
uart_rx_bit_counter = 0;
}
}
}
else if(uart_rx_mac_state==RECEIVE)
{
// Count a full bit period and then sample
uart_rx_clk_counter += 1;
if(uart_rx_clk_counter >= UART_CLKS_PER_BIT)
{
// Reset counter for next bit
uart_rx_clk_counter = 0;
// Shift bit buffer to make room for incoming bit
uint32_t i;
for(i=0;i<(UART_WORD_BITS-1);i=i+1)
{
uart_rx_bit_buffer[i] = uart_rx_bit_buffer[i+1];
}
// Sample current bit into back of shift buffer
uart_rx_bit_buffer[UART_WORD_BITS-1] = data_in;
uart_rx_bit_counter += 1;
// Last bit of word?
if(uart_rx_bit_counter==UART_WORD_BITS)
{
// Output the full valid word
output.data = uart_word_from_bits(uart_rx_bit_buffer);
output.valid = 1;
// Back to idle waiting for next word
uart_rx_mac_state = IDLE;
}
}
}
return output;
}
// TX side
// Slight clock differences between RX and TX sides can occur.
// Do a hacky off by one fewer clock cycles to ensure TX bandwidth
// is always slightly greater than RX bandwidth to avoid overflow
#define TX_CHEAT_CYCLES 1
// Stateful regs
typedef enum uart_tx_mac_state_t
{
IDLE,
SEND_START,
TRANSMIT,
SEND_STOP
}uart_tx_mac_state_t;
uart_tx_mac_state_t uart_tx_mac_state;
uart_clk_count_t uart_tx_clk_counter;
uart_bit_count_t uart_tx_bit_counter;
uart_mac_s uart_tx_word_in_buffer;
uint1_t uart_tx_bit_buffer[UART_WORD_BITS];
// Output type
typedef struct uart_tx_mac_o_t
{
uint1_t word_in_ready;
uint1_t data_out;
uint1_t overflow;
}uart_tx_mac_o_t;
// TX logic
uart_tx_mac_o_t uart_tx_mac(uart_mac_s word_in)
{
// Default no output
uart_tx_mac_o_t output;
output.word_in_ready = 0;
output.data_out = UART_IDLE; // UART high==idle
uint32_t i = 0;
// Ready for an incoming word to send
// if dont have valid word_in already (i.e. input buffer empty)
output.word_in_ready = !uart_tx_word_in_buffer.valid;
output.overflow = !output.word_in_ready & word_in.valid;
// Input registers
if(output.word_in_ready)
{
uart_tx_word_in_buffer = word_in;
}
// State machine for transmitting
if(uart_tx_mac_state==IDLE)
{
// Wait for valid bits in input buffer
if(uart_tx_word_in_buffer.valid)
{
// Save the bits of the word into shift buffer
for(i=0;i<UART_WORD_BITS;i=i+1)
{
uart_tx_bit_buffer[i] = uart_tx_word_in_buffer.data >> i;
}
// Start transmitting start bit
uart_tx_mac_state = SEND_START;
uart_tx_clk_counter = 0;
// No longer need data in input buffer
uart_tx_word_in_buffer.valid = 0;
}
}
// Pass through single cycle low latency from IDLE to SEND_START since if()
if(uart_tx_mac_state==SEND_START)
{
// Output start bit for one bit period
output.data_out = UART_START;
uart_tx_clk_counter += 1;
if(uart_tx_clk_counter >= (UART_CLKS_PER_BIT-TX_CHEAT_CYCLES))
{
// Then move onto transmitting word bits
uart_tx_mac_state = TRANSMIT;
uart_tx_clk_counter = 0;
uart_tx_bit_counter = 0;
}
}
else if(uart_tx_mac_state==TRANSMIT)
{
// Output from front of shift buffer for one bit period
output.data_out = uart_tx_bit_buffer[0];
uart_tx_clk_counter += 1;
if(uart_tx_clk_counter >= (UART_CLKS_PER_BIT-TX_CHEAT_CYCLES))
{
// Reset counter for next bit
uart_tx_clk_counter = 0;
// Shift bit buffer to bring next bit to front
for(i=0;i<(UART_WORD_BITS-1);i=i+1)
{
uart_tx_bit_buffer[i] = uart_tx_bit_buffer[i+1];
}
uart_tx_bit_counter += 1;
// Last bit of word?
if(uart_tx_bit_counter==UART_WORD_BITS)
{
// Send the final stop bit
uart_tx_mac_state = SEND_STOP;
uart_tx_clk_counter = 0;
}
}
}
else if(uart_tx_mac_state==SEND_STOP)
{
// Output stop bit for one bit period
output.data_out = UART_STOP;
uart_tx_clk_counter += 1;
if(uart_tx_clk_counter>=(UART_CLKS_PER_BIT-TX_CHEAT_CYCLES))
{
// Then back to idle
uart_tx_mac_state = IDLE;
}
}
return output;
}
// Make structs that wrap up the inputs and outputs
typedef struct sys_clk_main_inputs_t
{
// UART Input
uint1_t uart_txd_in;
} sys_clk_main_inputs_t;
typedef struct sys_clk_main_outputs_t
{
// UART Output
uint1_t uart_rxd_out;
// LEDs
uint1_t led[4];
} sys_clk_main_outputs_t;
// Sticky save overflow bit
uint1_t overflow;
// Break path from rx->tx in one clock by having buffer reg
uart_mac_s rx_word_buffer;
// The sys_clk_main function
sys_clk_main_outputs_t sys_clk_main(sys_clk_main_inputs_t inputs)
{
// Loopback RX to TX without connecting backwards facing flow control/ready
uart_mac_s rx_word = uart_rx_mac(inputs.uart_txd_in);
uart_tx_mac_o_t uart_tx_mac_output = uart_tx_mac(rx_word_buffer);
// Break path from rx->tx in one clock by having buffer reg
rx_word_buffer = rx_word;
sys_clk_main_outputs_t outputs;
outputs.uart_rxd_out = uart_tx_mac_output.data_out;
// Light up all four leds if overflow occurs
overflow = overflow | uart_tx_mac_output.overflow; // Sticky
outputs.led[0] = overflow;
outputs.led[1] = overflow;
outputs.led[2] = overflow;
outputs.led[3] = overflow;
return outputs;
}