Global Variables
Global variables that are not shared (used by a single function instance) are identical to static local variables.
Otherwise, a global variable indicates data shared between functions.
If the functions sharing the global variable exist in multiple clock domains, the global variable cannot be used directly in code; it must instead be accessed through the READ and WRITE wrapper clock domain crossing mechanisms.
Only the function instance that both writes and reads the shared global variable is considered a stateful function. That function behaves as expected with respect to the ordering of reads and writes and the statefulness of the global variable, and it is not autopipelined.
This means that the other function instances that use the global variable can only read from it. The value they read is the one from 'the end' of the execution/pipeline of the function where the variable is written.
Consider the example below, with an instance of write_func writing to a shared global variable the_global. Two read function instances, read_main/read_func and another_read_main/read_func, read from the shared global. Both read functions see the value of the_global as it is at the end of the write function's execution.
// Global variable value at the end of the write function's execution/return
// is what read functions see at/triggers the start of their execution
int32_t the_global = 0;
// All reads and writes must be in the same clock domain
// or the global must be marked as ASYNC_WIRE
//#pragma ASYNC_WIRE the_global
void write_func(uint1_t sel)
{
  // Save global
  int32_t temp = the_global;
  // Do stuff ~maybe
  if(sel)
  {
    // 'Erase it'
    the_global = -1;
    // Do some stuff
    int32_t result = temp*temp;
    // Reset global
    the_global = temp;
    // Accum for demo
    the_global += result;
  }
}

#pragma MAIN_MHZ write_main 100.0
void write_main(uint1_t sel)
{
  write_func(sel);
}

/*// Not allowed, get error about ~multiple drivers
#pragma MAIN_MHZ another_write_main 100.0
void another_write_main(uint1_t sel)
{
  write_func(sel);
}*/
uint32_t read_func()
{
  static uint32_t local_counter;
  if(the_global==-1)
  {
    // Never reaches here
    // "inter-clock cycle" variable state from write_main never seen
    // Only value of the_global at the end of write_func is seen
    local_counter = 0;
  }
  // Count as demo
  local_counter += the_global;
  return local_counter;
}

#pragma MAIN_MHZ read_main 100.0
uint32_t read_main()
{
  return read_func();
}

#pragma MAIN_MHZ another_read_main 100.0
uint32_t another_read_main()
{
  return read_func();
}

/*// Get error about multiple clocks unless marked ASYNC_WIRE
#pragma MAIN_MHZ not_same_clk_read_main 50.0
uint32_t not_same_clk_read_main()
{
  return read_func();
}*/
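The read/write visibility rule can be checked with a small software model. The following is a minimal sketch in plain C, not PipelineC: the names model_t, model_write, model_read and model_cycle are hypothetical, and the model assumes the readers sample the value left by the writer's most recently completed call. The exact cycle alignment in generated hardware is not modeled.

```c
#include <stdint.h>

// Plain-C software model (assumption: not how the tool is implemented)
typedef struct {
  int32_t the_global;        // models the shared global register
  uint32_t reader_count[2];  // models each reader instance's local_counter
  int saw_intermediate;      // set if a reader ever observes -1
} model_t;

// Same body as write_func in the example, operating on the model state
void model_write(model_t *m, int sel)
{
  int32_t temp = m->the_global;
  if (sel) {
    m->the_global = -1;           // intermediate value inside the writer
    int32_t result = temp * temp;
    m->the_global = temp;
    m->the_global += result;
  }
}

// Same body as read_func, but the value seen is passed in explicitly
uint32_t model_read(model_t *m, int idx, int32_t seen)
{
  if (seen == -1) {
    m->saw_intermediate = 1;
    m->reader_count[idx] = 0;
  }
  m->reader_count[idx] += (uint32_t)seen;
  return m->reader_count[idx];
}

// One modeled clock cycle: the writer runs to completion,
// then both readers see only the writer's final value
void model_cycle(model_t *m, int sel)
{
  model_write(m, sel);
  int32_t seen = m->the_global;
  model_read(m, 0, seen);
  model_read(m, 1, seen);
}
```

In this model the assignment of -1 happens on every cycle where sel is set, yet saw_intermediate stays 0 because the readers are only ever handed the value present when the writer returns.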
Shared global wires are declared identically to shared global registers, as above. A shared global register that stores no value from cycle to cycle infers a shared global wire instead.
To have the register part of your global variable optimize away, never use the stored state of the global variable. Instead, always write a value to the global variable that is a combination of other signals (and not state); then no storage register is needed.
Wires typically have one driver connected to one-or-more things driven by that wire. That is, a global variable is written to in just a single function instance, but can be read from in many other simultaneous function instances. If a global variable is written to in more than one function instance then this is a 'multiple driver' problem common to all HDLs.
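The difference between a global that needs storage and one that does not can be illustrated with two small functions. This is a minimal sketch written as plain C; the names wire_like, register_like, drive_wire and drive_register are hypothetical. Whether a register is actually removed is decided by the tool, so treat this only as an illustration of the coding pattern described above.

```c
#include <stdint.h>

uint32_t wire_like;      // always overwritten from inputs, old value never used
uint32_t register_like;  // next value depends on the previous value

// Wire pattern: the written value is a combination of other signals only
void drive_wire(uint32_t a, uint32_t b)
{
  wire_like = a + b;  // never reads wire_like itself
}

// Register pattern: the written value uses the stored state
void drive_register(uint32_t a)
{
  register_like = register_like + a;  // reads its own old value
}
```

After any call to drive_wire the value depends only on that call's arguments, while drive_register carries a running sum from call to call, which is what forces a storage element.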
The INST_ARRAY pragma example discussed below is based on a previous demo of FSM style single instance ~atomic shared function calls. In that past example, 10 state machine 'threads' attempt to increment an accumulator at the same time, and arbitration logic is generated that resolves the multiple simultaneous calls to the increment function, allowing one thread at a time to increment the accumulator. It takes 10 cycles total to process all the increments to the accumulator, one cycle per thread.
Instead, the example below shows a more basic way of dealing with similar multiple driver 'shared simultaneous access to the same resource' situations. 'Basic' here means 'more complicated things can be built from this'; e.g. custom arbitration can be written by the user instead of relying on the built-in arbitration logic generated for FSM style code, as mentioned above.
// Each thread is providing these values to the increment handler module simultaneously
uint32_t increment_input_value;
uint32_t increment_input_tid;
// And expects an output of the sum after the thread's addition (same cycle)
uint32_t increment_output_sum;
// A 'thread'/FSM instance definition trying to increment some accumulator this clock cycle
uint32_t incrementer_thread(uint8_t tid)
{
  while(1)
  {
    increment_input_value = tid + 1;
    increment_input_tid = tid;
    // increment_output_sum ready same cycle
    printf("Thread %d trying to increment by %d, got new total %d.\n",
      tid, increment_input_value, increment_output_sum);
    __out(increment_output_sum); // "return" that continues
    //return increment_output_sum;
  }
}
In the above code, multiple simultaneous instances of incrementer_thread are trying to drive the global wire increment_input_value with a value to add to the shared accumulator. The output after the thread's increment is expected on the increment_output_sum wire. Normally this would be an issue because 1) the tool would have no way to resolve which function instance's value of increment_input_value to use this cycle (another multiple driver situation), and 2) the code cannot associate/resolve which output increment_output_sum corresponds to which of the inputs/threads.
To address this, PipelineC introduced the INST_ARRAY pragma. This pragma tells the tool how to resolve multiple input drivers and multiple outputs in a way that makes sense to use in code. For example, below, #pragma INST_ARRAY increment_input_value increment_input_values associates the multiple uses of the global variable increment_input_value with use of the array increment_input_values. Depending on how the original global variable is used (read-only, or written as well), the corresponding array represents write values or read values.
// The module doing the incrementing/accumulating
// sees an array of values, one from each thread
uint32_t increment_input_values[N_THREADS];
uint32_t increment_input_tids[N_THREADS];
#pragma INST_ARRAY increment_input_value increment_input_values
#pragma INST_ARRAY increment_input_tid increment_input_tids
// And drives output totals back to each thread as an array
uint32_t increment_output_sums[N_THREADS];
#pragma INST_ARRAY increment_output_sum increment_output_sums
#pragma MAIN increment_handler
uint32_t increment_handler()
{
  // Accumulator reg
  static uint32_t total;
  // In one cycle accumulate from each thread with chained adders
  uint32_t i;
  for(i=0; i<N_THREADS; i+=1)
  {
    increment_output_sums[i] = total + increment_input_values[i];
    printf("Thread %d incrementing total %d by %d -> new total %d.\n",
      increment_input_tids[i], total, increment_input_values[i], increment_output_sums[i]);
    total = increment_output_sums[i]; // accumulate
  }
  return total;
}
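The chained accumulation in the handler above can be traced with a software model. This is a minimal sketch in plain C, not PipelineC: N_THREADS is set to 4 purely as an assumption for the sketch, and total_model, threads_drive and handler_cycle are hypothetical names. Each array element stands in for one thread instance's view of the global wires.

```c
#include <stdint.h>

#define N_THREADS 4  // assumed value for this sketch only

uint32_t total_model = 0;  // models the handler's static accumulator

// Models every thread instance driving its input wire with tid + 1
void threads_drive(uint32_t in_values[N_THREADS])
{
  for (uint32_t tid = 0; tid < N_THREADS; tid += 1) {
    in_values[tid] = tid + 1;
  }
}

// Models one clock cycle of the handler: chained adders, one per thread
void handler_cycle(const uint32_t in_values[N_THREADS],
                   uint32_t out_sums[N_THREADS])
{
  for (uint32_t i = 0; i < N_THREADS; i += 1) {
    out_sums[i] = total_model + in_values[i];  // this thread's output sum
    total_model = out_sums[i];                 // feeds the next adder
  }
}
```

With these assumptions, calling threads_drive once and handler_cycle twice gives the threads the sums 1, 3, 6, 10 in the first modeled cycle and 11, 13, 16, 20 in the second, so every thread is served in every cycle.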
The below image shows running --sim --modelsim for two cycles of the above design. Unlike the past FSM style arbitrated version, this design uses input values from, and drives outputs to, all threads each cycle: