Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lane sequencer behavior error with unbalanced workloads #248

Open
WEIhabi opened this issue Sep 10, 2023 · 4 comments
Open

Lane sequencer behavior error with unbalanced workloads #248

WEIhabi opened this issue Sep 10, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@WEIhabi
Copy link

WEIhabi commented Sep 10, 2023

Issue

As mentioned in #237, the relationship between vl and vstart may not be clearly defined in the dispatcher, so we limit vstart to be smaller than vl (the maximum vstart value can only be up to vlmax-1), but we can still see the following error scene.

The main reason is that when the main sequencer is assigned to the lane sequencer on each lane, the handling of the workload imbalance may not be considered clearly, so that there is an underflow situation, and the value in the wrong bank is fetched.

For example, if the current vadd.vv instruction is configured with vl=6 and vstart=1 by the previous vector CSR register, the lane sequencer on each lane will first calculate the vstart (vfu_operation_d.vstart)=pe_req.vstart/NrLanes=1/4=0, it will be 0 in all lanes, but in lane0, it will reduce the vstart value by 1 if lane_id<vstart%NrLanes, so the vstart value in lane0 will be reduced by 1 from 0 to the maximum negative value (It will be 12'hfff when VLEN is set to 2048).

If there's anything I'm considering wrong, please let me know, thanks!


Supplementary Pictures

image
image
image

@ckf104
Copy link

ckf104 commented Sep 11, 2023

Same doubt when I read the code. Maybe here code of assigning vstart should be changed to

      vfu_operation_d.vstart = pe_req.vstart / NrLanes;
      // If lane_id_i < vstart % NrLanes, this lane needs to execute one micro-operation less.
      if (lane_id_i < pe_req.vstart[idx_width(NrLanes)-1:0]) vfu_operation_d.vstart += 1;

alter -= to +=   :)

@WEIhabi
Copy link
Author

WEIhabi commented Sep 12, 2023

Hello @ckf104 ,
I understand your concerns, and I have also encountered situations where using += or -= can result in opposite outcomes. In the example you provided, the use of -= leads to underflow, while using += would cause overflow.

Regarding the specific code block (lane sequencer line 270) that you mentioned, I'm not entirely sure why it uses -=. If the development team has the opportunity, I hope they can explain it further.
My concerns can be illustrated using the diagram I shared in issue #237 image

To briefly explain the current setup: vstart is set to 16'h41, which is equivalent to 10'd65. The overall vstart for the entire instruction is 'd265, so for each lane, the starting position should be at least 66, not 65.

Here, vstart = 265/Nrlane = 4 with a remainder of 1. In our verification for lane 0 with id = 0, it's less than 1, so vstart - 1 becomes 65 (16’h41). Thus, vstart is 65 ('h41) in this case(micro-operation that code comment mentioned when vstart/lane count doesn't evenly divide).

However, my confusion lies in this theoretical situation: when vstart is 264, which is divisible by the number of lanes, shouldn't each lane start at the 66th element together? But when vstart becomes 265, why does lane 0 start from the 65th element rather than the 67th?

In any case, regardless of the design philosophy, it might be advisable to include additional conditional checks to prevent overflow or underflow when calculating values at upper and lower bounds. This is because once an overflow or underflow occurs, the actual bank where values are fetched may deviate significantly from the originally intended bank.


Supplementary illustration

(For my previous change to += run out of overflow)
image
image

In the above figure, you can see that if you change to use += and still don't have additional judgment on the overflow condition, it will result in the entry of the source register v16 to be taken in the extreme case to be taken to the register v24(v16+8), and the other source register v24 to be taken to be taken to the register v32(v24+8) to be taken to the register v0 to be taken to be taken to be taken to be taken to be taken to be taken to be taken to the register v0, and so on.
This is a computational error that exceeds the bounds and has not been taken into account.

@ckf104
Copy link

ckf104 commented Sep 12, 2023

Hi, @WEIhabi .
First, from my understanding, vl and vstart fields of vfu_operation_t correspond to concepts of vl and vstart in spec, except which are used within a single lane.
Now, suppose we change -= into +=. For your example data, set global vstart = 265, we should have

vstart
global 265
lane 0 67
lane 1 66
lane 2 66
lane 3 66

which is expected results.
Suppose vlen=2048 with vl=2046 and vstart=2045, we should have

vl vstart
global 2046 2045
lane 0 512 512
lane 1 512 511
lane 2 511 511
lane 3 511 511

which is also results we expected. Given vl=2046 and vstart=2045, ara should take only one one element for calculation, which will be dispatched to lane 1, and lane 0, 2, 3 don't do any calculation, hence overflow of addr calculation isn't a problem.

I think the true problem is how operand_requester.sv calculate len field of requester_d and vl field of operand_queue_cmd_o, which do not take vstart value into consideration, causing that lane 0, 2, 3 will request operand and do some calculations even though vl = vstart.

@WEIhabi
Copy link
Author

WEIhabi commented Sep 12, 2023

Hello @ckf104 .
You are right, I didn't take into account that the rest of the elements only operate on one lane (lane 1 as you said), so overflow is really not an problem.

For the len field of requester_d and vl field of operand_queue_cmd_o, without considering scale_vl (scale_vl=0), it's true that it's possible to change vl-vstart to be the judgment of whether or not to operate on a lane.

But there may be other ways to modify this :)

if (requester_d.len == '0)
    requester_d.len = 1;

Thank you for this, it's a wake up call!

@mp-17 mp-17 added the bug Something isn't working label Jun 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants