question of vfredusum #294

Open
AD738560581 opened this issue Mar 20, 2024 · 3 comments

Comments

@AD738560581

Hello @mp-17 @suehtamacv,
When I use the following case to test the vfredusum instruction, the RTL result is 40a81878, while the Spike result is 40a81879. However, when I used a floating-point calculator to simulate the calculation process of the vfredusum instruction, I got the same result as the RTL. Does this mean that the nodes of the addition tree in the RTL need to maintain accuracy?
Thanks
VSET(4, e32, m1);
VLOAD_32(v11, 0x3fc001e6, 0x3fa01fff, 0x3fa01fff, 0x3fa01fff);
VLOAD_32(v14, 0x0, 0x0, 0x0, 0x0);
VLOAD_32(v1, 0x4, 0x4, 0x4, 0x4);
asm volatile("vfredusum.vs v1, v11, v14");

@jin8495

jin8495 commented Mar 20, 2024

Hi, AD738560581.

This is because Spike computes vfredusum with in-order addition, the same way it computes vfredosum.
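To make the difference concrete, here is a minimal sketch that emulates float32 addition in Python and sums the test vector in two orders. It is not Spike's or Ara's actual code: the helper names (`f32`, `bits`, `add32`, `sum_in_order`, `sum_tree`) are made up for illustration, and the pairwise tree order is only an assumption about how a hardware reduction might group the additions. It also assumes the two garbled literals in the test case are 0x3fc001e6 and 0x3fa01fff.

```python
import struct

def f32(pattern):
    """Reinterpret a 32-bit pattern as a float32 value (held in a Python float)."""
    return struct.unpack("<f", struct.pack("<I", pattern))[0]

def bits(x):
    """Round x to float32 (round-to-nearest-even) and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def add32(a, b):
    """Emulate a float32 add: sum in double, then round back to float32."""
    return f32(bits(a + b))

def sum_in_order(scalar, elems):
    """Ordered sum: ((((scalar + e0) + e1) + e2) + e3)."""
    acc = scalar
    for e in elems:
        acc = add32(acc, e)
    return acc

def sum_tree(scalar, elems):
    """Assumed pairwise tree: (e0 + e1) + (e2 + e3), then + scalar.
    Only handles a power-of-two number of elements."""
    level = list(elems)
    while len(level) > 1:
        level = [add32(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return add32(level[0], scalar)

vs2 = [f32(0x3FC001E6), f32(0x3FA01FFF), f32(0x3FA01FFF), f32(0x3FA01FFF)]
vs1_0 = f32(0x00000000)

print(hex(bits(sum_in_order(vs1_0, vs2))))  # 0x40a81879
print(hex(bits(sum_tree(vs1_0, vs2))))      # 0x40a81878
```

Under these assumptions the in-order sum reproduces the Spike value and the pairwise sum reproduces the RTL value. Both are legal results for vfredusum, since the spec leaves the order of the unordered sum up to the implementation.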

@AD738560581
Author

Thanks! Your answer completely resolved my doubts. However, there is another problem. According to the RVV 1.0 spec, "If no elements are active, no additions are performed, so the scalar in vs1[0] is simply copied to the destination register, without canonicalizing NaN values and without setting any exception flags", which means that if the v0 register is zero while vm=0, vfredosum.vs v3, v1, v2, v0.t should give v3[0] = v1[0]? However, in the case below, I found that v3[0] is zero. @mp-17 @suehtamacv @jin8495

VSET(4, e32, m1);
VLOAD_32(v1, 0x80000000, 0x80800000, 0x80000000, 0x80000000);
VLOAD_32(v2, 0x80000000, 0x80000000, 0x80000000, 0x80000000);
VLOAD_32(v3, 0x1, 0x2, 0x3, 0x4);
VLOAD_32(v0, 0xf, 0x0, 0x0, 0x0);
asm volatile("vfredosum.vs v3, v1, v2, v0.t");

@AD738560581
Author

This seems to be an issue with -0.0 data. The vfredosum instruction adds vs1[0] and the elements of vs2 one after another. In addition, each masked-off element of vs2 is replaced with +0.0, so a vs1[0] of -0.0 gets added to +0.0, and the result is +0.0. So, if there is no active element and vs1[0] is -0.0, the result will be +0.0. The reason is that inactive elements are replaced with ntr_val in vmfpu.sv, which is +0.0 by default. I changed ntr_val for VFREDU/OSUM to 0x8000_0000_8000_0000 for EW32, and the case now passes.
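The signed-zero effect can be reproduced with a small model. This is only a sketch, not the vmfpu.sv logic: `masked_ordered_sum` and its `neutral` parameter are hypothetical names standing in for a datapath that substitutes a padding value for inactive elements. Python floats are doubles, but signed zeros behave the same way as in float32 under round-to-nearest.

```python
import math

def masked_ordered_sum(scalar, elems, mask, neutral):
    """Ordered sum in which every inactive element is replaced by `neutral`."""
    acc = scalar
    for e, active in zip(elems, mask):
        acc = acc + (e if active else neutral)
    return acc

elems = [1.0, 2.0, 3.0, 4.0]          # arbitrary values, all masked off below
mask = [False, False, False, False]   # no active elements

with_pos_zero = masked_ordered_sum(-0.0, elems, mask, neutral=0.0)
with_neg_zero = masked_ordered_sum(-0.0, elems, mask, neutral=-0.0)

# copysign exposes the sign bit, which a plain == comparison would hide
print(math.copysign(1.0, with_pos_zero))  # 1.0  -> the -0.0 scalar became +0.0
print(math.copysign(1.0, with_neg_zero))  # -1.0 -> the -0.0 scalar is preserved
```

Because x + (-0.0) returns x for every x, including -0.0, padding with -0.0 leaves vs1[0] unchanged, whereas padding with +0.0 turns a -0.0 scalar into +0.0.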
