
Golang s390x asm Reference

Sun Yimin edited this page Aug 30, 2024 · 15 revisions

Reference

IBM z/Architecture, a.k.a. s390x

The registers R10 and R11 are reserved. The assembler uses them to hold temporary values when assembling some instructions.

R13 points to the g (goroutine) structure. This register must be referred to as g; the name R13 is not recognized.

R15 points to the stack frame and should typically only be accessed using the virtual registers SP and FP.

Load- and store-multiple instructions operate on a range of registers. The range of registers is specified by a start register and an end register. For example, LMG (R9), R5, R7 would load R5, R6 and R7 with the 64-bit values at 0(R9), 8(R9) and 16(R9) respectively.
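The register-range behaviour can be modelled in plain Go. This is only a sketch of the semantics described above, not how the assembler implements LMG: `loadMultiple` and the 16-entry register array are hypothetical names, memory is read big-endian (s390x is a big-endian architecture), and register ranges that wrap around past R15 are not modelled.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// loadMultiple models LMG: it fills regs[start..end] with consecutive
// big-endian 64-bit values read from mem at offsets 0, 8, 16, ...
// (hypothetical helper; wrap-around ranges are not handled).
func loadMultiple(mem []byte, regs *[16]uint64, start, end int) {
	for i, r := 0, start; r <= end; i, r = i+1, r+1 {
		regs[r] = binary.BigEndian.Uint64(mem[8*i:])
	}
}

func main() {
	// mem stands in for the storage that R9 points to.
	mem := make([]byte, 24)
	binary.BigEndian.PutUint64(mem[0:], 0x11)
	binary.BigEndian.PutUint64(mem[8:], 0x22)
	binary.BigEndian.PutUint64(mem[16:], 0x33)

	var regs [16]uint64
	loadMultiple(mem, &regs, 5, 7) // models LMG (R9), R5, R7
	fmt.Printf("R5=%#x R6=%#x R7=%#x\n", regs[5], regs[6], regs[7])
}
```

Only R5, R6 and R7 change in this model; every register outside the range keeps its previous value.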

Storage-and-storage instructions such as MVC and XC are written with the length as the first argument. For example, XC $8, (R9), (R9) would clear eight bytes at the address specified in R9.
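Why XC with identical operands clears memory can be seen in a small Go model. This is a sketch under assumptions: `xorBytes` is a hypothetical helper, and its argument order (length, source, destination) is assumed to mirror the assembler syntax shown above.

```go
package main

import "fmt"

// xorBytes models XC $n, src, dst: each of the first n destination bytes
// is XORed with the corresponding source byte (hypothetical helper).
func xorBytes(n int, src, dst []byte) {
	for i := 0; i < n; i++ {
		dst[i] ^= src[i]
	}
}

func main() {
	buf := []byte{1, 2, 3, 4, 5, 6, 7, 8, 9}
	xorBytes(8, buf, buf) // models XC $8, (R9), (R9)
	fmt.Println(buf)      // the first eight bytes are zero, the ninth is untouched
}
```

Because any value XORed with itself is zero, passing the same location as both operands clears it without needing a separate zero source.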

If a vector instruction takes a length or an index as an argument then it will be the first argument. For example, VLEIF $1, $16, V2 will load the value sixteen into index one of V2. Care should be taken when using vector instructions to ensure that they are available at runtime. To use vector instructions a machine must have both the vector facility (bit 129 in the facility list) and kernel support. Without kernel support a vector instruction will have no effect (it will be equivalent to a NOP instruction).

Addressing modes:

(R5)(R6*1): The location at R5 plus R6. It is a scaled mode as on the x86, but the only scale allowed is 1.

Vector instructions

Reference

Element Size

  • B - This stands for Byte and represents multiple 8-bit values in a 128-bit vector.
  • H - This stands for Halfword and represents multiple 16-bit values in a 128-bit vector.
  • F - This stands for Fullword and represents multiple 32-bit values in a 128-bit vector.
  • G - This stands for Doubleword and represents multiple 64-bit values in a 128-bit vector.
  • Q - This stands for Quadword and represents a single 128-bit value in a 128-bit vector.

(The explanations above come from Copilot.)
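The element-size suffixes translate directly into a lane count per 128-bit register. The Go sketch below just does that arithmetic; `elementBits` and `lanes` are hypothetical names introduced for illustration.

```go
package main

import "fmt"

const vectorBits = 128 // width of one vector register

// elementBits maps each element-size suffix to its width in bits.
var elementBits = map[byte]int{'B': 8, 'H': 16, 'F': 32, 'G': 64, 'Q': 128}

// lanes returns how many elements of the given suffix fit in one vector
// register, or 0 for an unknown suffix.
func lanes(suffix byte) int {
	w, ok := elementBits[suffix]
	if !ok {
		return 0
	}
	return vectorBits / w
}

func main() {
	for _, s := range []byte("BHFGQ") {
		fmt.Printf("%c: %d lanes of %d bits\n", s, lanes(s), elementBits[s])
	}
}
```

So a B-suffixed instruction works on 16 lanes at once, while a Q-suffixed one treats the whole register as a single value.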

Addition, subtraction, and multiplication

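  • VA - Vector Add. Unsigned integer addition.
  • VAC - Vector Add With Carry. Unsigned integer addition with a carry-in; effectively a three-operand addition.
  • VACC - Vector Add Compute Carry. Unsigned integer addition that computes the carry; the result is the carry only.
  • VACCC - Vector Add With Carry Compute Carry. Unsigned integer addition with a carry-in that computes the carry; effectively a three-operand addition whose result is the carry only.
  • VS - Vector Subtract. Unsigned integer subtraction.
  • VSBCBI - Vector Subtract With Borrow Compute Borrow Indication. Subtraction with a borrow-in that computes the borrow; the result is the borrow indication only.
  • VSBI - Vector Subtract With Borrow Indication. Subtraction with a borrow-in.
  • VSCBI - Vector Subtract Compute Borrow Indication. Computes the borrow; the result is the borrow indication only.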

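So adding two numbers that span more than one vector register takes several of these instructions together. The example below computes T2||T1||T0 = T1||T0 + RED2||RED1; the final carry CAR2 is added into T2.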

	VACCQ  T0, RED1, CAR1
	VAQ    T0, RED1, T0
	VACCCQ T1, RED2, CAR1, CAR2
	VACQ   T1, RED2, CAR1, T1
	VAQ    T2, CAR2, T2
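The carry chain in that sequence can be checked against a Go model built on `math/bits`. This is a sketch under assumptions: `u128`, `addQ` and `addChain` are hypothetical names, and each `addQ` call stands for a sum instruction (VAQ or VACQ) together with its carry-computing partner (VACCQ or VACCCQ), since the model returns both results at once.

```go
package main

import (
	"fmt"
	"math/bits"
)

// u128 models one 128-bit quadword element (the Q suffix).
type u128 struct{ hi, lo uint64 }

// addQ returns the low 128 bits of a + b + carryIn and the carry-out (0 or 1).
// carryIn must be 0 or 1.
func addQ(a, b u128, carryIn uint64) (sum u128, carryOut uint64) {
	var c uint64
	sum.lo, c = bits.Add64(a.lo, b.lo, carryIn)
	sum.hi, carryOut = bits.Add64(a.hi, b.hi, c)
	return sum, carryOut
}

// addChain mirrors the five-instruction sequence: it adds RED2||RED1 to
// T1||T0 and accumulates the final carry into T2.
func addChain(t0, t1, t2, red1, red2 u128) (r0, r1, r2 u128) {
	var car1, car2 uint64
	r0, car1 = addQ(t0, red1, 0)       // VACCQ + VAQ
	r1, car2 = addQ(t1, red2, car1)    // VACCCQ + VACQ
	r2, _ = addQ(t2, u128{0, car2}, 0) // VAQ T2, CAR2, T2
	return r0, r1, r2
}

func main() {
	allOnes := u128{^uint64(0), ^uint64(0)}
	// (2^256 - 1) + 1 carries through both quadwords into T2.
	t0, t1, t2 := addChain(allOnes, allOnes, u128{}, u128{0, 1}, u128{})
	fmt.Printf("T2=%x:%x T1=%x:%x T0=%x:%x\n", t2.hi, t2.lo, t1.hi, t1.lo, t0.hi, t0.lo)
}
```

The assembly needs separate instructions for the sum and the carry because each vector instruction writes a single result register; the Go helper hides that by returning two values.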

The example below computes T2||TT1||TT0 = T2||T1||T0 - ZERO||PH||PL.

	VSCBIQ  PL, T0, CAR1
	VSQ     PL, T0, TT0
	VSBCBIQ T1, PH, CAR1, CAR2
	VSBIQ   T1, PH, CAR1, TT1
	VSBIQ   T2, ZER, CAR2, T2
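The borrow chain can be modelled in Go the same way. This is a sketch under assumptions: `u128`, `subQ` and `subChain` are hypothetical names, and the model uses Go's `bits.Sub64` convention in which a borrow value of 1 means a borrow occurred. How the hardware encodes its borrow indication is not modelled here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// u128 models one 128-bit quadword element (the Q suffix).
type u128 struct{ hi, lo uint64 }

// subQ returns the low 128 bits of a - b - borrowIn and the borrow-out,
// using Go's convention (1 means a borrow occurred). borrowIn must be 0 or 1.
func subQ(a, b u128, borrowIn uint64) (diff u128, borrowOut uint64) {
	var br uint64
	diff.lo, br = bits.Sub64(a.lo, b.lo, borrowIn)
	diff.hi, borrowOut = bits.Sub64(a.hi, b.hi, br)
	return diff, borrowOut
}

// subChain mirrors the five-instruction sequence: it subtracts
// ZERO||PH||PL from T2||T1||T0 and returns TT0, TT1 and the new T2.
func subChain(t0, t1, t2, pl, ph u128) (tt0, tt1, r2 u128) {
	var b1, b2 uint64
	tt0, b1 = subQ(t0, pl, 0)     // VSCBIQ + VSQ
	tt1, b2 = subQ(t1, ph, b1)    // VSBCBIQ + VSBIQ
	r2, _ = subQ(t2, u128{}, b2)  // VSBIQ T2, ZER, CAR2, T2
	return tt0, tt1, r2
}

func main() {
	// 2^256 - 1: the borrow ripples through both low quadwords into T2.
	tt0, tt1, t2 := subChain(u128{}, u128{}, u128{0, 1}, u128{0, 1}, u128{})
	fmt.Printf("T2=%x:%x TT1=%x:%x TT0=%x:%x\n", t2.hi, t2.lo, tt1.hi, tt1.lo, tt0.hi, tt0.lo)
}
```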

Multiplication is more complicated.

  • VML - Vector Multiply Low.
  • VMH - Vector Multiply High.
  • VMLH - Vector Multiply Logical High.
  • VMAL - Vector Multiply and Add Low.
  • VMAH - Vector Multiply and Add High.
  • VMALH - Vector Multiply and Add Logical High.

The main difference between VMALH and VMAH (and likewise between VMLH and VMH) is how they handle the sign of the operands:

  • VMALH treats the operands as unsigned integers. This means it performs a logical multiplication and addition, which does not take the sign of the operands into account.

  • VMAH treats the operands as signed integers. This means it performs an arithmetic multiplication and addition, which does take the sign of the operands into account.

In other words, VMALH is used for unsigned integer operations, while VMAH is used for signed integer operations. The choice between the two depends on whether the data you're working with is signed or unsigned.
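The effect of signedness on the high half is easy to see on a single 32-bit lane. The Go sketch below is only a model of the arithmetic, assuming F-sized elements; `mulHighSigned` and `mulHighUnsigned` are hypothetical helpers standing in for the VMH-style and VMLH-style behaviour.

```go
package main

import "fmt"

// mulHighSigned models one 32-bit lane of a signed high multiply:
// the operands are sign-extended to 64 bits before multiplying.
func mulHighSigned(a, b int32) int32 {
	return int32((int64(a) * int64(b)) >> 32)
}

// mulHighUnsigned models one 32-bit lane of a logical (unsigned) high
// multiply: the operands are zero-extended to 64 bits before multiplying.
func mulHighUnsigned(a, b uint32) uint32 {
	return uint32((uint64(a) * uint64(b)) >> 32)
}

func main() {
	// The same bit pattern is -1 when signed and 4294967295 when unsigned.
	var pattern uint32 = 0xFFFFFFFF

	s := mulHighSigned(int32(pattern), 2)
	u := mulHighUnsigned(pattern, 2)
	fmt.Printf("signed high half:   %#x\n", uint32(s))
	fmt.Printf("unsigned high half: %#x\n", u)
}
```

For identical input bits the two high halves differ (all ones versus one here), whereas the low 32 bits of the product are the same under either interpretation.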

SM2 s390x assembly optimization

https://github.com/emmansun/gmsm/blob/main/internal/sm2ec/p256_asm_s390x.s

Surprisingly, s390x leaves even fewer general-purpose registers available to Go assembly than AMD64 does!