-
Notifications
You must be signed in to change notification settings - Fork 98
VCVTPH2PS
VCVTPH2PS — Convert 16-bit FP values to Single-Precision FP values
Opcode/ Instruction | Op / En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
VEX.128.66.0F38.W0 13 /r VCVTPH2PS xmm1, xmm2/m64 | A | V/V | F16C | Convert four packed half precision (16-bit) floating- point values in xmm2/m64 to packed single-precision floating-point value in xmm1. |
VEX.256.66.0F38.W0 13 /r VCVTPH2PS ymm1, xmm2/m128 | A | V/V | F16C | Convert eight packed half precision (16-bit) floating- point values in xmm2/m128 to packed single-precision floating-point value in ymm1. |
EVEX.128.66.0F38.W0 13 /r VCVTPH2PS xmm1 {k1}{z}, xmm2/m64 | B | V/V | AVX512VL AVX512F | Convert four packed half precision (16-bit) floating- point values in xmm2/m64 to packed single-precision floating-point values in xmm1. |
EVEX.256.66.0F38.W0 13 /r VCVTPH2PS ymm1 {k1}{z}, xmm2/m128 | B | V/V | AVX512VL AVX512F | Convert eight packed half precision (16-bit) floating- point values in xmm2/m128 to packed single-precision floating-point values in ymm1. |
EVEX.512.66.0F38.W0 13 /r VCVTPH2PS zmm1 {k1}{z}, ymm2/m256 {sae} | B | V/V | AVX512F | Convert sixteen packed half precision (16-bit) floating-point values in ymm2/m256 to packed single-precision floating-point values in zmm1. |
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
A | NA | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
B | Half Mem | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Converts packed half precision (16-bits) floating-point values in the low-order bits of the source operand (the second operand) to packed single-precision floating-point values and writes the converted values into the destination operand (the first operand).
If case of a denormal operand, the correct normal result is returned. MXCSR.DAZ is ignored and is treated as if it 0. No denormal exception is reported on MXCSR.
VEX.128 version: The source operand is a XMM register or 64-bit memory location. The destination operand is a XMM register. The upper bits (MAXVL-1:128) of the corresponding destination register are zeroed.
VEX.256 version: The source operand is a XMM register or 128-bit memory location. The destination operand is a YMM register. Bits (MAXVL-1:256) of the corresponding destination register are zeroed.
EVEX encoded versions: The source operand is a YMM/XMM/XMM (low 64-bits) register or a 256/128/64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.
The diagram below illustrates how data is converted from four packed half precision (in 64 bits) to four single precision (in 128 bits) FP values.
Note: VEX.vvvv and EVEX.vvvv are reserved (must be 1111b).
127 96 VCVTPH2PS xmm1, xmm2/mem64, imm8 95 64 63 48 47 32 31 16 15 0 VH3 VH2 VH1 VH0 xmm2/mem64 convert convert convert convert 127 96 95 64 63 32 31 0 VS3 VS2 VS1 VS0 xmm1 |
Figure 5-6. VCVTPH2PS (128-bit Version)
vCvt_h2s(SRC1[15:0])
{
RETURN Cvt_Half_Precision_To_Single_Precision(SRC1[15:0]);
}
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
i ← j * 32
k ← j * 16
IF k1[j] OR *no writemask*
THEN DEST[i+31:i] ←
vCvt_h2s(SRC[k+15:k])
ELSE
IF *merging-masking*
; merging-masking
THEN *DEST[i+31:i] remains unchanged*
ELSE
; zeroing-masking
DEST[i+31:i] ← 0
FI
FI;
ENDFOR
DEST[MAXVL-1:VL] ← 0
DEST[31:0] ←vCvt_h2s(SRC1[15:0]);
DEST[63:32] ←vCvt_h2s(SRC1[31:16]);
DEST[95:64] ←vCvt_h2s(SRC1[47:32]);
DEST[127:96] ←vCvt_h2s(SRC1[63:48]);
DEST[159:128] ←vCvt_h2s(SRC1[79:64]);
DEST[191:160] ←vCvt_h2s(SRC1[95:80]);
DEST[223:192] ←vCvt_h2s(SRC1[111:96]);
DEST[255:224] ←vCvt_h2s(SRC1[127:112]);
DEST[MAXVL-1:256] ← 0
DEST[31:0] ←vCvt_h2s(SRC1[15:0]);
DEST[63:32] ←vCvt_h2s(SRC1[31:16]);
DEST[95:64] ←vCvt_h2s(SRC1[47:32]);
DEST[127:96] ←vCvt_h2s(SRC1[63:48]);
DEST[MAXVL-1:128] ← 0
None
VCVTPH2PS __m512 _mm512_cvtph_ps( __m256i a);
VCVTPH2PS __m512 _mm512_mask_cvtph_ps(__m512 s, __mmask16 k, __m256i a);
VCVTPH2PS __m512 _mm512_maskz_cvtph_ps(__mmask16 k, __m256i a);
VCVTPH2PS __m512 _mm512_cvt_roundph_ps( __m256i a, int sae);
VCVTPH2PS __m512 _mm512_mask_cvt_roundph_ps(__m512 s, __mmask16 k, __m256i a, int sae);
VCVTPH2PS __m512 _mm512_maskz_cvt_roundph_ps( __mmask16 k, __m256i a, int sae);
VCVTPH2PS __m256 _mm256_mask_cvtph_ps(__m256 s, __mmask8 k, __m128i a);
VCVTPH2PS __m256 _mm256_maskz_cvtph_ps(__mmask8 k, __m128i a);
VCVTPH2PS __m128 _mm_mask_cvtph_ps(__m128 s, __mmask8 k, __m128i a);
VCVTPH2PS __m128 _mm_maskz_cvtph_ps(__mmask8 k, __m128i a);
VCVTPH2PS __m128 _mm_cvtph_ps ( __m128i m1);
VCVTPH2PS __m256 _mm256_cvtph_ps ( __m128i m1)
Invalid
VEX-encoded instructions, see Exceptions Type 11 (do not report
#AC);
EVEX-encoded instructions, see Exceptions Type E11.
#UD If VEX.W=1.
#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
Source: Intel® Architecture Software Developer's Manual (May 2018)
Generated: 5-6-2018