8328998: Encoding support for Intel APX extended general-purpose registers
Add instruction encoding support for Intel APX extended general-purpose registers:
Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html.
By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used.
Progress
- [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8328998: Encoding support for Intel APX extended general-purpose registers (Sub-task - P4)
Reviewers
- Sandhya Viswanathan (@sviswa7 - Reviewer) ⚠️ Review applies to 7b3e8ec7
- Jatin Bhateja (@jatin-bhateja - Reviewer)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476
$ git checkout pull/18476
Update a local copy of the PR:
$ git checkout pull/18476
$ git pull https://git.openjdk.org/jdk.git pull/18476/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18476
View PR using the GUI difftool:
$ git pr show -t 18476
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18476.diff
Webrev
:wave: Welcome back steveatgh! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@steveatgh This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8328998: Encoding support for Intel APX extended general-purpose registers
Reviewed-by: kvn, sviswanathan, jbhateja
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 247 new commits pushed to the master branch:
- ddd73b458355bffeaa8e0e5017c27d6c6af2dc94: 8332082: Shenandoah: Use consistent tests to determine when pre-write barrier is active
- 0a9d1f8c89e946d99f01549515f6044e53992168: 8332749: Broken link in MemorySegment.Scope.html
- c9a7b9772d96d9a4825d9da2aacc277534282860: 8332829: [BACKOUT] C2: crash in compiled code because of dependency on removed range check CastIIs
- 7fd9d6c760c66d3e2f4034cf1a6b1b583ff829a9: 8332340: Add JavacBench as a test case for CDS
- 417d174aa1b7bd3b5755e5f2352d9bbe6ce6f183: 8331348: Some incremental builds deposit files in the make directory
- 303ac9f270f567d821d156f3a9d4f4c070f43f95: 8332671: Logging for pretouching thread stacks shows wrong memory range
- 90758f6735620776fcb60da9e0e2c91a4f53aaf1: 8332808: Always set java.io.tmpdir to a suitable value in the build
- e19a421c30534566ba0dea0fa84f812ebeecfc87: 8332720: ubsan: instanceKlass.cpp:3550:76: runtime error: member call on null pointer of type 'struct Array'
- 2581935b47afaf661a94c8a8e50ce08065d632f6: 8332528: Generate code in SwitchBootstraps.generateTypeSwitch that require fewer adaptations
- b890336e111ea8473ae49e9992bc2fd61e716792: 8328083: degrade virtual thread support for GetObjectMonitorUsage
- ... and 237 more: https://git.openjdk.org/jdk/compare/6f98d8f58f98827ae454c7ce4839de4071d95767...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @sviswa7, @jatin-bhateja, @eme64) but any other Committer may sponsor as well.
➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).
@steveatgh The following label will be automatically applied to this pull request:
-
hotspot-compiler
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Hi @steveatgh , Please use newly created JBS entry JDK-8328998 for this draft PR.
Webrevs
- 27: Full - Incremental (47a0cd70)
- 26: Full - Incremental (5b6fdce0)
- 25: Full - Incremental (1d6ecba9)
- 24: Full - Incremental (d2ac410e)
- 23: Full - Incremental (f054589e)
- 22: Full - Incremental (49b117ef)
- 21: Full - Incremental (47885cbe)
- 20: Full - Incremental (04a7db2a)
- 19: Full - Incremental (156bbfc5)
- 18: Full - Incremental (826fa2bb)
- 17: Full - Incremental (aee89e7c)
- 16: Full - Incremental (d4ecb31c)
- 15: Full - Incremental (52628798)
- 14: Full - Incremental (2a63a159)
- 13: Full - Incremental (d93e9893)
- 12: Full - Incremental (7b3e8ec7)
- 11: Full - Incremental (46eb6b42)
- 10: Full - Incremental (c65fda0c)
- 09: Full - Incremental (54d2226f)
- 08: Full - Incremental (01241d48)
- 07: Full - Incremental (41398bba)
- 06: Full - Incremental (7f845511)
- 05: Full - Incremental (21524eea)
- 04: Full - Incremental (7bd4b885)
- 03: Full - Incremental (eb246fd7)
- 02: Full - Incremental (95ce7dfa)
- 01: Full - Incremental (3d62dce8)
- 00: Full (1ad335a3)
How can we be confident that the encoding is correct? Would it be possible to write tests for this? Maybe one that disassembles it and compares the result to a 3rd party disassembler offline or in-process hsdis?
Also: what if the UseAPX is enabled, but the hardware does not support the feature? Don't we usually then automatically disable the flag, if the feature is not present? We do that with UseAVX for example.
And I agree with @dean-long : you need to have some test for this. At least some test should have the flag enabled. Something that stresses the registers, and verifies the results could be an idea.
Also: I suspect you would want to put this flag into the IR-framework whitelist, since it has no effect on the IR, right?
Can the APX features be simulated, maybe even with SDE?
Now you made the flag EXPERIMENTAL and by default false. What is the roadmap with this? It is generally not great to have default false flags, because the code underneath will just slowly rot and become broken. Is there a plan to eventually make it default true? What stops us from doing that already now?
Thank you @dean-long for the comment. I agree, tests are needed. Up to this point we have not had a separate formal tool to test encoding of x86. I did a lot of manual testing by adding loops that used r0-r31in different addressing patterns. I put these in a stub file that would be compiled by hotspot but not executed. I manually compared the disassembly of that against the output of similar assembly included in a small C program and run on the SDE. This worked pretty well for debugging but the manual aspect of it makes it error-prone and it takes a lot of time, too much time if iterating an implementation.
Subsequent pull requests will add encoding support for additional APX instructions (e.g. those using New Data Destination). Maybe one of these PRs can include a tool for testing instruction encoding for APX features. What do you think?
Thank you @eme64 for the comments. The functionality of the UseAPX flag is, as you point out, incomplete in this pull request. A subsequent PR (see JDK-8329030) will tie the logic of the flag in with a query of the hardware features. It was added in this PR thinking it could be useful for testing or debugging the encoding functionality.
Should is_src_gpr be set to true for the following: void Assembler::movdl(Register dst, XMMRegister src) { NOT_LP64(assert(VM_Version::supports_sse2(), "")); InstructionAttr attributes(AVX_128bit, /* rex_w / false, / legacy_mode / false, / no_mask_reg / true, / uses_vl */ false); // swap src/dst to get correct prefix int encode = simd_prefix_and_encode(src, xnoreg, as_XMMRegister(dst->encoding()), VEX_SIMD_66, VEX_OPCODE_0F, &attributes); emit_int16(0x7E, (0xC0 | encode)); }
Should is_src_gpr be set to true for the additional following instructions as well:
void Assembler::pextrd(Register dst, XMMRegister src, int imm8)
void Assembler::pextrq(Register dst, XMMRegister src, int imm8)
void Assembler::pextrb(Register dst, XMMRegister src, int imm8)
void Assembler::extractps(Register dst, XMMRegister src, uint8_t imm8)
void Assembler::pextl(Register dst, Register src1, Address src2)
void Assembler::pdepl(Register dst, Register src1, Address src2)
void Assembler::pextq(Register dst, Register src1, Address src2)
void Assembler::pdepq(Register dst, Register src1, Address src2)
void Assembler::movdq(Register dst, XMMRegister src)
Also the following instruction is not handled for egprs: void Assembler::popq(Register dst)
It looks to me that the source and dest are reversed in the following instruction in call to simd_prefix_and_encode, perhaps that should be a separate PR: // Do we have this wrong src and dst reversed in simd_prefix_and_encode? void Assembler::pextrw(Register dst, XMMRegister src, int imm8) { assert(VM_Version::supports_sse2(), ""); InstructionAttr attributes(AVX_128bit, /* rex_w / false, / legacy_mode / _legacy_mode_bw, / no_mask_reg / true, / uses_vl */ false); int encode = simd_prefix_and_encode(as_XMMRegister(dst->encoding()), xnoreg, src, VEX_SIMD_66, VEX_OPCODE_0F, &attributes); emit_int24((unsigned char)0xC5, (0xC0 | encode), imm8); } Once that PR is fixed, is_src_gpr should be set to true for this one as well.
Should is_src_gpr be set to true for the additional following instructions as well:
Thanks @sviswa7, yes some of these should also have src_is_gpr = true. Fixed: void Assembler::extractps(Register dst, XMMRegister src, uint8_t imm8) void Assembler::pextrd(Register dst, XMMRegister src, int imm8) void Assembler::pextrq(Register dst, XMMRegister src, int imm8) void Assembler::pextrb(Register dst, XMMRegister src, int imm8) void Assembler::movdq(Register dst, XMMRegister src)
These use a different prefix function, one that doesn't use a src_is_gpr parameter: void Assembler::pextl(Register dst, Register src1, Address src2) void Assembler::pdepl(Register dst, Register src1, Address src2) void Assembler::pextq(Register dst, Register src1, Address src2) void Assembler::pdepq(Register dst, Register src1, Address src2)
@sviswa7 wrote
Also the following instruction is not handled for egprs: void Assembler::popq(Register dst)
Thank you. Updated popq(Register) for egpr support.
SHA instructions (sha1rnds4, sha1nexte, sha1msg1, sha1msg2, sha256rnds2, sha256msg1, sha256msg2) needs to be encoded using EVEX encoding when egprs are in use.
It looks to me that the source and dest are reversed in the following instruction in call to simd_prefix_and_encode, perhaps that should be a separate PR: // Do we have this wrong src and dst reversed in simd_prefix_and_encode? void Assembler::pextrw(Register dst, XMMRegister src, int imm8) { assert(VM_Version::supports_sse2(), ""); InstructionAttr attributes(AVX_128bit, /* rex_w / false, / legacy_mode _/ legacy_mode_bw, / no_mask_reg / true, / uses_vl */ false); int encode = simd_prefix_and_encode(as_XMMRegister(dst->encoding()), xnoreg, src, VEX_SIMD_66, VEX_OPCODE_0F, &attributes); emit_int24((unsigned char)0xC5, (0xC0 | encode), imm8); } Once that PR is fixed, is_src_gpr should be set to true for this one as well.
Verified that the pextrw has the operands reversed per the SDM, so please ignore this comment.
SHA instructions (sha1rnds4, sha1nexte, sha1msg1, sha1msg2, sha256rnds2, sha256msg1, sha256msg2) needs to be encoded using EVEX encoding when egprs are in use.
Thank you, I missed these. The APX 3.0 spec says xmm register use is limited to 0-15 for SHA instructions. Coincidentally, the new version 4.0 APX spec. also removes support for EVEX promotion of SHA instructions. Given these specs, I don't think any encoding changes are needed. I've added an assert to these 7 instructions to check that only registers < 16 are used.
@sviswa7 Thank you for your review comments. Very helpful!
@steveatgh Please also do a merge with master.
:warning: @steveatgh the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:
$ git checkout apx-encoding-pr
$ git commit --author='Preferred Full Name <[email protected]>' --allow-empty -m 'Update full name'
$ git push
@vnkozlov Are there other things you would like to see for this pull request?
Thank you @dean-long for the comment. I agree, tests are needed. Up to this point we have not had a separate formal tool to test encoding of x86. I did a lot of manual testing by adding loops that used r0-r31in different addressing patterns. I put these in a stub file that would be compiled by hotspot but not executed. I manually compared the disassembly of that against the output of similar assembly included in a small C program and run on the SDE. This worked pretty well for debugging but the manual aspect of it makes it error-prone and it takes a lot of time, too much time if iterating an implementation.
Subsequent pull requests will add encoding support for additional APX instructions (e.g. those using New Data Destination). Maybe one of these PRs can include a tool for testing instruction encoding for APX features. What do you think?
When we wrote the AArch64 port, there was no available hardware to test it on. So, we wrote a simulator to test it. However, we ran the risk that if our understanding of instruction encoding was wrong, our assembler and our simulator might appear to work correctly when used together, but the result would not run on real AArch64 hardware once it arrived.
So, as well as a simulator for the architecture, we verified the internal HotSpot assembler by checking its encoding against GNU as. See /test/hotspot/gtest/aarch64, where a Python program generates source for both the HotSpot internal assembler and GNU as. I strongly suggest you do something similar.
(As a matter for the historical record, this did work. The test found several encoding bugs. Once we got the first real AArch64 hardware, the port worked almost immediately.)
In response to @dean-long, @theRealAph wrote:
When we wrote the AArch64 port, there was no available hardware to test it on. So, we wrote a simulator to test it. However, we ran the risk that if our understanding of instruction encoding was wrong, our assembler and our simulator might appear to work correctly when used together, but the result would not run on real AArch64 hardware once it arrived. So, as well as a simulator for the architecture, we verified the internal HotSpot assembler by checking its encoding against GNU
as. See /test/hotspot/gtest/aarch64, where a Python program generates source for both the HotSpot internal assembler and GNUas. I strongly suggest you do something similar. (As a matter for the historical record, this did work. The test found several encoding bugs. Once we got the first real AArch64 hardware, the port worked almost immediately.)
Thanks for the description. It would be great to create a similar tool for x86. I tested the encoding manually using the SDE as the authoritative source. It is tedious though and very time consuming.
A subsequent PR in JDK-8329030, perhaps the one that adds encoding support for New Data Destination variants, should include such a tool.
Hi @steveatgh ,
I have few more comments.
A) With recent change register only flavors of cvtsi2ss / cvtsi2sd / cvttsd2si/ cvttss2si which are all legacy map 1 instruction and are encoded using REX prefixes at UseAVX=0 will now be promoted to EEVEX which is a fixed 4 byte prefix, we should use REX2 instead. cvtsi2ss_MAP1_with_EEVEX.txt
B) Memory operand flavor of paddd : Missing EVEX tuples for memory operand instructions, it will prevent applying EVEX compressed displacement (disp8*N) encoding optimization. FTR: These are map 1 legacy instruction which could be encoded using SIMD + REX prefix, which adds up to two byte prefix, currently we promote them to VEX encoding in order to zero upper 128 bits, this added additional byte penalty in prefix since it used three byte VEX prefix (c4), now we will encode it using EEVEX if address operands (BASE/INDEX) is a EGPR which will add another byte to prefix since EVEX is a fixed 4 byte prefix. As mentioned above at UseAVX=0 we should encode them using REX2. paddd_MAP1_VEX_now_EEVEX.txt
C) Memory operand flavor of pcmpestri, ptest and vptest.
- missing address tuple
- legacy mode is true should be false.
Kindly incorporate.
A) With recent change register only flavors of cvtsi2ss / cvtsi2sd / cvttsd2si/ cvttss2si which are all legacy map 1 instruction and are encoded using REX prefixes at UseAVX=0 will now be promoted to EEVEX which is a fixed 4 byte prefix, we should use REX2 instead. cvtsi2ss_MAP1_with_EEVEX.txt
SDE ERROR: Illegal instruction at address = 7f3aafa329a0: f3 d5 10 0f 2a c1
I debugged the cause of above assertion failure with UseAVX=0 and found two issues:-
- Secondary map ID (0x0f) is being emitted after REX2 prefix, this should not happen since REX.M0 encodes this information.
- REX.M0 is not being set, because we are not passing correct value for map1 argument https://github.com/openjdk/jdk/blob/47885cbe8eb215323e9ca8f4a36d422d17521e57/src/hotspot/cpu/x86/assembler_x86.cpp#L11734
If we bring out the 0x0f to the top level assembler routines and remove following newly added codelets we can handle APX even at AVX level 0 and lift AVX512 constraint you introduced since SPECjbb2015 benchmark still explicitly pass UseAVX=0 during measurments.
https://github.com/openjdk/jdk/blob/47885cbe8eb215323e9ca8f4a36d422d17521e57/src/hotspot/cpu/x86/assembler_x86.cpp#L13158 https://github.com/openjdk/jdk/blob/47885cbe8eb215323e9ca8f4a36d422d17521e57/src/hotspot/cpu/x86/assembler_x86.cpp#L13187 https://github.com/openjdk/jdk/blob/47885cbe8eb215323e9ca8f4a36d422d17521e57/src/hotspot/cpu/x86/assembler_x86.cpp#L13347
I am ok to address these limitation in subsequent patches.
Hi @vnkozlov,
Is there anything else you would like to see for this PR. If not, I would like to check it in this week.
@steveatgh I just saw that we from Oracle did not run any tests for this. Can you please hold off for a day or two until we have the testing completed? I'm sure you did exhaustive testing - but still I'd like to make sure it runs fine on all the x64 machines we have.