Cover all RMIs with MIRI testing
This PR aims to introduce MIRI testing across all RMI functions to strengthen memory safety within RMM, particularly by addressing potential issues arising from unsafe code.
To cover all 23 RMI types, 11 new test cases (TCs) have been added (see Appendix: MIRI Test Cases for details), leading to the discovery of 2 bugs (1 memory leak, 1 integer overflow). Additionally, while writing the MIRI test cases, a manual review revealed a misalignment within the RMM specification, which was reported to Arm.
How to run
$ ./scripts/tests/miri.sh
running 22 tests
test r#macro::test::eprintln_with_format ... ok
test r#macro::test::eprintln_without_arg ... ok
test r#macro::test::eprintln_without_format ... ok
test r#macro::test::println_with_format ... ok
test r#macro::test::println_without_arg ... ok
test r#macro::test::println_without_format ... ok
test r#macro::test::set_of_const_assert ... ok
test rmi::features::test::rmi_features ... ok
test rmi::gpt::test::rmi_granule_delegate_negative ... ok
test rmi::gpt::test::rmi_granule_delegate_positive ... ok
test rmi::gpt::test::rmi_granule_undelegate ... ok
test rmi::realm::test::rmi_realm_create_negative ... ok
test rmi::realm::test::rmi_realm_create_positive ... ok
test rmi::rec::handlers::test::rmi_rec_create_positive ... ok
test rmi::rec::handlers::test::rmi_rec_enter_positive ... ok
test rmi::rtt::test::rmi_data_create_positive ... ok
test rmi::rtt::test::rmi_rtt_create_positive ... ok
test rmi::rtt::test::rmi_rtt_fold_positive ... ok
test rmi::rtt::test::rmi_rtt_init_ripas_positive ... ok
test rmi::rtt::test::rmi_rtt_map_unprotected_positive ... ok
test rmi::rtt::test::rmi_rtt_set_ripas_positive ... ok
test rmi::version::test::rmi_version ... ok
test result: ok. 22 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 144.88s
Discovered Bugs
Bug 1: Memory Leak in the RMM Page Table
- Bug Type: Memory Leak
- Cause: While mapping/unmapping the RMM page table, level 1-3 sub-tables were not released even when they no longer contained any entries.
- Temporary Solution: As a temporary measure, the RMM page table is dropped upon completion of the MIRI test.
- Future Discussion: Creating and destroying sub-tables in the RMM page table on each GRANULE_DELEGATE call may degrade performance. We need a solution that fixes the memory leak while minimizing the performance impact.
Note: The RMM page table is managed as a static variable, and Rust never invokes the `drop()` function for static variables. As a result, sub-tables used by RMM continue to hold allocated memory without ever being released.
$ cargo miri test rmi::gpt::test::rmi_granule_delegate
error: memory leaked: alloc56894755 (Rust heap, size: 4096, align: 4096), allocated here:
--> /home/sangwan/.rustup/toolchains/nightly-2024-04-21-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:100:9
|
100 | __rust_alloc(layout.size(), layout.align())
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: BACKTRACE:
= note: inside `alloc::alloc::alloc` at /home/sangwan/.rustup/toolchains/nightly-2024-04-21-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:100:9: 100:52
= note: inside `<vmsa::page_table::DefaultMemAlloc as vmsa::page_table::MemAlloc>::allocate` at /home/sangwan/islet/lib/vmsa/src/page_table.rs:63:9: 63:36
= note: inside `vmsa::page_table::PageTable::<vmsa::address::VirtAddr, mm::page_table::L3Table, mm::page_table::entry::Entry, 512>::new_in` at /home/sangwan/islet/lib/vmsa/src/page_table.rs:159:13: 159:92
= note: inside `<vmsa::page_table::PageTable<vmsa::address::VirtAddr, mm::page_table::L2Table, mm::page_table::entry::Entry, 512> as vmsa::page_table::PageTableMethods<vmsa::address::VirtAddr, mm::page_table::L2Table, mm::page_table::entry::Entry, 512>>::set_page::<mm::page::BasePageSize>` at /home/sangwan/islet/lib/vmsa/src/page_table.rs:392:25: 394:26
= note: inside `<vmsa::page_table::PageTable<vmsa::address::VirtAddr, mm::page_table::L1Table, mm::page_table::entry::Entry, 512> as vmsa::page_table::PageTableMethods<vmsa::address::VirtAddr, mm::page_table::L1Table, mm::page_table::entry::Entry, 512>>::set_page::<mm::page::BasePageSize>` at /home/sangwan/islet/lib/vmsa/src/page_table.rs:400:13: 400:50
= note: inside `<vmsa::page_table::PageTable<vmsa::address::VirtAddr, mm::page_table::L1Table, mm::page_table::entry::Entry, 512> as vmsa::page_table::PageTableMethods<vmsa::address::VirtAddr, mm::page_table::L1Table, mm::page_table::entry::Entry, 512>>::set_pages::<mm::page::BasePageSize>` at /home/sangwan/islet/lib/vmsa/src/page_table.rs:215:19: 215:
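The static-variable behavior described in the note can be illustrated with a small sketch. The types below are hypothetical stand-ins for the real vmsa page table, not the actual implementation:

```rust
// Hypothetical stand-ins for the RMM page table; names are illustrative only.
struct SubTable {
    entries: Vec<u64>,
}

struct PageTable {
    sub_tables: Vec<Box<SubTable>>,
}

impl PageTable {
    fn new() -> Self {
        PageTable { sub_tables: Vec::new() }
    }

    fn map(&mut self) {
        // A sub-table is allocated on map; when the table lives in a static,
        // this allocation persists for the program's lifetime because Drop
        // is never run for statics.
        self.sub_tables.push(Box::new(SubTable { entries: vec![0; 512] }));
    }

    fn unmap_all(&mut self) {
        // Explicit release: a MIRI test must call this (or drop the whole
        // table) itself, since it cannot rely on Drop for a static.
        self.sub_tables.clear();
    }
}

fn main() {
    let mut table = PageTable::new();
    table.map();
    assert_eq!(table.sub_tables.len(), 1);
    // Skipping this cleanup on a static table is what MIRI reports as a leak.
    table.unmap_all();
    assert!(table.sub_tables.is_empty());
}
```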
Bug 2: Integer Overflow in RMI_RTT_SET_RIPAS
- Bug Type: Integer Overflow
- Cause: An overflow occurs when the `top` in RMI_RTT_SET_RIPAS is smaller than GRANULE_SIZE.
- Solution: Implemented an overflow check that returns an appropriate error when the condition is detected.
note: inside closure
--> rmm/src/rmi/rtt.rs:649:36
|
648 | #[test]
| ------- in this procedural macro expansion
649 | fn rmi_rtt_set_ripas_positive() {
| ^
= note: this warning originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)
thread 'rmi::rtt::test::rmi_rtt_set_ripas_positive' panicked at rmm/src/rmi/rtt.rs:138:32:
attempt to subtract with overflow
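The guard can be sketched as a checked subtraction. The error type and function name below are hypothetical, not the real RMM code:

```rust
const GRANULE_SIZE: usize = 4096;

// Hypothetical error type standing in for RMM's RMI error codes.
#[derive(Debug, PartialEq)]
enum RmiError {
    InvalidInput,
}

// Computing `top - GRANULE_SIZE` panics in debug builds (and wraps in
// release) when `top < GRANULE_SIZE`, so use checked_sub and surface an
// error instead of letting the subtraction overflow.
fn last_granule_base(top: usize) -> Result<usize, RmiError> {
    top.checked_sub(GRANULE_SIZE).ok_or(RmiError::InvalidInput)
}

fn main() {
    assert_eq!(last_granule_base(8192), Ok(4096));
    assert_eq!(last_granule_base(100), Err(RmiError::InvalidInput));
}
```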
Spec Feedback: Ensuring realm_state validation in B4.3.15
It seems necessary to review B4.3.15 against D1.2.2 for spec consistency.
- Specification Restrictions: According to "D1.2.2 Realm Translation Table creation flow", subsequent levels of RTT can be created when the state of the Realm is REALM_NEW or REALM_ACTIVE.
  > D1.2.2 Realm Translation Table creation flow: Subsequent levels of RTT are added using the RMI_RTT_CREATE command. This can be performed when the state of the Realm is REALM_NEW or REALM_ACTIVE.
- Missing Failure Condition: However, in "B4.3.15 RMI_RTT_CREATE", the condition checking the realm_state is missing.
- Suggestion: It would be necessary either to add this realm_state validation to "B4.3.15.2 Failure conditions" or to clarify the details in D1.2.2 for better alignment.
Process of Applying MIRI Test
- Extract inputs and outputs of the RMI to be tested from positive test cases in the ACS tests.
- Write the MIRI test.
- Execute the MIRI test.
- Implement mocking for parts requiring simulation.
- Apply patches for any bugs encountered.
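As a sketch, a test produced by this process might take the following shape. The handler and constants here are hypothetical mocks standing in for a real RMI handler, not the actual RMM code:

```rust
// Hypothetical constants and handler mirroring the shape of an RMI test;
// none of these names come from the real codebase.
const SUCCESS: usize = 0;

// Mocked handler: returns (status, version) as the positive ACS case would.
fn rmi_version(_req: usize) -> (usize, usize) {
    (SUCCESS, 0x0001_0000)
}

fn main() {
    // Step 1: inputs/outputs extracted from the ACS positive test case.
    let req = 0x0001_0000;
    // Steps 2-3: the MIRI test invokes the handler and checks the outputs;
    // MIRI then flags any undefined behavior on the executed path.
    let (status, ver) = rmi_version(req);
    assert_eq!(status, SUCCESS);
    assert_eq!(ver, 0x0001_0000);
}
```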
Challenges in Applying MIRI
Handling Provenance Issues with Host-Provided Memory
- Problem: When using MIRI to test Rust code, "No provenance" errors may occur 1) during operations like `core::ptr::write`, or 2) when converting externally allocated memory into specific Rust objects. This happens because memory addresses not allocated by RMM lack the provenance that MIRI enforces.
- Limitation: Currently, MIRI does not allow disabling provenance checks. Additionally, Rust does not natively support allocating memory at specific addresses, which complicates this scenario further.
- Proposed Alternative: Instead of using externally provided memory addresses directly, simulate them by allocating memory within RMM's managed space and treating it as if it were external memory. This avoids provenance issues while still allowing MIRI testing.
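The proposed alternative can be sketched as follows: allocate a granule inside the test and hand out its address as if the host had provided it. The function name is hypothetical:

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

const GRANULE_SIZE: usize = 4096;

// Instead of casting a raw host-provided physical address (which carries no
// provenance under MIRI), allocate a granule inside the test and treat its
// address as if it were external memory.
fn mock_host_granule() -> *mut u8 {
    let layout = Layout::from_size_align(GRANULE_SIZE, GRANULE_SIZE).unwrap();
    unsafe { alloc_zeroed(layout) }
}

fn main() {
    let addr = mock_host_granule();
    // The pointer has valid provenance, so core::ptr::write is accepted.
    unsafe { core::ptr::write(addr, 0xAB) };
    assert_eq!(unsafe { *addr }, 0xAB);
    // Free the granule at the end of the test so MIRI reports no leak.
    let layout = Layout::from_size_align(GRANULE_SIZE, GRANULE_SIZE).unwrap();
    unsafe { dealloc(addr, layout) };
}
```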
Realm Simulation
- Problem: MIRI cannot execute assembly code, making it impossible to directly enter the realm context.
- Solution: Mock the necessary parts, simulating realm execution and RSI invocation as if they occurred inside the test.
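Such mocking might be structured as below. The trait, types, and the RSI command value are all illustrative assumptions, not the real RMM interfaces:

```rust
// Sketch of mocking realm entry: the real path executes assembly (an ERET
// into the realm), which MIRI cannot run, so a test swaps in a mock that
// reports the RSI call the realm would have made. Names are illustrative.
trait RealmEntry {
    // Returns the RSI command the realm issued on exit.
    fn enter(&mut self) -> usize;
}

struct MockRealm {
    pending_rsi: usize,
}

impl RealmEntry for MockRealm {
    fn enter(&mut self) -> usize {
        // Pretend the realm ran and exited with a pending RSI call.
        self.pending_rsi
    }
}

// The handler under test only sees the trait, so MIRI can drive it
// entirely through the mock.
fn handle_rec_enter(realm: &mut dyn RealmEntry) -> usize {
    realm.enter()
}

fn main() {
    const RSI_CMD: usize = 0xC400_0190; // illustrative command id
    let mut realm = MockRealm { pending_rsi: RSI_CMD };
    assert_eq!(handle_rec_enter(&mut realm), RSI_CMD);
}
```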
Remaining Work
- Enable Stacked Borrows Check: The MIRI "Stacked Borrows" check is currently disabled because it is experimental; we plan to enable it.
- Add Negative Test Cases: In addition to the positive test cases, add negative test cases to ensure more comprehensive testing.
Appendix: MIRI Test Cases
| Index | Command | Test Case | Test Result | Note |
|---|---|---|---|---|
| 1 | RMI_VERSION | rmi::version::test::rmi_version | PASS | |
| 2 | RMI_GRANULE_DELEGATE | rmi::gpt::test::rmi_granule_delegate_positive | PASS | Bug Found: Memory Leak in RMM Page Table |
| 3 | RMI_GRANULE_UNDELEGATE | rmi::gpt::test::rmi_granule_undelegate | PASS | |
| 4 | RMI_DATA_CREATE | rmi::rtt::test::rmi_data_create_positive | PASS | |
| 5 | RMI_DATA_CREATE_UNKNOWN | rmi::rtt::test::rmi_data_create_positive | PASS | |
| 6 | RMI_DATA_DESTROY | rmi::rtt::test::rmi_data_create_positive | PASS | |
| 7 | RMI_REALM_ACTIVATE | rmi::realm::test::rmi_realm_create_positive | PASS | |
| 8 | RMI_REALM_CREATE | rmi::realm::test::rmi_realm_create_positive | PASS | |
| 9 | RMI_REALM_DESTROY | rmi::realm::test::rmi_realm_create_positive | PASS | |
| 10 | RMI_REC_CREATE | rmi::rec::handlers::test::rmi_rec_create_positive | PASS | |
| 11 | RMI_REC_DESTROY | rmi::rec::handlers::test::rmi_rec_create_positive | PASS | |
| 12 | RMI_REC_ENTER | rmi::rec::handlers::test::rmi_rec_enter_positive | PASS | |
| 13 | RMI_RTT_CREATE | rmi::rtt::test::rmi_rtt_create_positive | PASS | Spec Feedback: Misalignment in the realm state |
| 14 | RMI_RTT_DESTROY | rmi::rtt::test::rmi_rtt_create_positive | PASS | |
| 15 | RMI_RTT_MAP_UNPROTECTED | rmi::rtt::test::rmi_rtt_map_unprotected_positive | PASS | |
| 16 | RMI_RTT_READ_ENTRY | rmi::rtt::test::rmi_rtt_create_positive | PASS | |
| 17 | RMI_RTT_UNMAP_UNPROTECTED | rmi::rtt::test::rmi_rtt_map_unprotected_positive | PASS | |
| 18 | RMI_PSCI_COMPLETE | rmi::rec::handlers::test::rmi_rec_enter_positive | PASS | |
| 19 | RMI_FEATURES | rmi::features::test::rmi_features | PASS | |
| 20 | RMI_RTT_FOLD | rmi::rtt::test::rmi_rtt_fold_positive | PASS | |
| 21 | RMI_REC_AUX_COUNT | rmi::rec::handlers::test::rmi_rec_create_positive | PASS | |
| 22 | RMI_RTT_INIT_RIPAS | rmi::rtt::test::rmi_rtt_init_ripas_positive | PASS | |
| 23 | RMI_RTT_SET_RIPAS | rmi::rtt::test::rmi_rtt_set_ripas_positive | PASS | Bug Found: Integer Overflow |
Related Issue: https://github.com/islet-project/islet/issues/361
Regarding "Memory Leak in the RMM Page Table", I think the simplest way is to drop unnecessary tables when a realm gets activated. A lot of RMI calls are not allowed for an activated realm (e.g., DATA_CREATE) so it's reasonable to say that it's unlikely further RMM page table mappings are needed for the realm.
- Realm-A in the launch phase (e.g., granule_delegate & data_create) --> create sub tables for the PA range 0x10000-0x20000 for example.
- Realm-A gets activated --> clear out unnecessary sub tables --> reasonable, since it's unlikely that the host requests the same PA range 0x10000-0x20000 (other realms are likely to call RMIs with different physical memory ranges)
Benefit? We can avoid the performance overhead of creating and deleting sub tables while launching Realm-A (before it is activated).
The proposed approach seems practical and reasonable. As suggested, we can break down the process (1. delegate & data-create , 2. realm-activate) and remove unnecessary sub-tables at specific stages (when the realm is activated). To achieve this, we will need to store information related to the previously used Realm, but this shouldn't pose a significant problem.
However, I have one concern: if someone intentionally does not activate the Realm and repeatedly triggers only the previous stage (delegate & data-create), it could still exhaust RMM's memory. It would be beneficial to include a solution for this scenario as well.
> if someone intentionally does not activate the Realm
This is crucial. Agreed, we should figure out a way to deal with this. Setting up a condition that triggers "dropping sub tables" suffices (then we don't need a separate function to drop sub tables when the realm gets activated):
(1) keep track of the number of sub tables; (2) when the number exceeds a threshold value T (statically configurable), call the function to drop sub tables.
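The threshold policy could be sketched like this; the types and the value of T are illustrative assumptions, not the real page-table code:

```rust
// Sketch of the threshold policy: track the sub-table count and drop the
// sub tables once the count exceeds a statically configurable T.
// Types and values are illustrative only.
const T: usize = 4; // threshold, statically configurable

struct PageTable {
    sub_tables: Vec<Vec<u64>>,
}

impl PageTable {
    fn new() -> Self {
        PageTable { sub_tables: Vec::new() }
    }

    fn add_sub_table(&mut self) {
        // (1) each mapping that needs a new sub table grows the count.
        self.sub_tables.push(vec![0; 512]);
        // (2) crossing the threshold triggers the drop, so a host that
        // never activates the realm cannot exhaust RMM's memory.
        if self.sub_tables.len() > T {
            self.drop_sub_tables();
        }
    }

    fn drop_sub_tables(&mut self) {
        self.sub_tables.clear();
    }
}

fn main() {
    let mut pt = PageTable::new();
    for _ in 0..5 {
        pt.add_sub_table();
    }
    // The 5th insertion exceeded T = 4 and triggered the drop.
    assert!(pt.sub_tables.is_empty());
}
```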