Cover all RMIs with MIRI testing
This PR aims to introduce MIRI testing across all RMI functions to strengthen memory safety within RMM, particularly by addressing potential issues arising from unsafe code.
To cover all 23 RMI types, 11 new test cases (TCs) have been added (see Appendix: MIRI Test Cases for details), leading to the discovery of 2 bugs (1 memory leak, 1 integer overflow). Additionally, while writing the MIRI test cases, a manual review revealed a misalignment within the RMM specification, which was reported to Arm.
How to run
$ ./scripts/tests/miri.sh
running 22 tests
test r#macro::test::eprintln_with_format ... ok
test r#macro::test::eprintln_without_arg ... ok
test r#macro::test::eprintln_without_format ... ok
test r#macro::test::println_with_format ... ok
test r#macro::test::println_without_arg ... ok
test r#macro::test::println_without_format ... ok
test r#macro::test::set_of_const_assert ... ok
test rmi::features::test::rmi_features ... ok
test rmi::gpt::test::rmi_granule_delegate_negative ... ok
test rmi::gpt::test::rmi_granule_delegate_positive ... ok
test rmi::gpt::test::rmi_granule_undelegate ... ok
test rmi::realm::test::rmi_realm_create_negative ... ok
test rmi::realm::test::rmi_realm_create_positive ... ok
test rmi::rec::handlers::test::rmi_rec_create_positive ... ok
test rmi::rec::handlers::test::rmi_rec_enter_positive ... ok
test rmi::rtt::test::rmi_data_create_positive ... ok
test rmi::rtt::test::rmi_rtt_create_positive ... ok
test rmi::rtt::test::rmi_rtt_fold_positive ... ok
test rmi::rtt::test::rmi_rtt_init_ripas_positive ... ok
test rmi::rtt::test::rmi_rtt_map_unprotected_positive ... ok
test rmi::rtt::test::rmi_rtt_set_ripas_positive ... ok
test rmi::version::test::rmi_version ... ok
test result: ok. 22 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 144.88s
Discovered Bugs
Bug 1: Memory Leak in the RMM Page Table
- Bug Type: Memory Leak
- Cause: While mapping/unmapping the RMM page table, level 1-3 sub-tables were not released even when they no longer contained any entries.
- Temporary Solution: As a temporary measure, the RMM page table is dropped upon completion of the MIRI test.
- Future Discussion: Creating and destroying sub-tables in the RMM page table on each GRANULE_DELEGATE call may degrade performance. We need a solution that fixes the memory leak while minimizing the performance impact.
Note: The RMM page table is managed as a static variable, and Rust never invokes the `drop()` function for static variables. As a result, sub-tables used by RMM continue to hold allocated memory without ever being released.
$ cargo miri test rmi::gpt::test::rmi_granule_delegate
error: memory leaked: alloc56894755 (Rust heap, size: 4096, align: 4096), allocated here:
--> /home/sangwan/.rustup/toolchains/nightly-2024-04-21-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:100:9
|
100 | __rust_alloc(layout.size(), layout.align())
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: BACKTRACE:
= note: inside `alloc::alloc::alloc` at /home/sangwan/.rustup/toolchains/nightly-2024-04-21-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:100:9: 100:52
= note: inside `<vmsa::page_table::DefaultMemAlloc as vmsa::page_table::MemAlloc>::allocate` at /home/sangwan/islet/lib/vmsa/src/page_table.rs:63:9: 63:36
= note: inside `vmsa::page_table::PageTable::<vmsa::address::VirtAddr, mm::page_table::L3Table, mm::page_table::entry::Entry, 512>::new_in` at /home/sangwan/islet/lib/vmsa/src/page_table.rs:159:13: 159:92
= note: inside `<vmsa::page_table::PageTable<vmsa::address::VirtAddr, mm::page_table::L2Table, mm::page_table::entry::Entry, 512> as vmsa::page_table::PageTableMethods<vmsa::address::VirtAddr, mm::page_table::L2Table, mm::page_table::entry::Entry, 512>>::set_page::<mm::page::BasePageSize>` at /home/sangwan/islet/lib/vmsa/src/page_table.rs:392:25: 394:26
= note: inside `<vmsa::page_table::PageTable<vmsa::address::VirtAddr, mm::page_table::L1Table, mm::page_table::entry::Entry, 512> as vmsa::page_table::PageTableMethods<vmsa::address::VirtAddr, mm::page_table::L1Table, mm::page_table::entry::Entry, 512>>::set_page::<mm::page::BasePageSize>` at /home/sangwan/islet/lib/vmsa/src/page_table.rs:400:13: 400:50
= note: inside `<vmsa::page_table::PageTable<vmsa::address::VirtAddr, mm::page_table::L1Table, mm::page_table::entry::Entry, 512> as vmsa::page_table::PageTableMethods<vmsa::address::VirtAddr, mm::page_table::L1Table, mm::page_table::entry::Entry, 512>>::set_pages::<mm::page::BasePageSize>` at /home/sangwan/islet/lib/vmsa/src/page_table.rs:215:19: 215:
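The static-variable behavior described in the note can be illustrated with a small sketch. The types below are hypothetical stand-ins for the real vmsa page table, not the actual implementation:

```rust
// Hypothetical stand-ins for the RMM page table; names are illustrative only.
struct SubTable {
    entries: Vec<u64>,
}

struct PageTable {
    sub_tables: Vec<Box<SubTable>>,
}

impl PageTable {
    fn new() -> Self {
        PageTable { sub_tables: Vec::new() }
    }

    fn map(&mut self) {
        // A sub-table is allocated on map; when the table lives in a static,
        // this allocation persists for the program's lifetime because Drop
        // is never run for statics.
        self.sub_tables.push(Box::new(SubTable { entries: vec![0; 512] }));
    }

    fn unmap_all(&mut self) {
        // Explicit release: a MIRI test must call this (or drop the whole
        // table) itself, since it cannot rely on Drop for a static.
        self.sub_tables.clear();
    }
}

fn main() {
    let mut table = PageTable::new();
    table.map();
    assert_eq!(table.sub_tables.len(), 1);
    // Skipping this cleanup on a static table is what MIRI reports as a leak.
    table.unmap_all();
    assert!(table.sub_tables.is_empty());
}
```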
Bug 2: Integer Overflow in RMI_RTT_SET_RIPAS
- Bug Type: Integer Overflow
- Cause: An overflow occurs when the `top` in RMI_RTT_SET_RIPAS is smaller than GRANULE_SIZE.
- Solution: Implemented an overflow check that returns an appropriate error when the condition is detected.
note: inside closure
--> rmm/src/rmi/rtt.rs:649:36
|
648 | #[test]
| ------- in this procedural macro expansion
649 | fn rmi_rtt_set_ripas_positive() {
| ^
= note: this warning originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)
thread 'rmi::rtt::test::rmi_rtt_set_ripas_positive' panicked at rmm/src/rmi/rtt.rs:138:32:
attempt to subtract with overflow
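The guard can be sketched as a checked subtraction. The error type and function name below are hypothetical, not the real RMM code:

```rust
const GRANULE_SIZE: usize = 4096;

// Hypothetical error type standing in for RMM's RMI error codes.
#[derive(Debug, PartialEq)]
enum RmiError {
    InvalidInput,
}

// Computing `top - GRANULE_SIZE` panics in debug builds (and wraps in
// release) when `top < GRANULE_SIZE`, so use checked_sub and surface an
// error instead of letting the subtraction overflow.
fn last_granule_base(top: usize) -> Result<usize, RmiError> {
    top.checked_sub(GRANULE_SIZE).ok_or(RmiError::InvalidInput)
}

fn main() {
    assert_eq!(last_granule_base(8192), Ok(4096));
    assert_eq!(last_granule_base(100), Err(RmiError::InvalidInput));
}
```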
Spec Feedback: Ensuring realm_state validation in B4.3.15
It seems necessary to review B4.3.15 against D1.2.2 for spec consistency.
- Specification Restrictions: According to "D1.2.2 Realm Translation Table creation flow", subsequent levels of RTT can be created when the state of the Realm is REALM_NEW or REALM_ACTIVE.
  > D1.2.2 Realm Translation Table creation flow: Subsequent levels of RTT are added using the RMI_RTT_CREATE command. This can be performed when the state of the Realm is REALM_NEW or REALM_ACTIVE.
- Missing Failure Condition: However, in "B4.3.15 RMI_RTT_CREATE", the condition checking the realm_state is missing.
- Suggestion: It would be necessary either to add this realm_state validation to "B4.3.15.2 Failure conditions" or to clarify the details in D1.2.2 for better alignment.
Process of Applying MIRI Test
- Extract inputs and outputs of the RMI to be tested from positive test cases in the ACS tests.
- Write the MIRI test.
- Execute the MIRI test.
- Implement mocking for parts requiring simulation.
- Apply patches for any bugs encountered.
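As a sketch, a test produced by this process might take the following shape. The handler and constants here are hypothetical mocks standing in for a real RMI handler, not the actual RMM code:

```rust
// Hypothetical constants and handler mirroring the shape of an RMI test;
// none of these names come from the real codebase.
const SUCCESS: usize = 0;

// Mocked handler: returns (status, version) as the positive ACS case would.
fn rmi_version(_req: usize) -> (usize, usize) {
    (SUCCESS, 0x0001_0000)
}

fn main() {
    // Step 1: inputs/outputs extracted from the ACS positive test case.
    let req = 0x0001_0000;
    // Steps 2-3: the MIRI test invokes the handler and checks the outputs;
    // MIRI then flags any undefined behavior on the executed path.
    let (status, ver) = rmi_version(req);
    assert_eq!(status, SUCCESS);
    assert_eq!(ver, 0x0001_0000);
}
```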
Challenges in Applying MIRI
Handling Provenance Issues with Host-Provided Memory
- Problem: When using MIRI to test Rust code, "No provenance" errors may occur 1) during operations like `core::ptr::write`, or 2) when converting externally allocated memory into specific Rust objects. This happens because memory addresses not allocated by RMM lack the provenance that MIRI enforces.
- Limitation: Currently, MIRI does not allow disabling provenance checks. Additionally, Rust does not natively support allocating memory at specific addresses, which complicates this scenario further.
- Proposed Alternative: Instead of using externally provided memory addresses directly, simulate them by allocating memory within RMM's managed space and treating it as if it were external memory. This avoids provenance issues while still allowing MIRI testing.
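The proposed alternative can be sketched as follows: allocate a granule inside the test and hand out its address as if the host had provided it. The function name is hypothetical:

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

const GRANULE_SIZE: usize = 4096;

// Instead of casting a raw host-provided physical address (which carries no
// provenance under MIRI), allocate a granule inside the test and treat its
// address as if it were external memory.
fn mock_host_granule() -> *mut u8 {
    let layout = Layout::from_size_align(GRANULE_SIZE, GRANULE_SIZE).unwrap();
    unsafe { alloc_zeroed(layout) }
}

fn main() {
    let addr = mock_host_granule();
    // The pointer has valid provenance, so core::ptr::write is accepted.
    unsafe { core::ptr::write(addr, 0xAB) };
    assert_eq!(unsafe { *addr }, 0xAB);
    // Free the granule at the end of the test so MIRI reports no leak.
    let layout = Layout::from_size_align(GRANULE_SIZE, GRANULE_SIZE).unwrap();
    unsafe { dealloc(addr, layout) };
}
```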
Realm Simulation
- Problem: MIRI cannot execute assembly code, making it impossible to directly enter the realm context.
- Solution: Mock the necessary parts, simulating realm execution and RSI invocation as if they occurred inside the test.
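Such mocking might be structured as below. The trait, types, and the RSI command value are all illustrative assumptions, not the real RMM interfaces:

```rust
// Sketch of mocking realm entry: the real path executes assembly (an ERET
// into the realm), which MIRI cannot run, so a test swaps in a mock that
// reports the RSI call the realm would have made. Names are illustrative.
trait RealmEntry {
    // Returns the RSI command the realm issued on exit.
    fn enter(&mut self) -> usize;
}

struct MockRealm {
    pending_rsi: usize,
}

impl RealmEntry for MockRealm {
    fn enter(&mut self) -> usize {
        // Pretend the realm ran and exited with a pending RSI call.
        self.pending_rsi
    }
}

// The handler under test only sees the trait, so MIRI can drive it
// entirely through the mock.
fn handle_rec_enter(realm: &mut dyn RealmEntry) -> usize {
    realm.enter()
}

fn main() {
    const RSI_CMD: usize = 0xC400_0190; // illustrative command id
    let mut realm = MockRealm { pending_rsi: RSI_CMD };
    assert_eq!(handle_rec_enter(&mut realm), RSI_CMD);
}
```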
Remaining Work
- Enable Stacked Borrows Check: The MIRI "Stacked Borrows" check is currently disabled because it is experimental; we plan to enable it.
- Add Negative Test Cases: In addition to the positive test cases, add negative test cases to ensure more comprehensive testing.
Appendix: MIRI Test Cases
| Index | Command | Test Case | Test Result | Note |
|---|---|---|---|---|
| 1 | RMI_VERSION | rmi::version::test::rmi_version | PASS | |
| 2 | RMI_GRANULE_DELEGATE | rmi::gpt::test::rmi_granule_delegate_positive | PASS | Bug Found: Memory Leak in RMM Page Table |
| 3 | RMI_GRANULE_UNDELEGATE | rmi::gpt::test::rmi_granule_undelegate | PASS | |
| 4 | RMI_DATA_CREATE | rmi::rtt::test::rmi_data_create_positive | PASS | |
| 5 | RMI_DATA_CREATE_UNKNOWN | rmi::rtt::test::rmi_data_create_positive | PASS | |
| 6 | RMI_DATA_DESTROY | rmi::rtt::test::rmi_data_create_positive | PASS | |
| 7 | RMI_REALM_ACTIVATE | rmi::realm::test::rmi_realm_create_positive | PASS | |
| 8 | RMI_REALM_CREATE | rmi::realm::test::rmi_realm_create_positive | PASS | |
| 9 | RMI_REALM_DESTROY | rmi::realm::test::rmi_realm_create_positive | PASS | |
| 10 | RMI_REC_CREATE | rmi::rec::handlers::test::rmi_rec_create_positive | PASS | |
| 11 | RMI_REC_DESTROY | rmi::rec::handlers::test::rmi_rec_create_positive | PASS | |
| 12 | RMI_REC_ENTER | rmi::rec::handlers::test::rmi_rec_enter_positive | PASS | |
| 13 | RMI_RTT_CREATE | rmi::rtt::test::rmi_rtt_create_positive | PASS | Spec Feedback: Misalignment in the realm state |
| 14 | RMI_RTT_DESTROY | rmi::rtt::test::rmi_rtt_create_positive | PASS | |
| 15 | RMI_RTT_MAP_UNPROTECTED | rmi::rtt::test::rmi_rtt_map_unprotected_positive | PASS | |
| 16 | RMI_RTT_READ_ENTRY | rmi::rtt::test::rmi_rtt_create_positive | PASS | |
| 17 | RMI_RTT_UNMAP_UNPROTECTED | rmi::rtt::test::rmi_rtt_map_unprotected_positive | PASS | |
| 18 | RMI_PSCI_COMPLETE | rmi::rec::handlers::test::rmi_rec_enter_positive | PASS | |
| 19 | RMI_FEATURES | rmi::features::test::rmi_features | PASS | |
| 20 | RMI_RTT_FOLD | rmi::rtt::test::rmi_rtt_fold_positive | PASS | |
| 21 | RMI_REC_AUX_COUNT | rmi::rec::handlers::test::rmi_rec_create_positive | PASS | |
| 22 | RMI_RTT_INIT_RIPAS | rmi::rtt::test::rmi_rtt_init_ripas_positive | PASS | |
| 23 | RMI_RTT_SET_RIPAS | rmi::rtt::test::rmi_rtt_set_ripas_positive | PASS | Bug Found: Integer Overflow |
Related Issue: https://github.com/islet-project/islet/issues/361
Regarding "Memory Leak in the RMM Page Table", I think the simplest way is to drop unnecessary tables when a realm gets activated. A lot of RMI calls are not allowed for an activated realm (e.g., DATA_CREATE) so it's reasonable to say that it's unlikely further RMM page table mappings are needed for the realm.
- Realm-A in the launch phase (e.g., granule_delegate & data_create) --> create sub tables for the PA range 0x10000-0x20000 for example.
- Realm-A gets activated --> clear out unnecessary sub tables --> reasonable, since it's unlikely that the host requests the same PA range 0x10000-0x20000 (other realms are likely to call RMIs with different physical memory ranges)
Benefit? We can avoid the performance overhead of creating and deleting sub tables while launching Realm-A (before it is activated).
The proposed approach seems practical and reasonable. As suggested, we can break down the process (1. delegate & data-create , 2. realm-activate) and remove unnecessary sub-tables at specific stages (when the realm is activated). To achieve this, we will need to store information related to the previously used Realm, but this shouldn't pose a significant problem.
However, I have one concern: if someone intentionally does not activate the Realm and repeatedly triggers only the previous stage (delegate & data-create), it could still exhaust RMM's memory. It would be beneficial to include a solution for this scenario as well.
> if someone intentionally does not activate the Realm
This is crucial. Agreed, we should figure out a way to deal with this. Setting up a condition that triggers "dropping sub tables" suffices (then we don't need a separate function to drop sub tables when the realm gets activated):
(1) keep track of the number of sub tables; (2) when the number exceeds a threshold value T (statically configurable), call the function to drop sub tables.
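The threshold policy could be sketched like this; the types and the value of T are illustrative assumptions, not the real page-table code:

```rust
// Sketch of the threshold policy: track the sub-table count and drop the
// sub tables once the count exceeds a statically configurable T.
// Types and values are illustrative only.
const T: usize = 4; // threshold, statically configurable

struct PageTable {
    sub_tables: Vec<Vec<u64>>,
}

impl PageTable {
    fn new() -> Self {
        PageTable { sub_tables: Vec::new() }
    }

    fn add_sub_table(&mut self) {
        // (1) each mapping that needs a new sub table grows the count.
        self.sub_tables.push(vec![0; 512]);
        // (2) crossing the threshold triggers the drop, so a host that
        // never activates the realm cannot exhaust RMM's memory.
        if self.sub_tables.len() > T {
            self.drop_sub_tables();
        }
    }

    fn drop_sub_tables(&mut self) {
        self.sub_tables.clear();
    }
}

fn main() {
    let mut pt = PageTable::new();
    for _ in 0..5 {
        pt.add_sub_table();
    }
    // The 5th insertion exceeded T = 4 and triggered the drop.
    assert!(pt.sub_tables.is_empty());
}
```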