Consider consolidating common tests for Catalog implementations in a new crate
Some context here: https://github.com/apache/iceberg-rust/pull/503#issuecomment-2267076323
As part of the MemoryCatalog implementation, I wrote ~1500 lines worth of tests to comprehensively exercise catalog behaviour. These tests are now being reused in the SqlCatalog implementation by copy-pasting.
We should consider a more DRY approach like creating a new crate with those tests that can be called by a function, maybe something like this:
fn test_catalog<C: Catalog>(catalog: &C) {
todo!()
}
Although it might be better to pass in a function that can setup a new catalog so that each test can work on a fresh instance e.g.
/// Takes a function that accepts a FileIO and an optional warehouse_location and returns a fresh Catalog instance
fn test_catalog<C: Catalog>(setup_catalog_fn: impl FnOnce(FileIO, Option<String>) -> C) {
todo!()
}
The main concern I have is that the existing catalogs don't all follow exactly the same behaviour, there are small, subtle differences in behaviour like checking for tables before dropping a namespace, error messaging, etc. It may or may not be worth aligning behaviours but we can discuss all of that when we get there.
The other concern I have is developer-experience. We should still strive for clear test failures. I'm not sure a function-based approach as illustrated above will offer a good DevX but it may still be worth the compromise. In other languages, this would just be an abstract class for me but I'm not sure what the Rust equivalent is at this moment in time.
This is currently blocked by: https://github.com/apache/iceberg-rust/pull/503
Hi, @fqaiser94 Thanks for raising this, and I'm definitely +1 for this proposal to avoid duplicate codes!
About the overall design, inspired by your design, I think maybe we can have origanize things like following:
struct CatalogTestSuite<C> {
catalog: C,
file_io: FileIO
}
impl<C: Catalog> CatalogTestSuite {
async fn test_catalog_create_empty_namespace(self) {
...
}
async fn test_catalog_create_empty_hierarchy_namespace(self) {
...
}
...
}
Then in each catalog tests module, we can just have tests to setup this CatalogTestSuite and call different method in different case:
#[tokio::test]
async fn test_create_empty_namespace() {
let suite = CatalogTestSuite {}
suite.test_catalog_create_empty_namespace().await
}
With this approach we have boilerplate codes in each catalog test module, but I think this is a trade off. We can use macros or libtest_mimic to remove duplicate codes as much as possible, but the problem is that it makes running single test in ide a little more difficult.
The main concern I have is that the existing catalogs don't all follow exactly the same behaviour, there are small, subtle differences in behaviour like checking for tables before dropping a namespace, error messaging, etc. It may or may not be worth aligning behaviours but we can discuss all of that when we get there.
I think there are two problems here:
- Not all catalogs support exactly all test cases. For example, IIRC, hive catalog doesn't support nested namespaces, and rest catalog's support for namespaces is not determined yet.
- About the subtle behaviors differentct(such as check table), I think we should align them, but we don't need to do them together, we can do them one by one once we have finished the necessary preparatory work.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'