talSofer
Currently, lakeFSFS supports Hadoop 2, and we need to add support for Hadoop 3. DoD: - Hadoop 3 spec tests pass. - lakeFSFS depends only on matching Hadoop packages. -...
Currently, GC integration tests are implemented in Bash. This significantly slows down development, and we want to move the tests to another language.
Use https://github.com/treeverse/lakeFS/tree/master/cmd/lakefs-loadtest to run load tests against the main lakeFS workflows on a recurring cadence.
https://github.com/treeverse/lakeFS/runs/7628981215?check_suite_focus=true
DoD: - Removed objects are captured. - The GC end-of-run report correctly reports the objects that were deleted successfully.
Address the TODO in https://github.com/treeverse/lakeFS/blob/5b7547c5606120bb1869bc5ab03eecd2555e1157/clients/spark/core/src/main/scala/io/treeverse/clients/GarbageCollector.scala#L263 This issue requires understanding how previousRunID is currently used by GC, and talking to @guy-har to understand the problems and potential solutions. See performance...
Currently, we only publish the non-assembled Spark metadata client to Maven Central. This task is to do some discovery and decide whether we want to publish the assembled jar too.
In this task, we will configure a working lakeFS-Iceberg setup. The task's output should be: - docs page for how to configure Iceberg to work with lakeFS, and the use...
Test GC with a large dataset to find problems. DoD: - GC finishes successfully on a large-scale dataset. - There are issues for any problems found.