Roadmap after v0.3

Open · opened by gavento

This document tracks the project's directions after v0.3 and replaces #26. It lists our mid- and long-term goals, each with its [priority], (assignee), and any sub-tasks.

Any help is welcome; mentoring is available for most tasks!

Remaining enhancements from v0.3

This list will be updated after the prioritization discussion.

Client-side protocols

Replace the capnp RPC and the current monitoring dashboard HTTP API with a common protocol. This is part of #11 (see there for more discussion) but specific to the public API.

  • [ ] Design the API calls (@gavento) [medium]
  • [ ] Implement in the server (@gavento) [medium]
  • [ ] Update the Python API (using aiohttp for the async API) (@gavento) [medium]
  • [ ] Update the dashboard (@gavento) [medium] (#38)

Improve the dashboard with more information and post-mortem analysis

  • [ ] Redesign and revamp the dashboard; depends on the client API development (@gavento) [medium/low] (#38)
  • [x] Include stats for task/object groups and possibly names/labels from #32 [low] (#38)

Fix current bugs

  • [ ] #7 (occurs under heavy load only) [medium]
  • [ ] #13 (seems to be specific to the Exoscale deployment) [high]

Custom tasks (subworkers) in more languages

  • [ ] Python subworker as a library [low] (to run standalone scripts, rather than only defining them in the client)

Easier deployment in the cloud

  • [ ] Deployment in the Amazon cloud (@vojtechcima) [medium] (#37)

Packaging for easier deployment

There are multiple options, and their priorities may vary. (@spirali)

  • [ ] AppImage/Snap packages [low] (we already have static binaries)
  • [ ] Deb/other distro packages [low]

Improve Python API

Make the client API more Pythonic.

  • [ ] Draft content-type loaders/extensions (@gavento) [low]
  • [x] Task/object groups and names/labels (#32) [low]

Improve testing infrastructure

  • [ ] Scripts/containers/... to test deploying and running in a network. (@vojtechcima) [medium]
    • Test rain start and running on OpenStack, Exoscale, and AWS. This does not have to be part of CI (a setup for running it locally is enough). Depends on / part of #37.

More real-world code examples

Lower priority; ideally based on real use cases. Ideas: NumPy subtasks, C++/Rust subworkers.

Enhancements to revisit in the (not so distant) future

  • Integration with some popular libraries
    • Apache Arrow content-type
      • The basic type and loading are implemented. We could add more operations (filter, split, merge, ...).
    • XGBoost tasks, etc.
    • Why not now: It is not clear what the demand would be.
  • Worker configuration files (needed for common (CPU) and special (GPU) resources, different subworker locations and configurations, ...)
    • Partially done
    • Why not now: Needs to be thought through (esp. w.r.t. resources); not needed now.
  • Separate session construction and running (save/load session)
    • Why not now: It is not clear what the use cases would be; not difficult once the API is stabilized.
  • Clients in other languages: Rust, C++, Java, ...
    • Why not now: It is not clear what the demand would be. Easier once the protocol and Python API are stabilized.
  • Scale the scheduler, benchmarks
    • There is a benchmark in utils/bench/simple_task_scaling.py. The results as of 0.2 are here.
    • Why not now: While scaling will eventually be crucial, the current scheduler is sufficient when fewer than 1000 tasks are scheduled at once.

Posted by gavento, Jul 02 '18 15:07