atomic-server

consider alternative KV store

Status: Open • jonassmedegaard opened this issue 3 years ago • 15 comments

I noticed a mention in a Safenet discussion that sled "is buggy and doesn’t seem to be actively maintained", with Persy and Cacache being their current candidates for replacement.

jonassmedegaard, Jun 10 '22

I haven't encountered any bugs in sled in the past two years, so I don't think it's worryingly buggy for my use case. I do have some worries about active maintenance, though. I reached out to the maintainer some time ago, and he told me he's working on a large low-level library that he'll integrate into sled too.

Some requirements for alternatives:

  • Embeddable KV database
  • Fast
  • Range and prefix queries
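The range/prefix requirement is what rules out plain hash maps: an ordered store can answer both queries with a single seek followed by an in-order walk. The sketch below illustrates this using Rust's std `BTreeMap` as a stand-in for an ordered embedded KV store. The helper names `scan_range` and `scan_prefix` and the `resource/a`-style key layout are hypothetical. Real stores expose their own scan APIs; sled, for instance, has `scan_prefix` and `range` on its trees.

```rust
use std::collections::BTreeMap;

/// Range query: all entries with `start <= key < end`, in key order.
fn scan_range(map: &BTreeMap<String, String>, start: &str, end: &str) -> Vec<(String, String)> {
    // BTreeMap::range panics if start > end, so treat that as an empty range.
    if start > end {
        return Vec::new();
    }
    map.range(start.to_string()..end.to_string())
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

/// Prefix query: seek to the first key >= `prefix`, then stop at the first key
/// that no longer starts with it, instead of scanning the whole map.
fn scan_prefix(map: &BTreeMap<String, String>, prefix: &str) -> Vec<(String, String)> {
    map.range(prefix.to_string()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

fn main() {
    let mut store = BTreeMap::new();
    for (k, v) in [("resource/a", "1"), ("resource/b", "2"), ("session/x", "3"), ("user/alice", "4")] {
        store.insert(k.to_string(), v.to_string());
    }
    // [("resource/a", "1"), ("resource/b", "2")]
    println!("{:?}", scan_prefix(&store, "resource/"));
    // [("resource/b", "2"), ("session/x", "3")]
    println!("{:?}", scan_range(&store, "resource/b", "user/"));
}
```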

Some options:

  • redb. Impressive benchmarks, faster than pretty much anything else! But it's quite new and doesn't have a stable format yet. Definitely keep an eye on this one!
  • Reddit thread with more contenders
  • TiKV (multi-node!)
  • OpenDAL (supports sled, tikv, redb, so users can choose for themselves)

joepio, Jun 10 '22

I walked through a similarly impressive list of embedded DB contenders before finding the Atomic Data server, and it's not as clear-cut: for example, indradb has sled or Postgres as a dependency. While doing that research I started working on a test bench for the common Rust data structures used by those databases, but I ran out of steam. If we want to consider moving to a different database, I would start with benchmarks: expand our criterion benchmarks to cover as many cases as possible, then select a handful to try with the new backend. I believe we can get a lot of improvement from in-memory data structures for the cache, like dashmap, before we need to move away from sled. One more thought: the team at https://github.com/Synerise/cleora found that it's faster to rebuild the graph structure from data (they use a sparse matrix of nodes and edges stored using FxHash) than to deserialize it with serde.
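To illustrate the in-memory cache idea, here is a minimal read-through cache sketch placed in front of a slower persistent store. It is not Atomic-Server code: `CachedStore` and its `load` closure are hypothetical. A std `RwLock<HashMap>` stands in for a concurrent map such as dashmap, so the example has no dependencies.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical read-through cache. `load` fetches a value from the backing
/// store (e.g. sled) on a cache miss.
struct CachedStore<F: Fn(&str) -> Option<String>> {
    cache: RwLock<HashMap<String, String>>,
    load: F,
}

impl<F: Fn(&str) -> Option<String>> CachedStore<F> {
    fn new(load: F) -> Self {
        Self { cache: RwLock::new(HashMap::new()), load }
    }

    fn get(&self, key: &str) -> Option<String> {
        // Fast path: shared read lock, no backend access.
        if let Some(v) = self.cache.read().unwrap().get(key) {
            return Some(v.clone());
        }
        // Slow path: load from the backing store and remember the value.
        // Misses (None) are not cached, so absent keys always hit the backend.
        let value = (self.load)(key)?;
        self.cache.write().unwrap().insert(key.to_string(), value.clone());
        Some(value)
    }

    /// Must be called whenever the backing store changes `key`.
    fn invalidate(&self, key: &str) {
        self.cache.write().unwrap().remove(key);
    }
}

fn main() {
    let backend_reads = std::cell::Cell::new(0);
    let store = CachedStore::new(|key: &str| -> Option<String> {
        backend_reads.set(backend_reads.get() + 1);
        if key == "greeting" { Some("hello".to_string()) } else { None }
    });
    store.get("greeting");
    store.get("greeting");
    println!("backend reads: {}", backend_reads.get()); // prints "backend reads: 1"
}
```

The hard part in practice is invalidation: every write path to the underlying store has to call something like `invalidate`, or the cache serves stale data.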

AlexMikhalev, Jun 14 '22

Update on sled: the maintainer of sled is working on it in the background, mostly on a new storage engine. So sled ain't dead, baby.

Some other thoughts:

  • Switching to Redis might help achieve a multi-node setup (#213), although Redis is not embeddable. Maybe we need some sort of abstraction that allows users to switch KV stores? I don't think it would be too complex.
  • Cloudflare's KV store might be interesting too, as it allows for edge deployment. It would probably involve rewriting far more, though.
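The abstraction mentioned above could be as small as one trait that application code depends on, with the concrete store chosen from configuration. Below is a minimal sketch under that assumption. The trait `KvStore`, its methods and `open_store` are hypothetical and belong to neither Atomic-Server nor any crate. Two in-memory backends (one ordered, one hash-based) stand in for real ones such as sled and Redis, so the sketch stays dependency-free.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical storage abstraction: the concrete KV store becomes a config choice.
trait KvStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn set(&mut self, key: &[u8], value: &[u8]);
    fn remove(&mut self, key: &[u8]);
    /// Keys starting with `prefix`, in ascending order.
    fn keys_with_prefix(&self, prefix: &[u8]) -> Vec<Vec<u8>>;
}

/// Ordered backend (like sled): a prefix scan is a range seek.
#[derive(Default)]
struct OrderedMemStore(BTreeMap<Vec<u8>, Vec<u8>>);

impl KvStore for OrderedMemStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
    fn set(&mut self, key: &[u8], value: &[u8]) {
        self.0.insert(key.to_vec(), value.to_vec());
    }
    fn remove(&mut self, key: &[u8]) {
        self.0.remove(key);
    }
    fn keys_with_prefix(&self, prefix: &[u8]) -> Vec<Vec<u8>> {
        self.0
            .range(prefix.to_vec()..)
            .map(|(k, _)| k)
            .take_while(|k| k.starts_with(prefix))
            .cloned()
            .collect()
    }
}

/// Unordered backend: a prefix scan must visit every key and sort the result.
#[derive(Default)]
struct HashMemStore(HashMap<Vec<u8>, Vec<u8>>);

impl KvStore for HashMemStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
    fn set(&mut self, key: &[u8], value: &[u8]) {
        self.0.insert(key.to_vec(), value.to_vec());
    }
    fn remove(&mut self, key: &[u8]) {
        self.0.remove(key);
    }
    fn keys_with_prefix(&self, prefix: &[u8]) -> Vec<Vec<u8>> {
        let mut keys: Vec<Vec<u8>> =
            self.0.keys().filter(|k| k.starts_with(prefix)).cloned().collect();
        keys.sort();
        keys
    }
}

/// Pick a backend by name, e.g. from a config file or CLI flag.
fn open_store(kind: &str) -> Option<Box<dyn KvStore>> {
    match kind {
        "ordered" => Some(Box::new(OrderedMemStore::default())),
        "hash" => Some(Box::new(HashMemStore::default())),
        _ => None,
    }
}

fn main() {
    for kind in ["ordered", "hash"] {
        let mut store = open_store(kind).expect("unknown backend");
        store.set(b"user/bob", b"2");
        store.set(b"user/alice", b"1");
        store.set(b"doc/readme", b"3");
        store.remove(b"user/bob");
        let keys: Vec<String> = store
            .keys_with_prefix(b"user/")
            .into_iter()
            .map(|k| String::from_utf8_lossy(&k).into_owned())
            .collect();
        println!("{kind}: {keys:?}"); // both backends print ["user/alice"]
    }
}
```

A real Redis or TiKV backend would be networked and fallible, so a production version of such a trait would probably return `Result` and be async. The synchronous form here only keeps the sketch short.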

joepio, Oct 04 '22

Do note that the new engine is licensed under GPLv3. I'm not familiar with how sled is used in your project, but that may be incompatible with your MIT license.

https://github.com/komora-io/marble/blob/main/Cargo.toml#L7

netthier, Dec 30 '22

@netthier That could very well be a problem, thanks! I've sent an email to sled's maintainer.

Relevant issue in marble: https://github.com/komora-io/marble/issues/7

joepio, Dec 30 '22

I propose hooking into Apache OpenDAL (Open Data Access Layer). I was going to use it to handle S3 uploads and writes, but it also supports in-memory, sled, dashmap and Redis backends, in addition to all major cloud services and IPFS. A fully functional example:

use log::debug;
use log::info;
use opendal::layers::LoggingLayer;
use opendal::services;
use opendal::Operator;
use opendal::Result;
use opendal::Scheme;
use std::collections::HashMap;
use std::env;

// Each service used below must be enabled through its OpenDAL cargo feature
// (e.g. "services-sled", "services-redis", "services-dashmap"), and
// tracing-subscriber needs its "env-filter" feature.
#[tokio::main]
async fn main() -> Result<()> {
    let _ = tracing_subscriber::fmt()
        .with_env_filter("info")
        .try_init();

    let schemes = [Scheme::S3, Scheme::Memory, Scheme::Dashmap, Scheme::Sled, Scheme::Redis];

    for scheme in schemes {
        info!("scheme: {:?}", scheme);
        read_and_write(scheme).await?;
    }

    Ok(())
}

/// Builds an operator for `scheme`, writes the object "test" and reads it back.
async fn read_and_write(scheme: Scheme) -> Result<()> {
    let op = match scheme {
        // S3 is configured from a key/value map, see init_operator_via_map().
        Scheme::S3 => init_operator_via_map()?,
        Scheme::Dashmap => {
            let builder = services::Dashmap::default();
            // Init an operator with the logging layer enabled.
            Operator::new(builder)?
                .layer(LoggingLayer::default())
                .finish()
        }
        Scheme::Sled => {
            let mut builder = services::Sled::default();
            builder.datadir("/tmp/opendal/sled");
            Operator::new(builder)?
                .layer(LoggingLayer::default())
                .finish()
        }
        Scheme::Redis => {
            // No endpoint is set, so the builder's default endpoint is used.
            let builder = services::Redis::default();
            Operator::new(builder)?
                .layer(LoggingLayer::default())
                .finish()
        }
        // Scheme::Memory (and any other scheme) falls back to the in-memory service.
        _ => {
            let builder = services::Memory::default();
            Operator::new(builder)?
                .layer(LoggingLayer::default())
                .finish()
        }
    };
    debug!("operator: {op:?}");

    // Write data into the object "test".
    let test_string = format!("Hello, World! {scheme}");
    op.write("test", test_string).await?;

    // Read the data back.
    let bs = op.read("test").await?;
    info!("content: {}", String::from_utf8_lossy(&bs));

    // Get the object's metadata.
    let meta = op.stat("test").await?;
    info!("meta: {:?}", meta);

    Ok(())
}

fn init_operator_via_map() -> Result<Operator> {
    // Read the credentials from the environment.
    let access_key_id = env::var("AWS_ACCESS_KEY_ID")
        .expect("AWS_ACCESS_KEY_ID is set and a valid String");
    let secret_access_key = env::var("AWS_SECRET_ACCESS_KEY")
        .expect("AWS_SECRET_ACCESS_KEY is set and a valid String");

    let mut map = HashMap::new();
    map.insert("bucket".to_string(), "test".to_string());
    map.insert("region".to_string(), "us-east-1".to_string());
    // Custom S3-compatible endpoint; replace with your own.
    map.insert("endpoint".to_string(), "http://rpi4node3:8333".to_string());
    map.insert("access_key_id".to_string(), access_key_id);
    map.insert("secret_access_key".to_string(), secret_access_key);

    Operator::via_map(Scheme::S3, map)
}

AlexMikhalev, Jun 19 '23

This one: https://github.com/apache/incubator-opendal

AlexMikhalev, Jun 19 '23

Wow @AlexMikhalev, that looks really promising! It seems to support scan, which is good, although scan is missing in the sled connector. I'm also wondering whether it has Tree support; see these issues:

https://github.com/apache/incubator-opendal/issues/2498

https://github.com/apache/incubator-opendal/issues/2497
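For context, sled's trees are separate named keyspaces within one database, opened with `Db::open_tree`. Until a backend offers such trees natively, a common workaround is to namespace keys by prefixing them. The sketch below is a hypothetical illustration, not OpenDAL's or sled's API. `FlatStore`, `tree_key` and the `0x00` separator byte are assumptions, and a `BTreeMap` stands in for the flat ordered store.

```rust
use std::collections::BTreeMap;

/// Separator between tree name and key; tree names must not contain it.
const SEP: u8 = 0x00;

/// Builds the flat key `tree ++ SEP ++ key`.
fn tree_key(tree: &str, key: &[u8]) -> Vec<u8> {
    assert!(!tree.as_bytes().contains(&SEP), "tree names must not contain the separator");
    let mut full = Vec::with_capacity(tree.len() + 1 + key.len());
    full.extend_from_slice(tree.as_bytes());
    full.push(SEP);
    full.extend_from_slice(key);
    full
}

/// Hypothetical flat ordered KV store emulating named trees via key prefixes.
struct FlatStore(BTreeMap<Vec<u8>, Vec<u8>>);

impl FlatStore {
    fn insert(&mut self, tree: &str, key: &[u8], value: &[u8]) {
        self.0.insert(tree_key(tree, key), value.to_vec());
    }

    fn get(&self, tree: &str, key: &[u8]) -> Option<&[u8]> {
        self.0.get(&tree_key(tree, key)).map(|v| v.as_slice())
    }

    /// All entries of one tree, with the tree prefix stripped from the keys.
    fn iter_tree(&self, tree: &str) -> Vec<(Vec<u8>, Vec<u8>)> {
        let prefix = tree_key(tree, b"");
        self.0
            .range(prefix.clone()..)
            .take_while(|(k, _)| k.starts_with(&prefix))
            .map(|(k, v)| (k[prefix.len()..].to_vec(), v.clone()))
            .collect()
    }
}

fn main() {
    let mut store = FlatStore(BTreeMap::new());
    store.insert("resources", b"a", b"1");
    store.insert("resources", b"b", b"2");
    store.insert("index", b"a", b"x");
    // The same key "a" lives independently in both trees.
    assert_eq!(store.get("resources", b"a"), Some(&b"1"[..]));
    assert_eq!(store.get("index", b"a"), Some(&b"x"[..]));
    println!("{} entries in tree 'resources'", store.iter_tree("resources").len()); // prints "2 entries in tree 'resources'"
}
```

Because the separator cannot appear in a tree name, the first `0x00` byte always marks where the tree name ends. Keys like `resources2/...` therefore never leak into a scan of `resources`.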

joepio, Jun 19 '23

Hi, I'm the maintainer of OpenDAL. Thanks for @AlexMikhalev's sharing and @joepio's contact!

I'm here to bring some updates from the OpenDAL side:

  • https://github.com/apache/incubator-opendal/issues/2497 is fixed.
  • https://github.com/apache/incubator-opendal/issues/2498 is almost done and waiting for some review, I expect it will be released soon.

Apart from the existing issues, I'm interested in adding support for more services so our users have more choices:

  • https://github.com/apache/incubator-opendal/issues/2518
  • https://github.com/apache/incubator-opendal/issues/2522
  • https://github.com/apache/incubator-opendal/issues/2523
  • https://github.com/apache/incubator-opendal/issues/2524

Please feel free to let me know if there is anything I can help you with!

Xuanwo, Jun 24 '23

@Xuanwo Awesome. A small example (like example 2 in your plans) of how to use OpenDAL from tokio async functions would help me personally: I am building a product complementary to Atomic, https://terraphim.ai/, and I want to plug in an OpenDAL operator instead of the redis.rs KV.

AlexMikhalev, Jun 24 '23

Thanks for the feedback! I will write one tomorrow 🤪

Xuanwo, Jun 24 '23

OpenDAL now also supports TiKV! https://github.com/apache/incubator-opendal/issues/2533

This opens up multi-node setups for Atomic-Server.

joepio, Jul 24 '23

One month later, the OpenDAL community has implemented all of the issues listed in my earlier comment 🚀!

Xuanwo, Jul 24 '23