embedJs

Adding a loader (tried web URL) adds 3 other duplicates

Open converseKarl opened this issue 1 year ago • 3 comments

Before, adding a loader would add one entry to the loader list; now in v0.78 it adds 3 other duplicates.

On the debug console we get all this:

1|platform | Error adding URL: [Error: lance error: Commit conflict for version 1195: There was a concurrent commit that conflicts with this one and it cannot be automatically resolved. Please rerun the operation off the latest version of the table.
1|platform | Transaction: Transaction { read_version: 1194, uuid: "b317b32c-fa93-4c90-aec7-c26663444642", operation: Delete { updated_fragments: [], deleted_fragment_ids: [598], predicate: "uniqueLoaderId = "WebLoader_7e57a7538098320500365556c0c96ea1"" }, tag: None }
1|platform | Conflicting Transaction: Some(Transaction { read_version: 1193, uuid: "eb1a0a1d-176e-4e0d-840c-8e8c949b02c1", operation: Append { fragments: [Fragment { id: 0, files: [DataFile { path: "8cccacb1-74c0-4190-bec8-d425d2736c52.lance", fields: [0, 1, 2, 3, 4, 5], column_indices: [], file_major_version: 0, file_minor_version: 0 }], deletion_file: None, physical_rows: Some(4) }] }, tag: None }), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-0.10.16/src/io/commit.rs:107:23]
1|platform | Error adding URL: [Error: lance error: Commit conflict for version 1336: There was a concurrent commit that conflicts with this one and it cannot be automatically resolved. Please rerun the operation off the latest version of the table.
1|platform | Transaction: Transaction { read_version: 1335, uuid: "036ad30e-4c63-4768-8984-4d74cf92f546", operation: Delete { updated_fragments: [], deleted_fragment_ids: [669], predicate: "uniqueLoaderId = "WebLoader_7e57a7538098320500365556c0c96ea1"" }, tag: None }
1|platform | Conflicting Transaction: Some(Transaction { read_version: 1334, uuid: "5ce59ff0-fc6d-4824-9068-d5131f6e6f36", operation: Append { fragments: [Fragment { id: 0, files: [DataFile { path: "fb294721-dfd9-4732-848f-92efcb7db637.lance", fields: [0, 1, 2, 3, 4, 5], column_indices: [], file_major_version: 0, file_minor_version: 0 }], deletion_file: None, physical_rows: Some(4) }] }, tag: None }), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-0.10.16/src/io/commit.rs:107:23]
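
The error text itself suggests rerunning the operation against the latest version of the table. A minimal sketch of that idea from the calling side follows. `retryOnConflict` is a hypothetical helper, not part of embedJs or LanceDB, and it assumes a conflict can be recognized by the "Commit conflict" text in the error message.

```javascript
// Hypothetical helper (not part of embedJs or LanceDB): rerun an async
// operation when it fails with a Lance "Commit conflict" error.
async function retryOnConflict(operation, { retries = 3, delayMs = 100 } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await operation();
        } catch (err) {
            // Assumption: conflicts are recognizable by this text in the message.
            const isConflict = String(err && err.message).includes('Commit conflict');
            // Give up on unrelated errors or once the retry budget is spent.
            if (!isConflict || attempt >= retries) throw err;
            // Wait a little longer on each attempt before rerunning.
            await new Promise((resolve) => setTimeout(resolve, delayMs * (attempt + 1)));
        }
    }
}
```

This only helps if the error actually reaches the caller. The log lines above may mean it is caught and logged inside the library, in which case a wrapper like this would never see it.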

converseKarl avatar May 19 '24 09:05 converseKarl

This looks like a race condition with LanceDb. I will investigate and address it. It is not a new issue, though, just a rare one.

In the meantime, could you check if this issue is recurring? A few runs would help.
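
If overlapping writes are the trigger, one way to rule them out from the calling side is to run every add or delete strictly one after another. A minimal sketch, assuming a hypothetical `createSerialQueue` helper (not an embedJs API):

```javascript
// Hypothetical helper (not an embedJs API): run async tasks strictly one
// after another so that two writes never overlap.
function createSerialQueue() {
    let tail = Promise.resolve();
    return function enqueue(task) {
        // Start this task only after the previous one has settled.
        const run = tail.then(() => task());
        // Swallow the rejection on the chain so a failed task
        // does not block the tasks queued after it.
        tail = run.catch(() => {});
        // The caller still sees the task's own result or error.
        return run;
    };
}
```

Usage would look like `const enqueue = createSerialQueue();` followed by `enqueue(() => addURL(url))` for each URL. This does not fix a race inside the library; it only removes concurrency on the caller's side as a variable.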

adhityan avatar May 19 '24 09:05 adhityan

Will do and let you know.

converseKarl avatar May 19 '24 11:05 converseKarl

I cleared everything in the cache (including the vector db, all clean), with everything under it removed, and ran the same code in v0.78 that worked in v0.77. I can confirm the issue recurs. I am using LanceDb, LmdbCache and OpenAi3LargeEmbeddings.

When you add a URL via the web loader:

  1. 4 entries appear in the list retrieved from the loaders method (only 1 used to appear in v0.77 or earlier).
  2. These entries now have the same web loader id, causing conflicts.
  3. I notice a great many documents (hundreds) created in Lance, though this might come from my attempt to delete one key from the loaders list.
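
To make the duplicate count easy to report, a small check like the following could be run over the loaders list. This is a sketch only: `findDuplicateLoaderIds` is a hypothetical helper, and it assumes each entry is an object with a `uniqueId` string, which may not match the real shape of the entries embedJs returns.

```javascript
// Hypothetical check: count how many times each loader id occurs in a list.
// Assumes each entry is an object with a `uniqueId` string; the real shape
// of the entries returned by embedJs may differ.
function findDuplicateLoaderIds(loaders) {
    const counts = new Map();
    for (const { uniqueId } of loaders) {
        counts.set(uniqueId, (counts.get(uniqueId) || 0) + 1);
    }
    // Keep only the [id, count] pairs for ids that occur more than once.
    return [...counts].filter(([, count]) => count > 1);
}
```

An empty result means every id in the list is unique, which is the behaviour seen in v0.77.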

When I try to delete one with deleteLoader(webloaderid, true), there are now conflicts. I get:

1|platform | Error removing embedding: [Error: lance error: Commit conflict for version 2: There was a concurrent commit that conflicts with this one and it cannot be automatically resolved. Please rerun the operation off the latest version of the table.
1|platform | Transaction: Transaction { read_version: 1, uuid: "bd1fa996-606c-415e-806a-11857e9ba3da", operation: Delete { updated_fragments: [], deleted_fragment_ids: [], predicate: "uniqueLoaderId = "WebLoader_1d0f5cba74525f0006ef1b1ae043b010"" }, tag: None }
1|platform | Conflicting Transaction: Some(Transaction { read_version: 1, uuid: "db01c718-4391-4bee-866a-506dd56bfd71", operation: Append { fragments: [Fragment { id: 0, files: [DataFile { path: "d809dd5c-be76-4172-bd88-9e59a2536f5a.lance", fields: [0, 1, 2, 3, 4, 5], column_indices: [], file_major_version: 0, file_minor_version: 0 }], deletion_file: None, physical_rows: Some(4) }] }, tag: None }), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-0.10.16/src/io/commit.rs:107:23]

I refresh the loaders list and now hundreds of entries appear, all duplicates, which is really bad. None of this behaviour happened in v0.77 or before.

Here is an idea of the code that's running:

let ragApplicationBuilder, ragApplication;

// Code - API Setup
// Declared async because the body uses await.
async function setup() {
    ragApplicationBuilder = new RAGApplicationBuilder().setQueryTemplate(prompt);

    // Add other loaders and configurations as needed

    ragApplication = await ragApplicationBuilder
        .setTemperature(0.2)
        .setEmbeddingModel(new OpenAi3LargeEmbeddings())
        .setVectorDb(new LanceDb({ path: './db' }))
        .setCache(new LmdbCache({ path: './llmcache' }))
        .build();
}

// Add URL
// Registers the loader on the shared builder, then builds again.
async function addURL(url) {
    ragApplicationBuilder.addLoader(new WebLoader({ url: url }));
    ragApplication = await ragApplicationBuilder.build();
}

converseKarl avatar May 19 '24 16:05 converseKarl

I am looking at the changes between versions 0.77 and 0.78 to identify what the issue could be.

adhityan avatar May 22 '24 11:05 adhityan

Thanks kindly, I really need a fix for this. All I know is that when resources are added they should appear in the "loaders" list from the ragApplication object and not be duplicated, and when one is deleted, the id for that loader should be removed from the list and also from the vector index/LLM cache. The ragApplication.loaders list should reflect that state. Many thanks!
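
To spell out the expectation, here is a toy model of the state I would expect. It is not how embedJs stores loaders, and `LoaderRegistry` is a made-up name for illustration only. Entries are keyed by id, so adding the same id twice cannot create a duplicate and deleting an id removes it from the list.

```javascript
// Toy model of the expected behaviour (not embedJs internals): loaders are
// keyed by id, so adding the same id twice cannot create a duplicate.
class LoaderRegistry {
    constructor() {
        this.byId = new Map();
    }

    // Adding an id that already exists overwrites it instead of duplicating.
    add(id, loader) {
        this.byId.set(id, loader);
    }

    // Returns true if the id was present and has been removed.
    delete(id) {
        return this.byId.delete(id);
    }

    // The list of ids always reflects the current state.
    list() {
        return [...this.byId.keys()];
    }
}
```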

converseKarl avatar May 22 '24 12:05 converseKarl

I couldn't find anything in this single commit https://github.com/llm-tools/embedJs/commit/3090f76c499976e2b4ca5fed18030225ab081954 that could have caused an issue like this. Are you sure this issue does not occur in version 0.77?

adhityan avatar May 22 '24 12:05 adhityan

Can you test the latest version 0.79 and let me know if you still see this issue? Thank you

adhityan avatar May 22 '24 12:05 adhityan

I've moved to v0.79 and no longer see duplicates. Also, when I delete an item from the loaders, it gets removed, and the squillions of entries in the loaders list or the RAG application no longer appear.

Great job! Thank you kindly!

converseKarl avatar May 24 '24 12:05 converseKarl