versatile-data-kit
Record the demo
What is the feature request? What problem does it solve?
We want a pre-recorded demo and slides. Following the outline below, create a fully fleshed-out demo.
Move to Slides
Describe the problem scenario.
You are tasked with building a RAG pipeline that has access to private Confluence pages.
The company's Confluence is very active, so we want to publish the latest updates every 6 hours.
For your solution you want to use:
1. LangChain to read data from Confluence and chunk it
2. Hugging Face sentence_transformers to embed paragraphs
3. LlamaIndex to insert into Postgres
Move to code
Start with a basic example of this written locally, probably fewer than 15 lines of code.
Give a quick explanation of what it is doing.
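As a starting point for that local example, here is a minimal sketch of the pipeline's shape only. It uses just the standard library so it runs anywhere. The names `chunk_paragraphs`, `run_pipeline`, `load_pages`, `embed` and `store` are hypothetical stand-ins, not part of any of the libraries above. In the real demo they would be replaced by the LangChain Confluence read, a sentence_transformers model and the LlamaIndex insert into Postgres.

```python
from typing import Callable, Iterable, List, Sequence


def chunk_paragraphs(text: str, max_chars: int = 500) -> List[str]:
    """Split on blank lines, then pack paragraphs into chunks of up to max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # The current chunk is full: close it and start a new one.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def run_pipeline(
    load_pages: Callable[[], Iterable[str]],
    embed: Callable[[Sequence[str]], Sequence[Sequence[float]]],
    store: Callable[[Sequence[str], Sequence[Sequence[float]]], None],
    max_chars: int = 500,
) -> int:
    """Read pages, chunk them, embed the chunks and store them.

    All three callables are assumptions standing in for the real libraries.
    Returns the number of chunks stored.
    """
    stored = 0
    for page_text in load_pages():
        chunks = chunk_paragraphs(page_text, max_chars)
        if not chunks:
            continue
        vectors = embed(chunks)
        store(chunks, vectors)
        stored += len(chunks)
    return stored
```

Passing the three stages in as callables keeps the sketch runnable without Confluence or Postgres access. It also makes the quick explanation easy to give: read, chunk, embed, insert.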
Move back to slides
Say that this is a good start, but point out all the difficulties with getting it to production:
1. How do we package it to run in the cloud? We want it to be easy to deploy, and set up so that others can easily make changes.
2. We want it to run every 6 hours. With code that only runs locally, there is no obvious way to schedule it.
3. When each new run starts we want to make sure we only publish new records, i.e. we need to persist the timestamp of the last run.
4. We want to be alerted in case of failures.
5. How do we handle the Postgres password?
Move to code
Using the VDK SDK, create a data job and copy the code above into it.
Then:
1. Add the dependencies to the requirements.txt file.
2. Change the Postgres password to be read from a VDK secret.
3. Save the last read state in a VDK property.
Then deploy and execute the job.
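Steps 2 and 3 above could look roughly like the following data job step. This is a sketch under assumptions, not the final demo code. It relies on the VDK job input methods `get_secret`, `get_property`, `get_all_properties` and `set_all_properties` as I understand them. The secret name `postgres_password`, the property name `last_run_timestamp` and the helpers `fetch_pages_updated_since` and `publish` are all hypothetical. The `IJobInput` type hint is left out so the sketch can be read and tried without VDK installed.

```python
# Hypothetical step file inside the data job directory, e.g. 10_publish_confluence.py
from datetime import datetime, timezone

# Used on the very first run, when no timestamp has been saved yet (assumption).
EPOCH = "1970-01-01T00:00:00+00:00"


def fetch_pages_updated_since(since_iso: str) -> list:
    """Hypothetical placeholder for the LangChain Confluence read."""
    return []


def publish(pages: list, postgres_password: str) -> None:
    """Hypothetical placeholder for embedding and the LlamaIndex insert."""


def run(job_input):
    # VDK calls run() for each Python step and passes in the job input object.
    password = job_input.get_secret("postgres_password")

    # Read the state saved by the previous run, if any.
    last_run = job_input.get_property("last_run_timestamp", EPOCH)

    # Capture the start time before reading, so pages updated while this run
    # is in progress are picked up by the next run instead of being skipped.
    started_at = datetime.now(timezone.utc).isoformat()

    pages = fetch_pages_updated_since(last_run)
    publish(pages, password)

    # Merge into the existing properties so other saved values are kept.
    properties = dict(job_input.get_all_properties())
    properties["last_run_timestamp"] = started_at
    job_input.set_all_properties(properties)
```

The timestamp is saved only after `publish` returns, so a failed run leaves the previous value in place and the next run retries the same window.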
Move back to slides
Reiterate all the benefits they are getting and how few changes they had to make to their code.
Point out that they can use whatever libraries and frameworks they want.
Talk about how DAGs can be used to make code more modular. [Yoan Salambashev](https://confluence.eng.vmware.com/display/~ysalambashev) to create an image of a DAG here.
Show the Swagger docs and how everything is API driven.
Missing
What is missing here is how we would handle scale. If they ask about that we can discuss it, but I think it is best not to try to demo it.