Slides and notebook for the workshop on building a search system
Search Engine Workshop
About
Hands-on workshop for building a semantic search engine.
Setup
If you came to this repo during a workshop, visit this custom JupyterHub, which has all the dependencies already set up.
The repo is located at npatta01/search-engine-workshop
To use this repo outside a workshop, please use Binder.
Content (Notebooks)
Data Fetching
setup notebook
stats notebook
sample image notebook
Notebooks to download the Unsplash dataset and save it in Hugging Face dataset format.
Non Deep Learning Retrieval
BM25 retrieval with Elasticsearch: notebook
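For readers who want a feel for what BM25 scoring does before opening the notebook, here is a minimal pure-Python sketch. It is not the notebook's code, which relies on Elasticsearch. The `BM25` class, the whitespace tokenizer, and the toy captions are all hypothetical, and the IDF term uses one commonly used smoothed form. The defaults `k1=1.2` and `b=0.75` are assumed to match typical engine defaults.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (an assumption; real engines use analyzers).
    return text.lower().split()


class BM25:
    """Toy in-memory BM25 scorer (hypothetical helper, for illustration only)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1 = k1
        self.b = b
        self.n = len(docs)
        tokens = [tokenize(d) for d in docs]
        self.tfs = [Counter(t) for t in tokens]          # term frequencies per document
        self.doc_len = [len(t) for t in tokens]
        self.avg_len = sum(self.doc_len) / self.n
        self.df = Counter()                              # document frequency per term
        for t in tokens:
            self.df.update(set(t))

    def idf(self, term):
        n_t = self.df.get(term, 0)
        return math.log(1 + (self.n - n_t + 0.5) / (n_t + 0.5))

    def score(self, query, i):
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            f = self.tfs[i].get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query, top_k=3):
        scored = [(self.score(query, i), i) for i in range(self.n)]
        scored = [(s, i) for s, i in scored if s > 0]    # drop non-matching documents
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for _, i in scored[:top_k]]


docs = [
    "a dog running on the beach",
    "mountain lake at sunrise",
    "dog and cat sleeping on a sofa",
    "city skyline at night",
]
index = BM25(docs)
print(index.search("dog beach"))
```

The first caption ranks ahead of the third because it matches both query terms and the rarer term ("beach") carries a higher IDF weight.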
Deep Learning Retrieval (text)
Text Deep Learning retrieval: Link
Deep Learning Retrieval (image)
Clip Retrieval: Link
ANN
Shows how to speed up deep learning retrieval by exploring different approximate nearest neighbor (ANN) indexes: Link
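As a rough illustration of why an ANN index is faster than exhaustive search, the sketch below compares a brute-force cosine scan with random-hyperplane locality-sensitive hashing, one simple ANN family. This is an assumption-laden toy, not the index types explored in the notebook: `RandomProjectionIndex`, its parameters, and the random vectors standing in for embeddings are all hypothetical.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force(query, vectors, top_k=1):
    # Exact search: compares the query against every stored vector.
    ranked = sorted(range(len(vectors)), key=lambda i: -cosine(query, vectors[i]))
    return ranked[:top_k]


class RandomProjectionIndex:
    """Toy LSH index using random hyperplanes (hypothetical, for illustration)."""

    def __init__(self, vectors, n_planes=4, seed=0):
        rng = random.Random(seed)
        dim = len(vectors[0])
        self.vectors = vectors
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = {}
        for i, v in enumerate(vectors):
            self.buckets.setdefault(self._key(v), []).append(i)

    def _key(self, v):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in self.planes)

    def search(self, query, top_k=1):
        # Approximate search: only vectors in the query's bucket are compared.
        candidates = self.buckets.get(self._key(query), [])
        ranked = sorted(candidates, key=lambda i: -cosine(query, self.vectors[i]))
        return ranked[:top_k]


# Random vectors standing in for embeddings (an assumption, not workshop data).
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
ann = RandomProjectionIndex(vectors)

query = vectors[7]
print(brute_force(query, vectors), ann.search(query))
print(len(ann.buckets.get(ann._key(query), [])), "candidates scanned instead of", len(vectors))
```

The trade-off is the usual one for ANN: fewer comparisons per query in exchange for the chance of missing a true neighbor that landed in a different bucket. Production indexes reduce that risk with multiple hash tables, probing nearby buckets, or graph- and cluster-based structures.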
Slides
PyData Seattle 2022
PyData NYC 2022
ODSC 2022
Contact
For help or feedback, please reach out to:
Acknowledgments
This workshop uses the Unsplash Lite Dataset 1.2.0: link
The hands-on portion of the workshop was made possible by the JupyterHub Helm Chart.
Changelog
v1.1
- setup for PyData NYC
- replaced Stack Overflow data with Unsplash data
v1.0
- setup for ODSC
- used Stack Overflow data