
Enable CI for actions/setup-python@v5

Open franciscojavierarceo opened this issue 1 year ago • 7 comments

I launched support for SQLite Vec in a recent version of Feast but, due to some CI issues, only released it to a subset of Python versions.

I was considering contributing a CI workflow to this project to verify the Python package's behavior across Python versions. It would also help with the Feast integration.

The solution would add a workflow under .github/workflows with something like this:

name: unit-tests of Ubuntu and Mac

on: []
jobs:
  unit-test-python:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        python-version: [ "3.9", "3.10", "3.11"]
        os: [ ubuntu-latest, macos-13 ]
        exclude:
          - os: macos-13
            python-version: "3.9"
    env:
      OS: ${{ matrix.os }}
      PYTHON: ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        id: setup-python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
      - name: Get uv cache dir
        id: uv-cache
        run: |
          echo "dir=$(uv cache dir)" >> "$GITHUB_OUTPUT"
      - name: Install dependencies
        run: pip install sqlite_vec
      - name: run script
        run: python unit_tests.py

franciscojavierarceo avatar Aug 01 '24 16:08 franciscojavierarceo

@asg017 happy to take this on if you're good with it

franciscojavierarceo avatar Aug 01 '24 16:08 franciscojavierarceo

It should now work on all Python versions if you update your KNN SQL queries to look like this:

select rowid, distance 
from vec_items 
where embedding match ? 
  and k = 10

Instead of:

select rowid, distance 
from vec_items 
where embedding match ? 
limit 10

The limit 10 syntax only works in SQLite versions 3.41+, which older Python versions typically don't have. But the k = 10 syntax should work on all versions of SQLite.

I'm a bit hesitant to add a CI rule to test across multiple Python versions, as that can slow down the CI quite a lot. But if the k = 10 syntax doesn't work for you, happy to dig into it further!
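As a rough illustration of the version difference described above, the sketch below checks which SQLite the running Python is linked against and picks a KNN query form accordingly. The helper names `supports_limit_knn` and `build_knn_query` are hypothetical and not part of sqlite-vec; the 3.41 threshold is taken from the comment above.

```python
import sqlite3

# First SQLite version that accepts `limit N` on vec0 KNN queries,
# according to the comment above (assumption carried over from the thread).
MIN_SQLITE_FOR_LIMIT = (3, 41, 0)


def supports_limit_knn(version_info=sqlite3.sqlite_version_info) -> bool:
    """True if the given SQLite version is new enough for the `limit N` form."""
    return tuple(version_info) >= MIN_SQLITE_FOR_LIMIT


def build_knn_query(k: int, version_info=sqlite3.sqlite_version_info) -> str:
    """Hypothetical helper: choose the KNN SQL form for a SQLite version."""
    base = "select rowid, distance from vec_items where embedding match ?"
    if supports_limit_knn(version_info):
        return f"{base} limit {int(k)}"
    # The `k = N` constraint is the form said to work on every SQLite version.
    return f"{base} and k = {int(k)}"


# Show the SQLite version bundled with this Python and the query chosen for it.
print(sqlite3.sqlite_version, "->", build_knn_query(10))
```

In practice it is probably simpler to use the `k = N` form unconditionally, as suggested above; the version check is mainly useful for diagnosing which SQLite a given CI runner ships.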

asg017 avatar Aug 01 '24 16:08 asg017

Yeah I swapped that syntax but still encountered issues.

Here's the PR I have https://github.com/feast-dev/feast/pull/4333

I tried some changes and now it fails on 3.10 mac instead of 3.11 as well as Ubuntu.

It feels a bit like whack-a-mole, which is why I thought adding the CI would help me. I can just make a fork, I suppose, and have the CI in mine.

franciscojavierarceo avatar Aug 01 '24 17:08 franciscojavierarceo

The error that's being raised is

E       sqlite3.OperationalError: no such module: vec0
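The thread does not pin down the cause of this error. One thing that can be ruled out quickly is whether the extension is registered on the connection that runs the query, since SQLite extensions are loaded per connection. The sketch below is a minimal check under that assumption; `vec0_available` is a hypothetical helper name, and it relies only on the `vec_version()` SQL function used elsewhere in this thread.

```python
import sqlite3


def vec0_available(db: sqlite3.Connection) -> bool:
    """Return True if sqlite-vec's SQL functions are registered on this connection."""
    try:
        db.execute("select vec_version()").fetchone()
        return True
    except sqlite3.OperationalError:
        # SQLite reports an unknown function when the extension was never
        # loaded on this particular connection.
        return False


# A fresh connection has no extensions loaded, so this reports False
# until sqlite_vec.load(db) has been called on the same connection object.
db = sqlite3.connect(":memory:")
print("vec0 available:", vec0_available(db))
```

Calling a check like this right before `CREATE VIRTUAL TABLE ... USING vec0(...)` in a failing test can help tell apart "the extension failed to install" from "the extension was loaded on a different connection".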

franciscojavierarceo avatar Aug 02 '24 02:08 franciscojavierarceo

So I tried the latest version (v0.1.0) and created a simple example with the workflow below, and there are some interesting issues. The failures now look like build-time errors rather than the code failures I reported in the first run.

For what it's worth, here's the current list of OSs where things are failing (mostly installing)

  1. Failed
     a. 3.9 mac-latest
     b. 3.10 mac-latest
     c. 3.11.0-rc.2* mac-13
     d. 3.11-rc.2 mac-latest
     e. 3.12 mac-13
     f. 3.12 mac-latest
  2. Passed
     a. 3.9 ubuntu-latest
     b. 3.10-ubuntu-latest
     c. 3.10-mac-latest
     d. 3.10 mac-13
     e. 3.11.0-rc.2 ubuntu-latest
     f. 3.12 ubuntu-latest

I briefly looked at your test.yaml workflow and noticed you're building Python a bit differently, so I'll check whether the failures have something to do with using actions/setup-python.

name: unit-tests

on:
  pull_request:
  push:
    branches:
      - main
jobs:
  unit-test-python:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        python-version: [ "3.9", "3.10", "3.11", "3.12"]
        os: [ ubuntu-latest, macos-13, macos-latest ]
        exclude:
          - os: macos-13
            python-version: "3.9"
    env:
      OS: ${{ matrix.os }}
      PYTHON: ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        id: setup-python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
      - name: Get uv cache dir
        id: uv-cache
        run: |
          echo "dir=$(uv cache dir)" >> "$GITHUB_OUTPUT"
      - name: Install dependencies
        run: pip install sqlite_vec==v0.1.0
      - name: run script
        run: python sqlite_vec_demo.py

And the sqlite_vec_demo.py file is just:

import sqlite3
import sqlite_vec

from typing import List
import struct


def serialize_f32(vector: List[float]) -> bytes:
    """serializes a list of floats into a compact "raw bytes" format"""
    return struct.pack("%sf" % len(vector), *vector)


def main() -> None:
    db = sqlite3.connect(":memory:")
    db.enable_load_extension(True)
    sqlite_vec.load(db)
    db.enable_load_extension(False)

    sqlite_version, vec_version = db.execute(
        "select sqlite_version(), vec_version()"
    ).fetchone()
    
    print(f"sqlite_version={sqlite_version}, vec_version={vec_version}")

    items = [
        (1, [0.1, 0.1, 0.1, 0.1]),
        (2, [0.2, 0.2, 0.2, 0.2]),
        (3, [0.3, 0.3, 0.3, 0.3]),
        (4, [0.4, 0.4, 0.4, 0.4]),
        (5, [0.5, 0.5, 0.5, 0.5]),
    ]
    query = [0.3, 0.3, 0.3, 0.3]

    db.execute("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])")

    with db:
        for item in items:
            db.execute(
                "INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)",
                [item[0], serialize_f32(item[1])],
            )

    rows = db.execute(
        """
          SELECT
            rowid,
            distance
          FROM vec_items
          WHERE embedding MATCH ?
          and k = 3
        """,
        [serialize_f32(query)],
    ).fetchall()

    print(rows)

if __name__ == "__main__":
    main()

*Note I had to use 3.11.0-rc.2 because of this thread.

franciscojavierarceo avatar Aug 05 '24 02:08 franciscojavierarceo

Looking at these logs: https://github.com/franciscojavierarceo/Python/actions/runs/10241600790/job/28330119760

Nearly all of the failures have to do with installing Python on GitHub Actions runners, and not with sqlite-vec.

But these fail with AttributeError: 'sqlite3.Connection' object has no attribute 'enable_load_extension' https://github.com/franciscojavierarceo/Python/actions/runs/10241600790/job/28330119760

For that: this is a macOS thing, where recent macOS versions block loading SQLite extensions on default Python builds. You'll need to use Homebrew to install a Python version that bundles its own SQLite build that allows extension loading (or use some other Python installer; actions/setup-python won't do this for you).
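A quick way to see whether a given Python build is affected is to check for the method before calling it. This is a minimal sketch using only the standard library; `extension_loading_supported` is a hypothetical helper name, not something provided by sqlite-vec.

```python
import sqlite3
import sys


def extension_loading_supported() -> bool:
    """True if this Python's sqlite3 module exposes enable_load_extension."""
    # Builds compiled without loadable-extension support omit the method
    # entirely, which is what produces the AttributeError quoted above.
    return hasattr(sqlite3.Connection, "enable_load_extension")


if extension_loading_supported():
    print(f"{sys.executable}: extension loading is available "
          f"(SQLite {sqlite3.sqlite_version})")
else:
    print(f"{sys.executable}: this build cannot load SQLite extensions; "
          "try a Python from another installer, such as Homebrew")
```

Running this as an early CI step makes the failure mode explicit on each runner before sqlite-vec is imported at all.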

asg017 avatar Aug 05 '24 05:08 asg017

Yeah, that's what I was suggesting, too. Thanks for digging in as well.

I'll raise an issue with SQLite and tag it here. I've renamed this issue so it's more explicit in case someone else tries to do something similar.

franciscojavierarceo avatar Aug 05 '24 10:08 franciscojavierarceo