sqlalchemy icon indicating copy to clipboard operation
sqlalchemy copied to clipboard

The get_multi_indexes method in PGDialect does not work correctly with pgvector-rs

Open xianyuntang opened this issue 1 year ago • 1 comments

Describe the bug

When my database uses Postgres 16 with pgvector-rs and applies the following index, an unexpected error occurs when I attempt to auto-generate a revision through Alembic. The root cause is that the index in pgvector-rs is slightly different from the index in PostgreSQL

When I query the indexes using

SELECT
    indexname,
    indexdef
FROM
    pg_indexes
WHERE
    tablename = 'vectors_768';

I noticed that the PostgreSQL index is:

CREATE INDEX ix_vector_embedding ON public.vectors_768 USING hnsw (embedding vector_l2_ops) WITH (m='4', ef_construction='10');

However, the index from pgvector-rs is:

CREATE INDEX ix_vector_embedding ON public.vectors_768 USING vectors (embedding vector_cos_ops) WITH (options='[indexing.hnsw]+
                     | m = 4                                                                                                                         +
                     | ef_construction = 10');

To fix this issue, just need to modify the get_multi_indexes method in PGDialect.

if row["reloptions"]:
-    dialect_options["postgresql_with"] = dict([option.split("=") for option in row["reloptions"]])
+    dialect_options["postgresql_with"] = dict([option.split("=", 1) for option in row["reloptions"]])

Optional link from https://docs.sqlalchemy.org which documents the behavior that is expected

No response

SQLAlchemy Version in Use

2.0.31

DBAPI (i.e. the database driver)

asyncpg

Database Vendor and Major Version

Postgres 16.3 with pgvector-rs 0.2.1

Python Version

3.10.11

Operating system

OSX

To Reproduce

from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
from sqlalchemy.ext.asyncio import AsyncAttrs
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from sqlalchemy import Index
from pgvector.sqlalchemy import Vector

engine = create_async_engine("postgresql+asyncpg://username:password@localhost/pgdb")

AsyncSessionMaker = async_sessionmaker(
    bind=engine, autocommit=False, autoflush=False, expire_on_commit=False
)


class Base(AsyncAttrs, DeclarativeBase):
    pass


class Vector768(Base):
    __tablename__ = "vectors"

    __table_args__ = (
        Index(
            "ix_vector_embedding",
            "embedding",
            postgresql_using="hnsw",
            postgresql_with={"m": 4, "ef_construction": 10},
            postgresql_ops={"embedding": "vector_l2_ops"},
        ),
    )

    id: Mapped[int] = mapped_column(autoincrement=True, primary_key=True)

    embedding: Mapped[list[float]] = mapped_column(Vector(768))

Error

Traceback (most recent call last):
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/bin/alembic", line 8, in <module>
    sys.exit(main())
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/config.py", line 636, in main
    CommandLine(prog=prog).main(argv=argv)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/config.py", line 626, in main
    self.run_cmd(cfg, options)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/config.py", line 603, in run_cmd
    fn(
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/command.py", line 236, in revision
    script_directory.run_env()
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/script/base.py", line 582, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/util/pyfiles.py", line 95, in load_python_file
    module = load_module_py(module_id, path)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/util/pyfiles.py", line 113, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Volumes/Data/Work/Other/alembic-reproduce/alembic/env.py", line 90, in <module>
    run_migrations_online()
  File "/Volumes/Data/Work/Other/alembic-reproduce/alembic/env.py", line 84, in run_migrations_online
    asyncio.run(run_async_migrations())
  File "/Users/tangxianyun/.asdf/installs/python/3.10.11/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/tangxianyun/.asdf/installs/python/3.10.11/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Volumes/Data/Work/Other/alembic-reproduce/alembic/env.py", line 76, in run_async_migrations
    await connection.run_sync(do_run_migrations)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/sqlalchemy/ext/asyncio/engine.py", line 886, in run_sync
    return await greenlet_spawn(
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 203, in greenlet_spawn
    result = context.switch(value)
  File "/Volumes/Data/Work/Other/alembic-reproduce/alembic/env.py", line 60, in do_run_migrations
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/runtime/environment.py", line 946, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/runtime/migration.py", line 616, in run_migrations
    for step in self._migrations_fn(heads, self):
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/command.py", line 212, in retrieve_migrations
    revision_context.run_autogenerate(rev, context)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/autogenerate/api.py", line 570, in run_autogenerate
    self._run_environment(rev, migration_context, True)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/autogenerate/api.py", line 617, in _run_environment
    compare._populate_migration_script(
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/autogenerate/compare.py", line 65, in _populate_migration_script
    _produce_net_changes(autogen_context, upgrade_ops)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/autogenerate/compare.py", line 98, in _produce_net_changes
    comparators.dispatch("schema", autogen_context.dialect.name)(
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/util/langhelpers.py", line 310, in go
    fn(*arg, **kw)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/autogenerate/compare.py", line 137, in _autogen_for_tables
    _compare_tables(
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/autogenerate/compare.py", line 249, in _compare_tables
    sqla_compat._reflect_table(inspector, t)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/alembic/util/sqla_compat.py", line 359, in _reflect_table
    return inspector.reflect_table(table, None)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 1526, in reflect_table
    _reflect_info = self._get_reflection_info(
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 2011, in _get_reflection_info
    indexes=run(self.get_multi_indexes),
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 1992, in run
    res = meth(filter_names=_fn, **kw)
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 1185, in get_multi_indexes
    self.dialect.get_multi_indexes(
  File "/Volumes/Data/Work/Other/alembic-reproduce/.venv/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/base.py", line 4582, in get_multi_indexes
    dialect_options["postgresql_with"] = dict(
ValueError: dictionary update sequence element #0 has length 4; 2 is required

Additional context

alembic migration scripts here

def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.execute("DROP EXTENSION IF EXISTS vectors;")
    op.execute("CREATE EXTENSION vectors;")
    op.execute("SET vectors.pgvector_compatibility=on;")
    op.create_table('vectors_768',
    sa.Column('id', sa.Integer(), autoincrement=True, nullable=False),
    sa.Column('embedding', vector.VECTOR(dim=768), nullable=False),
    sa.PrimaryKeyConstraint('id')
    )
    op.create_index('ix_vector_embedding', 'vectors_768',
                    ['embedding'],
                    unique=False,
                    postgresql_using='hnsw',
                    postgresql_with={'m': 4, 'ef_construction': 10},
                    postgresql_ops={'embedding': 'vector_l2_ops'})
    # ### end Alembic commands ###

xianyuntang avatar Aug 11 '24 17:08 xianyuntang

Hi,

Thanks for reporting. I wonder if it could happen that there is no = sign. Probably not, so that fix should work.

Can you propose a PR?

CaselIT avatar Aug 11 '24 17:08 CaselIT

Hi, I'm running into the same issue and was wondering if a PR was opened for this? I couldn't find anything, but thought I would ask just in case.

If not, I'm happy to go ahead and open a PR for this change. I'm just not sure if there are any tests to add (I saw in the contribution guide that all PRs should ideally come with accompanying tests). I see there is already a test_get_multi_view_indexes here, so hopefully that suffices.

NickWilkinson37 avatar Dec 04 '24 16:12 NickWilkinson37

It doesn't seem that a PR was opened, so feel free to open one. Regarding test, if you can find an easy way of adding an option that accepts an arbitrary string without installing extension, I guess we could try to run a round trip. If not then it's probably fine that nothing else breaks

CaselIT avatar Dec 04 '24 20:12 CaselIT

Cool, thank you! I've opened a PR. I couldn't really think of an easy way to implement the testing suggestion, so I'm hoping that if nothing breaks it should be okay 😅 https://github.com/sqlalchemy/sqlalchemy/pull/12162

NickWilkinson37 avatar Dec 05 '24 14:12 NickWilkinson37

Nick Wilkinson has proposed a fix for this issue in the main branch:

Fixes: #11724 - PGDialect get_multi_indexes PGVecto.rs Bug https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5608

sqla-tester avatar Dec 05 '24 16:12 sqla-tester

Nick Wilkinson has proposed a fix for this issue in the rel_2_0 branch:

Fixes: #11724 - PGDialect get_multi_indexes PGVecto.rs Bug https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/5609

sqla-tester avatar Dec 07 '24 17:12 sqla-tester