Error when reading vector column of pgvector
Things to check first
-
[X] I have searched the existing issues and didn't find my bug already reported there
-
[X] I have checked that my bug is still present in the latest release
Sqlacodegen version
3.0.0rc5
SQLAlchemy version
2.0.31
RDBMS vendor
PostgreSQL
What happened?
Issue:
When attempting to add a new column to the recommend_model."Image" table using the following SQL command:
ALTER TABLE "recommend_model"."Image" ADD COLUMN "embedding" vector(1408) NOT NULL;
then, we encountered an issue where the pgvector.sqlalchemy.vector.VECTOR type is not being properly recognized in the context of the get_adapted_type method. This results in the column type falling back to sqlalchemy.sql.type_api.UserDefinedType instead.
Using pgvector 0.3.0
Traceback (most recent call last):
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/bin/sqlacodegen", line 8, in <module>
sys.exit(main())
^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/cli.py", line 101, in main
outfile.write(generator.generate())
^^^^^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 171, in generate
self.fix_column_types(table)
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 651, in fix_column_types
column.type = self.get_adapted_type(column.type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 703, in get_adapted_type
if new_coltype.compile(self.bind.engine.dialect) != compiled_type and (
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/type_api.py", line 1062, in compile
return dialect.type_compiler_instance.process(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 960, in process
return type_._compiler_dispatch(self, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/visitors.py", line 141, in _compiler_dispatch
return meth(self, **kw) # type: ignore # noqa: E501
^^^^^^^^^^^^^^^^
File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 7262, in visit_user_defined
return type_.get_col_spec(**kw)
^^^^^^^^^^^^^^^^^^
AttributeError: 'UserDefinedType' object has no attribute 'get_col_spec'
def get_adapted_type(self, coltype: Any) -> Any:
...
coltype = new_coltype
if supercls.__name__ != supercls.__name__.upper():
break
Analysis
The VECTOR type is recognized, but since it is an uppercase type, the loop continues to check for any camel case types. As a result, UserDefinedType is found, but since it does not have the get_col_spec method, an error occurs.
Potential Fixes
-
Avoid to use UserDefinedType when get adapted type:
if supercls.__visit_name__ == "user_defined": break coltype = new_coltypeUserDefinedTypeが発見された場合、すでにnew_coltypeがあった場合はそちらを優先する方針がよさそう。 It seems a good approach to prioritize an existing detected new_coltype if UserDefinedType is found.
It seem to be below:
if supercls.__visit_name__ == "user_defined" and new_coltype is not None: continue ... adapted_coltype = new_coltype -
Add Special Handling for
VECTOR: Include a specific check for theVECTORtype:coltype = new_coltype if supercls.__name__ == "VECTOR": breakVECTORの時点で現状、キャメルケースがくることはない。→しかし、今後来るようになる可能性はある? Currently, there are no camel case types coming after VECTOR. However, it is possible that they might come in the future.
-
Remove the Uppercase Check: Replace the existing check with a break statement, allowing
VECTORto be recognized:coltype = new_coltype # if supercls.__name__ != supercls.__name__.upper(): # break→ Due to the impact on other areas, this approach is rejected.
Output on success
embedding: Mapped[Any] = mapped_column(VECTOR(1408))
References
Commit where if supercls.name != supercls.name.upper() was introduced
ralated issue
https://github.com/agronholm/sqlacodegen/issues/300
What is the difference between Text and TEXT types in SQLAlchemy?
ref. https://docs.sqlalchemy.org/en/20/core/type_basics.html#generic-camelcase-types
Removing if supercls.__name__ != supercls.__name__.upper(): break results in differences, such as parts previously outputting as Text now appearing as TEXT.
Database schema for reproducing the bug
(WIP)
~I'm dealing with this issue...~ I've been too busy to handle it.
Did you install the project with the pgvector extra?
@agronholm yes. this is minimal requirements.lock to reproduce the issue.
# generated by rye
# use `rye lock` or `rye sync` to update this lockfile
#
# last locked with the following flags:
# pre: false
# features: []
# all-features: false
# with-sources: false
# generate-hashes: false
inflect==7.3.1
# via sqlacodegen
more-itertools==10.3.0
# via inflect
numpy==2.0.0
# via pgvector
pgvector==0.3.1
psycopg2==2.9.9
sqlacodegen==3.0.0rc5
sqlalchemy==2.0.31
# via sqlacodegen
typeguard==4.3.0
# via inflect
typing-extensions==4.12.2
# via sqlalchemy
# via typeguard
Others have successfully tested the pgvector integration. Any idea why it's not working for you?
I downgraded pgvector to 0.2.5 and it worked fine, was having the same issue before!
Pinpointing the specific commit where it starts to fail would probably help.
@agronholm @np-kyokyo The exact commit is here: https://github.com/pgvector/pgvector-python/commit/ba45eb23dd2eda0e8e15c4b5874c2cba5a577f81
It happened in the upgrade from pgvector==0.2.5 to pgvector==0.3.0. I am also having the same issue.
But sqlacodegen does not directly import any of those types, so it doesn't make sense that such a change would break the integration.