sqlacodegen Error when reading vector column of pgvector

Things to check first

[X] I have searched the existing issues and didn't find my bug already reported there
[X] I have checked that my bug is still present in the latest release

Sqlacodegen version

3.0.0rc5

SQLAlchemy version

2.0.31

RDBMS vendor

PostgreSQL

What happened?

Issue:

When attempting to add a new column to the recommend_model."Image" table using the following SQL command:

ALTER TABLE "recommend_model"."Image" ADD COLUMN  "embedding" vector(1408) NOT NULL;

then, we encountered an issue where the pgvector.sqlalchemy.vector.VECTOR type is not being properly recognized in the context of the get_adapted_type method. This results in the column type falling back to sqlalchemy.sql.type_api.UserDefinedType instead.

Using pgvector 0.3.0
Traceback (most recent call last):
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/bin/sqlacodegen", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/cli.py", line 101, in main
    outfile.write(generator.generate())
                  ^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 171, in generate
    self.fix_column_types(table)
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 651, in fix_column_types
    column.type = self.get_adapted_type(column.type)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 703, in get_adapted_type
    if new_coltype.compile(self.bind.engine.dialect) != compiled_type and (
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/type_api.py", line 1062, in compile
    return dialect.type_compiler_instance.process(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 960, in process
    return type_._compiler_dispatch(self, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/visitors.py", line 141, in _compiler_dispatch
    return meth(self, **kw)  # type: ignore  # noqa: E501
           ^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 7262, in visit_user_defined
    return type_.get_col_spec(**kw)
           ^^^^^^^^^^^^^^^^^^
AttributeError: 'UserDefinedType' object has no attribute 'get_col_spec'

def get_adapted_type(self, coltype: Any) -> Any:
    ...
    coltype = new_coltype
    if supercls.__name__ != supercls.__name__.upper():
        break

Analysis

The VECTOR type is recognized, but since it is an uppercase type, the loop continues to check for any camel case types. As a result, UserDefinedType is found, but since it does not have the get_col_spec method, an error occurs.

Potential Fixes

Avoid to use UserDefinedType when get adapted type:
```
if supercls.__visit_name__ == "user_defined":
    break
coltype = new_coltype
```
UserDefinedTypeが発見された場合、すでにnew_coltypeがあった場合はそちらを優先する方針がよさそう。 It seems a good approach to prioritize an existing detected new_coltype if UserDefinedType is found.

It seem to be below:
```
             if supercls.__visit_name__ == "user_defined" and new_coltype is not None:
                 continue

             ...

             adapted_coltype = new_coltype
```
Add Special Handling for VECTOR: Include a specific check for the VECTOR type:
```
coltype = new_coltype
if supercls.__name__ == "VECTOR":
    break
```
VECTORの時点で現状、キャメルケースがくることはない。→しかし、今後来るようになる可能性はある？ Currently, there are no camel case types coming after VECTOR. However, it is possible that they might come in the future.
Remove the Uppercase Check: Replace the existing check with a break statement, allowing VECTOR to be recognized:
```
coltype = new_coltype
# if supercls.__name__ != supercls.__name__.upper():
#     break
```
→ Due to the impact on other areas, this approach is rejected.

Output on success

embedding: Mapped[Any] = mapped_column(VECTOR(1408))

References

Commit where if supercls.name != supercls.name.upper() was introduced

Commit 09ff6744ef6267484ab31d8c2e08e43801aac8b9

ralated issue

https://github.com/agronholm/sqlacodegen/issues/300

What is the difference between `Text` and `TEXT` types in SQLAlchemy?

ref. https://docs.sqlalchemy.org/en/20/core/type_basics.html#generic-camelcase-types

Removing if supercls.__name__ != supercls.__name__.upper(): break results in differences, such as parts previously outputting as Text now appearing as TEXT.

Database schema for reproducing the bug

(WIP)

Jul 12 '24 00:07 np-kyokyo

~I'm dealing with this issue...~ I've been too busy to handle it.

Jul 12 '24 03:07 np-kyokyo

Did you install the project with the pgvector extra?

Jul 12 '24 09:07 agronholm

@agronholm yes. this is minimal requirements.lock to reproduce the issue.

# generated by rye
# use `rye lock` or `rye sync` to update this lockfile
#
# last locked with the following flags:
#   pre: false
#   features: []
#   all-features: false
#   with-sources: false
#   generate-hashes: false

inflect==7.3.1
    # via sqlacodegen
more-itertools==10.3.0
    # via inflect
numpy==2.0.0
    # via pgvector
pgvector==0.3.1
psycopg2==2.9.9
sqlacodegen==3.0.0rc5
sqlalchemy==2.0.31
    # via sqlacodegen
typeguard==4.3.0
    # via inflect
typing-extensions==4.12.2
    # via sqlalchemy
    # via typeguard

Jul 12 '24 09:07 np-kyokyo

Others have successfully tested the pgvector integration. Any idea why it's not working for you?

Jul 13 '24 14:07 agronholm

I downgraded pgvector to 0.2.5 and it worked fine, was having the same issue before!

Jul 18 '24 22:07 Tcintra

Pinpointing the specific commit where it starts to fail would probably help.

Jul 18 '24 22:07 agronholm

@agronholm @np-kyokyo The exact commit is here: https://github.com/pgvector/pgvector-python/commit/ba45eb23dd2eda0e8e15c4b5874c2cba5a577f81

It happened in the upgrade from pgvector==0.2.5 to pgvector==0.3.0. I am also having the same issue.

Nov 21 '24 17:11 chandlj

But sqlacodegen does not directly import any of those types, so it doesn't make sense that such a change would break the integration.

Nov 23 '24 11:11 agronholm

Error when reading vector column of pgvector

Things to check first

Sqlacodegen version

SQLAlchemy version

RDBMS vendor

What happened?

Issue:

Analysis

Potential Fixes

Output on success

References

ralated issue

What is the difference between Text and TEXT types in SQLAlchemy?

Database schema for reproducing the bug

What is the difference between `Text` and `TEXT` types in SQLAlchemy?