Autogenerate renders TypeDecorator instance instead of underlying impl type
Describe the bug
This isn't a bug per se, but a small improvement for autogenerate when using TypeDecorator.
When a TypeDecorator is used in a column definition, e.g.:

```python
# File: app/models/foo.py
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.types import TypeDecorator
...

class JSONBData(TypeDecorator):
    impl = JSONB

foo = Table("foo", MetaData(), Column("data", JSONBData))
```

The auto-generated migration ends up referencing the TypeDecorator:

```python
op.add_column("foo", sa.Column("data", app.models.foo.JSONBData(), nullable=True))
```
which is annoying for two reasons:
- The import is not automatically rendered.
- The migration file has an unnecessary dependency on `app`: if app/models/foo.py is refactored, we may need to update this migration file, when that could have been avoided if, instead of rendering `app.models.foo.JSONBData`, alembic directly rendered the underlying impl: `postgresql.JSONB`.
I'm not sure if there are any scenarios where it is actually preferable to have the TypeDecorator in the autogenerated file. If there are use cases for it, would it be sensible to make this a config option instead of unconditional?
Expected behavior
Ideally, we'd generate the same thing as when `foo = Table("foo", MetaData(), Column("data", JSONB))` is provided, i.e.:

```python
op.add_column("foo", sa.Column("data", postgresql.JSONB(astext_type=Text()), nullable=True))
```
To Reproduce
Test case:
```diff
diff --git a/tests/test_autogen_render.py b/tests/test_autogen_render.py
index eeeb92e..9755869 100644
--- a/tests/test_autogen_render.py
+++ b/tests/test_autogen_render.py
@@ -33,6 +33,7 @@ from sqlalchemy.sql import false
 from sqlalchemy.sql import literal_column
 from sqlalchemy.sql import table
 from sqlalchemy.types import TIMESTAMP
+from sqlalchemy.types import TypeDecorator
 from sqlalchemy.types import UserDefinedType
 
 from alembic import autogenerate
@@ -1078,6 +1079,21 @@ class AutogenRenderTest(TestBase):
             "server_default='5', nullable=True))",
         )
 
+    def test_render_add_column_type_decorator(self):
+        self.autogen_context.opts["user_module_prefix"] = None
+
+        class MyType(TypeDecorator):
+            impl = Integer
+
+        op_obj = ops.AddColumnOp(
+            "foo", Column("x", MyType, server_default="5")
+        )
+        eq_ignore_whitespace(
+            autogenerate.render_op_text(self.autogen_context, op_obj),
+            "op.add_column('foo', sa.Column('x', sa.Integer(), "
+            "server_default='5', nullable=True))",
+        )
+
     @testing.emits_warning("Can't validate argument ")
     def test_render_add_column_custom_kwarg(self):
         col = Column(
```
Error

When running `tox -e py-sqlalchemy -- tests/test_autogen_render.py` with the above patch:

```
  File "/Users/saif/contrib/alembic/tests/test_autogen_render.py", line 1091, in test_render_add_column_type_decorator
    eq_ignore_whitespace(
  File "/Users/saif/contrib/alembic/alembic/testing/assertions.py", line 111, in eq_ignore_whitespace
    assert a == b, msg or "%r != %r" % (a, b)
           ^^^^^^
AssertionError: "op.add_column('foo', sa.Column('x', tests.test_autogen_render.MyType(), server_default='5', nullable=True))" != "op.add_column('foo', sa.Column('x', sa.Integer(), server_default='5', nullable=True))"
```
Versions
- OS: macOS 14.1.2
- Python: 3.11.6
- Alembic: 1.13.1
- SQLAlchemy: 1.3.24 / 2.0.23
- Database: Postgres
- DBAPI: psycopg2
Have a nice day!
Hi
> I'm not sure if there are any scenarios where it is actually preferable to have the TypeDecorator in the autogenerated file. If there are use cases for it, would it be sensible to make this a config option instead of unconditional?
Well, a type decorator can render a different type depending on the dialect, so this change, if accepted, would need to be made a config option. For example https://docs.sqlalchemy.org/en/20/core/custom_types.html#backend-agnostic-guid-type
Also at the moment alembic support customization of what type is rendered, documented here: https://alembic.sqlalchemy.org/en/latest/autogenerate.html#affecting-the-rendering-of-types-themselves
Maybe an extension to that documentation showing an example function that renders the impl of a type decorator could be enough? @zzzeek, opinion on this?
""" File: app/models/foo.py """ from sqlalchemy.dialects.postgresql import JSONB from sqlalchemy.types import TypeDecorator ...
class JSONBData(TypeDecorator): impl = JSONB
foo = Table("foo", MetaData(), Column("data", JSONBData))
The auto-generated migration ends up referencing the TypeDecorator: ```python op.add_column("foo", sa.Column("data", app.models.foo.JSONBData(), nullable=True))which is annoying for two reasons:
1. The import is not automatically rendered.
we have a configuration path for this, which is to add your application's standard type import paths to the script.py.mako template. There are some other ways to add imports dynamically as well, but they are not as easy to document.
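As an illustration of that configuration path, the script.py.mako template in the migration environment can be edited so every generated migration carries the application's type imports; a trimmed fragment of the default template with one added line (the `app.models` path here is this issue's example, not a required name):

```mako
"""${message}

Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}

"""
from alembic import op
import sqlalchemy as sa

# make the application's custom types available in every migration
from app.models import foo as app_types
${imports if imports else ""}
```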
> 2. The migration file has an unnecessary dependency on `app`, e.g. if the app/models/foo.py is refactored, we may need to update this migration file... when that could have been avoided if instead of rendering `app.models.foo.JSONBData`, alembic directly rendered the underlying impl: `postgresql.JSONB`.
if you are refactoring your application to move where a particular symbol imports from, you necessarily have to change all the places that reference it, so I don't see why an alembic migration file is any different from any other file that refers to it.
> I'm not sure if there are any scenarios where it is actually preferable to have the TypeDecorator in the autogenerated file.
there is, which is that the `op.create_table()` directive actually returns the Table object it just created, so you can then use it for `bulk_insert`, where you definitely want your TypeDecorator to take effect.
> If there are use cases for it, would it be sensible to make this a config option instead of unconditional?
there already is! Make yourself a `render_item` callable that does this for all type decorators; I will gladly accept an addition to the cookbook for this:

```python
def render_item(type_, obj, autogen_context):
    """Apply custom rendering for selected items."""
    if type_ == "type" and isinstance(obj, TypeDecorator):
        return f"sa.{obj.impl!r}"
    # default rendering for other objects
    return False
```
```python
def run_migrations_online():
    # ...
    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        render_item=render_item,
        # ...
    )
    # ...
```
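As a quick sanity check of what the hook above would emit (a sketch, assuming SQLAlchemy is installed; `MyType` mirrors the toy decorator from the test case earlier in this issue):

```python
from sqlalchemy.types import Integer, TypeDecorator

class MyType(TypeDecorator):
    # toy decorator: wraps a plain Integer
    impl = Integer
    cache_ok = True

# TypeDecorator instantiates its impl class, so obj.impl is Integer()
# and the hook's f-string yields the text placed in the migration file
rendered = f"sa.{MyType().impl!r}"
print(rendered)  # sa.Integer()
```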
Thanks @CaselIT and @zzzeek for the feedback!

For some additional context on where I'm coming from: I've already implemented a monkeypatch applying a change similar to the one I've proposed, which we've been using for a couple of years now, so that for the most part developers get correctly generated migrations... but at the cost of someone maintaining ugly monkeypatch code in sync with alembic's changelog. I was hoping to fold this into alembic's source code if it made sense. It looks like there are already cases where folks want to preserve their type decorators in the migration, so finding the right configuration, instead of monkeypatching / broadly making the change, seems sensible to me.
The monkeypatch, if you're curious:

```python
from alembic import context
from alembic.autogenerate.render import _repr_type
import sqlalchemy.types as types

migration_context = context.get_context()
old_render_type = migration_context.impl.render_type

def new_render_type(type_, autogen_context):
    # If we are dealing with a custom TypeDecorator, run _repr_type on the
    # underlying implementation.
    if isinstance(type_, types.TypeDecorator) and type(type_).__module__.startswith("app."):
        return _repr_type(type_.impl, autogen_context)
    # Otherwise, do the default behavior.
    return old_render_type(type_, autogen_context)

migration_context.impl.render_type = new_render_type
```
But in the interest of properly communicating why this sort of recipe is useful, I'll respond to some of the things you both have mentioned 😇.
> if you are refactoring your application to move where a particular symbol imports from, you necessarily have to change all the places that reference it, so I don't see why an alembic migration file is any different from any other file that refers to it.
That's generally true... For our project, we've decided to strictly enforce that migrations are disjoint from `app`, which has some pretty big benefits:

- Easier maintenance: a migration, at the end of the day, applies changes to our database using a DBAPI, and we typically autogenerate it from our constantly-iterated-upon application's ORM models. By avoiding importing our application code in our migrations, we end up with a growing collection of past migrations that are self-contained scripts developers can reference when writing new migrations, with minimal maintenance cost, since they do not import our constantly evolving `app`.
- Lower likelihood of bugs: there's also less possibility of accidentally refactoring a migration to do something differently. We manage 100+ databases that are not released to at the same release cadence, so not refactoring our migrations needlessly is one additional measure to ensure we are keeping our databases in sync.
- Performance: our `app` is monolithic, so importing it comes with a good deal of baggage (cost of runtime imports, installing all dependencies used by the app). As a migration is effectively a self-contained set of DBAPI instructions, we can keep our migrations package super lightweight dependency-wise, and can easily dockerize and run our migrations without having to bundle them with our monolithic application.
There are many ways to combat the above issues, which I concur are not universal issues, but we found this decoupling to work for us.
> Well a type decorator can render a different type depending on the dialect, so this change if accepted would need to be made a config option.
FWIW, the place where I put my proposed fix (#1387) runs after a dialect has attempted to render the type, so I think the mentioned use case still works; at least no test broke where I put my logic.
> we have a configuration path for this which is to add your application's standard type import paths to the script.py.mako template. There's some other ways to add imports dynamically as well but are not as easy to document.
Good to know. In my monolithic application, there are many places a type decorator could live, so a standard set of imports isn't totally doable, and the overhead of custom code for adding dynamic imports sounds like a heavier burden on the team; it's also not allowed by my team's "migrations do not depend on app" requirement.
> there already is! make yourself a render_item callable that does this for all typedecorators, I will gladly accept an add for cookbook for this
Nice! I'll try this out, and attempt to write up a cookbook recipe.
re: `render_item`, one thing to note is that with my proposed implementation, when dealing with the `impl = JSONB` case, we get the `from sqlalchemy.dialects import postgresql` import for free since it uses the existing machinery, whereas using `render_item`, I think I'll have to duplicate some of the logic from https://github.com/sqlalchemy/alembic/blob/abc8002ec67ddcb0a0be56b8167a4837f3884217/alembic/autogenerate/render.py#L815-L835 , which isn't the end of the world, just more complexity to maintain in userland.
One way of implementing `render_item` would be to do something very similar to the monkeypatch I shared above, where instead of adding our custom logic, we directly invoke `alembic.autogenerate.render._repr_type`. As this is currently a private function, the recipe would be fairly brittle unless `_repr_type` were made "public" and changes to its signature were called out as a minor/major breaking change.
this is the thing with recipes. Think of real-world use: you have an app, and you're using like two or three database-specific types; you know you're using PostgreSQL.
Realistically, I don't really think having the TypeDecorator render is any kind of issue; I don't really feel "maintenance" or "bugs" are impacted very much at all by this rendering, and certainly not "performance": database migrations can certainly afford an extra .3 seconds of startup to import your application's utility module. But as stated, there's the render_item() recipe and you can do whatever you want.
so re: render_item's "automatic logic with imports", again, real world: your app is hardcoded to PostgreSQL, so OK, add `from sqlalchemy.dialects import postgresql` to your script.py.mako template. That's how 99% of people would solve this, and it's not a problem. But if you want to be more fancy, you can do that also; the imports collection is passed to the hook via the autogen_context, see the next paragraph at https://alembic.sqlalchemy.org/en/latest/autogenerate.html#affecting-the-rendering-of-types-themselves :
```python
def render_item(type_, obj, autogen_context):
    """Apply custom rendering for selected items."""
    if type_ == 'type' and isinstance(obj, MySpecialType):
        # add import for this type
        autogen_context.imports.add("from mymodel import types")
        return "types.%r" % obj
    # default rendering for other objects
    return False
```
I think overall the vibe I want to impart is that this is a simple thing Alembic does, and it allows a hook to customize it. We want to keep things simple, not overthink, not qualify all of Alembic's API with N^18 flags and switches; just imperative code where needed.