separate source code and deployment configurations
this is the modern version of #708, where people called for alembic configuration to be in pyproject.toml. it's not appropriate for the whole alembic.ini to be moved to this file, and the actual problem to be solved is that Alembic's configuration model is somewhat wrong, hence we were unable to do #708 as it stands without first fixing configuration.
First let's lay out some of what exists, that is impacted by this:
- the alembic.ini file has source code pathing information in it, as well as database URLs to connect to databases, and python logging configurations
- alembic.ini is generated by the "alembic init" command that is intended to lay out a working config within a project
- the parts of alembic.ini that deal with file paths are consumed by the Config / ScriptDirectory components, which then load up the env.py file.
- Then the parts of alembic.ini that connect to databases and do logging are consumed within the env.py file, and users can in fact customize env.py so that logging / database config comes from somewhere else entirely.
So from that, we can summarize Alembic's configuration is in practice at least two separate categories:
- the first is what I will call source code configuration, which is all the path related stuff: script_location, prepend_sys_path, version_locations. These attributes are consumed by Alembic when any commands are run, and in particular it needs to consume these variables in order for it to start working with the user's environment, which includes the env.py. these variables should ALWAYS be in pyproject.toml.
- the second is what I will call deployment configuration, which is the live runtime stuff: database URLs, logging configuration. These are things that are configured on a production server in a way that is specific to a particular deployment. these variables should NEVER be in pyproject.toml.
- The "deployment configuration" in alembic.ini has always been implicitly optional. That is, from day one, we said, although not very clearly and sort of by "read the source, Luke" methods, please customize env.py to use your own deployment configuration scheme. We even included a "pylons" template that illustrated, "hey the live database URL doesn't have to come from alembic.ini". The use of logging.fileConfig() inside of env.py illustrated, "hey, the logging config doesn't have to come from alembic.ini".
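As a rough illustration of that split, an env.py can take its database URL from wherever the deployment keeps it and only fall back to alembic.ini. The sketch below is standard library only; the ALEMBIC_DATABASE_URL variable name and the resolve_database_url() helper are made up for this example and are not part of Alembic.

```python
import configparser
import os
from typing import Mapping, Optional

# Hypothetical environment variable name, chosen for this example only.
ENV_VAR = "ALEMBIC_DATABASE_URL"


def resolve_database_url(
    ini_text: str, environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Prefer the deployment's environment, fall back to alembic.ini."""
    environ = os.environ if environ is None else environ
    if environ.get(ENV_VAR):
        return environ[ENV_VAR]

    # interpolation=None is this sketch's choice: it keeps '%' characters
    # in URLs (for example escaped passwords) exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("alembic", "sqlalchemy.url", fallback=None)
```

Inside a real env.py the ini text would not be passed around like this; the point is only the lookup order, where the value in alembic.ini is a default rather than the source of truth.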
See this is the bug. alembic.ini has two kinds of configuration, and then through vague handwavy means we sort of gave people a way to "move" the deployment configuration out of it, into their own system, which is what people do (at least, developers working in professional deployment situations. beginners, I have no idea what they do).
So given all that, here are more wrinkles:
- we have an alembic init command that has to generate a working app config. So we can't just say, "hey sure, use pyproject.toml if you want". our init program has to support this, and to that end, supporting just one way would be best, with regards to the automatic generation of a project. Supporting existing deployments with alembic.ini is of course something we'd never change.
- our Config object refers explicitly to a Python ConfigParser and its API is based on ConfigParser, which is not necessarily compatible with pyproject.toml - there are of course a lot of ways to work around this, but not sure of the details. We would need to at the very least port the notions of get_section_option / get_main_option, as well as the setters, to a configuration model that accommodates the toml format. For example, "main options" are now officially "source code / toml" features and "section options" are "deployment / alembic.ini" features (edit: not exactly. DB URL is in main, that goes "post_write_hooks", that's pyproject.toml).
So given all that, here are proposals:
- this is all major release stuff - alembic 1.9 or greater
- existing alembic projects should not be impacted at all - the existing alembic.ini format should always work, no plans to remove it
- Config API should be changed to have separate notions of source config and deployment config - I would deprecate all "main_option" / "section_option" language entirely and replace it with "source_config_option" / "deployment_option" (or whatever term). These don't match up exactly:
  - everything in [alembic] except for the database URL can be in pyproject.toml, that is, source_config_option
  - [post_write_hooks] should be in pyproject.toml, that is, source_config_section or some concept like that
  - [loggers] and all logging config stays as configparser
  - the database URL (sqlalchemy.url) is in alembic.ini "main", but we classify this as a deployment option
- the --name option, that is, https://alembic.sqlalchemy.org/en/latest/cookbook.html#multiple-environments. I don't know what to do here. --name should probably look in pyproject.toml and then alembic.ini for the named sections, using resolution similar to the default resolution.
- if Config.file_config remains, it returns the alembic.ini config, but this would not have any data from pyproject.toml inside of it
- alembic init will be switched to render into pyproject.toml directly for source code options - that is we remove all the "main" stuff from alembic.ini. I don't want to have to document two styles. note that this includes:
  - generate pyproject.toml if it doesn't exist
  - render our pyproject section into an existing pyproject.toml file
  - detect if our sections are already present in pyproject.toml and don't add if so, emit a message
  - use a mako template for the pyproject section itself, but not the whole .toml file, since we need to render inside an existing file
- even in the new way, there is still an alembic.ini file - with default deployment configuration used by env.py. it can be removed if one's env.py does not need it
- alembic will consume source code options from pyproject.toml then fall back to alembic.ini
so that's what I have. I personally don't have time to own the effort on this; however, I will have a lot of very detailed review comments etc. for whoever does it, because getting this wrong will just create exponentially more work later.
looking at how --name works, implementation wise, actually porting the toml reader to read out keys/sections into the ConfigParser() is likely the easiest way to do this. but not totally sure there isn't some format consideration that makes this infeasible
hmm nope, I think we have to rip out ConfigParser and rework Config to have its own internal data structure. then we have consumers that can consume both toml and configparser into the internal structure.
def set_main_option(self, name: str, value: str) -> None:
for things like "list of directories", toml can represent the list directly, so that "str value" is not appropriate.
So we need to port ConfigParser to our own internal solution, for things like named sections. We then need consumers for pyproject and configparser that are separate and based on a schema. Things like the directory splitting we are doing at https://github.com/sqlalchemy/alembic/blob/main/alembic/script/base.py#L166 become local to the configparser consumer, because pyproject.toml gives us that directly.
this is a big job
- use a mako template for the pyproject section itself, but not the whole .toml file, since we need to render inside an existing file
Not sure I would go down this path. I think toml files are written like json files, i.e. the file is read, modifications are made to python objects, then the object is serialized, rewriting the toml file (iirc comments are kept)
how do different templates produce different configurations then?
I guess we would have to manually construct the python dict that represents the tool.alembic section of the toml in some way
So from that, we can summarize Alembic's configuration is in practice at least two separate categories:
I never thought about this, but I actually agree. In fact at work we have a different means of configuring the engine and the logging, while all the other "source_config_option" values are taken from alembic.ini
Maybe it is worth changing the configuration templates now, so they no longer create an alembic.ini file that mixes deployment configuration, instead leaving it fully in the env.py?
Ideal solution for me as an SE:
- Move logging configuration from alembic.ini to the Python runtime (usually loggers.py, where dictConfig is initialized; this is a really dynamic solution, because you can change any logging configuration depending on ENVs). Then set up the alembic logger with all handlers/formatters/filters. (For example, gunicorn provides such a possibility with the gunicorn logger name.)
- Move out any connections / security stuff from alembic.ini / env.py. The DB URL must be constructed in the Python runtime and depends on (os.environ, pydantic.BaseSettings, SSM, SecretsManager, etc...).
- Move other configurations (path options, names, hooks) from alembic.ini to pyproject.toml under a [tool.alembic] and [tool.alembic.hooks]. Just because such options should be in pyproject.toml.
- Remove alembic.ini from repository.
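For the logging part of that list, a minimal sketch of what such a loggers.py might look like, using only the standard library. The ALEMBIC_LOG_LEVEL variable and both function names are assumptions made for the example; "alembic" is the logger namespace Alembic itself logs under.

```python
import logging
import logging.config
import os
from typing import Mapping, Optional


def build_logging_config(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Build a dictConfig-style dict whose level depends on the environment."""
    environ = os.environ if environ is None else environ
    # Hypothetical variable name; defaults to INFO when unset.
    level = environ.get("ALEMBIC_LOG_LEVEL", "INFO").upper()
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "generic": {"format": "%(levelname)-5.5s [%(name)s] %(message)s"}
        },
        "handlers": {
            "console": {"class": "logging.StreamHandler", "formatter": "generic"}
        },
        "loggers": {
            "alembic": {"level": level, "handlers": ["console"], "propagate": False}
        },
    }


def configure_logging(environ: Optional[Mapping[str, str]] = None) -> None:
    logging.config.dictConfig(build_logging_config(environ))
```

An env.py could then call configure_logging() in place of logging.fileConfig(), at which point nothing in the logging sections of alembic.ini is needed.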
Any update on this issue?
Nothing outside what's in this issue