Split backends into independent repos
Discussion started here: https://github.com/stac-utils/stac-fastapi/issues/409#issuecomment-1192699464.
Including backends in this repo was convenient when stac-fastapi was first created but the pros are now outweighed by the cons.
Migration plan:
- Merge all open PRs.
- Release a new version, this will be the last version with a monorepo.
- Create two new repos in
stac-utilsorg, one for each backend. - Copy backend submodules over to new repos.
- Merge the
api,extensions, andtypessubmodules into a single stac-fastapi package, keep import paths the same to avoid breaking. - Cut over to the new stac-fastapi package in the backend repos.
- Test / validate backend repos before we merge all of the code.
- Move backend specific issues to the appropriate repo.
I'm a huge +1 for this and I'm happy to help with part of it if you want.
5. Merge the
api,extensions, andtypessubmodules into a single stac-fastapi package, keep import paths the same to avoid breaking.
I'm a big fan of this. I've been using poetry a bunch lately, but it doesn't support installing from GitHub subdirectories, which has been a blocker for trying to install the latest unstable version of this library. This would solve that issue.
I'm a huge +1 for this and I'm happy to help with part of it if you want.
That would be great! I'm going to start getting all of the open PRs resolved, not exactly sure how long that will take. But I imagine it will be easier to merge all the packages once this is done. It looks like the relevant PRs to api/extensions/types modules are:
- #429
- #428
- #421
- #417
- #415
- #341
Sounds good. I can create new repos for the backends (stac-fastapi-sqlalchemy and stac-fastapi-pgstac). We will probably want to resolve the following PRs before we port the code over since they all make changes to one or both of the backends:
- #425
- #424
- #418
@duckontheweb I can start moving code into stac-fastapi-sqlalchemy. I'll try to keep the repo structure as similar as possible.
@duckontheweb I can start moving code into
stac-fastapi-sqlalchemy. I'll try to keep the repo structure as similar as possible.
Sounds good. I’m wondering if we should actually fetch from this repo into those ones in order to preserve the Git history. I think setting up this repo as a remote and then doing a “git reset —hard” to the master branch of this repo would do the trick
I'm going to move the pgstac backend over to stac-fastapi-pgstac and preserve the Git history. I'll try to keep the repo structure as similar as possible, too.
I will also set up the default branch for that repo to be main instead of master. We may want to consider making the same change in stac-fastapi-sqlalchemy and in this repo while we are doing this overhaul.
@geospatial-jeff Let me know if you want me to work on consolidating this repo into a single Python package.
is it part of the plan to release 2.4.0 after the outstanding PRs are merged, and before the repo is broken apart? There seem to be a lot of unpublished changes against the current repo structure.
is it part of the plan to release 2.4.0 after the outstanding PRs are merged, and before the repo is broken apart? There seem to be a lot of unpublished changes against the current repo structure.
I think this would probably be a good idea. The intention is to not make any breaking changes as part of the new repo structure, but just to be safe we might want to cut a release first.
Yes I can create a release after https://github.com/stac-utils/stac-fastapi/pull/441 is merged.
I ported the code to stac-utils/stac-fastapi-pgstac and opened stac-utils/stac-fastapi-pgstac#1 to start trimming that down to just the PgSTAC backend.
I'll have the 2.4.0 release out sometime tomorrow.
I've added labels to each issue to help figure out what to move to different repos. It's not perfect, and there is often some overlap but hopefully it helps.
- sqlalchemy: https://github.com/stac-utils/stac-fastapi/issues?q=is%3Aopen+is%3Aissue+label%3Asqlalchemy
- pgstac: https://github.com/stac-utils/stac-fastapi/issues?q=is%3Aopen+is%3Aissue+label%3Apgstac
- api layer: https://github.com/stac-utils/stac-fastapi/issues?q=label%3A%22api+layer%22
I've added labels to each issue to help figure out what to move to different repos.
I'll open issues on the new stac-fastapi-pgstac repo for anything that is purely related to that backend and close the related issue here.
I can take a stab at trimming this repo down to just the core modules and distributing everything under a single package.
Thanks @duckontheweb, I will review #450 and https://github.com/stac-utils/stac-fastapi-pgstac/pull/1 tomorrow morning!
@geospatial-jeff There may be some issues around separation of concerns between this repo and the backend repos that we want to work out in v2.5.0-a* releases.
These came up as I started to work on #401 using #450 and stac-utils/stac-fastapi-pgstac#1 as a base. That issue pertains to adding Queryables links to the /, /collections, and /collections/{collection_id} responses. None of the changes require information from a specific backend database (the links are static for each endpoint), but because of the way the libraries are set up it required making changes in both this repo and stac-fastapi-pgstac in order to full implement it. The crux of the problem (at least from my perspective) is that "backend" packages like stac-fastapi.pgstac and stac-fastapi.sqlalchemy end up handling too much of the API layer logic like constructing responses.
The AsyncBaseCoreClient.landing_page method does a pretty good job of implementing the logic for constructing the response in this library and delegating fetching of Collections to AsyncBaseCoreClient.all_collections, which is implemented by the backend packages (although that method does do some of the response formatting by putting the list of collections into the collections property of a dictionary).
The /collections endpoint, however, is just using AsyncBaseCoreClient.all_collections as a handler directly. This means that changes to the links property of that response need to happen in the backend repo even if they are not backend-specific. If we change the CRUD client classes to have something more like the following structure, we might be able to get a better separation of concerns and minimize the places where we need to update multiple repos in order to implement a change:
-
AsyncBaseCoreClient.fetch_all_collections- New abstract method in this repo (implemented in backend repos). Returns a simple list of Collections. -
AsyncBaseCoreClient.all_collections- Implemented in this repo, callsAsyncBaseCoreClient.fetch_all_collectionsand constructs the response using that list, including adding top-level links and manipulating links on each Collection as necessary. -
AsyncBaseCoreClient.fetch_collection- New abstract method in this repo (implemented in backend repos). Returns a Collection. -
AsyncBaseCoreClient.get_collection- Implemented in this repo, callsAsyncBaseCoreClient.fetch_collectionand constructs the response, including manipulating Collection links as necessary.
If this seems like a reasonable approach I can put in an draft PR with these changes and then we can see where else we might want to tease apart some of the logic.
This means that changes to the links property of that response need to happen in the backend repo even if they are not backend-specific.
This makes sense, definitely something we need to fix for good separation of concerns.
If this seems like a reasonable approach I can put in an draft PR with these changes and then we can see where else we might want to tease apart some of the logic.
This makes sense to me! I imagine its mostly just adding/resolving links which is already relatively standard across backends using the stac_fastapi.types.links module so seems better to do that once in the API layer.
Do you think we have to do the same for item endpoints?
Do you think we have to do the same for item endpoints?
I haven't looked at these in detail, but I imagine we will. I can start a PR with the collections endpoints and then we can build off of that.
@duckontheweb Could you please give me push access to https://github.com/stac-utils/stac-fastapi-sqlalchemy?
Getting 403s on push.
@duckontheweb Could you please give me push access to https://github.com/stac-utils/stac-fastapi-sqlalchemy? Getting 403s on push.
Oops, should be all set now
The sqlalchemy PR is here https://github.com/stac-utils/stac-fastapi-sqlalchemy/pull/1, I have a few TODOs listed on the PR but it's in pretty good shape.
@geospatial-jeff @duckontheweb checking in on this one — has this stalled out? I’m going to be picking up some dev time for stac-fastapi and I’d like to get across the current state of play before I dive in too deep.
Hey @gadomski that sounds great! This has stalled out but I'm going to be picking it back up. I have some time set aside tomorrow and Saturday to continue working on this issue. It's been a while since we started so I also need to do some accounting to figure out exactly what we've done and how much is left to do. I will follow up tomorrow with a more detailed comment on the status of this issue.
At a high level:
- Update #450 with main, there are some changes that have been released since we started working on this issue.
- Splitting out the sqlalchemy/pgstac backends into independent repos has revealed some issues with separation of concerns between the API layer (
stac_fastapi.api,stac_fastapi.types,stac_fastapi.extensions) and the backends. The main issue is with link generation and how the current implementation is passing thestarlette.requests.Requestobject around. #472 moves link generation to the API layer, this PR needs to be finished and merged into #450. - Update https://github.com/stac-utils/stac-fastapi-sqlalchemy and https://github.com/stac-utils/stac-fastapi-pgstac with main to preserve changes when we cut over.
- Update https://github.com/stac-utils/stac-fastapi-sqlalchemy and https://github.com/stac-utils/stac-fastapi-pgstac with #450. Main challenge here will be to implement the new link generation introduced in #472.
- Do a final review of https://github.com/stac-utils/stac-fastapi-sqlalchemy and https://github.com/stac-utils/stac-fastapi-pgstac to make sure the code looks good and everything is working as expected. There are a few open PRs in each repo that need to be finished up and merged (although some of them look finished, just not merged yet). Also confirm that separation of concerns is better; this may require some iteration.
Some additional steps to cut over to the new repo structure cleanly. We did many of these a while ago but they are worth revisiting now as new code has been pushed to stac-fastapi since then.
- Move backend specific issues over to the appropriate backend repo.
- Merge any open PRs to prevent merge conflicts.
- Release a new stac-fastapi version before we do the cutover.
Great, thanks for the summary. If I'm understanding correctly, #472 is the biggest blocker at the moment -- the rest of the TODOs seem to be more general housekeeping?
I can start on some of the housekeeping tasks, after which I can bring an extra set of hands/eyes to #472, with the caveat that I'm still learning the code base so will be a little slow/ignorant :-).