stac-fastapi icon indicating copy to clipboard operation
stac-fastapi copied to clipboard

Split backends into independent repos

Open geospatial-jeff opened this issue 3 years ago • 25 comments

Discussion started here: https://github.com/stac-utils/stac-fastapi/issues/409#issuecomment-1192699464.

Including backends in this repo was convenient when stac-fastapi was first created but the pros are now outweighed by the cons.

Migration plan:

  1. Merge all open PRs.
  2. Release a new version, this will be the last version with a monorepo.
  3. Create two new repos in stac-utils org, one for each backend.
  4. Copy backend submodules over to new repos.
  5. Merge the api, extensions, and types submodules into a single stac-fastapi package, keep import paths the same to avoid breaking.
  6. Cut over to the new stac-fastapi package in the backend repos.
  7. Test / validate backend repos before we merge all of the code.
  8. Move backend specific issues to the appropriate repo.

geospatial-jeff avatar Jul 27 '22 13:07 geospatial-jeff

I'm a huge +1 for this and I'm happy to help with part of it if you want.

5. Merge the api, extensions, and types submodules into a single stac-fastapi package, keep import paths the same to avoid breaking.

I'm a big fan of this. I've been using poetry a bunch lately, but it doesn't support installing from GitHub subdirectories, which has been a blocker for trying to install the latest unstable version of this library. This would solve that issue.

duckontheweb avatar Jul 27 '22 19:07 duckontheweb

I'm a huge +1 for this and I'm happy to help with part of it if you want.

That would be great! I'm going to start getting all of the open PRs resolved, not exactly sure how long that will take. But I imagine it will be easier to merge all the packages once this is done. It looks like the relevant PRs to api/extensions/types modules are:

  • #429
  • #428
  • #421
  • #417
  • #415
  • #341

geospatial-jeff avatar Jul 27 '22 22:07 geospatial-jeff

Sounds good. I can create new repos for the backends (stac-fastapi-sqlalchemy and stac-fastapi-pgstac). We will probably want to resolve the following PRs before we port the code over since they all make changes to one or both of the backends:

  • #425
  • #424
  • #418

duckontheweb avatar Jul 28 '22 01:07 duckontheweb

@duckontheweb I can start moving code into stac-fastapi-sqlalchemy. I'll try to keep the repo structure as similar as possible.

geospatial-jeff avatar Aug 01 '22 20:08 geospatial-jeff

@duckontheweb I can start moving code into stac-fastapi-sqlalchemy. I'll try to keep the repo structure as similar as possible.

Sounds good. I’m wondering if we should actually fetch from this repo into those ones in order to preserve the Git history. I think setting up this repo as a remote and then doing a “git reset —hard” to the master branch of this repo would do the trick

duckontheweb avatar Aug 01 '22 21:08 duckontheweb

I'm going to move the pgstac backend over to stac-fastapi-pgstac and preserve the Git history. I'll try to keep the repo structure as similar as possible, too.

I will also set up the default branch for that repo to be main instead of master. We may want to consider making the same change in stac-fastapi-sqlalchemy and in this repo while we are doing this overhaul.

@geospatial-jeff Let me know if you want me to work on consolidating this repo into a single Python package.

duckontheweb avatar Aug 02 '22 20:08 duckontheweb

is it part of the plan to release 2.4.0 after the outstanding PRs are merged, and before the repo is broken apart? There seem to be a lot of unpublished changes against the current repo structure.

captaincoordinates avatar Aug 02 '22 22:08 captaincoordinates

is it part of the plan to release 2.4.0 after the outstanding PRs are merged, and before the repo is broken apart? There seem to be a lot of unpublished changes against the current repo structure.

I think this would probably be a good idea. The intention is to not make any breaking changes as part of the new repo structure, but just to be safe we might want to cut a release first.

duckontheweb avatar Aug 03 '22 01:08 duckontheweb

Yes I can create a release after https://github.com/stac-utils/stac-fastapi/pull/441 is merged.

geospatial-jeff avatar Aug 03 '22 02:08 geospatial-jeff

I ported the code to stac-utils/stac-fastapi-pgstac and opened stac-utils/stac-fastapi-pgstac#1 to start trimming that down to just the PgSTAC backend.

duckontheweb avatar Aug 03 '22 12:08 duckontheweb

I'll have the 2.4.0 release out sometime tomorrow.

geospatial-jeff avatar Aug 03 '22 21:08 geospatial-jeff

I've added labels to each issue to help figure out what to move to different repos. It's not perfect, and there is often some overlap but hopefully it helps.

  • sqlalchemy: https://github.com/stac-utils/stac-fastapi/issues?q=is%3Aopen+is%3Aissue+label%3Asqlalchemy
  • pgstac: https://github.com/stac-utils/stac-fastapi/issues?q=is%3Aopen+is%3Aissue+label%3Apgstac
  • api layer: https://github.com/stac-utils/stac-fastapi/issues?q=label%3A%22api+layer%22

geospatial-jeff avatar Aug 04 '22 14:08 geospatial-jeff

I've added labels to each issue to help figure out what to move to different repos.

I'll open issues on the new stac-fastapi-pgstac repo for anything that is purely related to that backend and close the related issue here.

duckontheweb avatar Aug 06 '22 19:08 duckontheweb

I can take a stab at trimming this repo down to just the core modules and distributing everything under a single package.

duckontheweb avatar Aug 07 '22 02:08 duckontheweb

Thanks @duckontheweb, I will review #450 and https://github.com/stac-utils/stac-fastapi-pgstac/pull/1 tomorrow morning!

geospatial-jeff avatar Aug 09 '22 23:08 geospatial-jeff

@geospatial-jeff There may be some issues around separation of concerns between this repo and the backend repos that we want to work out in v2.5.0-a* releases.

These came up as I started to work on #401 using #450 and stac-utils/stac-fastapi-pgstac#1 as a base. That issue pertains to adding Queryables links to the /, /collections, and /collections/{collection_id} responses. None of the changes require information from a specific backend database (the links are static for each endpoint), but because of the way the libraries are set up it required making changes in both this repo and stac-fastapi-pgstac in order to full implement it. The crux of the problem (at least from my perspective) is that "backend" packages like stac-fastapi.pgstac and stac-fastapi.sqlalchemy end up handling too much of the API layer logic like constructing responses.

The AsyncBaseCoreClient.landing_page method does a pretty good job of implementing the logic for constructing the response in this library and delegating fetching of Collections to AsyncBaseCoreClient.all_collections, which is implemented by the backend packages (although that method does do some of the response formatting by putting the list of collections into the collections property of a dictionary).

The /collections endpoint, however, is just using AsyncBaseCoreClient.all_collections as a handler directly. This means that changes to the links property of that response need to happen in the backend repo even if they are not backend-specific. If we change the CRUD client classes to have something more like the following structure, we might be able to get a better separation of concerns and minimize the places where we need to update multiple repos in order to implement a change:

  • AsyncBaseCoreClient.fetch_all_collections - New abstract method in this repo (implemented in backend repos). Returns a simple list of Collections.
  • AsyncBaseCoreClient.all_collections - Implemented in this repo, calls AsyncBaseCoreClient.fetch_all_collections and constructs the response using that list, including adding top-level links and manipulating links on each Collection as necessary.
  • AsyncBaseCoreClient.fetch_collection - New abstract method in this repo (implemented in backend repos). Returns a Collection.
  • AsyncBaseCoreClient.get_collection - Implemented in this repo, calls AsyncBaseCoreClient.fetch_collection and constructs the response, including manipulating Collection links as necessary.

If this seems like a reasonable approach I can put in an draft PR with these changes and then we can see where else we might want to tease apart some of the logic.

duckontheweb avatar Aug 10 '22 00:08 duckontheweb

This means that changes to the links property of that response need to happen in the backend repo even if they are not backend-specific.

This makes sense, definitely something we need to fix for good separation of concerns.

If this seems like a reasonable approach I can put in an draft PR with these changes and then we can see where else we might want to tease apart some of the logic.

This makes sense to me! I imagine its mostly just adding/resolving links which is already relatively standard across backends using the stac_fastapi.types.links module so seems better to do that once in the API layer.

Do you think we have to do the same for item endpoints?

geospatial-jeff avatar Aug 10 '22 01:08 geospatial-jeff

Do you think we have to do the same for item endpoints?

I haven't looked at these in detail, but I imagine we will. I can start a PR with the collections endpoints and then we can build off of that.

duckontheweb avatar Aug 10 '22 01:08 duckontheweb

@duckontheweb Could you please give me push access to https://github.com/stac-utils/stac-fastapi-sqlalchemy?
Getting 403s on push.

geospatial-jeff avatar Aug 10 '22 17:08 geospatial-jeff

@duckontheweb Could you please give me push access to https://github.com/stac-utils/stac-fastapi-sqlalchemy? Getting 403s on push.

Oops, should be all set now

duckontheweb avatar Aug 10 '22 18:08 duckontheweb

The sqlalchemy PR is here https://github.com/stac-utils/stac-fastapi-sqlalchemy/pull/1, I have a few TODOs listed on the PR but it's in pretty good shape.

geospatial-jeff avatar Aug 10 '22 21:08 geospatial-jeff

@geospatial-jeff @duckontheweb checking in on this one — has this stalled out? I’m going to be picking up some dev time for stac-fastapi and I’d like to get across the current state of play before I dive in too deep.

gadomski avatar Nov 10 '22 17:11 gadomski

Hey @gadomski that sounds great! This has stalled out but I'm going to be picking it back up. I have some time set aside tomorrow and Saturday to continue working on this issue. It's been a while since we started so I also need to do some accounting to figure out exactly what we've done and how much is left to do. I will follow up tomorrow with a more detailed comment on the status of this issue.

geospatial-jeff avatar Nov 10 '22 19:11 geospatial-jeff

At a high level:

  1. Update #450 with main, there are some changes that have been released since we started working on this issue.
  2. Splitting out the sqlalchemy/pgstac backends into independent repos has revealed some issues with separation of concerns between the API layer (stac_fastapi.api, stac_fastapi.types, stac_fastapi.extensions) and the backends. The main issue is with link generation and how the current implementation is passing the starlette.requests.Request object around. #472 moves link generation to the API layer, this PR needs to be finished and merged into #450.
  3. Update https://github.com/stac-utils/stac-fastapi-sqlalchemy and https://github.com/stac-utils/stac-fastapi-pgstac with main to preserve changes when we cut over.
  4. Update https://github.com/stac-utils/stac-fastapi-sqlalchemy and https://github.com/stac-utils/stac-fastapi-pgstac with #450. Main challenge here will be to implement the new link generation introduced in #472.
  5. Do a final review of https://github.com/stac-utils/stac-fastapi-sqlalchemy and https://github.com/stac-utils/stac-fastapi-pgstac to make sure the code looks good and everything is working as expected. There are a few open PRs in each repo that need to be finished up and merged (although some of them look finished, just not merged yet). Also confirm that separation of concerns is better; this may require some iteration.

Some additional steps to cut over to the new repo structure cleanly. We did many of these a while ago but they are worth revisiting now as new code has been pushed to stac-fastapi since then.

  1. Move backend specific issues over to the appropriate backend repo.
  2. Merge any open PRs to prevent merge conflicts.
  3. Release a new stac-fastapi version before we do the cutover.

geospatial-jeff avatar Nov 12 '22 03:11 geospatial-jeff

Great, thanks for the summary. If I'm understanding correctly, #472 is the biggest blocker at the moment -- the rest of the TODOs seem to be more general housekeeping?

I can start on some of the housekeeping tasks, after which I can bring an extra set of hands/eyes to #472, with the caveat that I'm still learning the code base so will be a little slow/ignorant :-).

gadomski avatar Nov 14 '22 14:11 gadomski