Add node-based invocation system
This PR adds the core of the node-based invocation system first discussed in https://github.com/invoke-ai/InvokeAI/discussions/597 and implements it through a basic CLI and API. This supersedes #1047, which was too far behind to rebase.
## Architecture
### Invocations
The core of the new system is invocations, found in `/ldm/invoke/app/invocations`. These represent individual nodes of execution, each with inputs and outputs. Core invocations are already implemented (`txt2img`, `img2img`, `upscale`, `face_restore`), as well as a debug invocation (`show_image`). To implement a new invocation, all that is required is to add a new implementation in this folder (there is a markdown document describing the specifics, though it is slightly out-of-date).
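The input/output contract described above can be sketched roughly as follows. This is an illustrative assumption, not the actual base classes in `/ldm/invoke/app/invocations` — the class names, fields, and `invoke` signature are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical sketch of the invocation pattern: a node declares typed
# inputs as fields and returns a typed output object. The real base
# classes in /ldm/invoke/app/invocations differ in their details.

@dataclass
class ImageOutput:
    image_name: str


@dataclass
class UpscaleInvocation:
    id: str                # node id within the session graph
    image_name: str = ""   # input: name of the image to upscale
    level: int = 2         # input: upscale factor

    def invoke(self, context) -> ImageOutput:
        # A real implementation would fetch the image from the image
        # storage service via `context`, run the upscaler, and save the
        # result; here we only model the input/output contract.
        return ImageOutput(image_name=f"{self.image_name}@{self.level}x")
```

Because inputs and outputs are declared structurally like this, command options for the CLI and OpenAPI schemas for the API can both be generated from the same definitions.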
### Sessions
Invocations and links between them are maintained in a session. These can be queued for invocation (either the next ready node, or all nodes). Some notes:
- Sessions may be appended to at any time (including after invocation), but existing nodes may not be modified.
- Links are always added with a node, and always run from existing nodes to the new node. These links can be relative "history" links, e.g. `-1` to link from a previously executed node, and can link either specific outputs, or can opportunistically link all matching outputs by name and type by using `*`.
- There are no iteration/looping constructs. Most needs for this could be solved by either duplicating nodes or cloning sessions. This is open for discussion, but it is a difficult problem to solve in a way that doesn't make the code even more complex/confusing (especially regarding node ids and history).
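The append-only graph and link semantics above can be illustrated with a small sketch. The dict shapes and field names here are assumptions for illustration, not the actual session format:

```python
# Illustrative sketch (not the actual wire format) of building a session.
# Links are only added together with a node, and always point from an
# existing node (or a relative history reference such as "-1") to the
# newly added node.

session = {"nodes": {}, "links": []}

def add_node(session, node, links=()):
    """Append a node and its incoming links to the session graph."""
    session["nodes"][node["id"]] = node
    for src_node, src_field, dst_field in links:
        session["links"].append({
            "from": {"node": src_node, "field": src_field},
            "to": {"node": node["id"], "field": dst_field},
        })

add_node(session, {"id": "0", "type": "txt2img",
                   "prompt": "a cat eating sushi"})
# "-1" is a relative history link to the previously executed node; "*"
# opportunistically matches all compatible outputs by name and type.
add_node(session, {"id": "1", "type": "upscale"},
         links=[("-1", "*", "*")])
```

Because nodes are only ever appended and links only ever point backward, the graph stays acyclic by construction, which is what makes "invoke the next ready node" a well-defined operation.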
### Services
These make up the core of the invocation system, found in `/ldm/invoke/app/services`. One of the key design philosophies here is that most components should be replaceable when possible. For example, if someone wants to use cloud storage for their images, they should be able to replace the image storage service easily.
The services are broken down as follows (several of these are intentionally implemented with an initial simple/naïve approach):
- Invoker: Responsible for creating and executing sessions and managing services used to do so.
- Session Manager: Manages session history. An on-disk implementation is provided, which stores sessions as JSON files on disk and caches recently used sessions for quick access.
- Image Storage: Stores images of multiple types. An on-disk implementation is provided, which stores images on disk and retains recently used images in an in-memory cache.
- Invocation Queue: Used to queue invocations for execution. An in-memory implementation is provided.
- Events: An event system, primarily used with socket.io to support future web UI integration.
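The "replaceable when possible" philosophy typically boils down to depending on an abstract interface rather than a concrete implementation. A minimal sketch of that idea for image storage follows; the class and method names are illustrative assumptions, not the actual service interfaces:

```python
from abc import ABC, abstractmethod

# Sketch of the replaceable-service idea: the invoker depends only on an
# abstract interface, so the on-disk image store could be swapped for a
# cloud-backed one without touching the rest of the system.

class ImageStorageBase(ABC):
    @abstractmethod
    def get(self, image_name: str) -> bytes: ...

    @abstractmethod
    def save(self, image_name: str, data: bytes) -> None: ...


class MemoryImageStorage(ImageStorageBase):
    """Trivial in-memory implementation. The provided implementation
    stores images on disk and keeps an in-memory cache instead; a cloud
    implementation would satisfy the same interface."""

    def __init__(self):
        self._images: dict[str, bytes] = {}

    def get(self, image_name: str) -> bytes:
        return self._images[image_name]

    def save(self, image_name: str, data: bytes) -> None:
        self._images[image_name] = data
```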
## Apps
Apps are available through the `/scripts/invoke-new.py` script (to be integrated/renamed).
### CLI
```shell
python scripts/invoke-new.py
```
Implements a simple CLI. The CLI creates a single session and automatically links all inputs to the previous node's output. Commands are automatically generated from all invocations, with command options generated from invocation inputs. Verbose help is available for the CLI and for each command. Additionally, the CLI supports command piping for single-line entry of multiple commands. Example:
```shell
> txt2img --prompt "a cat eating sushi" --steps 20 --seed 1234 | upscale | show_image
```
### API
```shell
python scripts/invoke-new.py --api --host 0.0.0.0
```
Implements an API using FastAPI, with Socket.io support for signaling. API documentation is available at http://localhost:9090/docs or http://localhost:9090/redoc. This includes OpenAPI schemas for all available invocations, session interaction APIs, and image APIs. Socket.io signals are per-session and can be subscribed to by session id. These aren't currently auto-documented, though the code for event emission is centralized in `/ldm/invoke/app/services/events.py`.
A very simple test HTML page and script are available at http://localhost:9090/static/test.html. This demonstrates creating a session from a graph, invoking it, and receiving signals from Socket.io.
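A client-side sketch of the "create a session from a graph" step might look like the following. The exact endpoint paths and payload schema should be taken from the generated docs at http://localhost:9090/docs; the field names below are illustrative assumptions, not the actual schema:

```python
import json

# Hypothetical graph payload for the session-creation API. The real
# schema is defined by the OpenAPI docs; this only shows the shape of
# the idea: named nodes plus edges from outputs to inputs.

graph = {
    "nodes": {
        "0": {"id": "0", "type": "txt2img",
              "prompt": "a cat eating sushi"},
        "1": {"id": "1", "type": "upscale"},
    },
    "edges": [
        {"source": {"node_id": "0", "field": "image"},
         "destination": {"node_id": "1", "field": "image"}},
    ],
}

body = json.dumps(graph)
# A real client would now POST `body` to the session-creation endpoint,
# read the session id from the response, subscribe to that id's
# Socket.io events, and then ask the API to invoke the session.
```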
## What's left?
- There are a number of features not currently covered by invocations. I kept the set of invocations small during core development in order to simplify refactoring as I went. Now that the invocation code has stabilized, I'd love some help filling those out!
- There's no image metadata generated yet. It would be fairly straightforward (and would make good sense) to serialize either a session-and-node reference or the entire node into an image. There are a lot of questions to answer around source images, linked images, etc., though. All of this history is also stored in the session, and with complex sessions the metadata in an image may lose its value. This needs some further discussion.
- We need a list of features (both current and future) that would be difficult to implement without looping constructs so we can have a good conversation around it. I'm really hoping we can avoid needing looping/iteration in the graph execution, since it'll necessitate separating an execution of a graph into its own concept/system, and will further complicate the system.
- The API likely needs further filling out to support the UI. I think using the new API for the current UI is possible, and potentially interesting, since it could work like the new/demo CLI in a "single operation at a time" workflow. I don't know how compatible that will be with our UI goals though. It would be nice to support only a single API though.
- Deeper separation of systems. I intentionally tried to not touch Generate or other systems too much, but a lot could be gained by breaking those apart. Even breaking apart Args into two pieces (command line arguments and the parser for the current CLI) would make it easier to maintain. This is probably in the future though.