
Add progress reporting to add_episode_bulk function

Open ghosnandre opened this issue 9 months ago • 7 comments

Feature Request: Progress Logging for add_episode_bulk Function in Python SDK

**Summary**

The `add_episode_bulk` function currently lacks progress reporting or structured logging beyond what is visible at `logging.DEBUG`. For large ingest jobs, it is difficult to track progress or provide feedback to the end user.

If add_episode_bulk is the recommended method for handling large-scale ingestion, there should be a user-friendly way to monitor the process. This is particularly important in long-running tasks where users expect visibility into progress and status.

**Current Behavior**

- Logs are only visible at `logging.DEBUG`, and even then they do not report meaningful progress (e.g., step-by-step status or item counts).
- No structured callbacks, hooks, or updates are available during processing.

**Docstring Reference**

According to the function docstring, `add_episode_bulk` performs the following steps:

Notes This method performs several steps including:

  • Saving all episodes to the database
  • Retrieving previous episode context for each new episode
  • Extracting nodes and edges from all episodes
  • Generating embeddings for nodes and edges
  • Deduplicating nodes and edges
  • Saving nodes, episodic edges, and entity edges to the knowledge graph

**Proposed Improvement**

Introduce progress updates during each major step. Ideally, this would include:

- The current step name (e.g., "Generating embeddings")
- The current episode index (e.g., "Episode 12 of 300")
- Optionally: time estimates, batch-level status, or percentage complete

**Possible Implementation Ideas**

- Emit structured logs with step- and item-level progress
- Add an optional `progress_callback(step: str, current: int, total: int)` argument
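To make the `progress_callback` idea concrete, here is one way the argument could behave. `ingest_with_progress` is a hypothetical stand-in for the bulk method, not graphiti's actual implementation:

```python
from typing import Callable, Optional

# Callback signature proposed above: (step_name, current, total) -> None
ProgressCallback = Callable[[str, int, int], None]

def ingest_with_progress(
    episodes: list,
    progress_callback: Optional[ProgressCallback] = None,
) -> None:
    # Hypothetical wrapper: process each episode and report progress.
    total = len(episodes)
    for i, _episode in enumerate(episodes, start=1):
        # ... real per-episode work would happen here ...
        if progress_callback is not None:
            progress_callback("Processing episodes", i, total)
```

A caller could then drive a progress bar, a log line, or a websocket update from the same hook, e.g. `ingest_with_progress(items, lambda s, c, t: print(f"{s}: {c}/{t}"))`.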

**Impact**

This would significantly improve usability for those integrating `add_episode_bulk` into automated pipelines or user-facing tools, and reduce the need for custom wrappers just to add visibility.

ghosnandre avatar Jul 19 '25 14:07 ghosnandre

Another nice feature for progress tracking would be a checkpoint feature like in ML training where you can save your current state and resume processing later.

Again, this is useful for big jobs where you might want to stop processing for some reason and resume later without having to start over from zero.
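A minimal sketch of what such a checkpoint could look like, persisting only the index of the next unprocessed episode to a JSON file (all names here are illustrative; graphiti has no such feature today):

```python
import json
from pathlib import Path

def load_checkpoint(path: Path) -> int:
    # Return the index of the next episode to process (0 if no checkpoint exists).
    if path.exists():
        return json.loads(path.read_text())["next_index"]
    return 0

def save_checkpoint(path: Path, next_index: int) -> None:
    path.write_text(json.dumps({"next_index": next_index}))

def ingest_resumable(episodes: list, checkpoint_path: Path) -> list:
    # Process episodes starting from the last checkpoint, saving progress
    # after each one so an interrupted run can pick up where it left off.
    processed = []
    start = load_checkpoint(checkpoint_path)
    for i in range(start, len(episodes)):
        processed.append(episodes[i])  # placeholder for real per-episode work
        save_checkpoint(checkpoint_path, i + 1)
    return processed
```

A real implementation would also need to checkpoint intermediate artifacts (extracted nodes, embeddings) rather than just an index, but even index-level resume would help for crash recovery on large batches.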

ghosnandre avatar Jul 19 '25 15:07 ghosnandre

I agree, that would be exceedingly helpful. We're not even there yet with our app; we are only just learning that `add_episode_bulk()` returns `None`. There's not even a way to get UUIDs back from a bulk add (shocking).

Copilot confirms: "Currently, based on the implementation of add_episode_bulk() in graphiti_core/graphiti.py, the function does not return any value (it returns None). To have add_episode_bulk return the UUIDs (or any details about the episodes added), you would need to modify the source code of graphiti_core itself."

Any support with this would be greatly appreciated.
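Until the library returns identifiers, one possible workaround is to generate UUIDs client-side and attach them to each episode payload before calling the bulk method, so the caller keeps a record regardless of the `None` return. This assumes the episode model accepts a caller-supplied `uuid` field, which would need to be verified against graphiti's actual episode types; the dict shape below is purely illustrative:

```python
import uuid

def prepare_episodes(raw_items: list) -> tuple:
    # Attach a client-generated UUID to each episode payload so the caller
    # retains the identifiers even if the bulk call returns None.
    # The {"uuid": ..., "content": ...} shape is a stand-in, not graphiti's model.
    episodes = [{"uuid": str(uuid.uuid4()), "content": item} for item in raw_items]
    uuids = [e["uuid"] for e in episodes]
    return episodes, uuids
```

The caller would then pass `episodes` to the bulk method and keep `uuids` for later lookups.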

kcsf avatar Aug 05 '25 15:08 kcsf

@ghosnandre Is this still relevant? Please confirm within 14 days or this issue will be closed.

claude[bot] avatar Oct 05 '25 00:10 claude[bot]

> @ghosnandre Is this still relevant? Please confirm within 14 days or this issue will be closed.

Yes. As the recommended function for bulk data ingestion, which is usually long-running, there needs to be a progress reporting mechanism we can relay to users. Without one, this is unusable.

ghosnandre avatar Oct 05 '25 06:10 ghosnandre

@ghosnandre Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] avatar Oct 22 '25 00:10 claude[bot]

> @ghosnandre Is this still an issue? Please confirm within 14 days or this issue will be closed.

yes

ghosnandre avatar Oct 22 '25 03:10 ghosnandre

@ghosnandre Is this still relevant? Please confirm within 14 days or this issue will be closed.

claude[bot] avatar Nov 17 '25 00:11 claude[bot]