crewAI icon indicating copy to clipboard operation
crewAI copied to clipboard

When crewai memory=True, and many agents, os error: memory path too long

Open DarrenLynch opened this issue 1 year ago • 3 comments

OSError: [Errno 63] File name too long: '/Users/darrenlynch/Library/Application Support/[project name]/short_term/[some agent name] _ [some agent name] _ [some agent name] _ [some agent name] _ [some agent name] _ [some agent name] _ [some agent name]'

DarrenLynch avatar Jul 02 '24 18:07 DarrenLynch

If not already can you try on the latest version crewai-0.35.8

theCyberTech avatar Jul 03 '24 05:07 theCyberTech

I also get this on 0.36.0. Please advise.

ChristianJohnston97 avatar Jul 12 '24 06:07 ChristianJohnston97

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Aug 11 '24 12:08 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Aug 16 '24 12:08 github-actions[bot]

Suggested modification Update the init of RAGStorage key in rag_storage.py to use an MD5 signature instead of concatenated name def init(self, type, allow_reset=True, embedder_config=None, crew=None): super().init() if ( not os.getenv("OPENAI_API_KEY") and not os.getenv("OPENAI_BASE_URL") == "https://api.openai.com/v1" ): os.environ["OPENAI_API_KEY"] = "fake"

    # Create a hash of agent roles for memory path
    if crew and crew.agents:
        # Sort roles for consistent hashing across runs
        agent_roles = sorted([self._sanitize_role(agent.role) for agent in crew.agents])
        roles_str = "_".join(agent_roles)
        # Create a hash of the roles string
        memory_key = md5(roles_str.encode(), usedforsecurity=False).hexdigest()[:12]
    else:
        memory_key = "default"

    config = {
        "app": {
            "config": {"name": type, "collect_metrics": False, "log_level": "ERROR"}
        },
        "chunker": {
            "chunk_size": 5000,
            "chunk_overlap": 100,
            "length_function": "len",
            "min_chunk_size": 150,
        },
        "vectordb": {
            "provider": "chroma",
            "config": {
                "collection_name": type,
                "dir": f"{db_storage_path()}/{type}/{memory_key}",  # Use hashed memory key
                "allow_reset": allow_reset,
            },
        },
    }

dchevenement avatar Nov 15 '24 16:11 dchevenement

@dchevenement You are correct but the rag_storage.py code has been changed now.

Current code is something like this

    def __init__(self, type, allow_reset=True, embedder_config=None, crew=None, path=None):
        super().__init__(type, allow_reset, embedder_config, crew)
        agents = crew.agents if crew else []
        agents = [self._sanitize_role(agent.role) for agent in agents]
        agents = "_".join(agents)
        self.agents = agents

        self.type = type

        self.allow_reset = allow_reset
        self.path = path
        self._initialize_app()

    def _set_embedder_config(self):
        configurator = EmbeddingConfigurator()
        self.embedder_config = configurator.configure_embedder(self.embedder_config)

    def _initialize_app(self):
        import chromadb
        from chromadb.config import Settings

        self._set_embedder_config()
        chroma_client = chromadb.PersistentClient(
            path=self.path if self.path else f"{db_storage_path()}/{self.type}/{self.agents}",
            settings=Settings(allow_reset=self.allow_reset),
        )

so in the path folder f"{db_storage_path()}/{self.type}/{self.agents} is causing the issue as self.agents can be very long. a hack around that I have implemented is instead of self.agents, I am using the current time. It works but not sure if this is correct

The new line would be something like this

import time
curr_time= time.time()
path=self.path if self.path else f"{db_storage_path()}/{self.type}/{curr_time}"

dundermain avatar Jan 02 '25 19:01 dundermain

I would not recommend to use a time based code for the memory as it would change your memory for your crew depending on the time you launch your crew. Use md5 hash, it's keeping it constant for a given crew of agents as the string of agent role would remain constant accross several launch, hence the hash as well do not forget to had from hashlib import md5

def __init__(self, type, allow_reset=True, embedder_config=None, crew=None, path=None):
    super().__init__(type, allow_reset, embedder_config, crew)
    agents = crew.agents if crew else []
    if agents:
        agent_roles = "_".join(sorted([self._sanitize_role(agent.role) for agent in agents]))
        agents_hash = md5(agent_roles.encode(), usedforsecurity=False).hexdigest()[:12]
    else:
        agents_hash = "default"
    self.agents = agents_hash

    self.type = type

    self.allow_reset = allow_reset
    self.path = path
    self._initialize_app()

dchevenement avatar Jan 02 '25 23:01 dchevenement

Oh understood. Makes sense. I have integrated md5 hash now. Thanks a lot for the suggestion.

dundermain avatar Jan 03 '25 06:01 dundermain