Optimize Memory Search: Normalize entity names and observation contents to lowercase on add, create and delete for consistent comparisons

Open: bernoussama opened this issue 8 months ago • 1 comment

Description

Normalizes entity and relation properties (names, entity types, relation types, and observations) to lowercase before saving, so that search behavior in the knowledge graph is consistent. This prevents duplicate entries that differ only by case and improves search accuracy.
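A minimal sketch of normalize-on-write, assuming the `Entity` and `Relation` shapes that appear in the review's snippets further down. `normalizeEntity` and `normalizeRelation` are hypothetical helper names, not functions exported by the memory server.

```typescript
// Assumed shapes, mirroring the fields used in the snippets in this thread.
interface Entity {
  name: string;
  entityType: string;
  observations: string[];
}

interface Relation {
  from: string;
  to: string;
  relationType: string;
}

// Hypothetical helper: lowercase every searchable field of an entity before saving.
function normalizeEntity(e: Entity): Entity {
  return {
    name: e.name.toLowerCase(),
    entityType: e.entityType.toLowerCase(),
    observations: e.observations.map((o) => o.toLowerCase()),
  };
}

// Hypothetical helper: lowercase both endpoints and the relation type before saving.
function normalizeRelation(r: Relation): Relation {
  return {
    from: r.from.toLowerCase(),
    to: r.to.toLowerCase(),
    relationType: r.relationType.toLowerCase(),
  };
}

const savedEntity = normalizeEntity({
  name: "JohnDoe",
  entityType: "Person",
  observations: ["Likes TypeScript"],
});
const savedRelation = normalizeRelation({ from: "JohnDoe", to: "Acme", relationType: "WorksAt" });
console.log(savedEntity, savedRelation);
```

Because stored values are already lowercase under this scheme, a search only has to lowercase the query once per call.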

Server Details

Server: memory
Changes to: tools (knowledge graph functionality)

Motivation and Context

Previously, searchNodes lowercased every entity name, type, and observation on each call, which is unnecessary repeated work. With this change, all entity and relation properties are stored in lowercase, so search only needs to lowercase the query. This improves data integrity and search reliability.

How Has This Been Tested?

Not tested

Breaking Changes

No breaking changes. This is a non-breaking improvement to internal data handling.

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

Checklist

[x] I have read the MCP Protocol Documentation
[x] My changes follow MCP security best practices
[x] I have tested this with an LLM client
[x] My code follows the repository's style guidelines
[x] New and existing tests pass locally
[x] I have added appropriate error handling

Additional context

This change affects how entity and relation data is stored internally but maintains backward compatibility with existing client interactions. The normalized approach provides more consistent behavior for user queries.

May 13 '25 23:05 bernoussama

This PR introduces a breaking change that will affect existing users with mixed-case data in their memory.jsonl files. Most users of this MCP server already have memory.json or memory.jsonl files, and this PR does not account for them.

Problem

The changes normalize all new entities, relations, and observations to lowercase when saving, but existing data remains in mixed case. Specifically:

New data: All entity names, types, relations, and observations are converted to lowercase before saving
Existing data: Remains in original mixed case
Search impact: searchNodes no longer lowercases stored values, so it effectively expects lowercase data and will fail to find existing mixed-case entries

This creates an inconsistent state where:

New entities with name "JohnDoe" are saved as "johndoe"
Existing entities with name "JohnDoe" remain as "JohnDoe"
Searching for "john" will only find the new entity, not the existing one
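The failure mode above can be reproduced with a small, self-contained sketch. It assumes the post-PR search lowercases only the query and compares it against stored fields as-is. `searchLowercaseOnly` is a hypothetical stand-in, not the server's actual searchNodes.

```typescript
interface Entity {
  name: string;
  entityType: string;
  observations: string[];
}

// Hypothetical stand-in for the post-PR search: only the query is lowercased,
// because stored data is assumed to be lowercase already.
function searchLowercaseOnly(entities: Entity[], query: string): Entity[] {
  const q = query.toLowerCase();
  return entities.filter(
    (e) =>
      e.name.includes(q) ||
      e.entityType.includes(q) ||
      e.observations.some((o) => o.includes(q)),
  );
}

const storedEntities: Entity[] = [
  // Written before the PR: original casing preserved.
  { name: "JohnDoe", entityType: "Person", observations: [] },
  // Written after the PR: normalized on save.
  { name: "johndoe", entityType: "person", observations: [] },
];

const hits = searchLowercaseOnly(storedEntities, "John");
console.log(hits.map((e) => e.name)); // [ 'johndoe' ]
```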

Impact

Users with existing memory.jsonl files will experience:

Broken search functionality for pre-existing entities and relations
Duplicate entities may be created (e.g., both "JohnDoe" and "johndoe")
Data fragmentation between old mixed-case and new lowercase entries
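The duplicate-entity risk can be shown with a sketch of the create path. It assumes the path lowercases incoming entities and then skips names that already exist using an exact, case-sensitive comparison; `createEntitiesLowercased` is a hypothetical name, not the server's function.

```typescript
interface Entity {
  name: string;
  entityType: string;
  observations: string[];
}

// Hypothetical create path modeled on the PR: incoming entities are lowercased,
// then any name that already exists (compared exactly, case-sensitively) is skipped.
function createEntitiesLowercased(stored: Entity[], incoming: Entity[]): Entity[] {
  const normalized = incoming.map((e) => ({
    name: e.name.toLowerCase(),
    entityType: e.entityType.toLowerCase(),
    observations: e.observations.map((o) => o.toLowerCase()),
  }));
  const fresh = normalized.filter((e) => !stored.some((s) => s.name === e.name));
  return [...stored, ...fresh];
}

// Legacy entry written before the PR, with its original casing.
const legacyEntities: Entity[] = [
  { name: "JohnDoe", entityType: "Person", observations: ["Works at Acme"] },
];

// A client re-creates the same entity after upgrading.
const afterCreate = createEntitiesLowercased(legacyEntities, [
  { name: "JohnDoe", entityType: "Person", observations: [] },
]);
console.log(afterCreate.map((e) => e.name)); // [ 'JohnDoe', 'johndoe' ]
```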

Recommended Solutions

Option 1: Migration with Versioning (Recommended). Add proper migration logic to handle the transition:

// Add to beginning of file
const CURRENT_MEMORY_VERSION = "1.0";

// Modify loadGraph method
private async loadGraph(): Promise<KnowledgeGraph> {
  try {
    const data = await fs.readFile(MEMORY_FILE_PATH, "utf-8");
    const lines = data.split("\n").filter(line => line.trim() !== "");
    
    // Check for version line
    const versionLine = lines.find(line => {
      try {
        const parsed = JSON.parse(line);
        return parsed.type === "memory-version";
      } catch { return false; }
    });
    
    if (!versionLine) {
      // Migrate and add version
      const graph = this.migrateToLowercase(lines);
      await this.saveGraphWithVersion(graph);
      return graph;
    }
    
    // Normal loading for versioned files - skip version line
    return lines.reduce((graph: KnowledgeGraph, line) => {
      try {
        const item = JSON.parse(line);
        if (item.type === "entity") graph.entities.push(item as Entity);
        if (item.type === "relation") graph.relations.push(item as Relation);
        // Skip memory-version lines
      } catch (error) {
        // Skip malformed lines
      }
      return graph;
    }, { entities: [], relations: [] });
  } catch (error) {
    if (error instanceof Error && 'code' in error && (error as any).code === "ENOENT") {
      return { entities: [], relations: [] };
    }
    throw error;
  }
}

// Add migration helper methods
private migrateToLowercase(lines: string[]): KnowledgeGraph {
  return lines.reduce((graph: KnowledgeGraph, line) => {
    try {
      const item = JSON.parse(line);
      if (item.type === "entity") {
        graph.entities.push({
          name: item.name.toLowerCase(),
          entityType: item.entityType.toLowerCase(),
          observations: item.observations.map((obs: string) => obs.toLowerCase())
        });
      }
      if (item.type === "relation") {
        graph.relations.push({
          from: item.from.toLowerCase(),
          to: item.to.toLowerCase(),
          relationType: item.relationType.toLowerCase()
        });
      }
    } catch (error) {
      // Skip malformed lines
    }
    return graph;
  }, { entities: [], relations: [] });
}

private async saveGraphWithVersion(graph: KnowledgeGraph): Promise<void> {
  const lines = [
    // Add version line first
    JSON.stringify({ type: "memory-version", version: CURRENT_MEMORY_VERSION }),
    // Then add all entities and relations
    ...graph.entities.map(e => JSON.stringify({ type: "entity", ...e })),
    ...graph.relations.map(r => JSON.stringify({ type: "relation", ...r })),
  ];
  await fs.writeFile(MEMORY_FILE_PATH, lines.join("\n"));
}

// Modify existing saveGraph to maintain version if it exists
private async saveGraph(graph: KnowledgeGraph): Promise<void> {
  try {
    const data = await fs.readFile(MEMORY_FILE_PATH, "utf-8");
    const lines = data.split("\n").filter(line => line.trim() !== "");
    
    // Check if version already exists
    const hasVersion = lines.some(line => {
      try {
        const parsed = JSON.parse(line);
        return parsed.type === "memory-version";
      } catch { return false; }
    });
    
    if (hasVersion) {
      await this.saveGraphWithVersion(graph);
    } else {
      // Legacy format - no version line
      const legacyLines = [
        ...graph.entities.map(e => JSON.stringify({ type: "entity", ...e })),
        ...graph.relations.map(r => JSON.stringify({ type: "relation", ...r })),
      ];
      await fs.writeFile(MEMORY_FILE_PATH, legacyLines.join("\n"));
    }
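  } catch (error) {
    // Only a missing file means "new install"; rethrow real I/O errors
    // (including failures from the writes above) instead of retrying blindly.
    if (!(error instanceof Error && 'code' in error && (error as any).code === "ENOENT")) {
      throw error;
    }
    // File doesn't exist - save with version
    await this.saveGraphWithVersion(graph);
  }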
}
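The migration above lowercases each line independently, so entities that differed only by case (for example "JohnDoe" and "johndoe") end up as two entries named "johndoe", and relations can become exact repeats. A possible cleanup step, assuming that merging means keeping the first entityType seen and taking the union of observations; `dedupeGraph` is a hypothetical helper, not part of the proposal.

```typescript
interface Entity {
  name: string;
  entityType: string;
  observations: string[];
}
interface Relation {
  from: string;
  to: string;
  relationType: string;
}
interface KnowledgeGraph {
  entities: Entity[];
  relations: Relation[];
}

// Hypothetical post-migration step: collapse entities whose lowercased names
// collide and drop relations that became exact duplicates.
function dedupeGraph(graph: KnowledgeGraph): KnowledgeGraph {
  const byName = new Map<string, Entity>();
  for (const e of graph.entities) {
    const existing = byName.get(e.name);
    if (existing === undefined) {
      // Copy so the input graph is not mutated; drop repeated observations.
      byName.set(e.name, { ...e, observations: Array.from(new Set(e.observations)) });
    } else {
      // Keep the first entityType; union observations, preserving order.
      for (const o of e.observations) {
        if (!existing.observations.includes(o)) existing.observations.push(o);
      }
    }
  }

  const seen = new Set<string>();
  const relations = graph.relations.filter((r) => {
    // JSON of a fixed-order tuple gives an unambiguous key.
    const key = JSON.stringify([r.from, r.to, r.relationType]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });

  return { entities: Array.from(byName.values()), relations };
}

const migrated: KnowledgeGraph = {
  entities: [
    { name: "johndoe", entityType: "person", observations: ["works at acme"] },
    { name: "johndoe", entityType: "person", observations: ["works at acme", "likes tea"] },
  ],
  relations: [
    { from: "johndoe", to: "acme", relationType: "works_at" },
    { from: "johndoe", to: "acme", relationType: "works_at" },
  ],
};
console.log(JSON.stringify(dedupeGraph(migrated)));
```

In the class above it could wrap the migration, e.g. `const graph = dedupeGraph(this.migrateToLowercase(lines));` inside loadGraph, so the versioned file is written without collisions.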

Option 2: Load-time Normalization. Convert everything to lowercase when loading (simpler, but the normalization runs on every load instead of once):

private async loadGraph(): Promise<KnowledgeGraph> {
  // ... existing loading logic that builds `graph: KnowledgeGraph` ...
  
  // Normalize all loaded data to lowercase
  graph.entities = graph.entities.map(e => ({
    ...e,
    name: e.name.toLowerCase(),
    entityType: e.entityType.toLowerCase(),
    observations: e.observations.map(obs => obs.toLowerCase())
  }));
  
  graph.relations = graph.relations.map(r => ({
    ...r,
    from: r.from.toLowerCase(),
    to: r.to.toLowerCase(),
    relationType: r.relationType.toLowerCase()
  }));
  
  return graph;
}

Option 3: Restore Search Compatibility (Minimal change). Keep the search function backward-compatible by restoring the .toLowerCase() calls:

async searchNodes(query: string): Promise<KnowledgeGraph> {
  const graph = await this.loadGraph();
  const queryLower = query.toLowerCase();
  
  const filteredEntities = graph.entities.filter(e => 
    e.name.toLowerCase().includes(queryLower) ||
    e.entityType.toLowerCase().includes(queryLower) ||
    e.observations.some(o => o.toLowerCase().includes(queryLower))
  );
  // ... rest of function
}
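A self-contained version of the Option 3 entity filter, assuming the same entity shape as above. Because stored fields are lowercased at query time, both legacy mixed-case entries and newly normalized ones match. `searchEntitiesCaseInsensitive` is a hypothetical name, and the relation filtering done by the elided "rest of function" is not reproduced.

```typescript
interface Entity {
  name: string;
  entityType: string;
  observations: string[];
}

// Case-insensitive match: lowercase both the query and each stored field.
function searchEntitiesCaseInsensitive(entities: Entity[], query: string): Entity[] {
  const queryLower = query.toLowerCase();
  return entities.filter(
    (e) =>
      e.name.toLowerCase().includes(queryLower) ||
      e.entityType.toLowerCase().includes(queryLower) ||
      e.observations.some((o) => o.toLowerCase().includes(queryLower)),
  );
}

const mixedEntities: Entity[] = [
  { name: "JohnDoe", entityType: "Person", observations: [] },
  { name: "johndoe", entityType: "person", observations: [] },
];

const found = searchEntitiesCaseInsensitive(mixedEntities, "John");
console.log(found.map((e) => e.name)); // [ 'JohnDoe', 'johndoe' ]
```

The trade-off is the one the PR set out to remove: every search lowercases every stored field again.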

Recommendation

Option 1 (Migration with Versioning) is the most robust solution as it:

Provides a clean upgrade path for existing users
Maintains performance after migration
Sets up infrastructure for future schema changes
Ensures data consistency

If that's too complex for this PR, Option 3 provides immediate backward compatibility with minimal code changes.
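The "infrastructure for future schema changes" point can be made concrete with a small sketch: a registry of one-step migrations keyed by the version they upgrade from, applied until the data reaches `CURRENT_MEMORY_VERSION`. Only `CURRENT_MEMORY_VERSION = "1.0"` comes from Option 1; the `Migration` registry, the `"0"` label for unversioned files, and `upgradeGraph` are hypothetical.

```typescript
interface Entity {
  name: string;
  entityType: string;
  observations: string[];
}
interface Relation {
  from: string;
  to: string;
  relationType: string;
}
interface KnowledgeGraph {
  entities: Entity[];
  relations: Relation[];
}

const CURRENT_MEMORY_VERSION = "1.0";

// Hypothetical registry: each entry upgrades a graph from one version to the next.
// "0" stands for the legacy, unversioned file format.
interface Migration {
  to: string;
  apply: (graph: KnowledgeGraph) => KnowledgeGraph;
}

const MIGRATIONS: Record<string, Migration> = {
  "0": {
    to: "1.0",
    apply: (g) => ({
      entities: g.entities.map((e) => ({
        name: e.name.toLowerCase(),
        entityType: e.entityType.toLowerCase(),
        observations: e.observations.map((o) => o.toLowerCase()),
      })),
      relations: g.relations.map((r) => ({
        from: r.from.toLowerCase(),
        to: r.to.toLowerCase(),
        relationType: r.relationType.toLowerCase(),
      })),
    }),
  },
};

// Apply migrations one step at a time until the graph reaches the current version.
function upgradeGraph(graph: KnowledgeGraph, version: string): { graph: KnowledgeGraph; version: string } {
  let current = graph;
  let v = version;
  while (v !== CURRENT_MEMORY_VERSION) {
    const step = MIGRATIONS[v];
    if (step === undefined) throw new Error(`No migration registered from version ${v}`);
    current = step.apply(current);
    v = step.to;
  }
  return { graph: current, version: v };
}

const upgraded = upgradeGraph(
  { entities: [{ name: "JohnDoe", entityType: "Person", observations: [] }], relations: [] },
  "0",
);
console.log(upgraded.version, upgraded.graph.entities[0].name); // 1.0 johndoe
```

With this shape, a later schema change adds one registry entry (from "1.0" to the next version) rather than another special case in loadGraph.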

Sep 21 '25 15:09 nhickster