KAG product-side integration issue: index build fails with an error

Open lpdswing opened this issue 5 months ago • 0 comments

  • Following the official documentation, I tried to add the schema-constrained extraction extractor to kag_index_manager.py
@KAGIndexManager.register("spo_graph_index")
class KAGConstraintSPOIndexManager(KAGIndexManager):
    @property
    def name(self):
        return "Schema-constrained graph index manager"

    @property
    def description(self) -> str:
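        return "It first extracts entities and relations from the text to build a graph, then links text chunks to graph nodes. At retrieval time it combines the graph's structured query capability (CS/FR Retriever) with vector retrieval over text, giving more precise retrieval with stronger reasoning. It is especially suited to complex question-answering tasks."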

    @property
    def schema(self) -> str:
        return """
Chunk(文本块): IndexType
     properties:
        content(内容): Text
          index: TextAndVector  

KnowledgeUnit(知识点): IndexType
     properties:
        structedContent(结构化文本): Text
          index: TextAndVector
        ontology(本体): Text
        desc(描述): Text
            index: TextAndVector
        relatedQuery(关联问): AtomicQuery
        extendedKnowledge(关联外扩知识点):Text
        content(内容): Text
            index: TextAndVector
        knowledgeType(知识类型): Text

AtomicQuery(原子问): IndexType
  properties:
    title(标题): Text
      index: TextAndVector
  relations:
    sourceChunk(关联文本块): Chunk
    similar(相似问题): AtomicQuery
    relatedTo(相关): KnowledgeUnit      
        """

    @property
    def index_cost(self) -> str:
        msg = """
        Index construction cost:
        
        Unknown
        """
        return msg

    @property
    def applicable_scenarios(self) -> str:
        return """
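        **Applicable scenarios**: Complex questions that require deep reasoning and association analysis. It can understand the structure of a question and run joint queries across the knowledge graph and text chunks, handling scenarios that need cross-domain knowledge or multi-hop reasoning.

        **Retrieval flow (multiple paths in parallel)**:
        - **Path 1 (FR)**: `kg_fr_retriever(query)` -> free-text retrieval that recalls relevant text chunks.
        - **Path 2 (CS)**: `kg_cs_retriever(logic_form)` -> exact retrieval in the graph, based on the question's logical structure.
        - **Path 3 (RC)**: `vector_chunk_retriever(query)` -> pure vector retrieval, used as supplementary recall.
        
        Finally, the results from all paths are fused to provide the most comprehensive basis for the answer.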
        """

    @property
    def retrieval_method(self) -> str:
        return "Builds links between chunks and the graph to enable retrieval over both the graph and chunks; generally used to retrieve chunks related to the graph."

    @classmethod
    def build_extractor_config(
        cls, llm_config: Dict, vectorize_model_config: Dict, **kwargs
    ):
        kb_task_project_id = kwargs.get(KAGConstants.KAG_QA_TASK_CONFIG_KEY, None)
        return [
            {
                "type": "schema_constraint_extractor",
                "llm": llm_config,
                "ner_prompt": {"type": "spg_entity"},
                "std_prompt": {"type": "default_std"},
                "relation_prompt": {"type": "spg_relation"},
                "event_prompt": {"type": "spg_event"},
                "kag_qa_task_config_key": kb_task_project_id,
            }
        ]

    @classmethod
    def build_retriever_config(
        cls, llm_config: Dict, vectorize_model_config: Dict, **kwargs
    ):
        kb_task_project_id = kwargs.get(KAGConstants.KAG_QA_TASK_CONFIG_KEY, None)
        return [
            {
                "type": "kg_cs_open_spg",
                "path_select": {
                    "type": "exact_one_hop_select",
                    "vectorize_model": vectorize_model_config,
                    "search_api": {
                        "type": "openspg_search_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "graph_api": {
                        "type": "openspg_graph_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "kag_qa_task_config_key": kb_task_project_id,
                },
                "entity_linking": {
                    "type": "entity_linking",
                    "recognition_threshold": 0.9,
                    "exclude_types": ["Chunk"],
                    "vectorize_model": vectorize_model_config,
                    "search_api": {
                        "type": "openspg_search_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "graph_api": {
                        "type": "openspg_graph_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "kag_qa_task_config_key": kb_task_project_id,
                },
                "std_schema": {
                    "type": "default_std_schema",
                    "vectorize_model": vectorize_model_config,
                    "search_api": {
                        "type": "openspg_search_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "kag_qa_task_config_key": kb_task_project_id,
                },
                "llm": llm_config,
                "kag_qa_task_config_key": kb_task_project_id,
            },
            {
                "type": "kg_fr_knowledge_unit",
                "top_k": 20,
                "search_api": {
                    "type": "openspg_search_api",
                    "kag_qa_task_config_key": kb_task_project_id,
                },
                "graph_api": {
                    "type": "openspg_graph_api",
                    "kag_qa_task_config_key": kb_task_project_id,
                },
                "path_select": {
                    "type": "fuzzy_one_hop_select",
                    "llm_client": llm_config,
                    "vectorize_model": vectorize_model_config,
                    "search_api": {
                        "type": "openspg_search_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "graph_api": {
                        "type": "openspg_graph_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "kag_qa_task_config_key": kb_task_project_id,
                },
                "ppr_chunk_retriever_tool": {
                    "type": "ppr_chunk_retriever",
                    "llm_client": llm_config,
                    "vectorize_model": vectorize_model_config,
                    "search_api": {
                        "type": "openspg_search_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "graph_api": {
                        "type": "openspg_graph_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "kag_qa_task_config_key": kb_task_project_id,
                    "ner": {
                        "type": "ner",
                        "kag_qa_task_config_key": kb_task_project_id,
                        "ner_prompt": {
                            "type": "default_question_ner",
                            "kag_qa_task_config_key": kb_task_project_id,
                        },
                        "std_prompt": {"type": "default_std"},
                        "llm_module": llm_config,
                    },
                },
                "entity_linking": {
                    "type": "entity_linking",
                    "recognition_threshold": 0.8,
                    "exclude_types": ["Chunk"],
                    "vectorize_model": vectorize_model_config,
                    "search_api": {
                        "type": "openspg_search_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "graph_api": {
                        "type": "openspg_graph_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "kag_qa_task_config_key": kb_task_project_id,
                },
                "std_schema": {
                    "type": "default_std_schema",
                    "vectorize_model": vectorize_model_config,
                    "search_api": {
                        "type": "openspg_search_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "kag_qa_task_config_key": kb_task_project_id,
                },
                "llm": llm_config,
                "kag_qa_task_config_key": kb_task_project_id,
            },
            {
                "type": "rc_open_spg",
                "search_api": {
                    "type": "openspg_search_api",
                    "kag_qa_task_config_key": kb_task_project_id,
                },
                "vector_chunk_retriever": {
                    "type": "vector_chunk_retriever",
                    "vectorize_model": vectorize_model_config,
                    "score_threshold": 0.65,
                    "search_api": {
                        "type": "openspg_search_api",
                        "kag_qa_task_config_key": kb_task_project_id,
                    },
                    "kag_qa_task_config_key": kb_task_project_id,
                },
                "vectorize_model": vectorize_model_config,
                "top_k": 20,
                "kag_qa_task_config_key": kb_task_project_id,
            },
        ]
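
Deeply nested configs like the one above are easy to mistype by hand, and Python accepts an empty string as a dict key without complaint. The following is a minimal sketch of a sanity check that walks a nested list/dict config and reports any blank keys. The helper name `find_blank_keys` and the sample config are hypothetical and not part of KAG.

```python
from typing import Any, Iterator


def find_blank_keys(node: Any, path: str = "") -> Iterator[str]:
    """Yield a path for every dict key that is empty or whitespace-only."""
    if isinstance(node, dict):
        for key, value in node.items():
            if not str(key).strip():
                # Report where the blank key lives rather than the key itself.
                yield f"{path}.<blank>" if path else "<blank>"
            child = f"{path}.{key}" if path else str(key)
            yield from find_blank_keys(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_blank_keys(item, f"{path}[{i}]")


# Hypothetical sample shaped like a retriever config; not taken from KAG.
sample = [
    {
        "type": "kg_fr_knowledge_unit",
        "std_schema": {"type": "default_std_schema", "": 1},
    }
]
problems = list(find_blank_keys(sample))
print(problems)
```

A check like this could be run on the return value of `build_retriever_config` before the config is handed to the framework, so that a malformed entry is caught locally instead of at index build time.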

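I created a new project on the web page, modified the custom schema, and then uploaded a document, at which point the error below appeared.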

Create Index

2025-11-15 16:28:09(10.199.0.5): Task scheduling completed. cost:38 ms !
2025-11-15 16:28:09(10.199.0.5): Lock released successfully!
2025-11-15 16:28:09(10.199.0.5): Scheduler execute failed with error:com.antgroup.openspg.core.schema.model.SchemaException: property: eventTime is defined by system, no need to define or alter
        at com.antgroup.openspg.core.schema.model.SchemaException.alterError(SchemaException.java:34)
        at com.antgroup.openspg.server.biz.schema.impl.SchemaManagerImpl.alterSchema(SchemaManagerImpl.java:81)
        at com.antgroup.openspg.server.biz.schema.impl.SchemaManagerImpl$$FastClassBySpringCGLIB$$d616d11c.invoke(<generated>)
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
        at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:708)
        at com.antgroup.openspg.server.biz.schema.impl.SchemaManagerImpl$$EnhancerBySpringCGLIB$$3c0fdfcf.alterSchema(<generated>)
        at com.antgroup.openspg.server.core.scheduler.service.task.sync.builder.RetrievalSyncTask.submit(RetrievalSyncTask.java:91)
        at com.antgroup.openspg.server.core.scheduler.service.task.sync.SyncTaskExecuteTemplate.execute(SyncTaskExecuteTemplate.java:25)
        at com.antgroup.openspg.server.core.scheduler.service.task.TaskExecuteTemplate.executeEntry(TaskExecuteTemplate.java:53)
        at com.antgroup.openspg.server.core.scheduler.service.engine.impl.SchedulerExecuteServiceImpl.executeTask(SchedulerExecuteServiceImpl.java:161)
        at com.antgroup.openspg.server.core.scheduler.service.engine.impl.SchedulerExecuteServiceImpl.lambda$executeInstance$2(SchedulerExecuteServiceImpl.java:144)
        at java.util.ArrayList.forEach(ArrayList.java:1259)
        at com.antgroup.openspg.server.core.scheduler.service.engine.impl.SchedulerExecuteServiceImpl.executeInstance(SchedulerExecuteServiceImpl.java:144)
        at com.antgroup.openspg.server.core.scheduler.service.engine.impl.SchedulerExecuteServiceImpl.executeInstance(SchedulerExecuteServiceImpl.java:132)
        at com.antgroup.openspg.server.core.scheduler.service.api.impl.SchedulerServiceImpl.lambda$executeJob$0(SchedulerServiceImpl.java:124)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: property: eventTime is defined by system, no need to define or alter
        at com.antgroup.openspg.server.core.schema.service.alter.check.PropertyChecker.checkBuiltInProperty(PropertyChecker.java:205)
        at com.antgroup.openspg.server.core.schema.service.alter.check.PropertyChecker.check(PropertyChecker.java:68)
        at com.antgroup.openspg.server.core.schema.service.alter.check.BaseSpgTypeChecker.check(BaseSpgTypeChecker.java:62)
        at com.antgroup.openspg.server.core.schema.service.alter.check.SchemaAlterChecker.check(SchemaAlterChecker.java:47)
        at com.antgroup.openspg.server.core.schema.service.alter.stage.PreProcessStage.execute(PreProcessStage.java:65)
        at com.antgroup.openspg.server.core.schema.service.alter.SchemaAlterPipeline.run(SchemaAlterPipeline.java:40)
        at com.antgroup.openspg.server.biz.schema.impl.SchemaManagerImpl.alterSchema(SchemaManagerImpl.java:79)
        ... 24 more

2025-11-15 16:28:09(10.199.0.5): update index(spo_graph_index) schema
2025-11-15 16:28:09(10.199.0.5): update index schema index_ids:[7]
2025-11-15 16:28:09(10.199.0.5): Lock preempted successfully!
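
The exception says that `eventTime` is defined by the system and must not be defined or altered. One possible cause, not confirmed in this report, is that the custom schema edited on the page declares `eventTime` itself. As a rough sketch under that assumption, schema text can be scanned for reserved property names before it is submitted. The function name, the demo schema, and the assumption that properties are indented lines of the form `name(label): Type` are all illustrative. Only `eventTime` is taken from the log, and any other reserved names would have to be added by hand.

```python
import re
from typing import FrozenSet, List, Tuple

# Only "eventTime" is confirmed by the log; extend this set if other
# system-defined properties turn up.
RESERVED_PROPERTIES: FrozenSet[str] = frozenset({"eventTime"})

# Matches declarations such as "    eventTime(event time): Text".
_DECLARATION = re.compile(r"^(\s*)(\w+)\s*\(.*?\)\s*:")


def find_reserved_properties(
    schema_text: str, reserved: FrozenSet[str] = RESERVED_PROPERTIES
) -> List[Tuple[int, str]]:
    """Return (line number, name) for each indented declaration using a reserved name."""
    hits = []
    for lineno, line in enumerate(schema_text.splitlines(), start=1):
        match = _DECLARATION.match(line)
        # Assumption: type declarations start at column 0 and properties are indented.
        if match and match.group(1) and match.group(2) in reserved:
            hits.append((lineno, match.group(2)))
    return hits


# Hypothetical schema fragment, used only to exercise the check.
demo = """
Meeting(meeting): EventType
    properties:
        eventTime(event time): Text
        location(location): Text
"""
hits = find_reserved_properties(demo)
print(hits)
```

If the scan reports a hit, removing that property from the custom schema and rebuilding the index would be a reasonable first experiment.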

lpdswing, Nov 15 '25 08:11