topictiling

java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "java.util.Map.get(Object)" is null

Open schanz007 opened this issue 1 year ago • 0 comments

Hello there,

I am currently working on my master's thesis, in which I investigate different text segmentation algorithms and their benefits for preprocessing documents in Retrieval-Augmented Generation (RAG). For this, I wanted to run your TopicTiling method because it seems very promising. However, it throws a NullPointerException. As test input, I used several texts, such as the one from a previous issue and excerpts from the README file and the paper. I always get the same error.

I called the project with the following command:

sh topictiling.sh -ri 5 -tmd topicmodel -tmn model-final -fp "Test.txt" -fd files_to_segment -s

I am using Java 17.
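For reference: the log shows jgibbslda.Model failing to open topicmodel/model-final.others and then failing to load the word-topic assignment file. A quick sanity check is whether the directory passed via -tmd actually contains the files named after -tmn. Relative paths such as topicmodel are resolved against the current working directory, so the check should run from the directory in which topictiling.sh is started. The sketch below assumes the usual JGibbLDA/GibbsLDA++ output layout (`<name>.others`, `<name>.tassign`, `wordmap.txt`). That file list, the class `CheckTopicModel` and the method `missingModelFiles` are illustrative assumptions, not part of TopicTiling.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CheckTopicModel {

    // Returns the expected model files that do not exist as regular files.
    // ASSUMPTION: JGibbLDA/GibbsLDA++ naming (<name>.others, <name>.tassign, wordmap.txt);
    // this list is not taken from the TopicTiling documentation.
    public static List<Path> missingModelFiles(Path modelDir, String modelName) {
        List<Path> expected = List.of(
                modelDir.resolve(modelName + ".others"),
                modelDir.resolve(modelName + ".tassign"),
                modelDir.resolve("wordmap.txt"));
        List<Path> missing = new ArrayList<>();
        for (Path p : expected) {
            if (!Files.isRegularFile(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults mirror the -tmd and -tmn values from the command in this issue.
        Path modelDir = Paths.get(args.length > 0 ? args[0] : "topicmodel");
        String modelName = args.length > 1 ? args[1] : "model-final";
        List<Path> missing = missingModelFiles(modelDir, modelName);
        if (missing.isEmpty()) {
            System.out.println("All expected model files found in " + modelDir.toAbsolutePath());
        } else {
            missing.forEach(p -> System.out.println("Missing: " + p.toAbsolutePath()));
        }
    }
}
```

With Java 11 or later it can be run directly as `java CheckTopicModel.java topicmodel model-final`. Printing absolute paths makes it visible which working directory the relative path was resolved against.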

The full error output:

INFO: Found [1] resources to be read
The current version uses the Stanford segmenter for tokenization. However, this tokenizer does not play well on languages without any latin characters (e.g. Chinese, Arabic, Hebrew, Japanese, etc.). In order to segment such languages, segment the texts beforehand and use the parameter -s that disables the tokenization and expects all words segmented by white spaces.

Nov. 05, 2024 4:12:05 PM jgibbslda.Model readOthersFile(188)
WARNING: Error while reading other file:topicmodel/model-final.others (No such file or directory)
java.io.FileNotFoundException: topicmodel/model-final.others (No such file or directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:216)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:111)
    at java.base/java.io.FileReader.<init>(FileReader.java:60)
    at jgibbslda.Model.readOthersFile(Model.java:150)
    at jgibbslda.Model.loadModel(Model.java:254)
    at jgibbslda.Model.initEstimatedModel(Model.java:658)
    at jgibbslda.Inferencer.init(Inferencer.java:62)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.TopicTiling.<init>(TopicTiling.java:95)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.annotator.TopicTilingSegmenterAnnotator.process(TopicTilingSegmenterAnnotator.java:119)
    at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:375)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:296)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
    at org.uimafit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:223)
    at org.uimafit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:143)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.RunTopicTilingOnFile.<init>(RunTopicTilingOnFile.java:133)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.RunTopicTilingOnFile.main(RunTopicTilingOnFile.java:94)

Nov. 05, 2024 4:12:05 PM jgibbslda.Model initEstimatedModel(659)
WARNING: Fail to load word-topic assignment file of the model!

Nov. 05, 2024 4:12:05 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(407)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:391)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:296)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
    at org.uimafit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:223)
    at org.uimafit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:143)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.RunTopicTilingOnFile.<init>(RunTopicTilingOnFile.java:133)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.RunTopicTilingOnFile.main(RunTopicTilingOnFile.java:94)
Caused by: java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "java.util.Map.get(Object)" is null
    at jgibbslda.Inferencer.infSampling(Inferencer.java:184)
    at jgibbslda.Inferencer.inference(Inferencer.java:99)
    at jgibbslda.Inferencer.inference(Inferencer.java:126)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.TopicTiling.inference(TopicTiling.java:508)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.TopicTiling.getSimilarityScores(TopicTiling.java:366)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.TopicTiling.segment2(TopicTiling.java:150)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.TopicTiling.segment(TopicTiling.java:120)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.TopicTiling.segment(TopicTiling.java:111)
    at de.tudarmstadt.langtech.semantics.segmentation.segmenter.annotator.TopicTilingSegmenterAnnotator.process(TopicTilingSegmenterAnnotator.java:125)
    at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:375)
    ... 6 more
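For reference, the wording of the final exception comes from Java's helpful NullPointerException messages (added in Java 14, enabled by default since Java 15). It is produced when an `Integer` returned by `Map.get` is `null` and is then auto-unboxed to `int`. The trace does not show which map is read at `Inferencer.java:184`. One plausible reading, not confirmed here, is that the model was only partly loaded after the two warnings above, so a lookup during inference finds no entry. The sketch below only reproduces that unboxing pattern. `WORD_TO_ID`, `unsafeLookup` and `safeLookup` are hypothetical names, not JGibbLDA code.

```java
import java.util.HashMap;
import java.util.Map;

public class UnboxingNpeDemo {

    // Hypothetical word-to-id map, standing in for whatever map the inferencer reads.
    public static final Map<String, Integer> WORD_TO_ID =
            new HashMap<>(Map.of("topic", 0, "segment", 1));

    // Same pattern as the failing line: the Integer from Map.get is auto-unboxed to int,
    // which calls Integer.intValue() and throws a NullPointerException if the key is absent.
    public static int unsafeLookup(Map<String, Integer> map, String key) {
        int id = map.get(key);
        return id;
    }

    // Defensive variant: checks for null before unboxing and returns -1 for unknown keys.
    public static int safeLookup(Map<String, Integer> map, String key) {
        Integer id = map.get(key);
        return id != null ? id : -1;
    }

    public static void main(String[] args) {
        System.out.println(safeLookup(WORD_TO_ID, "topic"));   // prints 0
        System.out.println(safeLookup(WORD_TO_ID, "unknown")); // prints -1
        try {
            unsafeLookup(WORD_TO_ID, "unknown");
        } catch (NullPointerException e) {
            // On a recent JDK the message names Integer.intValue() and Map.get(Object).
            System.out.println(e.getMessage());
        }
    }
}
```

A null check (or `getOrDefault` when the map never stores null values) only hides the symptom. If the model files are incomplete, the underlying fix is to make the loader find them.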

schanz007, Nov 05 '24 16:11