
Error message for custom embedding model: 'NoneType' object has no attribute 'tokenize'

Open · Ccccx opened this issue on Aug 24, 2024 · 0 comments

LocalAI version: localai/localai:v2.19.4-cublas-cuda12

Environment, CPU architecture, OS, and Version:

  • Ubuntu 22.04
  • Cuda compilation tools, release 12.4, V12.4.131
  • Memory 64 GB, A10 GPU

Describe the bug

I set up the embeddings endpoint with a custom model. The service starts fine, but the request returns the following:

{
  "error": {
    "code": 500,
    "message": "rpc error: code = Unknown desc = Exception calling application: 'NoneType' object has no attribute 'tokenize'",
    "type": ""
  }
}

To Reproduce

My model configuration file:

name: text2vec-base-chinese
backend: sentencetransformers
embeddings: true
parameters:
  models: shibing624/text2vec-base-chinese
  model_name_or_path: /build/models/text2vec-base-chinese
  local_files_only: True
usage: |
    You can test this model with curl like this:

    curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
      "input": "你好啊(默认情况下,超过 256 个单词段的输入文本将被截断。)",
      "model": "text2vec-base-chinese"
    }'

Request content:

curl --location --request POST 'http://127.0.01:8080/embeddings' \
  --header 'User-Agent: Apifox/1.0.0 (https://apifox.com)' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "input": "你好啊",
    "model": "text2vec-base-chinese"
  }'
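For reference, the same request sent from Python (a minimal sketch using the requests library; the payload and host/port are the same as in the curl above):

import requests

# Same payload as the curl request above; json= sets the Content-Type header automatically.
resp = requests.post(
    "http://127.0.0.1:8080/embeddings",
    json={"input": "你好啊", "model": "text2vec-base-chinese"},
)
print(resp.status_code)
print(resp.json())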

Expected behavior

A valid embedding vector should be returned. I'm not sure whether this is a compatibility issue or a bug.

Logs

api_1 | 8:30AM DBG Request received: {"model":"text2vec-base-chinese","language":"","translate":false,"n":0,"top_p":null,"top_k":null,"temperature":null,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"repeat_last_n":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","size":"","prompt":null,"instruction":"","input":"hello world","stop":null,"messages":null,"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"backend":"","model_base_name":""}
api_1 | 8:30AM DBG guessDefaultsFromFile: not a GGUF file
api_1 | 8:30AM DBG Parameter Config: &{PredictionOptions:{Model: Language: Translate:false N:0 TopP:0xc000e4d988 TopK:0xc000e4d990 Temperature:0xc000e4d998 Maxtokens:0xc000e4d9c8 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000e4d9c0 TypicalP:0xc000e4d9b8 Seed:0xc000e4d9e0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:text2vec-base-chinese F16:0xc000e4d980 Threads:0xc000e4d978 Debug:0xc0004bc850 Roles:map[] Embeddings:0xc000e4d96d Backend:sentencetransformers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:} PromptStrings:[] InputStrings:[hello world] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000e4d9b0 MirostatTAU:0xc000e4d9a8 Mirostat:0xc000e4d9a0 NGPULayers:0xc000e4d9d0 MMap:0xc000e4d9d8 MMlock:0xc000e4d9d9 LowVRAM:0xc000e4d9d9 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000e4d970 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: VallE:{AudioPath:}} CUDA:false DownloadFiles:[] Description: Usage:You can test this model with curl like this:
api_1 |
api_1 | curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
api_1 | "input": "你好啊(默认情况下,超过 256 个单词段的输入文本将被截断。)",
api_1 | "model": "text2vec-base-chinese"
api_1 | }'
api_1 | }
api_1 | 8:30AM INF Loading model with backend sentencetransformers
api_1 | 8:30AM DBG Model already loaded in memory:
api_1 | 8:30AM DBG GRPC(-127.0.0.1:35523): stderr Calculated embeddings for: hello world
api_1 | 8:30AM ERR Server error error="rpc error: code = Unknown desc = Exception calling application: 'NoneType' object has no attribute 'tokenize'" ip=123.161.203.27 latency=2.440391ms method=POST status=500 url=/embeddings
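If it helps to isolate the problem, here is a minimal sketch (outside LocalAI) that loads the same model directly with the sentence-transformers library, just to check that the files under /build/models/text2vec-base-chinese are usable on their own. The path is taken from my config above; I'm assuming the sentencetransformers backend ultimately wraps this library:

from sentence_transformers import SentenceTransformer

# Path from model_name_or_path in the config above (assumed to be mounted the same way).
model = SentenceTransformer("/build/models/text2vec-base-chinese")

# encode() tokenizes internally, so this should exercise the same tokenize step
# that fails in the error message.
embedding = model.encode(["你好啊"])
print(embedding.shape)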

Ccccx · Aug 24 '24 08:08