Hubert
Variable-length recognition problem
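Hello, I ran variable-length tests with the open-source model you provided and ran into two problems:
**1. Variable image width:** with transformer = dataset.resizeNormalize((280, 32)), any width other than 280 raises an error. CRNN handles this by fixing the height at 32 and scaling the width proportionally, so the input is (x, 32).
**2. Variable text length:** possibly because every training sample contained 10 characters, the prediction always comes out at around 10 characters, no matter how many characters the image actually contains? For example, after removing a few characters from the image and still feeding it in at 280*32 for recognition, the results look like this:
predict_str:,__不愿意意意资 (9 characters) => prob:0.002346405293792486
predict_str:中国通信信位主办、《 (10 characters) => prob:0.05960559844970703
predict_str:,(通信学会主主府 (9 characters) => prob:0.000349084148183465
predict_str:叶国通信学会主里”《 (10 characters) =>...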
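The proportional scaling described in point 1 can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `scaled_size` and the rounding choice are assumptions, and only the target height of 32 comes from the post.

```python
from typing import Tuple


def scaled_size(orig_w: int, orig_h: int, target_h: int = 32) -> Tuple[int, int]:
    """Return (width, height) with the height fixed and the aspect ratio kept."""
    if orig_w <= 0 or orig_h <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale the width by the same factor that maps orig_h to target_h.
    new_w = max(1, round(orig_w * target_h / orig_h))
    return new_w, target_h


if __name__ == "__main__":
    # A 560x64 crop maps to the 280x32 size used in the post.
    print(scaled_size(560, 64))
    # An image that is already 32 pixels high keeps its width.
    print(scaled_size(100, 32))
```

The resulting tuple could be passed where the fixed (280, 32) is used today. Whether the model then accepts widths other than 280 depends on its architecture, which the post does not show.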
I have the images and masks and can build the "BitMasks", but transfiner needs another ground truth, namely poly_masks: https://github.com/SysCV/transfiner/blob/5b61fb53d8df5484f44c8b7d8415f398fd283ddc/detectron2/data/detection_utils.py#L437 I don't know how to express the instances with mask_format=='polygon'. my mask...
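For reference, the COCO-style polygon format is a flat list `[x0, y0, x1, y1, ...]` per polygon, with one list of polygons per instance. The sketch below converts contour points into that layout. The helper names `contour_to_polygon` and `instance_polygons` are hypothetical, and the contour points are assumed to come from a separate tracing step (for example OpenCV's `cv2.findContours`), which is not shown here.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def contour_to_polygon(points: Sequence[Point]) -> Optional[List[float]]:
    """Flatten (x, y) points into [x0, y0, x1, y1, ...]; None if degenerate."""
    # A valid polygon needs at least 3 points, i.e. 6 coordinates.
    if len(points) < 3:
        return None
    flat: List[float] = []
    for x, y in points:
        flat.extend([float(x), float(y)])
    return flat


def instance_polygons(contours: Sequence[Sequence[Point]]) -> List[List[float]]:
    """Build the polygon list for one instance, dropping degenerate contours."""
    polys = [contour_to_polygon(c) for c in contours]
    return [p for p in polys if p is not None]


if __name__ == "__main__":
    triangle = [(0, 0), (4, 0), (4, 3)]
    stray_pixel = [(1, 1)]
    print(instance_polygons([triangle, stray_pixel]))
```

One instance may hold several polygons when its mask consists of disconnected regions, which is why the per-instance value is a list of lists.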
Following the documentation step by step, the feature-extraction stage saves three kinds of data to the pkl: video_features = { 'image_feature': np_image_features, 'audio_feature': np_audio_features, 'pcm_feature': np_pcm_features } However, get_instance_for_bmn.py does not use audio_feature: feature_video = np.concatenate((image_feature, pcm_feature), axis=1) Also, train_proposal/configs/bmn_football_v2.0.yaml contains feat_dim: 2688 #train bmn with image feature. If add audio...
python -m scripts.animate --config configs/prompts/v1/v1-1-ToonYou.yaml
File "/export/software/anaconda3/envs/animatediff/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3370, in _pad
encoded_inputs["attention_mask"] = encoded_inputs["attention_mask"] + [0] * difference
OverflowError: cannot fit 'int' into an index-sized integer
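One common way to reach this error is padding to a tokenizer's `model_max_length` when that value was never configured: transformers then falls back to a very large sentinel (`int(1e30)`), and building a padding list of that length overflows. Whether that is what happened in this run is an assumption, since the traceback does not show the value of `difference`. The pure-Python sketch below only reproduces the failing list multiplication; `pad_attention_mask` and `try_pad` are illustrative names, not library functions.

```python
# Stand-in for an unset model_max_length (assumed cause, not confirmed by the log).
VERY_LARGE_INTEGER = int(1e30)


def pad_attention_mask(mask, max_length):
    """Mimic the failing line: right-pad `mask` with zeros up to `max_length`."""
    difference = max_length - len(mask)
    return mask + [0] * difference


def try_pad(mask, max_length):
    """Return the padded mask, or the error text if the padding overflows."""
    try:
        return pad_attention_mask(mask, max_length)
    except OverflowError as exc:
        return f"OverflowError: {exc}"


if __name__ == "__main__":
    # A sensible maximum length pads normally.
    print(try_pad([1, 1, 1], 5))
    # The sentinel is too large to be used as a list length.
    print(try_pad([1, 1, 1], VERY_LARGE_INTEGER))
```

If this is the cause, checking the tokenizer's `model_max_length` and passing an explicit `max_length` to the tokenizer call would be a reasonable first step.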