kenho211
I have the same error, but the cache directory is iopath_cache instead of fvcore_cache.
When trying to avoid duplicating text by discarding overlapping bounding boxes, a threshold of 80% is used (intersection of bounding boxes over the first bounding box in the pair). In...
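To make the overlap rule concrete, here is a minimal sketch of "intersection over the first bounding box" with the 80% threshold. The `(x_min, y_min, x_max, y_max)` box layout, the function names, and the choice of `>=` for the comparison are my assumptions for illustration, not the project's actual code.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def overlap_over_first(a: Box, b: Box) -> float:
    """Intersection area of a and b, divided by the area of a (the first box)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    # A degenerate first box has no area, so treat it as non-overlapping.
    return (inter_w * inter_h) / area_a if area_a > 0 else 0.0


def is_duplicate(a: Box, b: Box, threshold: float = 0.8) -> bool:
    """Hypothetical helper: flag the pair when the ratio reaches the threshold."""
    return overlap_over_first(a, b) >= threshold
```

Note that this measure is not symmetric: a small box fully inside a large one gives a ratio of 1.0 when the small box comes first, but a much lower ratio when the large box comes first, so the order of the pair matters.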
Yes, I am using a unified annotation format (labelling all text in the same text line as one bbox). Thank you for the suggestion on modifying the post-processing params.
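I am reading through the tips in the documentation (https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/finetune.md): 1. The pre-trained models provided by PP-OCR generalize well. 2. Adding a small amount of real data (>=500 images for detection, >=5000 images for recognition) greatly improves detection and recognition in domain-specific scenarios. 3. Adding real general-scene data during fine-tuning can further improve model accuracy and generalization. 4. For image detection, increasing the prediction scale of the image can further improve detection of smaller text regions. 5. During fine-tuning, the hyperparameters need to be adjusted appropriately (learning rate and batch size are the most important) to get better fine-tuning results. For point 2, is 真实数据 ("real data") referring to scene text...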
Same issue on Ubuntu 18.04 using the Docker image python:3.8.12.
https://github.com/m-bain/whisperX/pull/795 seems to have fixed this
`from_pretrained` accepts either a Hugging Face model name or a model file path.
I encountered another error for audio without speech. It is not the same one as in https://github.com/SYSTRAN/faster-whisper/pull/973 ` File "/home/ubuntu/.local/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 362, in transcribe clip_timestamps = merge_segments(active_segments, vad_parameters) File "/home/ubuntu/.local/lib/python3.10/site-packages/faster_whisper/vad.py", line 315, in...
> Hello, what is the use case where it needs to be supplied as `None`? From https://github.com/openai/whisper/pull/676, the use case of `None` should be to conduct a majority vote for...