Jayanta Nath
Hi Shrini, this is a proposal to run this script from http://tools.wmflabs.org, so that it will be OS-independent.
jayanta@jayanta-Inspiron-3541:~/OCR2$ python do_ocr.py
INFO:__main__:Running do_ocr.py 1.50
INFO:root:Operating System = "Ubuntu 14.04.3 LTS"
INFO:__main__:URL = https://upload.wikimedia.org/wikipedia/commons/e/ea/%E0%A6%AC%E0%A6%BF%E0%A6%B6%E0%A7%8D%E0%A6%AC%E0%A6%95%E0%A7%8B%E0%A6%B7_%E0%A6%B7%E0%A6%B7%E0%A7%8D%E0%A6%A0_%E0%A6%96%E0%A6%A3%E0%A7%8D%E0%A6%A1.djvu
INFO:__main__:Columns = 1
INFO:__main__:Wiki Username = JoyBot
INFO:__main__:Wiki Password = Not logging the password
INFO:__main__:Wiki...
INFO:__main__: uploading page_00001.pdf to google Drive.
INFO:__main__:Running gdput.py -t ocr -f 0B1OpcVV-_vRSSzZIeElmRE9fMlE page_00004.pdf | tee page_00004.log
Traceback (most recent call last):
  File "/usr/local/bin/gdput.py", line 252, in
Uploading file: page_00001.pdf...
I have observed that every time I re-run do_ocr to complete the full OCR, the script splits the PDF into single pages again, although all the single pages are already present in the folder. Can it...
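A minimal sketch of how the repeated split could be avoided, under stated assumptions: before splitting, check whether every per-page PDF is already in the folder. The names `missing_pages` and `needs_split` and the `total_pages` parameter are hypothetical and not taken from do_ocr.py; only the `page_00001.pdf` naming pattern comes from the logs.

```python
import os


def missing_pages(folder, total_pages):
    """Return the per-page PDF names that are not yet present in folder.

    Assumes pages are named page_00001.pdf, page_00002.pdf, ... as in the logs.
    """
    expected = ["page_%05d.pdf" % n for n in range(1, total_pages + 1)]
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder, name))]


def needs_split(folder, total_pages):
    """True only when at least one per-page PDF is missing."""
    return bool(missing_pages(folder, total_pages))
```

With a check like this, a re-run could call `needs_split(".", 723)` first and go straight to the upload step when it returns False.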
Today I ran it on a 723-page book, and the same two pages get stuck every time.
=========ERROR===========
INFO:__main__:Missing page_00099.txt
INFO:__main__:page_00099.pdf should be reuploaded
INFO:__main__:Missing page_00267.txt
INFO:__main__:page_00267.pdf should be reuploaded
INFO:__main__: Text files...
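As a rough sketch, the stuck pages could be collected from the log text so that only those files are uploaded again. The function name `pages_to_reupload` is hypothetical and not part of do_ocr.py; the only thing assumed about the log is the "Missing page_NNNNN.txt" wording shown in the error report.

```python
import re

# Matches log entries such as "INFO:__main__:Missing page_00099.txt".
MISSING_RE = re.compile(r"Missing (page_\d+)\.txt")


def pages_to_reupload(log_text):
    """Return the sorted PDF names whose text output is reported missing."""
    return sorted({m.group(1) + ".pdf" for m in MISSING_RE.finditer(log_text)})
```

The returned names could then be passed one by one to the upload step instead of re-running the whole book.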
convert.im6: unable to open image `charit.pdf': No such file or directory @ error/blob.c/OpenBlob/2638.
convert.im6: no images defined `shrini-%03d.jpg' @ error/convert.c/ConvertImageCommand/3044
bub has been down for the past few days.
Proposal: Uploads to IA should be done through the user's own account during Wildcard entry, so that the user may later change the metadata as necessary.
:~$ sudo apt-get install ibus-avro
[sudo] password for .....:
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package ibus-avro