label-studio icon indicating copy to clipboard operation
label-studio copied to clipboard

Multi-page annotation ML backend

Open quangvhhe140489 opened this issue 2 years ago • 5 comments

I want to make with ~8 images at once This is the view I want to use, how can I modify the tesseract.py API backend for ML-assisted labeling of the example? The biggest problem I'm facing is finding the idx of the image and returning the result This is urgent, please help me :(

<View>
  <Repeater on="$document" indexFlag="{{idx}}" mode="pagination">
      <Image name="page_{{idx}}" value="$document[{{idx}}].page" inline="true"/>

    	<Labels name="labels_{{idx}}" toName="page_{{idx}}">
            <Label value="Text" background="green"/>
            <Label value="Handwriting" background="blue"/>
      	</Labels>
    
      <Rectangle name="bbox_{{idx}}" smart="true" toName="page_{{idx}}" strokeWidth="3"/>
		
      <TextArea name="transcription_{{idx}}" toName="page_{{idx}}"
            editable="true"
            perRegion="true"
            required="true"
            maxSubmissions="1"
            rows="5"
            placeholder="Recognized Text"
            displayMode="region-list"
            />

      <View style="display: flex; justify-content: center;">
        <Header value=" Rate this page:"/>
        <Rating name="rating_{{idx}}" toName="page_{{idx}}"/>
      </View>
  </Repeater>
</View>


<!-- {
"document": [
    {
      "page": "https://htx-pub.s3.amazonaws.com/demo/images/demo_stock_purchase_agreement/0001.jpg"
    },
    {
      "page": "https://htx-pub.s3.amazonaws.com/demo/images/demo_stock_purchase_agreement/0002.jpg"
    },
    {
      "page": "https://htx-pub.s3.amazonaws.com/demo/images/demo_stock_purchase_agreement/0003.jpg"
    }
  ]
} -->

quangvhhe140489 avatar Apr 26 '23 15:04 quangvhhe140489

Hi @quangvhhe140489 You need to change this line to much your config. Just check annotation created manually and make your prediction alike.

KonstantinKorotaev avatar Apr 28 '23 09:04 KonstantinKorotaev

@KonstantinKorotaev

Yes, I see, I need to change the name to "transcription_1" for page 1. However, how can I get the page I'm annotating? image

image The second thing I need to change is the img_path_url to task["data"]["document"]["page"][idx], right?

quangvhhe140489 avatar Apr 30 '23 15:04 quangvhhe140489

I'm ok with the manually creation of the annotation, but the input data to the tesseract model with me, it seems a hidden box.

I have tried multiple times, however, the idx I get from the kwargs.get("context") not stable when sometimes I got, another time, I get None :(

quangvhhe140489 avatar May 01 '23 11:05 quangvhhe140489

I have tried multiple times, however, the idx I get from the kwargs.get("context") not stable when sometimes I got, another time, I get None :(

That's weird, could you please send several examples?

KonstantinKorotaev avatar May 02 '23 14:05 KonstantinKorotaev

@quangvhhe140489 To answer your question below:

"how can I get the page I'm annotating?"

My answer:

idx = context.get('result')[-1]['to_name'].split("_")[-1]

so you get the page idx you are annotating

fdejax90 avatar Apr 17 '24 15:04 fdejax90