api/ml/{id}/train does not trigger training
Describe the bug
Per the documentation, a POST to api/ml/{id}/train should start the training process on the ML backend, which in turn should call the fit function of the LabelStudioMLBase model. However, when I call this endpoint, even via curl, there is no response from the label-studio server, and the fit method does not get triggered.
Here is the current version of the docker-compose.yml file for my project:
version: "3.8"
services:
  redis:
    image: redis:alpine
    container_name: redis
    hostname: redis
    volumes:
      - "./data/redis:/data"
    expose:
      - 6379
  labeling:
    container_name: labeling_container
    image: heartexlabs/label-studio:v1.5.0
    ports:
      - 8080:8080
    depends_on:
      - modeling
    volumes:
      - ./data:/label-studio/data
    environment:
      - LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
      - LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=/label-studio/data/media
    command: >
      bash -c "
      label-studio start
      --log-level DEBUG
      --sampling prediction-score-min
      --ml-backends http://modeling_container:9090"
    restart: always
  modeling:
    container_name: modeling_container
    build:
      context: ./modeling
    command: >
      bash -c "
      label-studio-ml init modeling_backend
      --script tools/${MODEL:-model.py}
      --force true
      &&
      label-studio-ml start ./modeling_backend
      --port 9090
      --debug "
    restart: always
    volumes:
      - ./data/media:/data/
    environment:
      - MODEL_DIR=/data/models
      - RQ_QUEUE_NAME=default
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - USE_REDIS=true
    ports:
      - 9090:9090
    depends_on:
      - redis
    links:
      - redis
Here is my model.py file for the ML backend.
from importlib.resources import path
import torch
import torch.nn as nn
import torch.optim as optim
import time
import os
import numpy as np
import requests
import io
import hashlib
import urllib
import cv2
import pathlib
import urllib.parse as urlparse
from skimage import io, color
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import models, transforms
from label_studio_ml.model import LabelStudioMLBase
from label_studio_ml.utils import get_single_tag_keys, get_choice, is_skipped
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
import layoutparser as lp
image_cache_dir = os.path.join(os.path.dirname(__file__), 'image-cache')
os.makedirs(image_cache_dir, exist_ok=True)
def load_image_from_url(url):
    # is_local_file = url.startswith('http://localhost:') and '/data/' in url
    # purl = pathlib.Path(url)
    pres = urlparse.urlparse(url)
    if pres.scheme == '':
        purl = pathlib.Path(url)
        url = purl.as_uri()
    im = io.imread(url)
    if len(im.shape) < 3:
        # needs to be converted to rgb
        im = color.gray2rgb(im)
    return im
def convert_block_to_value(block, image_height, image_width):
    return {
        "height": block.height / image_height*100,
        "choices": [str(block.type)],
        "rotation": 0,
        "width": block.width / image_width*100,
        "x": block.coordinates[0] / image_width*100,
        "y": block.coordinates[1] / image_height*100,
        "score": block.score
    }
class ObjectDetectionAPI(LabelStudioMLBase):

    def __init__(self, freeze_extractor=False, **kwargs):
        super(ObjectDetectionAPI, self).__init__(**kwargs)
        # label_map_list = os.environ['LABEL_MAP'].split()
        # {int(label_map_list[i]): str(label_map_list[i+1]) for i in range(0, len(label_map_list), 2)}
        print('parsed label config:\n ')
        print(self.parsed_label_config)
        self.from_name, self.to_name, self.value, self.classes =\
            get_single_tag_keys(self.parsed_label_config, 'RectangleLabels', 'Image')
        self.freeze_extractor = freeze_extractor
        self.model = lp.Detectron2LayoutModel(
            config_path = 'lp://detectron2/PrimaLayout/mask_rcnn_R_50_FPN_3x/config',
            # model_path = 'https://www.dropbox.com/s/bitxe8occzb865u/model_final.pth?dl=1',
            ### PLEASE REMEMBER TO CHANGE `dl=0` INTO `dl=1` IN THE END
            ### OF DROPBOX LINKS
            extra_config=["MODEL.ROI_HEADS.NMS_THRESH_TEST", 0.2,
                          "MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
            label_map={0: "text"}
        )

    def reset_model(self):
        # self.model = ImageClassifier(len(self.classes), self.freeze_extractor)
        pass

    def predict(self, tasks, **kwargs):
        # print('tasks: ', tasks)
        print(kwargs)
        print('self.value: ', self.value)
        image_urls = [task['data'][self.value] for task in tasks]
        print('image urls: ', image_urls)
        images = [load_image_from_url(url) for url in image_urls]
        print('im sizes: ', [im.shape for im in images])
        layouts = [self.model.detect(image) for image in images]
        print('label config: ', self.parsed_label_config)
        print('layouts: ', layouts)
        predictions = []
        for image, layout in zip(images, layouts):
            height, width = image.shape[:2]
            result = [
                {
                    'from_name': self.from_name,
                    'to_name': self.to_name,
                    "original_height": height,
                    "original_width": width,
                    "source": "$image",
                    'type': 'rectanglelabels',
                    "value": convert_block_to_value(block, height, width),
                } for block in layout
            ]
            predictions.append({'result': result})
        return predictions

    def fit(self, tasks, workdir=None,
            batch_size=32, num_epochs=10, **kwargs):
        print("now running the fit function....")
        image_urls, image_classes = [], []
        # print('Collecting completions...')
        # for completion in completions:
        #     if is_skipped(completion):
        #         continue
        #     image_urls.append(completion['data'][self.value])
        #     image_classes.append(get_choice(completion))
        print('tasks: ', tasks)
        print('image urls: ', image_urls)
        print('image classes: ', image_classes)
        # print('Creating dataset...')
        # dataset = ImageClassifierDataset(image_urls, image_classes)
        # dataloader = DataLoader(dataset, shuffle=True, batch_size=batch_size)
        # print('Train model...')
        # # self.reset_model()
        # self.model.train(dataloader, num_epochs=num_epochs)
        # print('Save model...')
        # model_path = os.path.join(workdir, 'model.pt')
        # self.model.save(model_path)
        return {'model_path': None, 'classes': None}
Right now, there isn't much in the fit function; I just wanted to make sure it was working. However, nothing gets printed to the logs of the modeling_container.
To Reproduce
Steps to reproduce the behavior:
- Log in to http://localhost:8080
- Create a new project (test)
- Add data and configuration. In my case I'm using rectangular bounding boxes.
- Add the ML backend in settings. Will need to use http://modeling_container:9090 since all containers are on the same docker-compose network.
- Add data/annotations
- The auto-predictions in this case do indeed work, triggering the predict function specified in model.py
- Go to Settings -> Machine Learning and click Start Training on the connected ML backend
- curl -X POST http://localhost:8080/api/ml/{id}/train -H 'Authorization: Token <token>' also does nothing.
Expected behavior
Code in the fit function should trigger when the curl command is launched or "Start Training" button is clicked.
Screenshots
Can provide if needed.
Environment (please complete the following information):
- OS: Ubuntu 18.04 running docker 20.10.17, build 100c701 and docker-compose v 1.29.1, build c34c88b
- Label Studio Version 1.5.0
Additional context
It's entirely possible that I'm not configuring the project correctly, so please let me know.
Hi @themantalope, could you please tell me your label-studio-ml-backend version?
@KonstantinKorotaev
Thanks for getting back to me. I'm using the current version, installed via pip install git+https://github.com/heartexlabs/label-studio-ml-backend. The current version used is 1.0.7.
I should also clarify. When using the curl command in step 8, I also do not get any response from the server.
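To tell a hung request apart from an empty reply, the same call can be made from Python with an explicit timeout. This is only a sketch: the helper names, the base URL, the backend id, and the token are placeholders, and it assumes nothing about what the endpoint returns beyond an HTTP status.

```python
import urllib.error
import urllib.request


def build_train_request(base_url, backend_id, token):
    """Build (without sending) the POST request for api/ml/{id}/train."""
    url = f"{base_url.rstrip('/')}/api/ml/{backend_id}/train"
    return urllib.request.Request(
        url,
        data=b"",  # empty request body
        method="POST",
        headers={"Authorization": f"Token {token}"},
    )


def trigger_training(base_url, backend_id, token, timeout=30):
    """Send the request; return (status, body). A timeout raises an exception."""
    request = build_train_request(base_url, backend_id, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses still carry a status and a body worth printing.
        return error.code, error.read().decode("utf-8", "replace")
```

Calling `print(trigger_training("http://localhost:8080", 1, "<token>"))` with a real backend id and token would then show either a status and body or an exception on timeout, which narrows down where the request stalls.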
Do you have any logs from Label Studio and from ML backend?
@KonstantinKorotaev
Here are the logs after clicking the "Start Training" button 3 times.
Congratulations! ML Backend has been successfully initialized in ./modeling_backend
Now start it by using:
label-studio-ml start ./modeling_backend
* Serving Flask app "label_studio_ml.api" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: on
[2022-06-29 01:54:08,277] [WARNING] [werkzeug::_log::225] * Running on all addresses.
WARNING: This is a development server. Do not use it in a production deployment.
[2022-06-29 01:54:08,277] [INFO] [werkzeug::_log::225] * Running on http://192.168.128.2:9090/ (Press CTRL+C to quit)
[2022-06-29 01:54:08,278] [INFO] [werkzeug::_log::225] * Restarting with stat
[2022-06-29 01:54:09,476] [WARNING] [werkzeug::_log::225] * Debugger is active!
[2022-06-29 01:54:09,477] [INFO] [werkzeug::_log::225] * Debugger PIN: 133-986-258
Congratulations! ML Backend has been successfully initialized in ./modeling_backend
Now start it by using:
label-studio-ml start ./modeling_backend
* Serving Flask app "label_studio_ml.api" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: on
[2022-06-29 15:00:29,793] [WARNING] [werkzeug::_log::225] * Running on all addresses.
WARNING: This is a development server. Do not use it in a production deployment.
[2022-06-29 15:00:29,794] [INFO] [werkzeug::_log::225] * Running on http://192.168.128.2:9090/ (Press CTRL+C to quit)
[2022-06-29 15:00:29,795] [INFO] [werkzeug::_log::225] * Restarting with stat
[2022-06-29 15:00:31,030] [WARNING] [werkzeug::_log::225] * Debugger is active!
[2022-06-29 15:00:31,032] [INFO] [werkzeug::_log::225] * Debugger PIN: 206-914-877
[2022-06-29 15:17:58,455] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:17:58] "GET /health HTTP/1.1" 200 -
parsed label config:
{'label': {'type': 'RectangleLabels', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'value': 'image'}], 'labels': ['text'], 'labels_attrs': {'text': {'value': 'text', 'background': '#FFA39E'}}}}
[2022-06-29 15:17:59,751] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:17:59] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:18:09,622] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:18:09,652] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:18:09,721] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:18:09,731] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:18:09,805] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:18:09,822] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:18:09,907] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:18:09,916] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:18:09,980] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:18:09,987] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:09] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:18:21,867] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:21] "POST /webhook HTTP/1.1" 201 -
[2022-06-29 15:18:21,895] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:21] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:18:21,903] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:21] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:18:23,139] [ERROR] [label_studio_ml.model::get_result::58]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/label_studio_ml/model.py", line 56, in get_result
job_result = self.get_result_from_job_id(model_version)
File "/usr/local/lib/python3.8/site-packages/label_studio_ml/model.py", line 110, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
parsed label config:
{'label': {'type': 'RectangleLabels', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'value': 'image'}], 'labels': ['text'], 'labels_attrs': {'text': {'value': 'text', 'background': '#FFA39E'}}}}
now running the fit function....
tasks: ()
image urls: []
image classes: []
[2022-06-29 15:18:36,306] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:36] "POST /webhook HTTP/1.1" 201 -
[2022-06-29 15:18:36,335] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:36] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:18:36,342] [INFO] [werkzeug::_log::225] 192.168.128.3 - - [29/Jun/2022 15:18:36] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:18:37,494] [ERROR] [label_studio_ml.model::get_result::58]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/label_studio_ml/model.py", line 56, in get_result
job_result = self.get_result_from_job_id(model_version)
File "/usr/local/lib/python3.8/site-packages/label_studio_ml/model.py", line 110, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
parsed label config:
{'label': {'type': 'RectangleLabels', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'value': 'image'}], 'labels': ['text'], 'labels_attrs': {'text': {'value': 'text', 'background': '#FFA39E'}}}}
now running the fit function....
tasks: ()
image urls: []
image classes: []
Sorry about that! I saw the error but forgot to include it in the initial description of the issue.
Also, looks like the fit function is actually getting triggered but it's not getting any tasks...
This could be due to the way I've set up the label-studio and label-studio-ml containers. I'm running all of them from a single docker-compose.yml file (specified in the problem description). When I restart the stack using docker-compose down; docker-compose up --build, the logs for the modeling_container are now showing a different output after clicking the "Start Training" button:
Congratulations! ML Backend has been successfully initialized in ./modeling_backend
Now start it by using:
label-studio-ml start ./modeling_backend
* Serving Flask app "label_studio_ml.api" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: on
[2022-06-29 15:32:33,354] [WARNING] [werkzeug::_log::225] * Running on all addresses.
WARNING: This is a development server. Do not use it in a production deployment.
[2022-06-29 15:32:33,355] [INFO] [werkzeug::_log::225] * Running on http://172.19.0.3:9090/ (Press CTRL+C to quit)
[2022-06-29 15:32:33,355] [INFO] [werkzeug::_log::225] * Restarting with stat
[2022-06-29 15:32:34,563] [WARNING] [werkzeug::_log::225] * Debugger is active!
[2022-06-29 15:32:34,564] [INFO] [werkzeug::_log::225] * Debugger PIN: 941-357-599
[2022-06-29 15:32:50,161] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:32:50] "GET /health HTTP/1.1" 200 -
parsed label config:
{'label': {'type': 'RectangleLabels', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'value': 'image'}], 'labels': ['text'], 'labels_attrs': {'text': {'value': 'text', 'background': '#FFA39E'}}}}
config.yaml?dl=1: 0.00B [00:00, ?B/s]
config.yaml?dl=1: 8.19kB [00:01, 5.55kB/s]
config.yaml?dl=1: 8.19kB [00:01, 5.54kB/s]
[2022-06-29 15:32:53,543] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:32:53] "GET /health HTTP/1.1" 200 -
parsed label config:
{'label': {'type': 'RectangleLabels', 'to_name': ['image'], 'inputs': [{'type': 'Image', 'value': 'image'}], 'labels': ['text'], 'labels_attrs': {'text': {'value': 'text', 'background': '#FFA39E'}}}}
[2022-06-29 15:32:59,047] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:32:59] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:32:59,595] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:32:59] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:33:55,655] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:33:55,664] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:33:55,681] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:33:55,689] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:33:55,697] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:33:55,702] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:33:55,735] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:33:55,747] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:55] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:33:58,813] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:58] "POST /webhook HTTP/1.1" 201 -
[2022-06-29 15:33:58,843] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:58] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:33:58,853] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:33:58] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:34:19,886] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:19] "POST /webhook HTTP/1.1" 201 -
[2022-06-29 15:34:19,911] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:19] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:34:19,920] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:19] "POST /setup HTTP/1.1" 200 -
[2022-06-29 15:34:22,335] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:22] "POST /webhook HTTP/1.1" 201 -
[2022-06-29 15:34:22,361] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:22] "GET /health HTTP/1.1" 200 -
[2022-06-29 15:34:22,372] [INFO] [werkzeug::_log::225] 172.19.0.4 - - [29/Jun/2022 15:34:22] "POST /setup HTTP/1.1" 200 -
EDIT:
The logs here do not show the fit function getting triggered. I wonder if this is because of label-studio-ml using its own docker-compose network?
> The logs here do not show the fit function getting triggered. I wonder if this is because of label-studio-ml using its own docker-compose network?
The log has webhook calls, please check this guide.
> Also, looks like the fit function is actually getting triggered but it's not getting any tasks...
Check the guide about training with webhooks. Here is an example of how you can get the annotated dataset.
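As a rough illustration of that approach: with webhook-based training the tasks argument can arrive empty (as in the log above), so fit has to pull the annotations itself. The sketch below assumes the webhook payload reaches fit as kwargs['data'] with a project id inside it, and that the backend can reach Label Studio at a hostname such as http://labeling_container:8080 with a valid API token. The helper names are made up for this example and are not part of label-studio-ml.

```python
import json
import urllib.request


def project_id_from_webhook(kwargs):
    """Return the project id carried by the webhook payload, or None."""
    data = kwargs.get("data") or {}
    project = data.get("project") or {}
    return project.get("id")


def export_url(hostname, project_id):
    """URL of the Label Studio JSON export for one project."""
    return f"{hostname.rstrip('/')}/api/projects/{project_id}/export?exportType=JSON"


def fetch_annotated_tasks(hostname, token, project_id, timeout=60):
    """Download the exported tasks (with annotations) as parsed JSON."""
    request = urllib.request.Request(
        export_url(hostname, project_id),
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def tasks_for_fit(tasks, hostname, token, **kwargs):
    """Use the tasks passed in, or fall back to the export when empty."""
    if tasks:
        return list(tasks)
    project_id = project_id_from_webhook(kwargs)
    if project_id is None:
        return []  # nothing to train on and no project to query
    return fetch_annotated_tasks(hostname, token, project_id)
```

Inside fit, something like `tasks = tasks_for_fit(tasks, hostname, token, **kwargs)` would then replace the empty tuple seen in the log. How the hostname and token are supplied to the backend is left open here.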
@KonstantinKorotaev
Thank you for the clarification. A webhook is also called when the user submits a POST to the api/ml/{id}/train endpoint. Is there a way to modify the webhook that is triggered when the user clicks the "Start Training" button, or if they submit a POST request to the api/ml/{id}/train endpoint? The options in the Label Studio webhook editing page are limited to events related to the creation/update/deletion of tasks, annotations, etc.
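One possible workaround, sketched below, is to filter on the event inside fit rather than in the webhook settings. It assumes the backend passes the webhook action to fit as kwargs['event'] and that the training button sends an action named START_TRAINING; neither is confirmed in this thread. The event set and the should_train helper are hypothetical, and fit here is a plain stand-in function rather than the LabelStudioMLBase method.

```python
# Events that should start a training run (hypothetical selection).
TRAIN_EVENTS = frozenset({"START_TRAINING"})


def should_train(kwargs, train_events=TRAIN_EVENTS):
    """Return True when the webhook event in kwargs is one we train on."""
    return kwargs.get("event") in train_events


def fit(tasks, workdir=None, **kwargs):
    """Stand-in for a fit method that skips unwanted webhook events."""
    if not should_train(kwargs):
        # Still return a dict: the traceback earlier in the thread shows
        # label_studio_ml asserting that a job result is a dict.
        return {"model_path": None, "classes": None}
    print("training triggered by", kwargs.get("event"))
    # ... real training would go here ...
    return {"model_path": workdir, "classes": None}
```

With this shape, annotation events would return immediately and only the selected events would reach the training code.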
> Is there a way to modify the webhook that is triggered when the user clicks the "Start Training" button, or if they submit a POST request to the api/ml/{id}/train endpoint?
What do you want to add there?