xtuner
[Refactor] Refactor HTTP request error handling
Add exception-handling logic for HTTP server requests:
- timeout error / request error: retry
- client error: skip this data item
- server error: the controller counts the errors of each worker; once a worker's count exceeds global_batch_size * 0.1, that worker is marked as deactivated
- unknown error: raise RuntimeError
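The policy above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code from this PR. `ErrorKind`, `HttpStatusError`, `classify`, `WorkerErrorController`, `send_with_policy` and `max_retries` are hypothetical names. Timeouts and connection failures are modeled with the builtin `TimeoutError` / `ConnectionError`, and 4xx / 5xx status codes stand in for client / server errors. The sketch also assumes two things the description leaves open: a server error skips the current item after being counted, and running out of retries raises `RuntimeError`.

```python
import enum
from collections import defaultdict
from typing import Any, Callable, Optional


class ErrorKind(enum.Enum):
    """Hypothetical categories mirroring the four cases in the list."""
    TIMEOUT_OR_REQUEST = "retry"
    CLIENT = "skip"
    SERVER = "count"
    UNKNOWN = "raise"


class HttpStatusError(Exception):
    """Hypothetical exception carrying the HTTP status code of a failed call."""
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def classify(exc: Exception) -> ErrorKind:
    # Timeouts and connection-level failures are treated as transient.
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return ErrorKind.TIMEOUT_OR_REQUEST
    if isinstance(exc, HttpStatusError):
        if 400 <= exc.status < 500:
            return ErrorKind.CLIENT
        if 500 <= exc.status < 600:
            return ErrorKind.SERVER
    return ErrorKind.UNKNOWN


class WorkerErrorController:
    """Counts server errors per worker; deactivates a worker once its count
    exceeds global_batch_size * 0.1 (the threshold stated in the PR text)."""
    def __init__(self, global_batch_size: int, ratio: float = 0.1):
        self.threshold = global_batch_size * ratio
        self.error_counts: dict = defaultdict(int)
        self.deactivated: set = set()

    def report_server_error(self, worker_id: str) -> None:
        self.error_counts[worker_id] += 1
        if self.error_counts[worker_id] > self.threshold:
            self.deactivated.add(worker_id)

    def is_active(self, worker_id: str) -> bool:
        return worker_id not in self.deactivated


def send_with_policy(request_fn: Callable[[], Any], worker_id: str,
                     controller: WorkerErrorController,
                     max_retries: int = 3) -> Optional[Any]:
    """Return the response, or None when the data item should be skipped."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except Exception as exc:
            kind = classify(exc)
            if kind is ErrorKind.TIMEOUT_OR_REQUEST:
                if attempt < max_retries:
                    continue  # transient failure: retry the same request
                # Assumption: giving up after max_retries raises.
                raise RuntimeError(f"retries exhausted on {worker_id}") from exc
            if kind is ErrorKind.CLIENT:
                return None  # bad request: skip this data item
            if kind is ErrorKind.SERVER:
                controller.report_server_error(worker_id)
                return None  # assumption: skip the item after counting
            raise RuntimeError(f"unknown error on {worker_id}") from exc
    return None  # unreachable: the loop always returns or raises
```

Keeping the per-worker counter in a single controller, rather than in each request call, lets the deactivation threshold be judged across the whole global batch. With `global_batch_size=20`, for example, the threshold is 2, so a worker is deactivated on its third server error.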