
[Refactor] refactor http request error

Open YanhuiDua opened this issue 2 months ago • 0 comments

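Add exception-handling logic for the HTTP server:

  1. timeout error / request error: retry the request
  2. client error: skip this sample
  3. server error: the controller counts the errors of each worker; once a worker's count exceeds global_batch_size * 0.1, that worker is marked as deactivated
  4. unknown error: raise RuntimeError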
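
The four cases above could be sketched roughly as follows. This is only an illustration, not xtuner's actual code: `ErrorKind`, `classify`, `Controller`, and the returned action strings are hypothetical names. It also assumes that client errors are HTTP 4xx responses, that server errors are HTTP 5xx responses, and that timeout/request errors surface as Python's built-in `TimeoutError` / `ConnectionError`. What happens to a sample after a server error that is still below the threshold is not specified in this issue, so the sketch only records it.

```python
from collections import Counter
from enum import Enum, auto


class ErrorKind(Enum):
    RETRYABLE = auto()  # timeout error / request error
    CLIENT = auto()     # assumed: HTTP 4xx
    SERVER = auto()     # assumed: HTTP 5xx
    UNKNOWN = auto()


def classify(status=None, exc=None):
    """Map a raised exception or an HTTP status code to an ErrorKind."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return ErrorKind.RETRYABLE
    if status is not None and 400 <= status < 500:
        return ErrorKind.CLIENT
    if status is not None and 500 <= status < 600:
        return ErrorKind.SERVER
    return ErrorKind.UNKNOWN


class Controller:
    """Hypothetical controller that tracks server errors per worker."""

    def __init__(self, global_batch_size, ratio=0.1):
        # Threshold from the issue: global_batch_size * 0.1
        self.threshold = global_batch_size * ratio
        self.server_errors = Counter()
        self.deactivated = set()

    def handle(self, worker_id, kind):
        """Return the action to take for one failed request."""
        if kind is ErrorKind.RETRYABLE:
            return "retry"
        if kind is ErrorKind.CLIENT:
            return "skip"
        if kind is ErrorKind.SERVER:
            self.server_errors[worker_id] += 1
            if self.server_errors[worker_id] > self.threshold:
                self.deactivated.add(worker_id)
                return "deactivate"
            # Assumption: below the threshold the error is only counted.
            return "recorded"
        raise RuntimeError(f"unknown error from worker {worker_id}")
```

With `global_batch_size=20` the threshold is 2, so under these assumptions a worker is deactivated on its third server error. Keeping the counter in the controller rather than in each worker matches the description above, where the controller decides when a worker is taken out of rotation.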

YanhuiDua, Nov 13 '25 09:11