oneflow
Throw oom error
Propagate errors such as VM OOM (out-of-memory) to the Python layer through last_error.
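To make the last_error idea concrete, here is a minimal Python sketch. It assumes a simplified model in which a worker thread plays the role of the VM: instead of aborting on an allocation failure, the worker records the error in a shared slot, and the Python-facing sync point re-raises it. OneFlow's actual VM is written in C++ and is not reproduced here; every name below (LastErrorSlot, fake_allocate, vm_worker, sync_and_check) is hypothetical.

```python
import threading


class LastErrorSlot:
    """Hypothetical holder for the first error raised on the worker ("VM") thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._error = None

    def set(self, err):
        with self._lock:
            if self._error is None:  # keep only the first error
                self._error = err

    def take(self):
        # Return the stored error (or None) and clear the slot.
        with self._lock:
            err, self._error = self._error, None
            return err


def fake_allocate(nbytes, limit=1 << 30):
    # Hypothetical allocator: refuses any request above `limit` bytes, allocates nothing.
    if nbytes > limit:
        raise MemoryError(f"you tried to allocate {nbytes} bytes.")


def vm_worker(instructions, slot):
    # Run instructions in order; on failure, record the error instead of aborting.
    for instr in instructions:
        try:
            instr()
        except MemoryError as e:
            slot.set(RuntimeError(f"can't allocate memory: {e}"))
            return


def sync_and_check(thread, slot):
    # Python-side sync point: wait for the worker, then re-raise any recorded error.
    thread.join()
    err = slot.take()
    if err is not None:
        raise err


slot = LastErrorSlot()
worker = threading.Thread(
    target=vm_worker, args=([lambda: fake_allocate(4398046511104)], slot)
)
worker.start()

caught = None
try:
    sync_and_check(worker, slot)
except RuntimeError as e:
    caught = str(e)
print(caught)
```

The design point is that the failing thread never terminates the process; the error only surfaces when the Python side synchronizes, which is why the exception in the traceback below appears at a point that reads tensor data (printing).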
For the following script:
# filename: a.py
import oneflow as flow
print(flow.ones((1024, 1024, 1024, 1024)))
Running it no longer aborts the process; instead, it raises a Python exception:
Traceback (most recent call last):
  File "/home/lixinqi/a.py", line 3, in <module>
    print(flow.ones((1024, 1024, 1024, 1024)))
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor.py", line 54, in _str
    return self.__repr__()
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor.py", line 58, in _repr
    return tensor_str._gen_tensor_str(self)
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor_str.py", line 365, in _gen_tensor_str
    return _gen_tensor_str_template(tensor, False)
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor_str.py", line 352, in _gen_tensor_str_template
    tensor_str = _tensor_str(tensor, indent)
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor_str.py", line 276, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor_str.py", line 311, in get_summarized_data
    return flow.stack([get_summarized_data(x) for x in (start + end)])
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor_str.py", line 311, in <listcomp>
    return flow.stack([get_summarized_data(x) for x in (start + end)])
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor_str.py", line 311, in get_summarized_data
    return flow.stack([get_summarized_data(x) for x in (start + end)])
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor_str.py", line 311, in <listcomp>
    return flow.stack([get_summarized_data(x) for x in (start + end)])
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor_str.py", line 311, in get_summarized_data
    return flow.stack([get_summarized_data(x) for x in (start + end)])
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor_str.py", line 311, in <listcomp>
    return flow.stack([get_summarized_data(x) for x in (start + end)])
  File "/home/lixinqi/oneflow/python/oneflow/framework/tensor_str.py", line 302, in get_summarized_data
    (self[: PRINT_OPTS.edgeitems], self[-PRINT_OPTS.edgeitems :])
RuntimeError: can't allocate memory: you tried to allocate 4398046511104 bytes.
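The byte count in the message follows from the shape: 1024^4 elements at 4 bytes each. The sketch below redoes that arithmetic, assuming flow.ones produces float32 (4-byte) elements by default; the result matching the message is consistent with that assumption.

```python
import math

shape = (1024, 1024, 1024, 1024)
bytes_per_element = 4  # assumption: float32, OneFlow's default dtype

num_elements = math.prod(shape)  # 1024**4 == 2**40
nbytes = num_elements * bytes_per_element
print(nbytes)          # 4398046511104
print(nbytes / 2**40)  # 4.0, i.e. 4 TiB
```

A 4 TiB request exceeds any single device's memory, so the allocation must fail; the point of this change is that the failure becomes a catchable RuntimeError rather than an abort.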
One drawback remains: the Python stack trace is too long, which makes the actual error harder to understand.
Jianhao's PR handles the display of the exception stack better: https://github.com/Oneflow-Inc/oneflow/pull/8937
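Independently of that PR, a user-side way to cut the noise is to print only the innermost frames. This is a hedged sketch using the standard traceback module, not what PR 8937 does; summarize_nested and fake_allocate are hypothetical stand-ins for the recursive get_summarized_data calls and the failing allocation.

```python
import traceback


def fake_allocate(nbytes):
    # Hypothetical stand-in for the allocation that fails deep inside the framework.
    raise RuntimeError(f"can't allocate memory: you tried to allocate {nbytes} bytes.")


def summarize_nested(depth):
    # Hypothetical recursion mimicking the repeated get_summarized_data frames.
    if depth == 0:
        return fake_allocate(4398046511104)
    return summarize_nested(depth - 1)


def short_traceback(exc, keep=2):
    # A negative `limit` keeps only the last `keep` stack entries (innermost frames).
    return "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__, limit=-keep)
    )


try:
    summarize_nested(5)
except RuntimeError as err:
    text = short_traceback(err)
print(text)
```

Keeping only the innermost frames hides the repetitive recursion but also hides the user's own call site, so it is a trade-off rather than a general replacement for better stack rendering.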