Read record error
Diagnostics
Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version e43767ef2b648d0d5d57c00f38ccbd38390e38da
--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='garlick', release='5.4.0-89-generic', version='#100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021', machine='x86_64')
INFO: sys.getwindowsversion(): N/A
--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None
--- check: installed_packages
INFO: installed: tensorboard==2.7.0
INFO: installed: tensorflow==2.6.0
INFO: installed: tensorflow-estimator==2.6.0
INFO: installed: tensorboard-data-server==0.6.0
--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.7.0'
--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.6.0'
INFO: tensorflow.__git_version__: 'v2.6.0-rc2-32-g919f693420e'
--- check: tensorboard_data_server_version
INFO: data server binary: '/usr/local/lib/python3.8/dist-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.6.0'
--- check: tensorboard_binary_path
INFO: which tensorboard: b'/usr/local/bin/tensorboard\n'
--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]
--- check: readable_fqdn
INFO: socket.getfqdn(): 'garlick'
--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=60167861, st_dev=2114, st_nlink=2, st_uid=1251190, st_gid=665, st_size=4096, st_atime=1638393678, st_mtime=1645831882, st_ctime=1645831882)
INFO: mode: 0o40777
--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/usr/local/lib/python3.8/dist-packages']; bad_roots (0): []
--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.12.0
aiohttp==3.7.4.post0
appdirs==1.4.4
argcomplete==1.12.3
argon2-cffi==20.1.0
arviz==0.11.2
astunparse==1.6.3
async-generator==1.10
async-timeout==3.0.1
attrs==19.3.0
Automat==0.8.0
backcall==0.2.0
bleach==3.3.0
blinker==1.4
blis==0.7.4
bokeh==2.4.1
boto==2.49.0
cachetools==4.2.1
catalogue==2.0.6
certifi==2019.11.28
cffi==1.14.5
cftime==1.4.1
chardet==3.0.4
clang==5.0
click==8.0.3
clikit==0.6.2
cloud-init==21.3
cloudpickle==1.6.0
colorama==0.4.3
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
crashtest==0.3.1
crcmod==1.7
cryptography==2.8
cupshelpers==1.0
cycler==0.10.0
cymem==2.0.5
Cython==0.29.24
dbus-python==1.2.16
decorator==5.0.7
defer==1.0.6
defusedxml==0.7.1
dill==0.3.3
distro==1.4.0
distro-info===0.23ubuntu1
dm-tree==0.1.6
edward==1.3.5
entrypoints==0.3
fail2ban==0.11.1
fasteners==0.16.3
fastprogress==1.0.0
filelock==3.3.1
flatbuffers==1.12
gast==0.4.0
gcs-oauth2-boto-plugin==2.7
gensim==4.1.2
google-apitools==0.5.32
google-auth==1.29.0
google-auth-oauthlib==0.4.4
google-pasta==0.2.0
google-reauth==0.1.1
grpcio==1.41.1
gsutil==4.64
h5py==3.1.0
httplib2==0.19.1
httpstan==4.6.1
hyperlink==19.0.0
idna==2.8
importlib-metadata==1.5.0
incremental==16.10.1
ipykernel==5.5.3
ipython==7.22.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
jax==0.2.24
jaxlib==0.1.73+cuda11.cudnn82
jedi==0.18.0
Jinja2==2.10.1
joblib==1.0.1
jsonpatch==1.22
jsonpointer==2.0
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
keras==2.6.0
Keras-Preprocessing==1.1.2
keyring==18.0.1
kiwisolver==1.3.1
language-selector==0.1
launchpadlib==1.10.13
lazr.restfulclient==0.14.2
lazr.uri==1.0.3
ldaptor==21.2.0
macaroonbakery==1.3.1
Markdown==3.3.4
MarkupSafe==1.1.0
marshmallow==3.14.0
matplotlib==3.4.3
mistune==0.8.4
mock==2.0.0
monotonic==1.6
more-itertools==4.2.0
multidict==5.2.0
murmurhash==1.0.5
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
netCDF4==1.5.6
netifaces==0.10.4
nltk==3.6.5
notebook==6.3.0
numpy==1.19.5
numpyro==0.8.0
nvidia-ml-py3==7.352.0
oauth2client==4.1.3
oauthlib==3.1.0
opt-einsum==3.3.0
packaging==20.9
pandas==1.3.4
pandocfilters==1.4.3
parso==0.8.2
passlib==1.7.4
pastel==0.2.1
pathy==0.6.1
patsy==0.5.1
pbr==5.6.0
pexpect==4.6.0
pickleshare==0.7.5
Pillow==8.4.0
pip==21.3.1
plac==1.1.3
preshed==3.0.5
prometheus-client==0.10.1
prompt-toolkit==3.0.18
protobuf==3.15.8
psutil==5.8.0
ptyprocess==0.7.0
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycairo==1.16.2
pycparser==2.20
pycups==1.9.73
pydantic==1.8.2
Pygments==2.8.1
PyGObject==3.36.0
PyHamcrest==1.9.0
pyinotify==0.9.6
PyJWT==1.7.1
pylev==1.4.0
pymacaroons==0.13.0
pymc3==3.11.4
PyNaCl==1.3.0
pyOpenSSL==19.0.0
pyparsing==2.4.7
pyRFC3339==1.1
pyro-api==0.1.2
pyro-ppl==1.7.0
pyrsistent==0.15.5
pyserial==3.4
pysimdjson==3.2.0
pystan==3.3.0
python-apt==2.0.0+ubuntu0.20.4.6
python-dateutil==2.8.1
python-debian===0.1.36ubuntu1
pytz==2019.3
pyu2f==0.1.5
PyYAML==5.3.1
pyzmq==22.0.3
qtconsole==5.0.3
QtPy==1.9.0
regex==2021.10.23
requests==2.22.0
requests-oauthlib==1.3.0
requests-unixsocket==0.2.0
retry-decorator==1.1.1
rsa==4.7.2
scikit-learn==1.0.1
scipy==1.7.1
screen-resolution-extra==0.0.0
seaborn==0.11.2
SecretStorage==2.3.1
semver==2.13.0
Send2Trash==1.5.0
service-identity==18.1.0
setuptools==45.2.0
simplejson==3.16.0
six==1.15.0
smart-open==5.0.0
sos==4.1
spacy==3.1.3
spacy-legacy==3.0.8
srsly==2.4.2
ssh-import-id==5.10
systemd-python==234
tensorboard==2.7.0
tensorboard-data-server==0.6.0
tensorboard-plugin-wit==1.8.0
tensorflow==2.6.0
tensorflow-estimator==2.6.0
tensorflow-probability==0.14.1
termcolor==1.1.0
terminado==0.9.4
testpath==0.4.4
Theano-PyMC==1.1.2
thinc==8.0.12
threadpoolctl==2.1.0
torch==1.10.0
tornado==6.1
tqdm==4.60.0
traitlets==5.0.5
Twisted==18.9.0
typer==0.4.0
typing-extensions==3.7.4.3
ubuntu-advantage-tools==27.2
ufw==0.36
urllib3==1.25.8
wadllib==1.3.3
wasabi==0.8.2
wcwidth==0.2.5
webargs==8.0.1
webencodings==0.5.1
Werkzeug==1.0.1
wheel==0.37.0
widgetsnbextension==3.5.1
wrapt==1.12.1
xarray==0.17.0
xkit==0.0.0
yarl==1.7.0
zipp==1.0.0
zmq==0.0.0
zope.interface==4.7.1
Issue description
When starting TensorBoard, I get: [2022-02-25T23:39:20Z WARN rustboard_core::run] Read error in log/default/version_0/events.out.tfevents.1645831401.72bfeec77920.1.0: ReadRecordError(BadLengthCrc(ChecksumError { got: MaskedCrc(0x07980329), want: MaskedCrc(0x00000000) }))
I am using PyTorch Lightning. Logging is done as follows:
print("LOGGING!")
self.log("total_reward", torch.tensor(self.total_reward).to(device), on_step=True, on_epoch=True, prog_bar=True, logger=True)
self.log("reward", torch.tensor(reward).to(device), on_step=True, on_epoch=True, prog_bar=True, logger=True)
self.log("train_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
This has never thrown an error when running locally on CPU in Docker, but it now throws an error when running in the same container on GPU on a remote server.
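One way to narrow down an error like the one above is to walk the event file record by record and report the first byte offset where the framing stops being valid. The sketch below assumes the standard TFRecord layout (an 8-byte little-endian length, a 4-byte masked CRC32C of that length, the payload, then a 4-byte masked CRC32C of the payload). The helper names (crc32c, masked_crc, encode_record, scan_records) are made up for illustration and are not part of TensorBoard or PyTorch Lightning. It also assumes that "got" in the warning is the checksum computed from the bytes read and "want" is the value stored in the file; if that reading is right, a stored value of 0x00000000 could point at a zero-filled or partially written region, but that is a guess rather than something confirmed in this thread.

```python
import struct


def _make_crc32c_table():
    # Table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data):
    """Plain CRC-32C of a bytes object (slow, pure Python)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    """CRC-32C with the TFRecord masking: rotate right by 15, add a constant."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def encode_record(payload):
    """Frame one payload as a TFRecord (used here only to build test input)."""
    header = struct.pack("<Q", len(payload))
    return (
        header
        + struct.pack("<I", masked_crc(header))
        + payload
        + struct.pack("<I", masked_crc(payload))
    )


def scan_records(blob):
    """Return (good_record_count, problem); problem is None if all records are valid."""
    offset = 0
    good = 0
    while offset < len(blob):
        header = blob[offset:offset + 12]
        if len(header) < 12:
            return good, f"truncated header at byte {offset}"
        (length,) = struct.unpack("<Q", header[:8])
        (stored,) = struct.unpack("<I", header[8:])
        computed = masked_crc(header[:8])
        if computed != stored:
            return good, (
                f"bad length CRC at byte {offset}: "
                f"computed 0x{computed:08x}, stored 0x{stored:08x}"
            )
        body = blob[offset + 12:offset + 12 + length + 4]
        if len(body) < length + 4:
            return good, f"truncated record at byte {offset}"
        (data_crc,) = struct.unpack("<I", body[length:])
        if masked_crc(body[:length]) != data_crc:
            return good, f"bad data CRC at byte {offset}"
        good += 1
        offset += 12 + length + 4
    return good, None
```

To try it on the file from the warning, read it in binary mode and pass the bytes in, for example scan_records(open("log/default/version_0/events.out.tfevents.1645831401.72bfeec77920.1.0", "rb").read()). The reported offset shows whether the problem is at the very end of the file (often just a record still being written) or in the middle of it.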
@scrunguss
Can you please share the complete standalone code to reproduce the issue from our end? Thanks!
Hi @scrunguss,
Please make sure to share how the summaries were written (the tf.summary.* part). Note that we also have a flag to skip the checksum for all records; if you trust the source, you can use it like this: --extra_data_server_flags=--no-checksum.
When I add --extra_data_server_flags=--no-checksum, it still happens:
[2022-02-25T23:39:20Z WARN rustboard_core::run] Read error in log/default/version_0/events.out.tfevents.1645831401.72bfeec77920.1.0: ReadRecordError(BadLengthCrc(ChecksumError { got: MaskedCrc(0x07980329), want: MaskedCrc(0x00000000) }))
@yatbear
Hi,
There might be an issue with the format of the logged data. Since the data was logged using the PyTorch Lightning logging API rather than the TensorFlow Summary API, it is out of scope for the TensorBoard team. Can you please file an issue with PyTorch Lightning so they can check your logging code? Thanks!
When I add --extra_data_server_flags=--no-checksum, it still happens:
[2022-02-25T23:39:20Z WARN rustboard_core::run] Read error in log/default/version_0/events.out.tfevents.1645831401.72bfeec77920.1.0: ReadRecordError(BadLengthCrc(ChecksumError { got: MaskedCrc(0x07980329), want: MaskedCrc(0x00000000) }))
@yatbear
Actually, the default behavior is no-checksum.