Read record error

Open scrungus opened this issue 3 years ago • 5 comments

Diagnostics

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version e43767ef2b648d0d5d57c00f38ccbd38390e38da

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='garlick', release='5.4.0-89-generic', version='#100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==2.7.0
INFO: installed: tensorflow==2.6.0
INFO: installed: tensorflow-estimator==2.6.0
INFO: installed: tensorboard-data-server==0.6.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.7.0'

--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.6.0'
INFO: tensorflow.__git_version__: 'v2.6.0-rc2-32-g919f693420e'

--- check: tensorboard_data_server_version
INFO: data server binary: '/usr/local/lib/python3.8/dist-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.6.0'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/usr/local/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'garlick'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=60167861, st_dev=2114, st_nlink=2, st_uid=1251190, st_gid=665, st_size=4096, st_atime=1638393678, st_mtime=1645831882, st_ctime=1645831882)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/usr/local/lib/python3.8/dist-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.12.0
aiohttp==3.7.4.post0
appdirs==1.4.4
argcomplete==1.12.3
argon2-cffi==20.1.0
arviz==0.11.2
astunparse==1.6.3
async-generator==1.10
async-timeout==3.0.1
attrs==19.3.0
Automat==0.8.0
backcall==0.2.0
bleach==3.3.0
blinker==1.4
blis==0.7.4
bokeh==2.4.1
boto==2.49.0
cachetools==4.2.1
catalogue==2.0.6
certifi==2019.11.28
cffi==1.14.5
cftime==1.4.1
chardet==3.0.4
clang==5.0
click==8.0.3
clikit==0.6.2
cloud-init==21.3
cloudpickle==1.6.0
colorama==0.4.3
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
crashtest==0.3.1
crcmod==1.7
cryptography==2.8
cupshelpers==1.0
cycler==0.10.0
cymem==2.0.5
Cython==0.29.24
dbus-python==1.2.16
decorator==5.0.7
defer==1.0.6
defusedxml==0.7.1
dill==0.3.3
distro==1.4.0
distro-info===0.23ubuntu1
dm-tree==0.1.6
edward==1.3.5
entrypoints==0.3
fail2ban==0.11.1
fasteners==0.16.3
fastprogress==1.0.0
filelock==3.3.1
flatbuffers==1.12
gast==0.4.0
gcs-oauth2-boto-plugin==2.7
gensim==4.1.2
google-apitools==0.5.32
google-auth==1.29.0
google-auth-oauthlib==0.4.4
google-pasta==0.2.0
google-reauth==0.1.1
grpcio==1.41.1
gsutil==4.64
h5py==3.1.0
httplib2==0.19.1
httpstan==4.6.1
hyperlink==19.0.0
idna==2.8
importlib-metadata==1.5.0
incremental==16.10.1
ipykernel==5.5.3
ipython==7.22.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
jax==0.2.24
jaxlib==0.1.73+cuda11.cudnn82
jedi==0.18.0
Jinja2==2.10.1
joblib==1.0.1
jsonpatch==1.22
jsonpointer==2.0
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
keras==2.6.0
Keras-Preprocessing==1.1.2
keyring==18.0.1
kiwisolver==1.3.1
language-selector==0.1
launchpadlib==1.10.13
lazr.restfulclient==0.14.2
lazr.uri==1.0.3
ldaptor==21.2.0
macaroonbakery==1.3.1
Markdown==3.3.4
MarkupSafe==1.1.0
marshmallow==3.14.0
matplotlib==3.4.3
mistune==0.8.4
mock==2.0.0
monotonic==1.6
more-itertools==4.2.0
multidict==5.2.0
murmurhash==1.0.5
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
netCDF4==1.5.6
netifaces==0.10.4
nltk==3.6.5
notebook==6.3.0
numpy==1.19.5
numpyro==0.8.0
nvidia-ml-py3==7.352.0
oauth2client==4.1.3
oauthlib==3.1.0
opt-einsum==3.3.0
packaging==20.9
pandas==1.3.4
pandocfilters==1.4.3
parso==0.8.2
passlib==1.7.4
pastel==0.2.1
pathy==0.6.1
patsy==0.5.1
pbr==5.6.0
pexpect==4.6.0
pickleshare==0.7.5
Pillow==8.4.0
pip==21.3.1
plac==1.1.3
preshed==3.0.5
prometheus-client==0.10.1
prompt-toolkit==3.0.18
protobuf==3.15.8
psutil==5.8.0
ptyprocess==0.7.0
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycairo==1.16.2
pycparser==2.20
pycups==1.9.73
pydantic==1.8.2
Pygments==2.8.1
PyGObject==3.36.0
PyHamcrest==1.9.0
pyinotify==0.9.6
PyJWT==1.7.1
pylev==1.4.0
pymacaroons==0.13.0
pymc3==3.11.4
PyNaCl==1.3.0
pyOpenSSL==19.0.0
pyparsing==2.4.7
pyRFC3339==1.1
pyro-api==0.1.2
pyro-ppl==1.7.0
pyrsistent==0.15.5
pyserial==3.4
pysimdjson==3.2.0
pystan==3.3.0
python-apt==2.0.0+ubuntu0.20.4.6
python-dateutil==2.8.1
python-debian===0.1.36ubuntu1
pytz==2019.3
pyu2f==0.1.5
PyYAML==5.3.1
pyzmq==22.0.3
qtconsole==5.0.3
QtPy==1.9.0
regex==2021.10.23
requests==2.22.0
requests-oauthlib==1.3.0
requests-unixsocket==0.2.0
retry-decorator==1.1.1
rsa==4.7.2
scikit-learn==1.0.1
scipy==1.7.1
screen-resolution-extra==0.0.0
seaborn==0.11.2
SecretStorage==2.3.1
semver==2.13.0
Send2Trash==1.5.0
service-identity==18.1.0
setuptools==45.2.0
simplejson==3.16.0
six==1.15.0
smart-open==5.0.0
sos==4.1
spacy==3.1.3
spacy-legacy==3.0.8
srsly==2.4.2
ssh-import-id==5.10
systemd-python==234
tensorboard==2.7.0
tensorboard-data-server==0.6.0
tensorboard-plugin-wit==1.8.0
tensorflow==2.6.0
tensorflow-estimator==2.6.0
tensorflow-probability==0.14.1
termcolor==1.1.0
terminado==0.9.4
testpath==0.4.4
Theano-PyMC==1.1.2
thinc==8.0.12
threadpoolctl==2.1.0
torch==1.10.0
tornado==6.1
tqdm==4.60.0
traitlets==5.0.5
Twisted==18.9.0
typer==0.4.0
typing-extensions==3.7.4.3
ubuntu-advantage-tools==27.2
ufw==0.36
urllib3==1.25.8
wadllib==1.3.3
wasabi==0.8.2
wcwidth==0.2.5
webargs==8.0.1
webencodings==0.5.1
Werkzeug==1.0.1
wheel==0.37.0
widgetsnbextension==3.5.1
wrapt==1.12.1
xarray==0.17.0
xkit==0.0.0
yarl==1.7.0
zipp==1.0.0
zmq==0.0.0
zope.interface==4.7.1

Issue description

When starting TensorBoard, I get: [2022-02-25T23:39:20Z WARN rustboard_core::run] Read error in log/default/version_0/events.out.tfevents.1645831401.72bfeec77920.1.0: ReadRecordError(BadLengthCrc(ChecksumError { got: MaskedCrc(0x07980329), want: MaskedCrc(0x00000000) }))

I'm using PyTorch Lightning. Logging is done as follows:

        print("LOGGING!")
        self.log("total_reward", torch.tensor(self.total_reward).to(device), on_step=True, on_epoch=True, prog_bar=True, logger=True)
        self.log("reward", torch.tensor(reward).to(device), on_step=True, on_epoch=True, prog_bar=True, logger=True)
        self.log("train_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)

This has never thrown an error when running locally on CPU in Docker, but it now throws an error when running in the same container on GPU on a remote server.

scrungus avatar Feb 25 '22 23:02 scrungus

@scrunguss

Can you please share the complete standalone code so we can reproduce the issue on our end? Thanks!

pindinagesh avatar Feb 28 '22 08:02 pindinagesh

Hi @scrunguss,

Please make sure to share how the summaries were written (the tf.summary.* part). Note that we also have a flag to skip the checksum for all records; if you trust the source, you can use it like this: --extra_data_server_flags=--no-checksum.

yatbear avatar Mar 02 '22 23:03 yatbear
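
For reference, the flag suggested above goes on the normal `tensorboard` command line. Below is a minimal sketch of that invocation built from Python. The helper names `tensorboard_command` and `launch` are made up for illustration, and the log directory `log` is only an assumption taken from the path in the warning.

```python
import subprocess


def tensorboard_command(logdir, skip_checksum=True):
    """Build the argv for launching TensorBoard (hypothetical helper)."""
    cmd = ["tensorboard", "--logdir", logdir]
    if skip_checksum:
        # Forwarded verbatim to the Rust data server (rustboard).
        cmd.append("--extra_data_server_flags=--no-checksum")
    return cmd


def launch(logdir):
    """Run TensorBoard in the foreground; requires tensorboard on PATH."""
    subprocess.run(tensorboard_command(logdir), check=True)


if __name__ == "__main__":
    # "log" is assumed from the path in the warning message.
    print(" ".join(tensorboard_command("log")))
```

Calling `launch("log")` would start the server. The helper only assembles the command, so it can be inspected without TensorBoard running.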

When I add --extra_data_server_flags=--no-checksum, it still happens:


[2022-02-25T23:39:20Z WARN rustboard_core::run] Read error in log/default/version_0/events.out.tfevents.1645831401.72bfeec77920.1.0: ReadRecordError(BadLengthCrc(ChecksumError { got: MaskedCrc(0x07980329), want: MaskedCrc(0x00000000) }))

@yatbear

zenglw06 avatar Sep 19 '22 06:09 zenglw06

Hi,

There might be an issue with the format of the logged data. Since the data was logged using the PyTorch Lightning logging API rather than the TensorFlow Summary API, it's out of scope for the TensorBoard team. Can you please file an issue under PyTorch so they can check the syntax of your PyTorch Lightning logging code? Thanks!

yatbear avatar Sep 19 '22 15:09 yatbear

When I add --extra_data_server_flags=--no-checksum, it still happens:


[2022-02-25T23:39:20Z WARN rustboard_core::run] Read error in log/default/version_0/events.out.tfevents.1645831401.72bfeec77920.1.0: ReadRecordError(BadLengthCrc(ChecksumError { got: MaskedCrc(0x07980329), want: MaskedCrc(0x00000000) }))

@yatbear

Actually, the default behavior is no-checksum.

wongxinjie avatar Jan 03 '25 07:01 wongxinjie
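
The warning in this thread is a `BadLengthCrc`, which points at a record header rather than a payload. In the TFRecord framing, each record is an 8-byte little-endian length, a 4-byte masked CRC-32C of those length bytes, the payload, and a 4-byte masked CRC-32C of the payload. As far as I understand (this is an assumption, not something confirmed in the thread), `--no-checksum` only affects the payload check, which would explain why the flag makes no difference here. The sketch below walks the headers of an event file and reports the offset of the first one whose length CRC does not match. All function names are made up for illustration.

```python
import struct

# CRC-32C (Castagnoli) lookup table, reflected polynomial 0x82F63B78.
_CRC_TABLE = []
for _i in range(256):
    _c = _i
    for _ in range(8):
        _c = (_c >> 1) ^ 0x82F63B78 if _c & 1 else _c >> 1
    _CRC_TABLE.append(_c)


def crc32c(data: bytes) -> int:
    """Plain CRC-32C of data."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """TFRecord mask: rotate right by 15 bits, then add a constant."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def make_record(payload: bytes) -> bytes:
    """Frame a payload as one TFRecord (used here only for testing)."""
    header = struct.pack("<Q", len(payload))
    return (header + struct.pack("<I", masked_crc(header))
            + payload + struct.pack("<I", masked_crc(payload)))


def first_bad_header(blob: bytes):
    """Return (records_with_good_header, offset_of_first_bad_header_or_None).

    Only length CRCs are checked; payloads are skipped, not verified.
    A truncated trailing header (fewer than 12 bytes) is ignored.
    """
    offset, good = 0, 0
    while offset + 12 <= len(blob):
        header = blob[offset:offset + 8]
        (stored,) = struct.unpack("<I", blob[offset + 8:offset + 12])
        if masked_crc(header) != stored:
            return good, offset
        (length,) = struct.unpack("<Q", header)
        offset += 12 + length + 4  # header + payload + payload CRC
        good += 1
    return good, None
```

To try it on a real file, something like `first_bad_header(open(path, "rb").read())` should work, where `path` is the event file named in the warning. It may also be worth checking whether `hex(masked_crc(bytes(8)))` matches the `got` value in the log. Assuming `want` is the CRC stored in the file, a match would suggest the reader hit a run of zero bytes, for example a file that was preallocated or not fully flushed on the remote filesystem.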