Erigon OOM killed on docker using a 64GB machine

Open luarx opened this issue 1 year ago • 1 comments

System information

Erigon version: v2.58.1

OS & Version: Linux

Erigon Command (with flags/config):

"--chain=mainnet",
"--db.size.limit=8TB",
"--metrics",
"--metrics.addr=0.0.0.0",
"--metrics.port=6060",
"--private.api.addr=0.0.0.0:9089",
"--pprof",
"--pprof.addr=0.0.0.0",
"--pprof.port=6061",
"--snapshots=true",
"--torrent.upload.rate=500mb",
"--torrent.download.rate=1200mb",
"--http",
"--http.api=engine,net,eth,debug,trace,txpool,web3",
"--http.addr=0.0.0.0",
"--http.port=8545",
"--http.vhosts=*",
"--http.corsdomain=*",
"--authrpc.addr=0.0.0.0",
"--authrpc.jwtsecret=/mnt/jwtsecret/jwtsecret",
"--authrpc.port=9545",
"--authrpc.vhosts=*",
"--prune=",
"--rpc.batch.limit=1000",
"--rpc.returndata.limit=500000",
"--trace.maxtraces=5000000"

Consensus Layer: Lighthouse

Consensus Layer Command (with flags/config):

"--debug-level=info",
"--datadir=/beacondata",
"--network=mainnet",
"beacon_node",
"--disable-enr-auto-update",
"--enr-address=127.0.0.1",
"--enr-tcp-port=9000",
"--enr-udp-port=9000",
"--port=9000",
"--discovery-port=9000",
"--eth1",
"--http",
"--http-address=0.0.0.0",
"--http-port=5052",
"--metrics",
"--metrics-address=0.0.0.0",
"--metrics-port=5054",
"--listen-address=0.0.0.0",
"--target-peers=100",
"--http-allow-sync-stalled",
"--disable-packet-filter",
"--execution-endpoint=http://localhost:9545",
"--jwt-secrets=/tmp/jwtsecret",
"--disable-deposit-contract-sync",
"--checkpoint-sync-url=https://beaconstate-mainnet.chainsafe.io"

Chain/Network: Mainnet

Expected behaviour

Use less memory, or at least do not have huge memory spikes of 175GB

Actual behaviour

image

Steps to reproduce the behaviour

Erigon's memory usage has been an issue for as long as I have been running it. We specified memory.request and memory.limits to run it, but it surpasses them and is restarted several times per day due to OOM. I know that there are some closed issues saying that OOM within Docker has been fixed, so I want to know whether this is a different issue or whether the previous fixes didn't work as expected. Should Erigon stop consuming memory before it reaches the specified memory limits?

From my point of view, it is not practical to run Erigon nodes with more than 32GB, as it is expensive and not everyone can do it. On the other hand, running a 64GB machine with 175GB memory spikes is not a good sign...

Thanks!

luarx avatar Mar 08 '24 10:03 luarx

Do you have logs from before the kill?

AskAlexSharov avatar Mar 08 '24 10:03 AskAlexSharov

@AskAlexSharov thanks for asking! At least in this case, I have realised that the issues are related to the Lighthouse OOM present since v4.6.0 (https://github.com/sigp/lighthouse/issues/4918), and I mixed up the memory consumption metrics because both clients run on the same machine.

I will close this issue and open a new one if I detect other Erigon issues. Thanks! 🙏

luarx avatar Mar 12 '24 22:03 luarx
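
Since the confusion here came from two clients sharing one machine, a per-process breakdown can help attribute memory before filing a report. The following is a minimal sketch, not something from the thread: it assumes a Linux host where `/proc` is readable, and it assumes the processes are named `erigon` and `lighthouse` (check `/proc/<pid>/comm` on your system). The helper names `parse_vmrss_kib` and `rss_by_process_name` are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value (KiB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_by_process_name(proc_root: str = "/proc") -> Dict[str, int]:
    """Sum resident memory (KiB) per process name over all numeric PID directories."""
    totals: Dict[str, int] = {}
    root = Path(proc_root)
    if not root.is_dir():
        return totals  # not a Linux host, or /proc is not mounted
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            name = (entry / "comm").read_text().strip()
            rss = parse_vmrss_kib((entry / "status").read_text())
        except OSError:
            continue  # the process exited or is not readable
        if rss is not None:  # kernel threads have no VmRSS line
            totals[name] = totals.get(name, 0) + rss
    return totals


if __name__ == "__main__":
    totals = rss_by_process_name()
    # Assumed process names; adjust to whatever your containers actually run.
    for name in ("erigon", "lighthouse"):
        print(f"{name}: {totals.get(name, 0) / (1024 * 1024):.2f} GiB resident")
```

Summing RSS across processes can count shared pages more than once, so treat the totals as a rough attribution rather than an exact figure.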