Quick look at JSON-RPC batching
This post follows an issue I opened in trueblocks-docker about RPC connections to Erigon breaking, and further discussion in the Erigon Discord. To work around the problem I explored batching requests, which is part of the JSON-RPC spec (so it should work in both go-ethereum and Erigon). The code below is in Python and was adapted from code at https://github.com/ethstorage
Batching requests is as simple as passing an array of request objects in a single call.
To confuse things, I'm also using the batched function (native to itertools in Python 3.12; it has to be imported or backported for earlier versions), which breaks the values up into smaller groups to pass with each JSON-RPC call.
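If you're on a Python version earlier than 3.12, a small backport of batched based on the standard itertools recipe is enough; on 3.12+ you can simply `from itertools import batched`:

```python
from itertools import islice

def batched(iterable, n):
    """Yield successive tuples of at most n items from iterable (itertools recipe)."""
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch
```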
```python
import requests
from requests.exceptions import HTTPError
from itertools import batched  # Python 3.12+; use the backport above on older versions

# nodeUrl (the RPC endpoint), remaining_blocks (hex-encoded block numbers to fetch)
# and cur (a psycopg2 cursor) are assumed to be defined elsewhere.

batchBlocks = 100      # blocks per eth_getBlockByNumber batch
rpcBatchLimit = 1000   # receipts per eth_getTransactionReceipt batch

# Batch get_blocks
for group in batched(remaining_blocks, batchBlocks):
    batch = [
        {
            "jsonrpc": "2.0",
            "method": "eth_getBlockByNumber",
            "params": [
                block,  # hex-encoded block number
                True    # include full transaction objects
            ],
            "id": id
        }
        for id, block in enumerate(group)
    ]
    try:
        response = requests.post(nodeUrl, json=batch)
        response.raise_for_status()
        blockResponse = response.json()
    except HTTPError as e:
        print(e.response.text)
        continue
    except Exception as e:
        print(e)
        continue
    for item in blockResponse:
        if "error" in item:
            raise Exception(item["error"])

    # Batch transaction receipts per block
    items = []
    for block in blockResponse:
        block = block['result']
        blockid = int(block['number'], base=16)
        time = int(block['timestamp'], base=16)
        if len(block['transactions']) < 1:
            # Empty block: record just the timestamp and block number
            cur.execute('INSERT INTO public.ethtxs(time, block) VALUES (%s, %s)', (time, blockid))
        else:
            items = []
            transactionsResponse = []
            batch = [
                {
                    "jsonrpc": "2.0",
                    "method": "eth_getTransactionReceipt",
                    "params": [
                        transaction['hash']
                    ],
                    "id": id
                }
                for id, transaction in enumerate(block['transactions'])
            ]
            try:
                # Stay under the node's batch limit by splitting the receipt batch
                for chunk in batched(batch, rpcBatchLimit):
                    response = requests.post(nodeUrl, json=list(chunk))
                    response.raise_for_status()
                    transactionsResponse.extend(response.json())
            except HTTPError as e:
                print(e.response.text)
            except Exception as e:
                print(e)
            for item in transactionsResponse:
                if "error" in item:
                    raise Exception(item["error"])
            # Receipts are matched to transactions by position, so order responses by id
            transactionsResponse.sort(key=lambda r: r['id'])
            for txNumber in range(0, len(block['transactions'])):
                trans = block['transactions'][txNumber]
                transReceipt = transactionsResponse[txNumber]['result']
                fr = trans['from']
                to = trans['to']
                value = int(trans['value'], base=16)
                gas = int(trans['gas'], base=16)
                gasprice = int(trans['gasPrice'], base=16)
                input = trans['input']
                txhash = trans['hash']  # eth-storage uses .hex() here, but I think that's because web3 converts to type HexBytes
                status = bool(int(transReceipt['status'], base=16))
                values = [time, fr, to, value, gas, gasprice, blockid, txhash, input, status]
                items.append(values)
            sql = """INSERT INTO public.ethtxs(time, txfrom, txto, value, gas, gasprice, block, txhash, input, status)
                     VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"""
            cur.executemany(sql, items)
```
Assuming 100 blocks each with 100 transactions: single calls take 10,100 requests (100 block calls plus 100 × 100 receipt calls), while batching as above takes 101 (one batch for the blocks plus one receipt batch per block). Connection issues went from chronic to non-existent.
Erigon has some limits that should be loosened if you have a local node you can hammer:
- --rpc.batch.concurrency= (default 2) - the number of goroutines a batch can use. Goroutines are very lightweight and not correlated to cores. I initially set it to the number of cores and speed increased dramatically; setting it much higher, at 1000, didn't yield much more of an increase.
- --rpc.batch.limit= (default 100) - I set this to 1000 without issue on a local node. If you go above the limit you'll get an error in the JSON response, which helpfully includes the current limit set by Erigon.
There are so many problems with using batch requests.
- How do you deal with HTTP status codes vs errors?
If you send a batch of 30 requests and one of them fails in the middle of that batch, you're still going to get an HTTP 200 - you're just going to have a random error in the middle of it. How do you track it? How do you issue a retry?
- What if you're hitting a load balancer? How does the load balancer handle the batches? Does it break them up? Does it maintain the batch?
- How do you deal with rate-limited endpoints?
- You have to wait for the response to every object in a batch request.
If you send 30 requests in a batch and one of them takes an extra-long time for whatever reason (say it's a large debug call), then your entire batch has to wait for that one slow call because you expect a single batch response.
Some of these might not apply to a local node, but error handling and retrying certainly do.
There's also a bunch of limits in the node, like you pointed out: batch limits, concurrency limits, as well as response size limits, all of which have to be tweaked to optimize for batch requests - something most users will not have configured out of the box.
There are so many problems with using batch requests.
All good points.
I don't know why I'm getting broken connections to Erigon - but with both my own indexer and TrueBlocks, connections would break and never get killed. Is it a limitation or a fault with Erigon, TCP, Docker, or the kernel? And perhaps even with batching I'm still experiencing them in some way - the above code can use 80% of my CPU resources, but right now it's using about 20% - perhaps I'm getting errors but Go is handling them better.
I am trying to index 1M blocks. My first attempt was single-threaded, one-at-a-time processing of blocks and receipts - this was impossibly slow. So I turned to multiple threads/processes - this sped things up but led to broken connections, and now I was managing a pool and monitoring processes for errors and timeouts so I could kill processes with dead connections. Then I turned to batching - now I'm managing a queue, Erigon is managing a queue, and I've got a new type of error to monitor for, but there are no more broken connections and I'm processing blocks about 60x faster.
Regardless of the method, you are managing a queue and errors somewhere. So regarding 1 - yeah, but I think extra error handling is acceptable.
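For what it's worth, tracking and retrying individual failures inside a batch doesn't take much code. A rough sketch, assuming the same requests/nodeUrl setup as above (post_batch_with_retries is a hypothetical helper, not part of the indexer code):

```python
import requests

def post_batch_with_retries(nodeUrl, batch, max_retries=3):
    """POST a JSON-RPC batch, then re-send only the requests that came back with an error."""
    results = {}
    pending = {req["id"]: req for req in batch}
    for attempt in range(max_retries):
        if not pending:
            break
        resp = requests.post(nodeUrl, json=list(pending.values()))
        resp.raise_for_status()  # transport-level failure still fails the whole batch
        for item in resp.json():
            if "error" in item:
                continue  # leave it in pending so it gets re-sent
            results[item["id"]] = item["result"]
            pending.pop(item["id"], None)
    if pending:
        raise RuntimeError(f"{len(pending)} requests still failing after {max_retries} attempts")
    return results
```

A non-200 status still fails the whole batch; only per-request errors inside a 200 response are retried.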
Re. 3 - I think it's quite common to scale up and down in response to available resources and errors. Erigon does provide limit details in error messages. Further, I asked if they could add an endpoint that returns the limits - knowing the limits from the outset would save a whole lot of code.
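As a rough illustration of scaling down in response to errors, the chunk size can simply be halved whenever the node rejects a chunk. The error detection below is an assumption about the general shape of the response (whole-batch rejection vs per-request errors), not Erigon's exact message:

```python
import requests

def post_in_adaptive_chunks(nodeUrl, batch, start_size=1000, min_size=10):
    """Send a large JSON-RPC batch in chunks, halving the chunk size whenever the node rejects one."""
    size = start_size
    results = []
    remaining = list(batch)
    while remaining:
        chunk, remaining = remaining[:size], remaining[size:]
        resp = requests.post(nodeUrl, json=chunk).json()
        if isinstance(resp, dict):
            resp = [resp]  # a rejected batch may come back as a single error object
        if any("error" in r for r in resp) and size > min_size:
            # Put the chunk back and try again with a smaller chunk size.
            remaining = chunk + remaining
            size = max(min_size, size // 2)
            continue
        # At min_size we stop shrinking and keep whatever came back.
        results.extend(resp)
    return results
```

Being able to read the limit from the error message (or an endpoint) would let you jump straight to the right size instead of probing for it.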
I am going to delay this issue until the next release which means I am going to close it but mark it with a Delayed tag. It will be re-opened.
If I understand this issue correctly it is in response to our VERY slow scraper since version 1.0.0.
We actually found a very significant bug in the code that was causing the scraper to greatly slow down (we were querying the node once for every trace, which was insane!). That's been fixed, which means the scraper is back to being as fast as it was before version 1.0.0.
Batching may or may not speed things up, but before we implement it, it will require significant performance testing, which it will get, just not until next release.
If I understand this issue correctly it is in response to our VERY slow scraper since version 1.0.0.
This post was in response to TrueBlocks being not just slow for me but unusable, as RPC connections died and weren't killed and restarted. I experienced the same with my own scraper that made one-by-one RPC calls in bulk.
Checking in, seeing this has been reopened - currently I'm scraping with TrueBlocks and not seeing any of the issues I had previously (running with channelCount = 60 on a 20-core (40-thread) CPU; Erigon is using ~1500% and TrueBlocks ~600%).
It would be good to know when and where batching has been implemented - for instance, 'tokens' allows multiple addresses to be passed in 'addrs', but are these then batched or queried individually?
We don't use batching anywhere in the code. We had an issue where Golang's defaults were too low for chifra and it would open too many ports, running out of available ports eventually. We fixed that in September, but maybe you didn't upgrade?
Anyway, if you run into the same problem in the future (with any software), it may help to check whether the number of open ports is sane with `ss -tulpn` (on Linux).
I'm going to close this. If anyone is interested in picking it up, please do so via a PR.