DHT: improve data hosting metric

Open shyba opened this issue 3 years ago • 2 comments

This issue is considered complete when the dashboard has an automatically updating metric for how many TB of data are available for download.

Today, we listen for queries under a shard (node ID prefix) and calculate data availability from what was announced versus what was claimed. This is efficient but inaccurate, because:

  • the background downloader does not announce
  • not every announcement is reachable
  • the claim set size changes (minor)

This issue proposes a new approach using the script from #3625, along these lines:

  • pick 2 random bytes
  • query hub for all streams starting with those 2 random bytes (should be 270-350 claims)
  • actively search them (slowly to avoid flood)
  • check results (slowly)
  • calculate reachable / total sample size as % downloadable
  • calculate total results / total sample size as % theoretical maximum
  • repeat every 2h, which should keep the impact low while still probing 12 times a day
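
A minimal sketch of the sampling and ratio steps above, in Python. It does not talk to a hub or the DHT. `ProbeResult`, `random_sd_hash_prefix` and `summarize` are hypothetical names made up for illustration, not part of lbry-sdk. It also assumes "total results" means streams for which the search returned at least one peer, and "reachable" means at least one of those peers answered when checked.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List


def random_sd_hash_prefix(num_bytes: int = 2) -> str:
    """Pick random bytes and return them as a hex prefix (2 bytes -> 4 hex chars)."""
    return secrets.token_hex(num_bytes)


@dataclass
class ProbeResult:
    # Hypothetical record for one sampled stream; field names are assumptions.
    sd_hash: str
    announced_peers: int  # peers returned by the search
    reachable_peers: int  # peers that actually answered when checked


def summarize(results: List[ProbeResult]) -> Dict[str, float]:
    """Compute the two percentages from the list of probed streams."""
    total = len(results)
    if total == 0:
        return {"downloadable_pct": 0.0, "theoretical_max_pct": 0.0}
    reachable = sum(1 for r in results if r.reachable_peers > 0)
    found = sum(1 for r in results if r.announced_peers > 0)
    return {
        "downloadable_pct": 100.0 * reachable / total,
        "theoretical_max_pct": 100.0 * found / total,
    }
```

With a sample of four streams where three have search results and two of those are reachable, `summarize` gives 50% downloadable and 75% theoretical maximum.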

Not in this PR, but an idea:

  • compare result sets from iterative find vs the script. This should give how well everything is working end-to-end.
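
One possible way to do that comparison, as a rough sketch: treat each method's output for a given sd_hash as a set of peer identifiers and look at the overlap. `compare_peer_sets` and the string peer identifiers are assumptions made for illustration, not existing lbry-sdk code.

```python
from typing import Dict, Set


def compare_peer_sets(iterative: Set[str], script: Set[str]) -> Dict[str, object]:
    """Compare peers found by iterative find against peers found by the script."""
    union = iterative | script
    overlap = iterative & script
    return {
        "only_iterative": iterative - script,  # peers the script missed
        "only_script": script - iterative,     # peers iterative find missed
        # Jaccard similarity: 1.0 means both methods agree completely.
        "jaccard": len(overlap) / len(union) if union else 1.0,
    }
```

A Jaccard value that stays well below 1.0 across many samples would suggest the two paths see different parts of the network.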

fixes #3633

shyba Jul 11 '22 17:07

  • pick 2 random bytes
  • query hub for all streams starting with those 2 random bytes (should be 270-350 claims)

Are you talking about searching by stream name or stream ID?

Claim names are human-meaningful, so their characters will not be uniformly distributed. Claim IDs, by contrast, would be uniformly random hex characters (IIUC).

I worry that searching by name would produce widely varying numbers of claims (or claims that are correlated in some way).

moodyjon Jul 18 '22 20:07

Hello there,

LBRY DHT is based on Kademlia with SHA-384 hashing. Items are only searchable by content hash (the sd_hash in a claim). This step searches the hub for a sample of sd_hashes. See https://github.com/lbryio/lbry-sdk/blob/cc6cdc07f5067aa3a8e40b5421e0fd50fffbe0e7/scripts/sd_hash_sampler.py
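
To illustrate why sampling by sd_hash prefix avoids the skew that names would have, here is a small self-contained sketch. It is not the linked script and does not query a hub: it builds fake sd_hashes with `hashlib.sha384` over made-up inputs, then filters by hex prefix. `fake_sd_hash`, `sample_by_prefix` and `estimate_total` are hypothetical helpers.

```python
import hashlib
from typing import Iterable, List


def fake_sd_hash(seed: int) -> str:
    """Stand-in for a real sd_hash: SHA-384 hex digest (96 hex characters)."""
    return hashlib.sha384(str(seed).encode()).hexdigest()


def sample_by_prefix(sd_hashes: Iterable[str], prefix: str) -> List[str]:
    """Keep only the hashes that start with the given hex prefix."""
    return [h for h in sd_hashes if h.startswith(prefix)]


def estimate_total(matches: int, num_bytes: int = 2) -> int:
    """Extrapolate the population size from one prefix bucket.

    Assumes hashes are spread uniformly, so each of the 256**num_bytes
    prefixes holds roughly the same number of items.
    """
    return matches * 256 ** num_bytes


if __name__ == "__main__":
    population = [fake_sd_hash(i) for i in range(25_600)]
    bucket = sample_by_prefix(population, "ab")  # 1-byte prefix for the demo
    # Expect roughly 25_600 / 256 = 100 matches for any 1-byte prefix.
    print(len(bucket), estimate_total(len(bucket), num_bytes=1))
```

Because the hash output is effectively uniform, any prefix selects a similarly sized and uncorrelated slice. Under that same assumption, 270-350 matches for a 2-byte prefix would extrapolate to 65536 times that many streams in total.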

shyba Jul 27 '22 11:07