
Tag Insights report generator

Open bbappserver opened this issue 5 years ago • 1 comments

It would be nice to be able to get various statistics about tags for the hashes I have on file (not the PTR in general). The report should support common export formats like CSV for further analysis. I don't need these statistics calculated all the time, but an on-demand report generation button would be ideal.

  • All tags ordered by incidence, keyed by sibling king
  • Commonly coincident tags
  • Coincidence of specifically named tag with other tags
  • Fuzzy matched tags (likely spelling errors etc)
  • A listing of all creator: tags (any namespace really, but this is the most obvious immediate application) across all services, but only if I have a hash associated with the tag.
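For the fuzzy matching item, here is a minimal sketch of one possible approach using Python's standard `difflib`. The function name `likely_misspellings`, the `cutoff` and `min_ratio` parameters, and the sample counts are all made up for illustration; none of this is hydrus code. It compares every tag against every other tag, so it is only meant to show the idea, not to scale.

```python
import difflib

def likely_misspellings(tag_counts, cutoff=0.85, min_ratio=10):
    """Pair rare tags with a similarly spelled tag that is much more common.

    tag_counts: dict mapping tag string -> number of files carrying it.
    cutoff: minimum difflib similarity (0..1) for two tags to be "close".
    min_ratio: how many times more common the suggested tag must be.
    """
    tags = list(tag_counts)
    suggestions = []
    for tag in tags:
        others = [t for t in tags if t != tag]
        for match in difflib.get_close_matches(tag, others, n=3, cutoff=cutoff):
            # Only flag the rare spelling, never the common one.
            if tag_counts[match] >= min_ratio * tag_counts[tag]:
                suggestions.append((tag, match))
    return sorted(suggestions)

# Hypothetical sample data.
counts = {"blue eyes": 120, "blue eyse": 2, "red hair": 80, "landscape": 40}
print(likely_misspellings(counts))
```

Requiring the suggested tag to be much more common than the suspect one keeps two legitimate, similar tags from flagging each other.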

Wizard Screens

Tag Services
(*) All Combined local, remote and virtual
( ) Selection:
  [*] Local
  [*] Local virtual
  [ ] Remote
Hash Services
[*] Local hashes
[ ] All hashes
From
(*) All namespaces
( ) Single namespace [____] (blank for global)
( ) Single namespace and global [____]
# Or offer a list view for picking several namespaces, if you are a masochist
Calculate
(*) tag-hash incidence count
( ) tag-tag coincidence count
  [*] ignore direct coincidence from parent relationships
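As a rough sketch, the wizard choices above could be carried around as a single options object. Every name here (`ReportOptions`, `Calculation`, `namespaces_to_scan`, and the field names) is hypothetical and only mirrors the screens in this issue, not anything that exists in hydrus.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Calculation(Enum):
    TAG_HASH_INCIDENCE = "tag-hash incidence count"
    TAG_TAG_COINCIDENCE = "tag-tag coincidence count"

@dataclass
class ReportOptions:
    # None means "all combined"; otherwise a set such as {"local", "local virtual"}.
    tag_services: Optional[frozenset] = None
    # True for "Local hashes", False for "All hashes".
    local_hashes_only: bool = True
    # None means all namespaces; "" means the global (unnamespaced) tags.
    namespace: Optional[str] = None
    # Only meaningful together with a single named namespace.
    include_global: bool = False
    calculation: Calculation = Calculation.TAG_HASH_INCIDENCE
    ignore_parent_coincidence: bool = True

    def namespaces_to_scan(self):
        """Return the namespaces to query, or None for 'all namespaces'."""
        if self.namespace is None:
            return None
        if self.include_global and self.namespace != "":
            return [self.namespace, ""]
        return [self.namespace]

# Defaults correspond to the options marked (*) / [*] in the mock-up.
defaults = ReportOptions()
creator_and_global = ReportOptions(namespace="creator", include_global=True)
print(creator_and_global.namespaces_to_scan())
```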

Example Algorithm

Gather

tag_domain = COMBINED
hash_domain = LOCAL
namespace = "creator"

r = set()
# Scan the chosen namespace and then the global (unnamespaced) tags.
for ns in (namespace, ''):
  # [(hash_id,tag_id,namespace_id)]
  rs = hashes_join_real_tags(tag_domain=tag_domain, hash_domain=hash_domain, namespace=ns)
  # [(hash_id,tag_id,namespace_id)]
  rsv = hashes_join_virtual_tags(tag_domain=tag_domain, hash_domain=hash_domain, namespace=ns)

  # [(hash_id,tag_id,namespace_id)]
  # Your tuple-likes must be __hash__ and __eq__ aware to make set work properly.
  # Combine the results, removing any duplication.
  r = r.union(rs.union(rsv))

Count

d = {}
for t in r:
  k = t.tag  # this is a subtag and a namespace wrapped in a hashable key object
  if k in d:
    d[k] += 1
  else:
    d[k] = 1
# csvwriter is assumed to be a csv.writer
csvwriter.writerow(['namespace', 'subtag', 'count'])
for k in d:
  csvwriter.writerow([k.namespace_string, k.subtag_string, d[k]])
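The counting step above could be written as a self-contained sketch with `collections.Counter` and the standard `csv` module. The row shape `(hash_id, namespace, subtag)` and the sample data are assumptions for illustration; the real gather step would return whatever hydrus's database layer provides.

```python
import csv
import io
from collections import Counter

# Hypothetical gathered rows: (hash_id, namespace, subtag), already deduplicated.
rows = {
    (1, "creator", "alice"),
    (2, "creator", "alice"),
    (2, "creator", "bob"),
    (3, "", "landscape"),
}

def incidence_counts(rows):
    """Count how many hashes carry each (namespace, subtag) pair."""
    return Counter((ns, subtag) for _hash_id, ns, subtag in rows)

def write_incidence_csv(counts, out):
    """Write one CSV row per tag, most frequent first."""
    writer = csv.writer(out)
    writer.writerow(["namespace", "subtag", "count"])
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for (ns, subtag), n in ordered:
        writer.writerow([ns, subtag, n])

counts = incidence_counts(rows)
buf = io.StringIO()
write_incidence_csv(counts, buf)
print(buf.getvalue())
```

Writing to any file-like object keeps the export format separate from the counting, so another format could be added without touching `incidence_counts`.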

Coincidence

Just use hydrus's regular logic to convert (hash,tag_id)

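d = {}  # fresh dict: d[t1][t2] = number of hashes carrying both t1 and t2
for h in hashes:
  for t1 in tags[h]:
    for t2 in tags[h]:
      if t1 == t2:
        continue  # a tag does not coincide with itself
      if t1 not in d:
        d[t1] = {}
      if t2 in d[t1]:
        d[t1][t2] += 1
      else:
        d[t1][t2] = 1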

You get the idea for csv writing

csvwriter.writerow([t1, t2, d.get(t1, {}).get(t2, 0)])
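A compact sketch of the coincidence count, assuming the tags have already been grouped per hash. `tags_by_hash`, `coincidence_counts` and `parent_pairs` are hypothetical names; `parent_pairs` stands in for the "ignore direct coincidence from parent relationships" option and would have to be filled from the real parent data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: hash -> set of tags on that file.
tags_by_hash = {
    "h1": {"creator:alice", "landscape", "sky"},
    "h2": {"creator:alice", "sky"},
    "h3": {"landscape"},
}

def coincidence_counts(tags_by_hash, parent_pairs=frozenset()):
    """Count, for each unordered tag pair, how many hashes carry both tags.

    parent_pairs: set of frozenset({child, parent}) pairs to skip.
    """
    pairs = Counter()
    for tags in tags_by_hash.values():
        # Sorting makes each pair appear in one canonical order.
        for t1, t2 in combinations(sorted(tags), 2):
            if frozenset((t1, t2)) in parent_pairs:
                continue
            pairs[(t1, t2)] += 1
    return pairs

pairs = coincidence_counts(tags_by_hash)
for (t1, t2), n in pairs.most_common():
    print(t1, t2, n)
```

Counting unordered pairs stores each pair once rather than in both directions, which halves the output compared with a full t1/t2 matrix.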

bbappserver • Aug 19 '20 01:08

related

https://hydrus.tumblr.com/post/187016946869/heres-the-stats-from-the-previous-post-i-think

The above is statistics from the PTR, but I want some of those, plus what OP described, in hydrus itself.

https://64.media.tumblr.com/7a64fee2269d74f135d85f69424c231e/tumblr_pw98wp0xnD1qznht1o1_500.png

rachmadaniHaryono • Aug 19 '20 01:08