Tag Insights report generator

Open bbappserver opened this issue 5 years ago • 1 comments

It would be nice to be able to get various statistics about tags for the hashes I have on file (not the PTR in general). It should offer some common export formats like CSV for further analysis. I don't need these statistics calculated all the time, but an on-demand report generation button would be ideal.

All tags ordered by incidence, keyed by sibling king
Commonly coincident tags
Coincidence of specifically named tag with other tags
Fuzzy matched tags (likely spelling errors etc)
A listing of all creator:s (any namespace really, but this is the most obvious immediate application) across all services, but only if I have a hash associated with the tag.
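For the fuzzy-matching item, here is a minimal sketch using Python's standard `difflib`. The function name `likely_misspellings` and the 0.85 cutoff are assumptions for illustration, not anything hydrus provides. It compares every tag against every later tag, so it is quadratic and only suitable for a single namespace or a modest tag count.

```python
import difflib


def likely_misspellings(tags, cutoff=0.85):
    """Return (tag, similar_tag) pairs whose similarity ratio is >= cutoff."""
    tags = sorted(set(tags))
    pairs = []
    for i, tag in enumerate(tags):
        # Only compare against later tags so each pair is reported once.
        candidates = tags[i + 1:]
        for match in difflib.get_close_matches(tag, candidates, n=5, cutoff=cutoff):
            pairs.append((tag, match))
    return pairs


example = likely_misspellings(["blue eyes", "blue eys", "red hair", "long hair"])
print(example)
```

The pairs could then be written out as two CSV columns for manual review, since a high similarity ratio does not prove that one tag is a typo of the other.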

Wizard Screens

Tag Services
(*) All Combined local, remote and virtual
( ) Selection:
  [*] Local
  [*] Local virtual
  [ ] Remote

Hash Services
[*] Local hashes
[ ] All hashes

From
(*) All namespaces
( ) Single namespace [____] (blank for global)
( ) Single namespace and global [____]
# Or do a view for a list of namespaces if you are a masochist

Calculate
(*) tag-hash incidence count
( ) tag-tag coincidence count
  [*] ignore direct coincidence from parent relationships
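
The wizard choices above could be carried around as one small options object. This is only a sketch: `ReportOptions`, `Calculation`, and every field name are hypothetical, chosen to mirror the mockup rather than any existing hydrus structure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Calculation(Enum):
    TAG_HASH_INCIDENCE = "incidence"
    TAG_TAG_COINCIDENCE = "coincidence"


@dataclass(frozen=True)
class ReportOptions:
    # Defaults mirror the "all combined" and "local hashes" choices in the mockup.
    tag_services: frozenset = frozenset({"local", "local virtual", "remote"})
    local_hashes_only: bool = True
    namespace: Optional[str] = None  # None = all namespaces, "" = global only
    include_global: bool = False
    calculation: Calculation = Calculation.TAG_HASH_INCIDENCE
    ignore_parent_coincidence: bool = True

    def namespaces(self):
        """Namespaces to query, or None meaning 'no namespace filter'."""
        if self.namespace is None:
            return None
        if self.include_global and self.namespace != "":
            return (self.namespace, "")
        return (self.namespace,)


options = ReportOptions(namespace="creator", include_global=True)
print(options.namespaces())
```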

Example Algorithm

Gather

tag_domain = COMBINED
hash_domain = LOCAL
namespace = "creator"

r = set()
# Query the chosen namespace and the global (empty) namespace.
for ns in (namespace, ''):
  # Each query returns a set of (hash_id, tag_id, namespace_id) rows.
  rs = hashes_join_real_tags(tag_domain=tag_domain, hash_domain=hash_domain, namespace=ns)
  rsv = hashes_join_virtual_tags(tag_domain=tag_domain, hash_domain=hash_domain, namespace=ns)

  # The tuple-like rows must implement __hash__ and __eq__ for set to work properly.
  # Combine the results, removing any duplication.
  r |= rs | rsv
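
To illustrate the `__hash__`/`__eq__` remark, here is a small self-contained sketch. The `Mapping` type and the sample rows are made up for the example. A `NamedTuple` inherits value-based hashing and equality from `tuple`, so a set union drops rows that appear in both the real and the virtual results.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    hash_id: int
    tag_id: int
    namespace_id: int


# Hypothetical query results: one row appears in both sets.
real_rows = {Mapping(1, 10, 1), Mapping(2, 10, 1)}
virtual_rows = {Mapping(2, 10, 1), Mapping(2, 11, 1)}

# Set union keeps each distinct (hash_id, tag_id, namespace_id) once.
combined = real_rows | virtual_rows
print(len(combined))
```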

Count

d = {}
for t in r:
  k = t.tag  # a subtag and a namespace wrapped in a keyable (hashable) object
  if k in d:
    d[k] += 1
  else:
    d[k] = 1

# csvwriter is assumed to be a csv.writer instance
csvwriter.writerow(['namespace', 'subtag', 'count'])
for k in d:
  csvwriter.writerow([k.namespace_string, k.subtag_string, d[k]])
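
The count and export steps can be sketched end to end with the standard library. The function name `write_incidence_csv` and the `(hash_id, namespace, subtag)` row shape are assumptions for illustration. `collections.Counter` replaces the manual dict bookkeeping, and `most_common()` gives the "ordered by incidence" output asked for above.

```python
import csv
import io
from collections import Counter


def write_incidence_csv(rows, out):
    """rows: iterable of (hash_id, namespace, subtag), already deduplicated."""
    counts = Counter((namespace, subtag) for _hash_id, namespace, subtag in rows)
    writer = csv.writer(out)
    writer.writerow(["namespace", "subtag", "count"])
    # most_common() yields entries in descending order of count.
    for (namespace, subtag), count in counts.most_common():
        writer.writerow([namespace, subtag, count])
    return counts


buf = io.StringIO()
sample_rows = {(1, "creator", "alice"), (2, "creator", "alice"), (2, "creator", "bob")}
counts = write_incidence_csv(sample_rows, buf)
print(buf.getvalue())
```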

Coincidence

Just use hydrus's regular logic to convert (hash,tag_id)

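d = {}  # nested counts: d[t1][t2] = number of hashes carrying both tags
for h in hashes:
  for t1 in tags[h]:
    for t2 in tags[h]:
      if t1 == t2:
        continue  # a tag does not coincide with itself
      row = d.setdefault(t1, {})
      row[t2] = row.get(t2, 0) + 1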

You get the idea for CSV writing:

# csvwriter is assumed to be a csv.writer instance
csvwriter.writerow([t1, t2, d.get(t1, {}).get(t2, 0)])
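
A compact variant of the coincidence count, including the "ignore direct coincidence from parent relationships" option from the wizard, might look like the sketch below. The function name, the `tags_by_hash` input shape, and the `parents` mapping (child tag to its set of parent tags) are all assumptions; hydrus's real parent lookup would replace that dict. Each unordered pair is stored once under a sorted key, so look it up as `(min(t1, t2), max(t1, t2))`.

```python
from collections import Counter
from itertools import combinations


def coincidence_counts(tags_by_hash, parents=None):
    """tags_by_hash: {hash_id: iterable of tags}; parents: {child: set of parents}."""
    parents = parents or {}
    counts = Counter()
    for tags in tags_by_hash.values():
        # sorted(set(...)) yields each unordered pair once, in a stable order.
        for t1, t2 in combinations(sorted(set(tags)), 2):
            if t2 in parents.get(t1, ()) or t1 in parents.get(t2, ()):
                continue  # direct parent/child pair, skipped per the wizard option
            counts[(t1, t2)] += 1
    return counts


tags_by_hash = {
    1: ["creator:alice", "series:foo", "character:bar"],
    2: ["creator:alice", "series:foo"],
}
parents = {"character:bar": {"series:foo"}}
pair_counts = coincidence_counts(tags_by_hash, parents)
print(pair_counts.most_common())
```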

Aug 19 '20 01:08 bbappserver

https://hydrus.tumblr.com/post/187016946869/heres-the-stats-from-the-previous-post-i-think

The above are statistics from the PTR, but I want some of those, plus what OP described, in hydrus itself.

https://64.media.tumblr.com/7a64fee2269d74f135d85f69424c231e/tumblr_pw98wp0xnD1qznht1o1_500.png

Aug 19 '20 01:08 rachmadaniHaryono