Tag Insights report generator
It would be nice to get various statistics about tags for the hashes I have on file (not the PTR in general). The report should support common export formats such as CSV for further analysis. I don't need these statistics calculated all the time; an on-demand report generation button would be ideal.
- All tags ordered by incidence, keyed by sibling king
- Commonly coincident tags
- Coincidence of specifically named tag with other tags
- Fuzzy-matched tags (likely spelling errors, etc.)
- A listing of all creator: tags (any namespace really, but this is the most obvious immediate application) across all services, but only if I have a hash associated with the tag.
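As a rough illustration of the fuzzy-match item above, here is a minimal sketch using Python's standard `difflib`. The function name, the `tag_counts` input shape, and the 0.85 threshold are all assumptions for illustration, not anything hydrus provides. It treats the rarer tag of each similar pair as the probable misspelling.

```python
from difflib import SequenceMatcher
from itertools import combinations


def likely_misspellings(tag_counts, threshold=0.85):
    """Return (rare_tag, common_tag, similarity) triples for near-identical tags.

    tag_counts maps tag -> number of files carrying it (hypothetical input).
    The rarer tag of each similar pair is assumed to be the misspelling.
    """
    results = []
    for a, b in combinations(sorted(tag_counts), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            rare, common = sorted((a, b), key=lambda t: tag_counts[t])
            results.append((rare, common, round(ratio, 3)))
    return results


sample_counts = {
    "creator:leonardo da vinci": 40,
    "creator:leonardo da vinici": 1,
    "creator:rembrandt": 12,
}
suspects = likely_misspellings(sample_counts)
```

This compares every pair of tags, so it is quadratic in the number of tags; restricting it to one namespace at a time (as the wizard allows) would keep it manageable.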
Wizard Screens
Tag Services
(*) All Combined local, remote and virtual
( ) Selection:
    [*] Local
    [*] Local virtual
    [ ] Remote
Hash Services
[*] Local hashes
[ ] All hashes
From
(*) All namespaces
( ) Single namespace [____] (blank for global)
( ) Single namespace and global [____]
# Or offer a list view for picking several namespaces, if you are a masochist
Calculate
(*) tag-hash incidence count
( ) tag-tag coincidence count
    [*] ignore direct coincidence from parent relationships
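The wizard choices above could be carried around as one small options object. This is only a sketch: `ReportOptions`, `Calculation`, and every field name are made up for illustration, and the defaults mirror the pre-selected entries in the mockup.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Calculation(Enum):
    TAG_HASH_INCIDENCE = "tag-hash incidence count"
    TAG_TAG_COINCIDENCE = "tag-tag coincidence count"


@dataclass
class ReportOptions:
    # Tag Services: None means "all combined"; otherwise a set of chosen kinds.
    tag_services: Optional[frozenset] = None
    # Hash Services: True restricts the report to local hashes.
    local_hashes_only: bool = True
    # From: None means all namespaces; "" means global (unnamespaced) tags.
    namespace: Optional[str] = None
    include_global: bool = False
    calculation: Calculation = Calculation.TAG_HASH_INCIDENCE
    ignore_parent_coincidence: bool = True

    def namespaces_to_scan(self):
        """Return the namespaces to query, or None for 'all namespaces'."""
        if self.namespace is None:
            return None
        scan = [self.namespace]
        if self.include_global and self.namespace != "":
            scan.append("")
        return scan


defaults = ReportOptions()
creator_report = ReportOptions(namespace="creator", include_global=True)
```

Here `creator_report.namespaces_to_scan()` yields the same `(namespace, '')` pair that the Gather step iterates over.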
Example Algorithm
###Gather
tag_domain=COMBINED
hash_domain=LOCAL
namespace="creator"
r=set()
# Query the chosen namespace and the global (blank) namespace
for ns in (namespace,''):
    # Real tags: [(hash_id,tag_id,namespace_id)]
    rs = hashes_join_real_tags(tag_domain=tag_domain,hash_domain=hash_domain,namespace=ns)
    # Virtual tags: [(hash_id,tag_id,namespace_id)]
    rsv= hashes_join_virtual_tags(tag_domain=tag_domain,hash_domain=hash_domain,namespace=ns)
    # Your tuple-likes must be __hash__ and __eq__ aware to make set work properly.
    # Combine the lists, removing any duplication
    r= r.union(rs, rsv)
###Count
d={}
for t in r:
    k= t.tag # this is a subtag and a namespace wrapped in a hashable (keyable) object
    if k in d:
        d[k]+=1
    else:
        d[k]=1
# csvwriter is assumed to be a csv.writer
csvwriter.writerow(['namespace','subtag','count'])
for k in d:
    csvwriter.writerow([k.namespace_string,k.subtag_string,d[k]])
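The gather-and-count steps can also be written compactly with `collections.Counter` and the standard `csv` module. This is a self-contained sketch, not hydrus code: the `Row` shape and `incidence_csv` name are hypothetical stand-ins for whatever the real query returns.

```python
import csv
import io
from collections import Counter, namedtuple

# Hypothetical row shape; the real query result would come from hydrus.
Row = namedtuple("Row", ["hash_id", "namespace", "subtag"])


def incidence_csv(rows):
    """Count distinct files per (namespace, subtag) and return CSV text."""
    unique_rows = set(rows)  # the same mapping may arrive from several services
    counts = Counter((r.namespace, r.subtag) for r in unique_rows)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["namespace", "subtag", "count"])
    # Highest count first; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for (namespace, subtag), count in ordered:
        writer.writerow([namespace, subtag, count])
    return buffer.getvalue()


rows = [
    Row(1, "creator", "alice"),
    Row(2, "creator", "alice"),
    Row(2, "creator", "alice"),  # duplicate mapping, counted once
    Row(3, "creator", "bob"),
]
report = incidence_csv(rows)
```

Deduplicating on the full `(hash_id, namespace, subtag)` row before counting is what keeps a tag present on several services from being counted twice for the same file.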
##Coincidence
Just use hydrus's regular logic to convert (hash,tag_id)
# tags[h] is the collection of tags on hash h
d={}
for h in hashes:
    for t1 in tags[h]:
        for t2 in tags[h]:
            if t1 == t2:
                continue # don't pair a tag with itself
            if t1 not in d:
                d[t1]={}
            if t2 in d[t1]:
                d[t1][t2]+=1
            else:
                d[t1][t2]=1
You get the idea for CSV writing:
csvwriter.writerow([t1,t2,d[t1][t2] if t2 in d[t1] else 0])
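For the coincidence count, including the "ignore direct coincidence from parent relationships" option, something like the following sketch might work. The `tags_by_hash` and `parents` inputs are assumed shapes for illustration; hydrus's actual parent lookup is not shown here. Each unordered pair is counted once per file.

```python
from collections import Counter
from itertools import combinations


def coincidence_counts(tags_by_hash, parents=None):
    """Count how many files carry each unordered pair of tags.

    tags_by_hash maps hash -> iterable of tags (hypothetical input).
    parents maps child tag -> set of its direct parent tags; pairs that are a
    direct child/parent relationship are skipped, since they always coincide.
    """
    parents = parents or {}
    counts = Counter()
    for tags in tags_by_hash.values():
        # Sorting makes each pair appear in one canonical order.
        for t1, t2 in combinations(sorted(set(tags)), 2):
            if t2 in parents.get(t1, ()) or t1 in parents.get(t2, ()):
                continue
            counts[(t1, t2)] += 1
    return counts


tags_by_hash = {
    "h1": ["character:samus", "series:metroid", "blue eyes"],
    "h2": ["character:samus", "series:metroid"],
    "h3": ["blue eyes", "character:samus"],
}
parents = {"character:samus": {"series:metroid"}}
pairs = coincidence_counts(tags_by_hash, parents)
```

Using `combinations` over the sorted tag set halves the work compared with the nested loop and avoids storing both `(a, b)` and `(b, a)`; the CSV step can emit either orientation as needed.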
related
https://hydrus.tumblr.com/post/187016946869/heres-the-stats-from-the-previous-post-i-think
The above are statistics from the PTR, but I want some of those, plus what OP described, in hydrus itself.
https://64.media.tumblr.com/7a64fee2269d74f135d85f69424c231e/tumblr_pw98wp0xnD1qznht1o1_500.png