Classify importance of Data (Characters, Houses,...) by Centrality measure
Hi there,
as mentioned around and in other teams, it would be extremely useful to have a measure of importance of a character, house, and everything that can be scraped from the wiki and has a unique link.
This can be done by creating a graph of in- and outgoing links via https://www.npmjs.com/package/ngraph.centrality and then storing the in-degree, out-degree, total degree and betweenness centralities as separate values. Just be careful: the centralities are graph dependent. Thus you need to think about whether you want to have one mega-graph or separate graphs for characters, houses,... Both are correct, it's a matter of choice. Also: think about how the package handles multiple links from A -> B.
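To make the in/out/degree idea concrete, here is a minimal sketch in plain JavaScript. It deliberately does not call ngraph.centrality, so nothing here should be read as that package's API. The link list and page names are made up for illustration, and collapsing repeated A -> B links into a single edge is an assumption, just one possible answer to the multiple-links question.

```javascript
// Hypothetical link list scraped from the wiki: [source page, target page].
const links = [
  ["Jon Snow", "House Stark"],
  ["Jon Snow", "House Stark"], // repeated link A -> B
  ["Arya Stark", "House Stark"],
  ["House Stark", "Jon Snow"],
];

// Counts in-degree, out-degree and total degree per page.
// Assumption: repeated links from A to B count as one edge.
function degreeCentrality(linkList) {
  const seen = new Set();
  const scores = new Map();
  const entry = (name) => {
    if (!scores.has(name)) scores.set(name, { in: 0, out: 0, degree: 0 });
    return scores.get(name);
  };
  for (const [from, to] of linkList) {
    const key = JSON.stringify([from, to]);
    if (seen.has(key)) continue; // skip duplicate edges
    seen.add(key);
    const source = entry(from);
    const target = entry(to);
    source.out += 1;
    source.degree += 1;
    target.in += 1;
    target.degree += 1;
  }
  return scores;
}

const centrality = degreeCentrality(links);
for (const [name, score] of centrality) {
  console.log(name, score);
}
```

Running this once on a mega-graph and once per category (characters only, houses only) would give different numbers for the same page, which is the graph dependence mentioned above.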
Last but not least, some other teams have already been working on these things, so @Rostlab/js_cs_sose_2016_students please raise your voice. For now we have been wrongly advising this as page rank but that's just one flavor of centrality measure.
EDIT: Centrality is: https://en.wikipedia.org/wiki/Centrality
Is this a call for help/collaboration with another team or it is an issue for Project_A?
I did something . I took the data from Guy's wiki scraper and generated some jsons from it. I then matched this with the character list provided by and ordered by score.
The code can be found here . It's not great but it works. Maybe this helps someone...
It's a call for collab if someone has done something about it. But I think I described quite extensively how you could implement this, regardless of other people's work. So :D you know... :D
Well, we have just gathered some images for the characters, we created paths for and which are most important/popular. Those images can be found here: https://github.com/Rostlab/JS16_ProjectC_Group10/tree/develop/mockup/img/persons
@AlexBeischl that's still not really what I had in mind :) but again, a start: @kordianbruck @Adiolis can you assign someone except you two or @togiberlin to take care of this one? Assign as in, this issue is assigned.
I already asked on the Facebook group who wants to do this. So far there are no volunteers. I am really not able to do this as well. :angry: Should we really just assign someone, @sacdallago? :smile:
Is this even a part of Project A?
Yeah, I don't think this is within the scope of Project A, nor can it be done by the 25th. I vote to move this issue to 'someday'.
I actually think it is within the scope of A, but since this requirement showed up late in the game we can defer it to the next version.
In the meantime, please integrate the data from here: https://rostlab.org/~gyachdav/awoiaf/Data/pageRank/allchars.tgz
It's not perfect, but it at least gives a measure of which characters are referenced more than others. I believe the range is [1-300] (unknown to popular). All you need to do here is read the first item in the array, pick up its page_name to identify the character, and assign that character the "score" value.
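The lookup described above could be sketched roughly like this. Only the field names page_name and score come from this thread; the exact shape of the files in allchars.tgz is an assumption (an array whose first item carries both fields), and the sample values are invented.

```javascript
// Assumed shape of one parsed data file: an array whose first item has
// `page_name` and `score`. Everything else about the file is unknown here.
function extractPageRank(parsed) {
  if (!Array.isArray(parsed) || parsed.length === 0) return null;
  const first = parsed[0];
  if (!first || typeof first.page_name !== "string" || typeof first.score !== "number") {
    return null; // file does not look like the assumed format
  }
  return { name: first.page_name, pageRank: first.score };
}

// Hypothetical sample input, not real data from the archive.
const sample = [{ page_name: "Tormund", score: 60 }, { other: "ignored" }];
const extracted = extractPageRank(sample);
console.log(extracted);
```

Returning null for anything that does not match keeps one malformed file from breaking the whole import.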
We really need this popularity measure in, to be able to sort the characters by importance. Otherwise we run into a case where we show a bunch of negligible characters on the character portal.
@gyachdav thank you. We will do that.
I have implemented the updating of the pageRanks. With every refill, update etc. of the characters the pageRanks are added.
To just add the pageRanks and do nothing more run: npm run updatePageRanks --update=characters
Then the pageRanks.json in dir data/ is added to the db.
With the --file=dir/file.json option, one can change the json file to use.
To create this json from the dir that Guy provided, run: npm run updatePageRanks --dir=PATHTODIR
This transforms the many _data files into a single json that contains the names and scores of the characters.
With the --to=dir/file.json option, one can define to which json file the result is saved.
So, @kordianbruck, please run npm run updatePageRanks --update=characters on the public server.
But this is not done with the fancy algorithm I suggested above, right @Adiolis? In that case please leave this open with the 'sometime' milestone and no assignee :P
@Adiolis can you please run quick stats on all characters and report min, max, median, mean and stddev for "pageRank"? It will help with interpreting the level of importance of a character. A histogram of pageRank would also be very helpful. Thanks!
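For whoever picks this up, the requested stats could be computed with something like the sketch below. The function name, the default bin count and the sample values are all hypothetical; the standard deviation here is the population one (divide by n), which is a choice, not something specified above.

```javascript
// Summary statistics plus an equal-width histogram for a list of pageRank values.
function summarize(values, binCount = 5) {
  if (values.length === 0) throw new Error("no values to summarize");
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const min = sorted[0];
  const max = sorted[n - 1];
  const mean = sorted.reduce((sum, v) => sum + v, 0) / n;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  // Population variance: divide by n, not n - 1.
  const variance = sorted.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  const stddev = Math.sqrt(variance);
  // Equal-width bins over [min, max]; the max value goes into the last bin.
  const width = (max - min) / binCount || 1;
  const histogram = new Array(binCount).fill(0);
  for (const v of sorted) {
    const bin = Math.min(binCount - 1, Math.floor((v - min) / width));
    histogram[bin] += 1;
  }
  return { min, max, median, mean, stddev, histogram };
}

// Made-up sample values, only to show the call.
const stats = summarize([2, 4, 4, 4, 5, 5, 7, 9], 7);
console.log(stats);
```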
@gyachdav I used the pageRank for PLOD: min = 0, max = 300. I normalized the values, and only around 300 characters have a normalized rank over 0.1.
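The normalization method is not spelled out in this thread. One plausible reading is plain min-max scaling, sketched below under that assumption; the function name and the sample entries are hypothetical.

```javascript
// Min-max scaling of pageRank values to the range [0, 1].
// Assumption: this is only one way the normalization could have been done.
function normalizeRanks(ranks) {
  const values = Object.values(ranks);
  const min = Math.min(...values);
  const max = Math.max(...values);
  const span = max - min;
  const normalized = {};
  for (const [name, value] of Object.entries(ranks)) {
    // If every value is identical, map everything to 0 to avoid dividing by zero.
    normalized[name] = span === 0 ? 0 : (value - min) / span;
  }
  return normalized;
}

// Invented sample using the min = 0 and max = 300 mentioned above.
const normalized = normalizeRanks({ minor: 0, tormund: 60, top: 300 });
console.log(normalized);
```

With min = 0 and max = 300, a raw pageRank of 60 lands at 0.2, above a 0.1 threshold.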
Thanks, can you post your normalized ranking here?
I'm basically interested in cases like this https://got-api.bruck.me/api/characters/Tormund where the pageRank is rather low but the character still plays a prominent enough role in the show to get tweets.
Here is my normalized pageRank. 60 is not that low: after normalization, everything above 0.1 is pretty much popular. pagerank_normalized_json.txt
Thanks @Hack3l!
@Adiolis see yourself as excused from this task :smile:
As mentioned before, the "pagerank" is far from perfect, and so characters like http://awoiaf.westeros.org/index.php/Betharios_of_Braavos have mysteriously got a top rank. Can you please rescan the list and halve the page rank points if the character does not have an image associated with it? That will make sure that all minor characters are put back in their proper place.
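A minimal sketch of that hack, assuming each character record has a pageRank number and an optional image field. The field name image and the sample records are assumptions, not the actual schema of Project A.

```javascript
// Halves the pageRank of every character that has no image attached.
// Returns new objects and leaves the input array untouched.
function penalizeMissingImages(characters) {
  return characters.map((character) =>
    character.image
      ? character
      : { ...character, pageRank: character.pageRank / 2 }
  );
}

// Invented sample records.
const adjusted = penalizeMissingImages([
  { name: "Betharios of Braavos", pageRank: 280 },
  { name: "Tormund", pageRank: 60, image: "tormund.jpg" },
]);
console.log(adjusted);
```

Note that running it twice on the same data would halve the score again, so it should be applied to freshly imported scores only.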
@adiolis would you do the honor?
@kordianbruck promised to improve this one day. When this happens we would get rid of this nasty hack.
@gyachdav Sure, i can do that. Not sure, when, because i am running into two exams, but i will find some time in between my learning sessions :stuck_out_tongue: .
thanks and good luck!
@gyachdav Okay, that was very easy and so i have done it right now ;)
@kordianbruck Please run npm run updatePageRanks --update=characters
How do I access this updated data?
@Hack3l once the data is updated can you also update your normalized rankings and post it here?
Yes, it would be nice if you notified me once it is updated.
@kajo404 there should be API calls for this
I know I can get the page rank from A but our system works with the normalized page ranks from @Hack3l so an update of that would be nice
Data updated!
Cool, now I am only waiting for @Hack3l to post a link to the updated normalized ranks.
Here are the updated normalized ranks: pagerank_normalized_json.txt