llnl.github.io

import tags from github, and allow fine-grained search on them

Open sbromberger opened this issue 8 years ago • 11 comments

Tags are a great way to categorize software repos and they're already built into GitHub repos. It would be great to be able to filter on "tag:foo" (and, incidentally, "language:C", but that's probably another issue).
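As a rough illustration of the "tag:foo" / "language:C" idea, here is a minimal sketch in Python (the site itself is JavaScript, so this only shows the logic). The `Repo` record shape, the `parse_query` and `matches` helpers, and the sample repos are all hypothetical and not taken from the site's code.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    # Hypothetical record shape; the real catalog data may differ.
    name: str
    language: str = ""
    topics: list = field(default_factory=list)


def parse_query(query):
    """Split a query into field filters ("tag:foo") and free-text terms."""
    filters, terms = {}, []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key.lower() in ("tag", "language") and value:
            filters.setdefault(key.lower(), []).append(value.lower())
        else:
            terms.append(token.lower())
    return filters, terms


def matches(repo, filters, terms):
    """True if the repo satisfies every filter and every free-text term."""
    topics = {t.lower() for t in repo.topics}
    if any(tag not in topics for tag in filters.get("tag", [])):
        return False
    if any(lang != repo.language.lower() for lang in filters.get("language", [])):
        return False
    # Free-text terms are matched against the repo name only in this sketch.
    return all(term in repo.name.lower() for term in terms)


# Made-up sample data for illustration.
repos = [
    Repo("mesh-tool", "C", ["hpc", "finite-elements"]),
    Repo("viz-kit", "Python", ["visualization"]),
]
filters, terms = parse_query("tag:hpc language:C")
hits = [r.name for r in repos if matches(r, filters, terms)]
print(hits)
```

Multiple filters are combined with AND here; whether the real search should use AND or OR is an open design choice.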

sbromberger avatar Jun 26 '17 23:06 sbromberger

Yeah, language is possibly a separate issue.

As for tags, this was difficult at the time I started the site, but should be possible now that GitHub has added "Topics" (aka tags aka labels) for repositories: https://github.com/blog/2309-introducing-topics

IanLee1521 avatar Jun 27 '17 03:06 IanLee1521

Repo "language" and "topics" data is now being collected. (And displayed in the Explore tab!) This information could potentially be incorporated into the search functionality.

LRWeber avatar Aug 24 '17 17:08 LRWeber

Some of this work has begun under @hauten's lead.

See also: https://github.com/LLNL/llnl.github.io/tree/add-topics

IanLee1521 avatar Feb 20 '19 18:02 IanLee1521

@angfl97 Could you do some data analysis to help us understand the categories already in use?

  1. What topics are already in use by our repos, and how many repos fall into each topic?
  2. Another way to broadly categorize the repos would be by organization (other than LLNL). How many non-LLNL organizations do we have in the catalog, and how many repos are in each?
  3. I also like @sbromberger's idea. We have the data about which languages are used: what is the set of unique languages, and how many repos use each one?
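
The three counts asked for above could be computed along these lines. This is only a sketch in Python: the record fields (`owner`, `topics`, `languages`), the `count_repos_per` helper, and the sample records are assumptions, not the catalog's actual schema or data.

```python
from collections import Counter

# Hypothetical sample records; the field names are assumptions,
# not the catalog's actual schema.
repos = [
    {"owner": "LLNL", "name": "a", "topics": ["hpc", "mpi"], "languages": ["C", "Shell"]},
    {"owner": "LLNL", "name": "b", "topics": ["hpc"], "languages": ["Python"]},
    {"owner": "other-org", "name": "c", "topics": [], "languages": ["C", "Python"]},
]


def count_repos_per(repos, key):
    """Count how many repos mention each value under `key` (once per repo)."""
    counts = Counter()
    for repo in repos:
        counts.update(set(repo.get(key, [])))
    return counts


# Question 1: repos per topic.
topic_counts = count_repos_per(repos, "topics")
# Question 2: repos per organization, excluding LLNL itself.
org_counts = Counter(r["owner"] for r in repos if r["owner"].lower() != "llnl")
# Question 3: unique languages and repos per language.
language_counts = count_repos_per(repos, "languages")

print(topic_counts.most_common())
print(org_counts.most_common())
print(sorted(language_counts))
```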

gonsie avatar Jun 12 '19 21:06 gonsie

It may be worth noting that logic for answering some of these questions already exists: it generates the "word cloud" visualizations at the bottom of the Explore page and the individual repo pages.

The cloud-generator takes a list of {name: aWord, value: wordCount} objects, which is what these functions output. They may be worth a look.

https://github.com/LLNL/llnl.github.io/blob/eb89e81d36d9a06b80be1a6b55f2142e842faedc/js/explore/cloud_topics.js#L69-L95

https://github.com/LLNL/llnl.github.io/blob/eb89e81d36d9a06b80be1a6b55f2142e842faedc/js/explore/cloud_languages.js#L69-L99
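
For readers who don't want to open the linked files, the `{name, value}` list shape described above can be sketched as follows. This is Python rather than the site's JavaScript and does not reproduce the linked functions; `to_cloud_entries`, its `min_count` parameter, and the sample counts are hypothetical.

```python
def to_cloud_entries(counts, min_count=1):
    """Convert a {word: count} mapping into [{"name": ..., "value": ...}, ...]."""
    entries = [
        {"name": word, "value": n}
        for word, n in counts.items()
        if n >= min_count
    ]
    # Largest counts first; ties broken alphabetically for stable output.
    entries.sort(key=lambda e: (-e["value"], e["name"]))
    return entries


# Made-up counts for illustration.
sample = {"hpc": 3, "mpi": 1, "cpp": 3}
print(to_cloud_entries(sample, min_count=2))
```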

LRWeber avatar Jun 13 '19 17:06 LRWeber

I made an Excel workbook with the stats @gonsie asked for.

Here is the link

angela-flores-sndk avatar Jun 13 '19 22:06 angela-flores-sndk

For those not traversing the link, these topics are mentioned in 4 or more repositories:

  • hpc
  • scientific-computing
  • cpp
  • parallel-computing
  • mpi
  • visualization
  • llnl
  • python
  • high-order
  • finite-elements
  • c-plus-plus
  • data-viz
  • computational-science
  • simulation
  • blt
  • gov

I was hoping that we'd get some topics outside of the typical "hpc" stuff, but I guess not. The language tags are sort of interesting:

| Language | Count |
| --- | --- |
| shell | 292 |
| python | 252 |
| C | 210 |
| C++ | 202 |
| Makefile | 174 |
| CMake | 113 |
| HTML | 85 |

But I'm not sure that's immediately useful. There are 13 repos using AWK... maybe digging into the lesser used languages would be cool.

What I do think is actually useful are the repos we are pulling from non-LLNL organizations. The top 5 (most repos) come from:

Some of these projects would be very cool to highlight on their own as they sort of represent a whole ecosystem of interrelated repos. These are also the places where we get the most external interaction.

gonsie avatar Jun 14 '19 23:06 gonsie

It would be awesome if more repos had topics. I've done a couple of inventories over the last year, and something like <10% of repos have them. Maybe this can encourage PIs: our portal (not to mention GitHub) will provide more visibility to repos that have topics.
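
An inventory like that boils down to one ratio. A minimal sketch in Python, assuming each repo record carries a `topics` list (the `topic_coverage` helper and the sample data are hypothetical):

```python
def topic_coverage(repos):
    """Return the fraction of repos that have at least one topic."""
    if not repos:
        return 0.0
    with_topics = sum(1 for repo in repos if repo.get("topics"))
    return with_topics / len(repos)


# Made-up sample: two of the four records have topics.
sample = [{"topics": ["hpc"]}, {"topics": []}, {}, {"topics": ["mpi", "cpp"]}]
print(f"{topic_coverage(sample):.0%} of repos have topics")
```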

hauten avatar Jun 15 '19 00:06 hauten

See https://github.com/LLNL/llnl.github.io/blob/new-home-page/radiuss/README.md for a list of tags on radiuss repos. I will aim to use that list and the notes above as starting points for standardizing tags across other LLNL repos.

hauten avatar Jun 19 '19 19:06 hauten

@hauten -- Maybe list our standard tags on https://github.com/LLNL/llnl.github.io/blob/master/about/using-github.md ?

IanLee1521 avatar Jun 24 '19 17:06 IanLee1521

Actually, for the docs, we can start the listing here: https://github.com/LLNL/llnl.github.io/tree/master/categories

IanLee1521 avatar Jul 01 '19 16:07 IanLee1521