Import tags from GitHub and allow fine-grained search on them
Tags are a great way to categorize software repos and they're already built into GitHub repos. It would be great to be able to filter on "tag:foo" (and, incidentally, "language:C", but that's probably another issue).
Yeah, language is possibly a separate issue.
As for tags, this was difficult when I started the site, but it should be possible now that GitHub has added "Topics" (aka tags aka labels) for repositories: https://github.com/blog/2309-introducing-topics
Repo "language" and "topics" data is now being collected. (And displayed in the Explore tab!) This information could potentially be incorporated into the search functionality.
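A rough sketch of how `tag:` and `language:` qualifiers could be layered onto a plain-text search. Everything here is an assumption for illustration: the record shape (`name`, `language`, `topics`), the sample repos, and the function names `parseQuery` and `searchRepos` are hypothetical, not the site's actual search code.

```javascript
// Hypothetical repo records; the real catalog's field names may differ.
const sampleRepos = [
  { name: "LLNL/example-a", language: "C", topics: ["hpc", "mpi"] },
  { name: "LLNL/example-b", language: "Python", topics: ["visualization"] },
  { name: "LLNL/example-c", language: "C++", topics: ["hpc"] },
];

// Split a query such as "tag:hpc language:C solver" into qualifiers and free-text terms.
function parseQuery(query) {
  const parsed = { tags: [], languages: [], terms: [] };
  for (const token of query.trim().split(/\s+/).filter(Boolean)) {
    const lower = token.toLowerCase();
    if (lower.startsWith("tag:") && lower.length > 4) {
      parsed.tags.push(lower.slice(4));
    } else if (lower.startsWith("language:") && lower.length > 9) {
      parsed.languages.push(lower.slice(9));
    } else {
      parsed.terms.push(lower);
    }
  }
  return parsed;
}

// Keep repos that match every qualifier and every free-text term (case-insensitive).
// Assumes one primary language per repo, matched exactly so "language:C" skips C++.
function searchRepos(allRepos, query) {
  const { tags, languages, terms } = parseQuery(query);
  return allRepos.filter((repo) => {
    const repoTopics = (repo.topics || []).map((t) => t.toLowerCase());
    const repoLanguage = (repo.language || "").toLowerCase();
    return (
      tags.every((tag) => repoTopics.includes(tag)) &&
      languages.every((lang) => repoLanguage === lang) &&
      terms.every((term) => repo.name.toLowerCase().includes(term))
    );
  });
}

console.log(searchRepos(sampleRepos, "tag:hpc language:C").map((r) => r.name));
```

Exact matching on language is a deliberate choice in this sketch; a substring match would make `language:C` also return C++ and CMake repos.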
Some of this work has begun under @hauten's lead.
See also: https://github.com/LLNL/llnl.github.io/tree/add-topics
@angfl97 Could you do some data analysis to help us understand the categories already in use?
- What topics are already in use by our repos, and how many repos fall into each topic?
- Another way to broadly categorize the repos would be by organization (other than LLNL). How many non-LLNL organizations do we have in the catalog, and how many repos are in each?
- I also like @sbromberger's idea. We have the data about which languages are used. What is the set of unique languages, and how many repos use each one?
It may be worth noting that logic for answering some of these questions already exists: it generates the "word cloud" visualizations at the bottom of the explore page and individual repo pages.
The cloud-generator takes a list of {name: aWord, value: wordCount} objects, which is what these functions output. They may be worth a look.
https://github.com/LLNL/llnl.github.io/blob/eb89e81d36d9a06b80be1a6b55f2142e842faedc/js/explore/cloud_topics.js#L69-L95
https://github.com/LLNL/llnl.github.io/blob/eb89e81d36d9a06b80be1a6b55f2142e842faedc/js/explore/cloud_languages.js#L69-L99
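For anyone who wants the counts without reading the linked functions, here is a minimal sketch of one tally helper that covers the three questions above (topics, non-LLNL organizations, languages) and emits the same `{name, value}` shape. It is not the code behind those links. The record fields (`owner`, `language`, `topics`), the sample data, and the `tally` helper are all hypothetical, and it assumes one primary language per repo.

```javascript
// Hypothetical repo records; field names (owner, language, topics) are assumptions.
const catalog = [
  { owner: "LLNL", name: "alpha", language: "C++", topics: ["hpc", "mpi"] },
  { owner: "LLNL", name: "beta", language: "Python", topics: ["hpc"] },
  { owner: "example-org", name: "gamma", language: "C++", topics: [] },
];

// Count how many repos carry each key and return {name, value} objects,
// most common first. keysOf(repo) returns the keys a repo contributes (possibly none).
function tally(allRepos, keysOf) {
  const counts = new Map();
  for (const repo of allRepos) {
    // A Set keeps a repo from being counted twice under the same key.
    for (const key of new Set(keysOf(repo))) {
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return [...counts]
    .map(([name, value]) => ({ name, value }))
    .sort((a, b) => b.value - a.value || a.name.localeCompare(b.name));
}

const topicCounts = tally(catalog, (r) => r.topics || []);
const languageCounts = tally(catalog, (r) => (r.language ? [r.language] : []));
const externalOrgCounts = tally(catalog, (r) =>
  r.owner.toLowerCase() === "llnl" ? [] : [r.owner]
);

console.log(topicCounts);
console.log(languageCounts);
console.log(externalOrgCounts);
```

Chaining something like `.filter((t) => t.value >= 4)` onto `topicCounts` would restrict the output to topics shared by several repos.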
For those not traversing the link, these topics are mentioned in 4 or more repositories:
- hpc
- scientific-computing
- cpp
- parallel-computing
- mpi
- visualization
- llnl
- python
- high-order
- finite-elements
- c-plus-plus
- data-viz
- computational-science
- simulation
- blt
- gov
I was hoping that we'd get some topics outside of the typical "hpc" stuff, but I guess not. The language tags are sort of interesting:
| Language | count |
|---|---|
| shell | 292 |
| python | 252 |
| C | 210 |
| C++ | 202 |
| Makefile | 174 |
| CMake | 113 |
| HTML | 85 |
But I'm not sure that's immediately useful. There are 13 repos using AWK... maybe digging into the lesser-used languages would be cool.
What I do think is actually useful are the repos we are pulling from non-LLNL organizations. The top 5 (most repos) come from:
Some of these projects would be very cool to highlight on their own as they sort of represent a whole ecosystem of interrelated repos. These are also the places where we get the most external interaction.
It would be awesome if more repos had topics. I've done a couple of inventories over the last year, and something like <10% of repos have them. Maybe this can encourage PIs: our portal (not to mention GitHub) will provide more visibility to repos that have topics.
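If it helps to repeat that inventory, a coverage check could look roughly like this. The record shape, the sample data, and the `topicCoverage` function are hypothetical; only a `topics` array per repo is assumed.

```javascript
// Hypothetical records; only the `topics` field matters here.
const inventory = [
  { name: "alpha", topics: ["hpc"] },
  { name: "beta", topics: [] },
  { name: "gamma" }, // a missing topics field is treated as "no topics"
  { name: "delta", topics: [] },
];

// Fraction of repos with at least one topic, plus the names of those without any.
function topicCoverage(allRepos) {
  const untagged = allRepos.filter((r) => !(r.topics && r.topics.length > 0));
  const tagged = allRepos.length - untagged.length;
  return {
    tagged,
    total: allRepos.length,
    fraction: allRepos.length === 0 ? 0 : tagged / allRepos.length,
    untagged: untagged.map((r) => r.name),
  };
}

console.log(topicCoverage(inventory));
```

The `untagged` list is the useful part for follow-up, since it names the repos whose maintainers could be asked to add topics.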
See https://github.com/LLNL/llnl.github.io/blob/new-home-page/radiuss/README.md for a list of tags on radiuss repos. I will aim to use that list and the notes above as starting points for standardizing tags across other LLNL repos.
@hauten -- Maybe list our standard tags on https://github.com/LLNL/llnl.github.io/blob/master/about/using-github.md ?
Actually, for the docs, we can start the listing here: https://github.com/LLNL/llnl.github.io/tree/master/categories