parse lang command returns blank results
Related to the empathy session.
Running this query:
/* Top languages by repository count */
SELECT *
FROM (SELECT language, COUNT(repository_id) AS repository_count
      FROM (SELECT DISTINCT
                r.repository_id,
                LANGUAGE(t.tree_entry_name, b.blob_content) AS language
            FROM refs r
            JOIN commits c ON r.commit_hash = c.commit_hash
            JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
            JOIN tree_entries t ON ct.tree_hash = t.tree_hash
            JOIN blobs b ON t.blob_hash = b.blob_hash
            WHERE r.ref_name = 'HEAD') AS q1
      GROUP BY language) AS q2
ORDER BY repository_count DESC
I noticed that the result contains a blank language in the second row:
+-------------------+------------------+
| LANGUAGE | REPOSITORY COUNT |
+-------------------+------------------+
| Ignore List | 6 |
| | 6 |
| Text | 6 |
| Markdown | 6 |
| JSON | 6 |
| YAML | 5 |
| Dockerfile | 5 |
| INI | 5 |
| Shell | 5 |
| HTML | 5 |
| Java | 5 |
| Makefile | 4 |
| JavaScript | 4 |
| Python | 4 |
| C | 4 |
| XML | 4 |
| TOML | 3 |
| Go | 3 |
| Protocol Buffer | 3 |
| SVG | 3 |
| Groovy | 3 |
| Unix Assembly | 3 |
| Gradle | 3 |
| Batchfile | 3 |
| Java Properties | 3 |
| Ruby | 3 |
| CSS | 3 |
| SQL | 3 |
| Smarty | 3 |
| Vim script | 2 |
| CSV | 2 |
| Git Config | 2 |
| reStructuredText | 2 |
| Git Attributes | 2 |
| Perl | 2 |
| Maven POM | 2 |
| AsciiDoc | 2 |
| XSLT | 2 |
| PLSQL | 2 |
| FreeMarker | 2 |
| Java Server Pages | 2 |
| Kotlin | 2 |
| PLpgSQL | 2 |
| Less | 1 |
| HAProxy | 1 |
| PowerShell | 1 |
| R | 1 |
| Ant Build System | 1 |
| Scala | 1 |
| Roff | 1 |
| Yacc | 1 |
| RMarkdown | 1 |
| HTML+Django | 1 |
| Thrift | 1 |
| AspectJ | 1 |
| Csound | 1 |
| GAP | 1 |
| SQLPL | 1 |
| HTML+ERB | 1 |
| HiveQL | 1 |
| q | 1 |
| ANTLR | 1 |
+-------------------+------------------+
This is the list of repositories I'm using:
- github.com/src-d/gitbase
- github.com/src-d/gitbase-web
- github.com/bblfsh/bblfshd
- github.com/apache/spark
- github.com/spring-projects/spring-boot
- github.com/spring-projects/spring-framework
I found that running srcd parse lang on this file and this file returns nothing.
I'm not sure whether this is a bug.
I think it makes sense to return an empty string in LANGUAGE() when it cannot be detected. You can always add a WHERE language <> '' if you need to filter them. What do you think @ajnavarro?
~~Yep, no lang detected is an empty string for enry, so we are returning that.~~
Edit:
We return NULL if no lang is detected by enry:
lang := enry.GetLanguage(path, blob)
if lang == "" {
	return nil, nil
}
So that blank result might actually be a NULL rather than an empty string.
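One way to check whether the blank row is a NULL is to query for it directly. This is only a sketch: it reuses the table and column names from the query above, assumes gitbase accepts IS NULL on the result of LANGUAGE(), and the LIMIT is an arbitrary value chosen to keep the output short.

```sql
/* Sketch: list some file names for which no language was detected.
   Assumes LANGUAGE() yields NULL in that case; LIMIT 20 is arbitrary. */
SELECT DISTINCT t.tree_entry_name
FROM tree_entries t
JOIN blobs b ON t.blob_hash = b.blob_hash
WHERE LANGUAGE(t.tree_entry_name, b.blob_content) IS NULL
LIMIT 20
```

If this returns rows, the blank entry is a NULL. Replacing the condition with = '' would test for an empty string instead. Note that this scans every blob, so it may be slow on large repositories.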
@carlosms is this an issue for Engine or for Gitbase?
I don't think it's a bug. If anything, we could edit the example query in gitbase-web to filter out empty languages... What do you think @mcarmonaa?
I think adding a filter for empty languages is a good idea. It would also serve as an example/documentation for this specific case, which might not be obvious at first glance.
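A minimal sketch of what the filtered example could look like, based on the original query. The only addition is the WHERE clause on q1. It checks for both NULL and the empty string, since it was not settled in this thread which of the two LANGUAGE() actually produces here. It has not been run against gitbase.

```sql
/* Top languages by repository count, skipping undetected languages */
SELECT *
FROM (SELECT language, COUNT(repository_id) AS repository_count
      FROM (SELECT DISTINCT
                r.repository_id,
                LANGUAGE(t.tree_entry_name, b.blob_content) AS language
            FROM refs r
            JOIN commits c ON r.commit_hash = c.commit_hash
            JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
            JOIN tree_entries t ON ct.tree_hash = t.tree_hash
            JOIN blobs b ON t.blob_hash = b.blob_hash
            WHERE r.ref_name = 'HEAD') AS q1
      /* Drop rows where no language was detected (NULL or empty) */
      WHERE language IS NOT NULL AND language <> ''
      GROUP BY language) AS q2
ORDER BY repository_count DESC
```

In standard SQL, language <> '' alone already excludes NULLs, because a comparison with NULL is never true. The explicit IS NOT NULL is kept so the intent is obvious to someone reading the example.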