Migrate from smacker/go-tree-sitter to the official tree-sitter/go-tree-sitter
Migrating to the official bindings might resolve the Windows errors. See also https://github.com/bazelbuild/bazel-central-registry/pull/6529#issuecomment-3534338114
Initial naive investigation:
The official go-tree-sitter also has some slightly wonky file organization that the default Gazelle code gen doesn't seem to like.
allocator.go    # has "CFLAGS: -Iinclude" and "#include <tree_sitter/api.h>"
include/
    tree_sitter/
        api.h
https://github.com/tree-sitter/go-tree-sitter/blob/c9492002f76ed75037e3fe6d3bbabb54ed3e1ff5/allocator.go#L4
When this go module is used as a dep, Gazelle generates this target by default:
# ${OUTPUT_BASE}/external/gazelle++go_deps+com_github_tree_sitter_go_tree_sitter/BUILD.bazel
go_library(
    name = "go-tree-sitter",
    srcs = [
        "allocator.c",
        "allocator.go",
        "allocator.h",
        "dup_unix.go",
        "dup_windows.go",
        "edit.go",
        "language.go",
        "logger.go",
        "lookahead_iterator.go",
        "node.go",
        "parser.go",
        "point.go",
        "query.go",
        "ranges.go",
        "tree.go",
        "tree_cursor.go",
        "tree_sitter.go",
    ],
    cgo = True,
    copts = ["-Iinclude -Isrc -std=c11 -D_POSIX_C_SOURCE=200112L -D_DEFAULT_SOURCE"],
    importpath = "github.com/tree-sitter/go-tree-sitter",
    visibility = ["//visibility:public"],
    deps = ["@com_github_mattn_go_pointer//:go-pointer"],
)
We see that Gazelle doesn't include any reference to that include directory.
One experiment to try is go_deps.module_override and some patches, but that won't be suitable for merging because patches can only be applied on the root Bazel module.
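For reference, the module_override experiment could look roughly like the sketch below. It assumes the root MODULE.bazel already declares gazelle as a bazel_dep and that go.mod lists the official bindings. The patch file label is hypothetical; it stands in for a patch that would add or correct BUILD.bazel in the downloaded module.

```starlark
# MODULE.bazel of the ROOT module. Overrides like this are only honored there,
# which is why this approach is not suitable for merging.
go_deps = use_extension("@gazelle//:extensions.bzl", "go_deps")
go_deps.from_file(go_mod = "//:go.mod")

go_deps.module_override(
    path = "github.com/tree-sitter/go-tree-sitter",
    # Hypothetical patch file; it would carry the hand-written BUILD.bazel changes.
    patches = ["//patches:go_tree_sitter_build.patch"],
    patch_strip = 1,
)

use_repo(go_deps, "com_github_tree_sitter_go_tree_sitter")
```

This is only useful for local experimentation, since downstream users of the module would not pick up the override.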
Another path forward might be to get go-tree-sitter and the tree-sitter-python grammar added to BCR, as that would allow us to use custom, correct BUILD.bazel files.
TODO: there may be something in https://github.com/zadlg/tree-sitter-bazel ... That project is on BCR and exposes "@tree-sitter-bazel//:tree-sitter: the tree-sitter C API" which is the api.h file we need...
Get tree sitter onto bcr / use existing bcr module
This sounds pretty promising.
The gazelle generated build file is wonky
This is just an annoyance, right? We can hand-write the necessary build file still?
We can hand-write the necessary build file still?
I've been playing with that a bit and have so far been unsuccessful. I think it'll need not just go_library but also cc_library targets, and I have zero experience with the latter.
That said, I believe it's a solvable problem and that having a variety of targets is no big deal if it's a BCR module.
Because it has cgo = True and copts has -Iinclude, is the problem that srcs is missing include/tree_sitter/api.h?
Otherwise, the naive definition for that api.h file is probably:
cc_library(
    name = "whatever",
    # Public headers go in hdrs so that dependents can include them.
    hdrs = glob(["include/**/*.h"]),
    includes = ["include"],
)
I thought (was hoping) it was something as simple as adding include/tree_sitter/api.h to the generated target srcs, but that resulted in
_main/external/gazelle++go_deps+com_github_tree_sitter_go_tree_sitter/allocator.go:5:10: fatal error: tree_sitter/api.h: No such file or directory
where the reported path doesn't have the include/ dir.
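One way the header lookup might be wired up is to put the headers in a cc_library and pass it to the go_library through cdeps, so the include path comes from the C dependency instead of from srcs. This is an untested sketch: the target name tree_sitter_headers is hypothetical, the load labels assume the repo names rules_cc and rules_go, and it only addresses finding tree_sitter/api.h, not the files under src/.

```starlark
load("@rules_cc//cc:defs.bzl", "cc_library")
load("@rules_go//go:def.bzl", "go_library")

# Hypothetical header-only target exposing include/ as an include root,
# so that <tree_sitter/api.h> resolves for anything depending on it.
cc_library(
    name = "tree_sitter_headers",
    hdrs = glob(["include/**/*.h"]),
    includes = ["include"],
)

go_library(
    name = "go-tree-sitter",
    # Same top-level sources Gazelle listed, minus tests.
    srcs = glob(
        ["*.go"],
        exclude = ["*_test.go"],
    ) + [
        "allocator.c",
        "allocator.h",
    ],
    cdeps = [":tree_sitter_headers"],  # supplies the include path for cgo
    cgo = True,
    # Assumption: flags split into separate list entries; -Iinclude is
    # dropped because the cc_library's includes attribute covers it.
    copts = [
        "-Isrc",
        "-std=c11",
        "-D_POSIX_C_SOURCE=200112L",
        "-D_DEFAULT_SOURCE",
    ],
    importpath = "github.com/tree-sitter/go-tree-sitter",
    visibility = ["//visibility:public"],
    deps = ["@com_github_mattn_go_pointer//:go-pointer"],
)
```

The C sources under src/ would most likely still need their own handling on top of this.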
Do you have bandwidth to take over the BCR-ing of go-tree-sitter and tree-sitter-python? Both will be needed to migrate to the official stuff. I also can't guarantee that it'll fix the Windows issue...
I tried some basic approaches, using the Gazelle-generated BUILD.bazel file as a starting point. You can see what I was doing in https://github.com/dougthor42/go-tree-sitter-not-smacker/tree/bazel
Adding include/tree_sitter/api.h to the target's srcs got things a little farther along. Then I had to start adding things from the src/ directory and eventually just added everything with a glob.
I'm now at:
$ bazel build //:go-tree-sitter
INFO: Analyzed target //:go-tree-sitter (0 packages loaded, 0 targets configured).
ERROR: /usr/local/google/home/dthor/dev/go-tree-sitter/BUILD.bazel:3:11: GoCompilePkg go-tree-sitter.a failed: (Exit 1): builder failed: error executing GoCompilePkg command (from target //:go-tree-sitter) bazel-out/k8-opt-exec-ST-d57f47055a04/bin/external/rules_go++go_sdk+go_default_sdk/builder_reset/builder compilepkg -sdk external/rules_go++go_sdk+go_default_sdk -goroot ... (remaining 173 arguments skipped)
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
In file included from src/./lexer.c:3,
from src/lib.c:4,
from tree_sitter.go:6:
src/././unicode.h: In function 'ts_decode_utf16_le':
src/././unicode.h:18:9: error: implicit declaration of function 'le16toh' [-Wimplicit-function-declaration]
18 | (c)=le16toh((s)[(i)++]); \
| ^~~~~~~
src/././unicode.h:57:3: note: in expansion of macro 'U16_NEXT_LE'
57 | U16_NEXT_LE(((uint16_t *)string), i, length, *code_point);
| ^~~~~~~~~~~
src/././unicode.h: In function 'ts_decode_utf16_be':
src/././unicode.h:29:9: error: implicit declaration of function 'be16toh' [-Wimplicit-function-declaration]
29 | (c)=be16toh((s)[(i)++]); \
| ^~~~~~~
src/././unicode.h:67:3: note: in expansion of macro 'U16_NEXT_BE'
67 | U16_NEXT_BE(((uint16_t *)string), i, length, *code_point);
| ^~~~~~~~~~~
compilepkg: error running subcommand /usr/bin/gcc: exit status 1
Target //:go-tree-sitter failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 3.284s, Critical Path: 3.00s
INFO: 2 processes: 2 internal.
ERROR: Build did NOT complete successfully
Try adding --copt=-Wno-implicit-function-declaration, which tells the compiler to stop flagging implicit function declarations.
That solves some problems, but we still get some duplicate function definitions (I guess I forgot to mention those in the previous comments...)
I threw Gemini at the problem and it was able to get things somewhat working by making a separate cc_target for the go_library's cdeps. Branch: https://github.com/dougthor42/go-tree-sitter-not-smacker/tree/bazel-gemini
But now I think we're running into issues with tests, which I think are pretty big blockers: in order to fully test go-tree-sitter, we must have BCR modules for all of the various language parsers tree_sitter_(go|python|ruby|rust|...). Bazel will correctly find and download those go repos by looking at go.mod, but Gazelle can't generate the correct BUILD file for those downloaded repos. Even if we made BCR modules for all these, maintaining them is not something I can commit to.
Once again I think we're coming back to "bazel-gazelle isn't capable enough to autogenerate targets for the tree-sitter repos", which is the same issue we had with bumping smacker/go-tree-sitter.
Given that we (rules_python) are only interested in (a) the python parser and (b) the go bindings, we may be able to get away with making only two BCR modules: go-tree-sitter and tree-sitter-python.
However, I haven't tested that far yet and the go-tree-sitter BCR module would be very misleading to other people who may expect it to be able to parse other languages.
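If that two-module route were taken, the consuming side in rules_python might reduce to something like the following. This is purely illustrative: neither module exists on BCR at this point in the thread, so the module names and version strings below are placeholders, not real releases.

```starlark
# MODULE.bazel (sketch). Both names and versions are hypothetical placeholders
# for BCR modules that would first have to be created and published.
bazel_dep(name = "go-tree-sitter", version = "0.0.0")
bazel_dep(name = "tree-sitter-python", version = "0.0.0")
```

Each module would ship its own hand-written BUILD.bazel files, which is what removes the dependency on Gazelle's generated targets.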