Make use of auto-creating tables
Makes inserting into the snippet tables easier, as there is no need to manually create tables along the way.
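For context, a minimal sketch of the self-populating ("automagic") table idea, as described on the lua-users wiki; the PR's actual implementation may differ in details:

```lua
-- Sketch of an auto-creating table (idea from the lua-users wiki).
local auto_mt
auto_mt = {
	__index = function(tab, key)
		-- first access to a missing key: create the subtable ...
		local new = setmetatable({}, auto_mt)
		-- ... and store it with rawset, so __index only fires once per key.
		rawset(tab, key, new)
		return new
	end,
}

local snippets = setmetatable({}, auto_mt)
-- no manual `snippets.lua = snippets.lua or {}` needed anymore:
table.insert(snippets.lua[1000], "some_snippet")
```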
TODOs:
- [ ] Benchmark if this is worth it, and check whether creating a new function for each created table is nice, or if the depth should be checked via a table member
- [x] check at which places `rawget()` now has to be used to check whether a key exists (is this nice code style?)
- [ ] tests? see this comment
#513 works in the same code area; I just wanted to have this ready for testing. If we merge this PR, we adjust #513 and merge it afterwards; if we simply close this PR, we'll see (I mainly wanted a place to discuss this that is bundled with the rest of the project, not in some other repo).
Help wanted: ideas on how to benchmark all this? (It will most probably be done with this approach, but we need test scenarios to benchmark, I think.)
I've got the issue that you preallocate the tables with the 1000 index (see e.g. here). Is this really necessary? (I see no reason why.) It prevents me from nicely injecting the automagic metatable.
Oh no, that's not necessary; it's just that 1000 is the default priority, so those tables would probably be created eventually anyway.
Ah, ok. So I think we either do it the way I'm doing it right now (originally intended as a quick and dirty fix), or just remove this preallocation and rely on the tables being created when the snippets get inserted :shrug:
TY for moving that discussion here :+1:
> or just remove this preallocation and rely on the tables being created when the snippets get inserted
Yeah, that's cleaner, go ahead :)
Benchmarking: I think the quickest way would be just `load()`ing (not `lazy_load`, where the snippets are only added once they're actually required) all of vim-snipmate, and comparing before/after flamegraphs (at least that's what I've done so far; it seems to work pretty well :D).
I don't think we need to care much about priority-snippets; those should only be a very small number.
A way to test them would be just adding snippets with different priorities in a loop.
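Something like this could do, as a rough sketch; note that `override_priority` as an `add_snippets` option is my assumption from the docs, so the exact name is worth double-checking:

```lua
-- Rough sketch: add snippets with varying priorities in a loop.
-- NOTE: `override_priority` as an add_snippets option is assumed here,
-- not verified against this exact LuaSnip version.
local ls = require("luasnip")
local s = ls.snippet
local t = ls.text_node

for prio = 100, 2000, 100 do
	ls.add_snippets("all", {
		s("trig" .. prio, { t("expanded with priority " .. prio) }),
	}, { override_priority = prio })
end
```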
I think checking the depth is not necessary; we only ever `ipairs` over, or set a new key in, the "depth=1" tables.
Also, the tables are only accessed directly in snippet_collection, and I can't see any place where we'd actually need the rawget, so we're in luck there :)
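(For illustration, continuing the sketch above, this is the kind of place where `rawget` would matter with these tables:)

```lua
-- A plain lookup creates the subtable as a side effect, so existence
-- checks must bypass the metatable.
local looked_up = snippets["python"] ~= nil        -- always true: __index creates it
local existed = rawget(snippets, "haskell") ~= nil -- false unless really set
print(looked_up, existed)
```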
Oh, one problem maybe: in M.get_snippets we need to return a list of snippets for some filetype.
I'm not sure if we want to return tables with (maybe) unexpected metatables, so we should probably do a shallow copy there.
Alternatively, we could specify that those tables may only be iterated via ipairs, but that's a bit unconventional :/
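Something like this, as an illustrative sketch (not necessarily the final code):

```lua
-- Shallow copy before returning, so callers never see the auto-creating
-- metatable; only references are copied, not the snippets themselves.
local function shallow_copy(list)
	local copy = {}
	for i, snip in ipairs(list) do
		copy[i] = snip
	end
	return copy
end
```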
Small note on `rawset` vs `tab[i]` regarding performance (but it doesn't concern us):
Just read on a Lua Discord (tbh, didn't check it myself):
PUC Lua: `rawset` is ~3x slower than just setting the key. LuaJIT: about the same (it's implemented as a fast function).
But this shouldn't be a problem, as nvim uses LuaJIT.
> Small note on `rawset` vs `tab[i]` regarding performance (but it doesn't concern us): Just read on a Lua Discord (tbh, didn't check it myself): PUC Lua: `rawset` is ~3x slower than just setting the key. LuaJIT: about the same (it's implemented as a fast function). But this shouldn't be a problem, as nvim uses LuaJIT.
Ouh, good to know; I would've expected it to be faster, if anything. Also, just so you're aware (I've only recently learned this myself): nvim might fall back to regular Lua if LuaJIT is not available, so there are some platforms where PUC Lua is used. But I'll assume that's a vanishingly small number of users, so your point about not worrying about it stands, IMO.
They are not the same: with metatables it's safer to use `rawset`, as it bypasses the metatable.
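(A quick demonstration of the difference:)

```lua
-- A plain assignment to a missing key triggers __newindex; rawset never does.
local t = setmetatable({}, {
	__newindex = function(tab, key, value)
		print("intercepted:", key)
		rawset(tab, key, value)
	end,
})

t.a = 1           -- prints "intercepted: a"
rawset(t, "b", 2) -- silently bypasses the metatable
```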
Yeah, we'll have to be careful around those tables from now on, e.g. we should have some tests that cover a good portion of that code. I think we don't have any tests for get_snippets.
(Just throwing it out there, let's wait with adding those until the groundwork is actually done :D)
Alright, the implementation should be done for now. Next step is the benchmarking; L3MON4D3 once suggested doing it as described here.
Tried this, but the jit.p module couldn't be found. Anything to install before being able to use this?
Note: I'm using
```
$ nvim --version
NVIM v0.7.2
Build type: Release
LuaJIT 2.1.0-beta3
Compiled by builduser
Features: +acl +iconv +tui
See ":help feature-compile"
system vimrc file: "$VIM/sysinit.vim"
fall-back for $VIM: "/usr/share/nvim"
```
(so the requirement LuaJIT 2.1 stated in the comment should be fulfilled)
Sorry I have to ask, but I also cannot find real documentation on the jit.p module (it would be interesting, though, to understand the passed arguments; I only found some source code on this, but no documentation/reference). @leiserfg, maybe you can help with this.
And on what to benchmark: as far as I understood L3MON4D3, loading (no lazy-load) the snippets from tests/data/snipmate-snippets/ has served well so far, right?
> Sorry I have to ask, but I also cannot find real documentation on the jit.p module (it would be interesting, though, to understand the passed arguments; I only found some source code on this, but no documentation/reference).
There's some doc here, but I'm stumped as to why it doesn't work on your machine o.O Maybe LUA_(C)PATH again?
> And on what to benchmark: as far as I understood L3MON4D3, loading (no lazy-load) the snippets from tests/data/snipmate-snippets/ has served well so far, right?
Ah, no, I meant the snippets provided by vim-snippets; that should be a good number of add_snippets calls (at least one per file, with quite a few snippets each).
If your LuaJIT, for packaging reasons, does not come with jit.p, you can just copy it from here; it's just a set of wrappers around jit.profile.
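For reference, a minimal sketch of using the underlying jit.profile API directly (based on my reading of the LuaJIT profiler docs; the mode/format strings are worth double-checking):

```lua
-- Minimal hand-rolled profiler on top of jit.profile (what jit.p wraps).
local profile = require("jit.profile")
local counts = {}

profile.start("i1", function(thread, samples, vmstate)
	-- "pl": full-path module:line; depth 1: innermost frame only
	local frame = profile.dumpstack(thread, "pl", 1)
	counts[frame] = (counts[frame] or 0) + samples
end)

-- ... run the code to be profiled here ...

profile.stop()
for loc, n in pairs(counts) do
	print(n, loc)
end
```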
> Maybe LUA_(C)PATH again?
Yes, this was the problem (again). Thanks for the reminder (this gets a bit annoying; I should think about some nice way to do this automatically).
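(One possible way to avoid this in the future, sketched with an illustrative path; the actual location is distro-dependent:)

```lua
-- Hypothetical: extend package.path once at startup so jit.p is found;
-- the path below is illustrative and distro-dependent.
package.path = package.path .. ";/usr/share/luajit-2.1.0-beta3/?.lua"
require("jit.p") -- should now resolve
```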
> Ah, no, I meant the snippets provided by vim-snippets; that should be a good number of add_snippets calls (at least one per file, with quite a few snippets each).
Ah nice, yes, this works. I get these two graphs for the current master branch and for this one (data files also attached).
To be honest, I've never worked with flamegraphs before, and while I can just about read a single graph, I have trouble comparing them properly.
Generated via

```lua
require("jit.p").start("10,i1,s,m0,G", "/tmp/output.log")
require("luasnip.loaders.from_snipmate").load{
	paths = "~/coding/vim-snippets/snippets",
}
require'jit.p'.stop()
```
I just had a look at this talk (by the author of the flamegraph tooling) and it's amazing: you can view the SVG in the browser, hover over the bars (click to zoom), and get more information.
Turns out the flamegraphs I generated are hard to compare, since there are many one-sample bars (some functions may be fast enough that they only appear by chance, generating "noise").
Best would be if we could get a higher resolution, generating more samples. As I'm afraid this is not possible with `jit.p`, I guess we have to cope with having one-sample bars which may or may not appear (so these shouldn't be weighted so strongly).
> Best would be if we could get a higher resolution, generating more samples. As I'm afraid this is not possible with `jit.p`, I guess we have to cope with having one-sample bars which may or may not appear (so these shouldn't be weighted so strongly).
You can, just change the 10 to another number.
Ah nice, thanks (based on the docs, I thought the `10` would be the stack dump depth, not the sampling rate, which should already have been at `1ms` due to the `i1` argument)
EDIT: Hm, still the same number of samples.
But at least it looks as if this branch even increases performance (fewer samples in total, though this is of course no ideal benchmark).
EDIT2: Thought about this performance statement once again: of course these tests do not measure the performance we certainly lose by copying tables instead of just returning them in get_snippets. We only measure the startup time, where we gain performance (by not having to check ourselves whether a key has to be set to {}, instead letting Lua do the work of figuring out when a key is accessed that does not exist yet).
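To make the comparison concrete, a sketch of the two insert patterns being measured (names illustrative, not the literal code from either branch):

```lua
-- On master (sketch): every insert manually ensures each level exists.
local function insert_manual(snippets, ft, prio, snip)
	snippets[ft] = snippets[ft] or {}
	snippets[ft][prio] = snippets[ft][prio] or {}
	table.insert(snippets[ft][prio], snip)
end

-- On this branch (sketch): __index creates missing levels on first access.
local function insert_auto(auto_snippets, ft, prio, snip)
	table.insert(auto_snippets[ft][prio], snip)
end
```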
> Ah nice, thanks (based on the docs, I thought the `10` would be the stack dump depth, not the sampling rate, which should already have been at `1ms` due to the `i1` argument)
you are right, I was trusting my memory too much :facepalm:
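For reference, the start call annotated with how I read the option string in the LuaJIT profiler docs (worth double-checking):

```lua
require("jit.p").start(
	"10,i1,s,m0,G", -- 10: stack dump depth, i1: 1ms sampling interval,
	                -- s: split stack dumps, m0: no minimum-share cutoff,
	                -- G: raw output for flamegraph tooling
	"/tmp/output.log"
)
```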
Well, I think in this case it's alright to just look at the total time elapsed; it's just nice to have the flamegraphs ready, so one can directly see what takes too long.
In the two graphs you posted, IIUC, master takes 87ms, this branch 107ms, but add_snippets does not contribute to this total at all.
Is that difference consistent? (Probably not, you later said that this branch performed better, so it's probably fluctuating)
Maybe it would be best to write a benchmark that really just calls add_snippets in a loop, with large numbers of snippets. Then there would be no parse or deepcopy polluting the flamegraph.
> the performance we certainly lose by copying tables instead of just returning them in get_snippets
yup, that's true... We'd need another benchmark for this.
> Probably not, you later said that this branch performed better, so it's probably fluctuating
Ah yes, that statement was based on another graph than the one I posted. I got 91 samples for selfPop and 96 for master. I think I'll play with the snipmate loader to return a list of the nodes to be added instead of adding them directly, so I can add them manually afterwards. (Changes just for benchmarking.)
> I think I'll play with the snipmate loader to return a list of the nodes to be added instead of adding them directly, so I can add them manually afterwards. (Changes just for benchmarking.)
Oh yeah, good idea, that's more realistic than my suggestion :+1:
(Not sure how you're approaching this, easiest might be appending them to some global table instead of add_snippets)
I ran this with

```lua
-- `ls` is the luasnip module; require added here for completeness
local ls = require("luasnip")

local ss = require("luasnip.loaders.from_snipmate").load{
	paths = "~/coding/vim-snippets/snippets",
}

require("jit.p").start("1,i1,s,m0,G", "/tmp/output.log")
print(os.time())
print(#ss)
for _ = 1, 1000 do
	for _, s in ipairs(ss) do
		print(ls.add_snippets(unpack(s)))
	end
end
print(os.time())
print("hi")
require'jit.p'.stop()
```
(This should not be caught by any caching, as it should invalidate the snippets on each iteration and insert them once again, right?)
The changes can be seen on the selfpopulating_tables_benchmark and master_benchmark branches of my repo (a global table instead of merging would be faster, you're right).
The outer loop was necessary, as a single run adding all 346 snippets was too fast (I guess) and I got no samples at all. Now it runs for approx. 10 seconds.
This gives me the following:
master.log
selfPop.log
EDIT: Seems that, all in all, this branch is a bit slower than the current master.
This idea comes from the lua-users wiki; should we add some acknowledgement/appreciation for the authors (mentioned in the article) somewhere? (Like I already said, I'm new to open-source contributing and don't know the exact conventions.)
xD Yeah, that's true, I think it's okay to add a comment above the metatable-definition, "idea from ..."
(posted in the wrong PR, sorry)
Ahh, ss contains the arg-tuples to add_snippets? Cool :+1:
> (This should not be caught by any caching, as it should invalidate the snippets on each iteration and insert them once again, right?)
Might be that parsing is skipped, but that doesn't affect us here, I think.
Are you sure it's just 346 snippets? I could've sworn C alone has like 100 or so..
Okay, it sure looks like the performance difference is negligible and dwarfed by stuff like parsing, so that's pretty good :+1:
Could you do one more test for get_snippets?
> Might be that parsing is skipped, but that doesn't affect us here, I think.
parsing (as it takes place in the load function) is outside the loop, I think.
> Are you sure it's just 346 snippets? I could've sworn C alone has like 100 or so..
At least that's what print(#ss) returned (I didn't double-check it).
> Could you do one more test for `get_snippets`?
Yes, I will (later today/tomorrow, I think). Any remarks on the benchmark? (A simple repeated get_snippets is pretty far from reality, and will of course show a massive slowdown, as the difference between returning a copy vs. just returning is pretty obvious.)
Ahh, 346 is just the number of add_snippets-calls, not the number of snippets :facepalm:
Okay that makes more sense :+1:
get_snippets is pretty much always used from cmp_luasnip. It would be interesting to see the difference when just getting all snippets of a big collection; if neither is measurable, we're good :D
I don't think the slowdown will be massive tbh, even if it is, we could just cache the results and invalidate on add_snippets.
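(Sketch of the caching idea, with illustrative names:)

```lua
-- Memoize get_snippets results per filetype; drop the cache whenever
-- snippets change. `collect_snippets_for` stands in for the real lookup
-- and is hypothetical.
local cache = {}

local function get_snippets_cached(ft)
	if not cache[ft] then
		cache[ft] = collect_snippets_for(ft) -- hypothetical helper
	end
	return cache[ft]
end

local function invalidate(ft)
	cache[ft] = nil -- called from add_snippets
end
```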
Seems you're right. Actually, we do only a shallow copy, so copying takes some time (especially if many, many snippets are registered for one filetype), but it's only "pointers" we have to copy.
Results of

```lua
-- `ls` is the luasnip module; require added here for completeness
local ls = require("luasnip")
-- sink function so the results count as "used"
local foo = function(...) end

require("jit.p").start("1,i1,s,m0,G", "/tmp/output.log")
print(os.time())
for _ = 1, 100000000 do
	local i = ls.get_id_snippet(10)
	local fs = ls.get_snippets("c")
	for _, f in ipairs(fs) do
		i = ls.get_id_snippet(f.id)
	end
	print(#ls.get_id_snippet(fs[#fs].id))
	print(#i, #fs)
	foo(i, #i, fs, #fs)
end
print(os.time())
print("hi")
require'jit.p'.stop()
```
(Remark: get_id_snippet didn't always return a snippet, but sometimes an (apparently) empty table (though snippet expansion worked); I wonder if this is intended, even when the ids are gathered from the fs snippets.)
If you've got any ideas on how to benchmark differently (or know why getting by id doesn't work), just let me know.
I don't understand what the benchmark is supposed to do :sweat_smile:
Wouldn't a simple get_snippets("<ft>") do it? (maybe repeated to actually see the differences)
> (Remark: get_id_snippet didn't always return a snippet, but sometimes an (apparently) empty table (though snippet expansion worked); I wonder if this is intended, even when the ids are gathered from the fs snippets.)
get_id_snippet(id) should return nil (invalid id) or a snippet (valid id), not sure where empty tables are coming from :grimacing:
> I don't understand what the benchmark is supposed to do :sweat_smile: Wouldn't a simple `get_snippets("<ft>")` do it? (maybe repeated to actually see the differences)
Oh, that's approximately what I'm doing (I just added some stuff to try finding a snippet by id as well).
> not sure where empty tables are coming from
Hm, that's my bad; I keep forgetting that #table only works for checking emptiness on tables with an array part :see_no_evil:
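(For posterity, the pitfall in a nutshell:)

```lua
-- The length operator only counts the array part, so a hash-only table
-- looks "empty" even when it isn't.
local snip = { trigger = "fn", id = 7 }
print(#snip)             --> 0, despite two entries
print(next(snip) == nil) --> false; next() is the reliable emptiness check
```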