Robert Burke comments

Results: 32 comments by Robert Burke

vqsort repro by djb

![another plot similar to the previous plots, but with fewer anomalies](https://cdn.discordapp.com/attachments/595375439971876877/988470656490426398/Screen_Shot_2022-06-21_at_0.06.17.png) This is a Haswell "Intel(R) Xeon(R) CPU E5-1650 15 MiB L3" on which I've temporarily disabled turbo boost for...

vqsort repro by djb

Yeah. I hoped to compare the huge peak in 0.17.0 to the peak after all the changes that mention this issue, but I didn't end up with a huge peak...

vqsort repro by djb

The graph for master looks basically the same as the gold points in the above graph on the machine I used. It does seem to sort correctly now though.

Use `-march=haswell` or similar flags instead of `-march=native`

I think it makes a lot of sense to build this on the machine you intend to run it on.

Avoid unnecessary sign-extending instructions

Implementations actually affected by this patch seem to be these on my Haswell server: ``` avx2_despace_branchless(buffer, N) : base frequency 3.91 GHz speed: 10.80 GB/s -> 11.08 GB/s avx2_despace_branchless(buffer, N)...

Avoid unnecessary sign-extending instructions

Sorry, I think this needs more work to avoid doing any harm. I'll try to come back to this in a couple days.

io_uring_prep_setsockopt

> This brings up a deeper question - I've been wondering, would it be worth while (or even possible) to have a "permanent quickack" setting in the kernel? Yes, absolutely....

No mention of the SSE4.2 string operations is a bit surprising

`pcmpestri` seems quite expensive according to the latency numbers at https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html. I would be surprised if we could find a problem where it's correct to use it.

open to hererocks managing global installs as well?

After using it a bit, I find that hererocks is already fit for this purpose and it's more of a branding/convenience issue whether other users also think so.

Cannot tokenize byte sequences that are not valid UTF-8 due to design flaw

Sorry, what's the correct way to use the Python bindings with an existing vocab to encode byte sequences? For example, the below does not work: ```py from transformers import GPT2Tokenizer...
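
As a rough illustration of the problem named in the issue title, here is a minimal standard-library sketch of why arbitrary bytes cannot always pass through a `str`-based interface. It does not use the tokenizer's bindings, and the sample bytes and the helper `try_strict` are made up for this example.

```python
# Hypothetical sample: "café " in UTF-8 followed by two bytes (0xff, 0xfe)
# that can never appear in valid UTF-8.
raw = b"caf\xc3\xa9 \xff\xfe"


def try_strict(data: bytes):
    """Return the decoded string, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Strict decoding rejects the input outright.
strict = try_strict(raw)

# Replacing bad bytes with U+FFFD decodes, but the original bytes are lost.
lossy = raw.decode("utf-8", errors="replace")

# surrogateescape maps each bad byte to a lone surrogate and round-trips.
escaped = raw.decode("utf-8", errors="surrogateescape")

# latin-1 maps every byte to one code point, so it also round-trips.
latin1 = raw.decode("latin-1")

print("strict decode succeeded:", strict is not None)
print("replace round-trips:", lossy.encode("utf-8") == raw)
print("surrogateescape round-trips:",
      escaped.encode("utf-8", errors="surrogateescape") == raw)
print("latin-1 round-trips:", latin1.encode("latin-1") == raw)
```

Whether either lossless mapping is accepted by a given tokenizer is a separate question that this sketch does not answer.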