highway icon indicating copy to clipboard operation
highway copied to clipboard

arm64 lanes

Open xiaozhuai opened this issue 1 year ago • 10 comments

Hi there.

constexpr hn::ScalableTag<uint8_t> df;
constexpr hn::CappedTag<uint8_t, 1> d1;
constexpr size_t N = hn::Lanes(df);

On arm64 with neon, the N should be 32, but it's 16 when I debug, why?

BTW, here is my origin code use neon directly, what does its equivalent hwy code look like?

for (; dst_ptr <= dst_end - 8 * 4; dst_ptr += 8 * 4, src_ptr += 8 * 4) {
        // load 8 pixels
        uint8x8x4_t rgba = vld4_u8(src_ptr);

        // multiply the color by alpha, expand to 16-bit
        uint16x8_t r = vmull_u8(rgba.val[0], rgba.val[3]);
        uint16x8_t g = vmull_u8(rgba.val[1], rgba.val[3]);
        uint16x8_t b = vmull_u8(rgba.val[2], rgba.val[3]);

        // (x + 127) / 255 == (x + ((x + 128) >> 8) + 128) >> 8
        // This form is well suited to NEON:
        // vrshrq_n_u16(...,8) gives the inner (x+128)>>8,
        // vraddhn_u16() both the outer add-shift and our conversion back to 8-bit.
        rgba.val[0] = vraddhn_u16(r, vrshrq_n_u16(r, 8));
        rgba.val[1] = vraddhn_u16(g, vrshrq_n_u16(g, 8));
        rgba.val[2] = vraddhn_u16(b, vrshrq_n_u16(b, 8));
}
// process the remaining pixels
for (; dst_ptr < dst_end; dst_ptr += 4, src_ptr += 4) {
    dst_ptr[0] = (src_ptr[0] * src_ptr[3] + 127) / 255;
    dst_ptr[1] = (src_ptr[1] * src_ptr[3] + 127) / 255;
    dst_ptr[2] = (src_ptr[2] * src_ptr[3] + 127) / 255;
}

xiaozhuai avatar Jun 17 '24 05:06 xiaozhuai

https://questdb.io/docs/reference/function/window/#avg @nwoolmer I went through this document, and I guess VWMA will be implemented with the PARTITION BY clause in the same way, that's how the window will be defined for quantity and price. Correct me if I am wrong here.

siddharth0815 avatar Jun 30 '24 21:06 siddharth0815

@siddharth0815 Yes! :)

nwoolmer avatar Jul 01 '24 06:07 nwoolmer

@puzpuzpuz @nwoolmer Action items till now :

  • Create a VolumeWeightedMovingAverageFunctionFactory class which implements FunctionFactory and returns isWindow()= true which accepts 2 sigArgs - vwma(DD)
  • Added it in module-info.java so that it can be loaded in factory list and get identified by token vwma
  • Change paramCount condition for window functions in SqlCodeGenerator.generateSelectWindow() to ast.paramCount > 2

Now I'm analysing what happens in :

  • Factory instance creation
  • How windowCursor fetches average

Please tell if this is the correct way to approach the problem or is anything I'm missing here, also if you want me to explore something in addition to the ones mentioned above, please mention.

siddharth0815 avatar Jul 03 '24 21:07 siddharth0815

You may use AvgDoubleWindowFunctionFactory as the starting point. It's worth running WindowFunctionTest with debugger and going through the calls to understand better when it's initialized and how it works.

@nwoolmer if you can add more hints here, please go ahead.

puzpuzpuz avatar Jul 04 '24 11:07 puzpuzpuz

I'm hopeful that by looking at the pass methods in AvgOverPartitionFunction and derivatives, you can see where the 'meat' of the averaging is done. Those parts would be modified to handle the weighted average logic.

They are pretty complex functions so this is likely to take some time to get right!

nwoolmer avatar Jul 04 '24 11:07 nwoolmer

Hello, I'm interested on working on this! Added a pull request recently with some of the above comments in mind, it is a bit dense (trying to add support for both partition-by and order-by clauses) but order-by seems to work. Unit tests and partition-by support is still WIP but wanted to see if the work is in the right direction :) The PR is here: https://github.com/questdb/questdb/pull/4749

bzhu23 avatar Jul 05 '24 19:07 bzhu23

Hi @bzhu23,

Sorry for any confusion here, @siddharth0815 was looking at this issue too.

Would you both consider collaborating on this together as PR co-authors? I am sure that with both of you developing and testing it, we will get an even better outcome.

We are always grateful for support from the community!

If not, no worries, we have other things to work on - or I can open issues for missing window functions if you have specific interest in them.

Thanks again!

nwoolmer avatar Jul 08 '24 07:07 nwoolmer

@nwoolmer I can contribute in the same PR. :)

siddharth0815 avatar Jul 08 '24 18:07 siddharth0815

Hi @nwoolmer I went through the logic of the (avg) window function Please validate my understanding here, post that I'll proceed with coding

Now basically we have 2 types of queries here :

  1. Computation on values (of some columns) contained in rows between given offsets ROW OVER PARTITION
  2. Computation on values (of some columns) in rows containing values between given range RANGE OVER PARTITION

Most complicated case The window is bounded from the lower end by a given offset/timestamp-diff and the higher end by some offset/timestamp-diff/current row.

Case Handling In such cases, We use 2 data structures :

  1. Map: keeps track of values which will change in each iteration basis partition key (sum/avg/ring buffer offset) etc

  2. Ring buffer: To keep track of old values which may/may not be considered in current computation

  3. A Ring buffer type data structure is used where memory locations are allocated by default for each partition key

  4. Starting offsets of those memory locations are mapped to the partition key in Map for further reference

  5. Computable values of the records are put circularly, it is discarded (if out of range) and replaced as new index is calculated by modulo with bufferCapacity

Questions

  1. Wanted to know in brief about how memory allocation works for the ring buffer (There's a case where we add more memory if buffer reaches capacity)?
  2. How is the calculated average/sum at each record iteration returned to the result (as computeNext() is a void function)?

siddharth0815 avatar Jul 10 '24 21:07 siddharth0815

Hey @siddharth0815 ,

Thank you for this and your efforts so far. It looks like you've got a good understanding of the issue.

Having reviewed this part of the codebase, we'd like to refactor and document it ourselves before taking contributions. Sorry for the confusion on this, this is partially my fault for putting it on the same list as easier tasks!

nwoolmer avatar Jul 11 '24 12:07 nwoolmer

Closing this down for now, we have some alternate plans.

nwoolmer avatar Nov 15 '24 12:11 nwoolmer