Henry Ho


1. More DSPs, 2. higher Fmax, 3. a better algorithm, such as Winograd.
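To show where point 3's gain comes from, here is a minimal 1D Winograd F(2,3) sketch in Python. It is an illustration only, not the DLA's or any particular accelerator's implementation. F(2,3) computes two outputs of a 3-tap filter with 4 multiplications instead of the direct method's 6. The filter transform is done once per filter, so it does not count against the per-tile cost. The 2D F(2x2, 3x3) variant used for CNN layers nests the same transform and needs 16 multiplications instead of 36, which is why it saves DSPs on an FPGA.

```python
def winograd_f23(d, g):
    """Two outputs of a 1D 3-tap correlation using Winograd F(2,3).

    d: 4 input samples, g: 3 filter taps.
    Uses 4 multiplications per output tile instead of 6.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: in practice computed once per filter and reused,
    # so these operations are not counted per output tile.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Element-wise products: the only 4 multiplications per tile.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference: direct 3-tap correlation, 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, -1.0, 2.0]
    print(winograd_f23(d, g))  # [4.5, 6.0]
    print(direct_f23(d, g))    # [4.5, 6.0]
```

Both functions return the same outputs. The Winograd version trades two multiplications for extra additions, which are cheap in ALMs.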

You only use 16x32 = 512 DSPs, while Arria 10 can have 3036 DSPs, and the remaining ALMs can also be used as multipliers. So if someone used all the DSPs, the design could be about 6 times faster...
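The "6 times" figure is simply the ratio of available to used DSPs, 3036 / 512 ≈ 5.93. The back-of-the-envelope sketch below makes this explicit. It assumes each DSP does one multiply-accumulate (2 ops) per cycle. The 250 MHz clock is a hypothetical value, not one taken from the discussion. Under those assumptions, peak throughput scales linearly with both DSP count and Fmax, which covers points 1 and 2 above.

```python
def peak_gops(num_dsps: int, fmax_mhz: float, macs_per_dsp: int = 1) -> float:
    """Peak throughput in GOPS, counting one multiply-accumulate as 2 ops.

    Assumes every DSP completes `macs_per_dsp` MACs per cycle at `fmax_mhz`.
    """
    return num_dsps * macs_per_dsp * 2 * fmax_mhz / 1000.0


used = 16 * 32        # DSPs in the design under discussion
available = 3036      # DSP count for Arria 10 cited in the comment
fmax = 250.0          # hypothetical clock in MHz (assumption, not from the source)

print(f"DSP utilization: {used / available:.1%}")                  # 16.9%
print(f"peak at {used} DSPs: {peak_gops(used, fmax):.0f} GOPS")      # 256 GOPS
print(f"peak at {available} DSPs: {peak_gops(available, fmax):.0f} GOPS")  # 1518 GOPS
print(f"ideal speedup: {available / used:.2f}x")                    # 5.93x
```

Real designs rarely reach this ideal, because memory bandwidth and routing congestion usually cap the achievable Fmax and DSP count.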

You can also check DLA on Arria 10 with OpenVINO; its throughput is even higher than in the 2017 paper. And I don't think local memory is a limitation if the design is carefully optimized...

Low DSP utilization may be due to high fanout, which makes placement and routing harder for the fitter. You can use a systolic array to reduce fanout.
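The sketch below is a minimal cycle-level Python model of an output-stationary systolic array computing C = A × B. It shows why the structure cuts fanout. It is an illustrative assumption, not the design being discussed or how any FPGA tool builds one. In a broadcast design, one element of A drives every multiplier in a row. Here each processing element (PE) takes operands only from its left and top neighbours and passes them on the next cycle, so each operand register feeds just one neighbour plus its own multiplier. Row i of A and column j of B enter with a skew of i and j cycles, so matching operands meet at PE(i, j).

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing A (n x k) @ B (k x m).

    PE(i, j) keeps a running sum for C[i][j]. Each cycle it takes an `a`
    value from its left neighbour and a `b` value from its top neighbour,
    multiplies and accumulates them, and forwards them right and down.
    """
    n, k, m = len(A), len(A[0]), len(B[0])
    acc = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]  # operand moving rightward
    b_reg = [[0] * m for _ in range(n)]  # operand moving downward
    # The last operand pair reaches PE(n-1, m-1) at cycle k + n + m - 3.
    for t in range(k + n + m - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left edge: row i of A enters skewed by i cycles (0 = bubble).
                if j == 0:
                    s = t - i
                    new_a[i][j] = A[i][s] if 0 <= s < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # one hop from left neighbour
                # Top edge: column j of B enters skewed by j cycles.
                if i == 0:
                    s = t - j
                    new_b[i][j] = B[s][j] if 0 <= s < k else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # one hop from top neighbour
        a_reg, b_reg = new_a, new_b
        # Every PE does one multiply-accumulate per cycle on local registers.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The trade-off is latency. The array needs k + n + m - 2 cycles to drain instead of k, but every wire is short and point-to-point, which is what makes it easier for the fitter to place many DSPs at a high Fmax.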