Chenghua
@ZizhouJia Have you successfully integrated onnx-mlir into your MLIR project? I'm writing a small inference framework that needs to take onnx-mlir output and convert it to VM bytecode for this framework. Is...
Feature description: In SAM, "generate masks by sampling a grid over the image with this many points to a side," as in Meta's SAM everything demo. [ref code](https://github.com/facebookresearch/segment-anything/blob/6fdee8f2727f4506cfbbe553e23b895e27956588/segment_anything/automatic_mask_generator.py#L35). Use case: producing masks for all the different objects in an image for low-level tasks...
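For reference, the grid that `points_per_side` controls can be sketched like this: evenly spaced points in normalized [0, 1] coordinates, offset half a cell from the image borders (a minimal re-implementation of the idea, not the exact SAM code):

```python
def build_point_grid(points_per_side: int):
    """Build an evenly spaced point grid in normalized [0, 1]
    coordinates, with points offset half a cell from the borders.
    E.g. points_per_side=32 yields 32*32 = 1024 prompt points."""
    offset = 1.0 / (2 * points_per_side)
    coords = [offset + i / points_per_side for i in range(points_per_side)]
    # Row-major list of (x, y) pairs covering the whole image.
    return [(x, y) for y in coords for x in coords]
```

Each point is then fed to SAM as a single-point prompt, and the resulting masks are filtered and deduplicated.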
I have encountered the same issue. The dynamic libraries fail to load because the `pip install ./foo` command only copies the `_core.xxxxxx.pyd` file to the `site-packages/foo/` directory, without also copying...
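One workaround on Windows is to register the directory holding the dependent DLLs before the extension module is imported, e.g. in the package's `__init__.py`. A minimal sketch (the `libs` subfolder name is an assumption; adjust it to wherever your build actually places the DLLs):

```python
import os
import sys

def ensure_dll_dirs(package_dir: str, dll_subdir: str = "libs") -> bool:
    """On Windows, add an extra DLL search path so the compiled
    extension (_core.*.pyd) can resolve its dependencies.
    Returns True if a directory was registered, False otherwise."""
    dll_dir = os.path.join(package_dir, dll_subdir)
    if sys.platform == "win32" and os.path.isdir(dll_dir):
        # Available since Python 3.8; affects DLL resolution for
        # extension modules imported afterwards.
        os.add_dll_directory(dll_dir)
        return True
    return False
```

The longer-term fix is to have the build copy those DLLs into the wheel (e.g. via `package_data` or a tool such as delvewheel) so no runtime path tweaking is needed.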
The mllm project currently only provides a C++ API. Since the mllm front end has not yet fully stabilized, we have not prioritized work on Python bindings. If you...
Got it, thanks for the reply!
I used the following code to test the performance of w8a8.

```python
@torch.no_grad()
def generate(model, tokenizer, device, prompt, max_new_tokens):
    inputs = tokenizer(prompt, return_tensors="pt", padding=True)
    start = time.time()
    outputs = model.generate(
```
...
Unfortunately, after using `torch.compile`, there was not much speed improvement; the inference time went from 60 seconds to 42 seconds. It is still much slower than the model using FP16....
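When comparing wall-clock times like 60 s vs. 42 s, it helps to average over several runs after a warmup pass (the first `torch.compile` call pays a one-time compilation cost). A minimal, framework-agnostic harness for this (the function and parameter names here are illustrative, not from the original code):

```python
import time

def tokens_per_second(generate_fn, n_tokens: int, warmup: int = 1, iters: int = 3) -> float:
    """Run `generate_fn` (a zero-argument generate call) `warmup`
    times to exclude one-time costs, then average `iters` timed
    runs and report throughput in tokens/sec."""
    for _ in range(warmup):
        generate_fn()
    start = time.perf_counter()
    for _ in range(iters):
        generate_fn()
    elapsed = (time.perf_counter() - start) / iters
    return n_tokens / elapsed
```

Note that with `torch.compile`, timing only the second and later calls is what reflects steady-state inference speed.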
I mean the inputs to ONNX ops, such as the parameters of the convolution operator. These parameters appear to be embedded in the MLIR code? For my second question, I misunderstood...
May I ask which Qualcomm chip model your device uses?