Fan Hong
Hi, I found that the program gets stuck when calling `System.load` with Py4J on Windows. After simplification, a code snippet for reproduction is as follows: ``` from py4j.java_gateway import launch_gateway, JavaGateway,...
Hi, I found creating a Java array of callbacks is impossible. It seems `_set_item` in class `JavaArray` calls `get_command_part` without providing a `python_proxy_pool`. `get_command_part` then fails. The error stack is...
The automatic type conversion in py4j for primitive types is very convenient when calling methods. But when Java methods take arrays as parameters, I have to manually create the...
## What is the purpose of the change Add Transformer and Estimator for GBTClassifier and GBTRegressor. Details about features compared to SparkML's implementation are as follows: - Implemented in this...
# ❓ Questions and Help I just found that fused kernels in sequence parallel perform poorly in real model training. Here is a snapshot of the nsys timeline of a ColumnParallelLinear...
# 🚀 Feature When `seqlen` is not fixed, `triton.autotune` is triggered for every unseen value of `seqlen` for the tiled matmul kernel and the matmul kernel. This makes training extremely slow...