mmlspark.lightgbm._LightGBMClassifier does not exist
**Describe the bug**
mmlspark.lightgbm._LightGBMClassifier does not exist

**To Reproduce**
I git cloned the repo and added the mmlspark Python path via sys.path.append. import mmlspark works fine, but the classifier inside it can't be used.
There are no clear instructions on how to install mmlspark for Python.
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
    .getOrCreate()

from mmlspark.lightgbm import LightGBMClassifier
model = LightGBMClassifier(learningRate=0.3,
                           numIterations=100,
                           numLeaves=31).fit(train)
spark.stop()
**Expected behavior**
from mmlspark.lightgbm import LightGBMClassifier should work.
**Info (please complete the following information):**
- MMLSpark Version: latest repo
- Spark Version: 2.4.4
- Spark Platform: custom platform, not on Azure
**Stacktrace**
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-9-1b16cbc5ea7e> in <module>
5 .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
6 .getOrCreate()
----> 7 from mmlspark.lightgbm import LightGBMClassifier
8 model = LightGBMClassifier(learningRate=0.3,
9 numIterations=100,
/mnt/user-home/git/mmlspark/src/main/python/mmlspark/lightgbm/LightGBMClassifier.py in <module>
9 basestring = str
10
---> 11 from mmlspark.lightgbm._LightGBMClassifier import _LightGBMClassifier
12 from mmlspark.lightgbm._LightGBMClassifier import _LightGBMClassificationModel
13 from pyspark import SparkContext
ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMClassifier'
**Additional context**
I tried this in Jupyter on a Linux machine. Does this only work on Azure?
@goodwanghan sorry about the trouble you are having. You need to run the autogen to autogenerate the wrappers; _LightGBMClassifier is automatically generated from the Scala API. Regarding "I tried this on Jupyter on a Linux machine. Does this only work on Azure?": LightGBM is open source and works anywhere: Linux, Windows, macOS. The mmlspark integration can be run anywhere: Azure, Cloudera, AWS, etc. If it's not running somewhere, then that's unexpected and probably a bug.
@imatiach-msft Instead of generating the wrappers (because of restrictions on my business laptop), I usually extract the wrappers from the JAR package that I get through Maven. The problem is that for version 1.0.0-rc1 I can't download the file (while it worked fine for version 0.18.x). See https://github.com/Azure/mmlspark/issues/715#issuecomment-544462083
Also, if I look at https://search.maven.org/search?q=g:com.microsoft.ml.spark the last version is 0.18.1. Why can't I find version 1.0.0-rc1?
Thank you
@candalfigomoro see the fix I posted in #715; apologies for the confusion!
Facing a similar issue with the LightGBMRegressor module as well. mmlspark: 0.17, pyspark: 2.4.4, Scala: 2.11.12
Looks like in version 0.17 the import path isn't mmlspark.lightgbm; instead, the specific module name is repeated. It worked for me using: from mmlspark.LightGBMRegressor import LightGBMRegressor. Please update the sample in the documentation.
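For reference, a minimal sketch contrasting the two package layouts described in this thread (the 0.18+ path matches the examples posted elsewhere here):

# mmlspark 0.17: the module name is repeated; there is no lightgbm subpackage
from mmlspark.LightGBMRegressor import LightGBMRegressor

# mmlspark 0.18 and later: estimators live under the lightgbm subpackage
# from mmlspark.lightgbm import LightGBMRegressor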
Also, I could not find the vowpalWabbit module in the latest package from https://github.com/Azure/mmlspark/archive/bba5c10ff774a7541be4cde7438ba710bd51f5e6.zip, although I could find it on master: https://github.com/arijeetm1/mmlspark/blob/master/src/main/python/mmlspark/vw/VowpalWabbitRegressor.py Wondering if it has been released yet? Maybe in the latest version 0.18, but I could not find it at https://spark-packages.org/package/Azure/mmlspark
> @goodwanghan sorry about the trouble you are having. You need to run the autogen to autogenerate the wrappers; _LightGBMClassifier is automatically generated from the Scala API. LightGBM is open source and works anywhere: Linux, Windows, macOS. The mmlspark integration can be run anywhere: Azure, Cloudera, AWS, etc. If it's not running somewhere, then that's unexpected and probably a bug.
I am running into this issue as well, but I do not understand what this means. Can you please explain, step by step, how I can run the autogen on a Linux machine with Python 3.7, Jupyter, and Spark 2.4.5?
I realized the usual method of loading with PySpark didn't work for me because we use a company-internal Maven repo. To fix this, I suggest making the .jar and pom.xml files for mmlspark officially available in the releases and mentioning this in the README. Then I just had to manually upload them to our Maven repo and it worked perfectly.
> run the autogen to autogenerate the wrappers
Hi, would you please give me more details about how to autogenerate the wrappers? For example, how do I do that on Databricks? I followed the instructions for installing the package via Maven, but I get the No module named 'mmlspark.lightgbm._LightGBMRegressor' issue. Thank you
ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'
Developers, can you help us?
Getting the same error as others on this thread. I was going through the LightGBM example here:
from mmlspark.lightgbm import LightGBMRegressor
model = LightGBMRegressor(objective='quantile', alpha=0.2, learningRate=0.3, numLeaves=31).fit(train)
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-25-064a17bb600b> in <module>
----> 1 from mmlspark.lightgbm import LightGBMRegressor
2 model = LightGBMRegressor(objective='quantile', alpha=0.2, learningRate=0.3, numLeaves=31).fit(train)
/opt/conda/anaconda/lib/python3.7/site-packages/mmlspark/lightgbm/LightGBMRegressor.py in <module>
9 basestring = str
10
---> 11 from mmlspark.lightgbm._LightGBMRegressor import _LightGBMRegressor
12 from mmlspark.lightgbm._LightGBMRegressor import _LightGBMRegressionModel
13 from pyspark import SparkContext
ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'
So is there any solution to this problem?
Where is the support? We have been waiting a long time.
Having this issue, please help.
@MarsXDM @MacJei @vinhdiesal @bkowshik not sure why you are having this issue. I need to understand more about your environment to try to reproduce and diagnose it. What cluster are you running this code on: YARN, Spark standalone, Mesos, or Databricks (which runs its own version of Spark standalone)? How are you adding the package? Clearly this is not even getting to the Scala code; there is some issue with importing the Python wrappers. Somehow the Python files in the Maven jar are not getting registered in PySpark.
If you download the jar, you can run this on Linux:
jar xvf mmlspark_2.11-1.0.0-rc1.jar
This will extract the jar. Note there is one "com" folder containing all of the Scala code and an "mmlspark" folder containing all of the Python code. It seems the Python code is not getting registered properly in your environment. I've seen this happen with Cloudera clusters before, and the workaround, as I recall, was to specify the Python files as a zip file, either on the pyspark command line or by loading it from code.
I think this was the question for the Cloudera cluster: https://github.com/Azure/mmlspark/issues/311 There was also an external Azure customer who had this issue, and we were able to work around the problem by extracting the zip file and specifying it in the --py-files command; see the relevant doc: https://spark.apache.org/docs/latest/submitting-applications.html I'm also happy to get on a call with anyone to resolve their issue. As you can probably guess, it's usually something very environment/cluster specific. I mostly run code on Azure Databricks, so I can't possibly fix all cluster configurations/cluster types out there, but contributions are definitely welcome.
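For anyone trying that workaround from code rather than via spark-submit, here is a minimal sketch using SparkContext.addPyFile; the zip name and paths are assumptions for illustration:

import pyspark

spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
    .getOrCreate()

# mmlspark_python.zip is assumed to contain the top-level "mmlspark" folder
# extracted from the jar (e.g. with: jar xvf mmlspark_2.11-1.0.0-rc1.jar)
spark.sparkContext.addPyFile("/path/to/mmlspark_python.zip")

from mmlspark.lightgbm import LightGBMClassifier  # should now resolve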
> So is there any solution to this problem?
I hit this issue because the import statement (from mmlspark.lightgbm import LightGBMClassifier) was running before the SparkSession was created...
My SparkSession object is created as follows:

spark = SparkSession.builder.master('local[*]').appName("My App") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
    .getOrCreate()

When I put the import statement after this, the issue went away.
from mmlspark.lightgbm import LightGBMRegressor
ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'
Please help!
@romario076 what version of mmlspark are you running? How are you loading the mmlspark jar and the lightgbm jar? Have you tried specifying spark.jars.packages or some other parameters? What cluster are you running on (Azure Databricks, Cloudera, etc.)? The problem seems to be that the Python files in the jar are not being loaded.
spark = SparkSession.builder \
    .appName("Churn Scoring LightGBM") \
    .master("local[4]") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:0.18.1") \
    .getOrCreate()

This worked for me. Spark 2.4.5, Ubuntu, Python 3.6.
> @romario076 what version of mmlspark are you running? How are you loading the mmlspark jar and the lightgbm jar? Have you tried specifying spark.jars.packages or some other parameters? What cluster are you running on (Azure Databricks, Cloudera, etc.)? The problem seems to be that the Python files in the jar are not being loaded.
.config("spark.jars.packages","com.microsoft.ml.spark:mmlspark_2.11:0.18.1") works but .config("spark.jars","/home/erkan/spark/mmlspark_jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar") doesn't work
@erkansirin78
> .config("spark.jars","/home/erkan/spark/mmlspark_jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar") doesn't work

This doesn't work because you are missing the lightgbm jar; you actually need to specify two jars there.
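A rough sketch of what that two-jar spark.jars configuration could look like; the lightgbm jar filename and version below are assumptions (check which lightgbmlib version your mmlspark release depends on), and note the later comment in this thread suggesting spark.jars may still not register the Python wrappers:

from pyspark.sql import SparkSession

# Both jars go in a single comma-separated string; paths are examples only.
spark = SparkSession.builder \
    .appName("MyApp") \
    .config("spark.jars",
            "/home/erkan/spark/mmlspark_jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar,"
            "/home/erkan/spark/mmlspark_jars/com.microsoft.ml.lightgbm_lightgbmlib-2.2.350.jar") \
    .getOrCreate()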
Hello, I have the same problem:
from mmlspark.lightgbm import LightGBMRegressor
ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'
The cluster runs on GCP Dataproc; mmlspark is installed from pip.
gcloud dataproc clusters create "models-cluster" \
    --num-secondary-workers=0 --num-workers=0 \
    --region=europe-west2 \
    --metadata 'PIP_PACKAGES=scikit-learn lightgbm google-cloud-storage PyYAML mmlspark' \
    --initialization-actions gs://goog-dataproc-initialization-actions-europe-west2/python/pip-install.sh \
    --image-version=1.4
The SparkSession looks like this (configs copied from the mmlspark repo):

spark = SparkSession.builder.appName('Models') \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3") \
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
    .getOrCreate()
@imatiach-msft could you please help?
@erkansirin78
> .config("spark.jars","/home/erkan/spark/mmlspark_jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar") doesn't work
> this doesn't work because you are missing the lightgbm jar, you actually need to have two jars specified there

I think the problem is not the lightgbm jar. When I use the spark.jars.packages option, mmlspark can auto-generate and register the wrappers correctly, but when I use the spark.jars option, the mmlspark jar cannot auto-generate the wrappers. Is there something we need to modify when we import the jar?
@MarsXDM, thanks for the tip. Actually, this is what worked for me:

import pyspark

spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
    .getOrCreate()

import mmlspark

Run this in a Jupyter notebook after you've installed pyspark.
> @goodwanghan sorry about the trouble you are having. You need to run the autogen to autogenerate the wrappers; _LightGBMClassifier is automatically generated from the Scala API.
Has this been resolved? Looking at the GitHub repo, there are no _LightGBMClassifier or _LightGBMRegressor modules, yet they are imported by the code. I am trying to use this in Databricks and have not found a solution anywhere. Advice, @imatiach-msft? Thanks
Nothing from the above works for me. Databricks, Spark 2.4.5, Python 3.7.3
ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'
@anivlam please try this walkthrough with pictures on Databricks: https://docs.microsoft.com/en-us/azure/cognitive-services/big-data/getting-started#azure-databricks For Spark 2.4.5 you can use the rc1 to rc3 releases. For the latest Spark 3.0 you will need to use a build from master, for example:

Maven Coordinates: com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-76-aad223e0-SNAPSHOT
Maven Resolver: https://mmlspark.azureedge.net/maven
Has anyone gotten this to work in a Jupyter notebook? I've tried all of the above SparkSession create statements; nothing works. Still stuck with the same error:
from mmlspark.lightgbm import LightGBMRegressor
ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'
Using pyspark 2.4.3
Seems like on Dataproc I can import LightGBM with this:

gcloud dataproc jobs submit pyspark --cluster=$CLUSTER ./main.py \
    --properties=spark.jars.packages='com.microsoft.ml.spark:mmlspark_2.11:0.18.1'

In main.py you can then import it with: from mmlspark.lightgbm import LightGBMClassifier
I'm not sure why, but with the rc releases I got an import error:
--properties=spark.jars.packages='com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3'
Please don't forget to add --metadata "mmlspark lightgbm" when you create a cluster.
Hope this helps someone!
@her0e1c1 I would recommend using the rc3 release, since 0.18.1 is quite old. You are probably getting an import error because for the rc releases you need to specify the Maven resolver https://mmlspark.azureedge.net/maven.
You are only specifying spark.jars.packages above and are missing spark.jars.repositories=https://mmlspark.azureedge.net/maven. Note that since the rc releases we haven't been publishing to Maven Central anymore.
@imatiach-msft Thanks! After I changed to this, I can use rc3 (the format looks a bit tricky, anyway...):
--properties=spark.jars.packages='com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3',spark.jars.repositories='https://mmlspark.azureedge.net/maven'
https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties
@imatiach-msft I installed mmlspark lightgbm via pip and the issue happened. I moved the lightgbm files from mmlspark/lightgbm into site-packages/mmlspark/lightgbm, and then the issue was gone, following the guide here: https://www.pythonf.cn/read/177095
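A minimal sketch of that copy step in Python; all paths are assumptions, so locate your actual site-packages and the pip-installed mmlspark folder first:

import os
import shutil
import site

src = "/path/to/mmlspark/lightgbm"  # where pip left the wrapper files (assumed)
dst = os.path.join(site.getsitepackages()[0], "mmlspark", "lightgbm")

os.makedirs(dst, exist_ok=True)
for name in os.listdir(src):
    path = os.path.join(src, name)
    if os.path.isfile(path):  # skip subfolders such as __pycache__
        # copy each wrapper module (e.g. _LightGBMRegressor.py) into site-packages
        shutil.copy2(path, os.path.join(dst, name))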