
mmlspark.lightgbm._LightGBMClassifier does not exist

Open goodwanghan opened this issue 6 years ago • 34 comments

Describe the bug mmlspark.lightgbm._LightGBMClassifier does not exist

To Reproduce I git cloned the repo and appended the mmlspark python path via sys.path.append; import mmlspark succeeds, but the classifier inside can't be used.

There is no clear instruction on how to install mmlspark for python.

spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
            .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
            .getOrCreate()
from mmlspark.lightgbm import LightGBMClassifier
model = LightGBMClassifier(learningRate=0.3,
                           numIterations=100,
                           numLeaves=31).fit(train)
spark.stop()

Expected behavior from mmlspark.lightgbm import LightGBMClassifier should work

Info (please complete the following information):

  • MMLSpark Version: from latest repo
  • Spark Version 2.4.4
  • Spark Platform: custom platform not on azure

**Stacktrace**

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-9-1b16cbc5ea7e> in <module>
      5             .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
      6             .getOrCreate()
----> 7 from mmlspark.lightgbm import LightGBMClassifier
      8 model = LightGBMClassifier(learningRate=0.3,
      9                            numIterations=100,

/mnt/user-home/git/mmlspark/src/main/python/mmlspark/lightgbm/LightGBMClassifier.py in <module>
      9     basestring = str
     10 
---> 11 from mmlspark.lightgbm._LightGBMClassifier import _LightGBMClassifier
     12 from mmlspark.lightgbm._LightGBMClassifier import _LightGBMClassificationModel
     13 from pyspark import SparkContext

ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMClassifier'

Additional context I tried this on Jupyter on a Linux machine. Does this only work on Azure?

goodwanghan avatar Oct 17 '19 23:10 goodwanghan

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.

welcome[bot] avatar Oct 17 '19 23:10 welcome[bot]

@goodwanghan sorry about the trouble you are having. You need to run the autogen to autogenerate the wrappers, _LightGBMClassifier is automatically generated from the scala API. " I tried this on Jupyter on a Linux machine. Does this only work on Azure?" LightGBM is open source and works anywhere, linux, windows, macos. The mmlspark integration can be run anywhere, Azure, Cloudera, AWS, etc. If it's not running somewhere then that's unexpected and probably a bug.

imatiach-msft avatar Oct 21 '19 04:10 imatiach-msft

@imatiach-msft Instead of generating the wrappers, because of restrictions on my business laptop, I usually extracted the wrappers from the JAR package that I got through maven. The problem is that for version 1.0-rc1 I can't download the file (while it worked fine for version 0.18.x). See https://github.com/Azure/mmlspark/issues/715#issuecomment-544462083

Also, if I look here https://search.maven.org/search?q=g:com.microsoft.ml.spark the last version is 0.18.1. Why can't I find version 1.0-rc1?

Thank you

candalfigomoro avatar Oct 22 '19 07:10 candalfigomoro

@candalfigomoro see the fix I posted in #715, apologies for the confusion!

mhamilton723 avatar Oct 26 '19 03:10 mhamilton723

Facing a similar issue with the LightGBMRegressor module as well. mmlspark: 0.17, pyspark: 2.4.4, Scala: 2.11.12

arijeetm1 avatar Feb 04 '20 19:02 arijeetm1

Looks like in version 0.17 the module path is different: the submodule is named after the class itself rather than mmlspark.lightgbm. It worked for me using: from mmlspark.LightGBMRegressor import LightGBMRegressor. Please update the sample in the documentation.

Also, I could not find the vowpalWabbit module in the latest package from https://github.com/Azure/mmlspark/archive/bba5c10ff774a7541be4cde7438ba710bd51f5e6.zip, although I could find it on master: https://github.com/arijeetm1/mmlspark/blob/master/src/main/python/mmlspark/vw/VowpalWabbitRegressor.py. Wondering if it has been released yet? Maybe in the latest version 0.18, but I could not find it at https://spark-packages.org/package/Azure/mmlspark

arijeetm1 avatar Feb 04 '20 23:02 arijeetm1

@goodwanghan sorry about the trouble you are having. You need to run the autogen to autogenerate the wrappers, _LightGBMClassifier is automatically generated from the scala API. " I tried this on Jupyter on a Linux machine. Does this only work on Azure?" LightGBM is open source and works anywhere, linux, windows, macos. The mmlspark integration can be run anywhere, Azure, Cloudera, AWS, etc. If it's not running somewhere then that's unexpected and probably a bug.

I am running into this issue as well but I do not understand what this means. Can you please explain step-by-step, on a linux machine with Python 3.7 Jupyter + Spark 2.4.5, how can I run the autogen?

surajiyer avatar Mar 20 '20 00:03 surajiyer

I realized the usual method of loading with Pyspark didn't work for me because we use a company-internal maven repo. To fix this, I suggest making the .jar and pom.xml files for mmlspark officially available in the releases and mentioning this in the README. I then just had to manually upload them to our maven repo and it worked perfectly.

surajiyer avatar Mar 20 '20 11:03 surajiyer

run the autogen to autogenerate the wrappers

Hi, would you please give me more details about how to autogenerate the wrappers? For example, how to do that in Databricks. I followed the instructions and installed the package via maven, but I get the No module named 'mmlspark.lightgbm._LightGBMRegressor' issue. Thank you

goalkeeperfyc avatar May 02 '20 17:05 goalkeeperfyc

ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'

Developers, can you help us?

MacJei avatar Jul 07 '20 09:07 MacJei

Getting the same error as others on this thread. I was going through the LightGBM example here:

from mmlspark.lightgbm import LightGBMRegressor
model = LightGBMRegressor(objective='quantile', alpha=0.2, learningRate=0.3, numLeaves=31).fit(train)
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-25-064a17bb600b> in <module>
----> 1 from mmlspark.lightgbm import LightGBMRegressor
      2 model = LightGBMRegressor(objective='quantile', alpha=0.2, learningRate=0.3, numLeaves=31).fit(train)

/opt/conda/anaconda/lib/python3.7/site-packages/mmlspark/lightgbm/LightGBMRegressor.py in <module>
      9     basestring = str
     10 
---> 11 from mmlspark.lightgbm._LightGBMRegressor import _LightGBMRegressor
     12 from mmlspark.lightgbm._LightGBMRegressor import _LightGBMRegressionModel
     13 from pyspark import SparkContext

ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'

bkowshik avatar Jul 09 '20 13:07 bkowshik

So are there any solutions to this problem?

MarsXDM avatar Jul 27 '20 08:07 MarsXDM

Where is the support? We have been waiting a long time.

MacJei avatar Jul 27 '20 09:07 MacJei

Having this issue, please help.

ghost avatar Jul 27 '20 10:07 ghost

@MarsXDM @MacJei @vinhdiesal @bkowshik not sure why you are having this issue. I need to understand more about your environment to try and reproduce the issue and diagnose it. What cluster are you running this code in - is it yarn, spark standalone, mesos, or databricks which runs its own version of spark standalone? How are you adding the package? Clearly this is actually not even getting to the scala code, there is some issue with importing the python wrappers. Somehow the python files in the maven jar are not getting registered in pyspark.

imatiach-msft avatar Jul 27 '20 14:07 imatiach-msft

if you download the jar, you can run on linux:

jar xvf mmlspark_2.11-1.0.0-rc1.jar

this will extract the jar. Note there is one "com" folder where all of the scala code is at, and an "mmlspark" folder for all of the python code. It seems the python code is not getting registered properly on your environment. I've seen this happen with cloudera clusters before, and the workaround, as I recall, was to specify the python files as a zip file either in the pyspark command or load it via the code.

imatiach-msft avatar Jul 27 '20 15:07 imatiach-msft

I think this was the question for the cloudera cluster: https://github.com/Azure/mmlspark/issues/311 There was also an external azure customer who had this issue and we were able to work around the problem by extracting the zip file and specifying it in the --py-files command, see relevant doc: https://spark.apache.org/docs/latest/submitting-applications.html I'm also happy to go on a call with anyone to resolve their issue. As you can probably guess it's usually something very environment/cluster specific, I mostly run code on azure databricks so I can't possibly fix all cluster configurations/cluster types out there, but contributions are definitely welcome.
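The extract-and-zip workaround described above can be sketched roughly as follows. The jar filename matches the rc1 release mentioned earlier in the thread, and my_job.py is a hypothetical job script, not a file from this repo:

```shell
# Sketch of the --py-files workaround (assumes the mmlspark jar was
# already downloaded, e.g. from your maven cache).
jar xvf mmlspark_2.11-1.0.0-rc1.jar mmlspark/    # extract only the python wrappers
zip -r mmlspark-python.zip mmlspark              # package them for spark-submit
spark-submit \
    --jars mmlspark_2.11-1.0.0-rc1.jar \
    --py-files mmlspark-python.zip \
    my_job.py                                    # hypothetical job script
```

With --py-files the zip is shipped to the driver and executors and put on the python path, which is exactly the registration step that fails in the environments reported above.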

imatiach-msft avatar Jul 27 '20 15:07 imatiach-msft

So are there any solutions to this problem?

I met this issue because the import statement (from mmlspark.lightgbm import LightGBMClassifier) was running before the SparkSession was created. My SparkSession object is created as follows:

ss.master('local[*]').appName("My App") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1") \
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
    .getOrCreate()

When I put the import statement after this, the issue was gone.
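The ordering fix above can be written out as a complete snippet. This is only a sketch of the import ordering, reusing the rc1 coordinates from earlier in the thread; it assumes pyspark is installed and the package can be resolved over the network:

```python
# Sketch: create the SparkSession FIRST, so spark.jars.packages is
# resolved and the python wrappers packaged inside the jar get registered.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")
         .appName("My App")
         .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1")
         .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
         .getOrCreate())

# Importing BEFORE getOrCreate() is what triggers
# ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMClassifier'
from mmlspark.lightgbm import LightGBMClassifier
```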

MarsXDM avatar Jul 28 '20 07:07 MarsXDM

from mmlspark.lightgbm import LightGBMRegressor

ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'

Please, help!

romario076 avatar Aug 11 '20 14:08 romario076

@romario076 what version of mmlspark are you running in? How are you loading the mmlspark jar and the lightgbm jar? Have you tried to specify spark.jars.packages or some other parameters? What cluster are you running in (azure databricks, cloudera, etc)? The problem seems to be with not loading the python files in the jar.

imatiach-msft avatar Aug 12 '20 15:08 imatiach-msft

spark = SparkSession.builder \
    .appName("Churn Scoring LightGBM") \
    .master("local[4]") \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:0.18.1") \
    .getOrCreate()

this worked for me. Spark 2.4.5 Ubuntu, python3.6

erkansirin78 avatar Aug 18 '20 15:08 erkansirin78

@romario076 what version of mmlspark are you running in? How are you loading the mmlspark jar and the lightgbm jar? Have you tried to specify spark.jars.packages or some other parameters? What cluster are you running in (azure databricks, cloudera, etc)? The problem seems to be with not loading the python files in the jar.

.config("spark.jars.packages","com.microsoft.ml.spark:mmlspark_2.11:0.18.1") works, but .config("spark.jars","/home/erkan/spark/mmlspark_jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar") doesn't.

erkansirin78 avatar Aug 18 '20 15:08 erkansirin78

@erkansirin78 the reason .config("spark.jars","/home/erkan/spark/mmlspark_jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar") doesn't work is that you are missing the lightgbm jar; you actually need to have two jars specified there.
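A sketch of what the two-jar spark.jars configuration suggested above might look like. The second jar's filename is purely illustrative, not a real artifact name; use whatever jar your maven cache resolved for mmlspark's lightgbm dependency:

```python
# Sketch: spark.jars takes a comma-separated list, so both the mmlspark
# jar and its lightgbm dependency jar must be listed.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("Churn Scoring LightGBM")
         .config("spark.jars",
                 "/home/erkan/spark/mmlspark_jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar,"
                 "/home/erkan/spark/mmlspark_jars/lightgbm-lib.jar")  # illustrative filename
         .getOrCreate())
```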

imatiach-msft avatar Aug 24 '20 15:08 imatiach-msft

Hello, I have the same problem:

from mmlspark.lightgbm import LightGBMRegressor
ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'

Cluster runs on GCP Dataproc. mmlspark is installed from PIP

gcloud dataproc clusters create "models-cluster" \
    --num-secondary-workers=0 --num-workers=0 \
    --region=europe-west2 \
    --metadata 'PIP_PACKAGES=scikit-learn lightgbm google-cloud-storage PyYAML mmlspark' \
    --initialization-actions gs://goog-dataproc-initialization-actions-europe-west2/python/pip-install.sh \
    --image-version=1.4

The SparkSession is created like this (configs copied from the mmlspark repo):

spark = SparkSession.builder.appName('Models') \
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3") \
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven") \
    .getOrCreate()

@imatiach-msft could you please help?

AdamJedz avatar Nov 10 '20 09:11 AdamJedz

@erkansirin78 the reason .config("spark.jars","/home/erkan/spark/mmlspark_jars/com.microsoft.ml.spark_mmlspark_2.11-0.18.1.jar") doesn't work is that you are missing the lightgbm jar; you actually need to have two jars specified there.

I think the problem is not the lightgbm jar. When I use the spark.jars.packages option, mmlspark can auto-generate and register the wrappers correctly. But when I use the spark.jars option, the mmlspark jar cannot auto-generate the wrappers. Is there something we need to modify when we import the jar?
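One way to bridge the wrapper-registration gap when using spark.jars is to register the extracted python files explicitly. SparkContext.addPyFile is a standard pyspark API; the zip path below is hypothetical and would be produced by extracting the mmlspark folder from the jar as described earlier in the thread. This sketch assumes an existing SparkSession named spark:

```python
# Sketch: spark.jars puts the jar on the JVM classpath but does not put
# the bundled python wrappers on the python path. Registering a zip of
# the extracted `mmlspark` folder fills that gap.
spark.sparkContext.addPyFile("/path/to/mmlspark-python.zip")  # hypothetical path

# After registration the wrappers resolve on the driver and executors.
from mmlspark.lightgbm import LightGBMClassifier
```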

Ereebay avatar Dec 17 '20 18:12 Ereebay

@MarsXDM , thanks for the tip. Actually, this is what worked for me below...

import pyspark
spark = pyspark.sql.SparkSession.builder.appName("MyApp")\
    .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1")\
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")\
    .getOrCreate()
import mmlspark

Run this in a Jupyter notebook after you've installed pyspark.

Lawrence-Krukrubo avatar Jan 17 '21 15:01 Lawrence-Krukrubo

@goodwanghan sorry about the trouble you are having. You need to run the autogen to autogenerate the wrappers, _LightGBMClassifier is automatically generated from the scala API.

Has this been resolved? Looking at the GitHub repo, there are no _LightGBMClassifier or _LightGBMRegressor modules, yet they are imported in the code. I am trying to use this in Databricks and have not found a solution anywhere. Advice @imatiach-msft? Thanks

lvassor avatar Apr 01 '21 08:04 lvassor

Nothing from the above works for me. Databricks, Spark 2.4.5, Python 3.7.3

ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'

anivlam avatar May 03 '21 16:05 anivlam

@anivlam please try this walkthrough with pictures on databricks: https://docs.microsoft.com/en-us/azure/cognitive-services/big-data/getting-started#azure-databricks for spark 2.4.5 you can use rc1 to rc3 releases. For latest spark 3.0 you will need to use a build from master:


For example:

Maven Coordinates: com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-76-aad223e0-SNAPSHOT
Maven Resolver: https://mmlspark.azureedge.net/maven

imatiach-msft avatar May 03 '21 16:05 imatiach-msft

Has anyone gotten this to work in Jupyter notebook? I've tried all the above spark session create statements, nothing works. Still stuck with the same error:

from mmlspark.lightgbm import LightGBMRegressor ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'

Using pyspark 2.4.3

kellyulrich avatar May 04 '21 18:05 kellyulrich

Seems like in dataproc I can import lightGBM with this.

gcloud dataproc jobs submit pyspark --cluster=$CLUSTER ./main.py  --properties=spark.jars.packages='com.microsoft.ml.spark:mmlspark_2.11:0.18.1'

In main.py you can import by from mmlspark.lightgbm import LightGBMClassifier

I'm not sure why, but with the rc releases I got an import error:

--properties=spark.jars.packages='com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3'

Please don't forget to add --metadata "mmlspark lightgbm" when you create a cluster.

Hope this helps someone!

her0e1c1 avatar May 10 '21 08:05 her0e1c1

@her0e1c1 I would recommend using the rc3 release, since 0.18.1 is quite old. You are probably getting an import error because for the rc releases you need to specify the maven resolver: https://mmlspark.azureedge.net/maven

You are only specifying spark.jars.packages above and are missing "spark.jars.repositories=https://mmlspark.azureedge.net/maven". Note that since the rc releases we haven't been publishing to maven central anymore.

imatiach-msft avatar May 10 '21 14:05 imatiach-msft

@imatiach-msft Thanks! After changing to the following, I can use rc3 (the format looks a bit tricky, anyway...):

--properties=spark.jars.packages='com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3',spark.jars.repositories='https://mmlspark.azureedge.net/maven'

https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties

her0e1c1 avatar May 11 '21 10:05 her0e1c1

@imatiach-msft I installed mmlspark via pip and the issue happened. I moved the lightgbm files from mmlspark/lightgbm to site-packages/mmlspark/lightgbm, and then the issue was gone. Following the guide here: https://www.pythonf.cn/read/177095

chenxianwang avatar Sep 14 '21 01:09 chenxianwang