
AttributeError: module 'causalml.inference.tree.uplift' has no attribute 'bootstrap'

Open ZhangXInFD opened this issue 4 years ago • 4 comments

Describe the bug

Hi, I hit this error when fitting an UpliftRandomForestClassifier with n_jobs=-1:

` joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/pickle.py", line 269, in _getattribute
    obj = getattr(obj, subpath)
AttributeError: module 'causalml.inference.tree.uplift' has no attribute 'bootstrap'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/pickle.py", line 918, in save_global
    obj2, parent = _getattribute(module, name)
  File "/usr/lib/python3.6/pickle.py", line 272, in _getattribute
    .format(name, obj))
AttributeError: Can't get attribute 'bootstrap' on <module 'causalml.inference.tree.uplift' from '/usr/local/lib/python3.6/dist-packages/causalml/inference/tree/uplift.cpython-36m-x86_64-linux-gnu.so'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/joblib/externals/loky/backend/queues.py", line 150, in _feed
    obj_ = dumps(obj, reducers=reducers)
  File "/usr/local/lib/python3.6/dist-packages/joblib/externals/loky/backend/reduction.py", line 247, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "/usr/local/lib/python3.6/dist-packages/joblib/externals/loky/backend/reduction.py", line 240, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "/usr/local/lib/python3.6/dist-packages/joblib/externals/cloudpickle/cloudpickle.py", line 482, in dump
    return Pickler.dump(self, obj)
  File "/usr/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 852, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 781, in save_list
    self._batch_appends(obj)
  File "/usr/lib/python3.6/pickle.py", line 808, in _batch_appends
    save(tmp[0])
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/usr/lib/python3.6/pickle.py", line 507, in save
    self.save_global(obj, rv)
  File "/usr/local/lib/python3.6/dist-packages/joblib/externals/cloudpickle/cloudpickle.py", line 875, in save_global
    Pickler.save_global(self, obj, name=name)
  File "/usr/lib/python3.6/pickle.py", line 922, in save_global
    (obj, module_name, name))
_pickle.PicklingError: Can't pickle <cyfunction UpliftRandomForestClassifier.bootstrap at 0x7fd2e8a6d270>: it's not found as causalml.inference.tree.uplift.bootstrap
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main.py", line 53, in <module>
    model, time = train(model=model, data=df_train, outcomecol=outcomecol, featurecols=featurecols, treatmentcol=treatmentcol, controlname=controlname, time_monitor=True)
  File "/home/ansible/online/operator/pluto/utils.py", line 43, in train
    model.fit(data[featurecols].values, data[treatmentcol].values, data[outcomecol].values)
  File "causalml/inference/tree/uplift.pyx", line 1325, in causalml.inference.tree.uplift.UpliftRandomForestClassifier.fit
  File "/usr/local/lib/python3.6/dist-packages/joblib/parallel.py", line 1017, in __call__
    self.retrieve()
  File "/usr/local/lib/python3.6/dist-packages/joblib/parallel.py", line 909, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/lib/python3.6/dist-packages/joblib/_parallel_backends.py", line 562, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
_pickle.PicklingError: Could not pickle the task to send it to the workers. `

To Reproduce

Steps to reproduce the behavior:

  1. Initialize with model = UpliftRandomForestClassifier(control_name=controlname, max_depth=8, n_estimators=10)
  2. Fit the model with model.fit(data[featurecols].values, data[treatmentcol].values, data[outcomecol].values)

Environment (please complete the following information):

  • OS: [Ubuntu]
  • Python Version: [3.6]
  • Versions of Major Dependencies (causalml, pandas, scikit-learn, cython): [causalml==0.12.0, pandas==1.0.1, scikit-learn==0.23.2, cython==0.29.15]

ZhangXInFD Jan 25 '22 07:01

@ZhangXInFD Thanks for reaching out. This sounds similar to an issue one of our colleagues ran into. To narrow it down, can you try passing 1 or 2 for the n_jobs argument? For example:

model = UpliftRandomForestClassifier(control_name=controlname, max_depth=8, n_estimators=10, n_jobs=1)
model.fit(data[featurecols].values, data[treatmentcol].values, data[outcomecol].values)

Also, answers to a few follow-up questions would be helpful:

  • What's the CPU count? import multiprocessing as mp; print(mp.cpu_count())
  • How many observations and features does your data set have?
  • Did you run it in a Jupyter notebook?
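For anyone answering the same questions, the environment details can be gathered in one go. This is a minimal sketch using only the standard library; `collect_environment` is a hypothetical helper name, and the notebook check is a common heuristic (an `ipykernel` module being loaded), not an official API.

```python
import multiprocessing as mp
import platform
import sys


def collect_environment():
    """Gather the basic details asked for above (hypothetical helper)."""
    return {
        "cpu_count": mp.cpu_count(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        # Heuristic only: Jupyter kernels import ipykernel at startup.
        "in_notebook": "ipykernel" in sys.modules,
    }


info = collect_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

Pasting the printed lines into the issue covers the CPU count and notebook questions; the data set size still has to be reported separately.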

paullo0106 Jan 26 '22 06:01

@paullo0106 I hit the same bug after passing 2 for the n_jobs argument, while the bug disappeared when I set n_jobs=1. The bug also occurred on my Windows 10 machine when I used python setup.py install to install the package from source downloaded from GitHub, whereas it works well (with n_jobs=-1) if I install it with pip or conda.

  • The output of mp.cpu_count() is 64.
  • I generated synthetic data using data, featurecols = make_uplift_classification(n_samples=int(2e4), treatment_name=['control', 'treatment'], n_classification_features=5, n_classification_informative=5, n_uplift_increase_dict={'treatment': 5}, n_uplift_decrease_dict={'treatment': 0}, delta_uplift_increase_dict={'treatment': 0.1} ) . The bug occurred with n_samples set anywhere from 1e3 to 2e7.
  • I hit the bug both in a Jupyter notebook and when running from the terminal.
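Since the behavior differs between a source install and a pip/conda install, it may help to confirm which file Python actually loads for the module. This is a rough sketch using the standard library; `module_origin` is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec.
        return None
    return spec.origin if spec is not None else None


# Prints the path of the compiled extension if causalml is installed,
# otherwise None.
print(module_origin("causalml.inference.tree.uplift"))
```

Comparing the printed path across the two installs shows whether the failing environment picks up a locally built extension or the packaged one.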

ZhangXInFD Jan 26 '22 12:01

Thanks for the details! Looks like it's actually a different issue that I haven't seen before. I can reproduce it when running pytest. @jeongyoonlee do you know what we might be missing for this PicklingError?

Command that I ran: pytest -vs tests/test_uplift_trees.py --cov causalml

PicklingError output:

=================================== FAILURES ===================================
______________________ test_UpliftRandomForestClassifier _______________________
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/paul.lo/opt/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/backend/queues.py", line 153, in _feed
    obj_ = dumps(obj, reducers=reducers)
  File "/Users/paul.lo/opt/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 271, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "/Users/paul.lo/opt/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 264, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "/Users/paul.lo/opt/anaconda3/lib/python3.8/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
_pickle.PicklingError: Can't pickle <cyfunction UpliftRandomForestClassifier.bootstrap at 0x7ff632059ba0>: attribute lookup bootstrap on causalml.inference.tree.uplift failed
"""

The above exception was the direct cause of the following exception:

generate_classification_data = <function generate_classification_data.<locals>._generate_data at 0x7ff6320889d0>

    def test_UpliftRandomForestClassifier(generate_classification_data):
        df, x_names = generate_classification_data()
        df_train, df_test = train_test_split(df,
                                             test_size=0.2,
                                             random_state=RANDOM_SEED)

        # Train the UpLift Random Forest classifier
        uplift_model = UpliftRandomForestClassifier(
            min_samples_leaf=50,
            control_name=TREATMENT_NAMES[0],
            random_state=RANDOM_SEED
        )

>       uplift_model.fit(df_train[x_names].values,
                         treatment=df_train['treatment_group_key'].values,
                         y=df_train[CONVERSION].values)

tests/test_uplift_trees.py:31:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
causalml/inference/tree/uplift.pyx:1325: in causalml.inference.tree.uplift.UpliftRandomForestClassifier.fit
    (delayed(self.bootstrap)(X, treatment, y, tree) for tree in self.uplift_forest)
../../opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py:1054: in __call__
    self.retrieve()
../../opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py:933: in retrieve
    self._output.extend(job.get(timeout=self.timeout))
../../opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py:542: in wrap_future_result
    return future.result(timeout=timeout)
../../opt/anaconda3/lib/python3.8/concurrent/futures/_base.py:439: in result
    return self.__get_result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Future at 0x7ff6209ec4f0 state=finished raised PicklingError>

    def __get_result(self):
        if self._exception:
>           raise self._exception
E           _pickle.PicklingError: Could not pickle the task to send it to the workers.

../../opt/anaconda3/lib/python3.8/concurrent/futures/_base.py:388: PicklingError


=========================== short test summary info ============================
FAILED tests/test_uplift_trees.py::test_UpliftRandomForestClassifier - _pickl...
=================== 1 failed, 3 passed, 2 warnings in 2.25s ====================
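The failure mode in the message ("attribute lookup bootstrap on causalml.inference.tree.uplift failed") can be imitated in plain Python. The sketch below is only an illustration of pickle's lookup-by-name behavior, not a diagnosis of causalml: `Forest` and `pickles_ok` are hypothetical names, and overriding `__qualname__` is an assumption made to mimic a function that reports a module-level name while actually living on a class.

```python
import pickle


class Forest:
    """Hypothetical stand-in for a class that exposes a `bootstrap` method."""

    def bootstrap(self):
        return "tree"


def pickles_ok(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickling fails."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# Functions are pickled by reference: pickle stores the module and qualified
# name, then checks that looking the name up returns the same object.
method = Forest.bootstrap
print(method.__qualname__, pickles_ok(method))

# Assumption for illustration: drop the class prefix so the function claims to
# live at module level, like the `uplift.bootstrap` lookup in the traceback.
method.__qualname__ = "bootstrap"
print(method.__qualname__, pickles_ok(method))
```

With the class prefix dropped, pickle looks for a module-level `bootstrap`, does not find one, and refuses to serialize the function, which is the same shape of error that loky reports when it tries to send the task to worker processes.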

paullo0106 Jan 26 '22 21:01

Thanks @paullo0106 and @ZhangXInFD. BTW, I couldn't reproduce the issue with Python 3.8 on Ubuntu. @ZhangXInFD, could you test it with Python 3.7 or above? @paullo0106 could you share what environment you're using?

jeongyoonlee Feb 11 '22 18:02

I can't reproduce the issue for now.

paullo0106 Aug 19 '23 02:08