AttributeError: module 'causalml.inference.tree.uplift' has no attribute 'bootstrap'
Describe the bug
Hi, I hit this bug when fitting an UpliftRandomForestClassifier with n_jobs=-1:
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.6/pickle.py", line 269, in _getattribute
obj = getattr(obj, subpath)
AttributeError: module 'causalml.inference.tree.uplift' has no attribute 'bootstrap'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/pickle.py", line 918, in save_global
obj2, parent = _getattribute(module, name)
File "/usr/lib/python3.6/pickle.py", line 272, in _getattribute
.format(name, obj))
AttributeError: Can't get attribute 'bootstrap' on <module 'causalml.inference.tree.uplift' from '/usr/local/lib/python3.6/dist-packages/causalml/inference/tree/uplift.cpython-36m-x86_64-linux-gnu.so'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/joblib/externals/loky/backend/queues.py", line 150, in feed
obj = dumps(obj, reducers=reducers)
File "/usr/local/lib/python3.6/dist-packages/joblib/externals/loky/backend/reduction.py", line 247, in dumps
dump(obj, buf, reducers=reducers, protocol=protocol)
File "/usr/local/lib/python3.6/dist-packages/joblib/externals/loky/backend/reduction.py", line 240, in dump
_LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
File "/usr/local/lib/python3.6/dist-packages/joblib/externals/cloudpickle/cloudpickle.py", line 482, in dump
return Pickler.dump(self, obj)
File "/usr/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/usr/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 852, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 521, in save
self.save_reduce(obj=obj, *rv)
File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
save(state)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
self._batch_setitems(obj.items())
File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
save(v)
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 781, in save_list
self._batch_appends(obj)
File "/usr/lib/python3.6/pickle.py", line 808, in _batch_appends
save(tmp[0])
File "/usr/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/usr/lib/python3.6/pickle.py", line 507, in save
self.save_global(obj, rv)
File "/usr/local/lib/python3.6/dist-packages/joblib/externals/cloudpickle/cloudpickle.py", line 875, in save_global
Pickler.save_global(self, obj, name=name)
File "/usr/lib/python3.6/pickle.py", line 922, in save_global
(obj, module_name, name))
_pickle.PicklingError: Can't pickle <cyfunction UpliftRandomForestClassifier.bootstrap at 0x7fd2e8a6d270>: it's not found as causalml.inference.tree.uplift.bootstrap
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "main.py", line 53, in
To Reproduce
Steps to reproduce the behavior:
- Initialize with
model = UpliftRandomForestClassifier(control_name=controlname, max_depth=8, n_estimators=10)
- Fit the model with
model.fit(data[featurecols].values, data[treatmentcol].values, data[outcomecol].values)
Environment (please complete the following information):
- OS: [Ubuntu]
- Python Version: [3.6]
- Versions of Major Dependencies (causalml, pandas, scikit-learn, cython): [causalml==0.12.0, pandas==1.0.1, scikit-learn==0.23.2, cython==0.29.15]
@ZhangXInFD Thanks for reaching out. This sounds similar to an issue one of our colleagues ran into. To narrow it down, can you try passing 1 or 2 for the n_jobs argument? For example:
model = UpliftRandomForestClassifier(control_name=controlname, max_depth=8, n_estimators=10, n_jobs=1)
model.fit(data[featurecols].values, data[treatmentcol].values, data[outcomecol].values)
Also, answers to a few follow-up questions would be helpful:
- What's the CPU count? You can check with
import multiprocessing as mp; print(mp.cpu_count())
- How many observations and features does your data set have?
- Did you run it in a Jupyter notebook?
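For anyone else hitting this, the details asked for above can be gathered in one go. This is only a rough sketch using the standard library: the helper name collect_environment is made up for illustration, the package list is an assumption based on the issue template, importlib.metadata needs Python 3.8 or newer, and the Jupyter check is a heuristic that only looks for a loaded ipykernel module.

```python
import multiprocessing as mp
import platform
import sys


def collect_environment(packages=("causalml", "pandas", "scikit-learn", "cython", "joblib")):
    """Return a dict of details that are useful to paste into a bug report."""
    try:
        # importlib.metadata is in the standard library from Python 3.8 onwards.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        version = None

    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "cpu_count": mp.cpu_count(),
        # Heuristic only: Jupyter kernels import ipykernel on startup.
        "in_jupyter": "ipykernel" in sys.modules,
    }
    for name in packages:
        if version is None:
            info[name] = "unknown (importlib.metadata unavailable)"
            continue
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

The number of observations and features still has to be reported separately, for example from the shape of the feature array passed to fit.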
@paullo0106 I hit the same bug after passing 2 for the n_jobs argument, while the bug disappeared when I set n_jobs=1. The bug also occurred on my Windows 10 machine when I installed the package from the GitHub source with python setup.py install, whereas it works fine (with n_jobs=-1) when I install it with pip or conda.
- The output of mp.cpu_count() is 64.
- I generated synthetic data using
data, featurecols = make_uplift_classification(n_samples=int(2e4), treatment_name=['control', 'treatment'], n_classification_features=5, n_classification_informative=5, n_uplift_increase_dict={'treatment': 5}, n_uplift_decrease_dict={'treatment': 0}, delta_uplift_increase_dict={'treatment': 0.1} )
The bug occurred for every n_samples value I tried, from 1e3 to 2e7.
- I hit the bug both in a Jupyter notebook and in the terminal.
Thanks for the details! It looks like this is actually a different issue that I haven't seen before. I can reproduce it when running pytest. @jeongyoonlee do you know what we might be missing for this PicklingError?
Command that I ran:
pytest -vs tests/test_uplift_trees.py --cov causalml
PicklingError output:
=================================== FAILURES ===================================
______________________ test_UpliftRandomForestClassifier _______________________
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/paul.lo/opt/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/backend/queues.py", line 153, in _feed
obj_ = dumps(obj, reducers=reducers)
File "/Users/paul.lo/opt/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 271, in dumps
dump(obj, buf, reducers=reducers, protocol=protocol)
File "/Users/paul.lo/opt/anaconda3/lib/python3.8/site-packages/joblib/externals/loky/backend/reduction.py", line 264, in dump
_LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
File "/Users/paul.lo/opt/anaconda3/lib/python3.8/site-packages/joblib/externals/cloudpickle/cloudpickle_fast.py", line 563, in dump
return Pickler.dump(self, obj)
_pickle.PicklingError: Can't pickle <cyfunction UpliftRandomForestClassifier.bootstrap at 0x7ff632059ba0>: attribute lookup bootstrap on causalml.inference.tree.uplift failed
"""
The above exception was the direct cause of the following exception:
generate_classification_data = <function generate_classification_data.<locals>._generate_data at 0x7ff6320889d0>
def test_UpliftRandomForestClassifier(generate_classification_data):
df, x_names = generate_classification_data()
df_train, df_test = train_test_split(df,
test_size=0.2,
random_state=RANDOM_SEED)
# Train the UpLift Random Forest classifier
uplift_model = UpliftRandomForestClassifier(
min_samples_leaf=50,
control_name=TREATMENT_NAMES[0],
random_state=RANDOM_SEED
)
> uplift_model.fit(df_train[x_names].values,
treatment=df_train['treatment_group_key'].values,
y=df_train[CONVERSION].values)
tests/test_uplift_trees.py:31:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
causalml/inference/tree/uplift.pyx:1325: in causalml.inference.tree.uplift.UpliftRandomForestClassifier.fit
(delayed(self.bootstrap)(X, treatment, y, tree) for tree in self.uplift_forest)
../../opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py:1054: in __call__
self.retrieve()
../../opt/anaconda3/lib/python3.8/site-packages/joblib/parallel.py:933: in retrieve
self._output.extend(job.get(timeout=self.timeout))
../../opt/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py:542: in wrap_future_result
return future.result(timeout=timeout)
../../opt/anaconda3/lib/python3.8/concurrent/futures/_base.py:439: in result
return self.__get_result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Future at 0x7ff6209ec4f0 state=finished raised PicklingError>
def __get_result(self):
if self._exception:
> raise self._exception
E _pickle.PicklingError: Could not pickle the task to send it to the workers.
../../opt/anaconda3/lib/python3.8/concurrent/futures/_base.py:388: PicklingError
=========================== short test summary info ============================
FAILED tests/test_uplift_trees.py::test_UpliftRandomForestClassifier - _pickl...
=================== 1 failed, 3 passed, 2 warnings in 2.25s ====================
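Both tracebacks end the same way: pickle tries to serialize the bootstrap method by reference and looks it up as a module-level attribute named bootstrap on causalml.inference.tree.uplift, which does not exist because bootstrap lives on the class. The lookup failure itself can be imitated in plain Python. This is a minimal sketch, not a reproduction of the Cython behavior: demo_uplift and Forest are hypothetical names, and overwriting __qualname__ is only an assumed stand-in for whatever makes the compiled function get looked up by its bare name.

```python
import pickle
import sys
import types

# Hypothetical stand-in module so the example does not depend on causalml.
demo = types.ModuleType("demo_uplift")
sys.modules["demo_uplift"] = demo


class Forest:
    def bootstrap(self, x):
        return x


# Make the class and its method look as if they were defined in demo_uplift,
# and pin the names that pickle will use for the by-reference lookup.
Forest.__module__ = "demo_uplift"
Forest.__qualname__ = "Forest"
Forest.bootstrap.__module__ = "demo_uplift"
Forest.bootstrap.__qualname__ = "Forest.bootstrap"
demo.Forest = Forest

# With the class-qualified name, pickle can resolve demo_uplift.Forest.bootstrap.
restored = pickle.loads(pickle.dumps(Forest.bootstrap))
assert restored is Forest.bootstrap

# Assumption for illustration: drop the class prefix, so that pickle looks for
# demo_uplift.bootstrap, which is not an attribute of the module.
Forest.bootstrap.__qualname__ = "bootstrap"
error = None
try:
    pickle.dumps(Forest.bootstrap)
except pickle.PicklingError as exc:
    error = exc
    print(type(exc).__name__, exc)
```

If the same mechanism is at play here, it would also explain why only parallel runs fail: the traceback shows the task being pickled in order to send it to the worker processes.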
Thanks @paullo0106 and @ZhangXInFD. BTW, I couldn't reproduce the issue with Python 3.8 on Ubuntu. @ZhangXInFD, could you test it with Python 3.7 or above? @paullo0106 could you share what environment you're using?
can't reproduce the issue for now