evidently: Failed to calculate profile file from command line

Hi!

I'm trying to calculate a profile file from the command line in a Jupyter Notebook (from a cell), but it returns the following error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 183, in <module>
    parsed.handler(**parsed.__dict__)
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 114, in calculate_profile
    sampling = __get_not_none(opts_data, "sampling", {})
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 53, in __get_not_none
    return default if src.get(key, None) is None else src.get(key)
AttributeError: 'str' object has no attribute 'get'

My code:

import json

config = {
  "data_format":{
    "separator":",",
    "header":False,
    "date_column":None
  },
  "column_mapping":{},
  "dashboard_tabs":["regression_perfomance"],
  "pretty_print":True
}

json_string = json.dumps(config)

with open('config.json', 'w') as outfile:
    json.dump(json_string, outfile)

!python -m evidently calculate profile --config config.json --reference reference.csv --current current.csv --output reports --report_name profile.json

Feb 02 '22 08:02 jenoOvchi

UPDATE

I tried adding a "sampling" section to config.json, but the error is still there:

import json

config = {
  "data_format":{
    "separator":",",
    "header":False,
    "date_column":None
  },
  "column_mapping":{},
  "dashboard_tabs":["regression_perfomance"],
  "pretty_print":True,
  "sampling": {
      "reference": {
      "type": "none"
    },
      "current": {
      "type": "nth",
      "n": 2
    }
  }
}

json_string = json.dumps(config)

with open('config.json', 'w') as outfile:
    json.dump(json_string, outfile)

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 183, in <module>
    parsed.handler(**parsed.__dict__)
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 114, in calculate_profile
    sampling = __get_not_none(opts_data, "sampling", {})
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 53, in __get_not_none
    return default if src.get(key, None) is None else src.get(key)
AttributeError: 'str' object has no attribute 'get'

Feb 02 '22 09:02 jenoOvchi

Hi @jenoOvchi, in one of the recent updates we changed the structure of config.json, and dashboard_tabs should now be a dictionary.

You can see an example of config.json in the repo (https://github.com/evidentlyai/evidently/blob/main/config.json).

Feb 02 '22 09:02 Liraim

For a clean test I tried this config.json as is, but the error is still there:

import json

config = {
  "data_format": {
    "separator": ",",
    "header": True,
    "date_column": "dteday"
  },
  "column_mapping" : {},
  "dashboard_tabs": {
    "data_drift": {
    },
    "cat_target_drift":{
      "verbose_level": 0
    }
  },
  "options": {
    "data_drift": {
      "confidence": 0.95,
      "drift_share": 0.5,
      "nbinsx": None,
      "xbins": None
    }
  },
  "pretty_print": True,
  "sampling": {
    "reference": {
      "type": "none",
      "n": 1,
      "ratio": 0.1
    },
    "current": {
      "type": "nth",
      "n": 2,
      "ratio": 0.1
    }
  }
}

json_string = json.dumps(config)

with open('config.json', 'w') as outfile:
    json.dump(json_string, outfile)

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 183, in <module>
    parsed.handler(**parsed.__dict__)
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 114, in calculate_profile
    sampling = __get_not_none(opts_data, "sampling", {})
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 53, in __get_not_none
    return default if src.get(key, None) is None else src.get(key)
AttributeError: 'str' object has no attribute 'get'

I think the problem is not with "dashboard_tabs" but with the "sampling" section.

Feb 02 '22 09:02 jenoOvchi

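Oh, sorry, my bad. Your code snippet writes the JSON to the file incorrectly: json.dumps already returns a string, so json.dump(json_string, outfile) encodes it a second time and the CLI reads back a str instead of a dict.

It should be something like: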

with open('config.json', 'w') as outfile:
    json.dump(config, outfile)
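
To see why the original snippet fails, here is a minimal sketch using only the standard library. The small config in it is illustrative, not a complete evidently config. Encoding twice produces a quoted JSON string literal, so loading it gives back a str, which has no .get() method:

```python
import json

# Illustrative config only; not a complete evidently config.
config = {"pretty_print": True, "sampling": {"current": {"type": "nth", "n": 2}}}

# Double encoding: dumps() returns a str, and encoding that str again
# produces a quoted JSON string literal rather than a JSON object.
double_encoded = json.dumps(json.dumps(config))

# Single encoding: the dict is serialized once, as a JSON object.
single_encoded = json.dumps(config)

loaded_wrong = json.loads(double_encoded)
loaded_right = json.loads(single_encoded)

print(type(loaded_wrong).__name__)  # str, which has no .get()
print(type(loaded_right).__name__)  # dict
```

Either json.dump(config, outfile) or outfile.write(json.dumps(config)) serializes the dict exactly once.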

Feb 02 '22 09:02 Liraim

That works for me, thanks! But I ran into some other errors before I got a correct config written:

We still need a "profile_sections" section for the command to run, and its format differs from the documentation examples: "profile_sections": {"data_drift": {}}. The error without "profile_sections":

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 183, in <module>
    parsed.handler(**parsed.__dict__)
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 117, in calculate_profile
    usage["parts"] = opts_data["profile_sections"]
KeyError: 'profile_sections'
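
A minimal sketch of a check that could run before calling the CLI, so this kind of mistake shows up in the notebook rather than in a traceback from evidently. The write_config helper and the REQUIRED_KEYS tuple are hypothetical names introduced here; the key list is based only on the KeyError above, and the CLI may expect other keys too:

```python
import json

# Hypothetical list, based only on the KeyError in this thread;
# the CLI may require other keys as well.
REQUIRED_KEYS = ("profile_sections",)


def write_config(config, path):
    """Check the config, then write it to `path` as a JSON object."""
    # Catches the double-encoding mistake: a str here means dumps() was already applied.
    if not isinstance(config, dict):
        raise TypeError(f"config must be a dict, got {type(config).__name__}")
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing required keys: {missing}")
    with open(path, "w") as outfile:
        json.dump(config, outfile, indent=2)


config = {
    "data_format": {"separator": ",", "header": True, "date_column": None},
    "column_mapping": {},
    "profile_sections": {"data_drift": {}},
    "pretty_print": True,
}
write_config(config, "config.json")
```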

Also, "nbinsx": null from the config in master is not an accepted value:

INFO:root:reference dataset loaded: 50 rows
INFO:root:current dataset loaded: 25 rows
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 183, in <module>
    parsed.handler(**parsed.__dict__)
  File "/opt/conda/lib/python3.9/site-packages/evidently/__main__.py", line 146, in calculate_profile
    runner.run()
  File "/opt/conda/lib/python3.9/site-packages/evidently/runner/profile_runner.py", line 52, in run
    profile.calculate(reference_data, current_data, self.options.column_mapping)
  File "/opt/conda/lib/python3.9/site-packages/evidently/model_profile/model_profile.py", line 31, in calculate
    self.execute(reference_data, current_data, column_mapping)
  File "/opt/conda/lib/python3.9/site-packages/evidently/pipeline/pipeline.py", line 45, in execute
    instance.calculate(rdata, cdata, column_mapping)
  File "/opt/conda/lib/python3.9/site-packages/evidently/analyzers/data_drift_analyzer.py", line 84, in calculate
    current_nbinsx = data_drift_options.get_nbinsx(feature_name)
  File "/opt/conda/lib/python3.9/site-packages/evidently/options/data_drift.py", line 39, in get_nbinsx
    raise ValueError(f"DataDriftOptions.nbinsx is incorrect type {type(self.nbinsx)}")
ValueError: DataDriftOptions.nbinsx is incorrect type <class 'NoneType'>
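
One possible workaround, sketched below, is to strip None-valued entries from the "options" section before writing the config. This assumes that evidently falls back to its own defaults when a key such as "nbinsx" is omitted, which is not confirmed in this thread. The drop_none helper is a hypothetical name, and it is applied only to "options" here because other sections (for example "date_column": None in "data_format") may need to keep their nulls:

```python
import json


def drop_none(value):
    """Recursively remove dict entries whose value is None."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(item) for item in value]
    return value


# The "options" section from the config above, with null written as None.
options = {
    "data_drift": {
        "confidence": 0.95,
        "drift_share": 0.5,
        "nbinsx": None,
        "xbins": None,
    }
}

cleaned = drop_none(options)
print(json.dumps(cleaned))
# {"data_drift": {"confidence": 0.95, "drift_share": 0.5}}
```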

Feb 02 '22 11:02 jenoOvchi

Thanks for reporting this. We will check and fix this.

Feb 02 '22 12:02 Liraim

Thanks!

Feb 02 '22 12:02 jenoOvchi