Presets for ProfilerOptions/Profiler

Open pietz opened this issue 3 years ago • 6 comments

Is your feature request related to a problem? Please describe.

The Profiler can be slow with all options enabled, and most users won't need every setting. Changing the settings one by one takes time and requires studying the docs closely.

Describe the outcome you'd like:

It would be nice if ProfilerOptions or the Profiler itself offered a way to adjust multiple settings at once by choosing a preset. The current default behavior could be exposed as the preset "all" or "complete". It might look like this:

profiler = dp.Profiler(data, preset="complete")

or

opts = dp.ProfilerOptions(preset="complete")

Some other presets could be:

  • "standard" - where some niche, performance-intensive features are deactivated
  • "numeric_stats_disabled" - self-explanatory
  • "data_types" - where only the data types of columns are inferred (this is actually what I want right now)

I'm sure you can think of others. The downside of this feature is that many presets would be opinionated.

The implementation would be straightforward, and a source file such as profiler_presets.py would nicely list all the settings that each preset changes.
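
To make the idea concrete, here is a rough sketch of what such a preset table could look like. It uses plain dictionaries and does not touch DataProfiler at all: the option keys (`data_type_inference`, `numeric_stats`, `data_labeling`, `correlation`) and the helper `options_for_preset` are made up for illustration and are not the library's real option names.

```python
from copy import deepcopy

# Hypothetical defaults: everything enabled, mirroring the current
# "all options on" behavior. Keys are illustrative only.
DEFAULT_OPTIONS = {
    "data_type_inference": True,
    "numeric_stats": True,
    "data_labeling": True,
    "correlation": True,
}

# Each preset lists only the settings it changes relative to the defaults,
# so the file doubles as documentation of what a preset does.
PRESETS = {
    "complete": {},
    "numeric_stats_disabled": {"numeric_stats": False},
    "data_types": {
        "numeric_stats": False,
        "data_labeling": False,
        "correlation": False,
    },
}


def options_for_preset(preset=None):
    """Return a fresh options dict with the preset's overrides applied."""
    options = deepcopy(DEFAULT_OPTIONS)
    if preset is None:
        return options
    if preset not in PRESETS:
        known = ", ".join(sorted(PRESETS))
        raise ValueError(f"Unknown preset {preset!r}; expected one of: {known}")
    options.update(PRESETS[preset])
    return options
```

Keeping each preset as a diff against the defaults means a newly added option is automatically enabled under "complete" and only needs an entry in the presets that should switch it off.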

pietz avatar Aug 11 '22 07:08 pietz

Thanks, @pietz, for the well-documented idea and for opening the issue. We'll be in touch if we have any questions. Cheers!

taylorfturner avatar Aug 11 '22 10:08 taylorfturner

I think we can focus on getting the data_types preset done first, as that is the most requested.

JGSweets avatar Sep 13 '22 15:09 JGSweets

Working on this

lovleen3112 avatar Sep 16 '22 16:09 lovleen3112

@lovleen3112 I think we can focus on the options preset first: opts = dp.ProfilerOptions(preset="complete"), as that should make it easy to add a preset argument to the Profiler itself afterwards.

JGSweets avatar Sep 16 '22 16:09 JGSweets

Sure @JGSweets

lovleen3112 avatar Sep 16 '22 16:09 lovleen3112

Once this is merged, we will need to validate running the Profiler with each preset to ensure there are no report failures, since many options will have been disabled.

JGSweets avatar Sep 16 '22 21:09 JGSweets

Tested all three preset options (i.e. complete, data_types, and numeric_stats_disabled) and saw no failures when generating reports with profiler.report().

@pietz thoughts?

taylorfturner avatar Oct 20 '22 12:10 taylorfturner

If there are no additional thoughts on this, @pietz, I'll go ahead and close. Thx!

taylorfturner avatar Apr 26 '23 17:04 taylorfturner