md.pattern

Open vankesteren opened this issue 5 years ago • 7 comments

Would it be possible to include a plot for patterns of missingness similar to the md.pattern functionality in the mice package in R?

Here's an example from that package: image

this plot tells us the following:

  • 13 observations have 0 missing values
  • 3 observations have missing values on chl only
  • 10 observations have missing values on chl
  • etc...

the patterns are easily visible and compact: the plot scales with the number of missingness patterns, not with the number of rows in the dataframe!
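
To make the idea concrete, here is a minimal pandas sketch of the kind of table such a plot summarises. It is not the mice implementation and not part of missingno: the toy data and the function name `missing_patterns` are made up for illustration. It groups the rows by their missingness mask, so the result has one row per distinct pattern rather than one row per observation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing value.
df = pd.DataFrame({
    "age": [1, 2, 1, 3, 2, 1],
    "bmi": [np.nan, 22.7, np.nan, 20.4, np.nan, 25.1],
    "chl": [np.nan, 187.0, 199.0, np.nan, np.nan, 204.0],
})


def missing_patterns(frame):
    """Return one row per distinct missingness pattern with its frequency.

    Illustrative helper only; the name and output layout are assumptions.
    """
    mask = frame.isna()  # True where a value is missing
    columns = list(mask.columns)
    # Group identical masks together and count how many rows share each one.
    patterns = mask.groupby(columns).size().reset_index(name="count")
    return patterns.sort_values("count", ascending=False, ignore_index=True)


patterns = missing_patterns(df)
print(patterns)
```

In the printed table, `True` means "missing in this pattern". The number of rows depends only on how many distinct patterns occur, which is what keeps the summary compact for large dataframes.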

vankesteren avatar Mar 25 '20 14:03 vankesteren

This is a neat idea! Will see what I can do.

ResidentMario avatar Mar 25 '20 15:03 ResidentMario

I am both interested in the feature and interested in contributing to this. This would be especially handy with data that exceeds memory (so would be great to make this dask compatible).

SultanOrazbayev avatar Jun 24 '20 23:06 SultanOrazbayev

@vankesteren: while the PR is being reviewed, it would be great if you could do an independent test-drive of the new pattern function.

SultanOrazbayev avatar Jun 25 '20 00:06 SultanOrazbayev

I'll see what I can do!

vankesteren avatar Jul 02 '20 09:07 vankesteren

Looks great! Here is the pattern function applied to the same dataset:

image

I do have the following suggestions:

  • there is some information missing:
    • the number of missing values in each pattern (right margin of the md.pattern R plot above)
    • the column counts, i.e. the number of missing values per variable (bottom row in the md.pattern plot)
  • add a method for visualisation or a suggestion on how to visualise this in the documentation
  • call the mvcount column simply count (or, to avoid overlap with the column names, maybe something like _count_?). In my head, mvcount immediately reads as "multivariate count"

vankesteren avatar Jul 02 '20 09:07 vankesteren

Thanks for the suggestions!

re: adding number of missing values: do you have a suggestion for the name of this column? values_missing?

SultanOrazbayev avatar Jul 02 '20 23:07 SultanOrazbayev

yeah, that works! or maybe n_missing?
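
For anyone reading along, a rough sketch of how the two margins discussed above could be computed with plain pandas. This is not the code from the PR: the function name `pattern_table` and the toy data are hypothetical, and the column names `count` and `n_missing` are just the ones floated in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing value.
df = pd.DataFrame({
    "age": [1, 2, 1, 3, 2, 1],
    "bmi": [np.nan, 22.7, np.nan, 20.4, np.nan, 25.1],
    "chl": [np.nan, 187.0, 199.0, np.nan, np.nan, 204.0],
})


def pattern_table(frame):
    """Return (patterns, column_totals) for the missingness in `frame`.

    Illustrative helper only; names and layout are assumptions.
    """
    mask = frame.isna()  # True where a value is missing
    columns = list(mask.columns)
    table = mask.groupby(columns).size().reset_index(name="count")
    # Right margin: how many variables are missing within each pattern.
    table["n_missing"] = table[columns].sum(axis=1)
    table = table.sort_values(
        ["n_missing", "count"], ascending=[True, False], ignore_index=True
    )
    # Bottom margin: how many values are missing in each column overall.
    column_totals = mask.sum()
    return table, column_totals


table, column_totals = pattern_table(df)
print(table)
print(column_totals)
```

The column totals are returned as a separate Series here so the boolean pattern columns keep their dtype; appending them as a bottom row, as the md.pattern output does, would be an equally valid layout.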

vankesteren avatar Jul 03 '20 09:07 vankesteren