Add get score methods for multi table QualityReport

Open katxiao opened this issue 3 years ago • 0 comments

Problem Description

Add score getter methods to the multi table QualityReport so that users can programmatically drill into the results and retrieve the score information.

Add the following methods:

  • get_score: Return the overall quality score that was printed during report generation.
  • get_properties: Return the per-property score breakdown that was printed during report generation.
  • get_details: Return the details for each score computed for a property. The inputs are the property name and, where applicable, the table name.
  • get_raw_results: A general method to access the raw results of any metric that was computed. The input is the metric name.
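
A minimal sketch of how these four getters could fit together, using plain dicts and lists instead of DataFrames. Everything here is hypothetical: the class name `MultiTableReportSketch`, its constructor arguments, and the choice to average scores (the issue does not say how the overall score is aggregated). The raw accessor is named `get_raw_results` to match the Expected behavior section.

```python
from statistics import mean


class MultiTableReportSketch:
    """Hypothetical stand-in for the multi table report; not the SDMetrics class."""

    def __init__(self, details, raw_results):
        # details: {property name: list of row dicts, each with a 'Quality Score'}
        self._details = details
        # raw_results: {metric name: list of result dicts, one per metric variant}
        self._raw_results = raw_results

    def get_properties(self):
        """Return {property name: score}. Averaging the rows is an assumption."""
        return {
            prop: mean(row['Quality Score'] for row in rows)
            for prop, rows in self._details.items()
        }

    def get_score(self):
        """Return the overall score. Averaging the properties is an assumption."""
        return mean(self.get_properties().values())

    def get_details(self, property, table_name=None):
        """Return the rows for one property, optionally filtered by table.

        The keyword is called `property` to mirror the issue, although it
        shadows the Python builtin inside this method.
        """
        rows = self._details[property]
        if table_name is None:
            return list(rows)
        return [row for row in rows if row.get('Table Name') == table_name]

    def get_raw_results(self, metric_name):
        """Return every stored variant of a metric (empty list if none)."""
        return list(self._raw_results.get(metric_name, []))


# Sample values copied from the Expected behavior section of this issue.
report = MultiTableReportSketch(
    details={
        'Column Shapes': [
            {'Table Name': 'users', 'Column': 'purchase_amt',
             'Metric': 'KSComplement', 'Quality Score': 0.880},
            {'Table Name': 'users', 'Column': 'card_type',
             'Metric': 'TVComplement', 'Quality Score': 0.690},
            {'Table Name': 'users', 'Column': 'start_date',
             'Metric': 'KSComplement', 'Quality Score': 0.790},
        ],
        'Table Relationships': [
            {'Child Table': 'users', 'Parent Table': 'transactions',
             'Metric': 'CardinalityShapeSimilarity', 'Quality Score': 0.89018},
            {'Child Table': 'users', 'Parent Table': 'accounts',
             'Metric': 'CardinalityShapeSimilarity', 'Quality Score': 0.91740},
        ],
    },
    raw_results={
        'KSComplement': [{
            'metric': {'method': 'single_table.KSComplement.compute_breakdown',
                       'kwargs': None},
            'results': {'start_date': {'score': 0.790},
                        'purchase_amt': {'score': 0.880}},
        }],
    },
)

users_shapes = report.get_details(property='Column Shapes', table_name='users')
relationships = report.get_details(property='Table Relationships')
```

Keeping the per-row details as the single source of truth means `get_properties` and `get_score` are derived views, so the three levels cannot drift apart.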

Expected behavior

>>> report.get_score()
0.899779301
>>> report.get_properties()

Property              Score
Column Shapes         0.841484929101
Column Pairs          0.74425019399111
Table Relationships   0.88592111111022
>>> report.get_details(property='Column Shapes', table_name='users')
Table Name       Column         Metric        Quality Score
users            purchase_amt   KSComplement   0.880
users            card_type      TVComplement   0.690
users            start_date     KSComplement   0.790
>>> report.get_details(property='Column Pairs', table_name='users')
Table Name   Columns   Metric                  Kwargs                  Quality Score   Real Score   Synthetic Score
users        (a, b)    ContingencySimilarity                           0.880
users        (a, b)    CorrelationSimilarity   coefficient='Pearson'   0.745           0.6829       0.174499
...
>>> report.get_details(property='Table Relationships')
Child Table   Parent Table     Metric                         Quality Score
users         transactions     CardinalityShapeSimilarity     0.89018
users         accounts         CardinalityShapeSimilarity     0.91740
...
>>> report.get_raw_results(metric_name='KSComplement')
[{
  'metric': {
    'method': 'single_table.KSComplement.compute_breakdown',
    'kwargs': None
  },
  'results': {
    'user_id': { 'score': None },
    'start_date': { 'score': 0.790 },
    'purchase_amt': { 'score': 0.880 },
  ... 
  }
}]

# If a metric was computed in multiple variants (e.g. with different kwargs), return a list containing all of them
>>> report.get_raw_results(metric_name='CorrelationSimilarity')
[{
   'metric': {
     'method': 'single_table.CorrelationSimilarity.compute_breakdown',
     'kwargs': { 'coefficient': 'Pearson' }
   },
   'results' : {
      ('start_date','purchase_amt'): { 'real': 0.89, 'synthetic': 0.577, 'score': 0.8435 },
     ... 
   },
}, {
   'metric': {
     'method': 'single_table.CorrelationSimilarity.compute_breakdown',
     'kwargs': { 'coefficient': 'Spearman' }
   },
   'results' : {
      ('start_date','purchase_amt'): { 'real': 0.89, 'synthetic': 0.577, 'score': 0.8435 },
     ... 
   }
}]
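
To illustrate how a caller might consume the list-of-variants structure shown above, here is a small sketch that flattens it into one row per variant and result key. The helper name `flatten_raw_results` is hypothetical and not part of the proposal; the sample data repeats the two `CorrelationSimilarity` variants from the example, with the elided entries left out.

```python
def flatten_raw_results(raw_results):
    """Flatten the nested raw results into one dict per (variant, key) pair."""
    rows = []
    for variant in raw_results:
        method = variant['metric']['method']
        # 'kwargs' may be None when the metric was run with its defaults.
        kwargs = variant['metric']['kwargs'] or {}
        for key, result in variant['results'].items():
            rows.append({
                'method': method,
                'kwargs': kwargs,
                'key': key,
                'score': result['score'],
            })
    return rows


# Same shape as the CorrelationSimilarity example in this issue.
raw = [{
    'metric': {
        'method': 'single_table.CorrelationSimilarity.compute_breakdown',
        'kwargs': {'coefficient': 'Pearson'},
    },
    'results': {
        ('start_date', 'purchase_amt'): {'real': 0.89, 'synthetic': 0.577, 'score': 0.8435},
    },
}, {
    'metric': {
        'method': 'single_table.CorrelationSimilarity.compute_breakdown',
        'kwargs': {'coefficient': 'Spearman'},
    },
    'results': {
        ('start_date', 'purchase_amt'): {'real': 0.89, 'synthetic': 0.577, 'score': 0.8435},
    },
}]

rows = flatten_raw_results(raw)
# Pick out a single variant by its kwargs.
spearman_rows = [row for row in rows if row['kwargs'].get('coefficient') == 'Spearman']
```

Returning a list even when only one variant exists keeps this kind of calling code uniform, since it never has to branch on whether a metric had one configuration or several.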

katxiao • Aug 12 '22 17:08