
Communicate the training progress during `fit`

Open EduardoPassaro opened this issue 4 years ago • 3 comments

Environment details

If you are already running SDV, please indicate the following details about the environment in which you are running it:

  • SDV version: 0.12.0
  • Python version: 3.8
  • Operating System: Windows Server 2016
  • RAM 64GB
  • CPU 3.3GHz

Problem description

I ran sdv.fit(meta, tables), but the run has been in progress for over 12 hours. Is that expected? Is there any simplification I could make?

  • Relational model with 10 tables
  • 100 records
  • 90 columns distributed across the tables
  • Mix of categorical, date (%Y-%m-%d) and float data types

import pandas as pd
from sdv import Metadata

# Raw strings keep the backslashes in the Windows paths literal.
AA = pd.read_csv(r'C:\AA.csv')
BB = pd.read_csv(r'C:\BB.csv')
CC = pd.read_csv(r'C:\CC.csv')
DD = pd.read_csv(r'C:\DD.csv')
EE = pd.read_csv(r'C:\EE.csv')
FF = pd.read_csv(r'C:\FF.csv')
GG = pd.read_csv(r'C:\GG.csv')
HH = pd.read_csv(r'C:\HH.csv')
II = pd.read_csv(r'C:\II.csv')
JJ = pd.read_csv(r'C:\JJ.csv')

AA['aa_date'] = pd.to_datetime(AA['aa_date'], format= '%Y-%m-%d')
BB['bb_date'] = pd.to_datetime(BB['bb_date'], format= '%Y-%m-%d')

meta = Metadata()
meta.add_table(
   name='AA',
   data=AA,
   primary_key='AA_id'
   )
meta.add_table(
   name='BB',
   data=BB,
   primary_key='BB_id',
   parent='AA',
   foreign_key='AA_id'
   )
meta.add_table(
   name='CC',
   data=CC,
   primary_key='BB_id',
   parent='AA',
   foreign_key='AA_id'
   )
meta.add_table(
   name='DD',
   data=DD,
   primary_key='DD_id'
   )
meta.add_table(
   name='EE',
   data=EE,
   primary_key='EE_id'
   )
meta.add_table(
   name='FF',
   data=FF,
   primary_key='FF_id'
   )
meta.add_table(
   name='GG',
   data=GG,
   primary_key='GG_id',
   )
meta.add_table(
   name='HH',
   data=HH,
   primary_key='AA_id',
   parent='AA'
   )   
meta.add_table(
   name='II',
   data=II,
   primary_key='AA_id',
   parent='AA'
   )  
meta.add_table(
   name='JJ',
   data=JJ,
   primary_key='JJ_id'
   )
meta.add_relationship(parent='DD', child='AA', foreign_key='DD_id', validate=True)
meta.add_relationship(parent='EE', child='DD', foreign_key='EE_id', validate=True)
meta.add_relationship(parent='FF', child='DD', foreign_key='FF_id', validate=True)
meta.add_relationship(parent='JJ', child='DD', foreign_key='JJ_id', validate=True)
meta.add_relationship(parent='GG', child='AA', foreign_key='GG_id', validate=True)

tables = {
   'AA': AA,
   'BB': BB,
   'CC': CC,  
   'DD': DD,
   'EE': EE,
   'FF': FF,
   'GG': GG,
   'HH': HH,
   'II': II,
   'JJ': JJ
   }
   
from sdv import SDV
sdv = SDV()
sdv.fit(meta, tables)

There is no output. The model has been running for 12+ hours.

EduardoPassaro, Sep 08 '21 18:09

Hi @EduardoPassaro, thanks for raising this issue! This definitely highlights the need for some notion of progress to be communicated during the fit process.
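Until progress reporting exists, one possible workaround is to run the blocking call in a worker thread and print a periodic heartbeat. The sketch below is not part of SDV: `run_with_heartbeat` and its `interval` and `report` parameters are hypothetical names chosen for illustration. It only confirms that the call is still alive and how long it has been running; it cannot estimate how much work remains.

```python
import threading
import time


def run_with_heartbeat(func, *args, interval=60.0, report=print, **kwargs):
    """Run func(*args, **kwargs) in a worker thread and report elapsed time.

    Hypothetical helper, not an SDV API. It shows only that the call is
    still running, not how far along it is.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args, **kwargs)
        except BaseException as exc:  # stored here, re-raised in the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    start = time.monotonic()
    thread.start()

    # join() returns after `interval` seconds or as soon as the worker ends.
    while True:
        thread.join(timeout=interval)
        if not thread.is_alive():
            break
        report(f"still running after {time.monotonic() - start:.0f}s")

    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```

Assuming the `sdv`, `meta` and `tables` objects from the original post, the call would look like `run_with_heartbeat(sdv.fit, meta, tables, interval=300)`.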

In order to help us understand your use case, could you share some sample data of the tables you are trying to model? It would be helpful to see the column types in each table and examples of the column data.

Additionally, if you're interested in another way of communicating with the SDV community, you could check out our Slack workspace!

katxiao, Sep 20 '21 21:09

Hi! Since this issue is stale now, I'm removing the "under discussion" label and repurposing it as a feature request: Communicate the training progress during `fit`

npatki, Jun 01 '22 22:06

FYI some related issues:

  • A prerequisite for showing a progress bar would be to allow training in batches. #805 is tracking this feature request.
  • We also have an open feature request for improving the time characteristics of HMA1, found in #830.
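To illustrate why batching comes first: a progress bar needs discrete units of work to count. The sketch below assumes training can be split into independent steps and reports after each one. `fit_with_progress`, `units` and `fit_one` are hypothetical stand-ins and do not reflect how SDV or HMA1 trains internally.

```python
def fit_with_progress(units, fit_one, report=print):
    """Call fit_one(unit) for every unit and report progress after each.

    `units` and `fit_one` are hypothetical stand-ins for batches of
    training work; nothing here is an SDV API.
    """
    units = list(units)
    total = len(units)
    results = []
    for done, unit in enumerate(units, start=1):
        results.append(fit_one(unit))
        # Integer percentage of units completed so far.
        report(f"[{done}/{total}] finished {unit!r} ({100 * done // total}%)")
    return results
```

Without such units, the only thing a caller can observe is elapsed time, which is why batch training is listed as the prerequisite.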

npatki, Jul 22 '22 18:07

Good news! The team is actively working on this feature and hopes to include it in a future SDV release soon.

For more details see #1440. I'm closing this issue as a dupe of #1440.

npatki, Jun 01 '23 00:06