
Condition on primary keys

Open npatki opened this issue 3 years ago • 0 comments

Problem Description

Let's add the ability to condition on columns that are primary keys.

Expected behavior

If the user conditions on a primary key, return a row with the desired value as the primary key. (No special modeling needed. Just rewrite the ID per the user's request.)

Note: The API below assumes the sample_conditions and sample_remaining_columns methods are already implemented. See #691 and #692.

from sdv.tabular.sampling import Condition

a = Condition(column_values={'user_id': 100}, num_rows=1)
b = Condition(column_values={'user_id': 101}, num_rows=1)

conditions = [a, b]

# model = any tabular model
model.sample_conditions(conditions) # returns ids 100, 101

# or passing in a dataframe with primary keys
import pandas as pd
known_ids = pd.DataFrame(data={'user_id': [100, 101]})
model.sample_remaining_columns(known_columns=known_ids)
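
The "just rewrite the ID" behavior described above could be sketched roughly as follows. This is only an illustration in plain pandas, not SDV's implementation: the helper name overwrite_primary_key and the stand-in sampled DataFrame are assumptions made for this example.

```python
import pandas as pd


def overwrite_primary_key(sampled_rows, primary_key, requested_ids):
    """Hypothetical helper: replace the primary key of already-sampled rows.

    No modeling is involved; the requested IDs are written over whatever
    values the model generated for the primary key column.
    """
    requested_ids = list(requested_ids)
    if len(sampled_rows) != len(requested_ids):
        raise ValueError('Need exactly one requested ID per sampled row.')

    result = sampled_rows.reset_index(drop=True).copy()
    result[primary_key] = requested_ids
    return result


# Stand-in for rows that any tabular model might have sampled.
sampled = pd.DataFrame({'user_id': [0, 1], 'age': [34, 27]})
rewritten = overwrite_primary_key(sampled, 'user_id', [100, 101])
```

Only the primary key column changes; the remaining columns keep the values the model produced.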

Error Handling

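You cannot request more than one row with the same primary key, whether through a single condition with num_rows greater than 1 or through duplicate values in the known columns.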

>>> a = Condition(column_values={'user_id': 100}, num_rows=1)
>>> b = Condition(column_values={'user_id': 101}, num_rows=2)
>>> conditions = [a, b]
>>> model.sample_conditions(conditions)
Error: You have requested multiple rows with the same primary key.
Primary keys must be unique in the dataset. 

>>> known_ids = pd.DataFrame(data={'user_id': [100, 101, 101]})
>>> model.sample_remaining_columns(known_columns=known_ids)
Error: You have requested multiple rows with the same primary key.
Primary keys must be unique in the dataset. 
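
One possible way to perform this check before sampling is sketched below. ConditionSpec is a hypothetical stand-in for sdv.tabular.sampling.Condition (only the column_values and num_rows fields shown in this issue are assumed), and check_unique_primary_keys is an illustrative name, not an existing SDV function.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ConditionSpec:
    """Hypothetical stand-in for sdv.tabular.sampling.Condition."""
    column_values: dict
    num_rows: int = 1


def check_unique_primary_keys(conditions, primary_key):
    """Raise if any primary key value is requested for more than one row."""
    requested = Counter()
    for condition in conditions:
        if primary_key in condition.column_values:
            key = condition.column_values[primary_key]
            requested[key] += condition.num_rows

    if any(count > 1 for count in requested.values()):
        raise ValueError(
            'You have requested multiple rows with the same primary key. '
            'Primary keys must be unique in the dataset.'
        )


# Valid request: each ID is asked for exactly once, so nothing is raised.
valid = [ConditionSpec({'user_id': 100}, 1), ConditionSpec({'user_id': 101}, 1)]
check_unique_primary_keys(valid, 'user_id')

# Invalid request: two rows are asked for with user_id 101.
invalid = [ConditionSpec({'user_id': 100}, 1), ConditionSpec({'user_id': 101}, 2)]
try:
    check_unique_primary_keys(invalid, 'user_id')
    error_message = None
except ValueError as error:
    error_message = str(error)
```

For the DataFrame case, a similar check could rely on known_ids['user_id'].duplicated().any() before calling sample_remaining_columns.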

npatki, Jan 27 '22 19:01