SDV
Condition on primary keys
Problem Description
Let's add the ability to condition on columns that are primary keys.
Expected behavior
If the user conditions on a primary key, return a row with the desired value as the primary key. (No special modeling needed. Just rewrite the ID per the user's request.)
Note: The API below assumes the sample_conditions and sample_remaining_columns methods are already implemented. See #691 and #692.
from sdv.tabular.sampling import Condition
a = Condition(column_values={'user_id': 100}, num_rows=1)
b = Condition(column_values={'user_id': 101}, num_rows=1)
conditions = [a, b]
# model = any tabular model
model.sample_conditions(conditions)  # returns rows with ids 100 and 101
# or passing in a dataframe with primary keys
import pandas as pd
known_ids = pd.DataFrame(data={'user_id': [100, 101]})
model.sample_remaining_columns(known_columns=known_ids)
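To illustrate the "just rewrite the ID" behavior described above, here is a minimal sketch. It is not SDV code: sample_rows is a hypothetical stand-in for a fitted model's sampler, the age column is invented for the example, and plain dicts are used in place of a dataframe so the snippet runs without dependencies.

```python
import random


def sample_rows(num_rows, seed=0):
    """Hypothetical stand-in for a fitted model: rows come back with arbitrary IDs."""
    rng = random.Random(seed)
    return [
        {'user_id': rng.randrange(10**6), 'age': rng.randint(18, 90)}
        for _ in range(num_rows)
    ]


def sample_with_primary_keys(requested_ids, primary_key='user_id'):
    """Sample one row per requested ID, then overwrite the primary key column."""
    rows = sample_rows(len(requested_ids))
    for row, requested in zip(rows, requested_ids):
        # No modeling involved: the sampled ID is simply replaced.
        row[primary_key] = requested
    return rows


rows = sample_with_primary_keys([100, 101])
print([row['user_id'] for row in rows])
```

The other columns are sampled as usual, and only the primary key column is touched afterwards.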
Error Handling
You cannot request more than 1 row with the same primary key.
>>> a = Condition(column_values={'user_id': 100}, num_rows=1)
>>> b = Condition(column_values={'user_id': 101}, num_rows=2)
>>> conditions = [a, b]
>>> model.sample_conditions(conditions)
Error: You have requested multiple rows with the same primary key.
Primary keys must be unique in the dataset.
>>> known_ids = pd.DataFrame(data={'user_id': [100, 101, 101]})
>>> model.sample_remaining_columns(known_columns=known_ids)
Error: You have requested multiple rows with the same primary key.
Primary keys must be unique in the dataset.
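One way the uniqueness check could work is sketched below. This is an assumption about the implementation, not SDV's actual code: ConditionRequest is a hypothetical stand-in for Condition, validate_unique_primary_keys is an invented helper name, and raising ValueError is a guess at the exception type.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ConditionRequest:
    """Hypothetical stand-in for Condition (not SDV's class)."""
    column_values: dict
    num_rows: int = 1


DUPLICATE_KEY_MESSAGE = (
    'You have requested multiple rows with the same primary key. '
    'Primary keys must be unique in the dataset.'
)


def validate_unique_primary_keys(conditions, primary_key):
    """Raise ValueError if any primary key value is requested more than once."""
    requested = Counter()
    for condition in conditions:
        if primary_key in condition.column_values:
            # num_rows > 1 on a single condition also counts as a duplicate request.
            requested[condition.column_values[primary_key]] += condition.num_rows
    if any(count > 1 for count in requested.values()):
        raise ValueError(DUPLICATE_KEY_MESSAGE)


a = ConditionRequest(column_values={'user_id': 100}, num_rows=1)
b = ConditionRequest(column_values={'user_id': 101}, num_rows=2)

validate_unique_primary_keys([a], primary_key='user_id')  # passes silently
try:
    validate_unique_primary_keys([a, b], primary_key='user_id')
except ValueError as error:
    print(f'Error: {error}')
```

Summing num_rows per key catches both cases: a single condition asking for several rows, and several conditions that repeat the same ID. The same count works for the dataframe case by tallying values in the primary key column.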