PClean icon indicating copy to clipboard operation
PClean copied to clipboard

Delay proposals of values with no likelihoods in the current subproblem

Open alex-lew opened this issue 4 years ago • 0 comments

Consider the program

class NameWithNickname begin
  true_name ~ string_prior(1, 30) preferring all_first_names
  nickname ~ string_prior(1, 30) preferring all_first_names
end
class Person begin
  fname ~ NameWithNickname
  lname ~ string_prior(1, 30) preferring all_last_names
end
class Record begin
  person ~ Person
  name ~ uniform([person.fname.true_name, person.fname.nickname])
end

The problem here is that when you first process a record, you are (by design) assumed to be observing either the person’s true first name, or their nickname. But PClean will try to initialize both latent fields. Suppose you see a person’s first name, and PClean gets it right that it’s a full first name. Then later you see their nickname in another record. You won’t be able to assign the new record to the same “person” object, because the “person” object will already have some (other, generated-from-the-prior) nickname.

If we can delay the proposal of the "other" latent until we have evidence for it, we could circumvent this issue, and do accurate inference in models like this.

This is also very relevant for data integration across multiple sources, where different sources may report different attributes.

alex-lew avatar Mar 04 '21 06:03 alex-lew