
Generalized Distributional TV proposal

Open ngeiswei opened this issue 9 years ago • 9 comments

Generalized Distributional TV

Overview

This is a proposal for a new TV type that encompasses probabilistic, fuzzy, distributional TV types and more. In short it is a distributional TV that may work over any domain, not just boolean/fuzzy, equipped with counts to quantify uncertainty.

Disclaimer: things are presented from the angle of Kolmogorov probability theory, mostly because I feel more comfortable with it.

Omega

Omega is a probability space representing the subjective universe. We learn about Omega via random variables. A random variable X: Omega->E is a measurable function from domain Omega to co-domain E.

Any atom in the AtomSpace associated with a TV represents a random variable. The TV (or GDTV as defined below) describes the distribution over the co-domain of that random variable, and the atom defines the name (or the structure) of that random variable and its relationships to other random variables, i.e. other atoms.

When talking about the co-domain of an atom A we mean the co-domain of the associated random variable of the TV of that atom.

Definition

We introduce a new TV type called Generalized Distributional Truth Value, or GDTV for short. It could also be called merely Distributional Value, or DV for short.

Unconditional GDTV

An unconditional GDTV of atom A is a pair <D,N> where

  1. D is a probability distribution over E, the co-domain of A.
  2. N is a count, a non-negative real number.

In practice, in the discrete case, the probability measure will be described as a mass function over a partition of E, summing up to 1, following the format

<{k1:p1, ..., kn:pn}, N>

where k1 to kn are the partition blocks (often singletons or intervals), and p1 to pn are probabilities summing up to 1.
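As a hedged illustration only (not the actual AtomSpace API; all names here are hypothetical), an unconditional GDTV can be sketched as a plain pair of a mass function and a count:

```python
# Hypothetical sketch of an unconditional GDTV <{k1:p1, ..., kn:pn}, N>;
# this is NOT the real AtomSpace API, just the data layout.
def make_gdtv(dist, count):
    """dist maps partition blocks to probabilities; count quantifies evidence."""
    assert count >= 0, "count must be a non-negative real"
    assert abs(sum(dist.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    return (dist, count)

# E.g. an imperfect boolean sensor <{1:0.99, 0:0.01}, 1M>:
sensor_b = make_gdtv({1: 0.99, 0: 0.01}, 1_000_000)
```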

Conditional GDTV

More generally a GDTV is a conditional probability distribution. In the discrete case, a GDTV over 2 atoms A and B is defined as a map from a partition of the co-domain of A to an unconditional GDTV of B, as defined above. The whole thing forms a conditional probability matrix, equipped with a count on each row. The format may look like

{ka1: <{kb1:p11, ..., kbm:p1m}, N1>,
 ...
 kan: <{kb1:pn1, ..., kbm:pnm}, Nn>}

where ka1 to kan are partition blocks over the co-domain of A, and kb1 to kbm are partition blocks over the co-domain of B.
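To make the layout concrete, here is a hedged Python sketch (hypothetical names, not the real API) of a conditional GDTV as a map from blocks of A's co-domain to unconditional GDTVs of B, each row carrying its own count:

```python
# Hypothetical sketch of a conditional GDTV: each row pairs a mass function
# over B's co-domain with its own evidence count.
cond_gdtv = {
    "ka1": ({"kb1": 0.3, "kb2": 0.7}, 10.0),
    "ka2": ({"kb1": 0.1, "kb2": 0.9}, 20.0),
}

def lookup(cond, a_block, b_block):
    """Return P(B in b_block | A in a_block) and that row's count."""
    dist, count = cond[a_block]
    return dist[b_block], count

p, n = lookup(cond_gdtv, "ka1", "kb2")  # p == 0.7, n == 10.0
```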

Examples

Unconditional GDTV

Perfect boolean sensor

Let's assume our window to the universe is a perfect and timeless boolean sensor A. Since the sensor is boolean the co-domain E of the associated random variable is {0,1}. One may represent an active sensor as follows

Evaluation <{1:1}, +inf>
  Predicate "sensor"
  Concept "A"

meaning that the random variable with name/structure

Evaluation
  Predicate "sensor"
  Concept "A"

has probability

P(X=1)=1 with count +inf.

The first 1 in {1:1} indicates the sensor's value, the second indicates its probability. There is a bit of notational abuse because the first 1 represents in fact a singleton {1}.

Likewise an inactive sensor would be represented as

Evaluation <{0:1}, +inf>
  Predicate "sensor"
  Concept "A"

The count is +inf as we assume that the sensor is perfect, i.e. based on infinite evidence.

Imperfect boolean sensor

Let's assume that sensor B is imperfect and is faulty one percent of the time based on a million observations. One may represent such a random variable as follows

Evaluation <{1:0.99, 0:0.01}, 1M>
  Predicate "sensor"
  Concept "B"

Perfect fuzzy sensor

Let's assume that perfect sensor C can take any value within [0,1] and is true with degree 0.9786 with absolute certainty.

Evaluation <{0.9786:1}, +inf>
  Predicate "sensor"
  Concept "C"

Imperfect boolean sensor with infinite certainty

Let's assume that imperfect boolean sensor D has probability 0.9786 of being active, and we're absolutely sure of that probability.

Evaluation <{1:0.9786, 0:0.0214}, +inf>
  Predicate "sensor"
  Concept "D"

Imperfect fuzzy sensor

Let's assume that imperfect sensor E can take any value within [0,1], and background knowledge suggests (after testing this sensor a thousand times) we may safely discretize it into 4 bins.

Evaluation <{[0,0.25):0.002,
             [0.25,0.5):0.008,
             [0.5,0.75):0.2,
             [0.75,1]:0.7}, 1K>
  Predicate "sensor"
  Concept "E"

where [x,y) represent the left-closed right-open interval from x to y.
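A hedged sketch of how such a binned GDTV could be built from raw readings (the binning helper is hypothetical, not part of any existing API):

```python
from bisect import bisect_right

# Bin raw sensor readings in [0,1] into the four left-closed intervals
# [0,0.25), [0.25,0.5), [0.5,0.75), [0.75,1] used for sensor E above.
EDGES = [0.25, 0.5, 0.75]
LABELS = ["[0,0.25)", "[0.25,0.5)", "[0.5,0.75)", "[0.75,1]"]

def bin_gdtv(samples):
    counts = [0] * len(LABELS)
    for x in samples:
        # bisect_right sends a value equal to an edge into the right bin,
        # which matches the left-closed right-open convention.
        counts[bisect_right(EDGES, x)] += 1
    n = len(samples)
    return {LABELS[i]: counts[i] / n for i in range(len(LABELS))}, n

dist, n = bin_gdtv([0.1, 0.8, 0.9, 0.6])
# dist["[0.75,1]"] == 0.5 and n == 4
```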

Imperfect non-boolean, non-fuzzy sensors

What if we want to describe a random variable with a co-domain other than [0,1]? Then we can no longer use predicates, but we can still use schemata!

Say we want to represent the distribution of heights in a population of 10K individuals. We may describe it as follows

ExecutionOutput <{[0cm,100cm):0.2,
                  [100cm,160cm):0.3,
                  [160cm,240cm):0.5}, 10K>
  Schema "in-cm"
  Concept "height"

We may actually represent the intervals using atoms. For instance [0cm,100cm) might be described as

LeftClosedRightOpenIntervalLink
  QuantityLink
    NumberNode 0
    UnitNode "cm"
  QuantityLink
    NumberNode 100
    UnitNode "cm"

or equivalently

QuantityLink
  LeftClosedRightOpenIntervalLink
    NumberNode 0
    NumberNode 100
  UnitNode "cm"

Note: QuantityLink and UnitNode are currently undefined types and used for expository purpose.

Conditional GDTV

SubSet

Assume we have 2 crisp concepts A and B. The exact probability distribution of P(B|A), with P(B=0|A=0)=0.3 and P(B=1|A=0)=0.7, P(B=0|A=1)=0.1 and P(B=1|A=1)=0.9, may be represented as

SubSet {0:<{0:0.3, 1:0.7}, +inf>,
        1:<{0:0.1, 1:0.9}, +inf>}
  Concept "A"
  Concept "B"

ExtensionalImplication over schemata

Assume we have 2 schemata over individuals, one that outputs the age, another that outputs the height. There are 3 individuals, I1, I2 and I3, aged 10, 50 and 100 years respectively, with heights 120, 172 and 169 cm respectively.

The schema "age" is defined as follows

Execution <{1:1}, +inf>
  Schema "age"
  Concept "I1"
  Number 10
Execution <{1:1}, +inf>
  Schema "age"
  Concept "I2"
  Number 50
Execution <{1:1}, +inf>
  Schema "age"
  Concept "I3"
  Number 100

One may also use the equivalent representation

ExecutionOutput <{10:1}, +inf>
  Schema "age"
  Concept "I1"
ExecutionOutput <{50:1}, +inf>
  Schema "age"
  Concept "I2"
ExecutionOutput <{100:1}, +inf>
  Schema "age"
  Concept "I3"

The schema "height" is defined as follows

Execution <{1:1}, +inf>
  Schema "height"
  Concept "I1"
  Number 120
Execution <{1:1}, +inf>
  Schema "height"
  Concept "I2"
  Number 172
Execution <{1:1}, +inf>
  Schema "height"
  Concept "I3"
  Number 169

One may also use the equivalent representation

ExecutionOutput <{120:1}, +inf>
  Schema "height"
  Concept "I1"
ExecutionOutput <{172:1}, +inf>
  Schema "height"
  Concept "I2"
ExecutionOutput <{169:1}, +inf>
  Schema "height"
  Concept "I3"

The conditional distribution of the height given the age, assuming 2 partition blocks for age (0 to 40, and 40 to 130) and 2 partition blocks for height (0 to 150, and 150 to 200), may be represented as

ExtensionalImplication {[0,40):<{[0,150):1}, 1>,
                        [40,130):<{[150,200):1}, 2>}
  Schema "age"
  Schema "height"

meaning that based on 1 individual, being below 40 implies being under 150cm, and based on 2 individuals, being 40 or more implies being 150cm or more.
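The two rows above can be recovered by counting within partition blocks. A minimal sketch, under the assumption that a row is a plain (mass function, count) pair:

```python
# Derive the conditional GDTV rows from the three individuals' data by
# counting co-occurrences of partition blocks (illustrative only).
ages = {"I1": 10, "I2": 50, "I3": 100}
heights = {"I1": 120, "I2": 172, "I3": 169}

def age_block(a):
    return "[0,40)" if a < 40 else "[40,130)"

def height_block(h):
    return "[0,150)" if h < 150 else "[150,200)"

rows = {}
for i in ages:
    ab, hb = age_block(ages[i]), height_block(heights[i])
    dist, n = rows.get(ab, ({}, 0))
    dist[hb] = dist.get(hb, 0) + 1
    rows[ab] = (dist, n + 1)

# Normalize each row's counts into probabilities, keeping the row count.
rows = {ab: ({hb: c / n for hb, c in dist.items()}, n)
        for ab, (dist, n) in rows.items()}
# rows == {'[0,40)': ({'[0,150)': 1.0}, 1), '[40,130)': ({'[150,200)': 1.0}, 2)}
```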

Note: we may want to introduce a new link type for schemata instead of using ExtensionalImplication. At least there is a small rationale for using ExtensionalImplication rather than SubSet, as a predicate is a kind of schema.

Relationship with existing TVs

Probabilistic simple TV

A probabilistic simple TV <s, N> is represented by the following GDTV

<{1:s, 0:1-s}, N>

Fuzzy simple TV

A fuzzy simple TV <s, N> is represented by the following GDTV

<{s:1}, N>
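A hedged sketch of the two embeddings (the helper names are hypothetical): a probabilistic simple TV puts mass s on the value 1 (and 1-s on 0), whereas a fuzzy simple TV puts all its mass on the degree s itself.

```python
# Illustrative conversions from simple TVs to GDTV pairs; not the real API.
def prob_to_gdtv(s, n):
    # probabilistic: mass s on 1, mass 1-s on 0
    return ({1: s, 0: 1 - s}, n)

def fuzzy_to_gdtv(s, n):
    # fuzzy: all mass concentrated on the degree s
    return ({s: 1.0}, n)
```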

Distributional TV

As we've seen, a GDTV is essentially a DTV plus a count, with its distribution allowed to range over any co-domain.

Generic TV

I've never used Generic TV but I recall having a discussion with Linas about why he introduced them, and one of the reasons was to store the mutual entropy of pairs of random variables. With GDTV this is possible, see Section "Other Operators".

Uncertainty

The uncertainty of a GDTV is represented by its count, or counts if it is a conditional GDTV. Counts are non-negative real numbers. Although integers will be used most of the time, having real numbers allows counts derived from inferences not to be unnecessarily truncated into integers (this may or may not turn out to be useful). The count reflects a second order distribution over the distributions over the co-domain, and should be used to estimate the counts of inferred conclusions. We could start experimenting with Dirichlet distributions as second order distributions.

Of course a confidence number can be used as well, however I believe the count has a clearer semantics and should be the default.

I suspect we want to drop the lookahead K as defined in the indefinite truth value and use a plain Dirichlet prior instead (corresponding to K tending to +inf). The PLN book claims that using small Ks helps decrease the premise/conclusion interval expansion rate. Since the PLN book doesn't back this up with data, I challenge that claim. Ultimately experiments will have to be carried out to determine what is best.

ExtensionalImplication over non-predicate

As we've seen in the examples, one may define ExtensionalImplication not just over predicates but also over schemata. One may push this even further and define extensional implication over almost any atom type, using Omega as the underlying probability space, even though the elements of the partitions taken into consideration are undefined.

For instance let's define

Evaluation <{0:0.5, 1:0.5}, 200>
  P
  A

Evaluation <{0:0.5, 1:0.5}, 200>
  Q
  B

That is, P(A) (resp. Q(B)) is true half of the time, based on 200 observations each. However we may also know that they are extremely correlated, and when P(A) is true Q(B) has a 99% chance of being true. One may represent that with

ExtensionalImplication {0:<{0:0.99, 1:0.01}, 100>,
                        1:<{0:0.01, 1:0.99}, 100>}
  Evaluation
    P
    A
  Evaluation
    Q
    B

Note: in the PLN book this is conceptually handled by defining associated predicates of these evaluations over the domain of contexts. However I don't think that is necessary, as conditional probabilities can be directly defined based on the probability space Omega.

Again instead of using ExtensionalImplication we might want to define a new link type.

Relationships between random variables (atoms and GDTVs)

As we know the AtomSpace is not just a way to represent random variables, but also how they relate to each other. These relations allow us to create abstract structures, not necessarily true (most of them define concepts or predicates with null probabilities in Omega), but useful to reason with and ultimately infer new knowledge about Omega itself.

Examples

Funny people at the party

Let's build abstract concepts for people at a party and funny people, and see how they relate to each other.

Let's define the members of concept "at-the-party"

Member <{1:1}, +inf>
  Concept "John"
  Concept "at-the-party"

Member <{1:1}, +inf>
  Concept "Mary"
  Concept "at-the-party"

Member <{1:1}, +inf>
  Concept "Claude"
  Concept "at-the-party"

Member <{0:1}, +inf>
  Concept "David"
  Concept "at-the-party"

saying that we live in a universe where John, Mary and Claude are at the party.

Let's define the members of concept "funny"

Member <{1:1}, +inf>
  Concept "John"
  Concept "funny"

Member <{1:1}, +inf>
  Concept "Mary"
  Concept "funny"

Member <{0:1}, +inf>
  Concept "Claude"
  Concept "funny"

Member <{1:1}, +inf>
  Concept "David"
  Concept "funny"

saying that we live in a universe where John, Mary and David are funny.

The concepts "at-the-party" or "funny" are very likely not meaningful subsets of Omega. We can give them null TVs because the set of people at a party is in no way a description of the possible world we're in. But that doesn't matter; what matters is the abstract relationships we can build between them. So we may have

Concept "funny" <{0:1}, +inf>
Concept "at-the-party" <{0:1}, +inf>

meaning that "funny" and "at-the-party" are almost surely false in our subjective universe (neither John, nor Mary, etc., are possible worlds). However we can still calculate the following

SubSet <{0:<{1: 1}, 1>, 
         1:<{0: 0.3333, 1:0.6666}, 3>}, 4>
  Concept "at-the-party"
  Concept "funny"

saying that based on one piece of evidence (David), people not at the party are funny, and based on 3 pieces of evidence, people at the party are funny with probability 2/3.

The formula used to calculate the latter is obviously

(funny(John) + funny(Mary) + funny(Claude))
/
(at-the-party(John) + at-the-party(Mary) + at-the-party(Claude))
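A hedged sketch of that computation over the crisp memberships (plain Python, hypothetical helper names, not the actual implementation):

```python
# Compute the SubSet GDTV rows from crisp Member links by counting.
at_party = {"John": 1, "Mary": 1, "Claude": 1, "David": 0}
funny = {"John": 1, "Mary": 1, "Claude": 0, "David": 1}

def subset_rows(antecedent, consequent):
    rows = {}
    for person, a in antecedent.items():
        b = consequent[person]
        dist, n = rows.get(a, ({}, 0))
        dist[b] = dist.get(b, 0) + 1
        rows[a] = (dist, n + 1)
    # normalize each row into probabilities, keeping the row count
    return {a: ({b: c / n for b, c in dist.items()}, n)
            for a, (dist, n) in rows.items()}

rows = subset_rows(at_party, funny)
# rows[1] ≈ ({1: 2/3, 0: 1/3}, 3) and rows[0] == ({1: 1.0}, 1)
```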

Let's now assume that the knowledge of these people being at the party or funny is uncertain, so we have for instance

Member <{1:0.6, 0:0.4}, 1K>
  Concept "John"
  Concept "funny"

etc.

Then the calculation above would differ because funny(John), etc., are in fact random variables; thus their sum turns into a convolution product (assuming they are independent).
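A hedged sketch of that convolution: each uncertain membership becomes a Bernoulli variable, and their sum is obtained by convolving the distributions (assuming independence). The strengths for Mary and Claude below are hypothetical, chosen only for illustration.

```python
# Convolve independent discrete distributions to get the distribution of
# their sum (here: the number of funny people among the party-goers).
def convolve(d1, d2):
    out = {}
    for x, p in d1.items():
        for y, q in d2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

john = {1: 0.6, 0: 0.4}    # strength from the Member link above
mary = {1: 0.9, 0: 0.1}    # hypothetical strength
claude = {1: 0.2, 0: 0.8}  # hypothetical strength

num_funny = convolve(convolve(john, mary), claude)
# distribution over how many of the three are funny; e.g. P(all 3) = 0.108
```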

Explicitly defining Omega

Above we defined abstract concepts like at-the-party and funny, and we assumed that these concepts were almost surely false, but we may well assume the opposite.

Let's assume the following Atom in the AtomSpace

Concept "A" <{1:1}, +inf>

meaning that the random variable with name/structure

Concept "A"

is almost surely true.

Now let's add the following

Member <{1:1}, +inf>
  Number 1
  Concept "A"
...
Member <{1:1}, +inf>
  Number 10
  Concept "A"

As well as

Member <{0:1}, +inf>
  Number 11
  Concept "A"
...
Member <{0:1}, +inf>
  <anything-but-numbers-from-1-to-10>
  Concept "A"

I'm using this trick so I don't have to deal with universal quantifiers. Basically what it says is that A contains the numbers from 1 to 10 and nothing else. But given that A is almost surely true, this allows us to actually define Omega (or at least the subjective part that matters to us)!

I'm not sure how useful that is, but I thought I'd mention it. In practice we don't know what Omega is; it is open-ended, as is our universe.

Fuzzy Operators

And, Or and Not links can be seen as fuzzy operators over random variables with co-domain [0,1]. For instance in

And <GDTV>
  Evaluation <GDTV-1>
    P
    A
  Evaluation <GDTV-2>
    Q
    B

GDTV corresponds to the distribution of the random variable defined as a minimum between (Evaluation P A) and (Evaluation Q B). If for instance

GDTV-1 = <{0.7:1}, +inf>
GDTV-2 = <{0.5:1}, +inf>

then

GDTV = <{0.5:1}, +inf>

If these variables have independent distributions, then the probability of each resulting key min(h, k) is obtained by multiplying the input probabilities and summing over all pairs with that minimum. For instance

And
  Evaluation <{0.2:0.1, 0.4:0.4, 0.6:0.3, 0.8:0.2}, +inf>
    P
    A
  Evaluation <{0.1:0.2, 0.2:0.3, 0.4:0.3, 0.6:0.2}, +inf>
    Q
    B

will have as resulting GDTV

<{0.1:0.1*0.2 + 0.4*0.2 + ...,
  0.2:0.1*0.3 + 0.4*0.3 + ...,
  ...}, +inf>
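A minimal sketch of that computation (assuming independence; plain Python, not the actual implementation):

```python
# Fuzzy And as a min-convolution: the probability of each output key
# min(h, k) is the sum of p(h)*q(k) over all pairs with that minimum.
def fuzzy_and(d1, d2):
    out = {}
    for h, p in d1.items():
        for k, q in d2.items():
            m = min(h, k)
            out[m] = out.get(m, 0.0) + p * q
    return out

p_dist = {0.2: 0.1, 0.4: 0.4, 0.6: 0.3, 0.8: 0.2}
q_dist = {0.1: 0.2, 0.2: 0.3, 0.4: 0.3, 0.6: 0.2}
result = fuzzy_and(p_dist, q_dist)
# result[0.1] ≈ 0.2, since k=0.1 wins against every h
```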

Other Operators

Operators over any random variables' co-domain can be used. For instance

Plus <GDTV-1 + GDTV-2>
  ExecutionOutput <GDTV-1>
    Schema "S1"
    Concept "A1"
  ExecutionOutput <GDTV-2>
    Schema "S2"
    Concept "A2"

where <GDTV-1 + GDTV-2> corresponds to the TV of the random variable which is the sum of the random variables of GDTV-1 and GDTV-2. For instance assuming

GDTV-1 = <{10:0.9, 0:0.1}, +inf>
GDTV-2 = <{32:0.6, 0:0.4}, +inf>

and are independent then

GDTV-1 + GDTV-2 = <{42:0.54, 32:0.06, 10:0.36, 0:0.04}, +inf>
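A hedged sketch of that sum convolution, assuming independence. How the two counts should combine is an open question in this proposal; min is used below only as a conservative placeholder, which leaves +inf unchanged in this example.

```python
# Sum of independent random variables via convolution of their GDTVs.
def plus(gdtv1, gdtv2):
    (d1, n1), (d2, n2) = gdtv1, gdtv2
    out = {}
    for x, p in d1.items():
        for y, q in d2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    # Combining counts with min is a placeholder assumption, not from the text.
    return out, min(n1, n2)

g1 = ({10: 0.9, 0: 0.1}, float("inf"))
g2 = ({32: 0.6, 0: 0.4}, float("inf"))
dist, n = plus(g1, g2)
# dist ≈ {42: 0.54, 32: 0.06, 10: 0.36, 0: 0.04}
```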

Operators like List constructions are very useful and can be used to build random vectors, such as random pairs of variables to store the necessary information to compute their mutual entropies, as shown below.

Bayesian network

GDTV allows us to represent Bayesian networks. Consider the BN in https://en.wikipedia.org/wiki/Bayesian_network#Example

S = Sprinkler
R = Rain
G = Grass wet

Each random variable is boolean, however not all are conditionally independent; for instance G depends on S and R combined. GDTV allows us to represent these dependencies.

Extensional implication between Rain and Sprinkler:

ExtImplication <{0:{1:0.4, 0:0.6},
                 1:{1:0.01, 0:0.99}}, +inf>
  R
  S

Extensional implication between (Sprinkler, Rain) and Grass. Note the use of List to build a random pair of variables:

ExtImplication <{(0, 0):{1:0.0, 0:1.0}, 
                 (0, 1):{1:0.8, 0:0.2},
                 (1, 0):{1:0.9, 0:0.1},
                 (1, 1):{1:0.99, 0:0.01}}, +inf>
  List
    S
    R
  G

Where (0, 0) to (1, 1) enumerate all possible boolean pairs, that is the possible values of the co-domain of (List S R). One may see how that information is sufficient to compute the mutual information between S and R.
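As a hedged sketch of inference over these tables: marginalizing out S yields P(G|R) directly from the two conditional GDTVs above, with no prior on R needed (plain Python, not the actual chainer).

```python
# CPTs transcribed from the two ExtImplication links above.
p_s_given_r = {0: {1: 0.4, 0: 0.6}, 1: {1: 0.01, 0: 0.99}}
p_g_given_sr = {(0, 0): {1: 0.0, 0: 1.0}, (0, 1): {1: 0.8, 0: 0.2},
                (1, 0): {1: 0.9, 0: 0.1}, (1, 1): {1: 0.99, 0: 0.01}}

def p_g1_given_r(r):
    """P(G=1 | R=r), marginalizing over the sprinkler S."""
    return sum(p_s_given_r[r][s] * p_g_given_sr[(s, r)][1] for s in (0, 1))

# p_g1_given_r(0) ≈ 0.36 and p_g1_given_r(1) ≈ 0.8019
```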

Probabilistic Programming Language

Likewise one can represent the conditional distribution of a program's output, or program subfunctions using GDTV. In that case a GDTV query using the BC chainer would amount to a PPL variable query. One may probably design PLN rules to perform the kind of Monte-Carlo simulations used by PPL to answer queries.

Mixed extensional-intensional relationships

In the PLN book, mixed implication is the disjunction of extensional and intensional implication (likewise for inheritance). The definition of intensional implication using GDTV naturally follows, as it is merely an extensional implication in pattern space. The disjunction between two conditional GDTVs however is not trivial. In the PLN book the way to go about that would be to turn the conditional GDTV into a simple TV according to the formula of Section 2.4.1.1

s = sum_x f(P(x), Q(x)) / sum_x P(x)

where f is min for instance. Then whether the simple TV is fuzzy

<{s:1}, N>

or probabilistic

<{1:s, 0:1-s}, N>

the disjunction between these is trivially defined.
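A hedged worked instance of that collapse, s = sum_x f(P(x), Q(x)) / sum_x P(x) with f = min, over hypothetical fuzzy memberships P and Q (the elements and strengths below are made up for illustration):

```python
# Collapse a conditional relationship into a single strength s using
# the Section 2.4.1.1 formula with f = min.
P = {"a": 0.9, "b": 0.5, "c": 0.1}  # hypothetical membership degrees
Q = {"a": 0.8, "b": 0.7, "c": 0.0}  # hypothetical membership degrees

s = sum(min(P[x], Q[x]) for x in P) / sum(P.values())
# s = (0.8 + 0.5 + 0.0) / 1.5 ≈ 0.8667
```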

There are other ways to go about it, as there are many ways to turn a conditional GDTV into an unconditional one, in which case disjunction is trivially defined as well. I'm personally not sure what the right way is, and ultimately this may require experimentation.

ngeiswei avatar Jul 19 '16 19:07 ngeiswei

I strongly suggest that instead of inventing a new TV, that the work on ProtoAtom be finished, instead. I think the theory for ProtoAtom is mostly done, see http://wiki.opencog.org/wikihome/index.php/ProtoAtom and also the status in bug #513.

The ProtoAtom is intended to be the ultimate do-all, be-all replacement for TVs and any other generic values to be associated with atoms, as attributes. Of course, this won't happen if people hate the API to it, or find it unsatisfying, and so a review of the wiki page, and of bug #513, is strongly encouraged.

linas avatar Aug 13 '16 02:08 linas

Agreed. I suppose then the current AtomSpace class will be replaced by an AtomSpaceTVAV class or such, which inherits from an AtomSpace class that has no TV or AV, just a generic ProtoAtom instead.

Anyway, I don't intend to look into this for now, I prefer to focus on the Backward Chainer, but once we start getting deeper into PLN reasoning, this will have to be carefully addressed.

ngeiswei avatar Sep 02 '16 07:09 ngeiswei

Re backward-chaining, please review the blog post about multiverses, and see if you can steal ideas from it. It is not necessary, but it could be useful. Some day, in the more distant future, if it's done right, it could even be important, but for right now, it's something to keep in mind and try to apply.

blog post: http://blog.opencog.org/2016/08/31/many-worlds-reasoning-about-reasoning/

linas avatar Sep 02 '16 23:09 linas

FYI, the protoatom work #513 is mostly done; some extra polish and some utilities may be needed but the core works. It works like so: for any (key,atom) pair, you can associate a (mutable) array of floats (FloatValue), array of strings (StringValue), or array of these arrays (ListValue). The current, existing TV's all inherit from FloatValue.

The key can be any atom, although PredicateNode is recommended. The current TruthValues are associated with PredicateNode "*-TruthValueKey-*"

The triple (key, atom, value) is called a "Valuation" in the C++ code; it's kind-of-ish like an EvaluationLink, and kind-of-ish like the concept of "valuation" used in model-theory books. Since Valuations are not atoms, they are not indexed in the AtomSpace, and the pattern matcher cannot be used to find them (in the same way that the pattern matcher cannot be used to search for TVs or AVs).

The valuations are saved/restored from the SQL tables. If you build the Distributional TV from these values, it will be handled automatically. New kinds of values can be added too; they use the same atom_types.script and classserver() mechanism as atoms.

linas avatar Mar 07 '17 16:03 linas

Cool! I haven't looked at how it has affected the C++ TV code; I suppose it's rather profound. It feels like all the C++ TV code could now be replaced by scheme code. Not suggesting to do that, but it shows how much more flexibility is at our disposal. Well done!

ngeiswei avatar Mar 07 '17 20:03 ngeiswei

Yes, I guess that "all the C++ TV could now be replaced by scheme code". For now, I did keep the TruthValuePtr inside of atoms, since I figured that maybe this makes some difference to performance ... except that ... it probably doesn't, in any practical use.

It would be nice to have some sort of PLN performance suite, to see what changes, if any, affect performance. The benchmark measures this, but it's not clear how much certain subsystems matter.

linas avatar Mar 10 '17 10:03 linas

FYI, I recently added some scheme utilities to compute distributions aka "bin counting". See #1248. The resulting distributions are currently not attached to any atom, or otherwise integrated into the atomspace; I only use it to print TSV files to create graphs. https://github.com/linas/atomspace/blob/master/opencog/analysis/bin-count.scm

linas avatar Jun 04 '17 19:06 linas

I suspect you have not noticed/realized this, but the code in https://github.com/linas/atomspace/blob/master/opencog/matrix implements generalized distributional TV's and an extensive framework for working with them, and manipulating them in various ways.

The actual representational format is completely different from what is sketched above. The biggest feature is this: unlike the above, one does NOT have to define new atom types, new truth value types, new ways of representing things. Instead, one writes some access routines that fish out the GDTV's from whatever locations they might be currently stored at. Once these are written, all of the rest of the compute infrastructure comes "for free": you can compute marginals and conditionals and ... well, piles of stuff, anything that I found useful, so far.

The lesson that I learned is that this way of treating GDTV's works really really well: it was a trial by fire, of actually having to use it and make it work. It works. Now is the time to re-write it in C++, (or R, or something) because the existing scheme API is .. not that fast, and maybe hard to understand, if you are used to existing statistics packages like R (or SciPy). (I don't know how hard a port to SciPy would be. Python/cython is f**'ed up, on the inside, R is much cleaner.)

linas avatar Oct 16 '18 17:10 linas

Thanks for the reference @linas, we'll have a look at it.

ngeiswei avatar Oct 17 '18 04:10 ngeiswei