openfisca-core

Asof parameters

Open benjello opened this issue 4 years ago • 13 comments

New features

  • Introduce VectorialAsofDateParameterNodeAtInstant

    • Allows extracting serialized parameters as of given dates, passed as an np.datetime64 vector
  • Example: The test file shows parameters that depend on whether your date of birth falls before or after certain dates. So you need to get the parameters as of some date.
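To make the "as of" idea concrete, here is a minimal sketch in plain NumPy, not the code from this PR. The names `cutoffs`, `values` and `asof_lookup` are hypothetical, and the second cutoff (1943-01-01) and its value (160.0) are made up for illustration; only 1934-01-01, 150.0 and 151.0 come from the example file discussed below.

```python
import numpy as np

# Hypothetical data: each cutoff date opens a new regime that applies to
# people born on or after it. The first value covers everyone born earlier.
cutoffs = np.array(["1934-01-01", "1943-01-01"], dtype="datetime64[D]")
values = np.array([150.0, 151.0, 160.0])  # len(cutoffs) + 1 regimes


def asof_lookup(birth_dates: np.ndarray) -> np.ndarray:
    """Return, for each birth date, the value of the regime it falls in."""
    # side="right" sends a date equal to a cutoff into the regime that
    # starts on that cutoff.
    regime = np.searchsorted(cutoffs, birth_dates, side="right")
    return values[regime]


birth_dates = np.array(
    ["1930-06-15", "1934-01-01", "1950-03-02"], dtype="datetime64[D]"
)
quarters = asof_lookup(birth_dates)
```

The whole vector of birth dates is resolved in one call, which is the property a vectorial node needs inside a formula.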

benjello avatar Feb 17 '21 11:02 benjello

@sandcha @guillett @maukoquiroga @eraviart : I know this PR is extremely ugly and I intend to improve it. But I definitely need your advice on how to do that.

benjello avatar Feb 17 '21 11:02 benjello

@maukoquiroga @sandcha @guillett : I would love to hear from you on this WIP PR ...

benjello avatar Mar 09 '21 15:03 benjello

@maukoquiroga : you did a large refactoring and I am quite lost trying to rebase my branch on master ... Could you rebase it for me? Thanks.

benjello avatar Apr 09 '21 10:04 benjello

Does this PR answer this issue: https://github.com/openfisca/openfisca-core/issues/720 ? 🙂

sandcha avatar May 04 '21 13:05 sandcha

Does this PR answer this issue: #720 ?

No, it is a completely different topic.

benjello avatar May 04 '21 13:05 benjello

Rebasing the branch as discussed with @benjello.

sandcha avatar Dec 09 '21 15:12 sandcha

Coverage Status

Coverage increased (+0.1%) to 79.005% when pulling d215247fe0d53d6fbda888677e34963e3203fa22 on asof-parameters into 601a397fd02d6c56b751cadfbda0f3e5c911934f on master.

coveralls avatar Dec 09 '21 15:12 coveralls

@nikhilwoodruff @eraviart @sandcha @MattiSG :

I need your input to improve this PR, in particular on the generation prefixes (ne_apres_YYYY_MM-DD, ne_avant_YYYY_MM-DD, i.e. "born after" and "born before"). Would after_YYYY_MM-DD, before_YYYY_MM-DD be more generic?

benjello avatar Dec 09 '21 15:12 benjello

This looks nice to me. Of course my preference would definitely be before/after but feel free to outvote this.

nikhilwoodruff avatar Dec 09 '21 16:12 nikhilwoodruff

Thank you for this PR and for the examples!

Use cases

Thanks to parameters_date_indexing/trimtp_rg.yml, I understand that what we call asof here could improve the modelling of some use cases (and that it has nothing to do with reforms 😅). Possible use cases:

  • Pensions that depend on the birth date or career of persons and therefore, depend on different dates specific to the person/entity.
  • In France for example, the merit benefit for students where the eligibility and amount depend on the year of the baccalaureate/general certificate of education of the student/entity (description in French).

Asof is a new feature because...

So far, when we filtered parameters by date, it was the simulation period or, more precisely, the date on which we observed the applicable law. Here, if I'm right, we have use cases where at the same time, we observe different applicable law options that depend on a date specific to the entity. So, we need to access a parameter by a double index: the usual "simulation period" and a new index, a "date" index.

Consequently, it looks great to have some mechanism to manage these situations!

Questions & issues

But I have some doubts about the implementation:

  • We introduce an entirely new parameter format while we already have double indexed parameters where one is the simulation period and the other is some entity information: the tax scales formats.
  • The chosen filter here (before_*, after_*) is not a simple string key. We need to parse it, and the algorithm has to adapt to the specific prefix of each item: this blurs the current abstraction levels between parameter management and the YAML data format. Besides, parsing strings whose keys depend on each other (like before_1934_01_01 and after_1934_01_01) means handling more error messages to check coherence and help the user.
  • The typing of the parameters is managed between parameters/ and taxscales/. If we introduce the VectorialAsofDateParameterNodeAtInstant, it looks like we are bypassing some steps (between a ParameterNodeAtInstant and the Python class that defines the object like those we have for the scales: SingleAmountTaxScale, ...).
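To illustrate the second point, here is a rough sketch of the parsing and coherence checks that such keys would require. It is not code from this PR: `KEY_PATTERN`, `parse_generation_key` and `check_coherence` are hypothetical names, and the rules enforced (exactly one before_* key, matching the earliest after_* key) are an assumption about what "coherent" would mean.

```python
import re
from datetime import date
from typing import List, Tuple

# Hypothetical key format: before_YYYY_MM_DD or after_YYYY_MM_DD.
KEY_PATTERN = re.compile(r"^(before|after)_(\d{4})_(\d{2})_(\d{2})$")


def parse_generation_key(key: str) -> Tuple[str, date]:
    """Split a key into its side ("before"/"after") and its date."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"Unrecognised generation key: {key!r}")
    side, year, month, day = match.groups()
    return side, date(int(year), int(month), int(day))


def check_coherence(keys: List[str]) -> List[date]:
    """Check that the keys partition the date axis; return sorted cutoffs."""
    parsed = [parse_generation_key(key) for key in keys]
    befores = sorted(d for side, d in parsed if side == "before")
    afters = sorted(d for side, d in parsed if side == "after")
    if len(befores) != 1:
        raise ValueError("Expected exactly one before_* key")
    if not afters or afters[0] != befores[0]:
        raise ValueError("before_* must match the earliest after_* date")
    if len(set(afters)) != len(afters):
        raise ValueError("Duplicate after_* dates")
    return afters


cutoff_dates = check_coherence(["before_1934_01_01", "after_1934_01_01"])
```

Each `raise` is an error message that a structured format with real date values would not need.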

@benjello @nikhilwoodruff What do you think about these implementation questions? 🤔 cc @maukoquiroga @MattiSG

To summarize: such a parameter would be great 🙌, but keys to parse like after_YYYY_MM_DD raise several issues. Any other structured format that does not require parsing keys to extract several pieces of information (😰) would work for me 😊

sandcha avatar Jan 03 '22 11:01 sandcha

One solution might be to see the asof parameter as a scale that evolves by date: the threshold values would be dates, as they are naturally ordered, and could be filtered with the usual .calc(information from the entity) method.

Here is a first example from trimtp_rg.yml; it would be changed as follows:

- nombre_trimestres_cibles_par_generation:
-   description: Nombre de trimestres cibles par génération
-   before_1934_01_01:
-     description: Avant 1934
-     values:
-       1983-01-01:
-         value: 150.0
-   after_1934_01_01:
-     description: '1934-01-01'
-     values:
-       1994-01-01:
-         value: 151.0
-       1983-01-01:
-         value: null

+ number_targeted_quarters_per_generation:
+   description: The number of targeted contribution quarters per generation
+   metadata:
+     threshold_unit: date
+     amount_unit: quarters
+   brackets:
+   - amount:
+       1983-01-01:
+         value: 150.0
+       1994-01-01:
+         value: null
+     threshold:
+       1983-01-01:
+         value: 0001-01-01 # from python/openfisca default date to next threshold
+   - amount:
+       1994-01-01:
+         value: 151.0
+     threshold:
+       1983-01-01:
+         value: 1934-01-01 # born after January 1934

This would also mean that we extend the allowed parameters types to include dates. Would it work for your use case?
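As a rough sketch of how such a date-threshold scale could be evaluated, the snippet below resolves the brackets in two steps: first at the simulation instant, then against the birth date. It uses only the standard library and is not OpenFisca's tax scale code. `value_at` and `calc` are hypothetical helpers, the dictionaries are a literal transcription of the brackets above, and `None` stands for YAML null.

```python
from bisect import bisect_right
from datetime import date

# Brackets transcribed from the proposal; None stands for YAML null.
brackets = [
    {
        "threshold": {date(1983, 1, 1): date(1, 1, 1)},
        "amount": {date(1983, 1, 1): 150.0, date(1994, 1, 1): None},
    },
    {
        "threshold": {date(1983, 1, 1): date(1934, 1, 1)},
        "amount": {date(1994, 1, 1): 151.0},
    },
]


def value_at(history, instant):
    """Latest value whose start date is on or before `instant`, else None."""
    starts = sorted(start for start in history if start <= instant)
    return history[starts[-1]] if starts else None


def calc(brackets, instant, birth_date):
    # Step 1: resolve every bracket at the simulation instant and drop
    # brackets whose threshold or amount is undefined or null at that instant.
    active = []
    for bracket in brackets:
        threshold = value_at(bracket["threshold"], instant)
        amount = value_at(bracket["amount"], instant)
        if threshold is not None and amount is not None:
            active.append((threshold, amount))
    active.sort(key=lambda pair: pair[0])
    thresholds = [threshold for threshold, _ in active]
    # Step 2: pick the last bracket whose threshold is on or before birth_date.
    index = bisect_right(thresholds, birth_date) - 1
    return active[index][1] if index >= 0 else None


quarters_1990 = calc(brackets, date(1990, 1, 1), date(1930, 6, 15))
quarters_2000 = calc(brackets, date(2000, 1, 1), date(1940, 6, 15))
```

With the brackets transcribed literally, someone born before 1934 gets no value for instants from 1994 onwards, which differs from the before_1934_01_01 branch of the original file; the null placement may need a second look.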

sandcha avatar Jan 03 '22 11:01 sandcha

@sandcha @nikhilwoodruff : I think we should go for a more general multi-indexed array parameter. What do you think of something like the following:

quarters_by_generation:
  description:  Quarters needed to get full retirement by generation
  index:
    generation:
      description: date of birth
      type: date
      selection: asof # could default to exact, or be any other selection mode
    # period is the implicit 0-axis
    # More than one additional axis is possible.
  values:
    null:
      1983-01-01:
        value: 150.0
    1934-01-01:
      1994-01-01:
        value: 151.0
      1983-01-01:
        value: null

cc @eraviart @guillett @maukoquiroga

benjello avatar Jan 04 '22 10:01 benjello

I think that scheme would work great for us, @benjello - I like that it's more streamlined. I'm actually just re-implementing a part of the UK system that I think this feature would probably simplify a lot: the Child Tax Credit has a limit on the number of children eligible (2), but when testing eligibility for each child, we need to check against the child limit at the date of their birth, rather than at the current date. Am I right in thinking that with this PR, we'd just need to do something like the below parameter and formula?

description: Child limit
values:
  null:
    null: .inf
  2017-04-06:
    2017-04-06: 2

def formula(person, period, parameters):
    birth_date = person("birth_date", period)
    child_index = person("child_index", period)
    # Compare against the child limit in force at each child's birth date
    # rather than at the simulation period.
    meets_limit = child_index < parameters(period).child_limit[birth_date]
    return meets_limit
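For reference, the behaviour this formula is expected to have can be checked in plain NumPy, outside OpenFisca. This is only a sketch under assumptions: the simulation period is taken to be after 2017-04-06 so the period axis is ignored, `child_limit` is a hypothetical stand-in for the indexed parameter, and `child_index` is assumed to be 1-based.

```python
import numpy as np

# Cutoff taken from the parameter above.
CUTOFF = np.datetime64("2017-04-06")


def child_limit(birth_dates: np.ndarray) -> np.ndarray:
    # No limit (infinity) for children born before the cutoff, 2 afterwards.
    return np.where(birth_dates >= CUTOFF, 2.0, np.inf)


birth_date = np.array(
    ["2015-09-01", "2016-11-20", "2018-02-14"], dtype="datetime64[D]"
)
child_index = np.array([1, 2, 3])  # assumed 1-based position in the family
meets_limit = child_index < child_limit(birth_date)
```

Here the first two children were born before the cutoff and face no limit, while the third is tested against a limit of 2.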

nikhilwoodruff avatar Jan 09 '22 12:01 nikhilwoodruff