param Add json/yaml serialization

Recently, PR #153 was opened, showing the desire for json/yaml serialization of parameterized objects.

My reading of #153's serialization:

1. does not appear to include information about the class (when you deserialize, I think you have to pass in the class)
2. does not include non-parameter attributes
3. does not provide a way to control serialization
4. assumes you only want to write to a file

I think 1, 3, and 4 are necessary to address.

Jun 15 '17 20:06 ceball

1. I think the YAML/JSON file definitely needs to include the class name somehow for any subobject; not sure how best to express that in that format.
2. I'm not certain whether it's useful to include non-parameter attributes, but probably not -- this mechanism is not for serializing the state of a live object, but for saving a specification by which that object could be instantiated just the same as if it had been specified using a Python constructor call (which would also not currently allow the attributes to be set by the user, only within the constructor).
3. I'm not sure what you mean?
4. The code should definitely support streams, not just named files.
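
As a rough illustration of the streams point, a writer can accept any writable text stream and leave opening files to the caller. The function name `dump_params` and the plain dict of values are hypothetical stand-ins, not part of param or #153:

```python
import io
import json

def dump_params(values, stream):
    """Write a dict of parameter values as JSON to any writable text stream."""
    json.dump(values, stream, sort_keys=True)

# Works with an in-memory stream...
buf = io.StringIO()
dump_params({"b": 0.75, "a": 1}, buf)
serialized = buf.getvalue()
# ...and equally with an open file object or sys.stdout
```

A named-file convenience wrapper could then be a two-line function on top of this.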

Jun 15 '17 21:06 jbednar

Re. 3: e.g. a class can control how it's pickled by implementing __getstate__() (see https://github.com/ioam/param/wiki/Serialization#controlling-serializationdeserialization).
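
For anyone unfamiliar with the hook, here is a minimal sketch of a class controlling its own pickling. The `Counter` class and its `_cache` attribute are made up for illustration and have nothing to do with param:

```python
import pickle

class Counter:
    def __init__(self, start=0):
        self.count = start
        self._cache = {"expensive": "derived data"}  # not worth saving

    def __getstate__(self):
        # Choose what gets pickled: everything except the cache
        state = self.__dict__.copy()
        del state["_cache"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and rebuild what was dropped
        self.__dict__.update(state)
        self._cache = {}

restored = pickle.loads(pickle.dumps(Counter(start=3)))
```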

Jun 15 '17 22:06 ceball

Ah, you mean that for 3, it does not provide classes a way to control their own serialization, or users a way to control serialization of specific classes.

Ok, yes, that's needed for pickling; not sure if it's needed for JSON/YAML. Seems like something that can be added once it's clearly required.

Jun 15 '17 22:06 jbednar

Guess I'm thinking about e.g. numbergen.RandomDistribution, which will contain a random.Random object. How would you expect that to appear in json? I think we'll have to provide some way to customize json serialization from the outset just to support what's already in param.

Jun 16 '17 00:06 ceball

Hmm, sticking with random generators as an example to illustrate something I'm not clear about...what exactly is serialization?

Pickle presumably saves a random generator's state (where possible) so it carries on generating where it left off.

Right now script_repr omits random generators. But maybe it should (I think?) give a representation that would start the same sequence from the beginning again (i.e. capture the parameters of the random generator like seed etc), because that seems most useful given the aim of script_repr. However, that doesn't seem like serialization (it's not capturing the object's state).

What about json/yaml? For #153, where I think the serialization is for some kind of file-based config system, doing the same thing as script_repr above would make sense to me. But I can imagine wanting to serialize the current state of a parameterized object to pass via json. Sure, probably nobody outside python could use that in the case of a python random generator, but it seems closer to the definition of 'serialization'.

I guess I'm just trying to say that different users of json serialization might expect different things: some might expect something more like what pickle does, while others might expect something more like what script_repr does.
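
To make the two expectations concrete, here is a small sketch using only the standard library. The JSON key names and the seed value are arbitrary choices for illustration, not a proposed format:

```python
import json
import random

SEED = 42  # hypothetical seed chosen for illustration

gen = random.Random(SEED)
first = gen.random()

# "script_repr-like": record only the specification of the generator
spec = json.dumps({"class": "random.Random", "seed": SEED})

# "pickle-like": record the live state so a copy carries on where gen left off
version, internal, gauss_next = gen.getstate()
state = json.dumps({"version": version, "internal": internal,
                    "gauss_next": gauss_next})

# Rebuilding from the spec replays the sequence from the beginning
from_spec = random.Random(json.loads(spec)["seed"])

# Rebuilding from the state continues the sequence
s = json.loads(state)
from_state = random.Random()
from_state.setstate((s["version"], tuple(s["internal"]), s["gauss_next"]))
```

The spec is a few bytes and readable; the state is several hundred integers and only meaningful to Python's own generator.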

Jun 16 '17 00:06 ceball

It would be great if script_repr included a specification for the random number generation that would start the sequence from the beginning, and the same is true for this JSON-based config generation and reading.

Trying to serialize the full state to/from JSON/YAML would be a much bigger job, which can be postponed until there's a specific application that would require such a rich (and probably unreadable) specification whose requirements would drive the implementation. Such rich specification could be enabled specifically, using a flag asking it to represent all state, once we support it, so we can safely ignore it for now.

Jun 16 '17 02:06 jbednar

Just wanted to note YAML's support for specifying types and how you could already use it (sort of) for a configuration file.

While you can write:

a: 1.0

You can also write:

a: !!float 1

YAML contains various tags like !!int, !!set, !!map, etc. Libraries like PyYAML also add support for various python-specific tags (http://pyyaml.org/wiki/PyYAMLDocumentation#YAMLtagsandPythontypes).

For example.py containing:

import param

def fn1(a,b):
    return a*b

def fn2(a,b):
    return a+b

class Thing(param.Parameterized):
    a = param.Parameter(default=1)

class SomeThing(Thing):
    b = param.Number(default=0.5,bounds=(0,1))
    c = param.Tuple(default=(1,2))
    
class Config(param.Parameterized):
    param1 = param.Callable(default=fn1)
    param2 = param.ClassSelector(Thing,default=SomeThing())
    param3 = param.Parameter(Thing)

You can load this config.yaml with PyYAML:

!!python/object:example.Config
param1: !!python/name:example.fn2
param2: !!python/object:example.SomeThing
  b: 0.75
  c: !!python/tuple [1, 2]
param3: !!python/name:example.Thing

You could also use YAML tag directives:

%TAG !o! tag:yaml.org,2002:python/object:
%TAG !n! tag:yaml.org,2002:python/name:
---
!o!example.Config
param1: !n!example.fn2
param2: !o!example.SomeThing
  b: 0.75
  c: !!python/tuple [1, 2]
param3: !n!example.Thing

>>> yaml.load(open('config.yaml'))
Config(name='Config', param1=<function fn2 at 0x1143dfd08>, param2=SomeThing(a=1, b=0.75, c=(1, 2), name='SomeThing'), param3=<class 'example.Thing'>)

Note: YAML also specifies that a single ! represents a local/application-specific tag.

Jul 04 '17 12:07 ceball

I'd like some feedback about how e.g. a yaml config file might look.

Short version

If you have somemod.py containing:

import param
import numpy

def fn(a,b):
    return a*b

class Thing(param.Parameterized):
    thing_param = param.Parameter(default=1)

class SomeThing(Thing):
    something_number = param.Number(default=2)
    something_array = param.Array(default=numpy.array([2,2,2]))
    something_list = param.List(default=[2,2])
    something_bool = param.Boolean(default=False)
    something_string = param.String(default="two")
    
class MoreThing(Thing):
    morething_list = param.List(default=[3,3,3])
    morething_param = param.Parameter(default=(3,3))
    morething_tupl = param.Tuple(default=(3,3,3,3))
    morething_call = param.Callable(default=fn)
    morething_array = param.Array(default=numpy.array([3,]))
    def __init__(self,a,**params):
        self.a = a
        super(MoreThing,self).__init__(**params)

class Config(param.Parameterized):
    display_name = param.String()
    conf_list = param.List()
    conf_objsel = param.ObjectSelector(default=SomeThing())
    conf_clssel = param.ClassSelector(Thing,default=SomeThing())
    conf_param1 = param.Parameter(Thing)
    conf_param2 = param.Parameter(MoreThing(a=4))
    conf_none = param.Parameter(None)
    conf_num_none = param.Number(default=None)

How does the following yaml example look for creating some instances, specifying some parameters? Crazy?

!somemod.Config
conf_param2: !somemod.MoreThing
  args*: [40, 40]
  morething_tupl: !tuple [30, 30, 30, 30]
  morething_array: !numpy.array [[30, 30],[30,30]]
conf_num_none: 40.0
conf_clssel: !somemod.MoreThing
  args*: [41, 41]
  morething_call: !somemod.fn
  morething_tupl: !tuple [31, 31, 31, 31]
  morething_array: !numpy.array [31,31,31]
conf_param1: !somemod.SomeThing
  something_number: 20
  something_list: [20, 20, 20]
  something_bool: true
  something_string: twenty

Long version

As I noted in a previous comment you can already yaml.load() the following:

!!python/object:somemod.Config
conf_param2: !!python/object/apply:somemod.MoreThing
  args: [40, 40]
  kwds:
    morething_tupl: !!python/tuple [30, 30, 30, 30]
    morething_array: !!python/object/apply:numpy.array
      args: [[[30, 30],[30,30]]]
conf_num_none: 40.0
conf_clssel: !!python/object/apply:somemod.MoreThing
  args: [41, 41]
  kwds:
    morething_call: !!python/name:somemod.fn
    morething_tupl: !!python/tuple [31, 31, 31, 31]
    morething_array: !!python/object/apply:numpy.array [[31,31,31]]
conf_param1: !!python/object:somemod.SomeThing
  something_number: 20
  something_list: [20, 20, 20]
  something_bool: true
  something_string: twenty

(You could also shorten the tags using the TAG directive.)

Above I'm using some built-in pyyaml tags (http://pyyaml.org/wiki/PyYAMLDocumentation#YAMLtagsandPythontypes). To make an instance with kwargs only:

!!python/object:module.Class { attribute: value, ... }

To make an instance with args and kwargs:

!!python/object/apply:module.function
args: [argument, ...]
kwds: {key: value, ...}

(or !!python/object/apply:module.function [argument, ...] if no kwargs).

I think we could shorten the above by registering a custom "constructor" as a fallback for unhandled tags (i.e. all single-!/"local" tags). The shortened yaml (shown earlier) could look something like this:

!somemod.Config
conf_param2: !somemod.MoreThing
  args*: [40, 40]
  morething_tupl: !tuple [30, 30, 30, 30]
  morething_array: !numpy.array [[30, 30],[30,30]]
conf_num_none: 40.0
conf_clssel: !somemod.MoreThing
  args*: [41, 41]
  morething_call: !somemod.fn
  morething_tupl: !tuple [31, 31, 31, 31]
  morething_array: !numpy.array [31,31,31]
conf_param1: !somemod.SomeThing
  something_number: 20
  something_list: [20, 20, 20]
  something_bool: true
  something_string: twenty

That is, for "local" (unhandled-by-pyyaml) tags:

a "scalar node" represents a name (e.g. !module.Class for module.Class)
a "sequence node" represents a name that should be instantiated with args (e.g. !tuple [] for tuple())
a "mapping node" represents a name that should be instantiated with args and/or kwargs (e.g. !param.Parameterized {} for param.Parameterized())

The above wouldn't stop someone from using 'standard' pyyaml tags as well or instead, if they prefer.

Note that args* can't be a python kwarg, which allows it to be used in a mapping node without conflicting with any possible existing key. Alternatively, we could instead allow only kwargs to be specified (i.e. params) for single-! tags, which would make things simpler - but quite a lot of useful things couldn't then be used as easily.
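
Here is a rough, PyYAML-free sketch of the dispatch those three rules imply, operating on data that has already been parsed. The helper names `resolve` and `construct` are hypothetical, and the sketch takes the sequence-node rule literally (items become positional args), so under this reading a tuple needs a nested list rather than the flat `!tuple [30, 30, 30, 30]` form shown above:

```python
import builtins
import importlib

def resolve(name):
    """Look up a dotted name such as 'fractions.Fraction'; bare names are builtins."""
    module_name, _, attr = name.rpartition(".")
    if not module_name:
        return getattr(builtins, name)
    return getattr(importlib.import_module(module_name), attr)

def construct(tag, node=None):
    """Apply the three rules to the parsed content of a local (single-!) tag."""
    target = resolve(tag)
    if node is None:             # scalar node: just the named object
        return target
    if isinstance(node, list):   # sequence node: positional args
        return target(*node)
    kwargs = dict(node)          # mapping node: kwargs, plus optional 'args*'
    args = kwargs.pop("args*", [])
    return target(*args, **kwargs)

fraction_cls = construct("fractions.Fraction")             # a name
half = construct("fractions.Fraction", [1, 2])             # args only
pair = construct("tuple", [[30, 30]])                      # one arg: a list
number = construct("int", {"args*": ["ff"], "base": 16})   # args and kwargs
```

In a real implementation this logic would live inside the fallback constructor registered with the YAML loader, and should only resolve names the application is willing to trust.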

Jul 23 '17 02:07 ceball

Even the long version looks good to me, but the short version seems very readable and suitable for humans writing a config file. As long as the custom constructor can be written in a very general way that we wouldn't have to revisit, it seems worth it to me!

Jul 23 '17 09:07 jbednar

Hello. Can I weigh in a little bit?

I think that serialization is a bit of overkill for these file types. Encoding things like bounds, choices, etc. becomes excessively verbose. When I use these sorts of configuration files, I just want a way to store and retrieve parameter values with some trivial conversion from a string. I probably know what Parameterized class these key-value pairs belong to. For example, I might make a class like:

class WindowParams(param.Parameterized):
    window_colour = param.ObjectSelector('blue', objects=['blue', 'red', 'yellow'])
    window_width = param.Integer(1024, bounds=(1, None))
    # ...

Then configure it with a file like

left_window:
  window_colour: red

right_window:
  window_colour: blue

And I just want to find a way to load the specified options into a couple of WindowParams instances. I think this is mostly consistent with the use case proposed in #153.

The great thing about this strategy is that it's very easy to implement over a lot of config-like file types, including YAML, JSON, INI (ConfigParser), and even command-line arguments. All that's needed are standard ways to read values from and write values to strings. Then you can have some functions like

def read_params_from_yaml(yml, parameterized, section=None):
     # ...

def write_params_to_yaml(yml, parameterized, section=None):
     # ...

Note that by passing the config object as an argument, there's no need to include any external dependencies.
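
To give a flavour of what I mean, here is a minimal sketch for the INI case using only the standard library. `WindowParams` is a plain-Python stand-in for the Parameterized class above so that the sketch runs without param, and `read_params_from_ini` is a hypothetical name:

```python
import configparser

class WindowParams:
    # Plain stand-in for the Parameterized class above (an assumption,
    # so the sketch runs without param installed)
    window_colour = "blue"
    window_width = 1024

def read_params_from_ini(parser, obj, section):
    """Copy values from one INI section onto obj, converting each string
    with the type of the attribute's current value."""
    for key, text in parser[section].items():
        current = getattr(obj, key)  # unknown keys raise AttributeError
        setattr(obj, key, type(current)(text))

parser = configparser.ConfigParser()
parser.read_string("""
[left_window]
window_colour = red
window_width = 800

[right_window]
window_colour = blue
""")

left, right = WindowParams(), WindowParams()
read_params_from_ini(parser, left, "left_window")
read_params_from_ini(parser, right, "right_window")
```

With real Parameterized objects, the conversion step would presumably be chosen per parameter type, and param's own validation would catch out-of-bounds values.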

If this sounds useful, I'd be happy to write it up as a PR, as I've written much of this up already for my own internal use. I'd need to know where to put stuff, though.

Thanks for your time, Sean

Aug 31 '18 19:08 sdrobert

Thanks for the perspective and for nudging us to move this forward!

> Encoding things like bounds, choices, etc. becomes excessively verbose.

I'm not sure what you mean here. The proposed serialization is about the state of a Parameterized class, not the code for the Parameterized class, and so it already wouldn't normally include the bounds, available choices other than the current value, etc.

> I just want a way to store and retrieve parameter values with some trivial conversion from a string. I probably know what Parameterized class these key-value pairs belong to.

I think in many cases exporting and importing the values of a single, specified Parameterized class would be useful, but in general a Parameterized class can have parameter values that themselves are Parameterized objects, which requires a recursive and hierarchical approach. We definitely need this hierarchy for our own use cases, which typically involve nested objects, and then supporting only a single level should be easy to do by accepting a flag that limits the recursion depth and omits the class name information.
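
A very rough sketch of that idea, using plain classes in place of Parameterized ones so that it runs without param. The `to_spec` function, the `recursive` flag and the `"class"` key are all hypothetical choices, not a settled design:

```python
class Thing:
    # Stand-in for param.Parameterized: stores whatever values it is given
    def __init__(self, **values):
        self.__dict__.update(values)

class SomeThing(Thing):
    pass

class Config(Thing):
    pass

def to_spec(obj, recursive=True):
    """Turn obj into JSON-ready data.

    recursive=True  -> nested objects are included, each with its class name
    recursive=False -> single level only, no class names, nested objects omitted
    """
    spec = {"class": type(obj).__name__} if recursive else {}
    for key, value in vars(obj).items():
        if isinstance(value, Thing):
            if not recursive:
                continue
            value = to_spec(value)
        spec[key] = value
    return spec

config = Config(display_name="foo", conf_clssel=SomeThing(b=0.75))
nested = to_spec(config)
flat = to_spec(config, recursive=False)
```

Here `nested` carries the class names needed to rebuild the hierarchy, while `flat` is just the key-value pairs for a class the reader already knows.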

Sep 19 '18 14:09 jbednar

> so it already wouldn't normally include the bounds, available choices other than the current value, etc.

True, I misspoke. What I meant was that, in the above examples, because of the freedom given to the nested Parameterized instances, the YAML has to spell out how to initialize each Parameterized instance: which type to initialize, which arguments to pass to it, and some default settings. My argument is that it's more natural to treat the config file as saying what to fill slots with, and to let the parser determine how to fill those slots.

> I think in many cases exporting and importing the values of a single, specified Parameterized class would be useful, but in general a Parameterized class can have parameter values that themselves are Parameterized objects, which requires a recursive and hierarchical approach.

More complicated, hierarchical structures are entirely compatible with this approach, since JSON and YAML are themselves hierarchical. You just need a way to customize the default slot-filling method when certain keys are reached. Here's an alternative way of configuring @ceball's above Parameterized instances:

# configuration for a Config instance
display_name : foo
conf_list :
   - a
   - b
   - c
conf_num_none: null
conf_objsel:
    type: MoreThing
    morething_array: [3,]
# no need to configure everything

Here you encode the custom recursive behaviour in the key "type", and tell the user which values of "type" are possible. Or maybe, if you have an ObjectSelector with Parameterized objects as choices, you use a special key "name" to choose the correct one. The important point is that how the slots get filled is up to the application. I think that, in a lot of cases, there's an obvious, intuitive way to parse things that makes reading the YAML much easier.
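
As a minimal sketch of that slot-filling idea, here is one way the "type" key could be handled once the file has been parsed into plain dicts. The classes are plain stand-ins for the Parameterized ones, and `fill_slots` and `KNOWN_TYPES` are hypothetical names:

```python
class Thing:
    thing_param = 1

class MoreThing(Thing):
    morething_array = (3, 3, 3)

class Config:
    display_name = ""
    conf_objsel = None

# The application decides which names the "type" key may refer to
KNOWN_TYPES = {"Thing": Thing, "MoreThing": MoreThing}

def fill_slots(obj, settings):
    """Set attributes on obj from a parsed config mapping; a nested mapping
    with a "type" key first builds an instance of the named class."""
    for key, value in settings.items():
        if not hasattr(obj, key):
            raise KeyError(f"unknown setting: {key}")
        if isinstance(value, dict) and "type" in value:
            nested = dict(value)
            value = fill_slots(KNOWN_TYPES[nested.pop("type")](), nested)
        setattr(obj, key, value)
    return obj

# Roughly what a YAML or JSON parser would return for a file like the one above
parsed = {"display_name": "foo",
          "conf_objsel": {"type": "MoreThing", "morething_array": [3]}}
config = fill_slots(Config(), parsed)
```

Because the lookup goes through an explicit table, the file can only name types the application has chosen to expose, and anything left unconfigured keeps its default.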

Best, Sean

Sep 19 '18 16:09 sdrobert