PSyclone icon indicating copy to clipboard operation
PSyclone copied to clipboard

Add precision to Literals in LFRic API

Open TeranIvy opened this issue 5 years ago • 18 comments

Currently it is possible to specify datatype for the Literal class in PSyIR but not precision (kind). This has implications for adding default precisions for datatypes in LFRic API (issue #660).

TeranIvy avatar Feb 12 '20 13:02 TeranIvy

I started looking into this and I would appreciate some input from @rupertford, @sergisiso and @arporter. The problem is outlined below.

Precision is currently set, checked and handled within the DataSymbol class (psyir/symbols), so in order to supply the Literal class (psyir/nodes) with the precision I would need to import the DataSymbol class. Both Literal and DataSymbol classes are initialised with datatype, handled by the separate DataType class. Pulling in entire DataSymbol into Literal just to extract precision seems to me like creating overhead of pulling in DataType redundantly. Since datatype property is handled by a separate class, would it not be better to treat the concept of precision in this way as well and import it on a need-to-use basis, especially as it is an optional argument in code generation?

I see that @rupertford is making all PSyIR nodes have a datatype in #685 where a new class DataNode(Node) is created to capture the properties of the returned data via a DataSymbol object and Literal is a child of DataNode. It seems to me that this would be another reason to ship off precision in a separate property class so that DataSymbol would not need to be used once for introducing precision to Literal and then another time for storing data properties.

Any thoughts on this?

TeranIvy avatar Feb 13 '20 13:02 TeranIvy

Yers, bringing in datatype is too much and we could go with a separate precision class. However, the properties common to both datasymbol and literal are datatype, shape and precision, so perhaps they should be placed together in a class e.g. one called DataProperties that both datasymbol and literal use? At the moment literal implies scalar so shape should be empty but at some point I expect we will want to define a set of constant values (LiteralArray?). Just a suggestion for discussion.

rupertford avatar Feb 13 '20 14:02 rupertford

After a little more thought ... I think we should have different types of datatype. I would support Scalar and Array to start off with. Scalar has a Type and a Precision. Array has a Shape and a Scalar. In the future Structure which has a list of Scalar, Array or Structure could easily be added. These could have support for comparison a.same_as(b).

These datatype classes could then be used with datasymbol and literal.

However, this would be a separate issue.

rupertford avatar Feb 13 '20 15:02 rupertford

Yes, I think this is more complicated than we first thought. Although DataSymbol does contain a Precision class, this is only an enumeration and only part of the story. We support precision specified as a number of bytes, or as a Symbol or as one of the values of the precision enumeration. All of this needs to be taken out of DataSymbol and become part of e.g. a DataProperties class. Definitely a new issue.

arporter avatar Feb 13 '20 15:02 arporter

Precision is optional for scalars and arrays, whereas shape is mandatory for arrays so I would suggest creating class hierarchy on the basis of what is mandatory for a class. For instance, Scalar needs a DataType and Array needs DataType and Shape. Both classes can then optionally specify precision and initial value (which would be another fixed-value scalar, I guess).

Regarding proceeding with this issue, what shoud I then do - just import the entire DataSymbol to use precision in Literal and create a separate issue to indicate that this will be refactored? Or can I separate Precision into a class now and indicate that it will become property of a new class (e.g. DataProperties) in a separate issue?

P.S. Having had another look into DataSymbol, I think just using precision from DataSymbol and creating a new issue to indicate restructuring would be better for now - as @arporter said, taking out Precision needs to be handled carefully!

TeranIvy avatar Feb 13 '20 15:02 TeranIvy

On yet another look, I think pulling in DataSymbol to just use Precision will be clunky as Literal is a child of a general class Node. I guess this led to motivation to introduce DataNode in #685?

datatype, shape, precision and value are all data properties, where really only datatype is mandatory for all (even C++11/14 requires auto keyword when the type of a variable is deduced from its initialisation).

I think we can either go down the route of having separate classes for each property (e.g. in a separate psyir/properties directory) or put them all in a DataProperties class (with only datatype mandatory). It would then make sense to differentiate between ordinary Nodes and DataNodes, where the latter would inherit from both Node and DataProperties classes (I think we have some Transformations that inherit from two classes already).

TeranIvy avatar Feb 13 '20 16:02 TeranIvy

I think precision is also mandatory, even if it is "default"/"compiler-defined" - everything has a precision. Also, wouldn't a DataSymbol have DataProperties rather than be one? (I don't know what @rupertford was planning to do with DataNode!)

arporter avatar Feb 13 '20 20:02 arporter

Also, wouldn't a DataSymbol have DataProperties rather than be one?

Yes, this would make more sense. A DataNode could inherit from Node and have DataProperties. Or perhaps DataNode would not be needed if the properties are contained in a class that other classes can have?

TeranIvy avatar Feb 14 '20 09:02 TeranIvy

I will give my take on this:

Or perhaps DataNode would not be needed

We thought DataNode as a higher abstraction virtual PSyIR node (not a concrete node) to encapsulate everything that evaluates to a value. e.g. Loop(start=X, end=Y, step=Z, schedule), X,Y,Z can only be DataNodes . But this can have the form of reference, literal, operation, ... We still need this concept.

Regarding the data properties, I think DataNode can define some abstract methods to access the properties in a uniform way as all of them should be able to infer a type. But this properties should be in the specific object DataSymbol and Literal (and function in the future?).

I think precision should be mandatory for basic types (with an implicit or default precision) and I like the idea of composable types with structs and arrays. I also like the equivalence operator among them.

I am not sure if the basic types should have a datatype and precision or be the type themself. e.g.

DataType -> Struct | Array | BasicType
where:
Struct is a list of multiple DataType
Array has a DataType and a shape
BasicType just has a precision and we define INTEGER(precision), REAL(precision), ... subclasses

sergisiso avatar Feb 14 '20 10:02 sergisiso

To be clear by

... these properties should be in the specific object DataSymbol and Literal ...

I mean DataSymbol and Literal should have a self._type/datatype pointing to a DataType (or DataProperties but I don't like this name :)) class.

sergisiso avatar Feb 14 '20 10:02 sergisiso

Now that #721 is done, can this be closed?

arporter avatar Apr 30 '20 19:04 arporter

I think it is one for @TeranIvy to confirm. Certainly Literals now have precision but perhaps there are additional requirements for this issue, or other issues that need to be captured if this is closed?

rupertford avatar May 01 '20 09:05 rupertford

I repurposed the issue for adding precision to the use of Literals in LFRic API where appropriate. I do not think there will be additional requirements but we will see!

TeranIvy avatar May 01 '20 16:05 TeranIvy

Just come across this one again. Can it be closed now?

arporter avatar Mar 11 '21 12:03 arporter

I've hit a related issue in #1237 (PR #1239) where the reviewer pointed out that it would be good to specify the precision of loop variables. I therefore propose to use this Issue to add the necessary symbols to the Container symbol table (for the PSy layer). These will be:

  1. A ContainerSymbol for "constants_mod";
  2. A DataSymbol of DeferredType() for each kind parameter required (e.g. api_config.default_kind["integer"])

We could also define or have a utility that creates the appropriate DataType for each default kind. These will need to be available wherever we create PSy-layer code.

arporter avatar May 26 '21 08:05 arporter

Currently the names of LFRic infrastructure modules (e.g. "constants_mod") are hardwired in the code. These should be read from the config file.

arporter avatar May 26 '21 09:05 arporter

These names are now read from configuration but we do not yet create PSyIR symbols from them. I think this needs to be done in the DynInvokeSchedule constructor.

arporter avatar Dec 14 '21 21:12 arporter

I am assigning this to me for now to avoid anyone else working on it. While creating the PSyIR for a kernel call (for extract driver creation long term), I am adding proper precision types to any symbol I create. Hopefully, once I am done with kern_call_arg_list, we can use this class to first create all required symbols properly (including precision), and then we can remove any code that declares a symbol (without precision) in dynamo30p3.py

hiker avatar Sep 28 '22 04:09 hiker