power-grid-model icon indicating copy to clipboard operation
power-grid-model copied to clipboard

[FEATURE] Cleanup import statements

Open mgovers opened this issue 1 year ago • 12 comments

The module graph for the Python wrapper is fairly convoluted (see https://power-grid-model.readthedocs.io/en/stable/advanced_documentation/python-wrapper-design.html). This is a direct consequence of the fact that many

Background

Definitions

We make the following distinctions:

  • modules v.s. symbols:
    • modules: .py files (or, more correctly, anything that can be imported from)
    • symbols: anything that exists within a module: classes, functions, variables, imported modules, etc. within a file (that means, anything that can marked import from a module)
  • 'public' v.s. 'protected' v.s. 'private':
    • public: any module or symbol that is publicly available (appears in API)
    • protected: anything that is meant for internal usage, and that users should not directly use (prefixed _)
    • private: anything that is restricted to a module or class that outside users should not even have knowledge of (prefixed __; double underscore)

In the PGM, 'protected' and 'private' are used interchangably, as we don't really have actual symbols marked strictly private with double underscore.

Conventions on definitions

In addition, we have the following conventions:

  • 'public' functionality is considered stable and changes in name or behavior are considered breaking.
    • we intend to not do any breaking changes without a major version bump
      • therefore any such change within the same major version is either a bug or a bug fix
      • we will always communicate such changes
  • 'protected' or 'private' functionality is considered unstable and the behavior and names may change over time.
    • Using any such functionality is considered danger mode and the user is responsible for migrating and handling bugs that arise from it
  • any public symbol that lives in a private module (including public submodules of a private module) are considered private from the level of the private module

Note: in Python, nothing is truly private and anything can technically be publicly available if you really want it. However, there is an intentionality argument that says that we can treat protected and private symbols as such. Breaking with the intentionality of the functionality is danger mode.

Conventions on changes

  • Keep existing public symbols public, always; it's a one-way path
  • Keep private symbols private
  • Make new imports private as much as possible (rename if needed)
  • If you create a new module, either place it in the private _core module and/or make it private at least until it's stable
  • If you create a new symbol within a module, make it private at least until it's stable
  • If you have to make new public functionality, make sure to double-check; it's a one-way path
    • Make sure that the docstring is up-to-date
    • Make sure that the symbol is present in the API reference in the docs

Conventions in code

To make sure we maintain the intentionality across the code, we do the following tricks in the code base:

# safe importing modules
import module_a   # exposes publicly; only do this you know what you're doing
import _module_b  # reuse internally

# safe importing modules under a different name
import _module_b as _mod_b    # reuse internally
import module_a as _mod_a     # reuse internally under a new name

# unsafe importing modules; DON'T DO THIS, EVER!!!
import module_a as mod_a      # exposes as a new public symbol; weird
import _module_b as module_b  # exposes as a new public symbol; very dangerous

# unsafe reusing. BEWARE!!! this is a one-way path
from module_a import func_c    # exposes a public symbol from another module; usually bad practice
                               # because users may import from multiple files
from _module_b import func_d   # exposes a function from a private module publicly;
                               # useful when you want to prevent breaking changes now and in the future;
                               # only do this if if you know what you're doing

# unsafe reusing with renaming. BEWARE!!! this is a one-way path
from module_a import func_c as func_e   # expose a public symbol from another module under a different name;
                                        # why ever do this???
from _module_b import func_d as func_f  # expose a public symbol from a private module;
                                        # useful if the name in the private module does not suit this module

# unsafe exposing private functionality; DON'T DO THIS, EVER!!!
from module_a import _func_g as func_g   # expose a private symbol from another module
from _module_b import _func_h as func_h  # expose a public symbol from anot

# safe reusing and good practice; rename new private symbols as you see fit
from module_a import func_c as _func_g   # reuse a public symbol from another module;
from _module_b import func_d as _func_h  # reuse a public symbol from another private module;

# safe reusing but usually bad practice (unless in tests), but not harmful; change as you see fit
from module_a import _func_e              # reuse a private symbol from another module;
from _module_b import _func_f             # reuse a private symbol from another private module;
from module_a import _func_e as _func_g   # reuse a private symbol from another module;
from _module_b import _func_f as _func_h  # reuse a private symbol from another private module;

TODO

Try to find a way to make the module graph less convoluted by changing expositions. Make sure to follow the above conventions.

Feel free to create new private modules and private functions where necessary.

Relates to #869 .

mgovers avatar Jan 14 '25 10:01 mgovers

Hi @mgovers, This one seems to be still unassigned, so I could think of picking it up, unless you need it very quickly (when would you need it?), or you have other plans for this.

I also have a couple of technical questions:

  • there are cases of public symbols whose locations are not documented in the API. I guess that even in that case they should stay public at any cost, right? For example, ValidationException is not listed here https://power-grid-model.readthedocs.io/en/stable/api_reference/python-api-reference.html as part of the validation module, even though the code python -c 'from power_grid_model.validation import ValidationException' does not error out.
  • similarly, there are modules that import lots of symbols as public, but those symbols are importable by the user in other ways, too. Even in that case those symbols should remain public and importable from all the locations where they currently use to, right? As an example, consider that this code works python -c 'from power_grid_model.validation.assertions import CalculationType', but a user could also import it as python -c 'from power_grid_model.enum import CalculationType'.

Anyways, I don't think that changing the expositions of the symbols in the examples above from public to private would directly simplify the package diagram. The reason why I ask is just to have a basis to start some more deep reasoning. Thanks

emabre avatar Feb 06 '25 21:02 emabre

Hi @emabre ,

Thank you for contacting us again! This definitely is still available. There's no time pressure on this, so feel free to pick this up. Can you assign yourself to this issue?

On the technical questions:

  • there are cases of public symbols whose locations are not documented in the API. I guess that even in that case they should stay public at any cost, right? For example, ValidationException is not listed here https://power-grid-model.readthedocs.io/en/stable/api_reference/python-api-reference.html as part of the validation module, even though the code python -c 'from power_grid_model.validation import ValidationException' does not error out.

Oh that's not good. They should indeed be listed in the API reference and should definitely remain public. Can you please try to make an exhaustive list of things that are publicly available that are missing from documentation? We can then go over it on a case-by-case basis. You can do this in a separate PR to isolate the changes. Good find!

  • similarly, there are modules that import lots of symbols as public, but those symbols are importable by the user in other ways, too. Even in that case those symbols should remain public and importable from all the locations where they currently use to, right? As an example, consider that this code works python -c 'from power_grid_model.validation.assertions import CalculationType', but a user could also import it as python -c 'from power_grid_model.enum import CalculationType'.

In short, I believe that only things that are either input types or return types to public functions should remain public. The rest can become private. I think we can do another PR like #787 if needed (see also this announcement).

Anyways, I don't think that changing the expositions of the symbols in the examples above from public to private would directly simplify the package diagram. The reason why I ask is just to have a basis to start some more deep reasoning.

Changing exposition in itself doesn't necessarily improve the dependency graph, but it is an enabler for it. There are 2 ways you can go about this:

  • first change the exposition and then clean up
  • or the other way around: first clean the graph up and use temporary exposition imports for backwards compatibility, and then at a later stage fix the backwards compatibility

Cleaning the graph would e.g. amount to moving functions and class declarations to other modules and create another module that acts as an exposition. The import statement can then change to the exposition module. E.g. the following change can happen naturally (where D is e.g. the _core module and D/E is e.g. _core/data_handling):

A -- B --- C                   A -- B ---- C
| \   \   /  \                 | \   \      \
|  \-- D/E   |       ===>      |  \-- D/E -- D/C2
|          \ |                 |        \     \
 \-----------D/F               \-------------- D/F

Most of all, we just want to get rid of the links back from _core to the main power_grid_model module (D/E -> C in the above). Dependencies should go one way, not bidirectionally. I think if you can achieve that, then that would already be a great feat in itself. After that, we can always re-evaluate and decide if we want to go even further.

Hope this helps.

Martijn

mgovers avatar Feb 07 '25 07:02 mgovers

Hi @mgovers , thanks a lot for your explanation. Unfortunately, it seems that I lack the necessary permissions to assign myself to this issue, as I see no clickable button near the assignee's tab on the right. Thanks again, Ema

emabre avatar Feb 07 '25 22:02 emabre

Hi @mgovers , thanks a lot for your explanation. Unfortunately, it seems that I lack the necessary permissions to assign myself to this issue, as I see no clickable button near the assignee's tab on the right. Thanks again, Ema

Apologies, I didn't realize that that requires additional permissions. Done. Enjoy!

mgovers avatar Feb 07 '25 23:02 mgovers

Hi @mgovers here is a complete list of the symbols that are publicly exposed but are not documented in the API documentation page. I'm aware that:

  • In validation.utils only errors_to_string() is intended for end users, according to the comment in the file
  • In validation.rules there is no comment specifying that some components are private, but there are many publicly defined imports, that are not documented, maybe that's desired
  • In validation.validation it's written "Although all functions are 'public', you probably only need validate_input_data() and validate_batch_data().".
  • generically, some special conventions may apply to the symbols that belong to the validation module (I've read the announcement.

However, I did not remove from the list those symbols belonging or exposed by the modules inside the validation subfolder, as it's easy to filter them out from the attached list here below.

symbols_to_doc.txt

I generated the list right now, so it is currently up to date.

I'm proceeding with the cleanup of the import statements, I hope I will be able to give some more updates soon.

emabre avatar Feb 28 '25 21:02 emabre

Hi @emabre , thank you for your investigation! I am out of office until Thursday, so maybe @figueroa1395 can help you with this. Otherwise, I'll get back to you on Thursday.

mgovers avatar Mar 03 '25 10:03 mgovers

Hi @emabre, since @mgovers is out of office, I will give you a hand.

  • In validation.utils only errors_to_string() is intended for end users, according to the comment in the file

This is true.

  • In validation.rules there is no comment specifying that some components are private, but there are many publicly defined imports, that are not documented, maybe that's desired

I don't think this is intended to be public, hence making the module private via renaming to validation._rules is probably a good solution. However, before doing anything, let me double check with the team and I will get back to you once I am certain of the best way to continue.

  • In validation.validation it's written "Although all functions are 'public', you probably only need validate_input_data() and validate_batch_data().".

This is true as well, however, I will bring up the topic of the other public functions available in this module to the team, as things may have changed when we introduced columnar data support in 1.10. I will come back to you as soon as possible.

symbols_to_doc.txt

I generated the list right now, so it is currently up to date.

I checked out the list and I want to double check with the team about validation.validation, validation.rules and validation.errors. The rest that you mention in the list should be properly exposed. In any case, removing them from your list, as you stated already should be easy. I will come back to you as soon as I have a clear answer.

For the rest, thanks for your effort and feel free to let me know if I can be of assistance at any point.

Regards, Santiago.

figueroa1395 avatar Mar 03 '25 18:03 figueroa1395

Hi @emabre,

After discussion with the team, these are the conclusions:

  • Let's make validation.validation a private module (validation._validation) , and instead expose assert_valid_data_structure in the __init__.py. Hence, assert_valid_data_structure should now be a part of the API documentation.
  • Let's keep validation.errors an all-public module; in addition, for backwards-compatibility reasons, let's keep the base class (ValidationError) also in validation.__init__.py. Hence, all the functions within the validation.errors module should now be in the API documentation.
  • Let's make validation.rules a private module (validation._rules). It should NOT be exposed via the __init.py__ and it should NOT appear it the documentation.
  • Let´s add validation.assertions.ValidationException to the docs.
  • Since validation.utils.ComponentTypeLike is imported from the PGM core, it should be a private import here. Definitely don't add to the docs.
  • Let's keep utils.import_input_data and utils.import_update_data as they are, public in the utils module, but should not appear in the docs as they are deprecated.

If we missed anything or you need something else, don't hesitate to reach out. Hopefully this was helpful.

Santiago.

figueroa1395 avatar Mar 06 '25 12:03 figueroa1395

Thank you very much for the detailed explanation @figueroa1395 . I'll follow these prescriptions for this issue, and reach out in case anything else unclear shows up.

emabre avatar Mar 07 '25 06:03 emabre

Hi, here is an update on the status of this issue, with some questions.

Inside a branch in my own fork I have moved all those symbols that are imported by any _core.* module and were defined in the modules under power_grid_model (e.g. power_grid_model.typing) to separate private files (e.g. power_grid_model._typing). Then, I imported those symbols back into the corresponding original public modules (e.g. typing) to preserve user exposition, as @mgovers suggested. Unfortunately, the package diagram that I get after this change does not seem to be better ordered than the one I used to get earlier. However, I guess that improvements in structure do not necessarily imply improvements in the esthetics of the plot. The way the diagram looks with this reorganization is:

Image

(Consider that some arrows, like those between enum and _enum, look weird because pyreverse is unable to distinguish between homonymous standard and non-standard library modules).

For a fair comparison, I attached a colorized version of the original diagram, too.

Image

Another possible structural change that I am considering would be to edit the way modules import from power_grid_module, to have them import directly from the modules where the symbols are actually defined (this would turn power_grid_module into a terminal node). For instance, is there a reason why, power_grid_module.utils imports CalculationMethod from power_grid_module rather than from power_grid_model.enum?

In addition, I could work on improving the way the diagram looks by acting on the layout options. See for instance this other layout.

Image

However, I guess that you are more interested in structure, rather than in appearance.

In short my questions are:

  • Is the update to import structure that I've done so far satisfactory? Or do you suggest other improvements?
  • In case you are satisfied, do you prefer to have the change pushed as early as possible? Or do you prefer me to test some other changes in the layout of the diagram?

In case anybody wants to see in detail the way I reorganized imports, my branch is visible here https://github.com/emabre/power-grid-model/tree/issue-870-2. Please consider that I did not run the pytest tests yet as I had problems with building the package locally (C++ issues, not Python related), but I don't think the imports would change much after I fix any leftover bugs (in case there are).

Thanks, Ema

emabre avatar Apr 06 '25 20:04 emabre

Hi @emabre ,

Great results on the investigation.

  • Is the update to import structure that I've done so far satisfactory? Or do you suggest other improvements?

Although it may not be directly apparent to you, significant progress has been made. See, for instance, how power_grid_model.errors is now much higher in the ranking. Overall, the light-blue modules are more to the lower half, and the dark-blue modules more to the upper half of the graph. I think the main reason that the difference is not extremely obvious is the following:

  • power_grid_model.errors imports from power_grid_model._errors
  • power_grid_model._core.* imports from power_grid_model._errors
  • Therefore, there is a dark-blue node (power_grid_model._errors) below a light-blue node (power_grid_model._core.*) below a dark-blue node (power_grid_model.errors), causing a mixture of the graph, and therefore less obvious things.
  • Instead, if you move power_grid_model._errors to power_grid_model._core.errors, it becomes a light-blue node. That resolves another bidirectional dependency between light-blue nodes and dark-blue nodes.
    • Since you already have the exposition-only module power_grid_model.errors, there is no user impact.
  • If you do that for all power_grid_model._* modules that are imported by some power_grid_model._core.* module, then light-blue nodes only depend on other light-blue nodes, while dark-blue nodes can depend on both light-blue and dark-blue ones.

Like you mentioned, I think structure is more important than appearance. The coloring of the nodes is probably the single most clear way to see the progress.

  • In case you are satisfied, do you prefer to have the change pushed as early as possible? Or do you prefer me to test some other changes in the layout of the diagram?

We follow a continuous improvement/live-at-head release cycle. The longer a branch lives, the more likely merge conflicts are, and the more annoying they can become to fix. Since the improvement you have done is already extremely valuable, I propose to merge when it's "good enough" (that is: light-blue nodes only depend on other light-blue nodes). If you are really excited about doing more improvements, feel free to do so afterwards, of course, but definitely don't feel too pressured by it. You have already improved the quality of the power grid model by a significant amount.

Hope that helps. Thank you and take care!

Martijn

mgovers avatar Apr 07 '25 07:04 mgovers

Hi @mgovers Thank you very much for your analysis and suggestions! Based on that, I will fix the remaining things so that the code is good enough and prepare a PR as soon as possible.

emabre avatar Apr 08 '25 06:04 emabre