trio icon indicating copy to clipboard operation
trio copied to clipboard

Use less magic in constructing our public API exports, to make trio more intelligible to static analysis

Open ksamuel opened this issue 7 years ago • 25 comments

Despite the well defined __all__, several important tools (e.g: pylint, mypy, jedi) cannot understand the code structure of trio. They then report error in the user code.

E.G:

On the following snippet, pylint will report trio not having the run() member, jedi won't allow code completion, vscode intellisense won't let you go to definition and mypy won't report any error if I try to add +1 after run()

import trio
...
trio.run(parent)

I understand this is inconvenient, as those tools should be able to pick up the __all__ declaration and use that instead of requiring your to type it all again, but they don't. Since they are the most popular in the Python ecosystem, and are used by a lot of editors (vi, sublime text, vscode, etc) but also in tasks (tox, travis, git hooks, etc), it's important that they work.

For now, that means putting # noqa a bit everywhere or disabling a lot of warning, which is not practical.

ksamuel avatar May 30 '18 22:05 ksamuel

FYI, there has been some work on this in the past. Maybe there could be a program to read import modules, introspect and spit out a new set of sources that are more static?

python make-nice-for.py --tool=pylint --output-path=new-trio-sources/

parity3 avatar Jun 02 '18 15:06 parity3

Yeah, we should probably give up on most of this cleverness and switch to doing the more boring approach.

There was some discussion of this in the chat today also, starting about here: https://gitter.im/python-trio/general?at=5b1842e099fa7f4c064e93a0

This is probably best handled as an incremental set of cleanups, since there are a lot of different subtle bits of trickery here. "Does this cleanup help pylint/mypy/pycharm/jedi understand what's going on" is probably a good metric to use to decide where to start...

Also relevant: #475, which I think is a change we probably should do, and will reduce the number of auto-generated wrapper functions in trio/_core/_run.py.

njsmith avatar Jun 06 '18 21:06 njsmith

This comment describes how to write a strong test for pylint being able to understand our public API: https://github.com/python-trio/trio/pull/594#issuecomment-415336167

njsmith avatar Aug 23 '18 08:08 njsmith

Closed by #594 :tada:

Zac-HD avatar Sep 08 '18 09:09 Zac-HD

I like your enthusiasm, but I'm afraid we're not done yet :-). #594 made the main trio namespace work with tab completion, but we still need to figure out what to do with the submodules, and think about whether we can get rid of any of the other runtime namespace manipulation that we do...

njsmith avatar Sep 08 '18 09:09 njsmith

Haha, I actually just went and updated my "add type hints" branch (https://github.com/python-trio/trio/compare/master...Zac-HD:type-hints)... it's still mostly about ignoring analysis failures, so I was definitely too enthusiastic :rofl:

Zac-HD avatar Sep 09 '18 01:09 Zac-HD

So if I understand the status correctly: the main trio, trio.testing, trio._core namespaces are now amenable to static analysis. Meaning: common tools can look at them and (a) generate a list of exported symbols, for tab completion, and (b) figure out where those symbols came from, so deeper analysis is possible if the place they're coming from is itself amenable to static analysis.

There are a few more public namespaces to look at:

  • trio.hazmat – has some hacks to make the list of symbols visible, but there's no way this is intelligible to tools that need to do deeper analysis like mypy
  • trio.abc, trio.ssl – probably need the same treatment as trio/__init__.py
  • trio.socket – this is messy right now. Probably we should replace most of the dynamic __all__ generation inside trio/_socket.py with a static list import list in trio/socket.py, similar to what we did for trio/__init__.py. The special case is constants like AF_INET, which more-or-less have to be dynamic. Maybe: move the dynamic re-export loop out of trio/_socket.py and into trio/socket.py; inside trio/_socket.py refer to these constants as _stdlib_socket.AF_INET etc. so mypy knows what's going on within that file anyway; and in trio/socket.py statically list some of the key constants like AF_INET, AF_INET6, etc.?
  • trio/_core/*.py – the big issue is the dynamic code generation in trio/_core/_run.py. (Are there any other issues?) Implementing #475 would reduce the number of dynamically-generated functions, but not eliminate them. Maybe we should switch to static code generation? For example, we could have a script we run occasionally to generate a trio/_core/_autogenerated.py file that we check into git. (We might also want to do this for the methods on trio.socket._SocketType, since the current metaprogramming approach is confusing to both humans and tools, and also adds overhead to methods that get called a lot in inner loops.)

Does that sounds like the current status?

njsmith avatar Sep 10 '18 20:09 njsmith

but there's no way

No way what?

Fuyukai avatar Sep 10 '18 22:09 Fuyukai

Yep, that's my understanding too.

trio.socket – this is messy right now. Probably we should replace most of the dynamic __all__ generation inside trio/_socket.py with a static list import list in trio/socket.py, similar to what we did for trio/__init__.py.

Sounds good to me. We might also need to make the namespace construction less magical, but we need static imports first anyway and can see what actually has to change after that.

The special case is constants like AF_INET, which more-or-less have to be dynamic. Maybe: move the dynamic re-export loop out of trio/_socket.py and into trio/socket.py; inside trio/_socket.py refer to these constants as _stdlib_socket.AF_INET etc. so mypy knows what's going on within that file anyway; and in trio/socket.py statically list some of the key constants like AF_INET, AF_INET6, etc.?

This also sounds good, though I'd be very cautious about statically listing dynamic constants - it might be the best way currently available, but we should leave some detailed comments about what's going on.

Maybe we should switch to static code generation?

Yes, but only if we have CI always check that there is no difference between the current output of the generator and the file tracked by git. This catches both stale files when the generator has been updated, and contributors trying to update the files by hand - both recipes for bugs down the line!

Zac-HD avatar Sep 10 '18 23:09 Zac-HD

@fuyukai good catch, thanks, I edited the post to add the missing words

njsmith avatar Sep 11 '18 02:09 njsmith

Just dropping a quick update on my type-hints branch: moving in the right direction, but plenty left to do.
Note that Mypy is only part of the scope of this issue!

Current Mypy output
trio/_path.py:183: error: "Type[Path]" has no attribute "absolute"
trio/_path.py:187: error: "Type[_PathLike[Any]]" has no attribute "register"
trio/_highlevel_open_tcp_stream.py:4: error: Module 'trio.socket' has no attribute 'getaddrinfo'
trio/_highlevel_open_tcp_stream.py:4: error: Module 'trio.socket' has no attribute 'SOCK_STREAM'
trio/_highlevel_open_tcp_stream.py:4: error: Module 'trio.socket' has no attribute 'socket'
trio/_highlevel_open_unix_stream.py:3: error: Module 'trio.socket' has no attribute 'socket'
trio/_highlevel_open_unix_stream.py:3: error: Module 'trio.socket' has no attribute 'SOCK_STREAM'
trio/_highlevel_open_unix_stream.py:6: error: Module 'trio.socket' has no attribute 'AF_UNIX'
trio/__init__.py:18: error: Module 'trio._core' has no attribute 'current_time'
trio/__init__.py:98: error: Module has no attribute "__deprecated_attributes__"
trio/testing/__init__.py:1: error: Module 'trio._core' has no attribute 'wait_all_tasks_blocked'
trio/tests/test_util.py:90: error: Invalid base class
trio/tests/test_socket.py:359: error: Module has no attribute "AF_INET"
trio/tests/test_socket.py:360: error: Module has no attribute "AF_INET6"
trio/tests/test_path.py:183: error: "Type[Path]" has no attribute "joinpath"
trio/tests/test_highlevel_open_tcp_stream.py:6: error: Module 'trio.socket' has no attribute 'AF_INET'
trio/tests/test_highlevel_open_tcp_stream.py:6: error: Module 'trio.socket' has no attribute 'AF_INET6'
trio/tests/test_highlevel_open_tcp_stream.py:6: error: Module 'trio.socket' has no attribute 'SOCK_STREAM'
trio/tests/test_highlevel_open_tcp_stream.py:6: error: Module 'trio.socket' has no attribute 'IPPROTO_TCP'
trio/tests/test_highlevel_open_tcp_stream.py:101: error: Name 'trio.socket.SocketType' is not defined
trio/tests/test_highlevel_open_tcp_listeners.py:165: error: Name 'tsocket.SocketType' is not defined
trio/tests/module_with_deprecations.py:12: error: Module has no attribute "regular"
trio/_subprocess/unix_pipes.py:60: error: "memoryview" has no attribute "__enter__"; maybe "__iter__"?
trio/_subprocess/unix_pipes.py:60: error: "memoryview" has no attribute "__exit__"
trio/_subprocess/unix_pipes.py:78: error: Module has no attribute "wait_writable"
trio/_subprocess/unix_pipes.py:109: error: Module has no attribute "wait_readable"
trio/_subprocess/linux_waitpid.py:52: error: Module has no attribute "spawn_system_task"
trio/tests/test_highlevel_ssl_helpers.py:8: error: Module 'trio.socket' has no attribute 'AF_INET'
trio/tests/test_highlevel_ssl_helpers.py:8: error: Module 'trio.socket' has no attribute 'SOCK_STREAM'
trio/tests/test_highlevel_ssl_helpers.py:8: error: Module 'trio.socket' has no attribute 'IPPROTO_TCP'
  • the socket and _run dynamism discussed above are the biggest contributors by numbers
  • trip/_path.py also has some over-magical re-exporting and wrapping
  • I've fixed the memoryview issue upstream, so Mypy 6.30 will know it's a context manager

So once those things are sorted out, we can run Mypy in CI! Then:

  1. remove as many type: ignore comments as possible (forcing the Any type hides errors elsewhere)
  2. ship the py.typed file, so downstream code can be checked for consistency with Trio's type hints
  3. add non-trivial type hints to Trio; profit.

Zac-HD avatar Sep 11 '18 12:09 Zac-HD

Gah, misclick - sorry!

Zac-HD avatar Sep 11 '18 12:09 Zac-HD

@njsmith says: Staircase review: [#656] should have fixed tab completion for trio.testing and trio._core... it'd be nice to extend the jedi/pylint tests to make sure that stays true :-)

Extending the tests from #594 to cover the changes in #656 would be a nice, self-contained contribution.

Zac-HD avatar Sep 13 '18 06:09 Zac-HD

Opened #699 to discuss the sub-issue of how to handle socket module constants.

njsmith avatar Sep 29 '18 10:09 njsmith

Updated todo list:

  • Need to replace __all__ hacking by explicit import list:
    • [x] trio/abc.py (#751)
    • [x] trio/hazmat.py (#753)
    • [x] trio/ssl.py (#752)
  • Need to rework constant re-export to make it statically analyzable (see #699, and esp. https://github.com/python-trio/trio/issues/699#issuecomment-427544894):
    • [x] trio/socket.py (#730)
    • [x] trio/ssl.py (#752)
  • [x] Need to add some code generation machinery for @_public methods on Runner
  • [ ] ~~Need to replace the @_public methods on IOManager with a current_io_manager fetcher (see #475)~~

I think that's it for module-level exports. Then there are some classes that use dynamic shenanigans on the attribute level:

  • [ ] trio._ssl.SSLStream: uses __getattr__/__setattr__ to directly forward a bunch of attributes to self._ssl_object. Replace with... lots of manually-written forwarding methods and/or @propertys? Or use code generation to write them? Use a .pyi file?
  • [ ] trio._socket._SocketType: uses __getattr__ to forward several attributes to self._sock. Also uses more elaborate metaprogramming to generate most of the major methods (_make_simple_sock_method_wrapper, _nonblocking_helper...). These should probably be replaced by code generation for efficiency as well as analyzability – the abstraction layers show up in runtime profiling. The SocketType / _SocketType distinction will also be fun for mypy, and all the if hasattr(stdlib_socket.socket, ...). This will need a dedicated issue to discuss.
  • [ ] The async path/file wrappers are the most dynamic thing in trio, and going to be super annoying to deal with. Mypy plugin? (Our open is even worse than the builtin open, and mypy needs a plugin to handle the builtin open.) This will need a dedicated issue to discuss.

njsmith avatar Oct 07 '18 00:10 njsmith

Hi I think I at least nominally covered the list. So if someone would like to take a look they are #751 #752 and #753 The tests seem to go fine, but we have to keep track of the symbols twice, once in the try/except import and second on the dynamic import. But since they are in one module next to each other that might be acceptable. We might add a comment for this.

jmfrank63 avatar Oct 28 '18 10:10 jmfrank63

#805 has landed! I think this means all our top-level stuff is now statically understandable, and all that's left are a few classes with dynamic attributes.

Can any of our static analysis experts here give us an update on what you think the next step should be?

njsmith avatar Jun 23 '19 01:06 njsmith

@njsmith Just browsed again through the thread, but could not make out any task left. _abc.py seems to have been converted already as well as _ssl.py. Python is a great library, I would like to clean up if anything is left here. Otherwise I think I am turning to the docs and try to write some meaningful examples, looks like there are plenty of docs issues out there. I am not strong at this currently but I guess I'll grow taking on this. If you could point me to something that is either burning or considered a good issue I'd be glad.

jmfrank63 avatar Oct 27 '19 14:10 jmfrank63

@jmfrank63 I think the next steps on static analysis stuff are:

  • Getting mypy running on trio (and then incrementally adding type annotations). Current status is in #1254

  • Using static code generation to replace the dynamic magic in _socket.py

  • Using static code generation to replace the dynamic magic in trio.Path and trio.wrap_file

njsmith avatar Nov 01 '19 22:11 njsmith

Just leaving a note here because I was trying to figure out why the urwid tests wouldn't run. The upshot is an attribute error: AttributeError: module 'trio' has no attribute 'hazmat' breaks the tests with 0.18.0 but 0.17.0 works fine.

sarnold avatar Feb 07 '21 23:02 sarnold

@sarnold I don't think that's related to the static analysis that this thread is discussion -- it's just that trio.hazmat got renamed to trio.lowlevel back in v0.15, and in v0.18 we removed the backcompat support. v0.15/v0.16/v0.17 gave a deprecation warning everytime you accessed trio.hazmat, so I guess the urwid devs didn't notice the warning? Running tests with -Werror or similar can help with that.

njsmith avatar Feb 09 '21 04:02 njsmith

(And there's even a urwid PR: https://github.com/urwid/urwid/pull/439)

pquentin avatar Feb 09 '21 07:02 pquentin

The Trio loop was contributed to urwid by me; I guess that the urwid devs don't use Trio regularly. I have sent a PR months ago but it was not merged at the time of writing.

ntamas avatar Feb 09 '21 11:02 ntamas

I've just run into the distinction between SocketType and _SocketType - there seems to be no way to use it as a type annotation so that PyCharm is happy.

Is the distinction just to prevent people constructing SocketType()? Would something like this be acceptable:

class _SocketType(SocketType):
    def __init__(self, sock, _internal=False):
        if not _internal:
            raise TypeError(...)

Of course this makes it easier for someone to do what they're not meant to, but hopefully it's still clear that that's 'off label' usage and not to be relied on.

takluyver avatar Jun 15 '21 19:06 takluyver

Is the distinction just to prevent people constructing SocketType()? Would something like this be acceptable:

No, we actually have a more standard solution for that. IIRC the weird SocketType/_SocketType contortions were to support our built-in network "monkeypatching" system: https://trio.readthedocs.io/en/stable/reference-testing.html#trio.abc.SocketFactory

So the goals were:

  • Let people define their own fake "socket classes"
  • Make them behaviorally match the real socket class, so e.g. isinstance(fake_socket, trio.socket.SocketType) should return True
  • Don't let people inherit implementation from our real socket class, because that's both unhelpful for the intended testing use cases, and also asking for backcompat problems.
  • Don't force people to implement the full socket API in their fake socket classes, because it's extremely large and complex and your average test only needs to fake like 10% of it. (This is why I didn't make SocketType an ABC with the whole API defined on it. Well, that + it's annoying to maintain two copies of all those method signatures.)

On further thought, maybe an alternative would be:

  • Rename _SocketType to SocketType, so it's the real class, using NoPublicConstructor to prevent actual subclassing and keep the constructor private
  • Tell people to use ABCMeta.register on SocketType to fake isinstance into working without inheriting any implementation.

I guess this would need a deprecation b/c it would break any existing code that's doing class MyFakeSocket(trio.socket.SocketType), but the fix is trivial and I doubt there are many people doing that, so it could probably be a pretty quick transition.

njsmith avatar Jun 16 '21 01:06 njsmith