cpython Readline completion of module names in import statements

BPO 25419

Nosy @vadmium, @serhiy-storchaka, @MojoVampire

Dependencies
bpo-16182: readline: Wrong tab completion scope indices in Unicode terminals

Files
complete-import.patch
complete-import.v2.patch

BPO	25419
Nosy	@vadmium, @serhiy-storchaka, @MojoVampire
Dependencies	bpo-16182: readline: Wrong tab completion scope indices in Unicode terminals
Files	complete-import.patch complete-import.v2.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2015-10-16.11:16:48.120>
labels = ['type-feature', 'library']
title = 'Readline completion of module names in import statements'
updated_at = <Date 2016-05-25.08:03:01.876>
user = 'https://github.com/vadmium'

bugs.python.org fields:

activity = <Date 2016-05-25.08:03:01.876>
actor = 'martin.panter'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2015-10-16.11:16:48.120>
creator = 'martin.panter'
dependencies = ['16182']
files = ['40794', '42986']
hgrepos = []
issue_num = 25419
keywords = ['patch']
message_count = 6.0
messages = ['253072', '253095', '253104', '253115', '253132', '266320']
nosy_count = 3.0
nosy_names = ['martin.panter', 'serhiy.storchaka', 'josh.r']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue25419'
versions = ['Python 3.6']

Linked PRs

gh-129329
gh-133358
gh-134181
gh-134680
gh-138268
gh-138943
gh-139461
gh-143436
gh-143438

Oct 16 '15 11:10 vadmium

As mentioned in bpo-25209, here is a patch to complete module and attribute names in import statements. This is a rewrite of some personal code I have had kicking around for a few years, but I have added extra features (parsing “import x as alias”, supporting “from .relative import x”).

All the existing completions should continue to work as long as the statement immediately preceding the cursor does not look like an “import” statement. When an import statement is detected, these other completions are disabled.

When alternative completions are displayed, my code appends a dot (.) indicator to package names, but this is not inserted in the input line. Maybe people think this is a good idea or maybe not.

This illustrates what it looks like in action. Text in square brackets was automatically completed:

Python 3.6.0a0 (qbase qtip readline/complete-import.patch tip:65d2b721034a, Oct 16 2015, 10:02:3) 
[GCC 5.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import io, h<Tab><Tab>
hashlib  heapq    hmac     html.    http.    
>>> import io, htm[l].<Tab><Tab>
html.entities  html.parser    
>>> import io, html.e[ntities]
>>> from html import e<Tab><Tab>
entities  escape    
>>> from html import es[cape]

The code works around bpo-16182 (byte vs code point indexing) in a very simplistic way. It probably won’t give useful completions (if any) for non-ASCII input involving imports.

The patch currently does not complete built-in module names (sys.builtin_module_names). This should not be hard to add, but I need to decide whether the support should be added specially in rlcompleter, or generally in pkgutil.iter_modules() instead.

I also have code to complete the following, which I could try to make patches for if there is interest:

keyword argument names to function calls, e.g. open("file", enc[oding=]
attributes on constructor return types, e.g. ValueError("msg").__t[raceback__]

Oct 16 '15 11:10 vadmium

I'm reviewing the patch, but it will take a time.

Wouldn't be simpler to use regular expressions instead of tokenizer?

For now Completer doesn't depends on the readline module, nor other global state. It can be used for completing in other environment, for example to complete words in IDE. Patched Completer retrieves additional information from the readline module. This can break existing code. It would be nice to decouple Completer from readline. In contrary to user .pythonrc.py file, we are free to invent new interfaces. May be add methods to the Completer class that provides needed additional information (nothing by default), and add Completer's subclass ReadlineCompleter that implements these methods using readline?

Found presumable bugs:

"import sy" doesn't suggest completion "sys".

"import os.p" doesn't suggest completion "os.path".

Oct 16 '15 20:10 serhiy-storchaka

Is there a reason ipython's import completer couldn't be borrowed in its entirety? At work, I use a lightly adapted version of the code from ipython to do completion when I'm using the plain interactive interpreter (for whatever reason), and it works just fine.

Oct 16 '15 22:10 99ffcaa5-b43b-4e8e-a35e-9c890007b9cd

Josh: Thanks for pointing out I Python. I haven’t used it much myself, but it does seem to do a similar thing to my proposal. Looks like the relevant code may be around module_completion() at <https://github.com/ipython/ipython/blob/ab929fe/IPython/core/completerlib.py#L209>.

The “sys” module is one of the builtin modules that I mentioned above. I plan to discuss changing pkgutil.iter_modules() to include it, in a separate bug report.

The “os.path” case is more awkward. The “os” module is not actually a package. I believe “import os.path” only works because executing the “os” module modifies sys.modules. My code currently avoids importing non-packages, because I thought it would be annoying to accidentally run a script via tab completion (anything not protected by ‘if __name__ == "__main__" ’). On the other hand, I Python happily completes “os.path” (and many more non-submodule names). It also can be tricked into running scripts, e.g. if you do “import idlelib.__main__.” and press Tab. But maybe this is not a real problem, and I should stop being paranoid.

I tend to avoid regular expressions if practical. But Serhiy you may be right that some simple string matching rules would reduce the need for tokenizing. It looks like I Python only has a few simple rules for the entire input line being “import x” and “from x import y”. The disadvantage is less accurate understanding of more complicated syntax, like “from x import y; from z import (a, bb as b, . . .”. It is a tradeoff between simpler code that only supports basic functionality versus complex code that supports more complete functionality.

I hear your points about decoupling from Readline and backwards compatibility. I will consider the overall architecture more in a future update. It would be good to allow this stuff to be used in e.g. Idle (though I wouldn’t know where to wire it in myself).

Oct 17 '15 02:10 vadmium

Added few cursory comments on Rietveld.

"sys" and "os.path" cases are not critical and we can ignore them if it complicates the code. But preserving Completer independence from readline looks more important to me. We can just add methods get_line_buffer(), get_begidx(), get_endidx() and get_completion_type() to Completer, but may be better to add more high level methods.

Oct 17 '15 15:10 serhiy-storchaka

I moved all the calls targetting the readline module into a ReadlineCompleter subclass. However the logic for parsing “import” statements still exists in the base Completer class in private methods. An overview of the two classes:

class Completer:
    def complete(self, text, state):
        self._get_matches(text)
    def _get_matches(text):
        # Only completes global and object.attr names, like before
    def _code_matches(self, code, ...):
        # Completes import statements, otherwise returns (None, ...)

class ReadlineCompleter(Completer):  # New public class
    def complete(self, text, state):
        # Moved Yury’s Tab insertion logic here
        return super().complete(...)
    def _get_matches(text):
        code = readline.get_line_buffer()[:readline.get_endidx()]
        self._code_matches(code)
        super()._get_matches(text)  # Fallback to existing behaviour

Perhaps the _code_matches() and related methods could be turned into more general public APIs, e.g. complete_code(code) -> list of modules, attributes, globals, etc. But that would affect global_matches() and attr_matches(), which I have not touched so far.

Other changes:

Limit underscore-prefixed completions, consistent with bpo-25011; see new _filter_identifiers() method
Changed the demo in the documentation; attributes like __doc__ are omitted by default
Removed workaround for non-ASCII input in favour of fixing bpo-16182

May 25 '16 08:05 vadmium

I changed my mind. I thought it would be easier to implement this with simple regular expressions and string operations, but handling the corner cases makes it difficult. I now think that using the tokenize module is the only correct solution.

My apologies, @vadmium. Are you still interesting in continuing this work?

Mar 28 '24 19:03 serhiy-storchaka

I don’t want to spend much time on this, but I’m happy for other people to build on my code.

May 12 '24 05:05 vadmium