gh-67041: Allow to distinguish between empty and not defined URI components
Changes in the urllib.parse module:
- Add option ~allow_none~ missing_as_none in
urlparse(),urlsplit()andurldefrag(). If it is true, represent not defined components as None instead of an empty string. - Add option keep_empty in
urlunparse()andurlunsplit(). If it is true, keep empty non-None components in the resulting string. By default it is the same as the allow_none value for the result of theurlparse()andurlsplit()calls. - ~Add option keep_empty in the
geturl()method ofDefragResult,SplitResult,ParseResultand the corresponding bytes counterparts.~
- Issue: gh-67041
It is now ready to review. The status of allow_none is now saved in the DefragResult, SplitResult and ParseResult objects, so in most cases there is no need to pass the keep_empty argument. geturl() no longer needs the keep_empty parameter.
Unfortunately, these objects now have __dict__ and no longer immutable. This is because non-empty __slots__ is not compatible with tuple subclasses. This is a separate complex issue. I'll try to find a solution of it, but it may be difficult.
The long term plan is to make allow_none True by default, and later deprecate allow_none=False. keep_empty=False can still be useful.
I am sorry, I forget to copy the _keep_empty attribute in copying/encoding/decoding methods. Now the PR is ready for review.
@orsenthil, @barneygale, could you please make a review?
I think that allow_none is not the correct name: this is not about allowing nones in the input (like how allow_fragments allows fragment in the input), but about controlling the data returned.
Possible alternatives:
-
use_none(short) -
missing_as_none -
use_none_for_empty
Maybe missing_as_empty? In future the behavior will be changed to use None for missing by default.
I think it should be missing_as_empty_string (short for return missing values as empty strings), which is long!
To change the default in the future, do you plan on adding a warning first? I could see many projects not paying attention to the new param and getting surprised (or silent bugs) a few years after.
To change the default in the future, do you plan on adding a warning first?
On one hand, a warning will inform everyone about the change (it should be a FutureWarning). You will have to pass explicit True or False to silence it and get your behavior. This how we normally do. On other hand, the warning will unnecessary disturb those who are fine with any behavior. We will discuss this when the time come.
I think it should be
missing_as_empty_string(short for return missing values as empty strings), which is long!
Currently, missing and empty component are not distinguishable. This parameter will allow to distinguish them. This PR adds also the keeps_empty parameter. So, "empty" refers not only to string, but to component. missing_component_as_empty_component, or simply missing_as_empty. Will it work?
Ah your’re right, there is empty component and empty string.
missing_as_none is good!
When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.