cpython icon indicating copy to clipboard operation
cpython copied to clipboard

gh-67041: Allow to distinguish between empty and not defined URI components

Open serhiy-storchaka opened this issue 1 year ago • 9 comments

Changes in the urllib.parse module:

  • Add option ~allow_none~ missing_as_none in urlparse(), urlsplit() and urldefrag(). If it is true, represent not defined components as None instead of an empty string.
  • Add option keep_empty in urlunparse() and urlunsplit(). If it is true, keep empty non-None components in the resulting string. By default it is the same as the allow_none value for the result of the urlparse() and urlsplit() calls.
  • ~Add option keep_empty in the geturl() method of DefragResult, SplitResult, ParseResult and the corresponding bytes counterparts.~
  • Issue: gh-67041

serhiy-storchaka avatar Aug 25 '24 07:08 serhiy-storchaka

It is now ready to review. The status of allow_none is now saved in the DefragResult, SplitResult and ParseResult objects, so in most cases there is no need to pass the keep_empty argument. geturl() no longer needs the keep_empty parameter.

Unfortunately, these objects now have __dict__ and no longer immutable. This is because non-empty __slots__ is not compatible with tuple subclasses. This is a separate complex issue. I'll try to find a solution of it, but it may be difficult.

The long term plan is to make allow_none True by default, and later deprecate allow_none=False. keep_empty=False can still be useful.

serhiy-storchaka avatar Nov 27 '24 11:11 serhiy-storchaka

I am sorry, I forget to copy the _keep_empty attribute in copying/encoding/decoding methods. Now the PR is ready for review.

@orsenthil, @barneygale, could you please make a review?

serhiy-storchaka avatar Dec 05 '24 11:12 serhiy-storchaka

I think that allow_none is not the correct name: this is not about allowing nones in the input (like how allow_fragments allows fragment in the input), but about controlling the data returned.

Possible alternatives:

  • use_none (short)
  • missing_as_none
  • use_none_for_empty

merwok avatar Nov 17 '25 02:11 merwok

Maybe missing_as_empty? In future the behavior will be changed to use None for missing by default.

serhiy-storchaka avatar Nov 17 '25 09:11 serhiy-storchaka

I think it should be missing_as_empty_string (short for return missing values as empty strings), which is long!

To change the default in the future, do you plan on adding a warning first? I could see many projects not paying attention to the new param and getting surprised (or silent bugs) a few years after.

merwok avatar Nov 17 '25 14:11 merwok

To change the default in the future, do you plan on adding a warning first?

On one hand, a warning will inform everyone about the change (it should be a FutureWarning). You will have to pass explicit True or False to silence it and get your behavior. This how we normally do. On other hand, the warning will unnecessary disturb those who are fine with any behavior. We will discuss this when the time come.

serhiy-storchaka avatar Nov 17 '25 14:11 serhiy-storchaka

I think it should be missing_as_empty_string (short for return missing values as empty strings), which is long!

Currently, missing and empty component are not distinguishable. This parameter will allow to distinguish them. This PR adds also the keeps_empty parameter. So, "empty" refers not only to string, but to component. missing_component_as_empty_component, or simply missing_as_empty. Will it work?

serhiy-storchaka avatar Nov 17 '25 14:11 serhiy-storchaka

Ah your’re right, there is empty component and empty string.

missing_as_none is good!

merwok avatar Nov 17 '25 19:11 merwok

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

bedevere-app[bot] avatar Nov 19 '25 03:11 bedevere-app[bot]