wordpress-develop

Early filter invalid hosts in wp_http_validate_url()

Open manhphuc opened this issue 1 month ago • 10 comments

Trac ticket: https://core.trac.wordpress.org/ticket/64457

Adds early hostname validation using the Filter extension when available, while falling back to the existing behavior when it’s not. Includes a test case for underscore hostnames.

manhphuc avatar Dec 30 '25 04:12 manhphuc

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props manhphucofficial, westonruter, sirlouen.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

github-actions[bot] avatar Dec 30 '25 04:12 github-actions[bot]

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance, it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

github-actions[bot] avatar Dec 30 '25 04:12 github-actions[bot]

Thanks for the review!

I’ve updated the patch to address all the points raised:

  • hostname validation now only applies when the host is not an IPv4 address
  • removed the FILTER_VALIDATE_IP check and related constant assumptions
  • added test coverage for underscores in hostnames

Please let me know if anything should be adjusted further. Appreciate you taking a look!

manhphuc avatar Dec 30 '25 05:12 manhphuc

Thanks for the feedback!

I’ve updated the patch to address all the points:

  • switched the check to extension_loaded( 'filter' ) as suggested
  • kept IPv4 handling separate to avoid affecting IP-based hosts
  • added a test case for a valid IP host (https://1.1.1.1/)

Happy to adjust further if there’s anything else you’d like me to refine.
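
For readers following along, the shape described in the bullets above might look roughly like the sketch below. It is not the code from the patch: the function name `sketch_early_host_check()` and the simplified IPv4 regular expression are assumptions made for illustration, and bracketed IPv6 literals are outside its scope.

```php
<?php
/**
 * Hypothetical helper, not the code from the PR.
 * Returns false only when the Filter extension is loaded and rejects the host.
 */
function sketch_early_host_check( string $host ): bool {
	// Stand-in IPv4 test (assumption): four dot-separated groups of 1 to 3 digits.
	$is_ipv4 = 1 === preg_match( '/^\d{1,3}(\.\d{1,3}){3}$/', $host );

	if ( $is_ipv4 || ! extension_loaded( 'filter' ) ) {
		// IP-based hosts and installs without the extension keep the existing behavior.
		return true;
	}

	// Hostname validation: labels may contain only letters, digits, and hyphens.
	return false !== filter_var( $host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME );
}

var_dump( sketch_early_host_check( '1.1.1.1' ) );      // IPv4 host, check skipped.
var_dump( sketch_early_host_check( 'example.org' ) );  // Ordinary hostname.
var_dump( sketch_early_host_check( 'exa_mple.org' ) ); // Underscore in a label.
```

With the Filter extension loaded, the first two calls should report `true` and the last one `false`; without it, all three fall through to `true` so the existing checks decide.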

manhphuc avatar Dec 30 '25 07:12 manhphuc

@SirLouen what do you think?

westonruter avatar Dec 30 '25 19:12 westonruter

I asked Gemini to review the changes and it had some helpful feedback:


I have completed the review of the changes.

The changes introduce stricter validation for hostnames in wp_http_validate_url() using filter_var() with FILTER_VALIDATE_DOMAIN, which is a good improvement. However, I identified a critical regression regarding IPv6 support and a minor edge case with numeric hostnames.

Review Findings

  1. Critical Issue: IPv6 Support Regression

    • Observation: The new validation block runs when $is_ipv4 is false. If the URL contains an IPv6 literal (e.g., http://[::1]/), $is_ipv4 will be false. The filter_var( '[::1]', FILTER_VALIDATE_DOMAIN, ... ) call returns false for bracketed IPv6 addresses, causing wp_http_validate_url to return false immediately.
    • Impact: Valid IPv6 URLs will be rejected.
    • Recommendation: The check should be skipped if the host appears to be an IPv6 literal. Since parse_url preserves brackets for IPv6 hosts, checking if $host starts with [ would be sufficient (e.g., && strpos( $host, '[' ) === false).
  2. Minor Issue: Handling of "0" Hostname

    • Observation: filter_var( '0', FILTER_VALIDATE_DOMAIN, ... ) returns the string "0". In PHP, ! "0" evaluates to true. This causes the check ! filter_var(...) to pass (evaluating as "invalid") for the hostname "0", returning false.
    • Recommendation: Use strict comparison === false to ensure only actual validation failures trigger the early return.
    • Code: && false === filter_var( ... )
  3. Tests

    • The added test case underscore_in_hostname correctly asserts that underscores are now invalid in domain names, which aligns with the FILTER_FLAG_HOSTNAME behavior.
    • Suggestion: It would be beneficial to add a test case for an IPv6 literal (e.g., http://[::1]/) to ensure this functionality is preserved and to prevent future regressions.
  4. Code Style & Compatibility

    • The code adheres to WordPress coding standards (indentation, spacing).
    • PHP 7.2 compatibility is maintained (FILTER_VALIDATE_DOMAIN is available since 7.0).

Summary

The logic improvement is sound but needs to account for IPv6 literals to avoid breaking support for them. I recommend adjusting the condition to exclude IPv6 hosts and using strict comparison for the filter_var result.

I will not modify the code myself but I present these findings for the user to act upon.
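
The two findings above can be reproduced with a few lines of plain PHP. This is only a sketch for checking the behavior locally, assuming the Filter extension is loaded; `sketch_is_rejected_host()` is a hypothetical name and not part of the patch.

```php
<?php
// Finding 1: a bracketed IPv6 literal is not a valid hostname for the filter.
var_dump( filter_var( '[::1]', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME ) ); // bool(false)

// Finding 2: the valid host "0" comes back as the falsy string "0".
$result = filter_var( '0', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME );
var_dump( $result );           // string(1) "0"
var_dump( ! $result );         // bool(true): loose negation misreads it as a failure.
var_dump( false === $result ); // bool(false): strict comparison does not.

/**
 * Hypothetical guard combining both recommendations:
 * skip bracketed IPv6 literals and compare strictly against false.
 */
function sketch_is_rejected_host( string $host ): bool {
	$is_ipv6_literal = '' !== $host && '[' === $host[0];

	return ! $is_ipv6_literal
		&& false === filter_var( $host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME );
}
```

Under those assumptions, `sketch_is_rejected_host( '[::1]' )` and `sketch_is_rejected_host( '0' )` both return `false` (not rejected), while a host such as `exa_mple.org` is still rejected.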

westonruter avatar Jan 01 '26 22:01 westonruter

@manhphuc check some additional suggestions in the Core Trac thread.

SirLouen avatar Jan 02 '26 00:01 SirLouen

@westonruter I suspected that adoption of the filter extension is at 100% by now, and that has been confirmed.

We can still be conservative, but I believe it's time to update the Core docs and simply add filter to the list of mandatory extensions (and still no one will notice anything).

SirLouen avatar Jan 05 '26 01:01 SirLouen

@manhphuc check the core ticket, especially because the particular case you used for the unit test seems to be problematic. You can add a bunch of extra unit tests, as I commented in the reply. I think we can move this forward.

SirLouen avatar Jan 05 '26 01:01 SirLouen

Thanks everyone for the detailed feedback and edge-case examples.

I’ve updated the hostname validation logic to avoid regressing legacy hosts that include underscores in subdomains (e.g. Blogspot), while still rejecting underscores in the registrable domain / TLD.

The implementation now:

  • Skips FILTER_VALIDATE_DOMAIN for IPv6 literals
  • Allows underscores in subdomains, but not in the registrable domain portion
  • Preserves existing behavior for valid legacy hosts

I’ve also added unit tests covering the cases discussed in the Trac thread:

  • h_ttp://example.org (invalid)
  • https://hey_ho_lets_go._example.org (invalid)
  • https://omg.c_om (invalid)
  • https://peter_is_amazing.example.org (valid)

All HTTP-related PHPUnit tests are passing locally.
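
As a rough illustration of the underscore rule described above (not the actual implementation in the patch), one way to express it is a per-label check. The helper name `sketch_host_labels_are_valid()` is hypothetical, and treating the last two labels as the registrable domain is an assumption made for brevity; an accurate registrable-domain check would need something like the Public Suffix List. The `h_ttp://example.org` case concerns the scheme rather than the host, so it is not covered here.

```php
<?php
/**
 * Hypothetical helper (not from the PR): underscores are tolerated in
 * subdomain labels but not in the last two labels of the host.
 */
function sketch_host_labels_are_valid( string $host ): bool {
	$strict  = '/^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/i';  // Letters, digits, hyphens.
	$relaxed = '/^[a-z0-9]([a-z0-9_-]*[a-z0-9])?$/i'; // Also allows inner underscores.

	$labels = explode( '.', $host );
	$count  = count( $labels );

	foreach ( $labels as $index => $label ) {
		// Assumption: the final two labels approximate the registrable domain.
		$pattern = ( $index < $count - 2 ) ? $relaxed : $strict;

		if ( 1 !== preg_match( $pattern, $label ) ) {
			return false;
		}
	}

	return true;
}

$hosts = array( 'hey_ho_lets_go._example.org', 'omg.c_om', 'peter_is_amazing.example.org' );

foreach ( $hosts as $host ) {
	printf( "%s: %s\n", $host, sketch_host_labels_are_valid( $host ) ? 'valid' : 'invalid' );
}
```

Run against the three hosts from the list, this sketch reports the first two as invalid and the last as valid, matching the expectations stated in the comment.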

manhphuc avatar Jan 05 '26 07:01 manhphuc