Filter invalid hosts early in wp_http_validate_url()
Trac ticket: https://core.trac.wordpress.org/ticket/64457
Adds early hostname validation using the Filter extension when available, while falling back to the existing behavior when it’s not. Includes a test case for underscore hostnames.
Props manhphucofficial, westonruter, sirlouen.
Thanks for the review!
I’ve updated the patch to address all the points raised:
- hostname validation now only applies when the host is not an IPv4 address
- removed the FILTER_VALIDATE_IP check and related constant assumptions
- added test coverage for underscores in hostnames
Please let me know if anything should be adjusted further. Appreciate you taking a look!
Thanks for the feedback!
I’ve updated the patch to address all the points:
- switched the check to extension_loaded( 'filter' ) as suggested
- kept IPv4 handling separate to avoid affecting IP-based hosts
- added a test case for a valid IP host (https://1.1.1.1/)
Happy to adjust further if there’s anything else you’d like me to refine.
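For readers following along, the guard described above might look roughly like the sketch below. This is not the code from the patch: the function name sketch_host_passes_early_check() is hypothetical, and ip2long() is only a stand-in for however wp_http_validate_url() actually detects IPv4 hosts.

```php
<?php
/**
 * Hypothetical helper, not the PR's code: returns false only when the
 * Filter extension is loaded and rejects a non-IPv4 host.
 */
function sketch_host_passes_early_check( string $host ): bool {
	// Without the Filter extension, fall back to the existing behavior.
	if ( ! extension_loaded( 'filter' ) ) {
		return true;
	}

	// Assumption: ip2long() stands in for core's own IPv4 detection.
	// IPv4 hosts are handled separately, so the hostname check is skipped.
	if ( false !== ip2long( $host ) ) {
		return true;
	}

	// Reject only when the filter reports an actual validation failure.
	return false !== filter_var( $host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME );
}
```

With the Filter extension loaded, a host such as 1.1.1.1 bypasses the hostname check, while a host containing an underscore is rejected by FILTER_FLAG_HOSTNAME.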
@SirLouen what do you think?
I asked Gemini to review the changes and it had some helpful feedback:
I have completed the review of the changes.
The changes introduce stricter validation for hostnames in wp_http_validate_url() using filter_var() with FILTER_VALIDATE_DOMAIN, which is a good improvement. However, I identified a critical regression regarding IPv6 support and a minor edge case with numeric hostnames.
Review Findings
- Critical Issue: IPv6 Support Regression
  - Observation: The new validation block runs when $is_ipv4 is false. If the URL contains an IPv6 literal (e.g., http://[::1]/), $is_ipv4 will be false. The filter_var( '[::1]', FILTER_VALIDATE_DOMAIN, ... ) call returns false for bracketed IPv6 addresses, causing wp_http_validate_url to return false immediately.
  - Impact: Valid IPv6 URLs will be rejected.
  - Recommendation: The check should be skipped if the host appears to be an IPv6 literal. Since parse_url preserves brackets for IPv6 hosts, checking if $host starts with [ would be sufficient (e.g., && strpos( $host, '[' ) === false).
- Minor Issue: Handling of "0" Hostname
  - Observation: filter_var( '0', FILTER_VALIDATE_DOMAIN, ... ) returns the string "0". In PHP, ! "0" evaluates to true. This causes the check ! filter_var(...) to pass (evaluating as "invalid") for the hostname "0", returning false.
  - Recommendation: Use strict comparison === false to ensure only actual validation failures trigger the early return.
  - Code: && false === filter_var( ... )
- Tests
  - The added test case underscore_in_hostname correctly asserts that underscores are now invalid in domain names, which aligns with the FILTER_FLAG_HOSTNAME behavior.
  - Suggestion: It would be beneficial to add a test case for an IPv6 literal (e.g., http://[::1]/) to ensure this functionality is preserved and to prevent future regressions.
- Code Style & Compatibility
  - The code adheres to WordPress coding standards (indentation, spacing).
  - PHP 7.2 compatibility is maintained (FILTER_VALIDATE_DOMAIN is available since 7.0).
Summary
The logic improvement is sound but needs to account for IPv6 literals to avoid breaking support for them. I recommend adjusting the condition to exclude IPv6 hosts and using strict comparison for the filter_var result.
I will not modify the code myself but I present these findings for the user to act upon.
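To make the two recommendations concrete, here is a minimal sketch of a check that skips bracketed IPv6 literals and compares the filter_var() result strictly. The function name sketch_hostname_is_rejected() is hypothetical and the surrounding wp_http_validate_url() logic is omitted, so treat this as an illustration rather than the patch itself.

```php
<?php
/**
 * Hypothetical helper, not the PR's code: true means the host would
 * trigger the early "invalid" return.
 */
function sketch_hostname_is_rejected( string $host ): bool {
	// Bracketed IPv6 literals such as "[::1]" are not domain names; skip them.
	if ( '' !== $host && '[' === $host[0] ) {
		return false;
	}

	// Strict comparison: a valid host of "0" comes back as the falsy
	// string "0", so a loose ! filter_var( ... ) would wrongly reject it.
	return false === filter_var( $host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME );
}
```

Here the host [::1] is skipped and the host 0 is accepted, while a host with an underscore is still rejected.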
@manhphuc check some additional suggestions in the Core Trac thread.
@westonruter I suspected that adoption of the filter extension is at 100% by now, and that has been confirmed.
We can still be conservative, but I believe it's time to update the Core docs and simply add filter to the list of mandatory extensions (and still no one will notice anything).
@manhphuc check the core ticket, especially because the particular case you used for the unit test seems to be problematic. You can add a bunch of extra unit tests, as I commented in the reply. I think we can move this forward.
Thanks everyone for the detailed feedback and edge-case examples.
I’ve updated the hostname validation logic to avoid regressing legacy hosts that include underscores in subdomains (e.g. Blogspot), while still rejecting underscores in the registrable domain / TLD.
The implementation now:
- Skips FILTER_VALIDATE_DOMAIN for IPv6 literals
- Allows underscores in subdomains, but not in the registrable domain portion
- Preserves existing behavior for valid legacy hosts
I’ve also added unit tests covering the cases discussed in the Trac thread:
- h_ttp://example.org (invalid)
- https://hey_ho_lets_go._example.org (invalid)
- https://omg.c_om (invalid)
- https://peter_is_amazing.example.org (valid)
All HTTP-related PHPUnit tests are passing locally.
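As a rough illustration of the underscore rule described above (not the code in the patch), the sketch below treats the last two labels of the host as the registrable domain. That is a simplifying assumption: a real registrable-domain check would need something like the Public Suffix List. The function name sketch_host_allows_subdomain_underscores() is hypothetical, and the sketch also assumes underscores are acceptable anywhere within a subdomain label.

```php
<?php
/**
 * Hypothetical helper, not the PR's code: allows underscores in
 * subdomain labels but not in the last two labels of the host.
 */
function sketch_host_allows_subdomain_underscores( string $host ): bool {
	$labels = explode( '.', $host );

	// Assumption: the last two labels approximate the registrable domain.
	$registrable = implode( '.', array_slice( $labels, -2 ) );
	if ( false !== strpos( $registrable, '_' ) ) {
		return false;
	}

	// Any remaining underscores sit in subdomain labels; swap them for a
	// letter so FILTER_FLAG_HOSTNAME validates the rest of the host.
	$normalized = str_replace( '_', 'x', $host );

	return false !== filter_var( $normalized, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME );
}
```

The host-based cases from the list above behave as described under this sketch: peter_is_amazing.example.org passes, while hey_ho_lets_go._example.org and omg.c_om are rejected. The h_ttp://example.org case concerns the URL scheme rather than the host, so it is outside what this helper covers.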