parseurl icon indicating copy to clipboard operation
parseurl copied to clipboard

feat: implement custom URL parser to replace deprecated url.parse

Open jiji-hoon96 opened this issue 4 months ago • 7 comments

Node.js v24 has deprecated url.parse(), causing deprecation warnings. This is related to security issues (hostname spoofing) and will be removed in future versions.

This PR replaces the deprecated url.parse() with the WHATWG URL API (new URL()) as recommended by Node.js, avoiding custom regex implementations for better reliability and maintainability.

Changes

  • Replaced deprecated url.parse() with native WHATWG URL API (new URL())
  • Implemented parseUrlString() wrapper function using native URL constructor
  • Added relative URL handling with base URL (http://localhost) for compatibility
  • Fixed hash fragment handling in fast-path parsing for Node.js 0.8 compatibility
  • Maintained existing behavior with whitespace trimming
  • All existing tests pass (20 tests)
  • ESLint code style compliance

Performance

Benchmark results show performance is maintained or improved compared to the original:

Test Case parseurl (new) url.parse() (legacy) WHATWG URL Performance vs Legacy
Full URL 1,181,856 ops/sec 650,146 ops/sec 1,611,026 ops/sec 1.8x faster
Path + Query 9,291,785 ops/sec 2,558,726 ops/sec 925,643 ops/sec 3.6x faster
Same Request (cached) 20,029,556 ops/sec 3,020,408 ops/sec 998,215 ops/sec 6.6x faster
Simple Path 68,538,878 ops/sec 7,547,222 ops/sec 1,249,746 ops/sec 9.1x faster
Root Path (/) 170,181,125 ops/sec 10,888,984 ops/sec 1,719,375 ops/sec 15.6x faster

jiji-hoon96 avatar Oct 13 '25 05:10 jiji-hoon96

Hey @jiji-hoon96! Thanks for investing time and effort in helping the project! :heart:

You are right the url.parse() seems deprecated:

var url = require('url')
var parse = url.parse 
console.log(parse('https://expressjs.com/en/4x/api.html#req.protocol'))
(node:4158863) [DEP0169] DeprecationWarning: `url.parse()` behavior is not standardized and prone to errors that have security implications. Use the WHATWG URL API instead. CVEs are not issued for `url.parse()` vulnerabilities.
(Use `node --trace-deprecation ...` to show where the warning was created)

I was wondering if you explored the option to use the URL constructor directly like:

var data = new URL('https://expressjs.com/en/4x/api.html#req.protocol');
console.log(data);

The output is quite similar, but as mentioned here (url.parse supports a few use cases that are not supported with new URL().) not sure about the final impact for us.

// url.parse()

Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'expressjs.com',
  port: null,
  hostname: 'expressjs.com',
  hash: '#req.protocol',
  search: null,
  query: null,
  pathname: '/en/4x/api.html',
  path: '/en/4x/api.html',
  href: 'https://expressjs.com/en/4x/api.html#req.protocol'
}
//new URL()
URL {
  href: 'https://expressjs.com/en/4x/api.html#req.protocol',
  origin: 'https://expressjs.com',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'expressjs.com',
  hostname: 'expressjs.com',
  port: '',
  pathname: '/en/4x/api.html',
  search: '',
  searchParams: URLSearchParams {},
  hash: '#req.protocol'
}

The regex you proposed in your parser seems to follow a linear complexity, but I prefer to avoid include custom implementations if we can use built-in utils or existing APIs.

wdyt @blakeembrey?

UlisesGascon avatar Oct 13 '25 09:10 UlisesGascon

Thanks for the suggestion! You're right - using the URL constructor is a better approach than a custom implementation. I'll update the code to use new URL() instead. 👍

jiji-hoon96 avatar Oct 13 '25 09:10 jiji-hoon96

ref: https://github.com/nodejs/web-server-frameworks/issues/71

bjohansebas avatar Oct 13 '25 21:10 bjohansebas

Given the constraints, I believe is the best available option because

  • It removes the deprecation warnings (the main issue)
  • It uses Node.js built-in APIs as you prefer

However, I'm open to any direction you'd prefer. Should we

  • Keep the current WHATWG URL approach?
  • Go back to the custom parser?
  • Wait for a potential TolerantURL API in Node.js core?

Let me know your thoughts! 🙏

jiji-hoon96 avatar Oct 14 '25 00:10 jiji-hoon96

Proposal: Modernizing parseurl by Embracing Native Node.js APIs

Hello everyone, I've been following the critical discussion around managing URL parsing in this package, particularly the need to address the deprecated url.parse method (as referenced in PR #44). I'd like to propose a solution that leverages modern Node.js features while preserving the package's existing API contract, making the transition seamless for its extensive user base. The Core Proposal: Plain Object URL Representation The key change is to shift the implementation internally to use the native URL class but have parseurl return a plain JavaScript object with properties matching the expected URL structure. This approach resolves our technical debt while maintaining stability:

  • Eliminates Deprecation Warnings: We stop using the deprecated url.parse method, resolving the primary technical issue.
  • Utilizes Built-in APIs: We lean on Node.js's optimized, modern URL constructor, which is generally preferred over custom parsing logic.
  • Retains Current API Structure: The returned value is explicitly not an instance of url.Url or global.URL, but a simple data object. This ensures consumers relying on direct property access will not break.

Implementation Preview

I have worked on a draft implementation of this change:

  • Proposed index.js Implementation:::https://github.com/raji-mahmud-a/parseurl/blob/master/index.js

Impact Analysis: Backward Compatibility is Key Given that Express.js accounts for about 90% of this package's weekly downloads, stability is paramount.

This update is designed to be fully backward compatible. Because the output retains the expected API structure as a plain object, existing consumers of the package should not require any modifications, ensuring a smooth transition for the entire ecosystem.

Critical CI/CD Pipeline Fixes

In parallel with the feature update, I've addressed critical maintenance issues in the existing CI/CD workflow (ci.yml) to ensure reliable testing on modern GitHub Actions runners:

Issue Recommended Fix
Critical Output Syntax Change all ::set-output commands to use the required $GITHUB_OUTPUT file format.
Coveralls Action Update to use the stable coverallsapp/github-action@v2 instead of the unstable @master tag.
Token Reference Correct the token reference in the coverage job to use the standard ${{ secrets.GITHUB_TOKEN }}.
Node Setup Simplification Replace the custom NVM/shell logic with the modern and recommended uses: actions/setup-node@v4.

CI/CD Fixes Implementation

  • Proposed ci.yml Workflow Fixes::: https://github.com/raji-mahmud-a/parseurl/blob/master/.github/workflows/ci.yml

I'll like to know what you think. Thanks.

raji-mahmud-a avatar Nov 23 '25 23:11 raji-mahmud-a

@UlisesGascon @bjohansebas

Thank you for the constructive feedback!

I have updated the PR to address the comments and fix the CI failures:

  1. Dropped Legacy Support: Removed Node.js 0.8, 0.10, and 0.12 from the CI matrix as suggested.
  2. Updated CI Workflow: Upgraded GitHub Actions (checkout, setup-node) to v4 and fixed the deprecated set-output syntax.
  3. Fixed Typo

The CI checks are now passing. Ready for review!

jiji-hoon96 avatar Nov 24 '25 01:11 jiji-hoon96

Nice 💜,

To remove the deprecated url.parse() method, I suggest removing all instances of url.Url or new URL() and converting it to a plain object and this makes it considerably faster since there is no object construction

raji-mahmud-a avatar Nov 24 '25 04:11 raji-mahmud-a