cgi icon indicating copy to clipboard operation
cgi copied to clipboard

Allow GCI.escapeHTML to take a custom escape table

Open byroot opened this issue 11 months ago • 0 comments

In some cases you may want to escape a string in a different way than the default behavior.

For instance, if you are trying to make some JSON save to include in a <script> tag, you may want to escape less, and using JavaScript codepoints:

>> CGI.escapeHTML('Hello </script>', ">" => '\u003e', "<" => '\u003c', "&" => '\u0026')
=> "Hello \\u003c/script\\u003e"

Of course you can always use gsub for that, but CGI.escapeHTML being specialized is able to be very significantly faster:

ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                gsub    82.135k i/100ms
          escapeHTML   221.405k i/100ms
Calculating -------------------------------------
                gsub    821.890k (± 2.2%) i/s    (1.22 μs/i) -      4.189M in   5.099152s
          escapeHTML      2.330M (± 0.5%) i/s  (429.22 ns/i) -     11.734M in   5.036770s

Comparison:
          escapeHTML:  2329816.5 i/s
                gsub:   821889.7 i/s - 2.83x  slower

ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                gsub    36.235k i/100ms
          escapeHTML   171.347k i/100ms
Calculating -------------------------------------
                gsub    359.528k (± 1.5%) i/s    (2.78 μs/i) -      1.812M in   5.040422s
          escapeHTML      1.812M (± 0.7%) i/s  (551.84 ns/i) -      9.081M in   5.011762s

Comparison:
          escapeHTML:  1812105.3 i/s
                gsub:   359527.5 i/s - 5.04x  slower

require "benchmark/ips"
require "cgi"

ESCAPE = {
  ">" => '\u003e', "<" => '\u003c', "&" => '\u0026',
}

ESCAPE_B = {
  ">".b => '\u003e'.b,
  "<".b => '\u003c'.b,
  "&".b => '\u0026'.b,
}
ESCAPE_REGEX = Regexp.union(ESCAPE_B.keys)

str = ("a" * 1024).freeze
Benchmark.ips do |x|
  x.report("gsub") do
    b = str.b
    b.gsub!(ESCAPE_REGEX, ESCAPE_B)
    b.force_encoding(str.encoding)
  end

  x.report("escapeHTML") do
    CGI.escapeHTML(str, ESCAPE)
  end

  x.compare!
end

str = (("a" * 1023) + "<").freeze
Benchmark.ips do |x|
  x.report("gsub") do
    b = str.b
    b.gsub!(ESCAPE_REGEX, ESCAPE_B)
    b.force_encoding(str.encoding)
  end

  x.report("escapeHTML") do
    CGI.escapeHTML(str, ESCAPE)
  end

  x.compare!
end

NB: I haven't implemented the Java version, but can do it if there is interest in this feature.

byroot avatar Feb 26 '25 18:02 byroot