[#110736] Can't sign in to bugs.ruby-lang.org — Daniel Berger <djberg96@...>
Hi,
4 messages
2022/11/13
[ruby-core:111020] [Ruby master Feature#19102] Optimize ERB::Util.html_escape more than CGI.escapeHTML for template engines
From:
"Eregon (Benoit Daloze)" <noreply@...>
Date:
2022-11-26 13:03:14 UTC
List:
ruby-core #111020
Issue #19102 has been updated by Eregon (Benoit Daloze).
I think it is unfortunate to add a C extension for ERB for that, ERB was al=
ways pure-Ruby and that was nice.
Also the C extension is slower on TruffleRuby, the Regexp is actually JIT-c=
ompiled and can use vectorization, unlike that C code. Also part of it is R=
STRING_PTR() basically forces a copy from managed memory (byte[]) to native=
memory (char*) on TruffleRuby.
```
truffleruby 23.0.0-dev-57e53f8a, like ruby 3.1.2, GraalVM CE Native [x86_64=
-linux]
CGI.escapeHTML 31.985M (=B1 1.2%) i/s - 160.093M in 5.006001s
ERB::Util.html_escape 7.427M (=B1 3.3%) i/s - 37.162M in 5.009721s
```
and CRuby 3.1 is:
```
ERB::Util.html_escape 14.551M (=B1 0.8%) i/s - 73.308M in 5.038335s
CGI.escapeHTML 10.065M (=B1 0.6%) i/s - 51.054M in 5.072629s
```
Given those results, could you build the C extension only for CRuby?
I think it would also be much nicer to keep the optimized HTML escape in CG=
I which is in stdlib, so it can be used by all templates engines.
In #19090 I did not expect `rb_str_dup()` is so costly on CRuby, I guess th=
e allocation is slow and of course CRuby can't escape-analyze it.
A new method in CGI sounds best, or probably nicer an optional argument whe=
ther to always return a copy.
It seems tricky to change the existing method to return a mutable string as=
-is, I guess the person who found #11858 actually in that incompatibility.
Ruby users seem to assume in general if a core/stdlib method returns a stri=
ng either it's frozen or they are free to modify it, but here it would actu=
ally also mutate the original String passed to `CGI.escapeHTML`.
`String#to_s` really sounds like something every Ruby JIT/VM should be able=
to trivially optimize.
So that I think should be an insignificant cost, even on CRuby (and if it i=
sn't it should be easy to fix).
----------------------------------------
Feature #19102: Optimize ERB::Util.html_escape more than CGI.escapeHTML for=
template engines
https://bugs.ruby-lang.org/issues/19102#change-100275
* Author: k0kubun (Takashi Kokubun)
* Status: Closed
* Priority: Normal
----------------------------------------
## Proposal
Change the behavior of `ERB::Util.html_escape` in the following two parts:
1. Skip converting an argument with `#to_s` if the argument is already a `T=
_STRING`.
2. Do not allocate and return a new String when nothing needs to be escaped.
## Background
The current `ERB::Util.html_escape` is implemented as `CGI.escapeHTML(s.to_=
s)`. So the performance is almost equal to `CGI.escapeHTML` except for the =
`to_s` call. Because it's common to embed non-String expressions in templat=
e engines, a template engine typically calls `to_s` to convert non-String e=
xpressions to a String before escaping it, which is why the difference exis=
ts. Proposal (1) is useful for optimizing the case that something that's al=
ready a `String` is embedded. We ignore the extreme case that `String#to_s`=
is weirdly monkey-patched.
As to proposal (2), my original implementation of `CGI.escapeHTML` https://=
github.com/ruby/ruby/pull/1164 was not calling `rb_str_dup` for that case. =
However, because [Bug #11858] claimed returning the argument object for non=
-escaped cases is a backward incompatibility with the old `gsub`-based impl=
ementation, we added the unneeded `rb_str_dup` call and the performance for=
that case has been compromised. This behavior is completely unnecessary fo=
r template engines. On the other hand, because `ERB::Util.html_escape` is a=
helper for ERB, we should not need to consider any backward compatibility =
that is not relevant to ERB or any template engines. So proposal (2) should=
be possible in `ERB::Util.html_escape` unlike `CGI.escapeHTML`.
## Benchmark
Implementation: https://github.com/ruby/erb/pull/27
```rb
require 'benchmark/ips'
require 'erb'
class << ERB::Util
def html_escape_old(s)
CGI.escapeHTML(s.to_s)
end
end
Benchmark.ips do |x|
s =3D 'hello world'
x.report('before') { ERB::Util.html_escape_old(s) }
x.report('after') { ERB::Util.html_escape(s) }
x.compare!
end
```
```
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
before 1.066M i/100ms
after 1.879M i/100ms
Calculating -------------------------------------
before 10.615M (=B1 0.3%) i/s - 53.320M in 5.023083s
after 18.742M (=B1 0.4%) i/s - 93.929M in 5.011847s
Comparison:
after: 18741747.6 i/s
before: 10615137.1 i/s - 1.77x (=B1 0.00) slower
```
--=20
https://bugs.ruby-lang.org/
______________________________________________
ruby-core mailing list -- ruby-core@ml.ruby-lang.org
To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-c=
ore.ml.ruby-lang.org/