[ruby-core:117332] [Ruby master Feature#20394] Add an offset parameter to `String#to_i`
From:
"shan (Shannon Skipper) via ruby-core" <ruby-core@...>
Date:
2024-03-26 19:32:19 UTC
List:
ruby-core #117332
Issue #20394 has been updated by shan (Shannon Skipper).
Dan0042 (Daniel DeLorme) wrote in #note-4:
> It doesn't seem like String#getbyte is much faster than File#getbyte, and StringIO#getbyte is fastest of all.
I'm seeing a similar result to what you show above with YJIT disabled, but `str.getbyte(i)` seems to pull ahead substantially with YJIT enabled on macOS and Linux with both Ruby 3.3 and nightly.
```
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
Calculating -------------------------------------
fd.getbyte 114.407 (± 0.9%) i/s - 575.000 in 5.026157s
io.getbyte 148.602 (± 0.7%) i/s - 756.000 in 5.087645s
str.getbyte(i) 261.846 (± 0.8%) i/s - 1.310k in 5.003151s
Comparison:
str.getbyte(i): 261.8 i/s
io.getbyte: 148.6 i/s - 1.76x slower
fd.getbyte: 114.4 i/s - 2.29x slower
```
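The benchmark script itself is not reproduced in this message. As a rough sketch of the three access patterns being compared, assuming each variant simply walks every byte of the same payload (the payload and the method names below are hypothetical, not taken from the thread):

```ruby
require "stringio"
require "tempfile"

# Hypothetical payload; the real benchmark input is not shown in this message.
PAYLOAD = ("0123456789" * 100).b

# Sum all bytes by calling #getbyte on an IO-like object until it returns nil (EOF).
# Works for both File (fd.getbyte) and StringIO (io.getbyte).
def sum_io_bytes(io)
  sum = 0
  while (byte = io.getbyte)
    sum += byte
  end
  sum
end

# Sum all bytes by indexing straight into the String (str.getbyte(i)).
def sum_string_bytes(str)
  sum = 0
  i = 0
  size = str.bytesize
  while i < size
    sum += str.getbyte(i)
    i += 1
  end
  sum
end

# Write the payload to a temporary file, then read it back byte by byte.
def sum_file_bytes(data)
  Tempfile.create("getbyte") do |fd|
    fd.binmode
    fd.write(data)
    fd.rewind
    sum_io_bytes(fd)
  end
end
```

All three return the same sum; only the cost of fetching each byte differs, which is what the numbers above measure.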
----------------------------------------
Feature #20394: Add an offset parameter to `String#to_i`
https://bugs.ruby-lang.org/issues/20394#change-107476
* Author: byroot (Jean Boussier)
* Status: Open
----------------------------------------
### Context
I maintain the `redis-client` gem, and it comes with an optional swappable implementation in C that binds the `hiredis` C client, [which used to perform up to 5 times faster in some cases](https://github.com/redis-rb/redis-client/commit/9fabd57c6786a03fe0c6021eab5b181d9316d9d7).
I recently paired with @tenderlovemaking to try to close this gap, or even try to make the pure Ruby version faster, and we came up with several optimizations that now put both versions almost on par (assuming YJIT is enabled).
An important source of performance loss is that the Redis protocol is line-based, so parsing it in Ruby requires slicing a lot of small strings from the buffer. To give an example, here's how an Array with two Strings (`["foo", "plop"]`) is serialized in RESP3 (Redis protocol):
```
*2\r\n
$3\r\n
foo\r\n
$4\r\n
plop\r\n
```
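To make the framing concrete, here is a minimal sketch of an encoder that produces exactly the bytes above. `resp_encode_array` is a hypothetical helper written for illustration, not part of `redis-client`:

```ruby
# Hypothetical helper: encode an Array of Strings as a RESP array of bulk strings.
# Each element is prefixed with its byte length, and every line ends with CRLF.
def resp_encode_array(strings)
  out = +"*#{strings.size}\r\n"
  strings.each do |s|
    out << "$#{s.bytesize}\r\n" << s << "\r\n"
  end
  out
end
```

Every element costs the reader one length line (`$3`, `$4`) before its payload, so integer parsing happens at least once per value.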
From this you can understand that a big hotspot in the parser is essentially `Integer(gets)`.
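As an illustration of why that is a hotspot, a naive reader might look like the sketch below. It is not the gem's code: `read_resp_array` is a hypothetical name, and it only handles an array of bulk strings.

```ruby
require "stringio"

# Hypothetical naive reader for a RESP array of bulk strings.
# Each header line allocates the line itself plus a slice of it
# before Integer() ever runs.
def read_resp_array(io)
  header = io.gets.chomp                # e.g. "*2" (new String)
  raise ArgumentError, "expected array" unless header.start_with?("*")
  count = Integer(header[1..-1])        # another new String for the slice

  Array.new(count) do
    length_line = io.gets.chomp         # e.g. "$3" (new String)
    raise ArgumentError, "expected bulk string" unless length_line.start_with?("$")
    length = Integer(length_line[1..-1])
    value = io.read(length)             # the payload itself
    io.read(2)                          # skip the trailing CRLF
    value
  end
end
```

For the two-element example, three short-lived header strings and three slices are created just to obtain the integers 2, 3 and 4.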
With @tenderlovemaking we managed to get [a fairly significant perf boost](https://github.com/redis-rb/redis-client/commit/41b3abe94243d2598211d448c4e457a3585ff9d5#diff-a8b5ce23fb9396492f56bf0bd23090910918a488416cfb488cef8b5b34877328) by avoiding these string allocations using `String#getbyte` and [basically implementing a rudimentary `String#to_i(offset: )` in Ruby](https://github.com/redis-rb/redis-client/commit/41b3abe94243d2598211d448c4e457a3585ff9d5#diff-5f15c6483e788ee14f367f65fb951800d52341726f528bcddff1e2cd3e62cab9R105-R115).
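The general shape of that technique can be sketched as follows. This is an illustration written for this message, not the code from the linked commit; `parse_int_at` and its return convention are assumptions:

```ruby
# Hypothetical helper: parse a decimal integer starting at byte `offset`
# without slicing a substring out of the buffer.
# Returns [value, offset_of_first_byte_after_the_digits].
def parse_int_at(buffer, offset)
  negative = false
  if buffer.getbyte(offset) == 45 # "-"
    negative = true
    offset += 1
  end

  value = 0
  # getbyte returns nil past the end of the buffer, which ends the loop.
  while (byte = buffer.getbyte(offset))
    digit = byte - 48 # "0".ord
    break if digit < 0 || digit > 9
    value = value * 10 + digit
    offset += 1
  end

  [negative ? -value : value, offset]
end
```

For example, `parse_int_at("*2\r\n$3\r\nfoo\r\n", 5)` reads the `3` that follows `$` and stops at the `\r`.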
But while the gains are huge with YJIT enabled, they are much more tame with the interpreter. And it feels a bit wrong to have to implement this sort of thing for performance reasons.
### `String#to_i(offset: )`
Similar to `String#unpack(offset:)` ([Feature #18254]), I believe `String#to_i(offset: )` would be useful.
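Since `String#to_i(offset:)` does not exist today, the closest equivalent allocates a temporary substring first. A sketch of that baseline, assuming the proposed offset would be counted in bytes (`to_i_from` is a hypothetical helper, not a proposed name):

```ruby
# Hypothetical stand-in for the proposed String#to_i(offset:).
# Assumes a byte offset; allocates an intermediate String,
# which is precisely what the proposal aims to avoid.
def to_i_from(str, offset)
  str.byteslice(offset, str.bytesize - offset).to_i
end
```

With a RESP length line such as `"$12\r\n"`, `to_i_from(line, 1)` skips the `$` and `String#to_i` stops at the first non-digit.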
### Alternative new `String#unpack` format
Another possibility would be to add a new format to `Array#pack` / `String#unpack` for decimal numbers. It sounds a bit weird at first, but given it supports things like Base64 and hexadecimal, perhaps it's not that much of a stretch?
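For reference, these are the existing text-oriented directives alluded to above (`m` for Base64, `H` for hex). The constant names are only for illustration:

```ruby
# Base64: "m0" encodes without line feeds and decodes strictly.
BASE64_TEXT = ["foo"].pack("m0")
BASE64_BACK = BASE64_TEXT.unpack1("m0")

# Hexadecimal, high nibble first: "H*" consumes the whole string.
HEX_TEXT = "foo".unpack1("H*")
HEX_BACK = [HEX_TEXT].pack("H*")
```

A decimal directive would sit alongside these as another text-to-value conversion. No such directive exists at the moment, so none is shown.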