[ruby-core:107709] [Ruby master Feature#18598] Add String#bytesplice

From: "Eregon (Benoit Daloze)" <noreply@...>
Date: 2022-02-22 10:48:43 UTC
List: ruby-core #107709
Issue #18598 has been updated by Eregon (Benoit Daloze).


Shouldn't a text editor use a rope representation for Strings instead? ( https://en.wikipedia.org/wiki/Rope_(data_structure) )
This sounds very inefficient, because bytesplice will need to copy everything after the insertion point whenever `inserted_bytes.length != length`.
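To make the rope suggestion concrete, here is a minimal sketch. `RopeLeaf`, `RopeNode` and `rope_splice` are hypothetical names invented for illustration, not a Ruby core or gem API; the sketch does no rebalancing and no character-boundary checks. It shows why a splice need not shift the rest of the text: only leaves that contain a split point are sliced, and every other subtree is shared.

```ruby
# Hypothetical rope sketch: leaves hold String chunks, internal nodes
# concatenate two sub-ropes. No rebalancing, no boundary checks.
class RopeLeaf
  attr_reader :bytesize

  def initialize(str)
    @str = str
    @bytesize = str.bytesize
  end

  # Split at byte offset +at+; only this leaf's bytes are sliced.
  def split(at)
    [RopeLeaf.new(@str.byteslice(0, at)),
     RopeLeaf.new(@str.byteslice(at, @bytesize - at))]
  end

  def to_s
    @str.dup
  end
end

class RopeNode
  attr_reader :bytesize

  def initialize(left, right)
    @left = left
    @right = right
    @bytesize = left.bytesize + right.bytesize
  end

  # Descend into only one child; the other child is shared, not copied.
  def split(at)
    if at <= @left.bytesize
      a, b = @left.split(at)
      [a, RopeNode.new(b, @right)]
    else
      a, b = @right.split(at - @left.bytesize)
      [RopeNode.new(@left, a), b]
    end
  end

  def to_s
    @left.to_s + @right.to_s
  end
end

# Replace +length+ bytes at byte offset +index+ with +str+, returning a new
# rope. Only leaves containing a split point are sliced.
def rope_splice(rope, index, length, str)
  raise IndexError, "index out of rope" unless (0..rope.bytesize).cover?(index)
  head, rest = rope.split(index)
  length = [length, rest.bytesize].min # clamp at the end of the rope
  _removed, tail = rest.split(length)
  RopeNode.new(RopeNode.new(head, RopeLeaf.new(str)), tail)
end
```

Repeated edits make this tree deeper and fragment it into small leaves; a production rope would rebalance and merge leaves.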

This is more of a personal opinion, but I have always found `splice` arguments and semantics confusing, including in JavaScript.
`[]=` at least makes it much clearer, whereas `s.bytesplice(2, 3, "x")` reads like a C API to me.
If we do add this, I would suggest adding only the Range version, for simplicity.

I think byteindex & byteoffset in #13110 had good motivation: Ruby internally needs byte offsets anyway, so exposing them to the user seemed relatively harmless, and, as you showed, doing without them required very complex hacks.
But here I question the need, because the code before bytesplice, i.e., the code before https://github.com/shugo/textbringer/pull/31/files, seems reasonable enough.
It's also a very specific use case; I would like to see other use cases before we add a core method to String.

There are also other ways to solve this. Semantically, I think you want a byte array/buffer that can be displayed as text and searched:
* Use UTF-32LE/UTF-32BE to get constant-time character indexing, so that `[]=` works fine
* Can the String be kept as Encoding::BINARY all the time? Why does it need to be UTF-8? Could it just be reencoded to UTF-8 in the few places that really need it?
* Do not use String at all; use, e.g., an Array of byte values or a C extension
* Use Ropes or a similar structure implemented in Ruby, which would avoid extra copying and might not need byte offsets at all
* Add some way to have a "cursor object" in a String that knows both the byte index and the character index and has its own methods. That would be much more general and could improve performance in far more cases (e.g., such a cursor could also be yielded by some `each_char_with_cursor` method). It's probably too tricky to implement correctly when the String is mutable, though.
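The first bullet can be illustrated with a short sketch. In a fixed-width encoding such as UTF-32LE, character index `i` always starts at byte offset `4 * i`, so character-indexed `String#[]=` lines up with byte offsets. The variable names are illustrative only.

```ruby
# Sketch of the UTF-32 idea: every character is exactly 4 bytes, so
# character index i starts at byte offset 4 * i.
utf8 = "あいうえお"
buf = utf8.encode(Encoding::UTF_32LE)

# Replace the 2nd and 3rd characters (bytes 4...12) with "xy".
buf[1, 2] = "xy".encode(Encoding::UTF_32LE)

# Character 1 now occupies bytes 4...8.
second = buf.byteslice(4, 4).encode(Encoding::UTF_8)

# Convert back to UTF-8 only where the text is really needed.
result = buf.encode(Encoding::UTF_8)
```

The trade-off is memory: every character takes 4 bytes, so mostly-ASCII text grows about fourfold compared with UTF-8.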

----------------------------------------
Feature #18598: Add String#bytesplice
https://bugs.ruby-lang.org/issues/18598#change-96630

* Author: shugo (Shugo Maeda)
* Status: Open
* Priority: Normal
----------------------------------------
I withdrew the proposal of String#bytesplice in #13110 because it may cause problems if the specified offset does not land on a character boundary.
But how about raising IndexError in such cases?

```
# encoding: utf-8

s = "あいうえおかきくけこ"
s.bytesplice(9, 6, "xx")
p s #=> "あいうxxかきくけこ"
s.bytesplice(2, 3, "x") #=> offset 2 does not land on character boundary (IndexError)
s.bytesplice(3, 4, "x") #=> offset 7 does not land on character boundary (IndexError)
```

## Pull request

https://github.com/ruby/ruby/pull/5584

## Spec

```
bytesplice(index, length, str) -> string
bytesplice(range, str)         -> string
```

Replaces some or all of the content of +self+ with +str+, and returns +str+.
The portion of the string affected is determined using the same criteria as String#byteslice, except that +length+ cannot be omitted.
If the replacement string is not the same length as the text it is replacing, the string will be adjusted accordingly.
The form that takes an Integer will raise an IndexError if the value is out of range; the Range form will raise a RangeError.
If the beginning or ending offset does not land on a character (codepoint) boundary, an IndexError will be raised.
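A pure-Ruby approximation of the Integer form helps pin down the boundary rule. `bytesplice_sketch` is a hypothetical helper, not the implementation in the pull request. It assumes the receiver holds valid UTF-8, where a byte prefix is valid exactly when it ends on a character boundary, and it treats a negative length as an IndexError, which is also an assumption.

```ruby
# Hypothetical approximation of bytesplice(index, length, str).
# Assumes +s+ is valid UTF-8 (so a valid prefix ends on a boundary).
def bytesplice_sketch(s, index, length, str)
  index += s.bytesize if index < 0 # negative index counts from the end
  raise IndexError, "index out of string" if index < 0 || index > s.bytesize
  raise IndexError, "negative length #{length}" if length < 0 # assumption
  stop = [index + length, s.bytesize].min # clamp, as byteslice does

  # Check both the beginning and the ending offset.
  [index, stop].each do |offset|
    next if s.byteslice(0, offset).valid_encoding?
    raise IndexError, "offset #{offset} does not land on character boundary"
  end

  tail = s.byteslice(stop, s.bytesize - stop)
  s.replace(s.byteslice(0, index) + str + tail)
  str # the spec above says +str+ is returned
end
```

Checking `valid_encoding?` on a prefix costs O(n) per call; it keeps the sketch short rather than fast.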

## Motivation

In the text editor [Textbringer](https://github.com/shugo/textbringer/pull/31/files), the content of a buffer is represented by a String whose encoding is ASCII-8BIT, and `force_encoding(Encoding::UTF_8)` is called when necessary.
This is because point (cursor position) and marks are represented by byte offsets for performance, and currently there is no way to modify UTF-8 strings using byte offsets.
If String#bytesplice is introduced, the content of a text buffer can be represented by a UTF-8 string, and force_encoding can be removed: https://github.com/shugo/textbringer/pull/31/files
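The current workaround can be sketched roughly as follows. `BinaryBuffer`, `goto_byte` and `insert` are hypothetical names, not Textbringer's actual API: the content is kept as ASCII-8BIT so `String#[]=` indexes by byte, and it is relabeled as UTF-8 with `force_encoding` only when text is needed.

```ruby
# Hypothetical buffer: binary content so that [] and []= use byte offsets.
class BinaryBuffer
  attr_reader :point # cursor position as a byte offset

  def initialize(text)
    @content = text.b # ASCII-8BIT copy of the text
    @point = 0
  end

  def goto_byte(offset)
    @point = offset
  end

  # Insert at point; indices are bytes because @content is binary.
  def insert(str)
    bytes = str.b
    @content[@point, 0] = bytes
    @point += bytes.bytesize
  end

  # Relabel a copy as UTF-8 when the text itself is needed.
  def to_s
    @content.dup.force_encoding(Encoding::UTF_8)
  end
end
```

With bytesplice, `@content` could stay UTF-8 and `insert` could call it directly with byte offsets instead of relabeling the encoding.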




-- 
https://bugs.ruby-lang.org/
