[#105882] [Ruby master Bug#18280] Segmentation Fault in rb_utf8_str_new_cstr(NULL) — "ukolovda (Dmitry Ukolov)" <noreply@...>

Issue #18280 has been reported by ukolovda (Dmitry Ukolov).

13 messages 2021/11/01

[#105897] [Ruby master Bug#18282] Rails CI raises Segmentation fault with ruby 3.1.0dev supporting `Class#descendants` — "yahonda (Yasuo Honda)" <noreply@...>

Issue #18282 has been reported by yahonda (Yasuo Honda).

12 messages 2021/11/02

[#105909] [Ruby master Misc#18285] NoMethodError#message uses a lot of CPU/is really expensive to call — "ivoanjo (Ivo Anjo)" <noreply@...>

Issue #18285 has been reported by ivoanjo (Ivo Anjo).

37 messages 2021/11/02

[#105920] [Ruby master Bug#18286] Universal arm64/x86_84 binary built on an x86_64 machine segfaults/is killed on arm64 — "ccaviness (Clay Caviness)" <noreply@...>

Issue #18286 has been reported by ccaviness (Clay Caviness).

16 messages 2021/11/03

[#105928] [Ruby master Feature#18287] Support nil value for sort in Dir.glob — "Strech (Sergey Fedorov)" <noreply@...>

Issue #18287 has been reported by Strech (Sergey Fedorov).

16 messages 2021/11/04

[#105944] [Ruby master Bug#18289] Enumerable#to_a should delegate keyword arguments to #each — "Ethan (Ethan -)" <noreply@...>

Issue #18289 has been reported by Ethan (Ethan -).

8 messages 2021/11/05

[#105967] [Ruby master Bug#18293] Time.at in master branch was 25% slower then Ruby 3.0 — "watson1978 (Shizuo Fujita)" <noreply@...>

Issue #18293 has been reported by watson1978 (Shizuo Fujita).

17 messages 2021/11/08

[#106008] [Ruby master Bug#18296] Custom exception formatting should override `Exception#full_message`. — "ioquatix (Samuel Williams)" <noreply@...>

Issue #18296 has been reported by ioquatix (Samuel Williams).

14 messages 2021/11/10

[#106033] [Ruby master Bug#18330] Make failure on 32-bit Linux (Android) with Clang due to implicit 64-to-32-bit integer truncation — "xtkoba (Tee KOBAYASHI)" <noreply@...>

Issue #18330 has been reported by xtkoba (Tee KOBAYASHI).

10 messages 2021/11/11

[#106053] [Ruby master Misc#18335] openindiana ruby 3.1 preview needs --disable-dtrace — "stes (David Stes)" <noreply@...>

Issue #18335 has been reported by stes (David Stes).

14 messages 2021/11/14

[#106069] [Ruby master Feature#18339] GVL instrumentation API — "byroot (Jean Boussier)" <noreply@...>

Issue #18339 has been reported by byroot (Jean Boussier).

13 messages 2021/11/15

[#106145] [Ruby master Misc#18346] DevelopersMeeting20211209Japan — "mame (Yusuke Endoh)" <noreply@...>

Issue #18346 has been reported by mame (Yusuke Endoh).

11 messages 2021/11/18

[#106173] [Ruby master Feature#18349] Let --jit enable YJIT — "k0kubun (Takashi Kokubun)" <noreply@...>

Issue #18349 has been reported by k0kubun (Takashi Kokubun).

8 messages 2021/11/19

[#106175] [Ruby master Feature#18351] Support anonymous rest and keyword rest argument forwarding — "jeremyevans0 (Jeremy Evans)" <noreply@...>

Issue #18351 has been reported by jeremyevans0 (Jeremy Evans).

10 messages 2021/11/19

[#106279] [Ruby master Feature#18364] Add GC.stat_size_pool for Variable Width Allocation — "peterzhu2118 (Peter Zhu)" <noreply@...>

Issue #18364 has been reported by peterzhu2118 (Peter Zhu).

14 messages 2021/11/25

[#106308] [Ruby master Feature#18367] Stop the interpreter from escaping error messages — "mame (Yusuke Endoh)" <noreply@...>

Issue #18367 has been reported by mame (Yusuke Endoh).

13 messages 2021/11/29

[#106314] [Ruby master Feature#18368] Range#step semantics for non-Numeric ranges — "zverok (Victor Shepelev)" <noreply@...>

Issue #18368 has been reported by zverok (Victor Shepelev).

39 messages 2021/11/29

[#106341] [Ruby master Bug#18369] users.detect(:name, "Dorian") as shorthand for users.detect { |user| user.name == "Dorian" } — dorianmariefr <noreply@...>

Issue #18369 has been reported by dorianmariefr (Dorian Mari辿).

14 messages 2021/11/30

[#106347] [Ruby master Feature#18370] Call Exception#full_message to print exceptions reaching the top-level — "Eregon (Benoit Daloze)" <noreply@...>

Issue #18370 has been reported by Eregon (Benoit Daloze).

10 messages 2021/11/30

[ruby-core:105998] [Ruby master Feature#18020] Introduce `IO::Buffer` for fiber scheduler.

From: "mame (Yusuke Endoh)" <noreply@...>
Date: 2021-11-10 08:06:27 UTC
List: ruby-core #105998
Issue #18020 has been updated by mame (Yusuke Endoh).


The change caused SEGV on Solaris.

http://rubyci.s3.amazonaws.com/solaris10-gcc/ruby-master/log/20211110T070003Z.fail.html.gz
```
Thread 8 (Thread 84 (LWP 84)):
#0  0xfef2054c in __systemcall6 () from /lib/libc.so.1
#1  0xfef05d18 in __lwp_sigmask () from /lib/libc.so.1
#2  0xfef0f6d4 in call_user_handler () from /lib/libc.so.1
#3  <signal handler called>
#4  0xfef1ef94 in _waitid () from /lib/libc.so.1
#5  0xfeebfa48 in _waitpid () from /lib/libc.so.1
#6  0xfef0e920 in waitpid () from /lib/libc.so.1
#7  0xfef0163c in system () from /lib/libc.so.1
#8  0x0025f854 in rb_vm_bugreport (ctx=ctx@entry=0x5f18880) at vm_dump.c:1016
#9  0x0002ba9c in rb_bug_for_fatal_signal (default_sighandler=0x0, sig=sig@entry=11, ctx=ctx@entry=0x5f18880, fmt=0x347590 "Segmentation fault at %p") at error.c:820
#10 0x001ae5dc in sigsegv (sig=11, info=0x5f18b38, ctx=0x5f18880) at signal.c:964
#11 <signal handler called>
#12 0x001f5534 in rb_fd_set (n=<optimized out>, fds=fds@entry=0xf87cf6fc) at thread.c:4019
#13 0x00079ae0 in nogvl_wait_for (events=1, fptr=0x4c2de78, th=<optimized out>) at io.c:11294
#14 internal_read_func (ptr=ptr@entry=0xf87cf840) at io.c:1096
#15 0x001fa4a0 in rb_thread_io_blocking_region (func=0x799d4 <internal_read_func>, data1=data1@entry=0xf87cf840, fd=12) at thread.c:1824
#16 0x0007dc04 in rb_read_internal (count=8192, buf=0x5671ce0, fptr=0x4c2de78) at io.c:1160
#17 io_fillbuf (fptr=0x4c2de78) at io.c:2352
#18 0x00083318 in rb_io_getline_fast (chomp=0, enc=0x460658, fptr=0x4c2de78) at io.c:3616
#19 rb_io_getline_0 (rs=rs@entry=4274416760, limit=limit@entry=-1, chomp=chomp@entry=0, fptr=fptr@entry=0x4c2de78) at io.c:3731
#20 0x000836fc in rb_io_getline_1 (rs=4274416760, limit=-1, chomp=0, io=4113323920) at io.c:3827
#21 0x0008384c in rb_io_getline (io=4113323920, argv=0xf87cfe70, argc=0) at io.c:3847
#22 rb_io_gets_m (argc=argc@entry=0, argv=argv@entry=0xf87cfe70, io=io@entry=4113323920) at io.c:3902
#23 0x00231d58 in ractor_safe_call_cfunc_m1 (recv=4113323920, argc=0, argv=0xf87cfe70, func=0x837f8 <rb_io_gets_m>) at vm_insnhelper.c:2835
#24 0x0023a080 in vm_call_cfunc_with_frame (ec=0x5359674, reg_cfp=0xf884fe10, calling=<optimized out>) at vm_insnhelper.c:3025
#25 0x0023e16c in vm_sendish (ec=0x5359674, reg_cfp=0xf884fe10, cd=0x2b80030, block_handler=<optimized out>, method_explorer=mexp_search_method) at vm_insnhelper.c:4651
#26 0x0024e63c in vm_exec_core (ec=0x5359674, initial=4294967292) at insns.def:777
#27 0x00245f44 in rb_vm_exec (ec=<optimized out>, mjit_enable_p=<optimized out>) at vm.c:2196
#28 0x002486fc in rb_vm_invoke_proc (ec=0x5359674, proc=proc@entry=0x6251d38, argc=argc@entry=0, argv=argv@entry=0xf87cfde0, kw_splat=0, passed_block_handler=passed_block_handler@entry=0) at vm.c:1519
#29 0x001f74b0 in thread_do_start_proc (th=th@entry=0x45c8c08) at thread.c:735
#30 0x001f930c in thread_do_start (th=0x45c8c08) at thread.c:754
#31 thread_start_func_2 (th=th@entry=0x45c8c08, stack_start=0xf884ff60) at thread.c:828
#32 0x001f94f4 in thread_start_func_1 (th_ptr=<optimized out>) at thread_pthread.c:1047
#33 0xfef1afa0 in _lwp_start () from /lib/libc.so.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
```

The commit not only introduces IO::Buffer but also includes many changes against IO internal. It heavily uses `rb_io_t *` instead of a file descriptor, but accessing `rb_io_t *` without GVL requires special care, and IO internal heavily uses "nogvl" code.

----------------------------------------
Feature #18020: Introduce `IO::Buffer` for fiber scheduler.
https://bugs.ruby-lang.org/issues/18020#change-94544

* Author: ioquatix (Samuel Williams)
* Status: Closed
* Priority: Normal
----------------------------------------
After continuing to build out the fiber scheduler interface and the specific hooks required for `io_uring`, I found some trouble within the implementation of `IO`.

I found that in some cases, we need to read into the internal IO buffers directly. I tried creating a "fake string" in order to transit back into the Ruby fiber scheduler interface and this did work to a certain extent, but I was told we cannot expose fake string to Ruby scheduler interface.

So, after this, and many other frustrations with using `String` as a IO buffer, I decided to implement a low level `IO::Buffer` based on my needs for high performance IO, and as part of the fiber scheduler interface.

Here is roughly the interface implemented by the scheduler w.r.t. the buffer:

```ruby
class Scheduler
  # @parameter buffer [IO::Buffer] Buffer for reading into.
  def io_read(io, buffer, length)
    # implementation provided by `read` system call, IO_URING_READV, etc.
  end

  # @parameter buffer [IO::Buffer] Buffer for writing from.
  def io_write(io, buffer, length)
    # implementation provided by `write` system call, IO_URING_WRITEV, etc.
  end

  # Potential new hooks (Socket#recvmsg, sendmsg, etc):
  def io_recvmsg(io, buffer, length)
  end
end
```

In reviewing other language designs, I found that this design is very similar to Crystal's IO buffering strategy.

The proposed implementation provides enough of an interface to implement both native schedulers as well as pure Ruby schedulers. It also provides some extra functionality for interpreting the data in the buffer. This is mostly for testing and experimentation, although it might make sense to expose this interface for binary protocols like HTTP/2, QUIC, WebSockets, etc.

## Proposed Solution

We introduce new class `IO::Buffer`.

```ruby
class IO::Buffer
  # @returns [IO::Buffer] A buffer with the contents of the string data.
  def self.for(string)
  end

  PAGE_SIZE = # ... operating system page size

  # @returns [IO::Buffer] A buffer with the contents of the file mapped to memory.
  def self.map(file)
  end

  # Flags for buffer state.
  EXTERNAL = # The buffer is from external memory.
  INTERNAL = # The buffer is from internal memory (malloc).
  MAPPED = # The buffer is from mapped memory (mmap, VirtualAlloc, etc)
  LOCKED = # The buffer is locked for usage (cannot be resized)
  PRIVATE = # The buffer is mapped as copy-on-write.
  IMMUTABLE = # The buffer cannot be modified.

  # @returns [IO::Buffer] A buffer with the specified size, allocated according to the given flags.
  def initialize(size, flags)
  end

  # @returns [Integral] The size of the buffer
  attr :size

  # @returns [String] A brief summary and hex dump of the buffer.
  def inspect
  end

  # @returns [String] A brief summary of the buffer.
  def to_s
  end

  # Flag predicates:
  def external?
  end

  def internal?
  end

  def mapped?
  end

  def locked?
  end

  def immutable?
  end

  # Flags for endian/byte order:
  LITTLE_ENDIAN = # ...
  BIG_ENDIAN = # ...
  HOST_ENDIAN = # ...
  NETWORK_ENDIAN= # ...

  # Lock the buffer (prevent resize, unmap, changes to base and size).
  def lock
    raise "Already locked!" if flags & LOCKED
    
    flags |= LOCKED
  end

  # Unlock the buffer.
  def unlock
    raise "Not locked!" unless flags & LOCKED
    
    flags |= ~LOCKED
  end

  // Manipulation:
  # @returns [IO::Buffer] A slice of the buffer's data. Does not copy.
  def slice(offset, length)
  end

  # @returns [String] A binary string starting at offset, length bytes.
  def to_str(offset, length)
  end

  # Copy the specified string into the buffer at the given offset.
  def copy(string, offset)
  end

  # Compare two buffers.
  def <=>(other)
  end

  include Comparable

  # Resize the buffer, preserving the given length (if non-zero).
  def resize(size, preserve = 0)
  end

  # Clear the buffer to the specified value.
  def clear(value = 0, offset = 0, length = (@size - offset))
  end

  # Data Types:
  # Lower case: little endian.
  # Upper case: big endian (network endian).
  #
  # :U8        | unsigned 8-bit integer.
  # :S8        | signed 8-bit integer.
  #
  # :u16, :U16 | unsigned 16-bit integer.
  # :s16, :S16 | signed 16-bit integer.
  #
  # :u32, :U32 | unsigned 32-bit integer.
  # :s32, :S32 | signed 32-bit integer.
  #
  # :u64, :U64 | unsigned 64-bit integer.
  # :s64, :S64 | signed 64-bit integer.
  #
  # :f32, :F32 | 32-bit floating point number.
  # :f64, :F64 | 64-bit floating point number.

  # Get the given data type at the specified offset.
  def get(type, offset)
  end

  # Set the given value as the specified data type at the specified offset.
  def set(type, offset, value)
  end
end
```

The C interface provides a few convenient methods for accessing the underlying data buffer:

```c
void rb_io_buffer_get_mutable(VALUE self, void **base, size_t *size);
void rb_io_buffer_get_immutable(VALUE self, const void **base, size_t *size);
```

In the fiber scheduler, it is used like this:

```c
VALUE
rb_fiber_scheduler_io_read_memory(VALUE scheduler, VALUE io, void *base, size_t size, size_t length)
{
    VALUE buffer = rb_io_buffer_new(base, size, RB_IO_BUFFER_LOCKED);

    VALUE result = rb_fiber_scheduler_io_read(scheduler, io, buffer, length);

    rb_io_buffer_free(buffer);

    return result;
}
```

This function is invoked from `io.c` at various places to fill the buffer. We specifically the `(base, size)` tuple, along with `length` which is the *minimum* length required and assists with efficient non-blocking implementation.

The `uring.c` implementation in the event gem uses this interface like so:

```c
VALUE Event_Backend_URing_io_read(VALUE self, VALUE fiber, VALUE io, VALUE buffer, VALUE _length) {
	struct Event_Backend_URing *data = NULL;
	TypedData_Get_Struct(self, struct Event_Backend_URing, &Event_Backend_URing_Type, data);
	
	int descriptor = RB_NUM2INT(rb_funcall(io, id_fileno, 0));
	
	void *base;
	size_t size;
	rb_io_buffer_get_mutable(buffer, &base, &size);
	
	size_t offset = 0;
	size_t length = NUM2SIZET(_length);
	
	while (length > 0) {
		size_t maximum_size = size - offset;
		int result = io_read(data, fiber, descriptor, (char*)base+offset, maximum_size);
		
		if (result == 0) {
			break;
		} else if (result > 0) {
			offset += result;
			if ((size_t)result > length) break;
			length -= result;
		} else if (-result == EAGAIN || -result == EWOULDBLOCK) {
			Event_Backend_URing_io_wait(self, fiber, io, RB_INT2NUM(READABLE));
		} else {
			rb_syserr_fail(-result, strerror(-result));
		}
	}
	
	return SIZET2NUM(offset);
}
```

## Buffer Allocation

The Linux kernel provides some advanced mechanisms for registering buffers for asynchronous I/O to reduce per-operation overhead.

> The io_uring_register() system call registers user buffers or files for use in an io_uring(7) instance referenced by fd. Registering files or user buffers allows the kernel to take long term references to internal data structures or create long term mappings of application memory, greatly reducing per-I/O overhead.

With appropriate support, we can use `IORING_OP_PROVIDE_BUFFERS` to efficiently manage buffers in applications which are dealing with lots of sockets. See <https://lore.kernel.org/io-uring/20200228203053.25023-1-axboe@kernel.dk/T/> for more details about how it works. I'm still exploring the performance implications of this, but the proposed implementation provides sufficient meta-data for us to explore this in real world schedulers.

PR: https://github.com/ruby/ruby/pull/4621



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread