[ruby-core:109564] [Ruby master Feature#18965] Further Thread::Queue improvements
From:
"byroot (Jean Boussier)" <noreply@...>
Date:
2022-08-18 19:59:35 UTC
List:
ruby-core #109564
Issue #18965 has been updated by byroot (Jean Boussier).
> That would mean basically ignoring the timeout, not great.
Maybe my writing wasn't clear. I don't suggest to ignore the timeout, but to either append all items or none. If there isn't enough space available, wait for as long as the timeout allows you to.
> I think for these batch push/pop, they should be motivated with benchmarks (ideally based on some real use-case) and a proof-of-concept PR.
Like the previous changes I requested on queue, this is motivated by a real world use case, basically I'd like to use a SizedQueue in this code: https://github.com/Shopify/statsd-instrument/blob/da8a74c7fbe8fdd421aa1df7eda2a996bc7c3f11/lib/statsd/instrument/batched_udp_sink.rb#L93-L123
This library has a background thread that monitors a queue, everytime the queue has X items or that Y seconds have passed, it flushes the queue and emit `statsd` events.
I can try implementing the suggested feature and then benchmark that code using it.
> The kwarg seems a bit confusing given the existing positional arg (Queue#pop(non_block=false)).
Indeed, in my mind the positional argument would be somewhat deprecated.
> Is there any core method currently with a nonblock keyword argument?
I don't think so.
> Why not exception: false which seems more standard and established, i.e., queue.pop(true, exception: false)? Too verbose/inconvenient?
I think the main reason is that I don't like that positional argument. But yes `exception: false` would be consistent with `IO#read_nonblock`, except that if passed alone it does nothing, which is awkward.
> so it's just the default implementation adding one by one, hence addAll() is not atomic
I don't think that's really a problem, at least not for the use cases I envision, of a background thread collecting work and processing it in batches. Worst case if it's not atomic, the background thread will get the first element, and then get the rest on its next pop.
----------------------------------------
Feature #18965: Further Thread::Queue improvements
https://bugs.ruby-lang.org/issues/18965#change-98735
* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
Following the recent addition of a `timeout` parameter to `Queue#pop`, there are a handful of other improvements I'd like to make.
### Batch insert
When using the queue for batch processing, it would be good to be able to push multiple elements at once:
Currently you have to call `push` repeatedly
```ruby
items.each do |item|
queue.push(item)
end
```
That's wasteful because on each call we check wether the queue is closed, try to wakeup blocked threads, etc.
It would be much better if you could do:
```ruby
queue.concat(items)
```
With of course both `nonblock` and `timeout` support.
Then there's the question of how `SizedQueue` would behave if it's not full, but still doesn't have space for all the elements. e.g.
```ruby
queue = SizedQueue.new(10)
queue.concat(6.times.to_a)
queue.concat(6.times.to_a) # Block until there is 6 free slots?
```
I think the simplest would be to wait for enough space to append the entire set, because combined with a timeout, it would be awkward if only part of the array was concatenated.
### Batch pop
Similarly, sometimes the consumer of a queue is capable of batching, and right now it's not efficient:
```ruby
loop do
items = [queue.pop]
begin
99.times do
items << queue.pop(true) # true is for nonblock
end
rescue ThreadError # empty queue
end
process_items(items)
end
```
It would be much more efficient if `pop` accepted a `count` parameter:
```ruby
loop do
items = queue.pop(count: 100)
process_items(items)
end
```
The behavior would be:
- Block if the queue is empty
- If it's not empty, return **up to** `count` items (Just like `Array#pop`)
### Non blocking mode, without exception
As shown above, the current `nonblock` parameter is a bit awkward, because:
- It raises an exception, which is very expensive for a construct often used in "low level" code.
- The exception is `ThreadError`, so you may have to match the error message for `"queue empty"`, to make sure it doesn't come from a Mutex issue or something like that.
I believe that we could introduce a keyword argument:
```ruby
Queue.new.pop(nonblock: true) # => nil
```
--
https://bugs.ruby-lang.org/
Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>