[ruby-core:85462] Re: [Ruby trunk Bug#14357] thread_safe tests suite segfaults

From: Vladimir Makarov <vmakarov@...>
Date: 2018-02-07 15:36:32 UTC
List: ruby-core #85462

On 02/06/2018 02:38 PM, Eric Wong wrote:
> Vladimir Makarov <vmakarov@redhat.com> wrote:
>> On 02/06/2018 05:00 AM, Eric Wong wrote:
>>> during rebuild.  Disabling the free(tab->entries) at line
>>> st.c:792 (patch below) seems to indicate success with the
>>> thread_safe test suite (letting it loop overnight).
> It still crashed after four runs :<   It might run longer with
> the simplecov/coveralls stuff commented out in spec_helper.rb
> since coverage creates a giant hash and might increase the chance
> of failure.
>
>> Eric, thank you for working on the problem and analyzing it. I'll look at
>> this and try to fix it as soon as possible.
>
I reproduced this crash although the reproducing is not stable with or 
without valgrind.

It is a typical data race.  The same problem existed in the **old hash 
tables**.  It also rebuilt tables and freed old data structure.

**File st.c was never thread-safe**.  The data races are/were possible 
in many places.

We could make st.c thread-safe.  But I don't think it is a right way.  
It is not a trivial task and it also will hurt performance 
considerably.  We still needs thread-unaware level to work with hash 
tables (st.c) for cases when tables are used internally in one thread.

So I think the crash should be fixed in other places where calls of st.c 
happen.

I don't know how it should be fixed because I don't know Ruby thread 
semantics.  Does Ruby guarantee that there are no data races or should a 
ruby programmer still provides thread synchronization despite GIL?  If 
it is later, thread_safe gem is probably buggy because one thread 
reading a table and another thread inserting elements while process 
table in a Ruby block.  If there is no sync it is a typical data race 
and the result is unpredictable.  In this case it is a segfault crash.  
We could just give a better message about the data races if segfault 
happens in st.c.

Also I don't know how GIL works.  Where the thread switching can 
happen.  Is the switch possible in find_table_ind or  we just read 
unsync cashed value in the thread because st.c never used atomics.

Unfortunately I am not well familiar with Ruby threads so it is hard for 
me to say how to fix it.  I only think that we should keep st.c 
thread-unaware as it always was.


Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

In This Thread