[ruby-core:93022] [Ruby trunk Feature#14912] Introduce pattern matching syntax
From: email@...
Date: 2019-06-08 11:56:42 UTC
List: ruby-core #93022
Issue #14912 has been updated by pitr.ch (Petr Chalupa).
Hi, I am really looking forward to this feature. Looks great!
However, I'd like to make a few suggestions which I believe should be part of the first experimental pattern matching release. I'll include use cases and try to explain why it would be better to do so.
### (1) Pattern matching as first-class citizen
Everything in Ruby is dynamically accessible (methods, classes, blocks, etc.), so it would be a pity if patterns were an exception. There should be an object which represents the pattern and which can be lifted from the pattern literal.
It may seem that just wrapping the pattern in a lambda as follows is enough to get an object which represents the pattern.
```ruby
-> value do
case value
in (1..10) => i
do_something_with i
end
end
```
In some cases that is sufficient; however, let's explore some interesting use cases which cannot be implemented without first-class pattern matching.
**First use case** to consider is searching for a value in a data structure. Let's assume we have a data structure (e.g. some in-memory database) and we want to provide an API, e.g. `#search`, to search for an element using pattern matching. The structure stores log messages in the form `["severity", "message"]`. Then something like the following would be desirable.
```ruby
def drain_errors(data)
# pops all messages matching at least one pattern
# and evaluates the appropriate branch with the destructured log message
# for each matched message
data.pop_all case
in ["fatal", message]
deal_with_fatal message
in ["error", message]
deal_with_error message
end
end
```
There are a few things to consider. Compared to the already working implementation, no message is given to the `case`, since it will be provided later inside the `pop_all` method. The `case in` here therefore has to evaluate to an object that encapsulates the pattern matching, so that `pop_all` can match candidates from the data structure later. Another important requirement is that the object must allow matching a candidate without immediately evaluating the corresponding branch. It has to give `pop_all` a chance to remove the element from the data structure before the arbitrary user code in the branch is evaluated. That is especially important if the data structure is thread-safe and uses locking, because it must not hold the lock while running arbitrary user code. First, holding the lock limits concurrency, since no other operation can be executed on the data structure; second, it can lead to deadlocks, which is why the common recommendation is never to call user code while still holding an internal lock.
Probably the simplest implementation which would allow the use case to work is to make `case in` without a message behave as syntactic sugar for the following.
```ruby
case
in [/A/, b]
b.succ
end
# turns into
-> value do
case value
in [/A/, b]
-> { b.succ }
end
end
```
The implementation of `pop_all` could then look as follows.
```ruby
def pop_all(pattern)
each do |candidate|
# assuming each locks the structure to read the candidate
# but releases the lock while executing the block which could
# be arbitrary user code
branch_continuation = pattern.call(candidate)
if branch_continuation
# candidate matched
delete candidate # takes a lock internally to delete the element
branch_continuation.call
end
end
end
```
In this example it never leaks the inner lock.
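To make the idea concrete with syntax that runs today, here is a minimal sketch in which the proposed message-less `case` is written by hand as a lambda returning a continuation. It assumes Ruby 2.7 or later; `LogStore`, its snapshot-based iteration, and the `else nil` fallback are my own illustrative choices, not part of the proposal.

```ruby
# Hypothetical thread-safe store; the class and method names are illustrative only.
class LogStore
  def initialize(entries)
    @entries = entries.dup
    @lock = Mutex.new
  end

  # pattern: callable returning a continuation lambda on match, nil otherwise
  def pop_all(pattern)
    results = []
    snapshot = @lock.synchronize { @entries.dup }
    snapshot.each do |candidate|
      continuation = pattern.call(candidate) # matches only, branch code does not run yet
      next unless continuation

      removed = @lock.synchronize do
        index = @entries.index { |entry| entry.equal?(candidate) }
        index && @entries.delete_at(index)
      end
      results << continuation.call if removed # user code runs with the lock released
    end
    results
  end

  def size
    @lock.synchronize { @entries.size }
  end
end

# Hand-written equivalent of the proposed `case` without a message.
errors = lambda do |value|
  case value
  in ["fatal", message] then -> { "FATAL: #{message}" }
  in ["error", message] then -> { "error: #{message}" }
  else nil
  end
end

store = LogStore.new([["info", "started"], ["error", "disk full"], ["fatal", "oom"]])
handled = store.pop_all(errors)
```

Only the two matching entries are removed, and each branch body runs after the entry has already been deleted from the store.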
**Second use case**, which somewhat expands the first one, is to be able to implement the `receive` method of the concurrency abstraction called Actors. (`receive` blocks until a matching message is received.) Let's consider an actor which receives two Integers, adds them together, replies to an actor that asks for the result with a `[:sum, myself]` message, and then terminates.
```ruby
Actor.act do
# block until first number is received
first = receive case
in Numeric => value
value
end
# block until second number is received, then add them
sum = first + receive case
in Numeric => value
value
end
# when a :sum command is received with the sender reference
# send sum back
receive case
in [:sum, sender]
sender.send sum
end
end
```
It would be great if we could use pattern matching for messages as it is used in Erlang and in Elixir.
The `receive` method, like the `pop_all` method, needs to first find the first matching message in the mailbox without running the user code immediately. It then needs to take the matching message from the Actor's mailbox (while locking the mailbox temporarily) before the message can be passed to the arbitrary user code in the case branch (without the lock held).
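A rough sketch of such a `receive`, again using a hand-written lambda in place of the proposed syntax, might look like the following. `Mailbox`, `deliver`, and `pending` are hypothetical names, and the sketch assumes Ruby 2.7 or later. Note that the match itself still runs under the lock here; only the branch body is deferred.

```ruby
# Hypothetical actor mailbox; not an existing library API.
class Mailbox
  def initialize
    @messages = []
    @lock = Mutex.new
    @arrived = ConditionVariable.new
  end

  def deliver(message)
    @lock.synchronize do
      @messages << message
      @arrived.broadcast
    end
    self
  end

  # Blocks until some message matches, removes it, then runs the branch unlocked.
  def receive(pattern)
    continuation = @lock.synchronize do
      found = nil
      until found
        index = @messages.index { |message| found = pattern.call(message) }
        if index
          @messages.delete_at(index)
        else
          @arrived.wait(@lock) # releases the lock while waiting for a delivery
        end
      end
      found
    end
    continuation.call # arbitrary user code, lock no longer held
  end

  def pending
    @lock.synchronize { @messages.dup }
  end
end

number = lambda do |value|
  case value
  in Numeric => n then -> { n }
  else nil
  end
end

mailbox = Mailbox.new
mailbox.deliver("noise").deliver(2).deliver(3)
sum = mailbox.receive(number) + mailbox.receive(number)
```

Messages that do not match stay in the mailbox, which is the selective-receive behaviour the Actor example relies on.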
If `case in` without a message is first-class, it could also be useful to have a shortcut to define simple single-branch (mono) patterns.
```ruby
case
in [:sum, sender]
sender.send sum
end
# could be just
in [:sum, sender] { sender.send sum }
```
```ruby
case
in ["fatal", _] => message
message
end
# could be just, default block being identity function
in ["fatal", _]
```
Then the Actor example could be written simply as follows:
```ruby
Actor.act do
# block until first number is received
first = receive in Numeric
# block until second number is received, then add them
sum = first + receive in Numeric
# when a :sum command is received with the sender reference
# send sum back
receive in [:sum, sender] { sender.send sum }
end
```
### (2) Matching Hashes with non-Symbol keys
This was already mentioned in the RubyKaigi talk as one of the problems to be looked at in the future. If `=>` is used for the as pattern, then it cannot be used to match hashes with non-Symbol keys. I would suggest using just `=` instead, so `var = pat`. Supporting non-Symbol hashes is important for use cases like:
1. Matching data loaded from JSON where keys are strings
```ruby
case { "name" => "Gustav", **other_data }
in "name" => (name = /^Gu.*/), **other
name #=> "Gustav"
other #=> other_data
end
```
2. Using pattern to match the key
```ruby
# let's assume v1 of a protocol sends message {foo: data}
# but v2 sends {FOO: data},
# where data stays the same in both versions,
# then it is desirable to have one branch, not two
case message_as_hash
in (:foo | :FOO) => data
process data
end
```
Could that work or is there a problem with parsing `=` in the pattern?
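For comparison, on Ruby 2.7 and later, where hash patterns accept only Symbol keys, the JSON use case can be approximated by converting the keys before matching. This is only a workaround sketch for the first use case with made-up sample data; it does not cover matching the key itself against a pattern.

```ruby
require "json"

# JSON.parse returns String keys by default.
data = JSON.parse('{"name": "Gustav", "age": 42}')

result =
  case data.transform_keys(&:to_sym) # convert keys so a Symbol-key pattern applies
  in { name: /^Gu.*/ => name, **other }
    [name, other]
  else
    nil
  end
```

The conversion copies the hash on every match, which is part of why direct support for non-Symbol keys would be preferable.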
## Note about `in [:sum, sender] { sender.send sum }`
`in [:sum, sender] { sender.send sum }` is quite similar to the `->` syntax for lambdas. However, in the suggestion above it would be de-sugared to `-> value { case value; in [:sum, sender]; -> { sender.send sum }; end }`, which is not intuitive. A solution to consider would be not to de-sugar the branch into another inner lambda, but instead to allow checking whether an object matches the pattern (basically asking whether the partial function represented by the block with a pattern match accepts the object). Then the example of implementing `pop_all` would look as follows.
```ruby
def pop_all(pattern)
each do |candidate|
# assuming each locks the structure to read the candidate
# but releases the lock while executing the block which could
# be arbitrary user code
# does not execute the branches only returns true/false
if pattern.matches?(candidate)
# candidate matched
delete candidate # takes a lock internally to delete the element
pattern.call candidate
end
end
end
```
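One way to picture an object offering both `matches?` and `call` is the sketch below. `PartialPattern` is a hypothetical class of my own; it emulates the two operations on top of a continuation-returning block, assuming Ruby 2.7 or later, and is not how a built-in implementation would necessarily work.

```ruby
# Hypothetical wrapper, not part of Ruby or of the proposal.
class PartialPattern
  def initialize(&matcher)
    @matcher = matcher # returns a continuation lambda on match, nil otherwise
  end

  # Checks the pattern only; the branch body is not executed.
  def matches?(value)
    !@matcher.call(value).nil?
  end

  # Matches and then executes the branch body.
  def call(value)
    continuation = @matcher.call(value)
    raise ArgumentError, "no pattern matches #{value.inspect}" unless continuation

    continuation.call
  end
end

sum_request = PartialPattern.new do |value|
  case value
  in [:sum, sender] then -> { sender << 42 }
  else nil
  end
end

replies = []
matched = sum_request.matches?([:sum, replies]) # no side effect yet
untouched = replies.empty?
sum_request.call([:sum, replies])               # now the branch runs
```

With this shape the pattern is evaluated twice per matched candidate (once in `matches?`, once in `call`), which is a trade-off of the `matches?` design compared with returning a continuation.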
What are your thoughts?
Do you think this could become part of the first pattern matching release?
----------------------------------------
Feature #14912: Introduce pattern matching syntax
https://bugs.ruby-lang.org/issues/14912#change-78398
* Author: ktsj (Kazuki Tsujimoto)
* Status: Assigned
* Priority: Normal
* Assignee: ktsj (Kazuki Tsujimoto)
* Target version: 2.7
----------------------------------------
I propose new pattern matching syntax.
# Pattern syntax
Here's a summary of pattern syntax.
```
# case version
case expr
in pat [if|unless cond]
...
in pat [if|unless cond]
...
else
...
end
pat: var # Variable pattern. It matches any value, and binds the variable name to that value.
| literal # Value pattern. The pattern matches an object such that pattern === object.
| Constant # Ditto.
| var_ # Ditto. It is equivalent to pin operator in Elixir.
| (pat, ..., *var, pat, ..., id:, id: pat, ..., **var) # Deconstructing pattern. See below for more details.
| pat(pat, ...) # Ditto. Syntactic sugar of (pat, pat, ...).
| pat, ... # Ditto. You can omit the parenthesis (top-level only).
| pat | pat | ... # Alternative pattern. The pattern matches if any of pats match.
| pat => var # As pattern. Bind the variable to the value if pat match.
# one-liner version
$(pat, ...) = expr # Deconstructing pattern.
```
The patterns are run in sequence until the first one that matches.
If no pattern matches and there is no else clause, a NoMatchingPatternError exception is raised.
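The ordering and error behaviour can be illustrated with the `case`/`in` form as it runs on Ruby 2.7 and later. The `classify` method is a made-up example, not taken from the prototype's tests.

```ruby
# Patterns are tried top to bottom; the first match wins.
def classify(value)
  case value
  in Integer => n if n.negative? # guard clause on an as pattern
    :negative
  in Integer | Float             # alternative pattern
    :number
  in []
    :empty
  end
end

# No branch matches a String and there is no else clause.
outcome =
  begin
    classify("text")
  rescue NoMatchingPatternError => error
    error.class
  end
```

A negative Integer also satisfies the second branch, but it never gets there because the first branch matches earlier.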
## Deconstructing pattern
This is similar to Extractor in Scala.
The pattern matches if:
* The object has a #deconstruct method
* The return value of #deconstruct is an Array or Hash, and it matches the sub patterns of this pattern
```
class Array
alias deconstruct itself
end
case [1, 2, 3, d: 4, e: 5, f: 6]
in a, *b, c, d:, e: Integer | Float => i, **f
p a #=> 1
p b #=> [2]
p c #=> 3
p d #=> 4
p i #=> 5
p f #=> {f: 6}
e #=> NameError
end
```
This pattern can also be used in the one-liner version, like destructuring assignment.
```
class Hash
alias deconstruct itself
end
$(x:, y: (_, z)) = {x: 0, y: [1, 2]}
p x #=> 0
p z #=> 2
```
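As a point of comparison, in Ruby 2.7 and later the same idea is split into two hooks: `deconstruct` for array-style patterns and `deconstruct_keys` for hash-style patterns. The `Point` class below is a hypothetical example written against that released behaviour, not against the prototype described here.

```ruby
# Hypothetical value class opting in to pattern matching.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Used by array-style patterns such as Point(a, b).
  def deconstruct
    [x, y]
  end

  # Used by hash-style patterns such as { x:, y: }.
  def deconstruct_keys(_keys)
    { x: x, y: y }
  end
end

by_position =
  case Point.new(0, 1)
  in Point(a, 1) then a
  end

by_key =
  case Point.new(0, [1, 2])
  in { x:, y: [_, z] } then [x, z]
  end
```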
# Sample code
```
class Struct
def deconstruct; [self] + values; end
end
A = Struct.new(:a, :b)
case A[0, 1]
in (A, 1, 1)
:not_match
in A(x, 1) # Syntactic sugar of above
p x #=> 0
end
```
```
require 'json'
$(x:, y: (_, z)) = JSON.parse('{"x": 0, "y": [1, 2]}', symbolize_names: true)
p x #=> 0
p z #=> 2
```
# Implementation
* https://github.com/k-tsj/ruby/tree/pm2.7-prototype
* Test code: https://github.com/k-tsj/ruby/blob/pm2.7-prototype/test_syntax.rb
# Design policy
* Keep compatibility
* Don't define new reserved words
* 0 conflict in parse.y. It passes test/test-all
* Be Ruby-ish
* Powerful Array, Hash support
* Encourage duck typing style
* etc
* Optimize syntax for major use case
* You can see several real use cases of pattern matching at following links :)
* https://github.com/k-tsj/power_assert/blob/8e9e0399a032936e3e3f3c1f06e0d038565f8044/lib/power_assert.rb#L106
* https://github.com/k-tsj/pattern-match/network/dependents