Using regex named capture groups to process lines in a CSV file in Ruby.

Mar 16, 2025 - 08:29

Imagine you are processing a CSV file with information from weather stations. A single line might look like this:

Hamburg;22.0,Berlin;18.45,Tokyo;11.23,New York;4.20\n

Ruby's regular expressions support named captures. This means you can give a name to the text that a group matches. For instance

/(?<city>[\w\s]+)/

Here ?<city> is the name of the capture group, and [\w\s]+ is the pattern it captures: one or more word or whitespace characters.

If we run the code

"Hamburg;22.0".match(/(?<city>[\w\s]+)/).named_captures

This returns a Ruby Hash that looks like this:

{"city"=>"Hamburg"}
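The same idea extends to a pattern with more than one named group. Here is a minimal sketch, assuming a second group called `temperature` (that name, and the local variable names, are chosen for illustration only):

```ruby
# Two named groups: `city` and `temperature` (the second name is an assumption).
# Once a pattern uses named groups, Ruby does not capture unnamed groups,
# so the inner (\.\d+)? only groups the optional decimal part.
reading = /(?<city>[\w\s]+);(?<temperature>\d+(\.\d+)?)/

m = "Hamburg;22.0".match(reading)

city = m[:city]                     # look up a capture by name
temperature = m[:temperature].to_f  # captures are strings, so convert explicitly

captures = m.named_captures         # Hash of group name => captured string
```

Reading captures by name keeps the code independent of the order of the groups in the pattern.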

A pattern can contain several named captures. However, .match only returns the first match. If we want every match in the string, we use the .scan method instead. This returns an array of arrays, with each inner array holding the captures of one match, in order. Note that .scan returns only the captured strings, not the group names.

For instance, with a second group named temperature:

"Hamburg;22.0,Berlin;18.45,Tokyo;11.23,New York;4.20\n".scan(/(?<city>[\w\s]+);(?<temperature>\d+(\.\d+)?)/)

will return

[["Hamburg", "22.0"], ["Berlin", "18.45"], ["Tokyo", "11.23"], ["New York", "4.20"]]

This makes it markedly easier to process the data.
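Since .scan drops the group names, one way to get them back is to pair each row with the pattern's own list of names. This is a rough sketch rather than the only approach; the `temperature` group name and the variable names are assumptions made for the example:

```ruby
line = "Hamburg;22.0,Berlin;18.45,Tokyo;11.23,New York;4.20\n"
pattern = /(?<city>[\w\s]+);(?<temperature>\d+(\.\d+)?)/

# Regexp#names lists the group names in order; zip them with each scanned row
# to rebuild a Hash per match.
readings = line.scan(pattern).map { |row| pattern.names.zip(row).to_h }

# Build a city => Float lookup from the per-match hashes.
temps = readings.to_h { |r| [r["city"], r["temperature"].to_f] }
```

After this, `temps["Berlin"]` gives the temperature as a Float, which is usually more convenient than indexing into nested arrays.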

Named captures are pretty cool.

Reference: Regex Named Captures