Ruby: Regular Expressions for Quick Extraction

For those who are not familiar with Ruby, it is an object-oriented, open-source, general-purpose programming language with a focus on simplicity and productivity.


Many programming languages today use regular expressions to simplify text handling, but Ruby makes them especially pleasant to work with.  It comes with a number of tricks that make extraction much easier.  In other languages, you may find yourself repeatedly digging through a match object, or through a pile of global variables, to pull out the data that you need.  Ruby makes your life easier if you know even a few of its tricks.



The first is what we call a capture group: a part of the regular expression enclosed in parentheses.  The point of a capture group is to let the expression match a larger string while letting you refer to only a smaller portion of it.  For instance, say you want to match a number followed by a word.  Your regular expression would look something like: /\d+\w+/.
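To get a feel for what this pattern accepts before adding any groups, here is a small sketch. The sample strings are made up for illustration, and match? (available since Ruby 2.4) only reports whether the pattern matches somewhere in the string.

```ruby
# A number followed by word characters, with no capture groups yet.
pattern = /\d+\w+/

# Sample inputs, invented for illustration.
puts pattern.match?("123test")   # => true  (digits, then letters)
puts pattern.match?("test")      # => false (no digits at all)
```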


If the purpose is not just to match the whole string but also to divide it into its components, you can wrap the number and word portions in capture groups: /(\d+)(\w+)/.  After a successful match, the thread-local and method-local variables $1 and $2 hold the digits and the word respectively, but using them takes two statements, such as: "123test".match(/(\d+)(\w+)/); puts $1.
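Spelled out as a short script, the two-statement approach looks roughly like this. It is only a sketch of the example from the text, with $2 printed as well.

```ruby
# Statement one: run the match. The capture groups land in $1 and $2
# as a side effect of the call.
"123test".match(/(\d+)(\w+)/)

# Statement two: read the globals.
puts $1   # => 123
puts $2   # => test
```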


However, this approach has two downsides: 1) Anything that needs two statements like this is very tough to put into a method call chain. 2) If you refactor this code later and another match ends up between the two statements, the $1 variable will be overwritten.  To be safe, you then have to store the actual MatchData object returned by match, which costs you another variable and another method call.  We can avoid all of this by using the string index operator.
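The second downside is easy to reproduce. In the sketch below, the "abc" match stands in for whatever unrelated match a later refactor might introduce; it is a made-up example, not code from any real project.

```ruby
"123test".match(/(\d+)(\w+)/)

# An unrelated match slipped in between the two statements.
"abc".match(/(\w)/)

puts $1   # => a   (the "123" from the first match is gone)

# Holding on to the MatchData object is safe, but it costs an extra
# variable and an extra method call.
m = "123test".match(/(\d+)(\w+)/)
"abc".match(/(\w)/)
puts m[1]   # => 123
```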


The string index operator (String#[]) is an extremely versatile method.  Give it a regular expression and it returns the part of the string that matches that expression, or nil if nothing matches.  Most of the time, though, the string index operator is used with integers and ranges to pull out substrings.
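For comparison, here are the common argument forms side by side. The sample strings are invented for illustration.

```ruby
s = "123test"

puts s[0]        # integer index, one character     => 1
puts s[0, 3]     # start and length                 => 123
puts s[3..-1]    # range, from index 3 to the end   => test

# With a regular expression, the first match is returned.
puts "order 42"[/\d+/]   # => 42

# No match gives nil instead of raising an error.
p "order"[/\d+/]         # => nil
```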


For instance, "123test"[/\d+/] returns "123", which is pretty handy for quickly grabbing a portion of a string.  However, suppose you want to grab the digits only when they are directly followed by some word characters.  A bare /\d+/ is too permissive: it also returns 123 for "123 test", where there is a space in the way.  That 123 is not the one you have in mind; you want the 123 only from strings that look like "123test".  You could write something like "123test"[/\d+\w+/], but that returns both the digits and the word.
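The three cases just described can be checked directly. This is a quick sketch using the same strings as the text.

```ruby
puts "123test"[/\d+/]      # => 123
puts "123 test"[/\d+/]     # => 123      (matches despite the space)
puts "123test"[/\d+\w+/]   # => 123test  (digits and word together)
```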


This is where another variant of the string index method comes in, one that takes both a regular expression and an integer.  This variant matches the regular expression against the string and pulls the numbered capture group out of the match data, so we have to put the part we want in a capture group to begin with.  The code will now be something like: "123test"[/(\d+)\w+/, 1].
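Because the whole lookup is a single expression, it also slots into a method chain, which addresses the first downside mentioned earlier. The sketch below shows that, plus one caveat: \w matches digits as well as letters, so on "123 test" the pattern can still succeed by letting \w+ take the final digit. The [a-z] variant is an assumption about what a stricter pattern might look like, not something from the original example.

```ruby
puts "123test"[/(\d+)\w+/, 1]            # => 123

# One expression, so it chains without touching $1.
puts "123test"[/(\d+)\w+/, 1].to_i + 1   # => 124

# Caveat: \w also matches digits, so \d+ gives back the "3" to \w+.
puts "123 test"[/(\d+)\w+/, 1]           # => 12

# Assumed stricter variant: require letters after the digits.
p("123 test"[/(\d+)[a-z]+/, 1])          # => nil
puts "123test"[/(\d+)[a-z]+/, 1]         # => 123
```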


That’s it: you were able to grab the part you needed from a non-trivial regular expression match in a single statement, thanks to Ruby’s flexible string index operator.