Bash Scripting: Learn to use REGEX (Part 2- Intermediate)
In our earlier tutorial, we covered some basic regex concepts: we learned about metacharacters & used them to create some simple but effective regex patterns. We will now move on to some more advanced concepts of regex.
(Recommended Read: Working with Vi/Vim Editor : Advanced concepts)
In this tutorial, we will learn about shorthand characters, word boundaries & anchors. They are listed below,
Shorthand characters are shortcuts for the most commonly used range expressions.
\s will match whitespace i.e. a space, a tab or a line break
\d will match digits i.e. 0-9, we can also use [0-9] instead
\w will match all the word characters (A-Z, a-z, 0-9) & also includes _ (underscore)
\S opposite of \s, will match all that are not whitespaces
\D opposite of \d, will match all that are not digits
\W opposite of \w, will match all that are not word characters.
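To see that the shorthands really are shortcuts, here is a small sketch that runs each one next to its spelled-out bracket form. It assumes GNU grep built with PCRE support (for -P); the sample line & the variable names are made up for illustration.

```shell
# Hypothetical sample line: letters, digits, an underscore, two spaces and a tab.
line=$(printf 'id_7\tcost 42 USD')

# Each shorthand (PCRE, via grep -P) next to its bracket form (ERE, via grep -E).
# -o prints every match on its own line.
digits_short=$(printf '%s\n' "$line" | grep -oP '\d+')
digits_long=$(printf '%s\n' "$line" | grep -oE '[0-9]+')

words_short=$(printf '%s\n' "$line" | grep -oP '\w+')
words_long=$(printf '%s\n' "$line" | grep -oE '[A-Za-z0-9_]+')

# Count the whitespace characters (one tab plus two spaces in this sample).
spaces_short=$(printf '%s\n' "$line" | grep -oP '\s' | wc -l)
spaces_long=$(printf '%s\n' "$line" | grep -oE '[[:space:]]' | wc -l)

echo "digits: $digits_short"
echo "words:  $words_short"
echo "whitespace characters: $spaces_short"
```

Both forms should return the same matches for plain ASCII text like this; the shorthand is simply less typing.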
Word boundaries identify the beginning & end of a word.
\< used for beginning of the word
\> used for end of the word
\b used for either beginning or end of the word
Anchors are similar to word boundaries, but they are associated with lines.
^ used for beginning of the line
$ used for end of the line
We will now discuss these with some examples,
\s & \S
As mentioned above, \s matches whitespace, so grep will print all the lines of the file that contain a whitespace character. To use it,
$ grep '\s' file1
& to locate all the lines that contain a non-whitespace character, we use \S
$ grep '\S' file1
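A quick sketch of how the two behave on a file with blank lines. It assumes GNU grep, where \s & \S are available without any extra option; the temporary file is only a stand-in for file1 & its contents are made up.

```shell
# Hypothetical stand-in for file1: text lines, an empty line and a spaces-only line.
file1=$(mktemp)
printf 'hello world\nnospace\n\n   \nalpha\n' > "$file1"

# Lines that contain at least one whitespace character.
has_space=$(grep -c '\s' "$file1")

# Lines that contain at least one non-whitespace character.
has_text=$(grep -c '\S' "$file1")

# Inverting \S with -v leaves only the blank lines (empty or whitespace-only).
blank=$(grep -vc '\S' "$file1")

echo "with whitespace: $has_space, with text: $has_text, blank: $blank"
rm -f "$file1"
```

In practice, grep '\S' is a handy way to drop blank lines from the output, since a line with only spaces or tabs has nothing for \S to match.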
\d & \D
Now these are interesting metacharacters to use. We have been using the range metacharacter i.e. [0-9], whenever we needed a search related to numbers. Instead we can use \d to locate all the lines that have digits in them.
$ grep -P '\d' file2
You can also use the following regex for the same search,
$ grep '[0-9]' file2
Now if we need to do the opposite i.e. locate the lines that contain a non-digit character,
$ grep -P '\D' file2
Keep in mind that this matches any line with at least one non-digit character, not only the lines that are free of digits.
Note:- You might have noticed that we used the '-P' option (capital P) with grep. That is because '\d' is a PCRE (Perl Compatible Regular Expressions) term & grep does not understand it by default, so we need to use the '-P' option with grep to make it understand '\d'.
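Not every grep build ships with PCRE support, so a script may want to check for -P before relying on it. Below is a minimal sketch of that idea, assuming a POSIX-style shell; the fallback to bracket expressions & all the variable names are our own choices, & the temporary file is only a stand-in for file2.

```shell
# Hypothetical stand-in for file2.
file2=$(mktemp)
printf 'order 66\nno numbers here\nroom 101\n2024\n' > "$file2"

# Probe for PCRE support; fall back to bracket expressions if -P is missing.
if printf '1\n' | grep -qP '\d' 2>/dev/null; then
    mode='-P'; digit='\d'; nondigit='\D'
else
    mode='-E'; digit='[0-9]'; nondigit='[^0-9]'
fi

# Lines with at least one digit.
with_digit=$(grep -c "$mode" "$digit" "$file2")

# Lines with at least one non-digit character (the line "2024" is excluded).
with_nondigit=$(grep -c "$mode" "$nondigit" "$file2")

# Lines with no digit at all: invert the digit match rather than use \D.
without_digit=$(grep -vc "$mode" "$digit" "$file2")

echo "digit: $with_digit, non-digit: $with_nondigit, no digit at all: $without_digit"
rm -f "$file2"
```

The last two counts differ because \D only asks for one non-digit character somewhere on the line, while -v with \d rejects any line that has a digit.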
\w & \W
\w is used to locate all the word characters i.e. a-z, A-Z, 0-9 & it also includes _ (underscore). Underscore is included with \w, as it's commonly used in programming, especially for defining a variable or function name. To use it,
$ grep '\w' file3
& to do the opposite i.e. search for the non-word characters,
$ grep '\W' file3
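The underscore rule is easiest to see by comparing an underscore with a hyphen. This sketch assumes GNU grep, where \w & \W work without -P; the sample text is made up.

```shell
# Hypothetical sample: the same name written with an underscore and with a hyphen.
line='max_retries max-retries'

# \w+ keeps the underscore inside the word but stops at the hyphen.
words=$(printf '%s\n' "$line" | grep -oE '\w+')
word_count=$(printf '%s\n' "$words" | wc -l)

# \W picks up what is left: the space and the hyphen.
nonword_count=$(printf '%s\n' "$line" | grep -o '\W' | wc -l)

printf '%s\n' "$words"
echo "words: $word_count, non-word characters: $nonword_count"
```

So max_retries comes out as one word, while max-retries is split into max & retries.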
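Coming to word boundaries, to locate a word with a particular character at the start, use
$ grep '\<S' file4
To locate a word with a particular character at the end, place the character before \>,
$ grep 't\>' file4
& \b works at either the start or the end of a word, depending on which side of the character it is placed. For a word starting with t, use
$ grep '\bt' file4
& for a word ending with t, use
$ grep 't\b' file4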
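Combined with -o & \w*, the boundaries can pull out the whole matching words rather than the whole line. A small sketch, assuming GNU grep (\<, \>, \b & \w are GNU extensions); the sample sentence is made up.

```shell
# Hypothetical sample sentence.
line='the cat sat on a soft tent'

# Words that begin with s: \< pins the s to the start of a word.
starts_with_s=$(printf '%s\n' "$line" | grep -o '\<s\w*')

# Words that end with t: \> pins the t to the end of a word.
ends_with_t=$(printf '%s\n' "$line" | grep -o '\w*t\>')

# Words that begin with t: \b placed before the t acts like \<.
starts_with_t=$(printf '%s\n' "$line" | grep -o '\bt\w*')

echo "start with s:" $starts_with_s
echo "end with t:" $ends_with_t
echo "start with t:" $starts_with_t
```

Note that "the" starts with t but does not end with it, so it only shows up in the last list.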
Anchors work in the same way as word boundaries, but they apply to lines rather than words. To search for the lines that start with a particular character, use
$ grep '^s' file5
Another example would be
$ grep -P '^\d' file5
This will locate the lines that start with a digit. To search for all the lines that end with a particular character, use
$ grep 'y$' file5
This will locate all the lines that end with letter y.
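The two anchors can also be used together in one pattern. Here is a short sketch with a few variations, assuming GNU grep with PCRE support for the \d case; the temporary file is only a stand-in for file5 & its contents are made up.

```shell
# Hypothetical stand-in for file5 (the fourth line is empty).
file5=$(mktemp)
printf 'sunny day\n7 apples\na story\n\nsay\ndays\n' > "$file5"

starts_s=$(grep -c '^s' "$file5")         # lines starting with s
starts_digit=$(grep -cP '^\d' "$file5")   # lines starting with a digit
ends_y=$(grep -c 'y$' "$file5")           # lines ending with y
empty=$(grep -c '^$' "$file5")            # start immediately followed by end: empty lines
s_to_y=$(grep -c '^s.*y$' "$file5")       # lines that start with s and end with y

echo "start s: $starts_s, start digit: $starts_digit, end y: $ends_y"
echo "empty: $empty, s...y: $s_to_y"
rm -f "$file5"
```

The pattern ^$ is a common way to find empty lines, since nothing is allowed between the start & the end of the line.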
With this we end our tutorial on how to use regex. In the next & final tutorial on regex concepts, we will learn some other advanced regex concepts, after which we should be able to create & use regex with ease. We will also share some examples of how to use regex properly once we have completed the tutorials on regex concepts.
If you have any queries/questions regarding this tutorial, please leave them in the comment box below.