Bash Scripting: Learn to use Basic REGEX (Part 2)

In our earlier tutorial, we covered the basic regex concepts: we learned about meta-characters & used them to create some simple but effective regex terms. We will now move on to some more advanced regex concepts.


In this tutorial, we will be learning about shorthand characters, word boundaries & anchors. These are listed below.

Shorthand Characters

These are shortcuts for the most commonly used range regex.

\s will match whitespace, i.e. a space, a tab or a line break

\d will match digits, i.e. 0-9; we can also use [0-9] instead

\w will match all the word characters (A-Z, a-z, 0-9) & also includes _ (underscore)

\S opposite of \s, will match everything that is not whitespace

\D opposite of \d, will match everything that is not a digit

\W opposite of \w, will match everything that is not a word character
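To make the "shortcut" idea concrete, here is a small sketch that compares two of the shorthands with the longer bracket expressions they stand for. It assumes GNU grep (where \w, \S and \+ are available in the default syntax); the sample line and the variable names are made up for illustration.

```shell
# Hypothetical sample line, used only for illustration
text='id_7 = 42;'

# \w\+ and its bracket equivalent extract the same runs of word characters
short_w=$(printf '%s\n' "$text" | grep -o '\w\+')
long_w=$(printf '%s\n' "$text" | grep -o '[_[:alnum:]]\+')

# \S\+ and its bracket equivalent extract the same runs of non-whitespace
short_S=$(printf '%s\n' "$text" | grep -o '\S\+')
long_S=$(printf '%s\n' "$text" | grep -o '[^[:space:]]\+')

# Print what the shorthand versions found, one match per line
printf '%s\n' "$short_w" "$short_S"
```

The -o option prints only the matched text, which makes it easy to see exactly which characters each shorthand covers.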

Word Boundaries

These identify the boundaries associated with words.

\< used for the beginning of a word

\> used for the end of a word

\b used for either the beginning or the end of a word

Anchors

Similar to word boundaries, but these are associated with lines.

^ used for the beginning of the line

$ used for the end of the line

We will now discuss these with some examples.

\s & \S

As mentioned above, \s matches whitespace, so the following command prints every line of the file that contains at least one whitespace character,

$ grep "\s" file1

& to locate the lines that contain at least one non-whitespace character, we use \S

$ grep "\S" file1
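The two commands above can be tried without creating a file. The following is a minimal sketch, assuming GNU grep; the sample lines and variable names are invented and simply stand in for the contents of file1.

```shell
# Hypothetical sample data: two lines with whitespace, one without, one empty
data=$(printf 'hello world\nnospaces\n\ntab\there\n')

# Lines containing at least one whitespace character (space or tab here)
with_space=$(printf '%s\n' "$data" | grep '\s')

# Count of lines containing at least one non-whitespace character;
# the empty line is the only one left out
non_blank_count=$(printf '%s\n' "$data" | grep -c '\S')

printf '%s\n' "$with_space"
printf '%s\n' "$non_blank_count"
```

Note that grep reports whole lines, so \S is mostly useful for skipping empty or whitespace-only lines.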

\d & \D

Now, these are interesting metacharacters to use. So far we have been using the range metacharacter, i.e. [0-9], whenever we needed a search related to numbers. Instead, we can use \d to locate all the lines that have digits in them.

$ grep -P "\d" file2

You can also use the following regex for the same search,

$ grep "[0-9]" file2

Now if we need the opposite of the above, i.e. to search for lines that contain characters other than digits,

$ grep -P "\D" file2

Note:- You might have noticed that we used the '-P' option with grep. That is because '\d' is a PCRE (Perl Compatible Regular Expressions) term & grep does not recognize it by default, so we need to use the '-P' option (capital P) to make grep understand '\d'.
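Here is a short sketch of the digit searches, assuming a GNU grep that was built with PCRE support (the -P option is not available in every grep). The sample lines and variable names are hypothetical.

```shell
# Hypothetical sample data standing in for file2
data=$(printf 'room 101\nno digits here\n2024\n')

# -P switches grep to Perl-compatible syntax so that \d is understood
pcre_digits=$(printf '%s\n' "$data" | grep -P '\d')

# The bracket range finds the same lines without needing -P
range_digits=$(printf '%s\n' "$data" | grep '[0-9]')

# \D needs at least one non-digit character on the line,
# so the line made up only of digits is dropped
non_digits=$(printf '%s\n' "$data" | grep -P '\D')

printf '%s\n' "$pcre_digits"
printf '%s\n' "$non_digits"
```

If -P is not available on your system, the [0-9] and [^0-9] ranges are the portable way to write the same searches.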

\w & \W

\w is used to locate all the word characters, i.e. a-z, A-Z, 0-9 & it also includes _ (underscore). Underscore is included with \w as it's commonly used in programming, especially for defining a variable or function name. To use it,

$ grep "\w" file3

& to do the opposite, i.e. search for non-word characters,

$ grep "\W" file3
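A quick way to see the difference between \w and \W is to run both against the same lines. This is only a sketch, assuming GNU grep; the sample lines and variable names are invented for the example.

```shell
# Hypothetical sample data standing in for file3
data=$(printf '%s\n' 'user_name' 'abc123' '?!' 'a b')

# Lines with at least one word character (letter, digit or underscore)
word_lines=$(printf '%s\n' "$data" | grep '\w')

# Lines with at least one non-word character (punctuation, space, ...)
nonword_lines=$(printf '%s\n' "$data" | grep '\W')

printf '%s\n' "$word_lines"
printf '%s\n' "$nonword_lines"
```

The line 'a b' shows up in both results because it contains word characters as well as a space, while 'user_name' only appears in the first since the underscore counts as a word character.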

Word Boundaries

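To locate a word with a particular character at the start, use

$ grep "\<S" file4

To locate a word with a particular character at the end, place \> after the character,

$ grep "t\>" file4

\b matches at either edge of a word, so it can be used for both cases. To locate a word that starts with the character, put \b before it,

$ grep "\bt" file4

& to locate a word that ends with the character, put \b after it,

$ grep "t\b" file4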
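The position of the boundary relative to the character is what decides whether the start or the end of a word is matched. The sketch below illustrates this, assuming GNU grep; the sample lines and variable names are made up.

```shell
# Hypothetical sample data standing in for file4
data=$(printf '%s\n' 'Sun rises' 'tea time' 'cat nap' 'no match')

# Words that begin with a capital S
starts_S=$(printf '%s\n' "$data" | grep '\<S')

# Words that end with t: the boundary goes after the character
ends_t=$(printf '%s\n' "$data" | grep 't\>')

# \b before the character behaves like \<, \b after it behaves like \>
b_starts_t=$(printf '%s\n' "$data" | grep '\bt')
b_ends_t=$(printf '%s\n' "$data" | grep 't\b')

# Either case at once, using alternation (\| in GNU grep's default syntax)
either_t=$(printf '%s\n' "$data" | grep '\bt\|t\b')

printf '%s\n' "$starts_S" "$ends_t" "$b_starts_t"
```

The line 'no match' is never selected because the t in "match" sits in the middle of the word, where there is no boundary on either side.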

Anchors

These work in the same way as word boundaries, but they apply to lines rather than words. To search for lines that start with a particular character, use

$ grep "^s" file5

Another example would be

$ grep -P "^\d" file5

This will locate the lines that start with a digit. To search for all the lines that end with a particular character, use

$ grep "y$" file5

This will locate all the lines that end with the letter y.
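The anchor examples can be checked on a few sample lines. This is a minimal sketch, assuming GNU grep with PCRE support for the -P example; the data and variable names are hypothetical.

```shell
# Hypothetical sample data standing in for file5
data=$(printf '%s\n' 'sunny day' '42 is the answer' 'a sunny day' 'sky')

# Lines that start with a lowercase s
starts_s=$(printf '%s\n' "$data" | grep '^s')

# Lines that start with a digit (\d requires -P)
starts_digit=$(printf '%s\n' "$data" | grep -P '^\d')

# Lines that end with the letter y
ends_y=$(printf '%s\n' "$data" | grep 'y$')

printf '%s\n' "$starts_s"
printf '%s\n' "$starts_digit"
printf '%s\n' "$ends_y"
```

'a sunny day' is not matched by ^s even though it contains a word starting with s, because the anchor only looks at the first character of the line.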

Note:- You can also use the regex tester tool from EXTENDSCLASS to check your regular expressions.

With this, we end our tutorial on how to use basic regex. In the next & final tutorial on regex concepts, we will learn some other advanced regex concepts, after which we should be able to create & use regex with ease. We will also share some examples of how to use regex properly once we have completed the tutorials on regex concepts.

If you have any queries/questions regarding this tutorial, please leave them in the comment box below.


3 Comments

Joe Smith on December 21, 2018

‘$ grep -p “\D” file2’ should be ‘$ grep -P “\D” file2’ with a capital P. -P = PCRE, -p = unrecognized.
‘$ grep “/<S” file4' and '$ grep “/<t” file4' and '$ grep “/bt” file4' all need backslashes.

Robert on September 10, 2020

$ grep “/<S” file4 there is a wrong slash used

- Shusain on September 11, 2020
  
  thanks for pointing that out.
