Regular expressions (also called regex or regexp) are strings of characters that define a search pattern. They can be used to perform ‘Search’ or ‘Search & Replace’ operations, and also to validate a condition such as a password policy.
Regex is a very powerful tool at our disposal, & the best thing about it is that it can be used in almost every programming language. So whether we are writing a Bash script, creating a Python program, or just writing a single-line search query, we can use regex.
For this tutorial, we are going to learn some basic regex concepts & how we can use them in Bash using ‘grep’. If you wish to use them in other languages like Python or C, you can just use the regex part. So let’s start with an example of a regex,
Ex- A regex looks like
/t[aeiou]l/
But what does this mean? It means that the regex is going to look for the letter ‘t’, followed by any one of the letters ‘a e i o u’, followed by the letter ‘l’ (the slashes only mark where the pattern starts & ends). It can match ‘tel’, ‘tal’ or ‘til’. The match can be a separate word or part of another word like ‘tilt’, ‘brutal’ or ‘telephone’.
The syntax for using regex with grep is
$ grep "regex_search_term" file_location
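As a quick illustration of the syntax above, here is a minimal sketch that runs the earlier pattern against a small sample file. The sample words and the temporary file are made up for this example.

```shell
# Build a small sample file (the words are made up for this sketch).
sample=$(mktemp)
printf '%s\n' tilt brutal telephone table tool > "$sample"

# Print every line that contains 't', then one vowel, then 'l'.
matches=$(grep "t[aeiou]l" "$sample")
echo "$matches"

rm -f "$sample"
```

This should print ‘tilt’, ‘brutal’ & ‘telephone’. ‘table’ & ‘tool’ are skipped because neither contains ‘t’, a single vowel & ‘l’ in a row.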
Don't worry if this is going over your head; this was just an example to show what can be achieved with regex, & believe me, this was the simplest of examples. We can achieve much more with regex. We will now start with the basics.
Regex Basics
To learn the basics of regex, we need to start with some special characters known as metacharacters. They help us create more complex regex search terms. Mentioned below is the list of basic metacharacters,
. or Dot will match any single character
[ ] will match any one of the characters listed inside the brackets
[^ ] will match any one character except the ones listed inside the brackets
* will match zero or more of the preceding item
+ will match one or more of the preceding item
? will match zero or one of the preceding item
{n} will match exactly ‘n’ of the preceding item
{n,} will match ‘n’ or more of the preceding item
{n,m} will match between ‘n’ & ‘m’ of the preceding item
{,m} will match ‘m’ or fewer of the preceding item
\ is an escape character, used when we need to include one of the metacharacters in our search.
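We will now discuss all these metacharacters with examples. Note that grep, in its default basic mode, treats ‘+’, ‘?’ & ‘{ }’ as ordinary characters, so patterns that use them as multipliers need the ‘-E’ (extended regex) option.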
. or Dot
It is used to match any single character in our search term. For example, we can use the dot like this,
$ grep "d.g" file1
This regex means we are looking, in the file named ‘file1’, for a word that starts with ‘d’, ends with ‘g’ & has any one character in the middle. Similarly, we can use the dot any number of times in our search pattern, like
T......h
This search term will look for a word that starts with ‘T’, ends with ‘h’ & can have any six characters in the middle.
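To see the dot in action, here is a small sketch that feeds a few made-up words to grep through a pipe instead of a file. The sample words are assumptions chosen only to show what does & does not match.

```shell
# 'dg' has nothing between 'd' and 'g', so the dot has no character to match.
dot_matches=$(printf '%s\n' dog dig dg d9g | grep "d.g")
echo "$dot_matches"

# Six dots stand for exactly six characters between 'T' and 'h'.
six_matches=$(printf '%s\n' Thorough Tenth | grep "T......h")
echo "$six_matches"
```

The first command should print ‘dog’, ‘dig’ & ‘d9g’, since the dot accepts digits as well as letters. The second should print only ‘Thorough’.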
[ ]
Square brackets are used to define a set of characters. For example, when we need to match only particular characters rather than any character,
$ grep "N[oen]n" file2
Here, we are looking for a word that starts with ‘N’, ends with ‘n’ & has exactly one of ‘o’, ‘e’ or ‘n’ in the middle. We can list anything from a single character to any number of characters inside the square brackets.
We can also define ranges like ‘a-e’ or ‘1-8’ inside the square brackets. A range always runs between two single characters, so ‘1-18’ does not mean the numbers 1 to 18; it means the range ‘1-1’ plus the character ‘8’.
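A short sketch of both forms, a list of characters & a range, using made-up sample words:

```shell
# A list: the middle character must be 'o', 'e' or 'n'.
set_matches=$(printf '%s\n' Non Nen Nnn Nan | grep "N[oen]n")
echo "$set_matches"

# A range: the middle character must be one of 'a' through 'e'.
range_matches=$(printf '%s\n' bat bet bit | grep "b[a-e]t")
echo "$range_matches"
```

The first command should skip ‘Nan’ & the second should skip ‘bit’, because ‘a’ is not in the list & ‘i’ is outside the range.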
[^ ]
This is like the NOT operator for regex. Using [^ ] means that our search will match any character except the ones mentioned inside the square brackets. Example,
$ grep "St[^1-9]d" file3
This matches all the words that start with ‘St’, end with the letter ‘d’ & have, in between, one character that is not a digit from 1 to 9.
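Here is a small sketch of the negated set, again with made-up sample words. It also shows two details that are easy to miss: ‘0’ is not in the range ‘1-9’, & [^ ] still needs one character to be present.

```shell
# 'St5d' is rejected because '5' is in 1-9.
# 'Std' is rejected because there is no character between 'St' and 'd'.
neg_matches=$(printf '%s\n' Stad St0d St5d Std | grep "St[^1-9]d")
echo "$neg_matches"
```

This should print ‘Stad’ & ‘St0d’.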
Up until now, we were only using examples of regex that look for a single character in the middle, but what if we need to look for more than that? Let’s say we need to locate all words that start & end with a given character & can have any number of characters in the middle. That’s where we use the multiplier metacharacters, i.e. +, * & ?.
{n}, {n,m}, {n,} & {,m} are other multiplier metacharacters that we can use in our regex terms.
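The curly-brace multipliers can be tried out with a sketch like the one below. It assumes grep's ‘-E’ option (extended regex), so that the braces work without backslashes, & ‘-x’, which only accepts a match covering the whole line. The sample strings are made up.

```shell
# Four sample strings with one to four 'a' characters before the 'b'.
samples=$(printf '%s\n' ab aab aaab aaaab)

exact=$(printf '%s\n' "$samples" | grep -xE "a{2}b")      # exactly two 'a'
at_least=$(printf '%s\n' "$samples" | grep -xE "a{2,}b")  # two or more 'a'
between=$(printf '%s\n' "$samples" | grep -xE "a{2,3}b")  # two or three 'a'

echo "$exact"
echo "$at_least"
echo "$between"
```

Without ‘-x’, every sample except ‘ab’ would match all three patterns, because grep is happy to find the pattern anywhere inside a longer line.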
* (asterisk)
The following example matches any number of occurrences of the letter k, including none:
$ grep "lak*" file4
It means we can have a match with ‘lake’, ‘la’ or ‘lakkkkk’.
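A minimal sketch of the asterisk with a few made-up words:

```shell
# 'k*' allows zero or more 'k', so plain 'la' is enough for a match.
star_matches=$(printf '%s\n' la lake lakkkkk lemon | grep "lak*")
echo "$star_matches"
```

Only ‘lemon’ should be left out, as it does not contain ‘la’ at all.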
+
The following pattern requires at least one occurrence of the letter k to be matched:
$ grep -E "lak+" file5
Here, k should occur at least once, so our results can be ‘lake’ or ‘lakkkkk’ but not ‘la’. The ‘-E’ option is needed because grep’s default basic mode treats ‘+’ as a literal plus sign.
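The sketch below compares the plus sign with & without grep's ‘-E’ option. The sample strings, including the odd-looking ‘lak+’, are made up to make the difference visible.

```shell
# With -E, '+' is a multiplier: one or more 'k' after 'la'.
plus_matches=$(printf '%s\n' la lake lakkkkk | grep -E "lak+")
echo "$plus_matches"

# Without -E, '+' is an ordinary character, so only text containing
# the four characters 'lak+' matches.
literal_matches=$(printf '%s\n' lake 'lak+' | grep "lak+")
echo "$literal_matches"
```

The first command should print ‘lake’ & ‘lakkkkk’, while the second should print only ‘lak+’.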
?
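The following pattern matches the string ‘bb’ or ‘bab’, because with the ‘?’ multiplier we can have zero or one occurrence of the preceding character:
$ grep -E "ba?b" file6
The ‘-E’ option is needed because grep’s default basic mode treats ‘?’ as a literal question mark.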
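A quick sketch of the question mark, assuming grep's ‘-E’ option & a few made-up strings:

```shell
# 'a?' allows zero or one 'a' between the two 'b' characters.
opt_matches=$(printf '%s\n' bb bab baab | grep -E "ba?b")
echo "$opt_matches"
```

‘baab’ should not be printed, since it has two occurrences of ‘a’ between the two ‘b’ characters.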
Very important Note:
This is pretty important while using multipliers. Suppose we have a regex
$ grep "S.*l" file7
And we get results with ‘Small’ & ‘Silly’, & then we also get ‘Shane is a little to play ball’. But why did we get ‘Shane is a little to play ball’? We were only looking for words in our search, so why did we get the complete sentence as our output?
That’s because it satisfies our search criteria: it starts with the letter ‘S’, has any number of characters in the middle & ends with the letter ‘l’. Multipliers are greedy by default, so ‘.*’ stretches as far as the last ‘l’ it can reach. So what can we do to correct our regex, so that we only get words instead of whole sentences as our output?
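We need to add the ‘?’ metacharacter after the ‘*’, which turns the greedy multiplier into a lazy (reluctant) one,
$ grep -oP "S.*?l" file7
This makes the match stop at the first ‘l’ after the ‘S’ rather than the last one. Lazy multipliers are a Perl-style feature, so with GNU grep they need the ‘-P’ option, & ‘-o’ prints only the matched text instead of the whole line. To keep a match inside a single word, replace the dot with a set that excludes spaces, such as ‘S[^ ]*l’.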
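The difference between these behaviours is easier to see when grep prints only the matched text. The sketch below assumes a grep that supports ‘-o’ (GNU & BSD grep do), & the lazy line additionally assumes a GNU grep built with Perl-regex support (‘-P’); where that is missing, the ‘lazy’ variable is simply left empty. The second sample line, ‘Small talk’, is made up.

```shell
line='Shane is a little to play ball'

# Greedy: '.*' runs to the last 'l' on the line.
greedy=$(printf '%s\n' "$line" | grep -oE "S.*l")

# Word-sized: '[^ ]*' cannot cross a space, so the match stays in one word.
word=$(printf '%s\n' "$line" 'Small talk' | grep -oE "S[^ ]*l")

# Lazy: '.*?' stops at the first 'l' it can (needs -P support).
lazy=$(printf '%s\n' "$line" | grep -oP "S.*?l" 2>/dev/null || true)

echo "greedy: $greedy"
echo "word:   $word"
echo "lazy:   $lazy"
```

The greedy match should cover the whole sentence & the word-sized match should be just ‘Small’. The lazy match, where available, should be ‘Shane is a l’: shorter than the greedy one, but still not limited to a single word.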
\ or Escape characters
\ is used when we need to include a character that is a metacharacter or has a special meaning in regex. For example, if we need to locate all the words ending with a dot, we can use
$ grep "S.*\." file8
This will search for & match text that starts with ‘S’ & ends with a literal dot character.
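To see what the backslash changes, here is a small sketch that runs the same pattern with & without the escape on two made-up strings:

```shell
# Escaped: '\.' only matches a real dot, so 'Stop' is rejected.
escaped=$(printf '%s\n' 'Stop.' 'Stop' | grep "S.*\.")
echo "$escaped"

# Unescaped: the last '.' matches any character, so both lines match.
unescaped=$(printf '%s\n' 'Stop.' 'Stop' | grep "S.*.")
echo "$unescaped"
```

The first command should print only ‘Stop.’, while the second should print both lines.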
We now have a basic idea of how regex works. In our next tutorial, we will learn some advanced concepts of regex. In the meanwhile, practice as much as you can: create regexes & try to incorporate them in your work. If you have any queries or questions, you can leave them in the comments below.
The text uses a forward slash when discussing backslash escapes. Note that the example is correct, but not the text accompanying it.
Thanks for pointing it out. Have corrected it now.
To understand the difference between the two commands
$ grep "S.*l" file7
and
$ grep "S.*?l" file7
you must understand the difference between greedy, reluctant, and possessive quantifiers in regular expressions, e.g.,
https://docs.oracle.com/javase/tutorial/essential/regex/quant.html#difs