Linux/UNIX - Using grep With Regular Expressions

Searching for Lines Containing Patterns 

There will be many occasions when you are trying to locate a specific set of lines in a file, such as a log file, or when you are trying to filter the output of a Linux or UNIX command down to just the lines relevant to your needs.

The grep command is perfect in these situations and we explore some of its capabilities here.

grep – Global Regular Expression Print

Linux and UNIX systems offer three variants of the grep command:

  • grep
  • egrep
  • fgrep

grep supports basic regular expressions. egrep supports extended regular expressions, which add some more advanced characters, and fgrep searches for fixed strings in which no character has a special meaning.
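As a minimal sketch of how the three variants differ, the following runs one small, made-up input through each of them. It uses grep -E and grep -F, which are the standard spellings of egrep and fgrep; the sample lines and variable names are assumptions chosen purely for illustration.

```shell
# Hypothetical sample input: three short lines.
sample=$(printf '%s\n' 'abc' 'a.c' 'ac')

# Basic regular expression: "." matches any single character.
basic=$(printf '%s\n' "$sample" | grep 'a.c')

# Extended regular expression (egrep behaviour): "?" makes the "b" optional.
extended=$(printf '%s\n' "$sample" | grep -E 'ab?c')

# Fixed string (fgrep behaviour): "." is just an ordinary dot.
fixed=$(printf '%s\n' "$sample" | grep -F 'a.c')

printf 'basic:\n%s\n\nextended:\n%s\n\nfixed:\n%s\n' "$basic" "$extended" "$fixed"
```

Here the basic pattern matches abc and a.c, the extended pattern matches abc and ac, and the fixed string matches only a.c.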

The basic characters supported by grep are:

  • [...], [^...], ^, $, ., *, \

Here is a brief description of these special characters:

  • A list of characters enclosed by [ and ] matches any single character in that list (if the first character is the caret ^ then it matches any character not in the list)
  • The caret ^ at the start of a pattern matches the empty string at the beginning of a line
  • The dollar sign $ at the end of a pattern matches the empty string at the end of a line
  • The period . matches any single character
  • The asterisk * matches zero or more occurrences of the previous character
  • The backslash \ is an escape character
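The special characters listed above can be tried out safely on a small word list. The sketch below is only an illustration: the words and the variable names are invented, and each comment states the lines that the pattern selects from that list.

```shell
# Hypothetical word list, one entry per line.
words=$(printf '%s\n' 'cat' 'cot' 'cut' 'coat' 'ct' 'a.b' 'axb')

bracket=$(printf '%s\n' "$words" | grep 'c[ao]t')    # cat, cot
negated=$(printf '%s\n' "$words" | grep 'c[^ao]t')   # cut
at_start=$(printf '%s\n' "$words" | grep '^a')       # a.b, axb
at_end=$(printf '%s\n' "$words" | grep 'at$')        # cat, coat
any_char=$(printf '%s\n' "$words" | grep 'c.t')      # cat, cot, cut
repeated=$(printf '%s\n' "$words" | grep 'co*t')     # cot, ct
literal=$(printf '%s\n' "$words" | grep 'a\.b')      # a.b

printf '%s\n' "$bracket" "$negated" "$at_start" "$at_end" "$any_char" "$repeated" "$literal"
```

Note that co*t matches ct because the asterisk allows zero occurrences of the preceding o.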

 

Search for a pattern anywhere in a line

The following example matches all lines in the ps -ef output that have sh anywhere in them:

[ptr@srva ~]$ ps -ef | grep "sh"
root       139     7  0 13:40 ?        00:00:00 [pdflush]
root       140     7  0 13:40 ?        00:00:00 [pdflush]
root      2393     1  0 13:41 ?        00:00:00 /usr/sbin/sshd
root      2849  2779  0 14:00 ?        00:00:00 /bin/sh /usr/bin/startkde
root      2903  2849  0 14:00 ?        00:00:00 /usr/bin/ssh-agent /bin/sh -c exec -l /bin/bash -c "/usr/bin/dbus-launch 
--exit-with-session /etc/X11/xinit/Xclients"
root      3062  3061  0 14:00 pts/1    00:00:00 /bin/bash
root      3089  2393  0 14:01 ?        00:00:02 sshd: root@pts/2
root      3093  3089  0 14:01 pts/2    00:00:00 -bash
ptr       3123  3122  0 14:02 pts/2    00:00:00 -bash
root      5055  2393  0 14:28 ?        00:00:00 sshd: root@pts/3
root      5063  5055  0 14:28 pts/3    00:00:00 -bash
ptr      15980  3123  0 15:11 pts/2    00:00:00 grep sh
[ptr@srva ~]$

 

Search for a pattern at the beginning of a line

The following example matches all lines in the ps -ef output that start with the string ptr:

[ptr@srva ~]$ ps -ef | grep "^ptr"
ptr       3123  3122  0 14:02 pts/2    00:00:00 -bash
ptr       3256  3123  0 14:17 pts/2    00:00:00 ps -ef
ptr       3257  3123  0 14:17 pts/2    00:00:00 grep ^ptr
[ptr@srva ~]$

Search for a pattern at the end of a line

The following example matches all lines in the ps -ef output that end in sh (which here means the lines ending in bash):

[ptr@srva ~]$ ps -ef | grep "sh$"
root      3062  3061  0 14:00 pts/1    00:00:00 /bin/bash
root      3093  3089  0 14:01 pts/2    00:00:00 -bash
ptr       3123  3122  0 14:02 pts/2    00:00:00 -bash
root      5063  5055  0 14:28 pts/3    00:00:00 -bash
[ptr@srva ~]$
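Because ps output changes from moment to moment, it can help to compare the three kinds of search on a fixed sample. The sketch below assumes a small captured listing held in a shell variable (the variable names are illustrative) and uses grep -c, which prints the number of matching lines, to make the comparison compact.

```shell
# A fixed sample of ps -ef style lines, so results are repeatable.
listing=$(cat <<'EOF'
root      2393     1  0 13:41 ?        00:00:00 /usr/sbin/sshd
root      3062  3061  0 14:00 pts/1    00:00:00 /bin/bash
ptr       3123  3122  0 14:02 pts/2    00:00:00 -bash
ptr       3256  3123  0 14:17 pts/2    00:00:00 ps -ef
EOF
)

anywhere=$(printf '%s\n' "$listing" | grep -c 'sh')     # sh anywhere in the line
at_start=$(printf '%s\n' "$listing" | grep -c '^ptr')   # line starts with ptr
at_end=$(printf '%s\n' "$listing" | grep -c 'sh$')      # line ends with sh

printf 'anywhere=%s at_start=%s at_end=%s\n' "$anywhere" "$at_start" "$at_end"
```

The sshd line contains sh but does not end with it, so it is counted by the first search and not by the last.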

Search for a pattern containing a range of characters

The following example matches all lines that contain a digit in the range 0 to 6, followed by any single character, followed by a “d”:

[ptr@srva ~]$ ls -l /etc | grep "[0-6].d"
drwxr-xr-x  4 root root    4096 May 10  2012 dbus-1
drwxr-xr-x  2 root root    4096 Feb  2 17:35 default
drwxr-xr-x  2 root root    4096 May 10  2012 depmod.d
drwxr-xr-x  3 root root    4096 May 10  2012 dev.d
-rw-r--r--  1 root root     178 Mar  6  2011 dhcp6c.conf
-rw-rw-r--  1 root disk       0 Mar  6  2011 dumpdates
lrwxrwxrwx  1 root root      10 May 10  2012 rc0.d -> rc.d/rc0.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc1.d -> rc.d/rc1.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc2.d -> rc.d/rc2.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc3.d -> rc.d/rc3.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc4.d -> rc.d/rc4.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc5.d -> rc.d/rc5.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc6.d -> rc.d/rc6.d
[ptr@srva ~]$

We can see that the first 6 matching lines match on the last digit of the modification time or year, followed by a space and then the d that begins the file or directory name. The unescaped dot matches any character, including a space.

Search for a pattern containing a dot

If we want to match just the lines that contain a digit followed by “.d” then we need to escape the dot with a backslash (\.) so that it is treated as a literal dot:

[ptr@srva ~]$ ls -l /etc | grep "[0-6]\.d"
lrwxrwxrwx  1 root root      10 May 10  2012 rc0.d -> rc.d/rc0.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc1.d -> rc.d/rc1.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc2.d -> rc.d/rc2.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc3.d -> rc.d/rc3.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc4.d -> rc.d/rc4.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc5.d -> rc.d/rc5.d
lrwxrwxrwx  1 root root      10 May 10  2012 rc6.d -> rc.d/rc6.d
[ptr@srva ~]$
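The difference between the unescaped and the escaped dot can be reproduced without reading /etc. This is a minimal sketch on a few invented lines; the entries and variable names are assumptions for illustration only.

```shell
# Hypothetical entries: three rcN.d names plus one line with a year before a name.
entries=$(printf '%s\n' 'rc0.d' 'rc6.d' 'rc7.d' '2012 dbus-1')

# Unescaped dot: a digit 0-6, then ANY character, then d.
unescaped=$(printf '%s\n' "$entries" | grep '[0-6].d')

# Escaped dot: a digit 0-6, then a literal dot, then d.
escaped=$(printf '%s\n' "$entries" | grep '[0-6]\.d')

printf 'unescaped:\n%s\n\nescaped:\n%s\n' "$unescaped" "$escaped"
```

The unescaped pattern also picks up "2012 dbus-1" (2, space, d), while rc7.d is rejected by both because 7 is outside the range 0 to 6.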

 

Search for a pattern in a specific “field”

In the following scenario we would like to match all long listing entries for files in /etc that have a size beginning with a 2. The files in /etc that matched this requirement at the time of carrying out this challenge were as follows:


-rw-r--r--  1 root root    2562 May 24  2008 a2ps-site.cfg
-rw-r--r--  1 root root     298 Mar 28  2007 anacrontab
-rw-r--r--  1 root root    2518 Jul 22  2011 DIR_COLORS
-rw-r--r--  1 root root    2420 Jul 22  2011 DIR_COLORS.xterm
-rw-r--r--  1 root root   22060 Jan  7  2007 fb.modes
lrwxrwxrwx  1 root root      22 May 10  2012 grub.conf -> ../boot/grub/grub.conf
-rw-r--r--  2 root root     241 Feb 16 13:36 hosts
-rw-r--r--  1 root root     235 Feb  3 09:47 hosts.allow
-rw-r--r--  1 root root     293 Jul 22  2011 idmapd.conf
-rw-r--r--  1 root root      28 Oct  8  2006 ld.so.conf
-rw-r--r--  1 root root    2506 Jan 31 16:54 libuser.conf
-rw-r--r--  1 root root     262 Jul  4  2011 lisarc
-rw-r--r--  1 root root     293 Jan  7  2007 mailcap
-rw-r--r--  1 root root    2706 Jul 22  2011 multipath.conf
-rw-r--r--  1 root root      25 Jan 31 13:28 pam_smb.conf
-rw-r--r--  1 root root    2431 Feb  2 17:45 passwd
-rw-------  1 root root    2489 Feb  2 13:37 passwd-
-rw-r--r--  1 root root    2875 Jan  7  2007 pinforc
-rw-r--r--  1 root root     220 May  4  2011 quotagrpadmins
-rw-r--r--  1 root root     290 May  4  2011 quotatab
-rw-r--r--  1 root root      27 Aug 29  2011 redhat-release
-rw-r--r--  1 root root     216 Apr  3  2010 sestatus.conf
-rw-r--r--  1 root root   21851 Jan  6  2007 slrn.rc
-rw-r--r--  1 root root    2643 Jan  7  2007 tux.mime.types
-rw-r--r--  1 root root    2657 May  4  2011 warnquota.conf

The first command we put together is:

[ptr@srva ~]$ ls -l /etc | grep "root   2"
-rw-r--r--  1 root root   22060 Jan  7  2007 fb.modes
-rw-r--r--  1 root root   21851 Jan  6  2007 slrn.rc
[ptr@srva ~]$

This matches only two of the lines we are after. The pattern “root   2” has exactly 3 spaces between the string root and 2. The challenge we have here is that we need the string root to indicate which number in the line we are trying to match (otherwise it would potentially match a 2 anywhere in the line and not just the size column), but we then have a varying number of spaces between the string root and the 2. Some have 3, some have 4, some have 5, and so on.

This is a job for the asterisk (*). The asterisk is a repetition operator: it matches zero or more occurrences of the previous character, so it can absorb a varying amount of padding. The following example will match the string root followed by zero or more spaces and then a 2:

[ptr@srva ~]$ ls -l /etc | grep "root *2"
-rw-r--r--  1 root root    2562 May 24  2008 a2ps-site.cfg
-rw-r--r--  1 root root     298 Mar 28  2007 anacrontab
-rw-r--r--  1 root root    2518 Jul 22  2011 DIR_COLORS
-rw-r--r--  1 root root    2420 Jul 22  2011 DIR_COLORS.xterm
-rw-r--r--  1 root root   22060 Jan  7  2007 fb.modes
lrwxrwxrwx  1 root root      22 May 10  2012 grub.conf -> ../boot/grub/grub.conf
-rw-r--r--  2 root root     241 Feb 16 13:36 hosts
-rw-r--r--  1 root root     235 Feb  3 09:47 hosts.allow
-rw-r--r--  1 root root     293 Jul 22  2011 idmapd.conf
-rw-r--r--  1 root root      28 Oct  8  2006 ld.so.conf
-rw-r--r--  1 root root    2506 Jan 31 16:54 libuser.conf
-rw-r--r--  1 root root     262 Jul  4  2011 lisarc
-rw-r--r--  1 root root     293 Jan  7  2007 mailcap
-rw-r--r--  1 root root    2706 Jul 22  2011 multipath.conf
-rw-r--r--  1 root root      25 Jan 31 13:28 pam_smb.conf
-rw-r--r--  1 root root    2431 Feb  2 17:45 passwd
-rw-------  1 root root    2489 Feb  2 13:37 passwd-
-rw-r--r--  1 root root    2875 Jan  7  2007 pinforc
-rw-r--r--  1 root root     220 May  4  2011 quotagrpadmins
-rw-r--r--  1 root root     290 May  4  2011 quotatab
-rw-r--r--  1 root root      27 Aug 29  2011 redhat-release
-rw-r--r--  1 root root     216 Apr  3  2010 sestatus.conf
-rw-r--r--  1 root root   21851 Jan  6  2007 slrn.rc
-rw-r--r--  1 root root    2643 Jan  7  2007 tux.mime.types
-rw-r--r--  1 root root    2657 May  4  2011 warnquota.conf
[ptr@srva ~]$

Now we get all of the files we wanted to match.

Now we add a new file to /etc that is called root2. Running the same command as above will result in this file being matched too:

[ptr@srva ~]$ ls -l /etc | grep "root *2"
-rw-r--r--  1 root root    2562 May 24  2008 a2ps-site.cfg
-rw-r--r--  1 root root     298 Mar 28  2007 anacrontab
-rw-r--r--  1 root root    2518 Jul 22  2011 DIR_COLORS
-rw-r--r--  1 root root    2420 Jul 22  2011 DIR_COLORS.xterm
-rw-r--r--  1 root root   22060 Jan  7  2007 fb.modes
lrwxrwxrwx  1 root root      22 May 10  2012 grub.conf -> ../boot/grub/grub.conf
-rw-r--r--  2 root root     241 Feb 16 13:36 hosts
-rw-r--r--  1 root root     235 Feb  3 09:47 hosts.allow
-rw-r--r--  1 root root     293 Jul 22  2011 idmapd.conf
-rw-r--r--  1 root root      28 Oct  8  2006 ld.so.conf
-rw-r--r--  1 root root    2506 Jan 31 16:54 libuser.conf
-rw-r--r--  1 root root     262 Jul  4  2011 lisarc
-rw-r--r--  1 root root     293 Jan  7  2007 mailcap
-rw-r--r--  1 root root    2706 Jul 22  2011 multipath.conf
-rw-r--r--  1 root root      25 Jan 31 13:28 pam_smb.conf
-rw-r--r--  1 root root    2431 Feb  2 17:45 passwd
-rw-------  1 root root    2489 Feb  2 13:37 passwd-
-rw-r--r--  1 root root    2875 Jan  7  2007 pinforc
-rw-r--r--  1 root root     220 May  4  2011 quotagrpadmins
-rw-r--r--  1 root root     290 May  4  2011 quotatab
-rw-r--r--  1 root root      27 Aug 29  2011 redhat-release
-rw-r--r--  1 root root       0 Mar 17 15:34 root2
-rw-r--r--  1 root root     216 Apr  3  2010 sestatus.conf
-rw-r--r--  1 root root   21851 Jan  6  2007 slrn.rc
-rw-r--r--  1 root root    2643 Jan  7  2007 tux.mime.types
-rw-r--r--  1 root root    2657 May  4  2011 warnquota.conf
[ptr@srva ~]$

This is because the asterisk (*) represents zero or more of the previous character. To ensure that we get at least one space before the 2 we must add an extra space (space, space, asterisk):

ls -l /etc | grep "root  *2"

Now we will get the correct lines.
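The three patterns tried in this section can be compared side by side on a fixed sample. The sketch below assumes four listing lines held in a variable (one of them the root2 file with a size of 0) and counts the matching lines for each pattern with grep -c; the variable names are illustrative.

```shell
# Sample long-listing lines, including the troublesome file named root2.
listing=$(cat <<'EOF'
-rw-r--r--  1 root root    2562 May 24  2008 a2ps-site.cfg
-rw-r--r--  1 root root   22060 Jan  7  2007 fb.modes
-rw-r--r--  1 root root       0 Mar 17 15:34 root2
-rw-r--r--  1 root root     178 Mar  6  2011 dhcp6c.conf
EOF
)

# Exactly three spaces between root and 2: only the five-digit size fits.
exact=$(printf '%s\n' "$listing" | grep -c 'root   2')

# Zero or more spaces: both sizes match, but so does the name root2.
zero_or_more=$(printf '%s\n' "$listing" | grep -c 'root *2')

# One space, then zero or more spaces: at least one space is required.
one_or_more=$(printf '%s\n' "$listing" | grep -c 'root  *2')

printf 'exact=%s zero_or_more=%s one_or_more=%s\n' "$exact" "$zero_or_more" "$one_or_more"
```

Only the last pattern counts exactly the two files whose size begins with a 2.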

This command line could be improved further to cater for other directories where there may be varying owners of files:

[ptr@srva ~]$ ls -l | grep "[a-z]  *2[0-9]* [A-Z]"
-rw-rw-rw- 1 ptr  ptr      29 Feb  1  2016 f10
-rw-r--r-- 1 ptr  ptr   27719 Mar 17 17:33 rpmpkgs
-rw-r--r-- 1 root root  27719 Mar 17 17:33 rpmpkgs.1
-rw-r--r-- 1 root root  27719 Mar 17 17:33 rpmpkgs.2
-rw-r--r-- 1 root root  27719 Mar 17 17:33 rpmpkgs.3
-rw-r--r-- 1 root root  29989 Mar 17 17:33 rpmpkgs.4
-rw-r--r-- 1 ptr  ptr     261 Mar 17 17:33 vboxadd-install.log
[ptr@srva ~]$

The above pattern looks for lines that contain a lowercase letter (from the end of the group owner column), followed by one or more spaces, followed by a 2 and then zero or more digits (a size of one or more digits beginning with a 2), followed by one space (the column separator between the size column and the modification time column). Finally, it requires an uppercase letter, which ensures that the 2 is matched in the size column, because the size is immediately followed by the capitalised month name.
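The owner-independent pattern can be checked against a small sample with mixed owners. This is a sketch under a few assumptions: the listing lines are invented, and LC_ALL=C is set for the grep call as a precaution, since the meaning of ranges such as [a-z] can vary between locales.

```shell
# Invented listing lines with different owners and sizes.
listing=$(cat <<'EOF'
-rw-rw-rw- 1 ptr  ptr      29 Feb  1  2016 f10
-rw-r--r-- 1 root root  27719 Mar 17 17:33 rpmpkgs.1
-rw-r--r-- 1 ptr  ptr     512 Mar 17 17:33 notes2
-rw-r--r-- 1 ptr  ptr    1024 Feb 22 09:10 report
EOF
)

# lowercase letter, 1+ spaces, a 2, 0+ digits, one space, uppercase letter
matches=$(printf '%s\n' "$listing" | LC_ALL=C grep '[a-z]  *2[0-9]* [A-Z]')

printf '%s\n' "$matches"
```

Only the f10 and rpmpkgs.1 lines are selected. The 2 in the name notes2 and the 22 in the date "Feb 22" are ignored because neither is followed by a space and an uppercase letter.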

 

Command Line Options for grep

The grep command offers a lot of options. Here are a few of them:

-r        Search a directory recursively
-l        Display names of files with matching lines
-i        Ignore case
-v        Match lines that do not contain the pattern
-c        Display a count of matching lines for each file instead of the lines themselves
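These options can be tried without touching system files by building a throwaway directory first. The sketch below assumes that mktemp -d is available and that grep supports -r, as GNU and BSD grep do; the file names and contents are invented for illustration.

```shell
# Build a temporary directory tree with three small files.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf '%s\n' 'host=centos1' 'alias=CENTOS1' 'other=srva' > "$dir/a.conf"
printf '%s\n' 'nothing here' > "$dir/sub/b.conf"
printf '%s\n' 'backup of centos1' > "$dir/sub/c.conf"

# -r with -l: names of files at any depth that contain the pattern.
files=$(grep -rl 'centos1' "$dir" | sort)

# -c: number of matching lines (case-sensitive by default).
count=$(grep -c 'centos1' "$dir/a.conf")

# -c with -i: the same count, ignoring case.
nocase=$(grep -ci 'centos1' "$dir/a.conf")

# -v: lines that do NOT contain the pattern.
inverted=$(grep -v 'centos1' "$dir/a.conf")

printf 'files:\n%s\ncount=%s nocase=%s\ninverted:\n%s\n' "$files" "$count" "$nocase" "$inverted"

# Clean up the temporary files.
rm -rf "$dir"
```

The count rises from 1 to 2 once case is ignored, and -v returns the two lines of a.conf that lack the lowercase string.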

The following example lists the names of files in and below the directory /etc that contain the pattern centos1:

[root@centos1 ~]# grep -rl centos1 /etc/*
/etc/default/grub
/etc/fstab
/etc/grub2.cfg
/etc/hostname
/etc/lvm/archive/centos_centos1_00000-1405482984.vg
/etc/lvm/archive/SalesVG_00000-1684602635.vg
/etc/lvm/archive/SalesVG_00001-1891684174.vg
/etc/lvm/archive/SalesVG_00002-134759568.vg
/etc/lvm/backup/centos_centos1
/etc/lvm/backup/SalesVG
/etc/mtab
[root@centos1 ~]#

The following example shows lines from the set output that contain the string name= in any mix of upper and lower case:

[root@centos1 ~]# set | grep -i name=
HOSTNAME=centos1.ptr.local
LOGNAME=root
        local remote_opts="--username= --config-dir= --no-auth-cache";
                                --no-auth-cache --username=
[root@centos1 ~]#

The following example shows all who output lines that do not contain the pattern root:

[root@centos1 ~]# who
ptr      :0           2017-03-31 11:01 (:0)
ptr      pts/0        2017-03-31 11:02 (:0)
root     pts/1        2017-03-31 11:07 (1.0.0.116)
[root@centos1 ~]# who | grep -v root
ptr      :0           2017-03-31 11:01 (:0)
ptr      pts/0        2017-03-31 11:02 (:0)
[root@centos1 ~]#

The following example shows how many lines in each file in and below the /etc/lvm directory contain the pattern centos1:

[root@centos1 ~]# grep -cr centos1 /etc/lvm
/etc/lvm/archive/centos_centos1_00000-1405482984.vg:4
/etc/lvm/archive/SalesVG_00000-1684602635.vg:1
/etc/lvm/archive/SalesVG_00001-1891684174.vg:1
/etc/lvm/archive/SalesVG_00002-134759568.vg:2
/etc/lvm/backup/centos_centos1:4
/etc/lvm/backup/SalesVG:7
/etc/lvm/lvm.conf:0
/etc/lvm/lvmlocal.conf:0
/etc/lvm/profile/cache-mq.profile:0
/etc/lvm/profile/cache-smq.profile:0
/etc/lvm/profile/command_profile_template.profile:0
/etc/lvm/profile/metadata_profile_template.profile:0
/etc/lvm/profile/thin-generic.profile:0
/etc/lvm/profile/thin-performance.profile:0
[root@centos1 ~]#

 

Boost Your Linux/Unix System Administrator Toolbox

grep is a hugely powerful tool that a Linux or UNIX system administrator cannot live without. egrep extends this to provide even more potential.

We will take a look at egrep and fgrep in some later articles.

 

If you have any questions do email us at info@ptr.co.uk and if you would like to learn more about Linux and Unix take a look at our Linux and UNIX Training Courses.
