
# Using Filtering Commands to Process Files in Linux

This guide explains how to manipulate data in Linux with filtering commands such as `cat`, `fmt`, `pr`, and others.

However, before we delve into how to combine two files or extract a particular group of data, let’s examine the architecture of the Linux operating system and review the relationship between the shell and the kernel.

We will then focus on how to use redirection and pipes to filter text or data files in Linux.

## Identifying the Linux Architecture

The Linux operating system consists of three main components, namely:

+ Kernel

+ Shell

+ Linux utilities and application programs

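The kernel is the core of the operating system: it manages the hardware and system resources and provides services such as process management and networking. The shell is an interface between the user and the kernel. It interprets the commands you type and hides the kernel's intricate details, so users do not need to know how every service, such as networking, works underneath.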

Linux utilities and application programs are the collection of programs that serve day-to-day processing needs. These programs are invoked through the shell.

## STREAMS, REDIRECTION AND PIPES

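Streams, redirection, and pipes are among the most powerful command-line tools in Linux. A stream is a flow of data into or out of a program; redirection and pipes let you connect these streams to files or to other programs, so that the output of one program can become the input of another.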

Usually, input comes from the keyboard and output goes to the screen. With pipes, we can break complex tasks into simpler steps that are less daunting to execute. Let’s take a look at the types of streams in Linux. Afterwards, we will look at redirection and pipes and see how to combine them with filtering commands.

## TYPES OF STREAMS

**STANDARD INPUT**: Programs read input from the standard input, or STDIN (file descriptor 0). By default, this data comes from the keyboard, which is why the keyboard is often described as STDIN.

**STANDARD OUTPUT**: Programs write their normal output to the standard output, or STDOUT (file descriptor 1), which is displayed on the screen by default. Therefore, the screen acts as STDOUT in Linux.

**STANDARD ERROR**: The standard error, or STDERR (file descriptor 2), is similar to the standard output because both display data on the screen by default. However, STDERR carries error and diagnostic messages, and because it is a separate stream, it can be redirected independently of the normal output.
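Each stream is identified by a small number called a file descriptor: 0 for STDIN, 1 for STDOUT, and 2 for STDERR. As a minimal sketch, assuming a Bash-compatible shell, the commands below send one message to each output stream, capture them in separate files, and then feed one of those files to `tr` through STDIN. The file names out.txt, err.txt, and upper.txt are hypothetical, and the redirectors used here are explained in the next section.

```shell
# Work in a throwaway directory so no existing files are touched
cd "$(mktemp -d)"

# echo writes to STDOUT (fd 1); ">&2" sends the second message to STDERR (fd 2).
# The group's STDOUT goes to out.txt and its STDERR goes to err.txt.
{ echo "normal output"; echo "error message" >&2; } > out.txt 2> err.txt

# tr reads STDIN (fd 0), which "<" connects to out.txt instead of the keyboard,
# and writes the uppercased text to STDOUT, which ">" sends to upper.txt
tr 'a-z' 'A-Z' < out.txt > upper.txt
cat upper.txt
```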

## REDIRECTION: INPUT AND OUTPUT

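In Linux, the following symbols (redirectors) are used to redirect input and output:

+ `>` sends a command's output to a file, overwriting the file if it already exists

+ `>>` appends a command's output to the end of a file

+ `<` feeds a file to a command as its input

+ `<<` feeds the lines that follow, up to a chosen delimiter, to a command as its input (a “here document”)

### REDIRECTING OUTPUT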

Now, let’s try out a few examples of the output redirector in the Bash shell.

Create a directory with the `mkdir` command and make it your current directory with `cd`:

```
mkdir LinuxAcademy
cd LinuxAcademy
```

Now, create a file in the directory LinuxAcademy by echoing a message to the specified file.

```echo "How to use filtering commands in Linux" > Tutorial.txt```

We are going to use the LinuxAcademy directory as our current directory for this project; however, you can give the directory any name you want.

Afterwards, in the current directory, echo the message “Linux Academy is here” into the same file:

```echo "Linux Academy is here" > Tutorial.txt```

The output redirector (`>`) creates the specified file if it does not exist and writes the command's standard output to it. If the file already exists, it is overwritten, so Tutorial.txt now contains only this new message.
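A minimal sketch of the overwrite behavior, run in a throwaway directory created with `mktemp -d` so no existing files are touched: Tutorial.txt is written twice, and only the second message survives.

```shell
cd "$(mktemp -d)"

echo "How to use filtering commands in Linux" > Tutorial.txt
# ">" truncates Tutorial.txt before writing, so the first message is lost
echo "Linux Academy is here" > Tutorial.txt

cat Tutorial.txt
```

In Bash, `set -o noclobber` makes `>` refuse to overwrite an existing file, and `>|` overrides that protection for a single command.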

There is an alternative way of sending output to a file. Instead of writing a message straight to the file with the `echo` command, we can use the `cat` command to send the contents of an existing file to the receiving file.

Now let’s use the `cat` command to send the contents of Tutorial.txt into Lessons.txt.

```cat Tutorial.txt > Lessons.txt```

To verify that this worked, use the `cat` command to display the contents of Lessons.txt.

We can also append output to the end of a file, rather than overwrite it, with the `>>` redirector.

Add new content to Tutorial.txt, then use the `cat` command to display Tutorial.txt on the screen:

```
echo "There are many filtering commands in Linux, such as cat, uniq, head, tail, and pr, for manipulating text." >> Tutorial.txt
cat Tutorial.txt
```

Let’s add the content of Tutorial.txt to the existing content of Lessons.txt.

```cat Tutorial.txt >> Lessons.txt```

The command above appends the content of Tutorial.txt to the existing content of Lessons.txt. Display the result:

```cat Lessons.txt```
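The steps in this section can be replayed as one short sketch in a throwaway directory (the second message is shortened). Because Tutorial.txt already holds two lines when it is appended, Lessons.txt ends up with three lines, the first of them repeated:

```shell
cd "$(mktemp -d)"

echo "Linux Academy is here" > Tutorial.txt
cat Tutorial.txt > Lessons.txt          # Lessons.txt: 1 line

# ">>" appends instead of overwriting
echo "There are many filtering commands in Linux" >> Tutorial.txt   # Tutorial.txt: 2 lines
cat Tutorial.txt >> Lessons.txt         # Lessons.txt: 1 + 2 = 3 lines

cat Lessons.txt
wc -l < Lessons.txt
```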

### REDIRECTING OUTPUT (ERROR)

Now, let’s see how to tell the two output streams apart. To redirect error messages (standard error) instead of normal output, place the file descriptor number 2 directly before the redirector, as in `2>` or `2>>`.

For instance, if you are not sure whether the file you are looking for is in the working directory, you can send any error message to another file so that it is not shown directly on the screen:

```cat Linuxbox.txt 2>> Dumpbin.txt```
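A minimal sketch, assuming Linuxbox.txt does not exist in the current directory, that shows where the error message ends up in three cases: `2>>` appends it to Dumpbin.txt, `2> /dev/null` discards it, and `> all.txt 2>&1` sends both streams to one file (all.txt is a hypothetical name). The order matters in the last case: `2>&1` must come after `> all.txt`, because it copies wherever STDOUT points at that moment.

```shell
cd "$(mktemp -d)"

# Linuxbox.txt does not exist, so cat prints an error on STDERR and exits non-zero;
# "|| echo" reports the failure instead of stopping a script
cat Linuxbox.txt 2>> Dumpbin.txt || echo "cat failed with exit status $?"

# The error message went to Dumpbin.txt instead of the screen
cat Dumpbin.txt

# Discard error messages entirely
cat Linuxbox.txt 2> /dev/null || true

# Send STDOUT and STDERR to the same file; "2>&1" must follow "> all.txt"
echo "Linux Academy is here" > Tutorial.txt
cat Tutorial.txt Linuxbox.txt > all.txt 2>&1 || true
cat all.txt
```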

### REDIRECTING INPUT

With the input redirector (`<`), input comes from a file instead of the keyboard. The file is passed to a command such as `cat`, which here displays the contents of the specified file on the screen:

```cat < Tutorial.txt```
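The `<<` redirector works differently: it creates a here document, in which the shell feeds the lines that follow, up to a delimiter word you choose, to the command's standard input. A minimal sketch, with EOF as the delimiter and Notes.txt as a hypothetical file name:

```shell
cd "$(mktemp -d)"

# Every line between "<< EOF" and the closing "EOF" becomes cat's STDIN;
# cat copies it to STDOUT, which ">" sends to Notes.txt
cat << EOF > Notes.txt
Ghana Accra
Nigeria Lagos
EOF

# Read the file back through the ordinary input redirector
cat < Notes.txt
```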

## PIPES

Let’s take a moment to understand piping in Linux. Understanding piping is critical to understanding filtering commands in Linux.

A pipe (`|`) connects the standard output of one command to the standard input of the next, so each command can trim or transform the data before passing it on, and the final result looks exactly the way you want.

With piping, you do not have to write a new utility for every complex task; you can chain small existing ones instead. Piping simplifies complex tasks.

For instance, assume you want to display the content of the file CountryDatabase.txt, which holds a list of countries and cities in some parts of the world. We can combine the `cat` command with the `more` or `less` command to do so.

The `more` command displays content one screen at a time.

Below is the content of the CountryDatabase.txt:

Ghana Accra

Nigeria Lagos

Liberia Monrovia

U.S.A Washington

Mali Bamako

France Paris

Germany Berlin

Togo Lome

Italy Rome

England London

Spain Madrid

Russia Moscow

Ireland Dublin

Zambia Lusaka

Sweden Stockholm

```cat CountryDatabase.txt | more```
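Because `more` waits for a keypress, it is awkward in scripts. The sketch below pipes a shortened copy of CountryDatabase.txt (the first five entries only) into non-interactive filters instead: `wc -l` counts the lines, and `sort | head -n 2` keeps the first two lines in alphabetical order, showing that pipes can be chained.

```shell
cd "$(mktemp -d)"

# A shortened copy of CountryDatabase.txt
printf '%s\n' "Ghana Accra" "Nigeria Lagos" "Liberia Monrovia" "U.S.A Washington" "Mali Bamako" > CountryDatabase.txt

# "|" connects cat's STDOUT to wc's STDIN
cat CountryDatabase.txt | wc -l

# Pipes can be chained: sort the lines, then keep only the first two
cat CountryDatabase.txt | sort | head -n 2
```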

## FILTERING COMMANDS

In this final section, we are going to learn how to use filtering commands in Linux to process text or content in files. We will cover some of the most commonly used filtering commands, as well as some rarer ones.

Filtering commands in Linux help us manipulate text in one way or another. These commands can be used for combining files, transforming data in files, formatting text, displaying text, and summarizing data.

## How to Combine Files with cat

While the `cat` command is commonly used to review or display the contents of files, you can also use it to combine two or more files.

Now, let’s combine two files via the `cat` command and output redirection. We will use the content of Countrystats.txt and the content of Countryratio.txt. The content of Countryratio.txt is shown below:

10,000

11,000

12,000

13,000

22,000

34,000

45,000

23,000

1,000

23,000

34,000

23, 678

23, 000

23, 456

15, 600

The `cat` command places the second file after the first; it does not join the files side by side. For that, use `paste` (shown below) or `join`, which merges lines that share a common field. The following command writes both files into Combined.txt:

```cat Countrystats.txt Countryratio.txt > Combined.txt```
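The contents of Countrystats.txt are not shown in this guide, so this sketch uses small hypothetical stand-ins for both files to confirm how `cat` combines them: the lines of the second file follow those of the first, and Combined.txt holds as many lines as both inputs together.

```shell
cd "$(mktemp -d)"

# Hypothetical stand-ins for the two input files
printf '%s\n' "Ghana" "Nigeria" "Liberia" > Countrystats.txt
printf '%s\n' "10,000" "11,000" "12,000" > Countryratio.txt

cat Countrystats.txt Countryratio.txt > Combined.txt

# 3 + 3 = 6 lines: all of Countrystats.txt, then all of Countryratio.txt
wc -l < Combined.txt
cat Combined.txt
```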

Scrolling through the output of combined files can be tedious, so we can use commands such as `more` or `less` to make it easier to read. The `more` and `less` commands show data one screen at a time. Press the spacebar to move to the next screen; `less` also lets you scroll backward through the text, for example with the arrow keys.

There are other options that you can use with the `cat` command. For instance, use `-n` or `--number` to number every line of the output:

```cat -n Combined.txt ```

If you want to see where each line ends, include the `-E` or `--show-ends` option. It displays a dollar sign ($) at the end of every line:

```cat -E Countrystats.txt```
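The `-E` option is handy for spotting trailing spaces, which are otherwise invisible. A minimal sketch with a hypothetical Sample.txt whose first line ends in a space; it assumes GNU `cat`, while on BSD and macOS `cat -e` marks line ends in a similar way and the long option names are not available.

```shell
cd "$(mktemp -d)"

# Hypothetical sample file: the first line ends with a trailing space
printf 'Accra \nLagos\n' > Sample.txt

# -E marks each line end with "$", exposing the space before it
cat -E Sample.txt > ends.txt
cat ends.txt

# -n and -E can be combined
cat -n -E Sample.txt
```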

We can use the command `paste` to merge files line by line:

``` paste City.txt Countryratio.txt```

The output of the above command is shown below:

Accra 10,000

Lagos 11,000

Monrovia 12,000

Bamako 13,000

Paris 22,000

Berlin 34,000

Lome 45,000

Rome 23,000

London 1,000

Madrid 23,000

Moscow 34,000

Dublin 23, 678

Lusaka 23, 000
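By default, `paste` separates the merged columns with a tab character, and its `-d` option chooses another delimiter. A minimal sketch with short hypothetical copies of City.txt and Countryratio.txt; a semicolon is used because the numbers already contain commas:

```shell
cd "$(mktemp -d)"

# Short hypothetical copies of the two input files
printf '%s\n' Accra Lagos Monrovia > City.txt
printf '%s\n' 10,000 11,000 12,000 > Countryratio.txt

# Default: corresponding lines joined with a tab
paste City.txt Countryratio.txt > tabbed.txt

# -d sets another delimiter
paste -d ';' City.txt Countryratio.txt > merged.txt
cat merged.txt
```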

## Reversing the Order of File Content

There is also a command called `tac`, which works as a reverse `cat`: it reverses the order of lines in a file. Type:

```tac City.txt```

The output of the above command is shown below:

Dublin

Moscow

Madrid

London

Rome

Lome

Berlin

Paris

Bamako

Monrovia

Lagos

Accra
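Reversing line order is not the same as sorting in reverse. The sketch below contrasts `tac` with `sort -r` on a short hypothetical City.txt. `tac` comes with GNU coreutils; on macOS, which does not ship it by default, `tail -r` reverses lines instead.

```shell
cd "$(mktemp -d)"

# Hypothetical three-line City.txt
printf '%s\n' Accra Lagos Bamako > City.txt

# tac reverses the line order: last line first
tac City.txt > reversed.txt

# sort -r sorts in reverse alphabetical order instead
sort -r City.txt > sorted_desc.txt

cat reversed.txt sorted_desc.txt
```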

## Delete repeated lines with uniq

We can use the `uniq` command to remove duplicated lines from a file. Let’s modify City.txt a bit so that some cities are repeated: Lagos, Bamako, and Berlin each appear twice in the file below.

Accra

Lagos

Lagos

Berlin

Bamako

Monrovia

Bamako

Paris

Berlin

Lome

Rome

London

Madrid

Moscow

Dublin

Lusaka

Stockholm

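Because `uniq` only compares each line with the line directly before it, it removes only adjacent duplicates. Running `uniq City.txt` therefore drops the second Lagos but keeps both copies of Bamako and Berlin:

Accra

Lagos

Berlin

Bamako

Monrovia

Bamako

Paris

Berlin

Lome

Rome

London

Madrid

Moscow

Dublin

Lusaka

Stockholm

To remove every duplicate, sort the file first with `sort City.txt | uniq`, or use `sort -u City.txt`.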
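`uniq` compares each line only with the line directly before it. A minimal sketch with a hypothetical six-line City.txt contrasts `uniq` on its own with sorting first, and shows `uniq -c`, which prefixes each line with the number of times it occurs:

```shell
cd "$(mktemp -d)"

# Hypothetical sample: Lagos repeats on adjacent lines, Bamako repeats further apart
printf '%s\n' Accra Lagos Lagos Bamako Monrovia Bamako > City.txt

# uniq alone removes only the adjacent Lagos duplicate (5 lines remain)
uniq City.txt > uniq_only.txt

# Sorting first makes all copies adjacent (4 lines remain)
sort City.txt | uniq > sorted_uniq.txt

# sort -u combines both steps
sort -u City.txt

# Count how often each city occurs
sort City.txt | uniq -c
```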

## View Octal Dump of File

There is also another filtering command in Linux known as `od`, which displays files in octal. `od` simply stands for **octal dump**.

An octal dump is useful when you want to investigate the structure of a data file, graphics file, audio file, and so on.

Apart from octal output, the `od` command can also display data in hexadecimal, decimal, and ASCII characters. For detailed information on how to use `od`, type `man od`.

```od City.txt```

The output of the od command is shown below:

0000000 061501 071143 020141 005015 005015 060514 067547 006563

0000020 006412 046412 067157 067562 064566 020141 005015 005015

0000040 060502 060555 067553 006440 006412 050012 071141 071551

0000060 006440 006412 041012 071145 064554 020156 005015 005015

0000100 067514 062555 006440 020012 020040 005015 067522 062555

0000120 006440 020012 005015 067514 062156 067157 006440 020012

0000140 020040 005015 060515 071144 062151 005015 005015 067515

0000160 061563 073557 006440 006412 042012 061165 064554 020156

0000200 005015 020040 005015 072514 060563 060553 020040 005015

0000220 005015 072123 061557 064153 066157 020155
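The `005015` words that recur in the dump above are the bytes `\r` (octal 015) and `\n` (octal 012) read together as one little-endian 16-bit word, which suggests City.txt was saved with Windows-style CRLF line endings. Per-byte formats make this easier to see. A minimal sketch with a hypothetical two-line file, using `-c` to show characters and `-An -tx1` to print bare hexadecimal bytes:

```shell
cd "$(mktemp -d)"

# Hypothetical two-line file with Windows (CRLF) line endings
printf 'Accra\r\nLagos\r\n' > crlf.txt

# -c shows each byte as a character or an escape such as \r and \n
od -c crlf.txt

# -An drops the offset column; -tx1 prints one hex byte at a time
od -An -tx1 crlf.txt > hex.txt
cat hex.txt
```

If the CRLF endings are unwanted, `tr -d '\r' < City.txt > City-unix.txt` strips them (City-unix.txt is a hypothetical output name).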

## Sort File

Sometimes we want to sort files in a particular way. The `sort` command supports several sorting modes through options such as `-f` or `--ignore-case`, `-M` or `--month-sort`, `-n` or `--numeric-sort`, and `-r` or `--reverse`. As usual, check the man page of the `sort` command for more options.

Now let’s sort City.txt with the `-M` option, which sorts lines by three-letter month abbreviations (JAN to DEC). Because none of the city names is a month name, `sort -M` treats them all as equal and falls back to comparing whole lines, so the result below is simply alphabetical, the same as plain `sort City.txt`.

```sort -M City.txt```

The output of the above command is shown below:

Accra

Bamako

Berlin

Dublin

Lagos

Lome

London

Lusaka

Madrid

Monrovia

Moscow

Paris

Rome

Stockholm
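Month sorting pays off when the data actually contains month abbreviations, and `-n` matters for numbers, which plain `sort` compares character by character. A minimal sketch with hypothetical input files; `LC_ALL=C` fixes the locale so the results do not depend on your system settings:

```shell
cd "$(mktemp -d)"
export LC_ALL=C   # C locale: English month names, byte-wise comparison

printf '%s\n' MAR JAN FEB > months.txt
sort -M months.txt > by_month.txt      # JAN, FEB, MAR

printf '%s\n' 100 9 10 > numbers.txt
sort numbers.txt > as_text.txt         # 10, 100, 9 (compared as text)
sort -n numbers.txt > as_numbers.txt   # 9, 10, 100
sort -rn numbers.txt                   # 100, 10, 9
```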

## File-formatting commands

Let’s try our hand at some file-formatting commands in Linux.

### Using the pr Command to Prepare Files for Printing

The `pr` command prepares data files for printing: it splits the text into pages, adds a header with the date, file name, and page number to each page, and pads each page to a fixed length.

```pr Countrydata.txt```

2017-05-26 15:28 Countrydata.txt Page 1

Accra 10,000

Lagos 11,000

Monrovia 12,000

Bamako 13,000

Paris 22,000

Berlin 34,000

Lome 45,000

Rome 23,000

London 1,000

Madrid 23,000

Moscow 34,000

Dublin 23, 678

Lusaka 23, 000


As you can see, `pr` adds a header with the date, time, file name, and page number to our data. Note that `pr` does not modify the file itself; it only changes how the content is laid out in its output. The `pr` command accepts many options; see `man pr` for the full list.
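A few options that change the layout: `-h` replaces the file name in the header with your own title, `-t` omits the header and trailer, and `-2` arranges the output in two columns. A minimal sketch, assuming GNU `pr` and a hypothetical three-line copy of Countrydata.txt:

```shell
cd "$(mktemp -d)"

# Hypothetical three-line copy of Countrydata.txt
printf '%s\n' "Accra 10,000" "Lagos 11,000" "Monrovia 12,000" > Countrydata.txt

# Custom header title instead of the file name
pr -h "Country Data" Countrydata.txt > titled.txt

# No header or trailer: the lines pass through unchanged
pr -t Countrydata.txt > plain.txt

# Two columns per page, without a header
pr -2 -t Countrydata.txt
```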

## Using the fmt Command

Now, let’s use the `fmt` command to format our data file. The `fmt` command reformats text by joining and rewrapping the lines of each paragraph to a uniform width, which tidies up lines of irregular length:

```fmt Countryratio.txt```
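`fmt` treats consecutive non-blank lines as one paragraph. In GNU `fmt` the default maximum width is 75 columns, and `-w` sets another one. A minimal sketch with a hypothetical ragged.txt:

```shell
cd "$(mktemp -d)"

# Hypothetical input: three short lines forming one paragraph
printf 'Filtering\ncommands\nare useful\n' > ragged.txt

# fmt joins them into a single line (well under the default width)
fmt ragged.txt > joined.txt
cat joined.txt

# -w 12 rewraps the same paragraph so no line exceeds 12 characters
fmt -w 12 ragged.txt > narrow.txt
cat narrow.txt
```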

## Using nl for numbering lines

Now let’s use the **nl command** to number the lines in City.txt:

```nl City.txt```

The output of the command is shown below:

1 Accra

2 Lagos

3 Monrovia

4 Bamako

5 Paris

6 Berlin

7 Lome

8 Rome

9 London

10 Madrid

11 Moscow

12 Dublin

13 Lusaka

14 Stockholm
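By default, `nl` numbers only non-empty lines (`-b t`); `-b a` numbers every line, including blank ones, while `-w` sets the width of the number field and `-s` the separator after it. A minimal sketch with a hypothetical three-line City.txt that has a blank line in the middle:

```shell
cd "$(mktemp -d)"

# Hypothetical input with a blank second line
printf 'Accra\n\nLagos\n' > City.txt

# Default: the blank line is not numbered, so Lagos is line 2
nl -w 1 -s ' ' City.txt > default.txt

# -b a: every line is numbered, so Lagos is line 3
nl -b a -w 1 -s ' ' City.txt > all_lines.txt

cat default.txt all_lines.txt
```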

## Word Count Command

Finally, let’s look at the `wc` (word count) command, which counts the lines, words, and bytes in a data file. Its syntax is quite simple:

```wc City.txt```

The output of the above command is shown below:

```26  14  156 City.txt```

As you can see, the `wc` command displayed 26 lines, 14 words, and 156 bytes.

However, instead of displaying the number of lines (`--lines`), words (`--words`), and bytes (`--bytes`) together, we can limit `wc` to just one of these counts:

```wc --words City.txt```
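Each count also has a short option: `-l` for lines, `-w` for words, and `-c` for bytes. Reading the file through `<` makes `wc` print only the number, without the file name, which is convenient in scripts (some systems pad the number with spaces). A minimal sketch with a hypothetical two-line City.txt:

```shell
cd "$(mktemp -d)"

# Hypothetical two-line input
printf 'Accra Ghana\nLagos Nigeria\n' > City.txt

lines=$(wc -l < City.txt)   # 2 newline characters
words=$(wc -w < City.txt)   # 4 words
bytes=$(wc -c < City.txt)   # 12 + 14 = 26 bytes, newlines included

echo "lines=$lines words=$words bytes=$bytes"
```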

## Conclusion

This is not an exhaustive list of filtering commands in Linux. Others include the `head`, `tail`, and `cut` commands.

With a working knowledge of filtering commands, pipes, and redirection, you can break complex tasks down into simpler, basic ones.

Before we end, let’s look at a few examples of how we can combine redirectors, pipes and filtering commands to produce customised output.

Now, let’s assume we want to erase duplicates in a file, count the number of words remaining, and redirect the output to another file. To erase duplicates, we use the `uniq` command; because `uniq` only removes adjacent duplicate lines, we sort the file first. To count the words, we use `wc --words`. The output redirector must also be included.

```sort [filename1] | uniq | wc --words > [filename2]```

In addition, we can save the output to filename2 and display it on the screen at the same time with the `tee` command:

```sort [filename1] | uniq | tee [filename2]```
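Both ideas can be combined in one pipeline. In this minimal sketch, Cities.txt, Unique.txt, and Count.txt are hypothetical file names: `tee` saves the de-duplicated list to Unique.txt and also passes it on, so `wc -w` can still count the words and write the result to Count.txt.

```shell
cd "$(mktemp -d)"

# Hypothetical input with repeated cities
printf '%s\n' Accra Lagos Lagos Bamako Accra > Cities.txt

# sort groups duplicates, uniq drops them, tee saves a copy to Unique.txt
# and passes the same lines on to wc, whose result goes to Count.txt
sort Cities.txt | uniq | tee Unique.txt | wc -w > Count.txt

cat Unique.txt   # Accra, Bamako, Lagos
cat Count.txt    # 3
```

`tee -a` appends to the file instead of overwriting it.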

As you practice using these filters together with redirectors and pipes, you can easily include them in your scripts to simplify complex tasks and produce suitable output.