Bash Tips #4 – Error Handling in Bash Scripts

There is no single way to handle unexpected behavior in our bash scripts. By default, bash simply ignores failed commands and proceeds with the execution, so we have to explicitly implement behavior that helps us handle such events. I would like to present a few simple techniques that can be used in these situations.

Let’s start with a simple example:

mv /non/existent/path /tmp/file1

The mv process exits with a return code of 1 because the source file does not exist:

mv: cannot stat '/non/existent/path': No such file or directory

Checking the return code

We check the return code of the command that could fail; if it is not zero, we print a message and exit.

mv /non/existent/path /tmp/file1
if [[ $? -ne 0 ]]; then
    echo "mv failed"
    exit 1
fi

The example above can be simplified by using an || (or) operator:

mv /non/existent/path /tmp/file1 || exit 1

If the command on the left of the || fails, the command on the right gets executed. We can extend it with command grouping:

mv /non/existent/path /tmp/file1 || { echo "mv failed"; exit 1; }

This is equivalent to the example using if, but it is much more compact and easier to read.
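For completeness, the same check can also negate the command in the if test itself, which avoids inspecting $? separately. A minimal sketch:

```shell
#!/bin/bash
# Sketch: "if ! command" tests the failure directly,
# so there is no separate $? check to get wrong.
if ! mv /non/existent/path /tmp/file1 2>/dev/null; then
    echo "mv failed"
fi
```

In a real script the branch would of course also exit 1, exactly as in the if example above.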

Setting the ‘e’ bash option

By using the set builtin command and enabling the e option, whenever a command fails the whole bash process exits with the failed command’s exit code. This behavior is suppressed in a few contexts: in the tests of if statements and of until and while loops, and for every command of an && / || list except the last one. With a proper logging setup, there is no need to explicitly print any information about failed steps. We can rely on programs printing error messages to stderr and having them logged, so we can just abort the execution.
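Those exemptions can be sketched in a few lines: a failure that is being tested does not abort the script, only an unhandled one does.

```shell
#!/bin/bash
# Sketch: with set -e, failures that are part of a test or of an
# && / || list do not abort the script; only unhandled failures do.
set -e

if false; then              # the failing if-test is exempt from set -e
    echo "not reached"
else
    echo "if-test handled, execution continues"
fi

false || echo "or-list handled, execution continues"

echo "reached the end of the script"
```

An unhandled bare `false` anywhere in this script would, in contrast, terminate it immediately.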

Here is our example extended with the logging setup presented in one of the previous articles:

source includes/logging.sh "script4.log"
set -e
mv /non/existent/path /tmp/file1

When run, it produces a nice log and stops the script execution on the mv failure:

[2023-01-17T13:35:34+01:00]  + set -e
[2023-01-17T13:35:34+01:00]  + mv /non/existent/path /tmp/file1
mv: cannot stat '/non/existent/path': No such file or directory

But what if we expect a step to fail and want to perform certain actions when this happens? With the e option enabled, we can use the || and && operators to achieve functionality similar to try-catch:

mv /non/existent/path /tmp/file1 || {
    echo "mv failed, but it's expected to happen"
}

This way the script does not exit on the mv failure; instead, the block after || is executed. There is one caveat: if any command in the “catch” block fails, the whole script will still exit.
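The caveat can be sketched as follows; the inner script runs in a subshell here only so that we can capture its output and exit code:

```shell
#!/bin/bash
# Sketch of the caveat: under set -e, a failing command inside the
# "catch" block still terminates the whole script.
rc=0
output=$(bash -c '
    set -e
    mv /non/existent/path /tmp/file1 2>/dev/null || {
        echo "catch block entered"
        false                 # this failure aborts the inner script
        echo "never reached"
    }
') || rc=$?
echo "inner script printed: $output"
echo "inner script exit code: $rc"
```

The catch block starts, but the inner false stops the script, so the final echo never runs and the exit code is 1.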

To extend this example further we can use command grouping. Please note that the return code of a group of commands is the return code of the last command run:

{
    echo "try block"
    false
} || {
    echo "catch block"
    echo "..."
}
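That last-command rule can be observed directly; a small sketch (the subshells are only there to capture each status cleanly):

```shell
#!/bin/bash
# Sketch: the exit status of a { ...; } group is the exit status
# of the last command executed in it.
status1=$(bash -c '{ false; true; }; echo $?')
status2=$(bash -c '{ true; false; }; echo $?')
echo "group ending with true  -> status $status1"   # status 0
echo "group ending with false -> status $status2"   # status 1
```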

This can be further extended to achieve the functionality of try-catch-finally:

{
    echo "try block" &&
    false &&
    echo "try block after failed command"
} || {
    echo "catch block" &&
    true  &&
    echo "..."
} || { FAILED_CATCH="$?"; } ; {
    echo "finally block" &&
    false
} || { FAILED_FINALLY="$?"; } ; {
    if [[ -n $FAILED_CATCH ]]; then exit "$FAILED_CATCH"; fi
    if [[ -n $FAILED_FINALLY ]]; then exit "$FAILED_FINALLY"; fi
}

Before I start to explain what is going on here, I would like to say that I advise against using such complex structures. The code is not a well-known pattern, nor is it easy to read.

Here we go:

  1. && is a logical AND, || is a logical OR, ; is just a command separator that allows us to run a few commands in a single line
  2. cmd1 || cmd2 || cmd3 – OR conditions are evaluated from left to right until the first success.
  3. Braces { } are used to improve readability; there is no need to use them if all commands are joined together using the && operator.
  4. && operators joining all commands in every block ensure that the execution will not continue after the first failure. The whole block either fails if any of the commands fail or succeeds if none fail.
  5. The high-level overview can be represented as { try } OR { catch } OR { register that catch_failed=true } ; { finally } OR { register that finally_failed=true } ; { if catch or finally failed then exit }
  6. The flow is as follows:
    • If try fails, then catch is executed; if catch fails, a block setting a variable is executed, and it always succeeds.
    • The finally block gets executed; if it fails, a block setting a variable is executed, and it always succeeds.
    • A block with if statements checks whether either the catch or the finally block failed; if any did, it exits the script with the return code of the failed command.

Using traps

I have presented the concept of traps in the article about logging. Let’s briefly recap the concept: whenever a certain situation occurs, a command is executed. In the context of error handling, we will be interested in the ERR trap, which is executed whenever a command fails. Potential uses range from cleaning up temporary files or rolling back changes to informing users about a failure. Take a look at an example of an ERR trap:

trap "echo 'ERROR: An error occurred during execution, check log $LOGFILE for details.' >&3" ERR

It is a line taken from my article on logging. Here an ERR trap is used to display a warning message.

For more complex tasks we can define a function and reference it in a trap:

source includes/logging.sh "script4.log"
set -e

TEST_FILE_PATH='/tmp/dummy-marker'
cleanup() {
  echo "A command failed with return code of $?" >&3
  rm -f "$TEST_FILE_PATH"
}

trap 'cleanup' ERR # from now on, if any command fails cleanup is executed
trap 'cleanup' INT # do it for SIGINT as well (ctrl + c)

echo "Creating a test file"
touch "$TEST_FILE_PATH"
echo "echo before executing a command that exits with non-zero return code"
false
echo "echo after the command is executed"

We should note here that only a single trap can be defined for a given signal. In the example above, the ERR trap defined by the sourced logging script is redefined, and the cleanup function is called instead. This is something you should be aware of.
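A short sketch of the overwrite behavior; trap -p prints the handler currently installed for a signal (the subshell is only there so we can capture the output):

```shell
#!/bin/bash
# Sketch: a second trap for the same signal silently replaces the first;
# trap -p ERR shows which handler is currently installed.
out=$(bash -c '
    trap "echo first handler"  ERR
    trap "echo second handler" ERR   # overwrites the first handler
    trap -p ERR                      # shows only the second handler
    false                            # fires the trap once
    true
')
echo "$out"
```

Only the second handler is listed and only it runs; the first one is gone without any warning.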

When the process gets killed

There are some cases where we cannot rely on any of the mechanisms described above: if our bash process gets terminated by the operating system due to a programming error, no error handling code will help us recover. Take a look at the following script:

segfault.sh

#!/bin/bash
function_that_generates_a_segfault() {
  ulimit -s 1
  function_that_generates_a_segfault
}
function_that_generates_a_segfault

The program generates a segmentation fault (memory access error) which causes process termination by the operating system. Take a look at what would happen if we were to run this program from our script:

#!/bin/bash
source includes/logging.sh "script4.log"
set -e
./segfault.sh || {
    echo "Operation failed"
}
echo "..."

The following log gets created:

[2023-01-17T15:45:53+01:00]  + set -e
[2023-01-17T15:45:53+01:00]  + ./segfault.sh
4-segfault.sh: line 6:  4382 Segmentation fault  	./segfault.sh
[2023-01-17T15:45:53+01:00]  + echo 'Operation failed'
Operation failed
[2023-01-17T15:45:53+01:00]  + echo ...
...

The segfault.sh process gets terminated and our error handling code works. But what happens if it is our main shell process that gets terminated? Let’s find out by running the segmentation-fault-generating code inside our script:

#!/bin/bash
source includes/logging.sh "script4.log"
set -e


function_that_generates_a_segfault() {
  ulimit -s 1
  function_that_generates_a_segfault
}

function_that_generates_a_segfault || {
    echo "Operation failed"
}
echo "..."

This is the resulting log:

[2023-01-17T15:48:51+01:00]  + set -e
[2023-01-17T15:48:51+01:00]  + function_that_generates_a_segfault
+ ulimit -s 1
+ function_that_generates_a_segfault 
(repeated a few times)

The “catch” block is not executed, nor is the echo after it, as the process is terminated by the operating system. The e option did not save us from having the process killed. I personally do not think we have to account for such behavior; it is presented here as a fun fact.

Summary

I presented a few different tools that can be used to make error handling easier. To summarize the article in one simple piece of advice, I encourage everyone to use the e bash option and explicitly handle “expected” failures. Having an error message and simply exiting the script is much better than the default behavior of printing the error and continuing the execution. The knowledge I have shared in this article is somewhat basic, but I hope anyone can find a thing or two to improve their scripts.
