Defensive bash programming

Posted 2021-08-26, updated 2023-12-14, category: Linux/shell

Defensive bash programming (translation)
Source:
https://kfirlavi.herokuapp.com/blog/2012/11/14/defensive-bash-programming/
Updates
2021.08.26 initial version

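Introduction

This translation sat unfinished in my drafts for a long time, and I only recently found time to tidy it up. Some of the wording has changed, but the meaning has not.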

Defensive BASH Programming

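Main text

What the author calls defensive bash programming is a set of bash coding conventions. They protect a script from being made to do harmful things, and they keep its code clean.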

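Immutable global variables

Keep the number of global variables to a minimum

Name them in UPPER_CASE (capital letters joined by underscores)

Declare them with readonly

Use global variables in place of cryptic names such as $0 (the name the script was invoked as) and $1 (the first argument)

An example of such globals: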


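readonly PROGNAME=$(basename "$0")                 # script name without its directory
readonly PROGDIR=$(readlink -m "$(dirname "$0")")  # absolute directory (readlink -m is GNU-specific)
readonly ARGS="$*"                                 # all arguments joined into one string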
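A minimal sketch of the same globals, assuming bash. It makes two substitutions that are not from the original article. First, PROGDIR is computed with cd and pwd -P instead of readlink -m, because -m is a GNU coreutils option that not every readlink provides (BSD's, for example). Second, the arguments are kept in a readonly array instead of a flattened string. The attempt to reassign a readonly variable runs in a subshell, so the error stays contained.

```shell
#!/usr/bin/env bash
# Immutable globals describing the script itself
readonly PROGNAME=${0##*/}                           # pure-bash equivalent of basename "$0"
readonly PROGDIR=$(cd "$(dirname "$0")" && pwd -P)   # absolute, symlink-free directory
readonly -a ARGS=("$@")                              # each argument stays one array element

# Reassigning a readonly variable fails; doing it in a subshell keeps
# the error from affecting the rest of the script.
if ! ( PROGNAME=other ) 2>/dev/null; then
    echo "PROGNAME is still $PROGNAME"
fi
echo "running from $PROGDIR with ${#ARGS[@]} argument(s)"
```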

Limit scope

Anything that can be a local variable should not be made global.

change_owner_of_file() {
    local filename=$1
    local user=$2
    local group=$3

    chown "$user:$group" "$filename"
}

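Variable names should explain themselves as far as possible.

A loop's temporary variable is conventionally named i. Don't forget to declare it local as well.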

change_owner_of_files() {
    local user=$1; shift
    local group=$1; shift
    local files=("$@")    # an array keeps file names containing spaces intact
    local i

    for i in "${files[@]}"
    do
        chown "$user:$group" "$i"
    done
}
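This is why the local i matters. A bash variable is global unless it is declared local, so a loop variable without local overwrites any variable of the same name in the caller. The following is a small demonstration, and the function names in it are hypothetical:

```shell
#!/usr/bin/env bash
# Without local, the loop variable i is global and leaks to the caller
loop_leaky() {
    for i in "$@"; do
        :    # placeholder body
    done
}

# With local i, the caller's i is left untouched
loop_safe() {
    local i
    for i in "$@"; do
        :
    done
}

i=original
loop_leaky a b c
echo "after loop_leaky: i=$i"    # prints: after loop_leaky: i=c

i=original
loop_safe a b c
echo "after loop_safe: i=$i"     # prints: after loop_safe: i=original
```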

local only works inside a function. Using it at the top level is an error:

kfir@goofy ~ $ local a
bash: local: can only be used in a function

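The main function

Write as much of the script as possible as functions, so the script as a whole is built from function calls, somewhat like functional programming.

Have a single main function, and keep the variables inside main local too

The only call made at the top level is main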

main() {
    local files=(/tmp/a /tmp/b)
    local i

    for i in "${files[@]}"
    do
        change_owner_of_file kfir users "$i"
    done
}
main

Everything is a function

The only code written at the top level and run globally is:

Global variable declarations, which are immutable
main
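Put together, these rules give a script skeleton like the following sketch. The greeting function and its default name are purely illustrative. Only the readonly declaration and the call to main sit at the top level, and main receives "$@" so the arguments reach it intact.

```shell
#!/usr/bin/env bash
# Top level, part 1: immutable globals only
readonly PROGNAME=${0##*/}

# Everything else lives in functions
greeting() {
    local name=$1
    echo "hello, $name (from $PROGNAME)"
}

main() {
    local name=${1:-world}    # hypothetical default when no argument is given
    greeting "$name"
}

# Top level, part 2: the single call to main
main "$@"
```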

This keeps the code clean and self-explanatory. Compare:

main() {
    local files=$(ls /tmp | grep pid | grep -v daemon)
}

with the same logic moved into its own function:

temporary_files() {
    local dir=$1

    ls "$dir" \
        | grep pid \
        | grep -v daemon
}

main() {
    local files=$(temporary_files /tmp)
}

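The second version is much better. When something goes wrong you can go straight to temporary_files() instead of stepping through main statement by statement, and a unit test can target temporary_files directly rather than main.

With the first version you could only test main itself, and that is no longer a unit test but a run of the whole script. A test for temporary_files might look like this (returns is an assertion helper from the author's own code and is not shown in the article):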

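test_temporary_files() {
    local dir
    dir=$(mktemp -d)    # private directory, so unrelated files in /tmp don't interfere

    touch "$dir/a-pid1232.tmp"
    touch "$dir/a-pid1232-daemon.tmp"

    # ls prints bare file names, so that is what we expect back
    returns "a-pid1232.tmp" temporary_files "$dir"

    touch "$dir/b-pid1534.tmp"

    returns "a-pid1232.tmp b-pid1534.tmp" temporary_files "$dir"
}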
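The article never shows returns. The following is a hypothetical sketch of what such a helper might do. It runs the given command line, collapses the whitespace in its output so that newline-separated ls output compares equal to a space-separated expectation, and then compares the result with the expected string. Because it uses eval, it accepts both call styles that appear in this article: returns X cmd arg and returns X "cmd 'arg'". Under shunit2 itself, the built-in assertEquals expected actual serves the same purpose.

```shell
#!/usr/bin/env bash
# Hypothetical assertion helper: returns EXPECTED COMMAND [ARGS...]
# Succeeds when COMMAND's output, with runs of whitespace collapsed to a
# single space, equals EXPECTED; otherwise reports the mismatch on stderr.
returns() {
    local expected=$1; shift
    local actual

    # eval accepts both "returns X cmd arg" and "returns X \"cmd 'arg'\""
    actual=$(eval "$*" | tr -s '[:space:]' ' ')
    actual=${actual# }    # trim a leading space left by tr
    actual=${actual% }    # trim the trailing space left by the final newline

    if [[ $actual == "$expected" ]]; then
        return 0
    fi
    echo "FAIL: '$*' printed '$actual', expected '$expected'" >&2
    return 1
}
```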

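Debugging functions

Run the script with -x to print every command as it is executed:

bash -x my_prog.sh

Inside a function, use set -x and set +x. Every command between them is printed before it runs.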

temporary_files() {
    local dir=$1

    set -x
    ls "$dir" \
        | grep pid \
        | grep -v daemon
    set +x
}
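A related bash feature, not mentioned in the original article, is the PS4 variable. It sets the prefix printed before each traced command and is expanded at trace time, so it can include the current function name and line number. The following sketch uses traced_sum, a made-up example function:

```shell
#!/usr/bin/env bash
# Prefix each xtrace line with "function:line:". The single quotes delay
# expansion until each command is traced.
PS4='+ ${FUNCNAME[0]:-top-level}:${LINENO}: '

traced_sum() {
    local a=$1 b=$2

    set -x
    echo $(( a + b ))
    set +x
}

traced_sum 2 3
```

Running it prints 5 on stdout. On stderr it prints trace lines such as + traced_sum:<line>: echo 5.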

Print the function name and its arguments:

temporary_files() {
    echo "$FUNCNAME $*" >&2    # to stderr, so it doesn't mix with the function's output
    local dir=$1

    ls "$dir" \
        | grep pid \
        | grep -v daemon
}

Calling temporary_files /tmp then first prints the line temporary_files /tmp, followed by the function's actual output.
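If several functions need to print their name and arguments this way, the echo can be moved into a helper. In this hypothetical sketch, trace_call reads the caller's name from FUNCNAME[1] (index 0 is trace_call itself). It writes to stderr, so the trace never ends up in output that callers capture with $(...).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the calling function's name and arguments on stderr
trace_call() {
    echo "${FUNCNAME[1]} $*" >&2
}

# Example function using it; its real output (the sum) stays on stdout
add() {
    trace_call "$@"
    echo $(( $1 + $2 ))
}

result=$(add 2 3)    # stderr shows "add 2 3"; result holds only 5
echo "result=$result"
```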

Self-documenting code

First look at this code. Can you tell what it does without stepping through it in your head?

main() {
    local dir=/tmp

    [[ -z $dir ]] \
        && do_something...

    [[ -n $dir ]] \
        && do_something...

    [[ -f $dir ]] \
        && do_something...

    [[ -d $dir ]] \
        && do_something...
}
main

Split into small named tests, it reads much better. It takes more lines, but the intent is immediately clear.

is_empty() {
    local var=$1

    [[ -z $var ]]
}

is_not_empty() {
    local var=$1

    [[ -n $var ]]
}

is_file() {
    local file=$1

    [[ -f $file ]]
}

is_dir() {
    local dir=$1

    [[ -d $dir ]]
}

main() {
    local dir=/tmp

    is_empty "$dir" \
        && do_something...

    is_not_empty "$dir" \
        && do_something...

    is_file "$dir" \
        && do_something...

    is_dir "$dir" \
        && do_something...
}
main
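The do_something... placeholders keep the example above from running. The following runnable variant is a sketch. Hypothetical echo messages stand in for the placeholders, the predicates are condensed to one line each, and "$dir" is quoted so that a path containing spaces still reaches each predicate as a single argument.

```shell
#!/usr/bin/env bash
# The predicates from the article, condensed to one line each
is_empty()     { local var=$1;  [[ -z $var ]]; }
is_not_empty() { local var=$1;  [[ -n $var ]]; }
is_file()      { local file=$1; [[ -f $file ]]; }
is_dir()       { local dir=$1;  [[ -d $dir ]]; }

# Hypothetical consumer: reports which predicates hold for a path
describe_path() {
    local dir=$1

    is_empty "$dir" \
        && echo "empty"
    is_not_empty "$dir" \
        && echo "not empty"
    is_file "$dir" \
        && echo "file"
    is_dir "$dir" \
        && echo "directory"
    return 0    # a false final test must not become the function's status
}

describe_path /tmp
```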

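One thing per line

Shell code leans heavily on pipes, so a single line can end up doing many things. That is powerful, but hard to read.

Where possible, use \ to break such a complex line into several lines, each doing one job.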

temporary_files() {
    local dir=$1

    ls "$dir" | grep pid | grep -v daemon    # hard to read
}

temporary_files() {
    local dir=$1

    ls "$dir" \
        | grep pid \
        | grep -v daemon    # much better
}

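When splitting a line like this, put the connecting operator (|, &&, ||) at the start of the continuation line, not at the end of the previous one

Bad example: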

temporary_files() {
    local dir=$1

    ls "$dir" | \
        grep pid | \
        grep -v daemon
}

Good example (here with && and || leading the lines):

print_dir_if_not_empty() {
    local dir=$1

    is_empty "$dir" \
        && echo "dir is empty" \
        || echo "dir=$dir"
}
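One caveat about the a && b || c form in the good example: it is not a true if/else, because c also runs when b itself fails. With echo as b that practically never happens, but with a command that can fail it does. In the following sketch, report_fails is a made-up command that always fails.

```shell
#!/usr/bin/env bash
# Hypothetical command that prints something and then fails
report_fails() {
    echo "reporting"
    return 1
}

# cond && a || b: b runs when cond is false OR when a fails
with_and_or() {
    [[ -n $1 ]] \
        && report_fails \
        || echo "fallback"
}

# A real if/else: the else branch runs only when cond is false
with_if() {
    if [[ -n $1 ]]; then
        report_fails
    else
        echo "fallback"
    fi
}
```

With these definitions, with_and_or x prints reporting and then fallback, while with_if x prints only reporting.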

Printing output properly

Don't write code like this:


echo "this prog does:..."
echo "flags:"
echo "-h print help"

Better:

usage() {
    echo "this prog does:..."
    echo "flags:"
    echo "-h print help"
}

Now the text is wrapped in a function, but echo is still repeated on every line. Use a here document instead:

usage() {
    cat <<- EOF
    usage: $PROGNAME options
    
    Program deletes files from filesystems to release space.
    It takes a config file that defines the filesystem paths to work on, and whitelist rules to
    keep certain files.

    OPTIONS:
       -c --config              configuration file containing the rules. use --help-config to see the syntax.
       -n --pretend             do not really delete, just show what you are going to do.
       -t --test                run unit test to check the program
       -v --verbose             Verbose. You can specify more than one -v for more verbose output
       -x --debug               debug
       -h --help                show this help
          --help-config         configuration help

    
    Examples:
       Run all tests:
       $PROGNAME --test all

       Run specific test:
       $PROGNAME --test test_string.sh

       Run:
       $PROGNAME --config /path/to/config/$PROGNAME.conf

       Just show what you are going to do:
       $PROGNAME -vn -c /path/to/config/$PROGNAME.conf
    EOF
}

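Note that with <<- every line of the here document, including the closing EOF, must start with real tab characters (\t). <<- strips leading tabs but not spaces. If vim expands your tabs to 4 spaces, select the here-document lines in visual mode and replace the leading spaces with this substitution:

:'<,'>s/^    /\t/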
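Here is a quick way to see what <<- does. In this sketch, printf builds tiny scripts in which the \t escapes become real tab characters, which makes the otherwise invisible tabs explicit. Each script is then piped into bash.

```shell
#!/usr/bin/env bash
# Leading tabs, including the one before the closing EOF, are stripped by <<-
tab_heredoc_demo() {
    printf 'cat <<- EOF\n\tusage: demo [-h]\n\t  -h  show help\n\tEOF\n' | bash
}

# Leading spaces are kept, and a space-indented terminator would not be
# recognized, so here EOF has to start in column 0
space_heredoc_demo() {
    printf 'cat <<- EOF\n    usage: demo [-h]\nEOF\n' | bash
}

tab_heredoc_demo      # prints "usage: demo [-h]" and "  -h  show help"
space_heredoc_demo    # prints "    usage: demo [-h]"
```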

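Command-line arguments

Here the author supplements the example above with code from Kirk's blog post "bash shell script to use getopts with gnu style long positional parameters". The code translates GNU-style long options such as --config into the short options (-c) that the getopts builtin understands.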

cmdline() {
    # got this idea from here:
    # http://kirk.webfinish.com/2009/10/bash-shell-script-to-use-getopts-with-gnu-style-long-positional-parameters/
    # usage, usage_config, verbose, eexit, VINFO and recursive_testing belong to
    # the author's full program and are not shown in this article.
    local arg=
    local args=    # the rebuilt argument string (an undeclared global in the original)
    for arg
    do
        local delim=""
        case "$arg" in
            #translate --gnu-long-options to -g (short options)
            --config)         args="${args}-c ";;
            --pretend)        args="${args}-n ";;
            --test)           args="${args}-t ";;
            --help-config)    usage_config && exit 0;;
            --help)           args="${args}-h ";;
            --verbose)        args="${args}-v ";;
            --debug)          args="${args}-x ";;
            #pass through anything else; non-option words are wrapped in double quotes
            *) [[ "${arg:0:1}" == "-" ]] || delim="\""
                args="${args}${delim}${arg}${delim} ";;
        esac
    done

    #Reset the positional parameters to the short options.
    #eval re-parses the string, so an argument containing ", $ or ` can break it.
    eval set -- $args

    local OPTIND=1    # reset getopts state so cmdline can safely be called again
    while getopts "nvhxt:c:" OPTION
    do
        case $OPTION in
        v)
            readonly VERBOSE=1
            ;;
        h)
            usage
            exit 0
            ;;
        x)
            readonly DEBUG='-x'
            set -x
            ;;
        t)
            RUN_TESTS=$OPTARG
            verbose VINFO "Running tests"
            ;;
        c)
            readonly CONFIG_FILE=$OPTARG
            ;;
        n)
            readonly PRETEND=1
            ;;
        esac
    done

    if [[ $recursive_testing || -z $RUN_TESTS ]]; then
        [[ ! -f $CONFIG_FILE ]] \
            && eexit "You must provide --config file"
    fi
    return 0
}

Using the function:

main() {
    cmdline $ARGS    # ARGS is the immutable global defined at the top; unquoted so it splits back into words
}
main
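The eval in cmdline re-parses every argument, which is fragile for values that contain quotes or $. The following is an alternative sketch, not the author's or Kirk's code. It applies the same translate-then-getopts idea but rebuilds the arguments in a bash array, so each value is passed through untouched. parse_args handles only a subset of the options and sets plain, non-readonly globals so that it can be called repeatedly in tests.

```shell
#!/usr/bin/env bash
# Translate GNU-style long options to short ones, then parse with getopts.
# Sets the globals CONFIG_FILE, PRETEND and VERBOSE.
parse_args() {
    local arg
    local -a short_args=()

    for arg in "$@"; do
        case "$arg" in
            --config)  short_args+=(-c) ;;
            --pretend) short_args+=(-n) ;;
            --verbose) short_args+=(-v) ;;
            *)         short_args+=("$arg") ;;    # pass everything else through
        esac
    done
    set -- "${short_args[@]}"

    CONFIG_FILE=""
    PRETEND=0
    VERBOSE=0

    local OPTIND=1 opt    # local OPTIND lets parse_args be called more than once
    while getopts "nvc:" opt; do
        case $opt in
            c) CONFIG_FILE=$OPTARG ;;
            n) PRETEND=1 ;;
            v) VERBOSE=$(( VERBOSE + 1 )) ;;    # repeatable, like -vv
            *) return 1 ;;                     # getopts has already printed an error
        esac
    done
}

parse_args --config "/etc/my app.conf" --verbose -v
echo "config=$CONFIG_FILE verbose=$VERBOSE pretend=$PRETEND"
# prints: config=/etc/my app.conf verbose=2 pretend=0
```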

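Unit testing

Unit tests are routine in higher-level languages but rarely used in bash.

The test framework used here is shunit2, written by kward (who also wrote log4sh). Judging by its commit history, it was still actively developed when this translation was written, with commits throughout that year.

shunit2 was originally hosted on Google Code and moved to its current home on GitHub after Google Code shut down. (Google, how many services have you shut down by now…)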

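# returns (an assertion helper) and csv_column, delete_spaces, column and
# colons_to_spaces (text-processing helpers) come from the author's own code
# and are not shown in the article.
test_config_line_paths() {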
    local s='partition cpm-all, 80-90,'

    returns "/a" "config_line_paths '$s /a, '"
    returns "/a /b/c" "config_line_paths '$s /a:/b/c, '"
    returns "/a /b /c" "config_line_paths '$s   /a  :    /b : /c, '"
}

config_line_paths() {
    local partition_line="$@"

    echo $partition_line \
        | csv_column 3 \
        | delete_spaces \
        | column 1 \
        | colons_to_spaces
}

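# Sourced last: shunit2 then runs every function whose name starts with "test".
# The path depends on how shunit2 was installed.
source /usr/bin/shunit2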
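The helpers used by config_line_paths are not shown in the article. The sketch below fills them in with hypothetical implementations built on cut, tr and awk, which are just enough to reproduce the three expectations in test_config_line_paths. The original helpers may well differ. Note that a shell function named column shadows the util-linux column command.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: the names follow the article, the bodies are assumptions.
csv_column()       { local n=$1; cut -d, -f"$n"; }               # nth comma-separated field
delete_spaces()    { tr -d ' '; }                                # remove every space
column()           { local n=$1; awk -v n="$n" '{print $n}'; }   # nth whitespace-separated field
colons_to_spaces() { tr ':' ' '; }                               # "/a:/b" becomes "/a /b"

# From the article: extract the path list (third CSV field) of a config line
config_line_paths() {
    local partition_line="$@"

    # unquoted, as in the article, so runs of spaces collapse to one
    echo $partition_line \
        | csv_column 3 \
        | delete_spaces \
        | column 1 \
        | colons_to_spaces
}

config_line_paths 'partition cpm-all, 80-90, /a:/b/c, '    # prints /a /b/c
```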

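Here is another example, this one using df. It bends the rules above slightly. Because shunit2 does not allow functions in the global scope to be changed, DF is declared as a global variable that holds the command name, and it is not made readonly, so a test can swap in a mock.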

DF=df    # deliberately not readonly: a test replaces it with a mock

mock_df_with_eols() {
    cat <<- EOF
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /very/long/device/path
                         124628916  23063572 100299192  19% /
    EOF
}

test_disk_size() {
    returns 1000 "disk_size /dev/sda1"

    DF=mock_df_with_eols
    returns 124628916 "disk_size /very/long/device/path"
}

df_column() {
    local disk_device=$1
    local column=$2

    $DF "$disk_device" \
        | grep -v 'Use%' \
        | tr '\n' ' ' \
        | awk "{print \$$column}"
}

disk_size() {
    local disk_device=$1

    df_column "$disk_device" 2
}
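The following self-contained sketch lets you try the mocking idea without shunit2 or the author's returns helper. The mock prints its df-style lines with printf instead of a tab-indented here document. As in the article, DF holds the command name, and the mock prints the same fixed output whatever device it is asked about.

```shell
#!/usr/bin/env bash
DF=df    # not readonly, so a test can substitute a mock

# Mock df output where a long device name wraps the numbers onto a second line
mock_df_with_eols() {
    printf '%s\n' \
        'Filesystem           1K-blocks      Used Available Use% Mounted on' \
        '/very/long/device/path' \
        '                     124628916  23063572 100299192  19% /'
}

df_column() {
    local disk_device=$1
    local column=$2

    # drop the header, join the wrapped lines, print the requested field
    $DF "$disk_device" \
        | grep -v 'Use%' \
        | tr '\n' ' ' \
        | awk -v col="$column" '{print $col}'
}

disk_size() {
    local disk_device=$1

    df_column "$disk_device" 2
}

DF=mock_df_with_eols
disk_size /very/long/device/path    # prints 124628916
```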

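Other

I filed this article in my plan folder long ago and started the translation early on, but then it was forgotten in a corner. Luckily I dug it out again.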