Chapter 10. Manipulating Variables

10.1. Manipulating Strings

Bash supports a surprising number of string manipulation operations. Unfortunately, these tools lack a unified focus. Some are a subset of parameter substitution, and others fall under the functionality of the UNIX expr command. This results in inconsistent command syntax and overlap of functionality, not to mention confusion.

String Length

${#string}

expr length $string

expr "$string" : '.*'

These are the equivalent of strlen() in C.

    stringZ=abcABC123ABCabc

    echo ${#stringZ}                 # 15
    echo `expr length $stringZ`      # 15
    echo `expr "$stringZ" : '.*'`    # 15
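The examples above use a string without whitespace. As a minimal sketch (the variable names and the sample value are made up for illustration), the following shows the same measurement on a string that contains spaces. The parameter-expansion form needs no special care, while the expr form depends on the double quotes to keep the string in one argument.

```bash
#!/bin/bash
# Hypothetical sample value containing spaces.
phrase="three word string"

len_param=${#phrase}                 # Parameter expansion: no quoting issues.
len_expr=$(expr "$phrase" : '.*')    # expr form: the quotes around $phrase are required.

echo "$len_param"    # 17
echo "$len_expr"     # 17
```

Note that an unquoted expr length $phrase would hand expr three separate words here and fail with a syntax error.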


Example 10-1. Inserting a blank line between paragraphs in a text file

   1 #!/bin/bash
   2 # paragraph-space.sh
   3 # Ver. 2.1, Reldate 29Jul12 [fixup]
   4 
   5 # Inserts a blank line between paragraphs of a single-spaced text file.
   6 # Usage: $0 <FILENAME
   7 
   8 MINLEN=60        # Change this value? It's a judgment call.
   9 #  Assume lines shorter than $MINLEN characters ending in a period
  10 #+ terminate a paragraph. See exercises below.
  11 
  12 while read -r line  # For as many lines as the input file has ...
  13 do
  14   echo "$line"   # Output the line itself.
  15 
  16   len=${#line}
  17   if [[ "$len" -lt "$MINLEN" && "$line" =~ \.$ ]]
  18 # if [[ "$len" -lt "$MINLEN" && "$line" =~ \[*\.\] ]]
  19 # An update to Bash broke the previous version of this script. Ouch!
  20 # Thank you, Halim Srama, for pointing this out and suggesting a fix.
  21     then echo    #  Add a blank line immediately
  22   fi             #+ after a short line terminated by a period.
  23 done
  24 
  25 exit
  26 
  27 # Exercises:
  28 # ---------
  29 #  1) The script usually inserts a blank line at the end
  30 #+    of the target file. Fix this.
  31 #  2) Line 17 only considers periods as sentence terminators.
  32 #     Modify this to include other common end-of-sentence characters,
  33 #+    such as ?, !, and ".

Length of Matching Substring at Beginning of String

expr match "$string" '$substring'

expr "$string" : '$substring'

In both forms, $substring is a regular expression that is matched starting at the beginning of $string. The result is the number of characters matched.

    stringZ=abcABC123ABCabc
    #       |------|
    #       12345678

    echo `expr match "$stringZ" 'abc[A-Z]*.2'`   # 8
    echo `expr "$stringZ" : 'abc[A-Z]*.2'`       # 8
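The same measurement can be sketched without calling expr, using the Bash regex operator =~ and the BASH_REMATCH array. This is an illustration rather than a drop-in replacement: =~ uses extended regular expressions and does not anchor the match, so the '^' is added by hand. The names re and match_len are chosen here for the example.

```bash
#!/bin/bash
stringZ=abcABC123ABCabc

# expr anchors its pattern at the start of the string; [[ =~ ]] does not,
# so the '^' is written explicitly. Keeping the regex in a variable
# avoids quoting surprises inside [[ ]].
re='^abc[A-Z]*.2'

if [[ $stringZ =~ $re ]]; then
  match_len=${#BASH_REMATCH[0]}   # Length of the whole matched text.
else
  match_len=0
fi

echo "$match_len"    # 8
```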

Index

expr index $string $substring

Numerical position in $string of first character in $substring that matches.

    stringZ=abcABC123ABCabc
    #       123456 ...
    echo `expr index "$stringZ" C12`             # 6
                                                 # C position.

    echo `expr index "$stringZ" 1c`              # 3
    # 'c' (in #3 position) matches before '1'.

This is the near equivalent of strchr() in C, or more exactly of strpbrk(), because a match on any one character of $substring counts.
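For comparison, here is a rough pure-Bash sketch of the same lookup. The function name str_index and the result variable idx are hypothetical, and the sketch assumes the character list contains no characters that are special inside a bracket pattern (such as ']', '^', or '-').

```bash
#!/bin/bash
# str_index: hypothetical helper that mimics "expr index STRING CHARS".
# Sets the global variable idx to the 1-based position, or to 0 if no match.
str_index() {
  local string=$1 chars=$2
  local prefix=${string%%[$chars]*}   # Everything before the first hit.
  if [[ $prefix == "$string" ]]; then
    idx=0                             # Nothing was removed: no match.
  else
    idx=$(( ${#prefix} + 1 ))
  fi
}

str_index abcABC123ABCabc C12; echo "$idx"   # 6
str_index abcABC123ABCabc 1c;  echo "$idx"   # 3
str_index abcABC123ABCabc xyz; echo "$idx"   # 0
```

The trick is that %% removes the longest trailing part that begins with one of the listed characters, so the length of what remains is the number of characters before the first match.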

Substring Extraction

${string:position}

Extracts substring from $string at $position.

If the $string parameter is "*" or "@", then this extracts the positional parameters, [1] starting at $position.

${string:position:length}

Extracts $length characters of substring from $string at $position.

    stringZ=abcABC123ABCabc
    #       0123456789.....
    #       0-based indexing.

    echo ${stringZ:0}                            # abcABC123ABCabc
    echo ${stringZ:1}                            # bcABC123ABCabc
    echo ${stringZ:7}                            # 23ABCabc

    echo ${stringZ:7:3}                          # 23A
                                                 # Three characters of substring.

    # Is it possible to index from the right end of the string?

    echo ${stringZ:-4}                           # abcABC123ABCabc
    # Defaults to full string, as in ${parameter:-default}.
    # However . . .

    echo ${stringZ:(-4)}                         # Cabc
    echo ${stringZ: -4}                          # Cabc
    # Now, it works.
    # Parentheses or added space "escape" the position parameter.

    # Thank you, Dan Jacobson, for pointing this out.

The position and length arguments can be "parameterized," that is, represented as a variable, rather than as a numerical constant.
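As a small illustration of a variable position, the sketch below walks a string one character at a time and builds its reverse. The variable names pos and reversed are chosen for this example only.

```bash
#!/bin/bash
stringZ=abcABC123ABCabc
reversed=""

# The offset is evaluated arithmetically, so the loop counter can be
# used directly as the position.
for (( pos = 0; pos < ${#stringZ}; pos++ )); do
  reversed="${stringZ:pos:1}$reversed"   # Prepend the character at $pos.
done

echo "$reversed"    # cbaCBA321CBAcba
```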


Example 10-2. Generating an 8-character "random" string

    #!/bin/bash
    # rand-string.sh
    # Generating an 8-character "random" string.

    if [ -n "$1" ]  #  If command-line argument present,
    then            #+ then set start-string to it.
      str0="$1"
    else            #  Else use PID of script as start-string.
      str0="$$"
    fi

    POS=2  # Starting from position 2 in the string.
    LEN=8  # Extract eight characters.

    str1=$( echo "$str0" | md5sum | md5sum )
    #  Doubly scramble     ^^^^^^   ^^^^^^
    #+ by piping and repiping to md5sum.

    randstring="${str1:$POS:$LEN}"
    # Can parameterize ^^^^ ^^^^

    echo "$randstring"

    exit $?

    # bozo$ ./rand-string.sh my-password
    # 1bdd88c4

    #  No, this is not recommended
    #+ as a method of generating hack-proof passwords.

If the $string parameter is "*" or "@", then this extracts a maximum of $length positional parameters, starting at $position.

    echo ${*:2}          # Echoes second and following positional parameters.
    echo ${@:2}          # Same as above.

    echo ${*:2:3}        # Echoes three positional parameters, starting at second.
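Inside a function, the same slicing applies to the function's own arguments. The following sketch uses a hypothetical function, first_and_rest, and stores the slices in arrays. Quoting "${@:2}" keeps an argument that contains spaces in one piece.

```bash
#!/bin/bash
# first_and_rest: hypothetical function that treats its first argument
# as a command name and the remaining ones as that command's arguments.
first_and_rest() {
  local cmd=$1
  local -a rest=( "${@:2}" )     # Second and following parameters.
  local -a pair=( "${@:2:2}" )   # At most two parameters, starting at the second.
  summary="cmd=$cmd rest=${#rest[@]} pair=${#pair[@]}"
}

first_and_rest copy "my file.txt" backup.txt /tmp
echo "$summary"    # cmd=copy rest=3 pair=2
```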

expr substr $string $position $length

Extracts $length characters from $string starting at $position.

    stringZ=abcABC123ABCabc
    #       123456789......
    #       1-based indexing.

    echo `expr substr $stringZ 1 2`              # ab
    echo `expr substr $stringZ 4 3`              # ABC

expr match "$string" '\($substring\)'

Extracts $substring at beginning of $string, where $substring is a regular expression.

expr "$string" : '\($substring\)'

Extracts $substring at beginning of $string, where $substring is a regular expression.

    stringZ=abcABC123ABCabc
    #       =======

    echo `expr match "$stringZ" '\(.[b-c]*[A-Z]..[0-9]\)'`   # abcABC1
    echo `expr "$stringZ" : '\(.[b-c]*[A-Z]..[0-9]\)'`       # abcABC1
    echo `expr "$stringZ" : '\(.......\)'`                   # abcABC1
    # All of the above forms give an identical result.

expr match "$string" '.*\($substring\)'

Extracts $substring at end of $string, where $substring is a regular expression.

expr "$string" : '.*\($substring\)'

Extracts $substring at end of $string, where $substring is a regular expression.

    stringZ=abcABC123ABCabc
    #                ======

    echo `expr match "$stringZ" '.*\([A-C][A-C][A-C][a-c]*\)'`    # ABCabc
    echo `expr "$stringZ" : '.*\(......\)'`                       # ABCabc
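The two expr extractions can also be approximated inside Bash with =~ and a capture group, read back from BASH_REMATCH[1]. This is a sketch under the assumption that extended regular expressions are acceptable: the parentheses are unescaped, and the anchors '^' and '$' take the place of expr's implicit anchoring and its leading '.*'. The variable names are illustrative.

```bash
#!/bin/bash
stringZ=abcABC123ABCabc

# Capture at the beginning of the string (anchored with '^').
re_front='^(.[b-c]*[A-Z]..[0-9])'
if [[ $stringZ =~ $re_front ]]; then
  front=${BASH_REMATCH[1]}
fi

# Capture at the end of the string (anchored with '$').
re_back='([A-C][A-C][A-C][a-c]*)$'
if [[ $stringZ =~ $re_back ]]; then
  back=${BASH_REMATCH[1]}
fi

echo "$front"    # abcABC1
echo "$back"     # ABCabc
```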

Substring Removal

${string#substring}

Deletes shortest match of $substring from front of $string.

${string##substring}

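Deletes longest match of $substring from front of $string.

In both forms, $substring is a shell pattern of the kind used for filename globbing (* and ? wildcards, [...] sets), not a regular expression.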

    stringZ=abcABC123ABCabc
    #       |----|          shortest
    #       |----------|    longest

    echo ${stringZ#a*C}      # 123ABCabc
    # Strip out shortest match between 'a' and 'C'.

    echo ${stringZ##a*C}     # abc
    # Strip out longest match between 'a' and 'C'.

    # You can parameterize the substrings.

    X='a*C'

    echo ${stringZ#$X}      # 123ABCabc
    echo ${stringZ##$X}     # abc
                            # As above.
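A common practical use of the front-removal operators is taking a pathname apart. The sketch below uses a made-up path, and the variable names are illustrative. It shows how the shortest and longest match of the same pattern, '*/', give quite different results.

```bash
#!/bin/bash
path=/usr/local/share/doc/notes.txt    # Hypothetical sample path.

no_slash=${path#*/}     # Shortest match of '*/' is just the leading '/'.
base=${path##*/}        # Longest match of '*/' runs through the last '/'.

echo "$no_slash"   # usr/local/share/doc/notes.txt
echo "$base"       # notes.txt
```

The second form gives much the same result as the basename command, without starting an external process.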

${string%substring}

Deletes shortest match of $substring from back of $string.

For example:

    # Rename all filenames in $PWD with "TXT" suffix to a "txt" suffix.
    # For example, "file1.TXT" becomes "file1.txt" . . .

    SUFF=TXT
    suff=txt

    for i in *."$SUFF"   #  Glob directly; parsing the output of "ls"
    do                   #+ breaks on filenames containing whitespace.
      mv -f "$i" "${i%.$SUFF}.$suff"
      #  Leave unchanged everything *except* the shortest pattern match
      #+ starting from the right-hand-side of the variable $i . . .
    done ### This could be condensed into a "one-liner" if desired.

    # Thank you, Rory Winston.

${string%%substring}

Deletes longest match of $substring from back of $string.

    stringZ=abcABC123ABCabc
    #                    ||     shortest
    #        |------------|     longest

    echo ${stringZ%b*c}      # abcABC123ABCa
    # Strip out shortest match between 'b' and 'c', from back of $stringZ.

    echo ${stringZ%%b*c}     # a
    # Strip out longest match between 'b' and 'c', from back of $stringZ.

This operator is useful for generating filenames.
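The difference between % and %% matters most for filenames that contain more than one dot. A brief sketch, with a hypothetical filename and illustrative variable names:

```bash
#!/bin/bash
archive=backup.2024.tar.gz    # Hypothetical filename.

stem_short=${archive%.*}      # Shortest match of '.*' from the back: ".gz"
stem_long=${archive%%.*}      # Longest match of '.*' from the back: ".2024.tar.gz"
new_name=${archive%.tar.gz}.zip   # Drop a known suffix, then append a new one.

echo "$stem_short"   # backup.2024.tar
echo "$stem_long"    # backup
echo "$new_name"     # backup.2024.zip
```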


Example 10-3. Converting graphic file formats, with filename change

    #!/bin/bash
    #  cvt.sh:
    #  Converts all the MacPaint image files in a directory to "pbm" format.

    #  Uses the "macptopbm" binary from the "netpbm" package,
    #+ which is maintained by Bryan Henderson (bryanh@giraffe-data.com).
    #  Netpbm is a standard part of most Linux distros.

    OPERATION=macptopbm
    SUFFIX=pbm          # New filename suffix.

    if [ -n "$1" ]
    then
      directory=$1      # If directory name given as a script argument...
    else
      directory=$PWD    # Otherwise use current working directory.
    fi

    #  Assumes all files in the target directory are MacPaint image files,
    #+ with a ".mac" filename suffix.

    for file in "$directory"/*  # Filename globbing.
    do
      filename=${file%.*c}      #  Strip ".mac" suffix off filename
                                #+ ('.*c' matches everything
                                #+ between '.' and 'c', inclusive).
      $OPERATION "$file" > "$filename.$SUFFIX"
                                # Redirect conversion to new filename.
      rm -f "$file"             # Delete original files after converting.
      echo "$filename.$SUFFIX"  # Log what is happening to stdout.
    done

    exit 0

    # Exercise:
    # --------
    #  As it stands, this script converts *all* the files in the target
    #+ directory.
    #  Modify it to work *only* on files with a ".mac" suffix.



    # *** And here's another way to do it. *** #

    #!/bin/bash
    # Batch convert into different graphic formats.
    # Assumes imagemagick installed (standard in most Linux distros).

    INFMT=png   # Can be tif, jpg, gif, etc.
    OUTFMT=pdf  # Can be tif, jpg, gif, pdf, etc.

    for pic in *."$INFMT"
    do
      p2=${pic%."$INFMT"}     # Strip the old suffix, using parameter substitution.
      # echo "$p2"
      convert "$pic" "$p2.$OUTFMT"
    done

    exit $?


Example 10-4. Converting streaming audio files to ogg

    #!/bin/bash
    # ra2ogg.sh: Convert streaming audio files (*.ra) to ogg.

    # Uses the "mplayer" media player program:
    #      http://www.mplayerhq.hu/homepage
    # Uses the "ogg" library and "oggenc":
    #      http://www.xiph.org/
    #
    # This script may need appropriate codecs installed, such as sipr.so ...
    # Possibly also the compat-libstdc++ package.


    OFILEPREF=${1%%ra}      # Strip off the "ra" suffix.
    OFILESUFF=wav           # Suffix for wav file.
    OUTFILE="$OFILEPREF""$OFILESUFF"
    E_NOARGS=85

    if [ -z "$1" ]          # Must specify a filename to convert.
    then
      echo "Usage: $(basename "$0") [filename]"
      exit $E_NOARGS
    fi


    ##########################################################################
    mplayer "$1" -ao "pcm:file=$OUTFILE"
    oggenc "$OUTFILE"  # Correct file extension automatically added by oggenc.
    ##########################################################################

    rm "$OUTFILE"      # Delete intermediate *.wav file.
                       # If you want to keep it, comment out above line.

    exit $?

    #  Note:
    #  ----
    #  On a Website, simply clicking on a *.ram streaming audio file
    #+ usually only downloads the URL of the actual *.ra audio file.
    #  You can then use "wget" or something similar
    #+ to download the *.ra file itself.


    #  Exercises:
    #  ---------
    #  As is, this script converts only *.ra filenames.
    #  Add flexibility by permitting use of *.ram and other filenames.
    #
    #  If you're really ambitious, expand the script
    #+ to do automatic downloads and conversions of streaming audio files.
    #  Given a URL, batch download streaming audio files (using "wget")
    #+ and convert them on the fly.

The following script is a simple emulation of getopt using substring-extraction constructs.


Example 10-5. Emulating getopt

    #!/bin/bash
    # getopt-simple.sh
    # Author: Chris Morgan
    # Used in the ABS Guide with permission.


    getopt_simple()
    {
        echo "getopt_simple()"
        echo "Parameters are '$*'"
        until [ -z "$1" ]
        do
          echo "Processing parameter of: '$1'"
          if [ "${1:0:1}" = '/' ]
          then
              tmp=${1:1}               # Strip off leading '/' . . .
              parameter=${tmp%%=*}     # Extract name.
              value=${tmp##*=}         # Extract value.
              echo "Parameter: '$parameter', value: '$value'"
              eval "$parameter=\$value"   # Value is expanded only at assignment.
          fi
          shift
        done
    }

    # Pass all options to getopt_simple().
    getopt_simple "$@"

    echo "test is '$test'"
    echo "test2 is '$test2'"

    exit 0  # See also, UseGetOpt.sh, a modified version of this script.

    ---

    bash getopt-simple.sh /test=value1 /test2=value2

    getopt_simple()
    Parameters are '/test=value1 /test2=value2'
    Processing parameter of: '/test=value1'
    Parameter: 'test', value: 'value1'
    Processing parameter of: '/test2=value2'
    Parameter: 'test2', value: 'value2'
    test is 'value1'
    test2 is 'value2'
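As a variation, and not part of the original script, the same /name=value syntax can be parsed without eval by storing the results in an associative array. This sketch assumes Bash 4.0 or later; the names opts and parse_opts are invented for the example. It also takes the value from the first '=' onward (using # rather than ##), so a value may itself contain '='.

```bash
#!/bin/bash
# A sketch: option values are kept in an associative array
# (Bash 4.0 or later) rather than being assigned through eval.
declare -A opts

parse_opts() {
  local arg name value
  for arg in "$@"; do
    [[ $arg == /*=* ]] || continue   # Skip anything not shaped like /name=value.
    arg=${arg:1}                     # Strip off leading '/'.
    name=${arg%%=*}                  # Text before the first '='.
    value=${arg#*=}                  # Text after the first '='.
    [[ -n $name ]] || continue       # An empty name cannot be an array key.
    opts[$name]=$value
  done
}

parse_opts /test=value1 /test2=a=b ignored
echo "${opts[test]}"     # value1
echo "${opts[test2]}"    # a=b
```

Because the option name is only ever used as an array key, a hostile argument cannot inject a command the way it could through eval.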

Substring Replacement

${string/substring/replacement}

Replace first match of $substring with $replacement. [2]

${string//substring/replacement}

Replace all matches of $substring with $replacement.

    stringZ=abcABC123ABCabc

    echo ${stringZ/abc/xyz}       # xyzABC123ABCabc
                                  # Replaces first match of 'abc' with 'xyz'.

    echo ${stringZ//abc/xyz}      # xyzABC123ABCxyz
                                  # Replaces all matches of 'abc' with 'xyz'.

    echo  ---------------
    echo "$stringZ"               # abcABC123ABCabc
    echo  ---------------
                                  # The string itself is not altered!

    # Can the match and replacement strings be parameterized?
    match=abc
    repl=000
    echo ${stringZ/$match/$repl}  # 000ABC123ABCabc
    #              ^      ^         ^^^
    echo ${stringZ//$match/$repl} # 000ABC123ABC000
    # Yes!          ^      ^        ^^^         ^^^

    echo

    # What happens if no $replacement string is supplied?
    echo ${stringZ/abc}           # ABC123ABCabc
    echo ${stringZ//abc}          # ABC123ABC
    # A simple deletion takes place.
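The thing being matched is a shell pattern, so wildcards and bracket sets work as well as literal text. A short sketch with a made-up sample string and illustrative variable names:

```bash
#!/bin/bash
title="Chapter 10: Manipulating Variables"   # Hypothetical sample text.

underscored=${title// /_}     # Every space becomes '_'.
no_digits=${title//[0-9]/}    # Delete every digit (bracket pattern).
last_word=${title/* /}        # '* ' takes the longest match: through the last space.

echo "$underscored"   # Chapter_10:_Manipulating_Variables
echo "$no_digits"     # Chapter : Manipulating Variables
echo "$last_word"     # Variables
```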

${string/#substring/replacement}

If $substring matches front end of $string, substitute $replacement for $substring.

${string/%substring/replacement}

If $substring matches back end of $string, substitute $replacement for $substring.

    stringZ=abcABC123ABCabc

    echo ${stringZ/#abc/XYZ}          # XYZABC123ABCabc
                                      # Replaces front-end match of 'abc' with 'XYZ'.

    echo ${stringZ/%abc/XYZ}          # abcABC123ABCXYZ
                                      # Replaces back-end match of 'abc' with 'XYZ'.
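The anchored forms are handy when a prefix or suffix should change only if it is really at the edge of the string. The filename and variable names below are made up for illustration.

```bash
#!/bin/bash
file=IMG_0042.JPG    # Hypothetical filename.

renamed=${file/#IMG_/photo-}      # Only a leading "IMG_" is replaced.
lower_ext=${renamed/%.JPG/.jpg}   # Only a trailing ".JPG" is replaced.
unchanged=${file/#JPG/jpg}        # "JPG" is not at the front: no change.

echo "$renamed"     # photo-0042.JPG
echo "$lower_ext"   # photo-0042.jpg
echo "$unchanged"   # IMG_0042.JPG
```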

10.1.1. Manipulating strings using awk

A Bash script may invoke the string manipulation facilities of awk as an alternative to using its built-in operations.


Example 10-6. Alternate ways of extracting and locating substrings

    #!/bin/bash
    # substring-extraction.sh

    String=23skidoo1
    #      012345678    Bash
    #      123456789    awk
    # Note different string indexing system:
    # Bash numbers first character of string as 0.
    # Awk  numbers first character of string as 1.

    echo ${String:2:4} # position 3 (0-1-2), 4 characters long
                                             # skid

    # The awk equivalent of ${string:pos:length} is substr(string,pos,length).
    echo | awk '
    { print substr("'"${String}"'",3,4)      # skid
    }
    '
    #  Piping an empty "echo" to awk gives it dummy input,
    #+ and thus makes it unnecessary to supply a filename.

    echo "----"

    # And likewise:

    echo | awk '
    { print index("'"${String}"'", "skid")      # 3
    }                                           # (skid starts at position 3)
    '   # The awk equivalent of "expr index" ...

    exit 0
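An alternative way of handing the shell variable to awk, sketched here as one possibility, is the -v option together with a BEGIN block. This avoids splicing the value into the awk program text and needs no dummy input. The names piece and pos are chosen for this example. One caveat: awk interprets backslash escapes in a value assigned with -v.

```bash
#!/bin/bash
String=23skidoo1

# -v assigns the shell value to the awk variable s before the program runs.
# A BEGIN block reads no input, so no empty "echo" is needed.
piece=$(awk -v s="$String" 'BEGIN { print substr(s, 3, 4) }')
pos=$(awk -v s="$String" 'BEGIN { print index(s, "skid") }')

echo "$piece"   # skid
echo "$pos"     # 3
```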

10.1.2. Further Reference

For more on string manipulation in scripts, refer to Section 10.2 and the relevant section of the expr command listing.

Script examples:

  1. Example 16-9

  2. Example 10-9

  3. Example 10-10

  4. Example 10-11

  5. Example 10-13

  6. Example A-36

  7. Example A-41

Notes

[1]

This applies to either command-line arguments or parameters passed to a function.

[2]

Note that $substring and $replacement may refer to either literal strings or variables, depending on context. See the first usage example.