Appendix D. Parsing and Managing Pathnames

Emmanuel Rouat contributed the following example of parsing and transforming filenames and, in particular, pathnames. It draws heavily on the functionality of awk, with help from grep and paste.

#!/usr/bin/env bash
#-----------------------------------------------------------
# Management of PATH, LD_LIBRARY_PATH, MANPATH variables...
# By Emmanuel Rouat <no-email>
# (Inspired by the bash documentation 'pathfuncs' and by
# discussions found on stackoverflow:
# http://stackoverflow.com/questions/370047/
# http://stackoverflow.com/questions/273909/#346860 )
# Last modified: Sat Sep 22 12:01:55 CEST 2012
#
# The following functions handle spaces correctly.
# These functions belong in .bash_profile rather than in
# .bashrc, I guess.
#
# The modular aspect of these functions should make it easy
# to expand them to handle path substitutions instead
# of path removal etc....
#
# See http://www.catonmat.net/blog/awk-one-liners-explained-part-two/
# (item 43) for an explanation of the 'duplicate-entries' removal
# (it's a nice trick!)
#-----------------------------------------------------------

# Show the variables named in $@ (usually PATH), one entry per line.
# The expansion is quoted so that entries keep their spaces intact.
function p_show() { local p="$@" && for p; do [[ ${!p} ]] &&
echo -e "${!p//:/\\n}"; done }

# Filter out empty lines, multiple/trailing slashes, and duplicate entries.
function p_filter()
{ awk '/^[ \t]*$/ {next} {sub(/\/+$/, "");gsub(/\/+/, "/")}!x[$0]++' ;}

# Rebuild list of items into ':' separated word (PATH-like).
function p_build() { paste -sd: ;}

# Clean the variable named by $1 (typically PATH) and rebuild it.
function p_clean()
{ local p=${1} && eval ${p}='$(p_show ${p} | p_filter | p_build)' ;}

# Remove $1 from the variable named by $2
# (found on stackoverflow, with modifications).
# grep -F matches the entry literally, so '.' is not a wildcard.
function p_rm()
{ local d=$(echo "$1" | p_filter) p=${2} &&
  eval ${p}='$(p_show ${p} | p_filter | grep -Fxv -- "${d}" | p_build)' ;}

#  Same as previous, but filters on a pattern (dangerous...
#+ don't use 'bin' or '/' as pattern!).
function p_rmpat()
{ local d=$(echo "$1" | p_filter) p=${2} && eval ${p}='$(p_show ${p} |
  p_filter | grep -v "${d}" | p_build)' ;}

# Delete $1 from the variable named by $2 and append it cleanly.
function p_append()
{ local d=$(echo "$1" | p_filter) p=${2} && p_rm "${d}" ${p} &&
  eval ${p}='$(p_show ${p} d | p_build)' ;}

# Delete $1 from the variable named by $2 and prepend it cleanly.
function p_prepend()
{ local d=$(echo "$1" | p_filter) p=${2} && p_rm "${d}" ${p} &&
  eval ${p}='$(p_show d ${p} | p_build)' ;}

# Some tests:
echo
MYPATH="/bin:/usr/bin/:/bin://bin/"
p_append "/project//my project/bin" MYPATH
echo "Append '/project//my project/bin' to '/bin:/usr/bin/:/bin://bin/'"
echo "(result should be: /bin:/usr/bin:/project/my project/bin)"
echo "$MYPATH"

echo
MYOTHERPATH="/bin:/usr/bin/:/bin:/project//my project/bin"
p_prepend "/project//my project/bin" MYOTHERPATH
echo "Prepend '/project//my project/bin' \
to '/bin:/usr/bin/:/bin:/project//my project/bin'"
echo "(result should be: /project/my project/bin:/bin:/usr/bin)"
echo "$MYOTHERPATH"

echo
p_prepend "/project//my project/bin" FOOPATH  # FOOPATH doesn't exist.
echo "Prepend '/project//my project/bin' to an unset variable"
echo "(result should be: /project/my project/bin)"
echo "$FOOPATH"

echo
BARPATH="/a:/b/://b c://a:/my local pub"
p_clean BARPATH
echo "Clean BARPATH='/a:/b/://b c://a:/my local pub'"
echo "(result should be: /a:/b:/b c:/my local pub)"
echo "$BARPATH"
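
The duplicate-removal idiom used in p_filter is easier to follow in isolation. The sketch below is not part of Emmanuel Rouat's script; dedup_path is a hypothetical helper name, and it assumes that awk, tr, and paste are available. The awk expression prints a line only the first time it is seen, because the counter for that line is still zero (false) when it is negated and only then incremented.

```shell
#!/usr/bin/env bash
# dedup_path: hypothetical helper, shown only to illustrate the idiom.
# Splits a ':'-separated list into lines, keeps the first occurrence of
# each entry (in original order), and joins the lines back with ':'.
dedup_path() {
  printf '%s\n' "$1" |
    tr ':' '\n' |
    awk '!seen[$0]++' |   # true only on the first sighting of a line
    paste -sd: -          # "-" names stdin explicitly for portability
}

dedup_path "/bin:/usr/bin:/bin:/opt/my tools:/usr/bin"
```

With the sample list above, this prints /bin:/usr/bin:/opt/my tools. Unlike p_filter, the sketch does not normalize slashes, so /bin and /bin/ would count as different entries.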

***

David Wheeler kindly permitted me to use his instructive examples.

Doing it correctly: A quick summary
by David Wheeler
http://www.dwheeler.com/essays/filenames-in-shell.html

So, how can you process filenames correctly in shell? Here's a quick
summary about how to do it correctly, for the impatient who "just want the
answer". In short: Double-quote to use "$variable" instead of $variable,
set IFS to just newline and tab, prefix all globs/filenames so they cannot
begin with "-" when expanded, and use one of a few templates that work
correctly. Here are some of those templates that work correctly:


 IFS="$(printf '\n\t')"
 # Remove SPACE, so filenames with spaces work well.

 #  Correct glob use:
 #+ always use "for" loop, prefix glob, check for existence:
 for file in ./* ; do          # Use "./*" ... NEVER bare "*" ...
   if [ -e "$file" ] ; then    # Make sure it isn't an empty match.
     COMMAND ... "$file" ...
   fi
 done


 # Correct glob use, but requires nonstandard bash extension.
 shopt -s nullglob  #  Bash extension,
                    #+ so that empty glob matches will work.
 for file in ./* ; do        # Use "./*", NEVER bare "*"
   COMMAND ... "$file" ...
 done


 #  These handle all filenames correctly;
 #+ can be unwieldy if COMMAND is large:
 find ... -exec COMMAND... {} \;
 find ... -exec COMMAND... {} \+ # If multiple files are okay for COMMAND.


 #  This skips filenames with control characters
 #+ (including tab and newline).
 IFS="$(printf '\n\t')"
 controlchars="$(printf '*[\001-\037\177]*')"
 for file in $(find . ! -name "$controlchars") ; do
   COMMAND "$file" ...
 done


 #  Okay if filenames can't contain tabs or newlines --
 #+ beware the assumption.
 IFS="$(printf '\n\t')"
 for file in $(find .) ; do
   COMMAND "$file" ...
 done


 # Requires nonstandard but common extensions in find and xargs:
 find . -print0 | xargs -0 COMMAND

 # Requires nonstandard extensions to find and to shell (bash works).
 # Variables might not stay set once the loop ends:
 find . -print0 | while IFS="" read -r -d "" file ; do ...
   COMMAND "$file" # Use quoted "$file", not $file, everywhere.
 done


 #  Requires nonstandard extensions to find and to shell (bash works).
 #  Underlying system must include named pipes (FIFOs)
 #+ or the /dev/fd mechanism.
 #  In this version, variables *do* stay set after the loop ends,
 #  and you can read from stdin.
 #+ (Change the 4 to another number if fd 4 is needed.)

 while IFS="" read -r -d "" file <&4 ; do
   COMMAND "$file"   # Use quoted "$file", not $file, everywhere.
 done 4< <(find . -print0)


 #  Named pipe version.
 #  Requires nonstandard extensions to find and to shell's read (bash ok).
 #  Underlying system must include named pipes (FIFOs).
 #  Again, in this version, variables *do* stay set after the loop ends,
 #  and you can read from stdin.
 #  (Change the 4 to another number if fd 4 is needed.)

 mkfifo mypipe

 find . -print0 > mypipe &
 while IFS="" read -r -d "" file <&4 ; do
   COMMAND "$file" # Use quoted "$file", not $file, everywhere.
 done 4< mypipe
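
To see one of these templates with a concrete command filled in, here is a minimal sketch built on the process-substitution form. It is an illustration only, not taken from David Wheeler's essay: count_files and demo_dir are hypothetical names, and the sketch assumes bash plus a find that supports -print0. Because the loop runs in the current shell rather than in a pipeline subshell, the counter is still set when the loop ends.

```shell
#!/usr/bin/env bash
# count_files: hypothetical helper that counts the regular files under
# a directory, including names with spaces, newlines, or a leading dash.
count_files() {
  local dir=$1 file count=0
  # Read NUL-delimited names from fd 4; stdin stays free for other use.
  while IFS= read -r -d '' file <&4; do
    count=$((count + 1))
  done 4< <(find "$dir" -type f -print0)
  printf '%s\n' "$count"   # count survives: the loop is not in a subshell
}

# Demonstration in a scratch directory with awkward filenames.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain" "$demo_dir/with space" "$demo_dir/-dash" \
      "$demo_dir/$(printf 'new\nline')"
count_files "$demo_dir"
rm -rf -- "$demo_dir"
```

The demonstration prints 4. A line-oriented loop over the same directory would see five lines, because the name containing a newline is split in two.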