SED

This article is about GNU SED. There are other implementations of SED, but only GNU SED meets the key requirements needed to make it a primary component of a software developer's toolbox.

Why SED?

GNU SED lets you filter and transform text quickly and easily, and it lets you automate these activities to help with many kinds of common software development tasks.

Being a SED expert helps you with these chores in two ways:

- working from the command line, it helps you gather information from files and format it nicely for things like:
  - discovering just what is really in log files or other data files
  - gathering information and formatting it nicely for emails or source code
- creating scripts that automate the above tasks for use in automated testing or for the automatic generation of source code as part of your product's make process

One of the nice things about being a SED expert is that you constantly think of new ways to use SED to automate tasks that you might otherwise have avoided. It reduces the natural reluctance to automate because so many automation tasks become so easy.

One could argue that the same could be said of PERL, PYTHON, or C++ for that matter. However, those are all real programming languages. As such, the act of programming is fairly involved. You have functions, variables, and data types to think about. SED, on the other hand, is a domain-specific language with only 2 variables (really!), the pattern space and the hold space, and only a meager ability to perform looping.

Most SED programs are more functional than algorithmic: you basically apply one function after another to the data in files -- passing the results of the previous function to the next. Very many useful SED programs can be written entirely on one line -- so if you are using the console to develop the script, you iteratively add more and more features to the script by repeating the last experiment and adding a new function to the chain. When you get the output you want, stop.
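
The iterative, one-function-at-a-time style described above might look like the following at the console. This is only a sketch: the log lines and the variable names log and summary are invented for illustration, and each experiment is simply the previous command recalled with one more SED invocation added to the chain.

```shell
# Hypothetical sample data; a real session would read an actual log file.
log='2024-01-01 ERROR disk full
2024-01-01 INFO started
2024-01-02 ERROR net down'

# Experiment 1: keep only the ERROR lines.
printf '%s\n' "$log" | sed -n '/ERROR/p'

# Experiment 2: recall the command and also strip the leading date.
printf '%s\n' "$log" | sed -n '/ERROR/p' | sed 's/^[^ ]* //'

# Experiment 3: add one more function to the chain to tidy the label.
summary=$(printf '%s\n' "$log" | sed -n '/ERROR/p' | sed 's/^[^ ]* //' | sed 's/^ERROR /error: /')
printf '%s\n' "$summary"
```

Once the last experiment prints what you want, that final command line is the script.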

Example Automation Scenario

Suppose you write an XML parser in C++, PERL, C, PYTHON, or assembly language. To test it, you obviously need to verify that the parser's output, given a known XML input, is correct. You could painfully hand-construct all this "known data", or you could use SED to gather it for you. Here's how:

Use a SED regular expression substitution to put all the tags from the input data set on separate lines in a COMPARAND test "expect" file.
Given that the open and close tags are now on separate lines, discard all lines that are NOT open/close tags.
Convert self-closing XML tags (those ending in />) into an opening tag followed by a separate closing tag line to make the file super-regular (using SED).
Make your XML parser produce a list of open and close tags and use it to produce a TEST-OUTPUT file containing all tags from the input file.
Use DIFF to compare the actual TEST-OUTPUT with the expected COMPARAND file produced with SED.
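
The SED side of these steps (building the COMPARAND data) could be sketched roughly as follows. Several things here are assumptions made for the illustration: the sample XML, the variable names, and the simplification that the input contains no comments, CDATA sections, or ">" characters inside attribute values. The \n in the replacement text relies on GNU SED.

```shell
# Hypothetical sample input; a real test would read the parser's input file.
xml='<doc><item id="1"/>some text<note>hi</note></doc>'

# Step 1: put every tag on its own line.
step1=$(printf '%s\n' "$xml" | sed 's/</\n</g; s/>/>\n/g')

# Step 2: discard every line that is not a tag.
step2=$(printf '%s\n' "$step1" | sed -n '/^<.*>$/p')

# Step 3: split a self-closing tag <x .../> into <x ...> plus a </x> line.
comparand=$(printf '%s\n' "$step2" | sed 's|^<\([^ />]*\)\([^>]*\)/>$|<\1\2>\n</\1>|')

printf '%s\n' "$comparand"
```

For the sample input this yields one tag per line, with the self-closing item tag expanded into an open/close pair, which is the shape the parser's tag listing would then be compared against with DIFF.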

At this point, you have to be asking yourself: aren't there many ways to do this? The answer is: Yes. There are many ways to solve this problem, but the SED strategy applies to a very large subset of programming problems.

XML aside, this approach of

Writing some sort of processing program that generates text output.
Using GNU SED to create an expected output values file from the program's input file.
Comparing the real program's output with the expected values (the program's input transformed through GNU SED)

can be effectively and efficiently applied to the testing of most real programs. Yes, you can do this with other programming languages, such as PERL, PYTHON, C++, C, and assembly language, but you can do it more quickly and efficiently with SED than with these other languages -- assuming that you are a regex guru.
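
As a toy illustration of this three-part pattern, the sketch below uses tr as a stand-in for the "program under test" (it upper-cases its input) and GNU SED to derive the expected values from the same input. The file names and the stand-in program are assumptions made purely for the example; \U in the replacement text is a GNU SED extension.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
printf 'hello world\nfoo bar\n' > "$workdir/input.txt"

# 1. The "program under test" (a stand-in: tr upper-cases its input).
tr 'a-z' 'A-Z' < "$workdir/input.txt" > "$workdir/actual.txt"

# 2. Expected values derived from the same input with GNU SED
#    (\U upper-cases the matched text; & stands for the whole match).
sed 's/.*/\U&/' "$workdir/input.txt" > "$workdir/expected.txt"

# 3. Compare; diff exits with status 0 when the files are identical.
if diff "$workdir/expected.txt" "$workdir/actual.txt" > /dev/null; then
  verdict=PASS
else
  verdict=FAIL
fi
echo "$verdict"
```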

Levels of Complexity

SED scripts fall into increasing complexity levels along the following lines:

string replacement
selective printing
multi-line string replacement
copying text across lines
while loops

These levels are discussed in the following sections.

String Replacements

Trivial string replacement

The simplest case replaces all references to the string "fred" with "bill" on every line of inputFile.txt. Note that inputFile.txt is only read: fred is changed to bill, and the resulting text is printed to stdout.
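
A minimal sketch of this replacement, assuming a small scratch copy of inputFile.txt whose contents are invented for the illustration:

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
printf 'fred called fred\nhello alice\n' > "$workdir/inputFile.txt"

# s/fred/bill/ substitutes; the trailing g replaces every match on a line,
# not just the first one.
replaced=$(sed 's/fred/bill/g' "$workdir/inputFile.txt")
printf '%s\n' "$replaced"
```

The file on disk still contains "fred" afterwards; only the text written to stdout has changed.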
If you want to modify inputFile.txt directly, use the -i option.
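
In-place editing might look like this, again on a scratch file with made-up contents:

```shell
workdir=$(mktemp -d)
printf 'fred called fred\n' > "$workdir/inputFile.txt"

# -i rewrites the file itself instead of printing to stdout.
# GNU SED also accepts a suffix (for example -i.bak) to keep a backup copy.
sed -i 's/fred/bill/g' "$workdir/inputFile.txt"

edited=$(cat "$workdir/inputFile.txt")
printf '%s\n' "$edited"
```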
If you need to replace more than one string, you can use one of the following approaches:

Approach 1: Pipe the data through separate invocations of SED.
Approach 2: Use multiple SED commands in a single -e option, separated by semicolons.
Approach 3: Use a bash feature that allows strings to span more than one line of the script, and use the natural ends of lines as the command separators.
Approach 4: Save the commands in a script file and invoke SED using that script file.

Note that the multi-line approaches (script files and multi-line bash strings) give you the benefit of adding #-style comments.
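
The four approaches can be sketched side by side as below. The second replacement (alice to carol), the script file name rename.sed, and the sample file contents are all invented for the illustration; each approach should produce the same output.

```shell
workdir=$(mktemp -d)
printf 'fred met alice\n' > "$workdir/inputFile.txt"

# Approach 1: separate SED invocations joined by a pipe.
out1=$(sed 's/fred/bill/g' "$workdir/inputFile.txt" | sed 's/alice/carol/g')

# Approach 2: one -e option, commands separated by semicolons.
out2=$(sed -e 's/fred/bill/g; s/alice/carol/g' "$workdir/inputFile.txt")

# Approach 3: a quoted string spanning several lines, one command per line.
out3=$(sed '
  # rename fred
  s/fred/bill/g
  # rename alice
  s/alice/carol/g
' "$workdir/inputFile.txt")

# Approach 4: the same commands saved in a script file and run with -f.
cat > "$workdir/rename.sed" <<'EOF'
# rename fred
s/fred/bill/g
# rename alice
s/alice/carol/g
EOF
out4=$(sed -f "$workdir/rename.sed" "$workdir/inputFile.txt")

printf '%s\n' "$out1" "$out2" "$out3" "$out4"
```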

Regular Expression String Replacements

Regular expressions (see the Wikipedia article) are ubiquitous in programming, but many people fear them almost as much as they fear dentists. Yes, they are fairly nasty to look at, but that is why SED lets you put comments in your scripts. The power they provide is worth far more than the time it takes to learn how to use them.

Further, SED is a perfect vehicle for learning to use regexes because it lets you do powerful things directly from the command line. An experiment with SED's regex substitution feature is easily entered on the command line and, using command recall, is easy to modify and play with.

The real power of regular expression substitutions comes from their ability to let you perform substitutions based on parts of the matches that are found, not just the whole thing. Consider a fairly nasty looking substitution whose "from" string matches "Lowell", but not as a single pattern. The real pattern consists of L followed by any run of non-blank characters. The "any non-blank" part is enclosed in \( ... \) to show that it must be treated as a separate component. When performing the substitution, the various parenthesized sections can be used separately in the replacement text (as \1, \2, and so on).

The ability to re-order the pieces of input text when producing output gives SED most of its power.
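
Two small command-line experiments in this spirit are sketched below. The input strings (including the surname "Smith") and the exact patterns are assumptions chosen for illustration, not necessarily the ones the article had in mind.

```shell
# A plain regex substitution: replace every run of digits with N.
masked=$(echo 'order 123 shipped, order 4567 pending' | sed 's/[0-9][0-9]*/N/g')
printf '%s\n' "$masked"

# Capture groups: \( ... \) marks the pieces, and \1 and \2 reuse them.
# Here \1 is the rest of the first word after the L, and \2 is the
# remainder of the line; the replacement puts them back in a new order.
swapped=$(echo 'Lowell Smith' | sed 's/L\([^ ]*\) \(.*\)/\2, L\1/')
printf '%s\n' "$swapped"
```

The second command turns "Lowell Smith" into "Smith, Lowell", which shows the re-ordering of matched pieces in a single substitution.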

Selective Printing

SED normally copies stdin to stdout while performing any substitutions that you want performed on each line. However, if it is invoked with the -n option, then nothing is printed, no matter how many substitutions you perform, unless you specifically print the lines. This allows you to implement grep-like behavior using only SED. For example, given file X.txt containing only

    a
    b
    c
    d

and the following SED invocation:

    sed -n '/c/p' X.txt

the only output produced by this invocation would be

    c

The SED invocation consists of 3 parts:

the -n option suppresses all output not specifically requested
the /c/ is a line-range selecting regular expression stating that only lines containing 'c' are interesting
the 'p' command says to print the line

Note that the -n option, the line range selecting regular expression, and the p command are orthogonal to one another. They can be combined or used separately.
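
To see that the three pieces really are independent, one can try them in different combinations on the same four-line input. The following is a small sketch; the variable names are made up for the illustration.

```shell
x=$(printf 'a\nb\nc\nd')

# All three together: grep-like behavior.
only_c=$(printf '%s\n' "$x" | sed -n '/c/p')

# The selector and p without -n: the matching line is printed twice,
# once by p and once by SED's normal end-of-cycle output.
doubled=$(printf '%s\n' "$x" | sed '/c/p')

# -n and p without a selector: every line is printed exactly once.
all=$(printf '%s\n' "$x" | sed -n 'p')

# The selector with a different command: substitute only on matching lines.
marked=$(printf '%s\n' "$x" | sed '/c/s/$/ <- found/')

printf '%s\n' "$only_c" "$doubled" "$all" "$marked"
```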
