lbsplit FAQ PAGE

Summary of topics discussed:

    What is lbsplit?
    You can do this with csplit, perl, and sed already.  Why lbsplit?
    What tasks is lbsplit targeted for?
    Does lbsplit have program variables?
    How do I change the text in the section before it gets printed?
    How do regular expression substitutions work?
    How can I write a case insensitive regular expression?
    My regular expression text has /'s in it, how do I make that work?
    Can I use lbsplit to parse C code blocks?  (NO)
    How do I parse a line's contents into variables?
    How do I detect and process multiple sections in a file or stream?
    What kind of sections can lbsplit detect?
    How come I only got the first section to print out?
    How can I insert line numbers, section counts, etc?
    How could I number the lines in a file?
    How could I simulate grep using lbsplit?
    How do I read from stdin?
    What kinds of text processing commands can I use?
    How to collapse entire sections into single lines?
    What kind of regular expression am I using?
    How do I select specific lines for textual transformations?
    How do I constrain substitutions to particular lines?
    How do I write conditional statements?
    How do I invert the logic of conditional statements?
    How do I delete, discard, or suppress a given line?
    How can I suppress the printing of empty or blank lines?
    How do I define repeated sections without cutting and pasting the text?
    How do I execute commands after the last line of the section?
    How do I execute commands before the first line of the section?
    How do I execute commands before the first section and after the last?
    How do I suppress the printing of the entire rest of the section?
    Why doesn't regex /./ match empty lines?
    How can I use variables in my scripts?
    How do I write "and" and "or" clauses using regular expressions?
    Why is my section only 1 line long?
    Why does my range condition action command apply to only 1 line?
    Why does my range condition include too many lines?
    How can I compute a range condition's regex?
    How can I quit processing the entire section?
    How can I filter out a section?
    How can I selectively filter out a section?
    How do I deal with columnar data?
    How do I expand tabs in my input data?
    How do I read a file and print it?
    How do I write while-loop statements?


What is lbsplit?

  -  A stream editing program designed to detect blocks of lines and perform 
     sed-like text editing commands on each.  The grammar that lbsplit provides
     allows a lot simpler expressions of commonly used text processing commands
     because it allows for the definition of variables and the processing of 
     variables as if they were the current line of text.


You can do this with csplit, perl, and sed already.  Why lbsplit?

  -  csplit writes each block to a temporary file and you can use simple sed
     scripts to process them separately, but this is very slow.

  -  Perl and sed make you write complex scripting logic to detect the 3 kinds 
     of blocks lbsplit already knows how to recognize.  More work can be spent 
     with these tools detecting the blocks than in actually performing the 
     desired textual transformations.

  -  lbsplit runs faster than perl in making these detections, and requires far
     less scripting to implement simple transformations.

  -  lbsplit's command language is more declarative than procedural -- although
     a while command exists and section processing order is done in sequence.

  -  On RedHat Linux, grep is relatively slow on huge files.  lbsplit (and sed)
     run a lot faster to do the same basic things.  For small files, there is
     not much speed advantage over grep, but on big files, the time 
     difference can amount to several minutes (with either lbsplit or sed). 
     There may not be as much speed advantage on other Linuxes.


What tasks is lbsplit targeted for?

  -  extracting text from log files and performing transformations on the 
     individual blocks.  For example:

	 The valgrind program can automatically generate suppressions which 
	 look like this:

	     {
		  <insert name here>
		  fun:f1
		  fun:f2
		  ...
	     }

	 This text is intermingled with all kinds of other logging information.
	 lbsplit can be used to find all the {} pairs and print only them to 
	 stdout.  And, you can instruct it to replace the <insert name here> 
	 with text which is specific to the block number.  Here is the 
         lbsplit invocation that does that:


         lbsplit -n file -S '{ /^{/,/^}/
			       /<insert/{
					  s/.*//1;
					  N;
					  s/.*/   L\\0/1;
					}
			     }+'

	 Here, lbsplit repeatedly processes the same section definition because 
	 of the + after the trailing curly brace: "}+".  It repeatedly extracts
	 blocks of text lines delimited by open and close curly braces 
	 appearing in column 1.  If the current line contains "<insert", the 
	 script causes the whole line to be emptied, then inserts the current 
	 block number into the empty line, and finally sticks on a leading L, 
         so as to leave this text in stdout:

         {
              L1
              fun:f1
              fun:f2
              ...
         }
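
	 For readers who think more easily in a general-purpose language,
	 here is a rough Python sketch of the same idea.  It is only an
	 illustration, not how lbsplit is implemented, and the function name
	 extract_brace_blocks is made up for this example.  Unlike the
	 lbsplit script above, it keeps the placeholder line's original
	 indentation instead of emitting a fixed prefix.

```python
import re

def extract_brace_blocks(lines):
    """Return only the {...} blocks, naming each after its block number."""
    out = []
    block_no = 0
    in_block = False
    for line in lines:
        # A block starts at an open brace in column 1.
        if not in_block and line.startswith("{"):
            in_block = True
            block_no += 1
        if in_block:
            if "<insert" in line:
                # Keep the indentation, swap the placeholder for L<block number>.
                indent = re.match(r"\s*", line).group(0)
                line = f"{indent}L{block_no}"
            out.append(line)
            # A block ends at a close brace in column 1.
            if line.startswith("}"):
                in_block = False
    return out

log = [
    "==1== some log noise",
    "{",
    "     <insert name here>",
    "     fun:f1",
    "}",
    "more noise",
    "{",
    "     <insert name here>",
    "     fun:f2",
    "}",
]
print("\n".join(extract_brace_blocks(log)))
```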


  -  Parsing lines and splitting the fields up into variables for 
     interpretation.  For example:

	 The g++ compiler produces error messages that have this basic
	 format:

	     filename.cpp:412 Error, something bad happened.

	 This is all well and good, but when C++ templates are used, the 
	 error messages have important bits of information stuck at the
	 end of line, like this:

           f.c:27: Error, std::vector<T>::operator[] (int) [with T = int]

	 The "[with T = int]" part explains what "T" means in the
	 rest of the line.  In this simple example, the error message isn't
	 awfully difficult to understand, but in practice the lines are very
	 long and may have as many as 20 "TypeName = ActualTypeExpression"
	 fields in them.  

	 It is far simpler to understand the real error, if all the symbolic
	 type names, the Ts in the above example, are replaced with the
	 actual type expressions.  

	 Since lbsplit has regular expression parsing and variables as part
	 of the language, the following snippet of lbsplit command language
	 can be used to fix the above problems:

	     w /\[with [^\]\+]$/
	     {
		 g/ \([a-zA-Z0-9_]\+\) *= \([^=\]*\)] *$/match|name|value;
	     
		 s/[, ]*\{match}$/]/g;
	     
		 s/\<\{name}\>/\{value}/g;
	     }

	     s/\[with *] *$//g;

	This fragment "greps" out the symbolic name "T" into variable "name",
	and the value of the type expression which T represents into variable
	"value".  The entire "T = int" pattern is stored in variable 
	"match".  

	Next, it removes the matched pattern from the end of the 
	line and finally replaces all references to the word "T" with 
	the type expression, "int".  Leaving, in the above example,

	   f.c:27: Error, std::vector<int>::operator[] (int)
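
	The same peel-one-pair-and-substitute loop can be sketched in Python.
	This is an illustration under assumptions, not a translation of the
	lbsplit fragment: the name expand_template_names and the exact
	regular expressions are invented here, and real compiler messages may
	need a more careful pattern (for example when a type expression
	itself contains "=" or "]").

```python
import re

# The last "name = value" pair inside a trailing "[with ...]" clause.
LAST_PAIR = re.compile(r"[,; ]*\b([A-Za-z0-9_]+) *= *([^=\]]*)\] *$")

def expand_template_names(line):
    """Replace symbolic type names with the types listed in a trailing [with ...]."""
    # Loop while the line still ends in a non-empty "[with ...]" clause.
    while re.search(r"\[with [^\]]+\] *$", line):
        m = LAST_PAIR.search(line)
        if m is None:
            break
        name, value = m.group(1), m.group(2).strip()
        # Drop the pair from the end of the line, keeping the closing bracket.
        line = line[:m.start()] + "]"
        # Replace whole-word uses of the symbolic name with the actual type.
        line = re.sub(r"\b" + re.escape(name) + r"\b", lambda _: value, line)
    # Remove the now-empty "[with]" clause.
    return re.sub(r" *\[with *\] *$", "", line)

print(expand_template_names(
    "f.c:27: Error, std::vector<T>::operator[] (int) [with T = int]"))
```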

  -  Generalized text reformatting.  For lovers of columnar data, lbsplit
     provides left and right justification of text in lines as well as
     substitution of line contents with the contents of variables.

  -  Making decisions about multi-line blocks. For example, suppose your
     output looks like this:

       ==3192== Valgrind output message
       ==3192==    line 1 some stuff
       ==3192==    line 2 more stuff including this FLAG value
       ...

      In this case, lbsplit lets you buffer up this whole mess into variables
      and then if one of the lines, such as line2, contains a FLAG value,
      you can decide to discard the entire section, or print it, or perform
      a transformation on it, etc.


Does lbsplit have program variables?

  -  Yes but they are clunky.  You are limited to the following actions:

      * store the current line in a variable  ( |var|=; )

      * append the current line to a variable (with an intervening \n).
	( |var|+; )

      * execute command actions on variables as if they were the current line

	-  regular expression substitutions  ( |var|s/fred/bill/g; )

	-  initialization  ( |var| l/stuff/; )

	-  special commands that only work on variables  ( |var| x; )

      * assign variables from expanded constants (lets you combine
	multiple variables with intervening constant text)
	( |var| l/ stuff \{var1} - more stuff \{var2} ... /; )
	  
      * print variables. ( |var|P; )

      * use variables in regex substitutions affecting the current line
	( s/stuff\{var1}/ crap \{var2}\{var3} ... /g; )

      * compute variable names and assign the data to the computed name
	( S/computedName-\{helper}/Some computed value - \{value}/; )

      * parsing text using regular expressions to pick out fields.
	( g/regex/var1|var2|var3; )

      lbsplit is not a full scale programming language and does not try
      to solve all the textual scripting problems that you face.  Instead
      it tries to do one thing well:  find and perform simple transformations
      on blocks of text from log files.

      See comments about variables in other sections, below.

      Note:  you can initialize variables in a prefix section which is executed
      before the beginning of actual processing of your input stream.  You can
      print them after the last line of the input stream is processed using a
      suffix section.  These are defined on the command line like this:

	lbsplit -px "{prefixSection with Actions;}"  -sx '{suffix section ...}'


How do I change the text in the section before it gets printed?
How do regular expression substitutions work?
How can I write a case insensitive regular expression?

  -  The substitute command has this syntax:

	<optional condition> s<delim>pattern<delim>substitution<delim><options>

     Where:

       optional condition   Selects which lines the substitution applies
			    to.  If no condition is specified, it applies to
			    all lines.  Conditions are defined like one of 
			    these:

			      <delim2>regex<delim2>

			      <lineNumber>
			      <lineNumber>,<lineNumber>
			      <lineNumber>,$

			     Where <delim2> is any of %, :, or /, and
			     lineNumber is relative to the current section.

			     NOTE:  Before any range or condition, you can use
			     the ! operator to invert the logic, so:

			       !/fred/P;  means print a duplicate of any
					  line that does not contain fred.

			       !2d;       means delete all but line 2.

			       !4,99P;    means print duplicates of all lines
					  except lines 4-99.

       delim                 is a character that defines a string boundary,
			     you are limited to %, :, and / as delimiters.

       options               Options can include following characters:
			       
			       i
			       1
			       g

			     "i" is optional.  Either "1" or "g" is required.

			     1 means only replace the first instance of the 
                             pattern with its substitute,

			     g means to perform a global replace

			     i means to perform a case insensitive 
                             substitution.

       pattern               any regular expression -- sed style.  Can contain
			     escape characters like \r, \n, \t, \s, etc, and
			     can contain regex special characters:

			       \(, \|, \)

			     Regexes are too complicated to describe here,
			     but there are unix man pages, and you can look
			     for these specifics:  extended regular expressions,
			     SED regular expressions, ed regular expressions,
			     etc.  Perl regexes are more complex and not
			     supported.

       substitution          any text.  If you want to include parts of the
			     matched pattern in the substitution pattern,
			     use \0 - \9.  These strings refer to parts of
			     the matched pattern.  \0 refers to the entire
			     match -- so if you want to put parentheses around
			     some text, do this:

			       s/fred/(\0)/g

			     This will replace all instances of "fred" in the
			     current line with "(fred)"

			     \1-\9 refer to the sub-parts of the matched
			     pattern which are identified with \(...\) 
			     groupings.  Nested groups are possible, and the
			     number identifies the \( group in the pattern.
			     For example:

			       \(fred\(bill\)\(tom\)\)

			     \0 matches "fredbilltom" in the line

			     \1 matches "fredbilltom" as well
			     
			     \2 matches "bill"

			     \3 matches "tom"

			     Note that you can use \| to mean OR.  As in:

			       \(a\|b\)

			     This would leave \0 containing either a or b
			     when used in the substitutions.
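
     The group numbering rule (count the opening \( from left to right) is
     the same one most regex engines use, so it can be tried out in Python.
     This is only an illustration: Python writes groups as ( ) and
     alternation as |, where the sed-style patterns above write \( \) and
     \|, and the helper name wrap_in_parens is made up for this example.

```python
import re

# Nested groups are numbered by the position of their opening parenthesis.
m = re.search(r"(fred(bill)(tom))", "say fredbilltom now")
print(m.group(0))  # the entire match
print(m.group(1))  # the outer group, same text as the entire match here
print(m.group(2))  # the first inner group
print(m.group(3))  # the second inner group

def wrap_in_parens(line, word):
    """Python analogue of s/word/(\\0)/g: wrap every match in parentheses."""
    return re.sub(re.escape(word), r"(\g<0>)", line)

print(wrap_in_parens("fred and fred", "fred"))

# Alternation: the group holds whichever branch matched.
print(re.sub(r"(a|b)", r"[\1]", "cab"))
```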

     Note that regexes and substitutions are "variable expanded" before use.  
     Variable expansion just replaces text of this form:  "\{varname}"
     with the contents of that variable.  Note, there is NO \ before the
     trailing }.  Variables are assigned like this:

       |var|=;    // store the current line in the variable
       |var|+;    // append the current line to the variable with an
		     intervening \n.

       |var|s/regex/substitution/options;  // applies the substitution to the
					   // named variable instead of the
					   // current line.

     Note that to replace an empty variable with some string using the 
     regex substitution, you can't just say:  |var|s//stuff/g;  You have
     to use this approach:

	|var|s/^.*$/stuff/g;

     this will let you replace any text in the variable with stuff -- even if
     there is none.  Hopefully, you can use the "=" command instead, but it isn't
     always possible.

      Note:  you can initialize variables in a prefix section which is executed
      before the beginning of actual processing of your input stream.  You can
      print them after the last line of the input stream is processed using a
      suffix section.  These are defined on the command line like this:

        lbsplit -px "{prefixSection with Actions;}"  -sx '{suffix section ...}'


My regular expression text has /'s in it, how do I make that work?

 -  Two ways:

    *  You can escape the slashes in the text, like this:

      \/

    *  Or, you can use different delimiters for the regular expressions:

       /, :, or % can be used as the delimiter.  Whichever one you start
       with defines the delimiter for the entire regular expression or
       string.

    For example, you could write the above invocation like this:

	 lbsplit -n file -S '{ %^{%,%^}%
			       :<insert:{
					  s:.*::1;
					  N;
					  s%.*%   L\\0%1;
					}
			     }+'


Can I use lbsplit to parse C code blocks?

  -  Whoa!  No can do big guy!  lbsplit doesn't handle nested blocks which use the
     same regular expression as a delimiter.

     - However, you can partially simulate nesting if the inner blocks have
       different delimiters than the outer blocks.  For example:

	  { // outer

	     [ // inner
	     ]
	      
	  }

       Here, you would make your outer blocks your major section delimiters
       and use a ranged block to process the inner block:

	 '{  /{/,/}/

	     outer block actions
	   
	     /\[/,/\]/ {
			  inner block actions
		       }
	  }'


How do I detect and process multiple sections in a file or stream?

  - You use multiple section definitions:

    Either:

      lbsplit -n file ... -S '{/^Page/ ... }' '{/^Another/ ... }' ...

    Or:

      lbsplit -n file ... -S '{/^Page/ ... }+' 

    Note:  You can specify multiple sections, each will be processed
	   exactly once then discarded.  You cannot loop over groups
	   of section definitions (yet).  If used, the + operator will
	   consume all text matching the modified section until out of data.
	   Thus, if used, it must be applied to the final section definition.

	   You can loop over individual lines when implementing section
	   definitions.  See the "w" command.


What kind of sections can lbsplit detect?

  - Three basic kinds (A, B, and C), plus a selector form (D):

      A.  blocks of lines, all of which match the same regular expression.
	  For example, valgrind messages all look like this:

	    ==42921==   Text ...

	  Where the number, 42921, is the process id of the program valgrind
	  is debugging.  Each valgrind error ends with a line like this:

	    ==42921==

	  Meaning that there is no text after the ==.

	  This kind of section can be used to define a single section that
	  spans the entire file.  See below.
   
      B.  blocks of lines which begin with a line matching a regular 
	  expression but containing others that do not.
   
      C.  blocks of lines which begin with one regular expression and end with 
	  another.
   
      Type A:
   
	     stuff
	     stuff
	     a-regex        3 lines match the regular expression, a-regex
	     a-regex
	     a-regex
	     more stuff
	     more stuff
	     ...
   
	Type B:
   
	       stuff
	       page 1                           first section start
		  page 1 contents
	       page 2                           second section start
		 page 2 contents
	      ...
   
	Type C
   
	      stuff
	      begin section 1               first section start
		 middle
	      end section 1                 first section end
	      more stuff
	      begin section 2               second section start
		middle 
	      end section 2                 second section end
	      other stuff

      D:  Selector sections that automatically choose 1 of N
	  possible sections based on the data in the input stream.

    Here's how to write section boundaries for each of these cases:

       A:  { %regex%w ... }   // a group whose lines ALL match regex.  Again,
			      // your regex could contain %, :, or / as
			      // delimiters
			      //
			      // To define one section that consumes the 
			      // entire file, use:
			      //
			      //   { /^.*$/w
			      //     ....
			      //   }

       B:  { :regex: ...  }   // a group BEGINNING with regex and
			      // everything until the next instance thereof.

       C:  { /begin/,/end/ .. } // lines containing begin THROUGH those
				// containing end

       D:  ?{ {/s1/...} {/s2/...} }  // either section s1 or s2 depending
				     // on whichever is found first in the
				     // input stream.

    Note:  See "Why is my section only 1 line long?" below.
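
    To make the difference between types A, B, and C concrete, here is a
    small Python sketch of each boundary rule.  It illustrates the
    definitions above and is not lbsplit's implementation; the function
    names are hypothetical.  The type C sketch tests the end regex on the
    same line that began the section, and lets an unterminated section run
    to the end of the input.

```python
import re

def while_sections(lines, regex):
    """Type A: maximal runs of consecutive lines that all match regex."""
    sections, current = [], []
    for line in lines:
        if re.search(regex, line):
            current.append(line)
        elif current:
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    return sections

def begin_sections(lines, regex):
    """Type B: each section starts at a matching line and runs to the next match."""
    sections = []
    for line in lines:
        if re.search(regex, line):
            sections.append([line])
        elif sections:
            sections[-1].append(line)
    return sections

def begin_end_sections(lines, begin, end):
    """Type C: lines from a begin match THROUGH the next end match."""
    sections, current = [], None
    for line in lines:
        if current is None and re.search(begin, line):
            current = []
        if current is not None:
            current.append(line)
            if re.search(end, line):
                sections.append(current)
                current = None
    if current is not None:
        # No end line was found: the section runs to the end of the input.
        sections.append(current)
    return sections

data = ["stuff", "begin 1", "middle", "end 1", "more", "begin 2", "middle", "end 2"]
for section in begin_end_sections(data, r"^begin", r"^end"):
    print(section)
```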


How come I only got the first section to print out?

  -  Because you only defined one section and you didn't use the + option at 
     the end of a section definition:

      -  '{ ... }'          matches exactly 1 section

      -  '{ ... }'  '{...}' matches exactly 2 sections

      -  '{ ... }+'         applies to all instances of the section to the end 
			    of the input stream.


How can I insert line numbers, section counts, etc?

  - the commands n, N, f, and I 

    n; inserts the section relative line number (1 based)

    N; inserts the section number (1 based)

    I; inserts the stream relative line number (1 based) (across input files)

    f; inserts the current file and line number (1 based).

    Numbers are inserted at the beginning of the line and are
    followed by a tab.  For example, if the current line contains:

       this

    And you execute the I; command, it will be modified to look like this:

       348\tthis

    Assuming it was line 348 from the beginning of the input stream.

    You might want to think about adding numbers to your lines as the last
    thing in the section -- if you do it earlier, you will have to remember
    that the number and tab are now part of the line in every later command.

    The f command inserts:

      filename \t lineNumber \t

    (without the spaces!)
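
    The three counters behind n, N, and I can be pictured with a short
    Python sketch.  This is an illustration only; number_lines is a
    made-up name, and it assumes the input has already been split into
    sections, so its "I" count covers only lines inside sections rather
    than the whole input stream.

```python
def number_lines(sections, kind):
    """Prefix every line with a 1-based number followed by a tab.

    kind "n": line number within its section
    kind "N": section number
    kind "I": running line number across all the sections given
    """
    out = []
    running = 0
    for section_no, section in enumerate(sections, start=1):
        for line_no, line in enumerate(section, start=1):
            running += 1
            number = {"n": line_no, "N": section_no, "I": running}[kind]
            out.append(f"{number}\t{line}")
    return out

sections = [["alpha", "beta"], ["gamma"]]
for kind in ("n", "N", "I"):
    print(kind, number_lines(sections, kind))
```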


How could I number the lines in a file?

  -  lbsplit -n filename -S '{ /$/w f; }+'

     Explanation:

       -n means to only print the lines in sections

       The section boundary, /$/w, means to treat all lines that have an
       ending as a group.  (Note:  /./w won't work, try it).

       f; means to prefix the line with the current filename and line number.


How could I simulate grep using lbsplit?

  -  lbsplit -n YourFile -S '{ :YourRegex:w f; }+'

     Explanation:

       -n  means not to print any text not in a section.
       
       The section definition is a "while" style section.  A "while" section
       is defined as a group of lines all matching the expression, YourRegex.
       If you leave out the trailing w, you get all the text in the file 
       starting with the first line in the file that matches YourRegex.

       Blank lines only match the '$' regular expression, no others.


How do I read from stdin?

  - lbsplit - 

    A single - option before the section definitions is interpreted as
    referring to stdin.  At most 1 - can be interpreted as stdin.


What kinds of text processing commands can I use?

  - Mainly regex substitutions          s/this/that/g;
	   textual translations         y%[a-z]%[A-Z]%;
	   inserting numbers            I;  n; N; f;
	   expanding tabs               t;
	   compressing with tabs        T;
	   loading the line with text   l/text/;
	   left and right justification j10; J10;
	   cutting out columns          c 1-10,20,30-90;

How to collapse entire sections into single lines?

  - E/eolText/;

    If the eolText is empty (//), then all the lines are concatenated
    together -- but this is generally useless.  A better strategy is to
    convert \n into |, or some other easily replaceable string.  You can't
    do this with the y command, however.  The end of line sequence is not
    part of the text in the line.  

    Note that if you do something like this:

       lbsplit -n file -S '{/pattern/...   E/|/;}' | sed <cmds> | tr '|' '\n'

    Then you can extract a section, turn it into a single long line, then
    use sed to make substitutions in that line, then use tr to split the
    lines back out.
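
    The collapse, edit, and split round trip looks like this in Python.
    It is a sketch of the idea rather than of lbsplit itself; collapse
    and expand are hypothetical names, and "|" is assumed not to occur in
    the data.

```python
import re

def collapse(section, eol_text="|"):
    """Join the lines of a section into one line, like E/eolText/."""
    return eol_text.join(section)

def expand(line, eol_text="|"):
    """Split a collapsed line back into its original lines, like tr '|' '\\n'."""
    return line.split(eol_text)

section = ["begin", "value: 1", "end"]
one_line = collapse(section)
# A substitution can now span what used to be a line boundary.
edited = re.sub(r"begin\|value", "begin|VALUE", one_line)
print(expand(edited))
```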


What kind of regular expression am I using?

    - sed style

      You can use \| to define regexes that match either one pattern or
      another.

      You use \( ... \) to encapsulate sub-expressions (rather than () like
      is done with perl).

      CAVEATS:  & is not interpreted in regular expression substitutions on
      the right hand side.  Use \0 instead of &.

      Note:  Regular expressions can have options:  /fred/i  matches both
      "fred" and "Fred" and "FRED".


How do I select specific lines for textual transformations?
How do I constrain substitutions to particular lines?
How do I write conditional statements?
How do I invert the logic of conditional statements?

  -  Any command in a section can be preceded with one of the following:

      <line>
      <line1>,<line2>
      <line1>,$
      /regex/
      !<line>
      !<line1>,<line2>
      !<line1>,$
      !/regex/


     For example:

	2d;
	2,3s/fred/bill/1;
	:Tom:s/om/OM/g;
	!:frank:s/om/OM/g;

     The prefixes select only the specified line number, line range, 
     or lines matching a specified regex.


How do I delete, discard, or suppress a given line?

  -  the 'd' command -- which can be used conditionally, see above.


How can I suppress the printing of empty or blank lines?

  -  blank lines do not match the regex, /./, and so you can use this to
     detect and skip them.  Use this command action:

       !/./ d;

How do I define repeated sections without cutting and pasting the text?

  -  Sections defined with a trailing + are repeating sections.  For example:

      {/begin/,/end/}+

     Note that normally, repeating sections should be the LAST section.  However,
     if you define a section that does not have an end regex and is NOT marked
     as a while section (using the /w option on the begin regex), then you can
     have other sections defined after it.  Test30 in the source distribution
     does this.  Here is its input data set:

	Intro 1
	Intro 2
	Intro 3
	Page  1
	  p1a
	  p1b
	Page 2
	  p2a
	  p2b
	Page 3
	  p3a
	  p3b
	Trailer 
	  t1
	  t2

     Here are the sections as defined by the command invocation below:

	1       Intro 1
	1       Intro 2
	1       Intro 3
	2       Page  1
	2         p1a
	2         p1b
	3       Page 2
	3         p2a
	3         p2b
	4       Page 3
	4         p3a
	4         p3b
	5       Trailer 
	5         t1
	5         t2

     One of the command invocations for test 30 is:

		lbsplit -n tests/pageTest.txt -S \
	      '{/Page/w! N;}' \
	      '{/^Page/ N;}+' \
	      '{/Trailer/ N; }' 


How do I execute commands after the last line of the section?

  -  The 'A' action expects another action as its argument and it
     inserts that action into the suffix list of the current section.
     Suffix actions are executed after the section is finished.


How do I execute commands before the first line of the section?

  -  The 'B' action expects another action as its argument and it
     inserts that action into the prefix list of the current section.
     Prefix actions are executed before the first line of the section is
     processed.  Note that you could implement this using the conditional
     action:

       1{prefix action list}


How do I suppress the printing of the entire rest of the section?

  -  The 'q' command turns off the current line and all others in this
     instance of the current section.  If the + operator is used on the
     section definition, the q command has no effect on future instances
     of this section.

Why doesn't regex /./ match empty lines?

  -  It isn't supposed to.  If you want to substitute a truly blank line
     into something else, you can use this:

	s/^$/OtherStuff/1;

     Alternatively, you could just use the l/OtherStuff/; command to
     force the line, whatever it contains, to be equal to OtherStuff.
     This is particularly helpful for variables whose values usually
     default to an empty string.

     If you want to perform a regex substitution on any line, even if
     it is empty, do this:  

       s/^.*$/desired/1;


How can I use variables in my scripts?

  -  The substitute command lets you use \{varname} syntax to specify
     a variable whose value will be inserted into either the regex or
     the substitution.  Do not use a \ on the trailing }.  This syntax can
     also be used in the regex (left hand side of the substitution).

     The varname refers either to a program variable or an environment variable
     if no program variable by that name exists.  

     Program variables are initialized like this:

       |varname|s/^.*$/SOME DEFAULT VALUE/g;   replace extant contents with new

       |var|l/SOME DEFAULT VALUE/;             load var with text

       |var|=;                                 load var with current line

       |var|+;                                 append current line to var with
					       an intervening \n.

       S/\{bill}/stuff-\{george}/;             store stuff-(the contents of george)
					       into the variable whose name is
					       found in bill

     Note that the entire syntax is required if you are defining a previously
     undefined variable.

     This syntax can be used in B actions so that you don't have to waste
     compute cycles on every line of the input file.

     Alternatively, you can detect text in the body of a section that you
     want to store in the variable, varname, and/or update it as the 
     scripts run.

     Any action that modifies the current line can be applied to a variable
     and some special actions can only be used with a variable.

     The command 'm' exists to let you expand variable names into their values
     without having to go through the substitution process.  Essentially, the
     'm' command maps the current line to the value of a variable whose name is
     specified on that line -- or an environment variable if none is found.
     This command is not overly useful, but you might need it.

      Note:  you can initialize variables in a prefix section which is executed
      before the beginning of actual processing of your input stream.  You can
      print them after the last line of the input stream is processed using a
      suffix section.  These are defined on the command line like this:

        lbsplit -px "{prefixSection with Actions;}"  -sx '{suffix section ...}'


How do I write "and" and "or" clauses using regular expressions?

  -  Boolean logic equates the expression

       A && B

     with

       !( !A || !B)

     It also equates

       A || B

     With 

       !( !A && !B )

     When defining sub-ranges within a section over which to apply commands, 
     you can use the following inside a section definition:

	!<regex1>,<regex2>{ commands }

     This says to execute the commands if you are not in the range defined by
     <regex1> to <regex2>.  

     To write an "AND" clause, you can do the following:

       /regex1/ /regex2/ commands

     which means to execute commands only if both regexes are true for the
     current line.

     "OR" clauses are implemented like this:

       /regex1\|regex2/ { commands }

     Given these linguistic features, and the Boolean logic underlying your needs,
     it may or may not be possible to accomplish the filtering you wish to do.
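
     The two identities and the two clause styles can be checked quickly
     in Python.  This is an illustration, not lbsplit syntax: the helper
     names are hypothetical, and Python writes alternation as | where the
     sed-style regexes above write \|.

```python
import re
from itertools import product

def matches_both(line, regex1, regex2):
    """AND: both regexes must match the same line."""
    return re.search(regex1, line) is not None and re.search(regex2, line) is not None

def matches_either(line, regex1, regex2):
    """OR: a single regex built by alternation of the two."""
    return re.search(f"(?:{regex1})|(?:{regex2})", line) is not None

# De Morgan's laws, verified over every combination of truth values.
for a, b in product([False, True], repeat=2):
    assert (a and b) == (not ((not a) or (not b)))
    assert (a or b) == (not ((not a) and (not b)))

print(matches_both("fred and bill", "fred", "bill"))
print(matches_either("tom", "fred", "bill"))
```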

     Note that the above apply to all regular expressions, but the following does
     not:  When defining sections, you supply either 1 or two regular expressions
     which define bounds of the section.  To implement a selection between 1 
     section or another, use the '?' operator:

       ?{
	   { /s1/ ... }
	   { /s2/ ... }
	   ....
	}

     That is, you can specify more than one section in a group.  lbsplit will choose
     and activate whichever section it encounters first.  Note that this is strictly
     an "or" situation.  You can't generally use nested sub-sections in lbsplit.


Why is my section only 1 line long?

  -  The section selection regular expressions control this.  If you use
     the same regular expression for the beginning and ending regular
     expressions, then you will get a 1 line section:

	{  /fred/,/fred/ ... }
	{  /./,/./       ... }

  -  Note that you can avoid this nuisance by adding the '>' option to the
     end regex of your section.  That option requires that sections end
     on a different line than the one on which they begin (when a two regex
     section is defined).

	{ /fred/,/fred/> ... }
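
     The effect is easy to see in a Python sketch of the matching rule.
     This illustrates the behavior described above and is not lbsplit's
     code; first_range and its end_on_later_line parameter (standing in
     for the '>' option) are made-up names.

```python
import re

def first_range(lines, begin, end, end_on_later_line=False):
    """Return the first run of lines from a begin match through an end match."""
    collected = []
    for line in lines:
        starting = (not collected) and re.search(begin, line) is not None
        if not collected and not starting:
            continue  # still looking for the begin line
        collected.append(line)
        if starting and end_on_later_line:
            continue  # with '>', the begin line is never tested as the end
        # By default the end regex is also tested on the line that began the range.
        if re.search(end, line):
            break
    return collected

lines = ["x", "fred 1", "middle", "fred 2", "y"]
print(first_range(lines, "fred", "fred"))
print(first_range(lines, "fred", "fred", end_on_later_line=True))
```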

Why does my range condition action command apply to only 1 line?

  -  For the same reason as the above -- by default, both the begin and
     the end conditions are applied to the current line.  If you want to
     guarantee that your condition range spans more than one line,
     use the '>' character at the end of the second range:

	{ /fred/,/fred/> # the section is more than 1 line long
        
	   /tom/,/tom/> action;  # the range of tom actions is more than
				 # one line long
        }

Why does my range condition include too many lines?

  -  A range condition in a section selects a subset of the lines in a
     section for special processing.  Consider:

	{  /./,/end of file/

	   /beginline/,/endline/ {
				    cmds;
				 }

	}

    Here, the entire file is defined as a single big section.  Each 
    subset of the lines in the file beginning with /beginline/ and
    ending with /endline/ will have the cmds applied to them.

    Note that this repeats for every such range in the section; there is
    no limit on how many times the range can match.

    Since your range conditional regexes can contain \{var} references,
    you could have the cmds in the range change the var to be an
    un-matchable string and thus you would only truly match on the
    first instance.  For example:

      {  /./,/<eof>/

	 B{  |var|l/beginline/; }

	 /\{var}/,/endline/ { cmds ; 
			      |var|l/invalidstuff/;
			    }

      }

    Here's how this section definition works:

      *  The section begins at any non-blank line and ends
	 when a line containing <eof> is matched.  Presumably no such
	 line exists, so the section runs from the beginning of the file
	 to the end thereof.

      *  Before the section is executed, the variable, var, is initialized
	 to "beginline"

      *  As the section is processed, each line (of the section and thus the
	 whole file) is compared against the range:

	    /\{var}/     # will contain beginline the first time through
	    /endline/

      *  The first time that a line containing "beginline" is found, the
	 sub-section defined by the range condition will become active
	 and the range commands will be applied.  In this case, the variable,
	 var, will be changed from "beginline" to "invalidstuff".  This
	 will effectively eliminate the possibility that any other lines
	 in the file will be affected by the range, and when this range
	 ends (when endline is found), that will be the end of the range's
	 use.
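
    The same logic can be modeled outside lbsplit.  The following is a
    minimal Python sketch, not lbsplit code:  the function name
    first_range_only and its parameters are made up for illustration,
    and it assumes that the begin and end tests are both applied to the
    current line, as described above.

```python
import re

def first_range_only(lines, begin="beginline", end="endline"):
    """Return only the lines inside the first begin..end range."""
    var = begin                # plays the role of |var|l/beginline/ in B{ }
    in_range = False
    selected = []
    for line in lines:
        if not in_range and re.search(var, line):
            in_range = True
            var = "invalidstuff"   # plays the role of |var|l/invalidstuff/
        if in_range:
            selected.append(line)  # stands in for "cmds"
            if re.search(end, line):
                in_range = False   # range is over; var no longer matches
    return selected

sample = ["a", "beginline 1", "x", "endline 1",
          "b", "beginline 2", "y", "endline 2"]
print(first_range_only(sample))
```

    Only the first beginline/endline group is selected; the second group
    is skipped because the begin pattern has been replaced by then.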

How can I compute a range condition's regex?

  -  As just shown, a range condition's regex can contain a \{var}
     reference, so the regex can be computed when the script runs:

	{  /./,/<eof>/

	   B{  |v| l/P/; }        << variable v gets a capital P;

	   /\{v}/,/p/  { P; }     << only print lines in the range

	   d;                     << delete all the rest
	}

How can I quit processing the entire section?

  -  The q; command terminates the entire section -- immediately -- it does
     not wait until the proper end of the section is found.


How can I filter out a section?
How can I selectively filter out a section?

  - The easiest way is to use lbsplit without the -n option, then define the
    section you want to filter out and have it not print anything.
    For example:

	lbsplit somefile.txt -S '{ /begin/,/end/ d; }'

    This prints the entire file, somefile.txt, to stdout, except for
    the text between "begin" and "end", because all the lines in that
    section are deleted.  Note that you can't use 'q' here, because only
    the 'begin' line would be deleted.  The q command would terminate the
    section as soon as it was executed.

  - If you want to examine the section before deciding to filter it out
    entirely, you can use this trick:

      a.  don't just delete the lines in the section, as above, but
	  also append them to a variable.  
	  
      b.  if you decide to keep the section, print the variable.

    For example:

       lbsplit somefile.txt -S \
	 '{ /begin/,/end/

	     B{ |lines|l//;  
		|keeplines|l//;
	      }

	     A{ |keeplines|/./ |lines|P;
	      }

	     |lines|+;

	     /fred/|keeplines|l/keepit/;

	     d;
	  }'

    Here's why this works:

       1.  Since the -n option is not used, all lines which are not
	   part of a section, are printed automatically.

       2.  The section of interest is defined by a begin/end pair of lines.

       3.  When the section is first entered, before any lines are processed,
	   two variables are initialized to empty:  "lines" and "keeplines".
	   The "lines" variable will hold all the lines in the section.
	   The "keeplines" variable will serve as a boolean flag meaning that 
	   we have decided to keep the section in the output.  The "B" command
	   defines the list of commands to be executed before the first line is
	   processed.

       4.  Each line is ultimately deleted from the output by the "d;" command
	   that appears at the end of the section.   Such commands must go at
	   the end if you wish to do any other processing in the section.

       5.  Each line of input text is appended to the "lines" variable with
	   a leading newline.

       6.  As each line of the section is processed, it is compared against
	   the regular expression, /fred/.  If a match is found, the
	   variable keeplines is modified to contain the text constant "keepit".

       7.  The "A" command defines the behavior that occurs after the last 
	   line of the section is processed.  Here, the "keeplines" variable
	   is compared against the regular expression, /./, which just checks
	   to see if the variable is empty or not.  If it is not empty, then
	   the statement prints the contents of the lines variable.
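
    The buffering trick can be modeled in a few lines of Python.  This is
    a hedged sketch of the logic only, not lbsplit code:  the function
    name filter_sections and its parameters are hypothetical, it assumes
    every section is closed by an end line, and it ignores the exact
    newline handling of the "lines" variable.

```python
import re

def filter_sections(lines, begin="begin", end="end", keep="fred"):
    """Drop begin..end sections unless one of their lines matches keep."""
    out = []
    buffered = None        # None means "not inside a section"
    keep_it = False
    for line in lines:
        if buffered is None:
            if re.search(begin, line):
                buffered, keep_it = [], False   # B{ }: reset both variables
            else:
                out.append(line)   # no -n: outside lines print unchanged
                continue
        buffered.append(line)      # |lines|+; followed by d;
        if re.search(keep, line):
            keep_it = True         # /fred/|keeplines|l/keepit/;
        if re.search(end, line):
            if keep_it:
                out.extend(buffered)   # A{ }: print the saved lines
            buffered = None
    return out

sample = ["top", "begin", "alpha", "end",
          "mid", "begin", "fred", "end", "tail"]
print(filter_sections(sample))
```

    The first section has no "fred" line and is dropped; the second one
    is kept in full.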


How do I execute commands before the first section and after the last?

  -  Prefix and suffix sections can be defined on the command line like this:
  
      -px '{ prefix section }'
      -sx '{ suffix section }'

     You can use '/./' as the section-defining regex.  This is only
     really useful for printing things and for initializing variables.


How do I deal with columnar data?

  - The c command (cut), lets you select columns out of the current line
    and replace the line with the selection.  For example, suppose the
    current line were this:

      0123456

    And a cut command action like this were used:

      c 1-3,7

    The line would be replaced with 

      0126

    Note that you probably want to use the 't' command to make sure tabs
    get expanded before using cut.
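
    The column numbering in that example (1-based, with inclusive ranges)
    can be checked with a small Python sketch.  This only models the
    selection rule implied by the example above; cut_columns is a
    made-up name, not part of lbsplit.

```python
def cut_columns(line, spec):
    """Select 1-based, inclusive column ranges such as "1-3,7" from line."""
    picked = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        start = int(first)
        stop = int(last) if last else start   # a single column like "7"
        picked.append(line[start - 1:stop])
    return "".join(picked)

print(cut_columns("0123456", "1-3,7"))   # prints 0126

# Expanding tabs first gives predictable column positions.
print(cut_columns("a\tb".expandtabs(8), "9"))   # prints b
```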


How do I expand tabs in my input data?

  -  The t; command expands tabs into spaces.  The T; command does the
     reverse and compresses spaces into tabs.


How do I read a file and print it?

  - The r; command lets you read a file and print it instead of the
    current line.  The r command has two forms:

      r;   uses the current line as the file name

      r file;  uses 'file' as the filename -- the text is expanded before use.

How do I parse a line's contents into variables?

  -  The g/regex/varlist; command lets you parse the current line or variable
     into pieces using a regular expression and list of variables into which
     to place the regular expression match information.  The variable list
     is a list of variables separated by '|'s.  Here is an example invocation:

       g/.*/match;

     In this case, the variable match will be populated with the entire contents
     of the variable or line that is the context of the command.  Here's 
     another:

       g/[a-zA-Z_]\+/firstWord;

     In this case, the variable firstWord will be populated with the first
     word on the line.  Here's another:

       g/Section: *\([^ ,]\+\),/wholeMatch|sectionName;

     In this case, variable wholeMatch will contain something like this:

       Section:   100,

     And variable sectionName will be populated with "100".  Here's a more
     complex example:

       g/Section: *\([0-9]\+\)\.\([0-9]\+\)/whole|firstDigit|secondDigit;

     In this case, if the input data contains:

       Section:  1.2

     the variable, whole, will be populated with "Section:  1.2", and the
     variable, firstDigit, will be populated with "1", and the variable,
     secondDigit, will be populated with "2".
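
     For comparison, the same capture can be modeled with Python's re
     module.  This is a sketch of the idea, not lbsplit code:  parse_into
     is a hypothetical helper, Python writes groups as ( ) and repetition
     as + without backslashes, and returning an empty result when nothing
     matches is an assumption of this sketch.

```python
import re

def parse_into(line, pattern, names):
    """Bind the whole match, then each group, to the given names in order."""
    m = re.search(pattern, line)
    if m is None:
        return {}                      # assumption: nothing is assigned
    values = [m.group(0), *m.groups()]
    return dict(zip(names, values))

fields = parse_into("Section:  1.2",
                    r"Section: *([0-9]+)\.([0-9]+)",
                    ["whole", "firstDigit", "secondDigit"])
print(fields)
```

     The first name receives the whole match and the remaining names
     receive the parenthesized groups from left to right.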

How do I write while-loop statements?

  -  While loops are limited to repeated processing of the same line.  
     You can't use a while loop to process multiple lines.  Here's the 
     syntax for processing a line multiple times:

       w/regex/action

     Or
       w!/regex/action

     This command action is only useful if the action modifies the current 
     line so that the while loop eventually terminates!  Otherwise, the
     script will hang.
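
     The termination requirement can be seen in a small Python model of
     the loop.  This is a sketch under stated assumptions, not lbsplit
     code:  while_match is a made-up name, and the iteration limit is a
     safety net added for this sketch only.

```python
import re

def while_match(line, pattern, action, limit=1000):
    """Apply action to line for as long as pattern still matches it."""
    for _ in range(limit):
        if not re.search(pattern, line):
            return line
        line = action(line)
    raise RuntimeError("loop did not terminate")

# Each pass removes one space from a double space, so the loop ends
# once no double space remains.
result = while_match("a    b", "  ", lambda s: s.replace("  ", " ", 1))
print(result)   # prints a b
```

     An action that leaves the line unchanged would never stop matching,
     which is the hang described above.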