lbsplit v1.1.2

NAME

    lbsplit - Lowell Boggs' file splitter

ABSTRACT

    Split files into sections and perform translations on each section,
    based on user-defined scripts. The scripts are very similar to sed
    scripts, except that they support named variables, text justification,
    etc. See FAQ.txt and manual.html for more examples, longer
    discussions, etc.

SYNOPSIS

    lbsplit [processingOptions] filename ... [sectionOptions]

    "processingOptions" modify the behavior of the program with respect
    to all sections.

    -n
        suppresses the printing of text which is not part of a section.

    -f prefix
        specifies the prefix part of the filenames that MIGHT be used
        when processing sections.

    -N digits
        specifies the number of digits to generate when numbering output
        files.

    -d
        turns on debug outputs -- bulky, ugly stuff, useful for
        diagnosing your section definitions.

    -D
        prints extra debugging information, like variable contents at
        the end of execution. Changes some output formats.

    -v
        prints the lbsplit version info.

    -px prefixSection
        defines a prefix section's actions (mainly useful for
        initializing variables and printing things). For example:

            -px '{/./ |var|l/stuff/;}'

        This section loads the variable var with "stuff" before any
        sections are processed. Variables default to empty strings, so
        only use this for non-null initializations.

    -sx suffixSection
        defines a suffix section's actions (mainly useful for final
        printouts). To check whether a variable is populated, use this:

            -sx '{/./ |var|/./ { P; } }'

        This example prints the variable only if it is not empty at the
        end of a run.

    "sectionOptions" end the list of file names and start the list of
    section descriptors.

    -S [sectionDescription ...]
        indicates that the descriptions are specified on the command
        line.

    -F sectionFileName
        indicates that the descriptions are found in the specified file.

    -FH [0-9]
        specifies that the section descriptions should be read from a
        file handle which is initialized by the shell that starts
        lbsplit.
        Here's a simple script example:

            exec 3< [...]

    [...] means the range spans at least 2 lines.

    !/regex/action;
    !/reg1/,/reg2/action;
        -- execute action only if the current line does NOT match the
           specified regex. Note that instead of using / as the
           separator, you can also use % or :. See the previous section
           for notes on options to regexes.

    2,4action;
        -- execute the specified action only if the current line number
           within the current section is in the range 2-4. Any pair of
           numbers can be used, but the second number must be >= the
           first. You can also use $ for the second number, which means
           "end of the section".

    3action;
        -- execute this action only if the current line number within
           the current section is 3. Any number may be used (but not $).

    !<line>action;
        -- execute action on any line but the specified one.

    !<line1>,<line2>a;
        -- execute a on any line NOT in the range of line1 to line2.

    {action; ... }
        -- define an action list. This is only useful when you are using
           a conditional action of some kind.

    p/text/;
        -- print the text using the current end of line string. You can
           embed control characters in it using \r, \n, etc. Instead of
           using / as the separator, you can also use % or :. You can
           also include variable references in the printed text:
           \{varname} is replaced with the contents of variable varname.

    t;
        -- replace the tabs in the current line with enough spaces to
           align to the proper tab position (8 chars per tab, per the
           unix standard).

    T;
        -- replace leading spaces in the line with tabs, in 8 char
           chunks.

    n;
        -- prepend $lineNum\t to the beginning of the line or variable.

    N;
        -- prepend the current section number and \t to the beginning of
           the line or variable.

    I;
        -- prepend the current line number and \t to the beginning of
           the line or variable.

    f;
        -- prepend (1) the current input file name, (2) a tab, (3) the
           line number relative to that file (one based), and (4) a
           second tab to the beginning of the current line or variable.
    A action;
        -- execute the specified action after the last line of the
           section. Variables modified by the section are still
           available.

    B action;
        -- execute the specified action before the first line of the
           section. Similar to the "1 action;" definition, see above.

    l/text/;
        -- replace the current line or variable with text, which can
           include escape sequences etc. In addition to /, the % and
           colon characters can be used as string delimiters. You can
           use \{var} to include the contents of variables in the text.
           You can use l//; to initialize the line or variable to an
           empty state. Here's how you initialize a variable to an empty
           string:

               |varname|l//;

           Ugly, but it works.

    m;
        -- treat the current line as a variable name, find that
           variable's value, and replace the current line with it.

    |var|x;
        -- replace the variable with the current line.

    |var|=
        -- replace the variable with the current line. Not usable
           without a variable reference.

    |var|+
        -- append the current line or variable to the variable named in
           the variable context, with an intervening \n. A substitute
           command to get rid of the \n can be written like this:

               |var|s/\n//g;

           Here:

               |var1||var2|+;

           will append \n followed by var2 to var1.

    S/varExp/valueExp/;
        -- compute a variable's name from varExp, and store the value
           defined by valueExp in that variable. The computations
           involve expanding any \{var} references in the text (and
           interpreting escape characters).

    j cnt;
        -- left justify the current line or variable in a field of
           spaces "cnt" wide.

    J cnt;
        -- right justify the current line or variable in a field of
           spaces "cnt" wide.

    q;
        -- stop processing this section immediately.

    c cutset;
        -- select ranges of columns, like the unix cut -c command. For
           example:

               c 1-10,40-99

           will select the first 10 characters and the 40th through the
           99th characters from the string, concatenate them into a
           single string, and replace the current line with that string.

    r [file];
        -- print a file instead of the current line.
           If the file's name is specified in the command, expand any
           variables in the name, then print that file. If the file's
           name is not specified (r;), use the current line as the file
           name and print that file.

    g/regex/varlist;
        -- get variables from the current line, using a regular
           expression to parse them out. For example:

               g/stuff\(x1\)crap\(x2\)/match|p1|p2;

           Here, if the line matches the pattern, variable match will
           contain a non-empty value, and so will p1 and p2. Variable p1
           will contain "x1", and variable p2 will contain "x2". If the
           current line does not match the pattern, match, p1, and p2
           will be emptied.

    w/regex/action
        -- while the regex is true of the current line, execute the
           action.

    w!/regex/action
        -- while the regex is NOT true of the current line, execute the
           action.

DEFINING SECTION BOUNDARIES

    Each of the regular expressions that define the start and end of a
    section is specified like this:

        /pattern/[options]

    The options indicate how to deal with the line containing the
    pattern. The trailing options are as follows:

    no options
        -- take the default behavior for this line and pattern with
           respect to the match. Basically, this means that if the line
           contains the pattern, then it matches the begin or end of the
           section.

    !
        -- invert the pattern. If the pattern does not match, then the
           line defines the begin or end of the block, rather than the
           reverse.

    w
        -- used only on the begin section regex. It means that the
           section is defined by all lines that match the single regex,
           and does not require or allow an end regex.

    i
        -- make the comparison case insensitive.

    >
        -- ensure that the section ends on a different line than the one
           on which it starts (use > only in the end regex).

    These options are necessary to deal with different kinds of text
    blocks. For example, consider this input file:

        blah blah blah
        BEGIN something or other
        stuff in the section
        END
        blah blah blah

    This section could be matched like this:

        { /^BEGIN/,/^END/ ... }
    In this case, all lines from the first BEGIN line through the first
    END line will be processed. If you don't want to include the begin
    and end lines in the output (say you only want "stuff in the section"
    to appear in the output), do this:

        { /^BEGIN/,/^END/ /^BEGIN/d; /^END/d; }

    This will select all lines from BEGIN to END, but will suppress the
    printing of the BEGIN and END lines.

    Alternatively, suppose your input text looks like this:

        a1
        a2
        b1
        b2
        c1
        c2
        c3

    In this case, sections can be identified by the first character in
    the line, but there is no clear end of section line to match on. To
    match all the lines beginning with a, without including the first
    line beginning with b in the first section, use the 'w' option on the
    begin regex of the section and do not supply an end regex:

        { /^a/w ...}

    Here, the w option means: treat consecutive lines matching /^a/ as
    part of the block. The first line that does not match ends the
    section and is not part of it.

    And suppose your input file looks like this:

        Section1
        paragraph
        Section2
        par2

    In this case, each section heading line and the paragraph that
    follows it should go together. To accomplish this, use a section
    definition that has a begin regex but no end regex:

        { /regex/ action1; ... }

    where regex matches the heading line. Here, the lack of the ending
    regex means to process the matching line and all lines until, but not
    including, the next matching line as part of the section.
    Presumably, EVERYTHING after the final match of the regex is meant to
    go into the final section.

CONDITIONAL EXECUTION OF ACTIONS

    To reiterate from the command actions described earlier, there are
    several forms of conditional execution:

    1. You can restrict the execution of actions to lines which contain
       a regular expression:

           /regex/action;

       Note that the action can be a block:

           /regex/{action1;a2;a3}

    2. You can restrict execution to only those lines with a particular
       line number within a block:

           2,8d;

       The following combines forms 1 and 2:

           2,8/stuff/action;

    3. You can invert regex line selections, and execute the action only
       if the line does not contain a specified regular expression:

           !/regex/action;

    4. You can create "and" clauses like this:

           /regex1/ /regex2/ actions;

    5. You can create "or" clauses like this:

           /regex1\|regex2/
           !/r1\|r2/

AUTOMATIC NUMBERING

    To reiterate information from above:

    1. sections have numbers
    2. lines within sections have numbers
    3. lines within the input stream as a whole have numbers
    4. lines within input files have numbers

    You can prepend a number plus a tab to the beginning of any line
    using one of the following commands: n, N, f, or I. To move the
    number somewhere else in the line, you can use the substitute
    command:

        N; s/\(^[^\t]*\)\t\(.*\)/\2:\1/1;

    The N command prefixes the current line with the section number and
    a tab. Then the substitute command discards the tab and puts the
    section number at the end of the line, preceded by a colon.

EXAMPLES

    1: Suppose you want to extract only the generated suppressions from
    a valgrind output, which looks like this:

        == 459 == Valgrind ...
        == 459 == Some Error
        ...
        {
        generated suppression text
        }
        == 459 == Other Error
        {
        generated suppression text
        }
        ...

    That is, there is a lot of text in which a small number of {} blocks
    are interspersed. These blocks begin and end in column 1 of the line
    on which they occur.

        lbsplit -n valgrind.txt -S '{/^{/,/^}/.;}+'

    This command discards all text which is not part of a {} block (-n).
    It simply prints each block, including the {}'s. The trailing + means
    to repeat the section ad infinitum.

    2: Suppose you want to filter duplicate blocks of text which occur
    in a file formatted like this:

        blockstart
        middle
        blockend
        ...

    The following command can be used:

        lbsplit -n file \
          -S '{/^blockstart/,/^blockend/ $/|/; A{$//; p/\n/;};}+' |
          sort -u | tr '|' '\n'

    This command does the following:

    1. It suppresses all text that is not part of a blockstart/blockend
       pair.

    2. It converts the end of line string from \n to |.

    3. After the block is finished, it converts the end of line string
       to nothing, then prints \n. This leaves the entire text of a
       block as a single giant line of text.

    4. Piping this output to the sort command with the -u option
       eliminates duplicate blocks (which are now just single lines fed
       to the sort command).

    5. The final 'tr' command converts '|' back to newline so that the
       blocks have their proper line splits.

    See FAQ.txt in the source distribution for more examples and ideas.

SEE ALSO

    csplit, split, sed, grep, perl, cut
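
    The join / de-duplicate / split idea used in example 2 can also be
    looked at in isolation. The following is only a rough Python sketch
    of that idea and does not call lbsplit. The function name
    unique_blocks and its parameters are invented for this illustration.
    Like the pipeline in example 2, it assumes that no input line
    contains the '|' character. The output order follows Python's plain
    string sort, which may differ from sort -u under some locales.

```python
def unique_blocks(lines, start="blockstart", end="blockend"):
    """Return the distinct start..end blocks found in lines, sorted.

    Hypothetical helper for illustration only; not part of lbsplit.
    Each block is returned as a list of its lines.
    """
    seen = set()
    current = None  # None means we are outside any block

    for line in lines:
        if current is None:
            if line.startswith(start):
                current = [line]      # a new block begins on this line
            # anything else outside a block is dropped (like -n)
        else:
            current.append(line)

        if current is not None and line.startswith(end):
            # Join the block into one string so that duplicates compare
            # equal, mirroring the "one giant line" trick.
            seen.add("|".join(current))
            current = None

    # Split each surviving block back into its original lines.
    return [joined.split("|") for joined in sorted(seen)]


if __name__ == "__main__":
    sample = [
        "noise",
        "blockstart", "b", "blockend",
        "more noise",
        "blockstart", "a", "blockend",
        "blockstart", "b", "blockend",   # duplicate of the first block
    ]
    for block in unique_blocks(sample):
        print("\n".join(block))
```

    Joining on '|' is kept here only to mirror the shell pipeline. In a
    standalone program, storing each block as a tuple of lines would
    avoid the restriction on '|' appearing in the input.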