Pipeline in Java
Version 2 - January 2005
Laurent Paris ( lparis@fr.ibm.com )


Introduction

This document describes the Java PIPELINE application, an adaptation of the VM PIPE command (itself adapted from UNIX's original pipeline concept). Just as John Hartmann implemented the concept in a VM-ish way, I have tried to implement it in Java.
Please note that this is not an official project but a personal development, undertaken mainly as Java training. I began development last year and continue to work on it. (Three years ago I developed pipe-db2.nsf, a tiny adaptation of pipeline under Notes.)
The best source of information about pipelines is John Hartmann's book CMS/TSO Pipelines (document number SL26-0018-02).
All the documentation for the pipe stages comes from VM and has been adapted, as far as possible, to this pipe implementation.

General Topics

  • What's a pipeline
  • What to do with Pipeline
  • Differences from VM pipelines

    Technical Topics

  • Installation
  • How it works
  • Run it as a command line processor or Applet
  • Data flow

    Pipeline by examples

    The main approach is to run and understand the examples given; afterwards, you can go into more detail by reading about the different built-in stages.

  • start with Pipe
  • example1: extract [categories] from the win.ini file
  • example2: count the number of variables in the win.ini file
  • example3: Sum numbers using APL primitives

    The Stages reference documentation contains plenty of examples to give you a better view of what you can do with it.

    Pipeline more in detail

  • pipe command: the syntax
  • Pipeline - Stages reference Detailed description of the 71 stages in the core system.
    (you can generate the 'stages reference' HTML document yourself by running: pipe "? * * ! > pipeline.html")
    All the 'Stages reference' documentation is part of the pipeline.jar file and can be viewed online.
    Each stage documentation has the following chapters:
    - Syntax
    - Purpose
    - Operands
    - Streams
    - Examples
      Some of the examples can be run by pipeline itself, using the 'runcases' stage.
      For example, to play the examples of the 'Preface' stage:
           pipe "runcases preface"
      (Only stages whose examples have tables of data and results in bold can be played by pipeline.)
    
  • Calling pipeline in Java

    What's a pipeline


    A pipeline is a way for the output of one program to be routed more or less directly into the input of another program.

    In a system with useful pipelines, one finds that some programs read data from an external source, or write it to an external destination, but many just process data already in the pipeline. This leads to substantial code reuse, since a piece of processing becomes independent from the source and target of the data it processes.

         +------+     +------+     +------+     +------+
         !      !     !      !     !      !     !      !
         ! Pgm1 !---->! Pgm2 !---->! Pgm3 !---->! Pgm4 !
         !      !     !      !     !      !     !      !
         +------+     +------+     +------+     +------+
    

    Of course, all operating systems allow this to some degree, but there are two crucial facts that distinguish pipelines:

    Efficiency: In most systems, the programs run one at a time, and the data is buffered in files between programs. Because of the cost of the I/O time, programs tend to be large and do as much as possible at once. This, of course, reduces both their reliability and their reusability. With pipelines, the programs are all running at once, and the data flows directly between them. This removes the I/O cost, and allows smaller programs that do just one thing (and do it well), which improves both their reliability and reusability. In fact, in a mature system with a lot of utilities, one often finds that no code at all needs to be written!

    Ease of use: In most systems, the details of the intermediate files need to be specified. With pipelines, all you have to do is give the names of the programs, with a single character between them. This makes ad-hoc usage much easier.
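
    The idea can be sketched in plain Java, independently of this application. In the minimal sketch below, each "program" is a function from a stream of records to a stream of records, and chaining the functions gives the Pgm1 to Pgm4 picture shown above. The names PipelineConcept, chain and demo are illustrative assumptions and are not part of pipeline.jar.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: these names are not part of pipeline.jar.
class PipelineConcept {

    // Chain stages left to right; each stage consumes the previous stage's output.
    @SafeVarargs
    static Function<Stream<String>, Stream<String>> chain(
            Function<Stream<String>, Stream<String>>... stages) {
        Function<Stream<String>, Stream<String>> result = Function.identity();
        for (Function<Stream<String>, Stream<String>> stage : stages) {
            result = result.andThen(stage);
        }
        return result;
    }

    // Two tiny stages: drop empty records, then upper-case what remains.
    static List<String> demo() {
        Function<Stream<String>, Stream<String>> nonEmpty = s -> s.filter(r -> !r.isEmpty());
        Function<Stream<String>, Stream<String>> upper = s -> s.map(String::toUpperCase);
        return chain(nonEmpty, upper)
                .apply(Stream.of("alpha", "", "beta"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

    Java streams are lazy, so each record passes through both stages without any intermediate file. Unlike the PIPELINE application, this sketch runs everything in a single thread.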


    Differences from VM pipelines


    It is a Java application called from a DOS window, so any special characters interpreted by the DOS command interpreter are stripped before the command is passed to the pipe application.

    The < and > characters are stripped by the OS command interpreter to perform I/O redirection, and are thus not very convenient for use as stage names.

    The best way to pass a pipeline command is therefore to enclose it in double quotes ("), as in:

      C:\> pipe "< input.fil ! > append.fil"
    

    Another fundamental difference is that under VM all the stages are "managed" by the internal pipeline dispatcher, which provides a certain degree of synchronization. In this Java application, all the stages are independent threads and it is impossible to "force" synchronization, because each thread runs at its own speed. Generally this causes no inconvenience, but we can get unexpected results with the Juxtapose gateway stage, which requires the records flowing through it to be synchronized.
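
    The effect of independent threads can be illustrated with a minimal sketch that does not use the application's own classes. Two producer threads write to one shared queue; the order within each producer is preserved, but how the two sets of records interleave depends on thread timing. The class IndependentStages and its methods are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: these are not the application's stage classes.
class IndependentStages {

    // Start one thread that writes three records with the given prefix.
    static Thread producer(String prefix, BlockingQueue<String> target) {
        Thread t = new Thread(() -> {
            for (int i = 1; i <= 3; i++) {
                target.add(prefix + i);
            }
        });
        t.start();
        return t;
    }

    // Two independent producers feed one shared queue; returns the arrival order.
    static List<String> runOnce() {
        BlockingQueue<String> merged = new LinkedBlockingQueue<>();
        Thread a = producer("A", merged);
        Thread b = producer("B", merged);
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // A and B records may interleave differently from one run to the next.
        System.out.println(runOnce());
    }
}
```

    Any stage whose result depends on the relative arrival order of records on several inputs is exposed to this kind of variation.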


    Installation


    Warning: pipeline.jar doesn't contain the bluepages and mqseries classes, which are in separate jar files.
    So if you use the Bluepages (or Mqseries) stage in your command, ensure your classpath points to the corresponding jar files.


    How it works


    Pipeline processes its pipeline command (the string passed as its argument) in the following manner:

  • run its scanner to:
  • run its dispatcher to:
  • run its runner (bip bip...road runner)

    Run it as a command line processor or Applet


    Pipeline can run in two ways:

  • as a command line application:
  • as an applet:

    Data flow


    There are two types of streams for a stage:

  • input streams - the stage picks up data from its input streams
  • output streams - the stage writes data to its output streams

    One output stream is connected to one input stream and data flows from the one to the other (inspired from PipedInputStream/PipedOutputStream).
    Stages can have 0 to n input streams and 0 to n output streams depending on the function of the stage...and also of the topology of connections established between them in the pipeline command.
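
    Since the connection model is described as inspired by PipedInputStream/PipedOutputStream, the following minimal sketch shows how those two standard Java classes link a producer thread to a consumer. It is not the application's own code; the class PipedConnection and its run method are assumptions made for the illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not the application's stream classes.
class PipedConnection {

    // Send the records through a pipe from a producer thread to the caller.
    static List<String> run(List<String> records) {
        List<String> received = new ArrayList<>();
        try {
            PipedOutputStream out = new PipedOutputStream();   // producer's output stream
            PipedInputStream in = new PipedInputStream(out);   // consumer's input stream

            Thread producer = new Thread(() -> {
                // Closing the writer closes the pipe and signals end of data.
                try (PrintWriter w = new PrintWriter(
                        new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
                    records.forEach(w::println);
                }
            });
            producer.start();

            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    received.add(line);
                }
            }
            producer.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        run(List.of("first record", "second record")).forEach(System.out::println);
    }
}
```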
    Example: Examine the following pipeline command:

    pipe "< file1 !c: collate 1.6 ! > match1 ? < file2 ! c: ! > match2 ? c: ! > match3"
    
    We can see three different pipelines:
    
      < file1 !c: collate 1.6 ! > match1
      < file2 ! c: ! > match2
      c: ! > match3
    
    The label "c:" connects the pipelines together.
    
    This pipeline command matches records between file1 and file2.
    The comparison is done on columns 1.6 (the key), and the output files
    (match1 to match3) will contain:
       - match1   => records from file1 AND file2 that match
       - match2   => records from file1 that do not match those in file2
       - match3   => records from file2 that do not match those in file1
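
    The three-way split described above can be sketched in plain Java. This is a simplified illustration and not the collate stage itself: it assumes that "1.6" means six characters starting at column 1 (as in VM notation), it ignores any ordering or sorting requirements the real stage may have, and the names CollateSketch, key and collate are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a simplified stand-in for what the collate example produces.
class CollateSketch {

    // Assumption: the key is the first six characters of the record.
    static String key(String record) {
        return record.length() >= 6 ? record.substring(0, 6) : record;
    }

    // Returns three lists: matched records, records only in file1, records only in file2.
    static List<List<String>> collate(List<String> file1, List<String> file2) {
        Set<String> keys1 = new HashSet<>();
        for (String r : file1) keys1.add(key(r));
        Set<String> keys2 = new HashSet<>();
        for (String r : file2) keys2.add(key(r));

        List<String> matched = new ArrayList<>();     // would go to match1
        List<String> onlyInFile1 = new ArrayList<>(); // would go to match2
        List<String> onlyInFile2 = new ArrayList<>(); // would go to match3
        for (String r : file1) {
            if (keys2.contains(key(r))) matched.add(r); else onlyInFile1.add(r);
        }
        for (String r : file2) {
            if (keys1.contains(key(r))) matched.add(r); else onlyInFile2.add(r);
        }
        return Arrays.asList(matched, onlyInFile1, onlyInFile2);
    }

    public static void main(String[] args) {
        List<List<String>> out = collate(
                Arrays.asList("AAAAAA one", "BBBBBB two"),
                Arrays.asList("BBBBBB x", "CCCCCC y"));
        System.out.println(out);
    }
}
```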
    
    The topology of this pipeline command is:
    
     +-------+       +----------+        +-------+
     |   <   |0     0|          |------->|   >   |
     | file1 |------>|          |0      0|match1 |
     |       |       |          |        +-------+
     +-------+       | collate  |        +-------+
                     |   1.6    |------->|   >   |
                     |          |1      0|match2 |
     +-------+       |          |        +-------+
     |   <   |0     1|          |        +-------+
     | file2 |------>|          |------->|   >   |
     |       |       +----------+2      0|match3 |
     +-------+                           +-------+
    
    
    We see that collate has 2 input streams (0,1) and 3 output streams (0,1,2).
    And "< file1","< file2" have only one output stream (0), and
    "> match1" (match2,match3) have only one input stream (0).
    
    The connection of output streams and input streams is performed by
    the pipeline dispatcher. Each stage can check how many of these
    potential streams (input and output) are connected, which can
    change its behaviour!
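
    As a minimal sketch of a stage adapting to its connections, the hypothetical filter below keeps matching records on its primary output and writes the others to a secondary output only when one is connected. Representing an unconnected stream by null, and the names LocateSketch and locate, are assumptions made for this illustration; the application's real stage interface is not shown in this document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hypothetical stage, not the application's LOCATE class.
class LocateSketch {

    // out1 == null stands for "secondary output stream not connected".
    static void locate(List<String> input, String needle,
                       List<String> out0, List<String> out1) {
        boolean secondaryConnected = (out1 != null);
        for (String record : input) {
            if (record.contains(needle)) {
                out0.add(record);            // matching records: primary output
            } else if (secondaryConnected) {
                out1.add(record);            // rejected records: secondary output
            }
            // otherwise the rejected record is simply discarded
        }
    }

    public static void main(String[] args) {
        List<String> kept = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        locate(Arrays.asList("SET A=1", "REM note"), "SET", kept, rejected);
        System.out.println(kept + " / " + rejected);
    }
}
```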
    
    
    


    What to do with Pipeline


    Pipeline is a tool with many built-in programs that help you process data without programming. You just have to assemble pipe stages to accomplish the task you want.

    Each stage is like a small program that processes data coming from its predecessor(s), then passes the processed data to its successor(s).

    You can use pipe by:

  • running it in an MS-DOS window ( pipe "stage ! stage ! .... ! stage" ) or under AIX, Linux, NT, ... any OS supporting Java.
  • calling it from a Java class (see chapter: Calling pipeline in Java)

    You can also write your own stages (.class) for pipe (see chapter: write my own stages)


    Start with Pipe


    You can use PIPE in the following ways:


    In a single command, don't forget to enclose your pipeline command in double quotes (") to prevent the operating system from interpreting it!

    Stages are organized into several types: (for more detail, see Pipeline - Stages reference)


    the two basic stages: LITERAL, CONSOLE

  • LITERAL stage is used to write a literal string to the pipeline
  • CONSOLE stage is used to display records from the pipeline on the screen

    This pipe command just displays 'Hello World' on the screen.

        c:\> pipe "literal Hello World !console"
    

    Reading a file, formatting records and writing the results into another file

  • < stage is used to read records from a text file
  • SPECS stage is used to reformat records
    This is the most important stage for formatting records as you wish.
  • > stage is used to write records into a file

    This pipe command reads the config.sys file, prepends '*' to each record and writes the records into another file.

        c:\> pipe "< c:\config.sys ! specs /*/ 1 1-* n! > c:\config.star"
    

    the SPECS stage above:


    Reading autoexec.bat, keeping lines containing SET and displaying them

  • < stage is used to read records from the file
  • LOCATE stage is used to keep records containing a string
  • CHOP stage is used to truncate lines to 70 characters
  • CONSOLE displays lines on the screen
       pipe "< c:\autoexec.bat! locate /SET/ ! chop 70 ! console"
    

    Note that it consists of the PIPE command, followed by four stages (<, LOCATE, CHOP, and CONSOLE), three of which have additional parameters. If you've got PIPE installed on your machine, try it and then let's analyze it in more detail.

    First, the < stage (which is a device driver) reads records from your AUTOEXEC.BAT file (a control file used by Windows), and sends the records one at a time down the pipe.

    Next, LOCATE (a filter) gets them and scans each of them to see if it contains the string SET. Those that do continue down the pipe; those that don't get discarded (actually, you can route them into another stream, but let's not worry about that now).

    Now CHOP (another filter) gets the remaining records, and chops them to a maximum length of 70 bytes.

    Finally, CONSOLE (a device driver) takes CHOP's output and displays it on your terminal.
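
    For readers who think in Java, the filtering part of this pipeline can be approximated with the standard stream API. This is a rough equivalent and not what the application executes: it assumes LOCATE matches case-sensitively and that CHOP counts characters, and the names LocateChop and locateAndChop are hypothetical. The sample records replace the file so the sketch runs anywhere.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: a plain-Java approximation of "locate /SET/ ! chop 70".
class LocateChop {

    static List<String> locateAndChop(List<String> records, String needle, int maxLength) {
        return records.stream()
                .filter(r -> r.contains(needle))                                   // locate
                .map(r -> r.length() > maxLength ? r.substring(0, maxLength) : r)  // chop
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "SET PATH=C:\\DOS", "REM a comment", "SET TEMP=C:\\TEMP");
        locateAndChop(records, "SET", 70).forEach(System.out::println);            // console
    }
}
```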


    Example1: Extract categories ([..]) from the c:\windows\win.ini


    To extract categories from this file we have to:

    This can be achieved with the following command:

      pipe "< c:\windows\win.ini! find [! change /[/ //!change /]/ //!console"
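
    A rough plain-Java equivalent of this command is sketched below. It assumes that FIND keeps records beginning with the given string and that CHANGE replaces every occurrence, as the corresponding VM stages do; the names ExtractCategories and categories are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: approximates "find [ ! change /[/ // ! change /]/ //".
class ExtractCategories {

    static List<String> categories(List<String> records) {
        return records.stream()
                .filter(r -> r.startsWith("["))   // find [
                .map(r -> r.replace("[", ""))     // change /[/ //
                .map(r -> r.replace("]", ""))     // change /]/ //
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("[fonts]", "a=b", "[extensions]");
        categories(sample).forEach(System.out::println);   // console
    }
}
```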
    

    Details about the stages used here:



    Example2: count the number of variables in c:\windows\win.ini


    To count variables we have to:

    This can be achieved with the following command:

      pipe "< c:\windows\win.ini! locate /=/ ! count lines ! specs /win.ini contains:/ 1 w1 nw /vars./ nw ! console"
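
    The same count can be sketched in plain Java. The exact spacing that SPECS produces is an assumption here (one blank between words), and the names CountVars and report are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: approximates "locate /=/ ! count lines ! specs ...".
class CountVars {

    static String report(List<String> records) {
        long count = records.stream()
                .filter(r -> r.contains("="))   // locate /=/
                .count();                       // count lines
        // specs: literal text, then the count, then another literal
        return "win.ini contains: " + count + " vars.";
    }

    public static void main(String[] args) {
        System.out.println(report(Arrays.asList("[fonts]", "x=1", "y=2", "; note")));
    }
}
```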
    

    Details about the stages used here:



    Example3: Sum numbers using APL primitives


    To sum numeric values (e.g. 15 14 27 18), use the following command:

      pipe "literal 15 14 27 18!specs ,+/, 1 1-* n!apl! console"
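
    In APL, "+/" is a plus-reduction over the list of numbers. A minimal Java sketch of the same reduction follows; it is only an analogy and says nothing about how the apl stage is implemented, and the names SumReduction and sum are hypothetical.

```java
import java.util.Arrays;

// Illustrative only: the Java analogue of the APL expression +/15 14 27 18.
class SumReduction {

    static int sum(String record) {
        return Arrays.stream(record.trim().split("\\s+"))
                .mapToInt(Integer::parseInt)
                .reduce(0, Integer::sum);   // plus-reduction
    }

    public static void main(String[] args) {
        System.out.println(sum("15 14 27 18"));   // prints 74
    }
}
```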
    

    Details about the stages used here:



    PIPE syntax

    
                                         <--- endchar ---<
      >>--- pipe ----! options !------------ pipeline -------------------><
    
    
    options:
      >--- ( --.------------------------.- ) ----------------------------><
               !--- STAGESep --- char --!
               !--- SEParator -- char --!
               !--- ENDchar ---- char --!
               '--- TRACE --------------'
    
    pipeline:
    
              <----- sepchar -----<
      >-------------- stage ----------------><
    
    endchar
      Character to be defined in the options.
    
    sepchar
      Default character is !
    
    char
      any character used to separate stages or pipelines.
    
    
    STAGEsep
    SEParator
           for defining the stage separator (called sepchar above) 
    
    ENDchar
           for defining the pipeline separator in case of multi-streams
    
    TRACE
            With the TRACE option, pipe displays its processing in detail:
            all the primary calls are listed
            (peek data, read data, output data, check if stage connected...).
    
                                   
    
    

    The PIPE command takes as parameters one or more pipeline specifications, optionally preceded by global options enclosed in parentheses. Each pipeline consists of one or more stages, which are separated by a stage separator character.

    Use option STAGESEP or SEPARATOR to override the default stage separator character, which is an exclamation mark (!). The following characters are not allowed as stage separator: left and right parenthesis, asterisk, period, colon, blank, and null.

    Use option ENDCHAR to specify a pipeline end character, also known as the pipeline separator char. The following characters are not allowed as endchar: left and right parenthesis, asterisk, period, colon, blank, and null.
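
    To make the roles of the two separator characters concrete, here is a minimal sketch that splits a command string into pipelines and stages. It is not the application's scanner: it does not parse the parenthesized options, has no escape mechanism, and trims blanks that a real stage might consider significant. The names ScanSketch and scan are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: a simplified stand-in for splitting a pipeline command.
class ScanSketch {

    // Split on endchar into pipelines, then on sepchar into stages.
    static List<List<String>> scan(String command, char sepchar, char endchar) {
        List<List<String>> pipelines = new ArrayList<>();
        for (String pipeline : command.split(Pattern.quote(String.valueOf(endchar)))) {
            List<String> stages = new ArrayList<>();
            for (String stage : pipeline.split(Pattern.quote(String.valueOf(sepchar)))) {
                stages.add(stage.trim());
            }
            pipelines.add(stages);
        }
        return pipelines;
    }

    public static void main(String[] args) {
        String command = "< file1 !c: collate 1.6 ! > match1 ? < file2 ! c: ! > match2 ? c: ! > match3";
        for (List<String> pipeline : scan(command, '!', '?')) {
            System.out.println(pipeline);
        }
    }
}
```

    Applied to the collate example from the Data flow chapter, this yields three pipelines of three, three and two stages.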


    Calling pipeline in Java

    Now, to get data from or set data into pipeline, use the var or stem stages as follows:

    //
    // import the pipeline package
    //
    import com.ibm.lparis.pipeline.core.*;
    ....
         public String[] line;    //to get lines from pipeline result
                                  //YOU must declare it public so that
                                  //pipeline is able to get/set it !!
    
         String pipe_command="< c:\\autoexec.bat!locate /SET/!stem line";
    
         Pipeline pipe=new Pipeline(pipe_command,this); //pass my reference (this) to pipeline
                                                        //so that it is able to interact
                                                        //with my line variable !!!
                                    
         pipe.run();   //Run the pipeline command...
         for (int i=0;i