





                                          
                                          
                                          
                                          
          *****************************************************************
          *                                                               *
          *                                                               *
          *                        VANILLA SNOBOL4                        *
          *                                                               *
          *                 TUTORIAL AND REFERENCE MANUAL                 *
          *                                                               *
          *           (c) Copyright 1985, 1988 by Catspaw, Inc.           *
          *                                                               *
          *****************************************************************
                                          
                                          
                                          
                                    Mark B. Emmer
                                          
                                    Catspaw, Inc.
                                    P.O. Box 1123
                             Salida, Colorado 81201 USA
                              Telephone: (719) 539-3884
            

            Vanilla SNOBOL4 and all accompanying documentation are copy-
            righted materials.  However, they may be copied and shared
            provided the following terms are adhered to:

            1. No fee greater than $10 is charged for use, copying or dis-
               tribution.

            2. SNOBOL4.EXE and all documentation are not modified in any
               way, and are distributed together.

            3. The manual may not be packaged with any other product.

            4. Neither SNOBOL4+ (our commercial product), nor its printed
               manual, may be copied.

            Vanilla SNOBOL4 was released because we believe many people
            would enjoy programming in SNOBOL4, if there was a version of
            the language that was widely and freely available.  Contribu-
            tions are NOT requested.  Enjoy and share it!





                                                          TABLE OF CONTENTS
          -----------------------------------------------------------------



                              PART I -- GETTING STARTED

          Chapter 1      Installation                                     1
            1.1            About This Manual.........................1
            1.2            Installing Vanilla SNOBOL4................1
            1.3            An Example................................2

          Chapter 2      First Program                                    4
            2.1            A First Program...........................4
            2.2            Interactive Statement Execution...........6


                                 PART II -- TUTORIAL

          Chapter 3      Fundamentals                                     8
            3.1            Simple Data Types.........................8
            3.2            Simple Operators.........................10
            3.3            Variables................................14

          Chapter 4      Control Flow and Functions                      17
            4.1            Success and Failure......................17
            4.2            A SNOBOL4 Statement......................18
            4.3            Built-In Functions.......................19

          Chapter 5      Input/Output and Keywords                       23
            5.1            Input/Output.............................23
            5.2            Keywords.................................26
            5.3            Programs Without Pattern Matching........28

          Chapter 6      Pattern Matching                                30
            6.1            Introduction.............................30
            6.2            Specifying Pattern Matching..............31
            6.3            Subject String...........................31
            6.4            Pattern Subsequents and Alternates.......32
            6.5            Simple Pattern Matches...................33
            6.6            The Pattern Data Type....................34
            6.7            Capturing Match Results..................34
            6.8            Unknowns.................................35
            6.9            Pattern Matching with Replacement........42
            6.10           Sample Programs..........................44
            6.11           Anchored and Unanchored Matching.........48

          Chapter 7      Additional Operators and Data Types             49
            7.1            Indirect Reference.......................49
            7.2            Unevaluated Expressions..................52
            7.3            Immediate Assignment.....................53
            7.4            Arrays...................................55
            7.5            Tables...................................57
            7.6            The Name Operator........................61








          Chapter 8      Program-Defined Objects                         63
            8.1            Program-Defined Functions................63
            8.2            Program-Defined Data Types...............71
            8.3            Program-Defined Operators................74

          Chapter 9      Advanced Topics                                 77
            9.1            The ARBNO Function.......................77
            9.2            Recursive Patterns.......................78
            9.3            Quickscan and Fullscan...................78
            9.4            Other Primitive Patterns.................80
            9.5            Other Functions..........................82
            9.6            Other Unary Operators....................83
            9.7            Run-time Compilation.....................83

          Chapter 10     Debugging and Program Efficiency                86
            10.1           Debugging and Tracing....................86
            10.2           Execution Tracing........................90
            10.3           Program Efficiency.......................93

          Chapter 11     Concluding Remarks                              95


                            PART III -- REFERENCE MANUAL

          Chapter 12     Introduction                                    96
            12.1           Language Background......................96

          Chapter 13     Running a SNOBOL4 Program                       98
            13.1           Basic Command Line Format................98
            13.2           Providing Your Own Parameters............99
            13.3           Command Line Examples...................100

          Chapter 14     Statements                                     101
            14.1           Comment Statements......................101
            14.2           Control Statements......................101
            14.3           Program Statements......................102
            14.4           Continuation Statements.................105
            14.5           Multiple Statements.....................105
            14.6           The END Statement.......................106

          Chapter 15     Operators                                      107
            15.1           Unary Operators.........................107
            15.2           Binary Operators........................108

          Chapter 16     Keywords                                       109
            16.1           Protected Keywords......................109
            16.2           Unprotected Keywords....................110
            16.3           Special Names...........................112

          Chapter 17     Data Types and Conversion                      113
            17.1           Data Type Names.........................113
            17.2           Data Type Conversion....................116










          Chapter 18     Patterns and Pattern Functions                 121
            18.1           Primitive Patterns......................121
            18.2           Primitive Pattern Functions.............122

          Chapter 19     Built-In Functions                             124

          Chapter 20     System Messages                                138
            20.1           Initial Messages........................138
            20.2           Termination Messages....................138
            20.3           Compilation Messages....................139
            20.4           Execution Error Messages................141
            20.5           Execution Trace Messages................144


















































                                                                  Chapter 1


                                                               INSTALLATION
          -----------------------------------------------------------------

            Welcome to the world of SNOBOL4!  It's a world where you can
          manipulate text and search for patterns in a simple and natural
          manner.  SNOBOL4 is a completely general programming language,
          and its magic extends far beyond the world of text processing.
          Concise, powerful programs are easy to write.  In addition,
          SNOBOL4's pattern programming provides a new way to work with
          computers.  If you would like to add SNOBOL4 to your repertoire
          of problem-solving tools, and learn why so many people are
          excited about it, read on.


                                1.1 ABOUT THIS MANUAL

            This manual is divided into three parts.  This part, "Getting
          Started," shows you how to create and run small programs with
          SNOBOL4.

            Part II, "Tutorial," is addressed to the beginning SNOBOL4 pro-
          grammer.  It assumes a modest knowledge of general programming
          concepts, and experience with another high-level language, such
          as BASIC, C, FORTRAN, or Pascal.  Readers without any programming
          background may wish to consult books written with them in mind:
          "A SNOBOL4 Primer" and "SNOBOL Programming for the Humanities,"
          listed in the file SNOBOL4.DOC.

            Part III, "Reference Manual," is a complete description of
          Vanilla SNOBOL4.  If you are already familiar with the SNOBOL4
          language, you may wish to skip the tutorial section and proceed
          directly to the reference section for specific details.  Later,
          you can return to the tutorial section for fresh insight into the
          language's use.


                           1.2 INSTALLING VANILLA SNOBOL4


          1.2.1 System Requirements

            SNOBOL4 requires the following:

            1. IBM PC, XT, AT, or any other 8086/88/186/286/386 family com-
               puter.  Your computer need not be an IBM PC look-alike;
               SNOBOL4 requires MS-DOS compatibility only.

            2. PC- or MS-DOS, Version 2.0 or above.

            3. 105K bytes of free RAM.




       Getting Started                 - 1 -                     Installation





          1.2.2 Making a Backup Copy

            The Vanilla SNOBOL4 distribution disk should never be used for
          production work.  Always make a backup copy, and use it for your
          day-to-day activities:

            1. Use the DOS FORMAT command to initialize a new, blank
               diskette.

            2. If your system has two 5-1/4 inch diskette drives, place the
               SNOBOL4 diskette in drive A, and the new disk in drive B,
               and type:

               DISKCOPY A: B:

            3. If you have only one diskette drive, enter:

               DISKCOPY A: A:

               and follow the instructions for swapping diskettes.  The
               Vanilla SNOBOL4 diskette is the Source diskette, while the
               newly formatted diskette is the Target.

            If you have a fixed disk, you may create a subdirectory for
          SNOBOL4, and copy all of the SNOBOL4 disk to it.


          1.2.3 Initial Checkout

            Place your backup disk in the default drive, and play a game of
          Tick-Tack-Toe.  Our examples will assume a two-drive system,
          using drive B as the default drive.  If you have a one-drive sys-
          tem, or are running SNOBOL4 from the fixed disk, your screen will
          display a different default drive letter (A or C).  Enter:

               B>SNOBOL4 TICTAC

            The SNOBOL4 program should load, and compile the Tick-Tack-Toe
          program.  The game will begin execution, and display instruc-
          tions.


                                   1.3 AN EXAMPLE

            Just to get a feel for where we're going, let's take a look at
          a small SNOBOL4 program.  It produces a list of the words in a
          file, along with a count of how many times each word appears.
          Don't be concerned if you don't understand the program; I just
          want to give you a taste of the language:













               * Trim input, set up constants, and create table to
               *  hold word counts
                       &TRIM   =  1
                       WRDPAT  =  BREAK(&LCASE) SPAN(&LCASE "-'") . WORD
                       TALLY   =  TABLE()
               
               * Read a line, convert upper case letters to lower case
               READ    LINE     =  REPLACE(INPUT,&UCASE,&LCASE) :F(CONVERT)
               
               * Get and remove next word from LINE, place in variable WORD
               NEXTWRD LINE  WRDPAT =                            :F(READ)
               
               * Increment the count for this word
                       TALLY[WORD] =  TALLY[WORD] + 1            :(NEXTWRD)
               
               * Convert the table to an array
               CONVERT RESULT  =  CONVERT(TALLY, "ARRAY")        :F(NONE)
               
               * Display the results
                       OUTPUT  =  "Word Counts"
                       I       =  1
               PRINT   OUTPUT  =  RESULT[I,1] " - " RESULT[I,2]  :F(END)
                       I       =  I + 1                          :(PRINT)
               NONE    OUTPUT  =  "There aren't any words!"
               END

            Running the program with the sample text on the disk as input
          would produce a usage count like this:

               Word Counts
               hark - 2
               the - 1
               lark - 1
               at - 2
               heaven's - 1
                . . .

            Notice some of the things that seem to occur so effortlessly
          here:  A word is defined to be any combination of lower case let-
          ters, hyphen, and apostrophe.  Data from the file are converted
          to lower case.  A table of word counts uses the words themselves
          as subscripts.  The table is converted to an array in one state-
          ment, and printed without any knowledge of the array's size.
          Finally, because the definition of a word is contained in one
          succinct pattern, it's easy to modify the program to catalog
          other kinds of text patterns.

            Excluding comments and the END statement, there are 12 working
          statements in this program---and this program uses only a frac-
          tion of SNOBOL4's power.  How much work would it be to write such
          a program in any other language you are familiar with?  Is it
          possible that there is something unique about SNOBOL4?
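
            To illustrate the earlier point that the pattern is easy to
          modify, here is a minimal sketch of a variation that tallies runs
          of digits instead of words.  The names DIGITS, NUMPAT, NUM, and
          NEXTNUM are illustrative choices made here; they are not part of
          the program on the distribution disk.  Case conversion is dropped
          because it has no effect on digits.

```snobol4
* Variation on the word-count program: tally runs of digits.
* Only the pattern and a few names differ from the original.
        &TRIM   =  1
        DIGITS  =  '0123456789'
        NUMPAT  =  BREAK(DIGITS) SPAN(DIGITS) . NUM
        TALLY   =  TABLE()

* Read a line; go convert the table when input is exhausted
READ    LINE    =  INPUT                          :F(CONVERT)

* Get and remove next run of digits from LINE, place in NUM
NEXTNUM LINE  NUMPAT =                            :F(READ)
        TALLY[NUM] =  TALLY[NUM] + 1              :(NEXTNUM)

* Convert the table to an array and display it
CONVERT RESULT  =  CONVERT(TALLY, "ARRAY")        :F(NONE)
        OUTPUT  =  "Number Counts"
        I       =  1
PRINT   OUTPUT  =  RESULT[I,1] " - " RESULT[I,2]  :F(END)
        I       =  I + 1                          :(PRINT)
NONE    OUTPUT  =  "There aren't any numbers!"
END
```

            Because each run of digits is stored as a string, 007 and 7
          would be counted as two different entries in this sketch.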

            Let's go on now to write a simple first program.








                                                                  Chapter 2


                                                              FIRST PROGRAM
          -----------------------------------------------------------------


                                 2.1 A FIRST PROGRAM

            For the following exercises, you should have SNOBOL4 available
          on your default disk drive, or in your default directory if a
          fixed disk is used.  This manual assumes that drive B is your
          default disk drive, and will show the DOS prompt as "B>".  Users
          with other hardware configurations may see "A>" or "C>".

            We will begin with a very simple program, one that prints a
          greeting on your computer's display screen.  It will familiarize
          you with the mechanics of running a SNOBOL4 program.  Every line
          you enter from the keyboard (or "console") should end by pressing
          the ENTER key (marked with a bent arrow on many keyboards).

            You start the system by typing SNOBOL4 CON at the DOS command
          prompt B>.  SNOBOL4 displays two title lines and prompts you to
          enter your program with a question mark on each line:

               B>SNOBOL4 CON
               
               Vanilla SNOBOL4      Version 2.14.
               (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
               Enter program, terminate with "END"
               ?

            Now enter the program.  Use the tab character to begin the
          indented line, and be sure to place blanks on each side of the
          equal sign:

               ?       OUTPUT = 'Hello world!'
               ?END
               
               No errors
               
               Hello world!
               
               B>

            As you enter each line, it is compiled into a compact internal
          notation.  The first program line begins with a tab; the second
          is flush left.  The word END is special; it signals SNOBOL4 that
          you have finished entering program lines.  It must appear at the
          left margin to be recognized.  After the END statement is
          entered, SNOBOL4 begins to run your program.

            This program consists of one "assignment statement."
          Assignment takes the value on the right side of the equals sign,
          and stores it in the "variable" on the left.  The value on the
          right is the character string literal 'Hello world!'.  The
          variable's name is OUTPUT, which is a special name in SNOBOL4;
          values assigned to it are displayed on the screen.  After the
          assignment statement is performed, control flows into the END
          statement and the program stops.
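
            As a small sketch of the same idea, the program below contains
          two assignment statements.  The second message is an illustrative
          addition, not part of the original example.  Each assignment to
          OUTPUT displays one line, and the statements are performed in
          order until control reaches END.

```snobol4
* Two assignment statements; each one displays a line on the screen.
        OUTPUT = 'Hello world!'
        OUTPUT = 'Goodbye for now.'
END
```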

            SNOBOL4 provides only DOS in-line editing as you enter your
          program.  It is not a program editor, and does not save your
          program or let you correct mistakes in previous program lines.
          Usually, you'll want to prepare your program in a disk file.

            Try creating a program file in DOS.  The symbol ^Z represents
          the DOS End-of-File character, which terminates the DOS COPY com-
          mand.  It is created by entering control-Z or pressing function
          key 6.

               B>COPY CON HELLO.SNO
                       OUTPUT = 'Hello world!'
               END
               ^Z
                       1 File(s) copied
               B>

            Now you can have SNOBOL4 read and execute your program from
          file HELLO.SNO:

               B>SNOBOL4 HELLO.SNO
               
               Vanilla SNOBOL4      Version 2.14.
               (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
               
               No errors
               
               Hello world!
               
               B>

            Of course, the program file could also have been created with
          your program text editor.  If you are using a word processor,
          remember to produce an unadulterated ASCII file, free of any spe-
          cial format controls.

            SNOBOL4 assigns a unique number to each program statement.  The
          statement number and line number are displayed whenever an error
          message is produced.  To get a listing of your program with
          SNOBOL4's statement numbers, try:















               B>SNOBOL4 HELLO /L=CON
               
               Vanilla SNOBOL4      Version 2.14.
               (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
               1               OUTPUT = 'Hello world!'
               2       END
               
               No errors
               
               Hello world!
               
               B>

            The first line, on which you typed SNOBOL4, is called the com-
          mand line.  It may contain options that alter SNOBOL4's behavior.
          The option /L= tells SNOBOL4 to send a listing of your source
          file to the specified file or device.  Naming another device,
          such as PRN:, would print the listing on your printer.  Other
          command line options are discussed in Chapter 13, "Running a
          SNOBOL4 Program."

            In this example we omitted the file name extension.  SNOBOL4
          will supply the .SNO extension for the source file if it is
          absent.

            You've now run a simple SNOBOL4 program in two ways: by typing
          it in directly, and by creating a disk file.


                         2.2 INTERACTIVE STATEMENT EXECUTION

            It's very helpful to "try out" simple statements as they are
          introduced in the text.  There is a SNOBOL4 program called
          CODE.SNO on the distribution diskette to help you do this.  Try
          it now with a few simple statements.  Type END or control-Z to
          stop the program.

               B>SNOBOL4 CODE
               
               Vanilla SNOBOL4      Version 2.14.
               (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
               
               No errors
               
               Enter SNOBOL4 statements:
               ?       OUTPUT = 'HELLO AGAIN!'
               HELLO AGAIN!
               Success
               ?       OUTPUT = 16
               16
               Success
               ?END
               
               B>









            Feel free to experiment---you can't break anything by using
          this program.  At most, you will get a SNOBOL4 error, and return
          to the DOS command prompt.  In that case, just start SNOBOL4 and
          CODE.SNO over again.

            Whenever you see examples in the text that begin with a ques-
          tion mark, they are meant to be tried with CODE.SNO.  In the text
          I'll omit the word Success most of the time unless it is relevant
          to the concept being presented, although it will still appear on
          your display.  I'll also try to restrict the examples to upper
          case, so you can set the CAPS LOCK mode on your computer, and
          type without using the shift key.

            Let's now proceed to the tutorial.
















































                                                                  Chapter 3


                                                               FUNDAMENTALS
          -----------------------------------------------------------------

            SNOBOL4 is really a combination of two kinds of languages:  a
          conventional language, with several data types and a simple but
          powerful control structure, and a pattern language, with a struc-
          ture all its own.  The conventional language is not block struc-
          tured, and may appear old-fashioned.  The pattern language,
          however, remains unsurpassed, and is unique to SNOBOL4.

            You should try to master the conventional portion of SNOBOL4
          first.  When you're comfortable with it, you can move on to pat-
          tern matching.  Pattern matching by itself is a very large sub-
          ject, and this manual can only offer an introduction.  The sample
          programs accompanying Vanilla SNOBOL4, as well as the many
          SNOBOL4 books available from Catspaw, can be studied for a deeper
          understanding of patterns and their application.

            We'll begin by discussing data types, operators, and variables.


                                3.1 SIMPLE DATA TYPES

            SNOBOL4 has several basic data types, and a mechanism to define
          hundreds more as aggregates of others.  Initially, we'll discuss
          the two most basic:  integers and strings.


          3.1.1 Integers

            An integer is a simple whole number, without a fractional part.
          In SNOBOL4, its value can range from -32767 to +32767.  It ap-
          pears without quotation marks, and commas should not be used to
          group digits.  Here are some acceptable integers:

               14    -234    0    0012    +12832    -9395    +0

            These are incorrect in SNOBOL4:

             13.4             fractional part is not allowed

             49723            larger than 32767

             -                number must contain at least one digit

             3,076            comma is not allowed

            Use the CODE.SNO program to test different integer values.  Try
          both legal and illegal values.  Here are some sample test lines:










               Enter SNOBOL4 statements:
               ?       OUTPUT = 42
               42
               ?       OUTPUT = -825
               -825
               ?       OUTPUT = 73768
               Compilation error: Erroneous integer, re-enter:
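
            A short sketch of some legal forms follows.  It can be saved in
          a file and run in the same way as HELLO.SNO (the file name is up
          to you).  The expected display noted in the comments reflects
          that leading zeros and an explicit plus sign are not shown when
          an integer is displayed.

```snobol4
* Legal integer literals: leading zeros and explicit signs are accepted.
* The four lines displayed should be 12, 0, -32767 and 32767.
        OUTPUT = 0012
        OUTPUT = +0
        OUTPUT = -32767
        OUTPUT = 32767
END
```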


          3.1.2 Reals

            Vanilla SNOBOL4 does not include real numbers.  They are
          available in SNOBOL4+, Catspaw's highly enhanced implementation
          of the SNOBOL4 programming language.


          3.1.3 Strings

            A string is an ordered sequence of characters.  The order of
          the characters is important: the strings AB and BA are different.
          Characters are not restricted to printing characters; all of the
          256 combinations possible in an 8-bit byte are allowed.

            Normally, the maximum length of a string is 5,000 characters,
          although you can tell SNOBOL4 to accept longer strings.  A string
          of length zero (no characters) is called the null string.  At
          first, you may find the idea of an empty string disturbing:  it's
          a string, but it has no characters.  Its role in SNOBOL4 is simi-
          lar to the role of zero in the natural number system.

            Strings may appear literally in your program, or may be created
          during execution.  To place a literal string in your program,
          enclose it in apostrophes (') [1] or double quotation marks (").
          Either may be used, but the beginning and ending marks must be
          the same. The string itself may contain one type of mark if the
          other is used to enclose the string.  The null string is repre-
          sented by two successive marks, with no intervening characters.
          Here are some samples to try with CODE.SNO:










          ____________________

            [1] Apostrophe (single quote) should not be confused with the
          grave accent mark (`) which appears next to it on some computer
          keyboards.  The grave accent may not be used as a string
          delimiter.








               ?       OUTPUT = 'STRING LITERAL'
               STRING LITERAL
               ?       OUTPUT = "So is this"
               So is this
               ?       OUTPUT = ''
               
               ?       OUTPUT = 'WHO COINED THE WORD "BYTE"?'
               WHO COINED THE WORD "BYTE"?
               ?       OUTPUT = "WON'T"
               WON'T


                                3.2 SIMPLE OPERATORS

            If data is the raw material, operators are the tools that do
          the work.  Some operators, such as + and -, appear in all pro-
          gramming languages, and pocket calculators.  But SNOBOL4 provides
          many more, some of which are unique to the SNOBOL4 language.
          SNOBOL4 also allows you to define your own operators.  We'll
          examine just a few basic operators below.


          3.2.1 Unary vs. Binary

            SNOBOL4 operators require either one or two items of data,
          called operands.  For example, the minus sign (-) can be used as
          a unary operator with one operand:

               -6

          or as a binary operator with two operands:

               4 - 1

            In the first case, the minus sign negates the number.  The sec-
          ond example subtracts 1 from 4.  The minus sign's meaning depends
          on the context in which it appears.  SNOBOL4 has a very simple
          rule for determining if an operator is binary or unary:

               Unary operators are placed immediately to the left of
               their operand.  No blank or tab character may appear
               between operator and operand.

               Binary operators have one or more blank or tab charac-
               ters on each side.

            The blank or tab requirement for binary operators causes prob-
          lems for programmers first learning SNOBOL4.  Most other lan-
          guages make these white space characters optional.  Omitting the
          right hand blank after a binary operator will produce a unary
          operator, and while the statement may be syntactically correct,
          it will probably produce unexpected results.  Fortunately, blanks
          and binary operators quickly become a way of SNOBOL4 life, and
          after some initial forgetfulness there are few problems.
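
            The effect of a missing blank can be seen in a short sketch.
          It relies on one fact not yet covered at this point in the
          tutorial:  in SNOBOL4, two operands separated only by blanks are
          joined together as strings.

```snobol4
* Blanks on both sides of the minus sign: binary subtraction.
* Displays 3.
        OUTPUT = 4 - 1
* Blank before the minus sign but none after it: the minus sign is
* unary, and 4 is joined with -1 as a string.  Displays 4-1.
        OUTPUT = 4 -1
END
```

            Both statements are syntactically correct, which is why this
          kind of slip goes unreported by the compiler.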








          3.2.2 Some Binary Operators


          Operation:     Assignment
          Symbol:        = (equals sign)

            You've already met one binary operator, the equals sign (=).
          It appeared in the first sample program:

                       OUTPUT = 'Hello world!'

            It assigns, or transfers, the value of the object on the right
          ('Hello world!') to the object on the left (variable OUTPUT).


          Operation:     Arithmetic
          Symbols:       **, *, /, +, -

            These characters provide the arithmetic operations:
          exponentiation, multiplication, division, addition, and
          subtraction, respectively.  Each is assigned a priority, so
          SNOBOL4 knows which to perform first if more than one appears in
          an expression.
          Exponentiation is performed first, followed by multiplication,
          division, and finally addition and subtraction.  SNOBOL4 is
          unusual in giving multiplication higher priority than division;
          most programming languages treat them equally.

            You may use parentheses to change the order of operations.
          Division of an integer by another integer will produce a
          truncated integer result; the fractional part is discarded.
          Try the following:

               ?       OUTPUT = 3 - 6 + 2
               -1
               ?       OUTPUT = 2 * (10 + 4)
               28
               ?       OUTPUT = 7 / 4
               1
               ?       OUTPUT = 3 ** 5
               243
               ?       OUTPUT = 10 / 2 * 5
               1
               ?       OUTPUT = (10 / 2) * 5
               25

            When the same operator occurs more than once in an expression,
          which one should be performed first?  The governing principle is
          called associativity, and is either left or right.  Multiple
          instances of *, /, + and - are performed left to right, while
          **'s are performed right to left.  Again, parentheses may be used
          to change the default order.  Try a few examples:

               ?       OUTPUT = 24 / 4 / 2
               3
               ?       OUTPUT = 24 / (4 / 2)
               12
               ?       OUTPUT = 2 ** 2 ** 3
               256
               ?       OUTPUT = (2 ** 2) ** 3
               64

            Here's the first bit of SNOBOL4 magic: what happens if either
          operand is a string rather than an integer or real number?  The
          action taken is one which is widespread throughout the SNOBOL4
          language; the system tries to convert the operand to a suitable
          data type.  Given the statement

               ?       OUTPUT = 14 + '54'
               68

          SNOBOL4 detects the addition of an integer and a string, and
          tries to convert the string to a numeric value.  Here the conver-
          sion succeeds, and the integers 14 and 54 are added together.  If
          the characters in the string do not form an acceptable integer,
          SNOBOL4 produces the error message "Illegal data type."

            SNOBOL4 is strict about the composition of strings being con-
          verted to numeric values: leading or trailing blanks or tabs are
          not allowed.  The null string is permitted, and converted to
          integer 0.  Try producing some arithmetic errors:

               ?       OUTPUT = 14 + ' 54'
               Execution error #1, Illegal data type
               Failure
               ?       OUTPUT = 'A' + 1
               Execution error #1, Illegal data type
               Failure

          Note:  Error numbers are listed in Chapter 20, "System Messages."


          Operation:     Concatenation
          Symbols:       blank or tab

            This is the fundamental operator for assembling strings.  Two
          strings are concatenated simply by writing one after the other,
          with one or more blank or tab characters between them.  There is
          no explicit symbol for concatenation (it is special in this
          regard); the white space between two objects serves to define
          this operator.  The blank or tab character merely specifies the
          operation; it is not included in the resulting string.

            The string that results from concatenation is the right string
          appended to the end of the left.  The two strings remain
          unchanged and a third string emerges as the result.  Try a few
          simple concatenations with CODE.SNO:

               ?       OUTPUT = 'CONCAT' 'ENATION'
               CONCATENATION
               ?       OUTPUT = 'ONE,' 'TWO,' 'THREE'
               ONE,TWO,THREE
               ?       OUTPUT = 'A'                 'B'       'C'
               ABC
               ?       OUTPUT = 'BEGINNING '   'AND '   'END.'
               BEGINNING AND END.

            The string resulting from concatenation cannot be longer than
          the maximum allowable string size.

            The concatenation operator works only on character strings, but
          if an operand is not a string, SNOBOL4 will convert it to its
          string form.  For example,

               ?       OUTPUT = (20 - 17)  ' DOG NIGHT'
               3 DOG NIGHT
               ?       OUTPUT = 19  (12 / 3)
               194

            In the first case, concatenation's right operand is the string
          ' DOG NIGHT', but the left operand is an integer expression
          (20 - 17).  SNOBOL4 performs the subtraction, converts the result
          to the string '3', and produces the final result '3 DOG NIGHT'.
          In the second example, the integer operands are converted to the
          strings '19' and '4', to produce the result string '194'.  This
          is not exactly good math, but it is correct concatenation.

            You must be careful, however.  If you accidentally omit an
          operator, SNOBOL4 will think you intended to perform concatena-
          tion.  In the example above, perhaps we omitted a minus sign and
          had really meant to say:

               ?       OUTPUT = 19 - (12 / 3)
               15

            Concatenation can always convert a number to a string
          automatically.  But there is one important exception, in which
          SNOBOL4 doesn't attempt the conversion: if either operand is the
          null string, the other operand is returned unchanged.  It is not
          coerced into the string data type.  If the first example were
          changed to:

               ?       OUTPUT = (20 - 17)  ''
               3

          the result is the INTEGER 3.  You'll find you'll use this aspect
          of null string concatenations extensively in your SNOBOL4 pro-
          gramming.
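
            The difference can be observed with the built-in function
          DATATYPE, which returns the name of its argument's data type as
          a string.  DATATYPE has not been introduced at this point in
          the tutorial, so take the following as a hedged sketch rather
          than part of the original exercises.

```snobol4
* Concatenation with a non-null string converts the integer to a string.
        OUTPUT = DATATYPE((20 - 17) ' DOG NIGHT')
* Concatenation with the null string returns the integer unchanged.
        OUTPUT = DATATYPE((20 - 17) '')
END
```

            The first statement should display STRING and the second
          INTEGER.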

            Before we proceed, let's think about the null string one more
          time as the string equivalent of the number zero.  First of all,
          adding zero to a number does not change its value, and
          concatenating the null string with an object doesn't change it,
          either.  Second, just as a calculator is cleared to zero before
          adding a series of numbers, the null string can serve as the
          starting place for concatenating a series of strings.
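
            As a small illustration of the second point, the sketch below
          clears a variable to the null string and then appends to it one
          piece at a time.  The variable name RESULT is an arbitrary
          choice.

```snobol4
* Start from the null string, as a calculator starts from zero.
        RESULT = ''
* Each statement appends one more piece to the right-hand end.
        RESULT = RESULT 'ONE,'
        RESULT = RESULT 'TWO,'
        RESULT = RESULT 'THREE'
        OUTPUT = RESULT
END
```

            The last statement should display ONE,TWO,THREE.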


          3.2.3 Some Unary Operators

            There aren't many interesting unary operators at this point in
          your tour of SNOBOL4.  Most of them appear in connection with
          pattern matching, discussed later.  Note, however, that all unary
          operations are performed before binary operations, unless prece-
          dence is altered by parentheses.


          Operation:     Arithmetic
          Symbols:       +, -

            These unary operators require a single numeric operand, which
          must immediately follow the operator, without an intervening
          blank or tab.  Unary minus (-) changes the arithmetic sign of its
          operand; unary plus (+) leaves the sign unchanged.  If the
          operand is a string, SNOBOL4 will try to convert it to a number.
          The null string is converted to integer 0.  Coercing a string to
          a number with unary plus is a noteworthy technique.  Try unary
          plus and minus with CODE.SNO:

               ?       OUTPUT = -(3 * 5)
               -15
               ?       OUTPUT = +''
               0
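
            To see the coercion itself rather than just the resulting
          value, the sketch below uses the built-in function DATATYPE,
          which returns the name of its argument's data type.  DATATYPE
          is not covered in this section, and the variable S is an
          arbitrary name chosen for the illustration.

```snobol4
        S = '42'
* S holds a string; unary plus yields the corresponding integer.
        OUTPUT = DATATYPE(S)
        OUTPUT = DATATYPE(+S)
* Unary minus also converts the string, then reverses the sign.
        OUTPUT = -S
END
```

            The three statements should display STRING, INTEGER, and -42.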


                                    3.3 VARIABLES

            A variable is a place to store an item of data.  The number of
          variables you may have is unlimited, provided you give each one a
          unique name.  Think of a variable as a box, marked on the outside
          with a permanent name, able to hold any data value or type.  Many
          programming languages require that you formally declare what kind
          of entity the box will contain---integer, real, string, etc.---
          but SNOBOL4 is more flexible.  A variable's contents may change
          repeatedly during program execution.  The size of the box con-
          tracts or expands as necessary.  One moment it might contain an
          integer, then a 2,000 character string, then the null string; in
          fact, any SNOBOL4 data type.

            There are only a few rules about composing a variable's name
          when it appears in your program:

            1. The name must begin with an upper- or lower-case letter.

            2. If it is more than one character long, the remaining
               characters may be any combination of letters, numbers, or
               the characters period (.) and underscore (_).

            3. The name may not be longer than the maximum line length (120
               characters).

            Here are some correct SNOBOL4 names:

               WAGER     P23     VerbClause     SUM.OF.SQUARES     Buffer

            Normally, SNOBOL4 performs "case-folding" on names.  Lower-case
          alphabetic characters are changed to upper-case when they appear
          in names---Buffer and BUFFER are equivalent.  Naturally, case-
          folding of data does not occur within a string literal.  Case-
          folding can be disabled by the command line option /C.

            In some languages, the initial value of a new variable is
          undefined.  SNOBOL4 guarantees that a new variable's initial
          value is the null string.  However, except in very small pro-
          grams, you should always initialize variables.  This prevents
          unexpected results when a program is modified or a program seg-
          ment is reexecuted.

            You store something in a variable by making it the object of an
          assignment operation.  You can retrieve its contents simply by
          using it wherever its value is needed.  Using a variable's value
          is nondestructive; the value in the box remains unchanged.  Try
          creating some variables using CODE.SNO:

               ?       ABC = 'EGG'
               ?       OUTPUT = ABC
               EGG
               ?       D = 'SHELL'
               ?       OUTPUT = abc d             (Same as ABC D)
               EGGSHELL
               ?       OUTPUT = NONESUCH          (New variable is null)
               
               ?       OUTPUT = ABC NULL D
               EGGSHELL
               ?       N1 = 43
               ?       D = 17
               ?       OUTPUT = N1 + D
               60
               ?       output = ABC D
               EGG17

            OUTPUT is a variable with special properties; when a value is
          stored in its box, it is also displayed on your screen.  There is
          a corresponding variable named INPUT, which reads data from your
          keyboard.  Its box has no permanent contents.  Whenever SNOBOL4
          is asked to fetch its value, a complete line is read from the
          keyboard and used instead.  If INPUT were used twice in one
          statement, two separate lines of input would be read.  Try these
          examples:

               ?       OUTPUT = INPUT
               TYPE ANYTHING YOU DESIRE
               TYPE ANYTHING YOU DESIRE
               ?       TWO.LINES = INPUT '-AND-' INPUT
               FIRST LINE
               SECOND LINE
               ?       OUTPUT = TWO.LINES
               FIRST LINE-AND-SECOND LINE

            SNOBOL4 variables are global in scope---any variable may be
          referenced anywhere in the program.



                                                                  Chapter 4


                                                 CONTROL FLOW AND FUNCTIONS
          -----------------------------------------------------------------


                               4.1 SUCCESS AND FAILURE

            Success and failure are as important in SNOBOL4 as they are in
          life.  Success and failure are unmistakable signals; something
          either worked, or it didn't.  Significant program conciseness is
          achieved by recognizing that data values and signals are funda-
          mentally different entities.

            The elements of a statement provide values and signals as com-
          putation proceeds.  SNOBOL4 accumulates both, and stops executing
          a particular statement when it finds it cannot succeed.  Program
          flow can be altered based upon this success or failure.

            The success signal will have a value result associated with it.
          In situations in which the signal itself is the desired object,
          the result value may only be the null string.  The failure signal
          has no associated value.  (In some instances, it may be helpful
          to view failure as meaning "failure to produce a result.")

            Previously, we introduced the variable INPUT, which reads a
          line from the keyboard.  In general, INPUT can be made to read
          from any disk file.  The line read may be any character string,
          including the null string if it is an empty line.  If any string
          might appear, then there is no special value we can test for to
          detect End-of-File.  Success and failure provide an elegant
          alternative to testing for special values.

            When we retrieve a value from INPUT, we normally get a string
          and a success signal.  But when End-of-File is encountered, we
          get a failure signal instead, and no value.

            Since control-Z (or function key 6) allows you to enter an End-
          of-File from the keyboard, we can easily demonstrate this type of
          failure.  As you've noticed, the CODE.SNO program reports the
          success or failure of each statement.  So far, all examples have
          succeeded.  Now try this one:

               ?       OUTPUT = INPUT
               ^Z
               Failure

            Success and failure are control signals, and appear only during
          the execution of a statement.  They cannot be stored in a vari-
          able, which holds values only.

            There is much more which can be done with success and failure,
          but to understand their use, you'll need to know how SNOBOL4



       Tutorial                        - 17 -      Control Flow and Functions





          statements are constructed.


                               4.2 A SNOBOL4 STATEMENT

            In general, a SNOBOL4 statement looks like this:

               Label   Statement body                                 :GOTO

            The label is optional, and is omitted by placing a blank or tab
          in the first character position.  The GOTO is also optional, and
          can be eliminated simply by omitting it and the colon.  In fact,
          even the statement body is optional.  You can have a program line
          consisting of just a label or a GOTO field.


          4.2.1 The Label Field

            SNOBOL4 normally executes the statements of a program in
          sequence.  The ability to transfer control from one statement to
          another, perhaps conditionally, makes SNOBOL4 much more usable.

            Labels provide names for statements.  If present, they must
          begin in the first character position of a statement, and must
          start with a letter or number.  Additional characters may be
          anything but blank or tab.  As with variable names, lower-case
          letters are equivalent to upper-case when case-folding is in
          effect (the default).


          4.2.2 The GOTO Field

            Transfer of control is made possible by the GOTO.  It inter-
          rupts the normal sequential execution of statements by telling
          SNOBOL4 which statement to execute after the present one.  The
          GOTO field appears at the end of the statement, preceded by a
          colon (:), and has one of these forms:

                                                :(label)
                                                :S(label)
                                                :F(label)
                                                :S(label1) F(label2)

            White space is required before the colon.  "Label" is the name
          given the target statement, and must be enclosed in parentheses.
          If the first form is used, execution resumes at the referenced
          statement, unconditionally.  In the second and third forms,
          transfer occurs only if the statement has succeeded or failed,
          respectively.  Otherwise, execution proceeds to the next state-
          ment in line.  If the fourth form is used, transfer is made to
          label1 if the statement succeeded, or to label2 if it failed.  A
          statement with a label and a GOTO would look like this:

               COPY    OUTPUT = INPUT           :F(DONE)
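
            As an illustration of the fourth form, the following sketch
          reads a single line and branches in both directions.  The
          labels GOT and NONE, the variable LINE, and the message texts
          are invented for the example.

```snobol4
* Read one line; go to GOT on success, or to NONE at End-of-File.
        LINE = INPUT                     :S(GOT) F(NONE)
GOT     OUTPUT = 'READ: ' LINE           :(END)
NONE    OUTPUT = 'NOTHING TO READ'
END
```

            The unconditional GOTO on the statement labeled GOT keeps
          execution from flowing on into the statement labeled NONE.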

            Now let's write a short program which copies keyboard input to
          the screen, and reports the total number of lines.  If you are an
          accurate typist, you can type it into SNOBOL4 directly.  Other-
          wise, you should use your text editor to create a file containing
          the program text.  First stop the CODE.SNO program by typing END:

               ?END
               
               B>SNOBOL4 CON
               
               Vanilla SNOBOL4      Version 2.14.
               (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
               Enter program, terminate with "END"
               ?       N = 0
               ?COPY   OUTPUT = INPUT           :F(DONE)
               ?       N = N + 1                :(COPY)
               ?DONE   OUTPUT = 'THERE WERE ' N ' LINES'
               ?END
               
               No errors
               
               TYPE IN A TEST LINE
               TYPE IN A TEST LINE
               
               AND ANOTHER
               AND ANOTHER
               
               ^Z
               THERE WERE 2 LINES
               
               B>

            We start the line count in variable N at 0.  The next statement
          has a label, COPY, a statement body, and a GOTO field.  It is an
          assignment statement, and begins execution by reading a line of
          input.  If INPUT successfully obtains a line, the result is
          stored in OUTPUT.  The GOTO field is only testing for failure, so
          SNOBOL4 proceeds to the next statement, where N is incremented,
          and the unconditional GOTO transfers back to statement COPY.

            When an End-of-File is read, variable INPUT signals failure.
          Execution of this statement terminates immediately, without per-
          forming the assignment, and transfers to the statement labeled
          DONE.  The number of lines is displayed, and control flows into
          the END statement, stopping the program.


                               4.3 BUILT-IN FUNCTIONS

            A function is analogous to an operator; it operates on data to
          produce a result.  The data objects are called the arguments of
          the function.  The result returned---the function of the argu-
          ments---may have two components: the success or failure signal;
          and for success, a value.  The value may be any data type.

            A function is used by writing its name and a list of arguments
          enclosed by parentheses:

                       FUNCTION_NAME(ARG1, ARG2, ..., ARGn)

            It may appear in your program anywhere a constant is allowed---
          in expressions, patterns, even as the argument of another func-
          tion.  If the function has more than one argument, they should be
          separated by commas.  If trailing arguments are omitted, SNOBOL4
          will supply the null string instead.  Some functions, such as one
          that returns the current date, have no arguments at all.
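
            The sketch below illustrates these rules with SIZE and DUPL,
          two functions described later in this chapter: a function call
          used as an argument, a function call used as an operand, and a
          call with its argument omitted.

```snobol4
* The value returned by DUPL becomes the argument of SIZE.
        OUTPUT = SIZE(DUPL('AB', 3))
* A function call may also be an operand in an arithmetic expression.
        OUTPUT = SIZE('ZIPPY') + 1
* An omitted argument is supplied as the null string, as in SIZE('').
        OUTPUT = SIZE()
END
```

            The statements should display 6, 6, and 0 respectively.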

            SNOBOL4 provides a large number of predefined functions, and
          allows you to define your own.  The large repertoire of built-in
          functions makes SNOBOL4 programming easier.  Most functions are
          concerned with pattern matching, input/output, and advanced fea-
          tures of the language.  Here we'll introduce a few simple
          conditional, numeric, and string functions to give you an idea of
          the variety.  Try them interactively with CODE.SNO.


          4.3.1 Conditional Functions

            These functions fail or succeed depending upon their arguments.
          They are sometimes called predicate functions because the success
          of an expression using them is predicated upon their success.  If
          they succeed, they return the null string as their value.

             Function         Succeeds if:

             IDENT(S,T)       S and T are identical.  S and T may be con-
                              stants or variables with any data type.  To
                              be identical, the arguments must have the
                              same data type and value.  Since omitted ar-
                              guments default to the null string, IDENT(S)
                              succeeds if S is the null string.

             DIFFER(S,T)      S and T are different.  DIFFER is the oppo-
                              site of IDENT.  DIFFER(S) succeeds if S is
                              not the null string.

             EQ(X,Y)          Integers X and Y are equal.  X and Y must be
                              integers, or strings which can be converted
                              to integers.

             NE(X,Y)          Integers X and Y are not equal.

             GE(X,Y)          Integer X is greater than or equal to Y.

             GT(X,Y)          Integer X is greater than Y.

             LE(X,Y)          Integer X is less than or equal to Y.

             LT(X,Y)          Integer X is less than Y.

             INTEGER(X)       X is an integer, or a string which can be
                              converted to an integer.

             LGT(S,T)         String S is lexically greater than string T
                              using a character-by-character comparison.

            Leading blanks may be used in front of an argument for
          readability.  Here are some exercises for CODE.SNO:

               ?       N = 3
               ?       EQ(N, 3)
               Success
               ?       IDENT(N, 3)
               Success
               ?       EQ(3, "3")
               Success
               ?       IDENT(3, "3")            (integer and string)
               Failure
               ?       EQ(N, 4)
               Failure
               ?       NE(N, 4)
               Success
               ?       INTEGER(N)
               Success
               ?       INTEGER('47')
               Success
               ?       DIFFER('ABC', 'abc')
               Success
               ?       IDENT('a' 'b' 'c', 'abc')
               Success
               ?       LGT('ABC', 'ABD')
               Failure

            When any of these functions succeed, they return a null string.
          Since other statement elements are not altered when concatenated
          with the null string, this provides an easy way to interpose
          tests and construct loops.  Suppose we execute the statement:

                       N = LT(N,10) N + 1       :S(LOOP)

            Function LT fails if N is 10 or greater.  If the statement
          fails, the assignment is not performed, and execution continues
          with the next statement.  However, if LT succeeds, its null
          string value is concatenated with the expression N + 1, and the
          result is assigned to N.  This has the effect of increasing N by
          1 and transferring to statement LOOP until N reaches 10.
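
            Placed in a complete program, the statement might be used as
          follows.  This is a sketch: the label DONE and the starting
          value of N are choices made for the illustration, and the GOTO
          tests for failure rather than success so that the loop body can
          follow the test.

```snobol4
        N = 0
* LT returns the null string while N is below 10, so N + 1 is assigned.
* When LT fails, the assignment is skipped and control goes to DONE.
LOOP    N = LT(N,10) N + 1               :F(DONE)
        OUTPUT = N                       :(LOOP)
DONE    OUTPUT = 'FINISHED WITH N = ' N
END
```

            The program should display the integers 1 through 10,
          followed by FINISHED WITH N = 10.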

            If we concatenated several conditional functions together, and
          they all succeeded, the result would still be the null string.
          If any function failed, the entire concatenation would fail.
          This gives us a simple way to produce a successful result if a
          number of conditions are all true.  For example, the expression:

                       INTEGER(N) GE(N,5) LE(N,100)

          succeeds if N is an integer between 5 and 100, inclusive.
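
            A sketch of how such a test might guard a value typed at the
          keyboard follows.  The message texts are invented for the
          illustration.  Because a statement stops as soon as any element
          fails, placing INTEGER first should keep GE and LE from being
          given a string that cannot be converted.

```snobol4
* Read a line and strip any trailing blanks.
        N = TRIM(INPUT)                                        :F(END)
* Each test contributes a null string, leaving N and the message.
        OUTPUT = INTEGER(N) GE(N,5) LE(N,100) N ' IS IN RANGE' :S(END)
        OUTPUT = N ' IS NOT AN INTEGER FROM 5 TO 100'
END
```

            Typing 42 should display 42 IS IN RANGE, while typing 3 or
          ABC should produce the second message.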


          4.3.2 Other Functions

            These functions always succeed; all but REMDR and SIZE return a
          string result.

             DATE()           Return current date and time as a string.

             DUPL(S,N)        Duplicate string S, N times.

             REMDR(X,Y)       Produce the remainder (modulus) of X / Y.

             REPLACE(S1,S2,S3)     Return string S1 after performing the
                              character replacements specified by strings
                              S2 and S3.  S2 specifies which characters to
                              replace, and S3 specifies what to replace
                              them with.

             SIZE(S)          Return the number of characters in string S.

             TRIM(S)          Return string S with trailing blanks removed.

            Exercises for CODE.SNO:

               ?       OUTPUT = 'THE DATE AND TIME ARE: ' DATE()
               THE DATE AND TIME ARE: 10-19-87 11:49:33.90
               ?       OUTPUT = DUPL('ABC', 20)
               ABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABCABC
               ?       OUTPUT = SIZE('ZIPPY')
               5
               ?       OUTPUT = SIZE('')
               0
               ?       OUTPUT = TRIM('TRAILING BLANKS  ') 'GONE'
               TRAILING BLANKSGONE
               ?       OUTPUT = REPLACE('spoon','po','PO')
               sPOOn
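
            REMDR does not appear in the exercises above.  As a brief
          sketch, it can be combined with the conditional function EQ to
          test whether one integer divides another evenly; the variable N
          and its value are arbitrary.

```snobol4
* 17 divided by 5 is 3 with a remainder of 2.
        OUTPUT = REMDR(17, 5)
* A remainder of 0 after dividing by 2 identifies an even number.
        N = 14
        OUTPUT = EQ(REMDR(N, 2), 0) N ' IS EVEN'
END
```

            The first statement should display 2, and the last should
          display 14 IS EVEN.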



                                                                  Chapter 5


                                                  INPUT/OUTPUT AND KEYWORDS
          -----------------------------------------------------------------


                                  5.1 INPUT/OUTPUT

            We've already performed simple input and output with variables
          INPUT and OUTPUT.  In this chapter, you'll learn more about
          SNOBOL4's I/O capabilities.

            SNOBOL4 can communicate with up to 16 different files at once.
          A "file" is either a disk file or a device, such as a printer.
          Every file is identified by a "unit number," which is an integer
          between 1 and 16.  You choose the numbers when you select the
          files you wish to use.  The particular numbers chosen have no
          special significance; they just distinguish one file from
          another.

            Actual input or output of data is performed by "associating" a
          variable with a unit number and a direction.  When a statement
          tries to use the variable's value, a line is read from the asso-
          ciated file.  When a value is stored in the variable, a line is
          written to the associated file.  INPUT and OUTPUT are variables
          whose associations with the keyboard and screen were preset by
          SNOBOL4.  For historical reasons, they use unit numbers 5 and 6
          respectively.

            Strings are the only data type which can be transferred to and
          from files.  A successful input operation always returns a
          string.  During output, nonstring objects, such as integers, are
          automatically converted to their string form.

            The functions INPUT and OUTPUT (not to be confused with the
          variables INPUT and OUTPUT) are provided to attach a unit number
          to a variable, and optionally, to a particular file.  Their names
          are distinguished from the variables of the same names by appear-
          ing as functions, that is, with parentheses and an argument list.


          5.1.1 Associating File Names and Units

            There are two ways to tell SNOBOL4 what file will be used with
          a particular unit number:

            1. As an option on the SNOBOL4 command line, like this:

               B>SNOBOL4 PROGRAM /2=ADDRESS.TXT /8=RESULT.DAT

               Here, unit number 2 is associated with the file named
               'ADDRESS.TXT', and unit number 8 with file 'RESULT.DAT'.  It
               will still be necessary to use the INPUT or OUTPUT function



       Tutorial                        - 23 -       Input/Output and Keywords





               described below to associate variables with these unit num-
               bers.  This method works best when different files will be
               used each time the program is run.

            2. Use a string containing the file name as the fourth argument
               to the INPUT or OUTPUT function, as in:

                       INPUT(..., 2, ..., 'ADDRESS.TXT')

               This method is better when the file name will not change, or
               is a string derived from a dialogue with the user, or is
               produced from a string calculation.

               A file name consisting of a single hyphen ("-") is reserved,
               and specifies the MS-DOS standard input file when used with
               the INPUT function, or the standard output file when used
               with the OUTPUT function.  These standard input or output
               files may be redirected on the command line using the MS-DOS
               redirection operators ("<filename" and ">filename").


          5.1.2 Input

            This function associates a variable with data read from a file:

                       INPUT('variable', unit, length, 'file')

            It succeeds and returns the null string if the file was found
          and successfully opened, and fails otherwise.  Length is an
          optional integer that specifies the line length.  If the file
          name argument is omitted, SNOBOL4 consults the command line to
          find the file to use with this unit.

            For example, to open file TEXT.IN for input as unit 1, and as-
          sociate variable READLINE with it, we would say

                       INPUT('READLINE', 1, , 'TEXT.IN')    :S(OK)
                       OUTPUT = 'Could not find file'       :(END)
               OK       . . .

            If the file name were specified on the command line as
          /1=TEXT.IN, we only need the first two arguments to INPUT:

                       INPUT('READLINE', 1)            :S(OK)
                       OUTPUT = 'Could not find file'  :(END)
               OK       . . .

            To read a line from the file, we simply use READLINE in a
          statement.  The statement fails when the End-of-File is read:

                       LINE = READLINE                 :F(END.OF.FILE)

            Each file-associated variable has a line length associated
          with it (80 characters unless SNOBOL4 is told otherwise in the
          length argument).  Normally, reading stops at each end-of-line
          character (carriage return).  If a line is longer than the line
          length, the extra characters are discarded.  If a short line is
          encountered, SNOBOL4 pads the line with blanks to produce the
          full line length.  The end-of-line character is not included in
          the string returned.

            Blank padding is another historic feature from the days when
          most input was on punch cards.  The next section, "Keywords,"
          will show you how to disable it.  You can also use the TRIM func-
          tion to remove superfluous trailing blanks.  The previous state-
          ment then becomes:

                       LINE = TRIM(READLINE)      :F(END.OF.FILE)

            When READLINE encounters the End-of-File, its failure signal is
          propagated outward, causing function TRIM to fail.  This failure
          is detected in the GOTO field in the usual manner.


          5.1.3 Output

            This function associates a variable with data written to a
          file.  If the file does not exist, it is created.  If it already
          exists, its previous contents are discarded.

                       OUTPUT('variable', unit, length, 'file')

            The function succeeds and returns the null string if the file
          was successfully opened for output, and fails otherwise.

            We write data to the file by assigning it to the associated
          variable.  In this example, we will use a variable called PRINT,
          and the DOS device PRN: with a line length of 132 characters:

                       OUTPUT('PRINT', 2, 132, 'PRN:')   :S(PRTOK)
                       OUTPUT = 'Could not attach printer'  :(END)
               PRTOK   PRINT  = 'Text Listing - ' DATE()
                        . . .

            If the string assigned to an output variable is longer than the
          line length, SNOBOL4 will output as many lines as necessary of
          the standard line length to accommodate the string.  SNOBOL4 sup-
          plies the carriage return and line feed characters at the end of
          each line.
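
            As an illustration, the following sketch writes a 50-character
          string through a variable whose line length is 20.  The file
          name DEMO.OUT and the variable NARROW are hypothetical choices.
          From the rule above, we would expect DEMO.OUT to receive three
          lines, of 20, 20, and 10 asterisks:

                       OUTPUT('NARROW', 2, 20, 'DEMO.OUT')  :F(END)
                       NARROW = DUPL('*', 50)
               END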

            Once again, the output file name could be given on the command
          line (/2=PRN:).  The function call would then look like this:

                       OUTPUT('PRINT', 2, 132)           :S(PRTOK)
                        . . .




          5.1.4 Changing I/O Defaults

            The default association of INPUT and OUTPUT with the keyboard
          and screen may be altered on the SNOBOL4 command line.  The
          command line phrase /I=FILENAME associates INPUT with the named
          file, and /O=FILENAME does the same for OUTPUT.  SNOBOL4 makes
          all the associations for you; no call to the INPUT or OUTPUT
          function is required.  A surprising number of programs can be
          written this way, using only the variables INPUT and OUTPUT for
          I/O.

            SNOBOL4 also provides the pre-associated variable SCREEN.
          Using SCREEN allows your program to post messages to the display
          even if OUTPUT has been redirected elsewhere.
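
            For instance, a copy program might report its progress on the
          display while its real output goes to a file.  This is a sketch
          only; the variable N and the label DONE are our own:

                       N = 0
               LOOP    OUTPUT = INPUT                  :F(DONE)
                       N = N + 1                       :(LOOP)
               DONE    SCREEN = N ' lines copied'
               END

          If OUTPUT has been redirected to a file, the copied lines go to
          that file, but the final message still appears on the screen.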

            If we have a program written in terms of variables INPUT and
          OUTPUT, it can be run without alteration with different data
          files.  For example, the following program will copy INPUT to
          OUTPUT, and place the line length and a blank in front of each
          line:

               LOOP    S = TRIM(INPUT)                 :F(END)
                       OUTPUT = SIZE(S) ' ' S          :(LOOP)
               END

            Suppose we associate file TEXT.IN with INPUT, and TEXT.OUT with
          OUTPUT.  We've supplied the morning song from Shakespeare's
          Cymbeline in file TEXT.IN, and the program above in file
          LENGTH.SNO.  You can run it like this:

               B>SNOBOL4 LENGTH /I=TEXT.IN /O=TEXT.OUT
               
               Vanilla SNOBOL4      Version 2.14.
               (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.
               
               No errors
               
               B>TYPE TEXT.OUT
               44 Hark! hark! the lark at heaven's gate sings,
                . . .

            SNOBOL4 will supply the default file name extensions .IN and
          .OUT for the /I and /O options, so the command line could be
          shortened to:

               B>SNOBOL4 LENGTH /I=TEXT /O=TEXT












                                    5.2 KEYWORDS

            Input/Output allows your program to communicate with the
          outside world.  Your program may also communicate with the
          SNOBOL4 system itself.  Keywords allow you to modify SNOBOL4's
          behavior, and to obtain information from the system.  A keyword
          consists of the ampersand character (&) followed by an alphabetic
          name.  Keywords are used in a statement in the same way as
          variables: they either provide values or have values assigned to
          them.  Numeric keywords are restricted to integer values.

          -----------------------------------------------------------------

          &TRIM                            Remove trailing blanks

            Normally, short lines read from a file are padded with blank
          characters to the standard line length.  In LENGTH.SNO, we used
          the function TRIM(INPUT) to remove those blanks.  A simpler
          method assigns an integer value to keyword &TRIM to control
          padding.  If &TRIM is set to a nonzero value, blanks are not
          appended, and any trailing blanks are removed.  A statement to do
          this looks like this:

                       &TRIM = 1

            Since trailing blanks are usually not desired, you'll often see
          this statement at the beginning of many SNOBOL4 programs.
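
            With &TRIM set, the LENGTH.SNO program shown earlier no longer
          needs the TRIM function.  A sketch of the revised program:

                       &TRIM = 1
               LOOP    S = INPUT                       :F(END)
                       OUTPUT = SIZE(S) ' ' S          :(LOOP)
               END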

          -----------------------------------------------------------------

          &MAXLNGTH                        Maximum string length

            This keyword controls the maximum permissible string length.
          Its initial value is 5000, but it may be set to any integer from
          0 to 32767.  Setting it to 0 is going to severely restrict what
          you can do, since only the null string will be available to you!
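
            Like any unprotected keyword, &MAXLNGTH can be both examined
          and changed.  In this sketch, the first statement should display
          the initial value of 5000, and the last should display 32767:

                       OUTPUT = &MAXLNGTH
                       &MAXLNGTH = 32767
                       OUTPUT = &MAXLNGTH
               END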

          -----------------------------------------------------------------

          &DUMP                            Termination dump of variables

            This keyword is useful for debugging programs because it tells
          SNOBOL4 to display the values of your variables when your program
          terminates.  Setting &DUMP to a positive integer causes the
          variable names to be sorted alphabetically.  A negative integer
          produces an unsorted dump.  Zero is the default value, inhibiting
          the dump.  Only variables with nonnull values are displayed.
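
            A short experiment shows the effect.  The variable names here
          are arbitrary.  When this program ends, we would expect a sorted
          dump whose entries include CITY and COUNT, but not EMPTY, whose
          value is null:

                       &DUMP = 1
                       CITY  = 'SALIDA'
                       COUNT = 3
                       EMPTY =
               END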









          -----------------------------------------------------------------

          &ALPHABET                        Complete character set

            This keyword contains a 256 character string, the computer's
          entire character set in ascending sequence.  It is called a pro-
          tected keyword because it cannot be modified by your program.

          -----------------------------------------------------------------

          &LCASE                           Lower case letters

            This keyword contains the 26 lower case alphabetic characters,
          "abcdefghijklmnopqrstuvwxyz".

          -----------------------------------------------------------------

          &UCASE                           Upper case letters

            This keyword contains the 26 upper case alphabetic characters,
          "ABCDEFGHIJKLMNOPQRSTUVWXYZ".

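            The &LCASE and &UCASE keywords are convenient arguments to the
          REPLACE function when converting case.  As a brief sketch using
          CODE.SNO:

               ?       OUTPUT = REPLACE('Hark! hark!', &LCASE, &UCASE)
               HARK! HARK!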

                        5.3 PROGRAMS WITHOUT PATTERN MATCHING

            You now have the ingredients to create some simple programs.
          However, if this were all of the SNOBOL4 language, there would be
          very little reason to use it.  We'll get to pattern matching
          shortly, where you'll find many new, challenging concepts.
          First, however, you should be comfortable with the preceding
          material.

            Take a few minutes to examine and run the following programs.


          5.3.1 File Counts - FCOUNTS.SNO

            This program counts the number of characters and lines in a
          file.  Because real numbers are not available in Vanilla SNOBOL4,
          you should only use this program with input files smaller than
          32,767 characters.

                       &TRIM  = 1
                       CHARS  = 0
               NEXTL   CHARS  = CHARS + SIZE(INPUT)    :F(DONE)
                       LINES  = LINES + 1              :(NEXTL)
               DONE    OUTPUT = CHARS ' characters'
                       OUTPUT = +LINES ' lines read'
               END








            In such a small program, it's permissible to rely upon the fact
          that the system initializes LINES to the null string.  The first
          use of the statement:

                       LINES  = LINES + 1              :(NEXTL)

          converts LINES from the null string to an integer value.  We used
          the expression +LINES in the last statement to produce an integer
          0 (instead of the null string), if the input file were empty.  To
          count the characters and lines in a file, use the /I= option, as
          in:

               B>SNOBOL4 FCOUNTS.SNO /I=TEXT.IN


          5.3.2 Formatting Text - TRIPLET.SNO

            This program reformats a file by centering the lines and
          arranging them in groups of three.  Note that lines with an
          asterisk in column one are treated as comments by SNOBOL4.

               * Trim input, count input lines:
                       &TRIM = 1
                       N = 0
               
               * Read next input line, all done if End-of-File.
               LOOP    S = INPUT                       :F(END)
               
               * Precede with blanks to center within 80 character line:
                       OUTPUT = DUPL(' ', (80 - SIZE(S)) / 2) S
               
               * Increment count, but reset to 0 every third line.
               * Also, output a blank line when count resets:
                       N = REMDR(N + 1, 3)
                       OUTPUT = EQ(N, 0)               :(LOOP)
               END

            This program uses the DUPL function to produce the leading
          blanks required to center a line.  A simple calculation based on
          each line's width determines the number of blanks needed.

            The last two statements break the file lines into triplets.
          The REMDR function returns the integer remainder (modulus) when
          the first argument is divided by the second.  In this case,
          assigning the result to variable N causes N to continually cycle
          through the values 0, 1, 2, 0, 1, ....  When N is 0, the last
          statement assigns the null string to OUTPUT, producing a blank
          line.  If N is 1 or 2, EQ fails, so no assignment is made and
          no blank line is produced.
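
            The arithmetic can be checked by hand with CODE.SNO.  For the
          44-character first line of TEXT.IN, 18 leading blanks are
          needed.  Integer division discards any remainder, so a
          45-character line would get 17.  The last statement shows the
          count returning to 0 on the third line of a group:

               ?       OUTPUT = (80 - 44) / 2
               18
               ?       OUTPUT = (80 - 45) / 2
               17
               ?       OUTPUT = REMDR(2 + 1, 3)
               0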

            Try running the program with the sample text file:

               B>SNOBOL4 TRIPLET /I=TEXT





                                                                  Chapter 6


                                                           PATTERN MATCHING
          -----------------------------------------------------------------


                                  6.1 INTRODUCTION

            Pattern matching examines a "subject" string for some combina-
          tion of characters, called a "pattern."  The matching process may
          be very simple, or extremely complex.  For example:

            1. The subject contains several color names.  The pattern is
               the string "BLUE".  Does the subject string contain the word
               "BLUE"?

            2. The subject contains a nucleic acid (DNA) sequence.  The
               pattern searches for a subsequence that is replicated in two
               other places in the string.

            3. The subject contains a paragraph of English text.  The
               pattern describes the spacing rules to be applied after
               punctuation.  Does the subject string conform to the
               punctuation rules?

            4. The subject string represents the current board position in
               a game of Tick-Tack-Toe.  The pattern examines this string
               and determines the next move.

            5. The subject contains a program statement from a prototype
               computer language.  The pattern contains the grammar of that
               language.  Is the statement properly formed according to the
               grammar?

            Most programming languages provide rudimentary facilities to
          examine a string for a specific character sequence.  SNOBOL4 pat-
          terns are far more powerful, because they can specify complex
          (and convoluted) interrelationships.  The colors of a painting,
          the words of a sentence, the notes of a musical score have lim-
          ited significance in isolation.  It is their "relationship" with
          one another which provides meaning to the whole.  Likewise,
          SNOBOL4 patterns can specify "context;" they may be qualified by
          what precedes or follows them, or by their position in the
          subject.


          6.1.1 Knowns and Unknowns

            Patterns are composed of "known" and "unknown" components.

            "Knowns" are specific character strings, such as the string
          "BLUE" in the first example above.  We are looking for a yes/no
          answer to the question: "Does this known item appear in the
          subject string?"

            "Unknowns" specify the "kind" of subject characters we are
          looking for; the specific characters are not identifiable in
          advance.  We might want to match only characters from a
          restricted alphabet, or any substring of a certain length, or
          some arbitrary number of repetitions of a string.  If the pattern
          matches, we can then "capture" the particular subject substring
          matched.


                           6.2 SPECIFYING PATTERN MATCHING

            A pattern match requires a subject string and a pattern.  The
          subject is the first statement element after the label field (if
          any).  The pattern appears next, separated from the subject by
          white space (blank or tab).  If SUBJECT is the subject string,
          and PATTERN is the pattern, it looks like this:

               label   SUBJECT PATTERN

            The pattern match "succeeds" if the pattern is found in the
          subject string; otherwise it fails.  This success or failure may
          be tested in the GOTO field:

               label   SUBJECT PATTERN            :S(label1) F(label2)
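
            A small program using both branches might look like this
          sketch, in which the variable and label names are our own:

                       COLORS = 'RED GREEN BLUE'
                       COLORS 'BLUE'                   :S(FOUND) F(MISSING)
               FOUND   OUTPUT = 'BLUE is present'      :(END)
               MISSING OUTPUT = 'BLUE is absent'
               END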

            A real point of confusion is the distinction between pattern
          matching and concatenation.  Both are written with white space,
          so where does the subject end and the pattern begin?  SNOBOL4
          always uses the first "complete" statement element as the
          subject, so a subject formed by concatenation must be enclosed
          in parentheses.  In the statement

                       X Y Z

          X is the subject, and Y concatenated with Z is the pattern.
          Whereas

                       (X Y) Z

          indicates the subject is string X concatenated with string Y, and
          the pattern is Z.
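
            The difference is easy to see with CODE.SNO.  In this sketch,
          the first match uses 'AB' as the subject and 'CD' as the
          pattern; the second uses 'ABCD' as the subject and 'BC' as the
          pattern:

               ?       X = 'AB'
               ?       Y = 'CD'
               ?       X Y
               Failure
               ?       (X Y) 'BC'
               Success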


                                 6.3 SUBJECT STRING

            The subject string may be a literal string, a variable, or an
          expression.  If it is not a string, its string equivalent will be
          produced before pattern matching begins.  For example, if the
          subject is the integer 48, integer to string conversion produces
          the character string "48".  Remember, if the subject includes
          concatenated elements, they should be enclosed in parentheses.
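
            For example, both of these subjects are converted to the
          string "48" before matching begins:

               ?       48 '8'
               Success
               ?       (40 + 8) '4'
               Success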




                       6.4 PATTERN SUBSEQUENTS AND ALTERNATES

            Arithmetic expressions are composed of elements and simpler
          subexpressions.  Similarly, patterns are composed of simpler sub-
          patterns which are joined together as "subsequents" and "alter-
          nates."  If P1 and P2 are two subpatterns, the expression

                       P1 P2

          is also a pattern.  The subject must contain whatever P1 matches,
          immediately followed by whatever P2 matches.  P2 is the "subse-
          quent" of P1.  The white space (blank or tab) between P1 and P2
          is the same binary concatenation operator previously used to join
          strings; its use with patterns is completely analogous.  The
          preceding pattern matches pattern P1 "followed by" pattern P2.

            The binary "alternation" operator is the vertical bar (|).  As
          it is a binary operator, it must have white space on each side.
          The pattern

                       P1 | P2

          matches whatever P1 matches, "or" whatever P2 matches.  SNOBOL4
          tries the various alternatives from left to right.

            Normally, concatenation is performed before alternation, so the
          pattern

                       P1 | P2 P3

          matches P1 alone, or P2 "followed by" P3.  Parentheses can be
          used to alter the grouping of subpatterns.  For example:

                       (P1 | P2) P3

          matches P1 "or" P2, followed by P3.
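
            A quick experiment with literal strings shows the effect of
          the grouping.  The first pattern succeeds by matching 'C' alone,
          while the second requires 'C' or 'D' to be followed by 'OG':

               ?       'CAB' 'C' | 'D' 'OG'
               Success
               ?       'CAB' ('C' | 'D') 'OG'
               Failure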

            When a pattern successfully matches a portion of the subject,
          the matching subject characters are "bound" to it.  The next pat-
          tern in the statement must match beginning with the very next
          subject character.  If a subsequent fails to match, SNOBOL4 back-
          tracks, unbinding patterns until another alternative can be
          tried.  A pattern match fails when SNOBOL4 cannot find an alter-
          native that matches.
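
            For instance, in the match below the alternative 'B' is bound
          first, but 'AT' cannot match at the 'E' that follows.  SNOBOL4
          backtracks, tries the alternative 'BE' instead, and 'AT' then
          matches:

               ?       'BEAT' ('B' | 'BE') 'AT'
               Success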

            The null string may appear in a pattern.  It always matches,
          but does not bind any subject characters.  We can think of it as
          matching the invisible space "between" two subject characters.
          One possible use is as the last of a series of alternatives.  For
          example, the pattern

                       ROOT ('S' | 'ES' | '')

          matches the pattern in ROOT, with an optional suffix of 'S' or
          'ES'.  If ROOT matches, but is not followed by 'S' or 'ES', the
          null string matches and successfully completes the clause.  Its
          presence gives the pattern match a successful escape.
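
            To try this with CODE.SNO, we might set ROOT to a simple
          string.  The last statement shows what happens when the null
          alternative is left out:

               ?       ROOT = 'CAT'
               ?       'CATS' ROOT ('S' | 'ES' | '')
               Success
               ?       'CAT' ROOT ('S' | 'ES' | '')
               Success
               ?       'CAT' ROOT ('S' | 'ES')
               Failure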

            The conditional functions of the previous chapter may appear in
          patterns.  If they fail when evaluated, the current alternative
          fails.  If they succeed, they match the null string, and so do
          not consume any subject characters.  They behave like a gate,
          allowing the match to proceed beyond them only if they are true.
          This pattern will match 'FOX' if N is 1, or 'WOLF' if N is 2:

                       EQ(N,1) 'FOX' | EQ(N,2) 'WOLF'

            Parentheses may be used to factor a pattern.  The strings
          'COMPATIBLE', 'COMPREHENSIBLE', and 'COMPRESSIBLE' are matched by
          the pattern:

                       'COMP' ('AT' | 'RE' ('HEN' | 'S') 'S') 'IBLE'


                             6.5 SIMPLE PATTERN MATCHES

            Here are examples of pattern matches using a string literal or
          variable for the subject.  The patterns consist entirely of known
          elements.  Use the CODE.SNO program to experiment with them:

               ?       'BLUEBIRD' 'BIRD'
               Success
               ?       'BLUEBIRD' 'bird'
               Failure
               ?       B = 'THE BLUEBIRD'
               ?       B 'FISH'
               Failure
               ?       B 'FISH' | 'BIRD'
               Success
               ?       B ('GOLD' | 'BLUE') ('FISH' | 'BIRD')
               Success

            The first statement shows that the matching substring ('BIRD')
          need not begin at the start of the subject string.  This is
          called "unanchored" matching.  The second statement fails because
          strings are case sensitive, unlike names and labels.  The third
          statement creates a variable to be used as the subject.  The
          fifth statement employs an alternate: we are matching for 'FISH'
          or 'BIRD'.

            The last statement uses subsequents and alternates.  We are
          looking for a substring in B that contains 'GOLD' or 'BLUE', fol-
          lowed by 'FISH' or 'BIRD'.  It will match 'GOLDFISH', 'GOLDBIRD',
          'BLUEFISH' or 'BLUEBIRD'.  If the parentheses were omitted, con-
          catenation of 'BLUE' and 'FISH' would be performed before alter-
          nation, and the pattern would match 'GOLD', 'BLUEFISH', or
          'BIRD'.
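
            The variable B contains no 'GOLD', so a different subject is
          needed to see the difference.  In this sketch, the pattern
          without parentheses succeeds on 'GOLD' alone:

               ?       'GOLD RING' 'GOLD' | 'BLUE' 'FISH' | 'BIRD'
               Success
               ?       'GOLD RING' ('GOLD' | 'BLUE') ('FISH' | 'BIRD')
               Failure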




                              6.6 THE PATTERN DATA TYPE

            If we execute the statement

               ?       COLOR = 'BLUE'

          the variable COLOR contains the string 'BLUE', and could appear
          in the pattern portion of a statement:

               ?       B COLOR
               Success

            Even though it is used as a pattern, COLOR has the "string"
          data type.  However, complicated patterns may be stored in a
          variable just like a string or numeric value.  The statement

               ?       COLOR = 'GOLD' | 'BLUE'

          will create a "structure" describing the pattern, and store it in
          the variable COLOR.  COLOR now has the "pattern" data type.  The
          preceding example can now be written as:

               ?       CRITTER = 'FISH' | 'BIRD'
               ?       BOTH = COLOR CRITTER
               ?       B BOTH
               Success


                             6.7 CAPTURING MATCH RESULTS

            If the pattern match

                       B BOTH

          succeeds, we may want to know which of the many pattern alterna-
          tives were used in the match.  The binary operator "conditional
          assignment" assigns the matching subject substring to a variable.
          The operator is called conditional, because assignment occurs
          ONLY if the entire pattern match is successful.  Its graphic
          symbol is a period (.).  It assigns the matching substring on its
          left to the variable on its right.  Note that the direction of
          assignment is just the opposite of the statement assignment oper-
          ator (=).  Continuing with the previous example, we'll redefine
          COLOR and CRITTER to use conditional assignment:

               ?       COLOR = ('GOLD' | 'BLUE') . SHADE
               ?       CRITTER = ('FISH' | 'BIRD') . ANIMAL
               ?       BOTH = COLOR CRITTER
               ?       B BOTH
               Success
               ?       OUTPUT = SHADE
               BLUE
               ?       OUTPUT = ANIMAL
               BIRD



            The substrings that match the subpatterns COLOR and CRITTER are
          assigned to SHADE and ANIMAL respectively.  The statement

                       BOTH = COLOR CRITTER

          had to be reexecuted because its previous execution captured the
          old values of COLOR and CRITTER, without the conditional assign-
          ment operators.  The redefinition of COLOR and CRITTER was not
          reflected in BOTH until the statement was reexecuted.
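
            The same effect can be seen with ordinary strings.  In this
          sketch (the names P and Q are arbitrary), Q is built from the
          value P has at that moment, and changing P afterward has no
          effect on Q until Q is rebuilt:

               ?       P = 'FISH'
               ?       Q = 'GOLD' P
               ?       P = 'BIRD'
               ?       'GOLDBIRD' Q
               Failure
               ?       Q = 'GOLD' P
               ?       'GOLDBIRD' Q
               Success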

            Conditional assignment may appear at any level of pattern nest-
          ing, and may include other conditional assignments within its
          embrace.  The pattern

               (('B' | 'F' | 'N') . FIRST 'EA' ('R' | 'T') . LAST) . WORD

          matches 'BEAR', 'FEAR', 'NEAR', 'BEAT', 'FEAT', or 'NEAT',
          assigning the first letter matched to FIRST, the last letter to
          LAST, and the entire result to WORD.
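
            To experiment with CODE.SNO, an equivalent pattern can be
          built up in pieces.  The names HEAD, TAIL, and P are our own:

               ?       HEAD = ('B' | 'F' | 'N') . FIRST
               ?       TAIL = ('R' | 'T') . LAST
               ?       P = (HEAD 'EA' TAIL) . WORD
               ?       'A NEAT TRICK' P
               Success
               ?       OUTPUT = FIRST
               N
               ?       OUTPUT = LAST
               T
               ?       OUTPUT = WORD
               NEAT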

            The variable OUTPUT may be used as the target of conditional
          assignment.  Try:

               ?       'B2' ('A' | 'B') . OUTPUT (1 | 2 | 3) . OUTPUT
               B
               2
               Success


                                    6.8 UNKNOWNS

            All of the previous examples used patterns created from literal
          strings.  We may also want to specify the "qualities" of a match
          component, rather than its specific characters.  Using unknowns
          greatly increases the power of pattern matching.  There are two
          types, primitive patterns and pattern functions.


          6.8.1 Primitive Patterns

            There are seven primitive patterns built into the SNOBOL4
          system.  The two used most frequently will be discussed here.
          Chapter 9, "Advanced Topics," introduces the remaining five.

          -----------------------------------------------------------------

          REM                              Match remainder of subject

            REM is short for the REMainder pattern.  It will match zero or
          more characters at the end of the subject string.  Try the
          following:






               ?       'THE WINTER WINDS' 'WIN' REM . OUTPUT
               TER WINDS
               Success

            The subpattern 'WIN' matched its first occurrence in the sub-
          ject, at the beginning of the word 'WINTER'.  REM matched from
          there to the end of the subject string---the characters 'TER
          WINDS'---and assigned them to the variable OUTPUT.  If we change
          the pattern slightly, to:

               ?       'THE WINTER WINDS' 'WINDS' REM . OUTPUT
               
               Success

          then 'WINDS' matches at the end of the subject string, leaving a
          null remainder for REM.  REM matches this null string, assigns it
          to OUTPUT, and a blank line is displayed.

            The pattern components to the left of REM must successfully
          match some portion of the subject string.  REM begins where they
          left off, matching all subject characters through the end of
          string.  There are no restrictions on the particular characters
          matched.
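
            REM is handy for splitting a string at a known delimiter.  As
          a small sketch, everything after the equal sign is displayed:

               ?       'COLOR=BLUE' '=' REM . OUTPUT
               BLUE
               Success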

          -----------------------------------------------------------------

          ARB                              Match arbitrary characters

            ARB matches an ARBitrary number of characters from the subject
          string.  It matches the shortest possible substring, including
          the null string.  The pattern components on either side of ARB
          determine what is matched.  Try the statements

               ?       'MOUNTAIN' 'O' ARB . OUTPUT 'A'
               UNT
               Success
               ?       'MOUNTAIN' 'O' ARB . OUTPUT 'U'
               
               Success

            In the first statement, the ARB pattern is constrained on
          either side by the known patterns 'O' and 'A'.  ARB expands to
          match the subject characters between, 'UNT'.  In the second
          statement, there is nothing between 'O' and 'U', so ARB matches
          the null string.  ARB behaves like a spring, expanding as needed
          to fill the gap defined by neighboring patterns.
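
            Because ARB matches the shortest possible substring, it stops
          at the first place where the pattern to its right can match.
          In this sketch, only the first parenthesized item is displayed:

               ?       'F(X)(Y)' '(' ARB . OUTPUT ')'
               X
               Success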


          6.8.2 Cursor Position

            During a pattern match, the "cursor" is SNOBOL4's pointer into
          the subject string.  It is integer valued, and points "between"
          two subject characters.  The cursor is set to zero when a pattern
          match begins, corresponding to a position immediately to the left
          of the first subject character.  As the pattern match proceeds,
          the cursor moves right and left across the subject to indicate
          where SNOBOL4 is attempting a match.  The value of the cursor
          will be used by some of the pattern functions that follow.

            The "cursor position" operator assigns the current cursor value
          to a variable.  It is a unary operator whose graphic symbol is
          the "at sign" (@).  It appears within a pattern, preceding the
          name of a variable.  By using OUTPUT as the variable, we can
          display the cursor position on the screen.  For instance:

               ?       'VALLEY' 'A' @OUTPUT ARB 'E' @OUTPUT
               2
               5
               Success
               ?       'DOUBT' @OUTPUT 'B'
               0
               1
               2
               3
               Success
               ?       'FIX' @OUTPUT 'B'
               0
               1
               2
               Failure

            Cursor assignment is performed whenever the pattern match
          encounters the operator, including retries.  It occurs even if
          the pattern ultimately fails.  The element @OUTPUT behaves like
          the null string---it doesn't consume subject characters or inter-
          fere with the match in any way.
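
            Assigning the cursor to an ordinary variable records where a
          substring was found.  In this sketch, N (our own name) receives
          a new value on each retry, and ends up holding the number of
          characters that precede 'LL':

               ?       'VALLEY' @N 'LL'
               Success
               ?       OUTPUT = N
               2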


          6.8.3 Integer Pattern Functions

            These functions return a pattern based on their integer argu-
          ment.  The pattern produced can be used directly in a pattern
          match statement, or stored in a variable for later retrieval.

          -----------------------------------------------------------------

          LEN(integer)                     Match fixed-length string

            LEN(I) produces a pattern which matches a string exactly I
          characters long.  I must be an integer greater than or equal to
          zero.  Any characters may appear in the matched string.  For
          example, LEN(5) matches any 5-character string, and LEN(0)
          matches the null string.  LEN may be constrained to certain por-
          tions of the subject by other adjacent patterns:







               ?       S = 'ABCDA'
               ?       S LEN(3) . OUTPUT
               ABC
               ?       S LEN(2) . OUTPUT 'A'
               CD

            The first pattern match had only one constraint---the subject
          had to be at least three characters long---so LEN(3) matched its
          first three characters.  The second case imposes the additional
          restriction that LEN(2)'s match be followed immediately by the
          letter 'A'.  This disqualifies the intermediate match attempts
          'AB' and 'BC'.
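
            LEN is also useful for picking apart fixed-width data.  In
          this sketch, with a made-up date string, the first LEN skips
          the year and the second captures the month.  The last match
          fails because the subject is shorter than five characters:

               ?       '19880315' LEN(4) LEN(2) . OUTPUT
               03
               Success
               ?       'ABC' LEN(5)
               Failure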

            Using keyword &ALPHABET as the subject provides a simple way to
          convert a decimal character code between 0 and 255 to its one
          character equivalent.  For example, by consulting an ASCII char-
          acter code chart we find that the BEL character is decimal 7.  We
          can load that character into variable BEEP with one statement:

               ?       &ALPHABET LEN(7) LEN(1) . BEEP

          and produce five beeps on the speaker with:

               ?       OUTPUT = DUPL(BEEP,5)

            &ALPHABET contains all 256 members of the computer's character
          set, in ascending order.  LEN(7) matches the first seven charac-
          ters (codes 0 - 6), leaving BEL as the next match position for
          LEN(1).  This operation is analogous to the CHR$ function in
          BASIC.

            The inverse operation, obtaining the numerical value of a char-
          acter code, is also possible.  If variable CHAR contains a one
          character string, variable N will be set to its decimal equiva-
          lent with the second statement below:

               ?       CHAR = 'A'
               ?       &ALPHABET @N CHAR
               ?       OUTPUT = N
               65
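
            The same two idioms work for any other code.  As a quick
          sketch, assuming the ASCII character set (where 'A' is decimal
          65), this goes from code to character and back again.  The
          variable names CH and K are only illustrative:

               ?       &ALPHABET LEN(65) LEN(1) . CH
               ?       OUTPUT = CH
               A
               ?       &ALPHABET @K CH
               ?       OUTPUT = K
               65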

            In Chapter 8, "Program Defined Objects," I'll demonstrate how
          you can define your own functions to encapsulate each of these
          operations.

          -----------------------------------------------------------------

          POS(integer), RPOS(integer)      Verify cursor position

            The POS(I) and RPOS(I) patterns do not match subject charac-
          ters.  Instead, they succeed only if the "current" cursor posi-
          tion is a specified value.  They often are used to tie points of
          the pattern to specific character positions in the subject.

            POS(I) counts from the left end of the subject string, succeed-
          ing if the current cursor position is equal to I.  RPOS(I) is
          similar, but counts from the right end of the subject.  If the
          subject length is N characters, RPOS(I) requires the cursor be
          (N - I).  If the cursor is not the correct value, these functions
          fail, and SNOBOL4 tries other pattern alternatives, perhaps
          extending a previous substring matched by ARB, or beginning the
          match further along in the subject.

            Continuing with CODE.SNO:

               ?       S = 'ABCDA'
               ?       S POS(0) 'B'
               Failure
               ?       S LEN(3) . OUTPUT RPOS(0)
               CDA
               ?       S POS(3) LEN(1) . OUTPUT
               D
               ?       S POS(0) 'ABCD' RPOS(0)
               Failure

            The first example requires a 'B' at cursor position 0, and
          fails for this subject.  POS(0) "anchors" the match, forcing it
          to begin with the first subject character.  Similarly, RPOS(0)
          anchors the end of the pattern to the tail of the subject.  The
          next example matches at a specific mid-string character position,
          POS(3).  Finally, enclosing a pattern between POS(0) and RPOS(0)
          forces the match to use the ENTIRE subject string.
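
            Two further sketches, assuming S is still set to 'ABCDA'.
          The other subjects are made up for illustration.  RPOS(2) on
          the five-character subject requires the cursor to be 5 - 2,
          or 3, so LEN(1) picks up the 'D'.  Bracketing SPAN between
          POS(0) and RPOS(0) tests whether a subject consists of digits
          only:

               ?       S RPOS(2) LEN(1) . OUTPUT
               D
               ?       '12345' POS(0) SPAN('0123456789') RPOS(0)
               Success
               ?       '123A5' POS(0) SPAN('0123456789') RPOS(0)
               Failure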

            At first glance these functions appear to be "setting" the
          cursor to a specified value.  Actually, they never alter the
          cursor, but instead wait for the cursor to "come to them" as
          various match alternatives are attempted.  This, in turn, allows
          other patterns in the statement to be processed in an orderly
          fashion.  You can demonstrate this "waiting for the cursor"
          behavior like this:

               ?       S @OUTPUT POS(3)
               0
               1
               2
               3
               Success

          -----------------------------------------------------------------

          TAB(integer), RTAB(integer)      Match to fixed position

            These patterns are hybrids of ARB, POS(), and RPOS().  They use
          specific cursor positions, like POS and RPOS, but bind (match)
          subject characters, like ARB.  TAB(I) matches any characters from
          the current cursor position up to the specified position I.
          RTAB(I) does the same, except, as in RPOS(), the target position
          is measured from the end of the subject.

            TAB and RTAB will match the null string, but will fail if the
          current cursor is to the right of the target.  They also fail if
          the target position is past the end of the subject string.
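
            A brief illustration of both failure cases, using a subject
          chosen for the purpose.  In the first, LEN(3) always leaves
          the cursor at 3 or beyond, to the right of TAB(2)'s target.
          In the second, the target lies beyond the end of the subject:

               ?       'ABCDE' LEN(3) TAB(2)
               Failure
               ?       'ABCDE' TAB(6)
               Failure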

            These patterns are useful when working with tabular data.  For
          example, if a data file contains name, street address, city and
          state in columns 1, 30, 60, and 75, this pattern will break out
          those elements from a line:

            P = TAB(29) . NAME TAB(59) . STREET TAB(74) . CITY REM . STATE

            The pattern RTAB(0) is equivalent to primitive pattern REM.
          One potential source of confusion is just what it is that RTAB
          matches.  It counts from the right end of the subject, but
          matches to the left of its target cursor.  Try:

               ?       'ABCDE' TAB(2) . OUTPUT RTAB(1) . OUTPUT
               AB
               CD
               Success

            TAB(2) matches 'AB', leaving the cursor at 2, between 'B' and
          'C'.  The subject is 5 characters long, so RTAB(1) specifies a
          target cursor of 5 - 1, or 4, which is between the 'D' and 'E'.
          RTAB matches everything from the current cursor, 2, to the
          target, 4.
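
            Because it stops short of the end, RTAB is handy for
          discarding a fixed number of trailing characters.  A small
          sketch, with a made-up subject:

               ?       'HELLO;' RTAB(1) . OUTPUT
               HELLO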


          6.8.4 Character Pattern Functions

            These functions produce a pattern based on a string-valued
          argument.  Once again, the pattern may be used directly or stored
          in a variable.

          -----------------------------------------------------------------

          ANY(string), NOTANY(string)      Match one character

            Each function produces a pattern which matches one subject
          character, chosen according to the argument string.  ANY(S)
          matches the next subject character IF it appears in the string
          S, and fails otherwise.  NOTANY(S) matches the next subject
          character only if it does NOT appear in S.  Here are some
          sample uses of each:

               ?       VOWEL = ANY('AEIOU')
               ?       DVOWEL = VOWEL VOWEL
               ?       NOTVOWEL = NOTANY('AEIOU')
               ?       'VACUUM' VOWEL . OUTPUT
               A
               ?       'VACUUM' DVOWEL . OUTPUT
               UU
               ?       'VACUUM' (VOWEL NOTVOWEL) . OUTPUT
               AC

            The argument string specifies a set of characters to be used in
          creating the ANY or NOTANY pattern.  It may contain duplicate
          characters, and the order of characters in S is immaterial.
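
            To see that duplicates and ordering make no difference, try
          a scrambled version of the vowel set (the argument string here
          is arbitrary):

               ?       'VACUUM' ANY('UUOIEA') . OUTPUT
               A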

          -----------------------------------------------------------------

          SPAN(string), BREAK(string)      Match a run of characters

            These are multicharacter versions of ANY and NOTANY.  Each
          requires a nonnull argument to specify a set of characters.

            SPAN(S) matches one or more subject characters from the set in
          S.  SPAN must match at least one subject character, and will
          match the LONGEST subject string possible.

            BREAK(S) matches "up to but not including" any character in S.
          The string matched must always be followed in the subject by a
          character in S.  Unlike SPAN and NOTANY, BREAK will match the
          null string.
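
            A short sketch of the difference, using subjects chosen for
          illustration.  SPAN takes the whole run of digits, BREAK takes
          everything ahead of the first digit, and BREAK fails when no
          character from its argument appears in the subject:

               ?       'ABC123' SPAN('0123456789') . OUTPUT
               123
               ?       'ABC123' BREAK('0123456789') . OUTPUT
               ABC
               ?       'ABC' BREAK('0123456789')
               Failure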

            These two functions are called "stream" functions because each
          streams by a series of subject characters.  SPAN is most useful
          for matching a group of characters with a common trait.  For
          example, we can say an English word is composed of one or more
          alphabetic characters, apostrophe, and hyphen.  The statements

               ?       LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ'-"
               ?       WORD = SPAN(LETTERS)

          produce a suitable pattern in WORD.  To match the material
          between words (white space, punctuation, etc.), use the pattern:

               ?       GAP = BREAK(LETTERS)

            SPAN and BREAK are two of the most useful SNOBOL4 functions.
          Try some examples using CODE.SNO:

               ?       'SAMPLE LINE' WORD . OUTPUT
               SAMPLE
               ?       'PLUS TEN DEGREES' ' ' WORD . OUTPUT
               TEN
               ?       GAPO = GAP . OUTPUT
               ?       WORDO = WORD . OUTPUT
               ?       ': ONE, TWO, THREE' GAPO WORDO GAPO WORDO
               :
               ONE
               ,
               TWO
               ?       DIGITS = '0123456789'
               ?       INTEGER = (ANY('+-') | '') SPAN(DIGITS)
               ?       'SET -43 VOLTS' INTEGER . OUTPUT
               -43
               ?       REAL = INTEGER '.' (SPAN(DIGITS) | '')
               ?       'SET -43.625 VOLTS' REAL . OUTPUT
               -43.625
               ?       S = '0ZERO,1ONE,2TWO,3THREE,4FOUR,5FIVE,'
               ?       S 4 BREAK(',') . OUTPUT
               FOUR

            If you require a version of SPAN which WILL match the null
          string, or a BREAK which will NOT match the null string, you can
          use the following constructions:

                       (SPAN(S) | '')
                       (NOTANY(S) BREAK(S))
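
            For instance, compare plain BREAK with the second construc-
          tion on a subject that starts with the break character.  This
          is only a sketch, and the name NBREAK is hypothetical.  BREAK
          alone is satisfied with the null string at cursor position 0,
          so the cursor is still 0 after it matches.  NBREAK must move
          along until it can match at least one character:

               ?       NBREAK = NOTANY(',') BREAK(',')
               ?       ',A,' BREAK(',') @OUTPUT
               0
               ?       ',A,' NBREAK . OUTPUT
               A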

            We need to introduce one more fundamental concept---replace-
          ment---before we can write some meaningful programs.


                        6.9 PATTERN MATCHING WITH REPLACEMENT

            Pattern matching identifies a subject substring with a particu-
          lar trait, specified by the pattern.  We used conditional assign-
          ment to copy that substring to a variable.  Replacement moves in
          the other direction, letting you alter the substring in the sub-
          ject.  The space occupied by the matching substring may be en-
          larged or contracted (or removed entirely), leaving adjacent sub-
          ject characters undisturbed.  If the pattern matched the entire
          subject, replacement behaves like a simple assignment statement.

            Replacement appears in a form similar to assignment:

                       SUBJECT PATTERN = REPLACEMENT

            First, the pattern match is attempted on the subject.  If it
          fails, execution of the statement ends immediately, and replace-
          ment does not occur.  If the match succeeds, any conditional
          assignments within the pattern are performed.  The replacement
          field is then evaluated, converted to a string, and inserted in
          the subject in place of the matching substring.  If the replace-
          ment field is empty, the null string replaces the matched sub-
          string, effectively deleting it.  Try a few examples with
          CODE.SNO:

               ?       T = 'MUCH ADO ABOUT NOTHING'
               ?       T 'ADO' = 'FUSS'
               Success
               ?       OUTPUT = T
               MUCH FUSS ABOUT NOTHING
               ?       T 'NOTHING' =
               Success
               ?       OUTPUT = T
               MUCH FUSS ABOUT
               ?       'MASH' 'M' = 'B'
               Execution error #8, Variable not present where required
               Failure

            The first replacement searches for 'ADO' in the subject string,
          replacing it with 'FUSS'.  The second replacement has a null
          string replacement value, and deletes the matching substring.
          The last example demonstrates that a variable must be the subject
          of replacement.  Variables can be changed; string literals---like
          'MASH'---cannot.

            The following will replace the 'M' in 'MASH' with a 'B':

               ?       VERB = 'MASH'
               ?       VERB 'M' = 'B'
               ?       OUTPUT = VERB
               BASH

            If the matched substring appears more than once in the subject,
          only the first occurrence is changed.  The remaining substrings
          must be found with a program loop.  For example, a statement to
          eliminate all occurrences of the letter 'A' from the subject
          looks like this:

               ALOOP   SUBJECT 'A' =                   :S(ALOOP)

            Here ALOOP is the statement label, SUBJECT is some variable
          containing the subject string, 'A' is the pattern, and the
          replacement field is empty.  If an 'A' is found, it is deleted by
          replacing it with the null string, and the statement succeeds.
          The success GOTO branches back to ALOOP, and another search for
          'A' is performed.  The loop continues until no 'A's remain in the
          subject, and the pattern match fails.  Of course, the pattern and
          replacement can be as complex as desired.

            Simple loops like this can be tried in CODE.SNO by appending a
          semicolon after the GOTO field.  (Semicolon is used with GOTO in
          CODE.SNO only; you would not use it in normal programs.)  Contin-
          uing with the previous example:

               ?       VOWEL = ANY('AEIOU')
               ?VL     T VOWEL = '*'                   :S(VL);
               ?       OUTPUT = T
               M*CH F*SS *B**T
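
            The same loop structure handles a more elaborate pattern.
          As a sketch (the variable U and the label SQ are arbitrary
          names), this squeezes each run of two or more blanks down to
          a single blank:

               ?       U = 'A   B    C'
               ?SQ     U ' ' SPAN(' ') = ' '           :S(SQ);
               ?       OUTPUT = U
               A B C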

            Since conditional assignment is performed before replacement,
          its results are available for use in the replacement field of the
          same statement.  Here's an example of removing the first item
          from a list, and placing it on the end:

               ?       RB = 'RED,ORANGE,YELLOW,GREEN,BLUE,INDIGO,VIOLET,'
               ?       CYCLE = BREAK(',') . ITEM LEN(1) REM . REST
               ?       RB CYCLE = REST ITEM ','
               Success
               ?       OUTPUT = ITEM
               RED
               ?       OUTPUT = RB
               ORANGE,YELLOW,GREEN,BLUE,INDIGO,VIOLET,RED,

            Pattern CYCLE matches the entire subject, placing the first
          color into ITEM, bypassing the comma with LEN(1), and placing the
          remainder of the subject into REST.  REST and ITEM are then
          transposed in the replacement field, and stored back into RB.
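
            The same technique rearranges the parts of a single field.
          In this sketch (SWAP, NAME, LAST and FIRST are illustrative
          names), the two halves of a 'last,first' entry are captured
          and then written back in the opposite order:

               ?       NAME = 'SMITH,JOHN'
               ?       SWAP = BREAK(',') . LAST LEN(1) REM . FIRST
               ?       NAME SWAP = FIRST ' ' LAST
               ?       OUTPUT = NAME
               JOHN SMITH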


                                6.10 SAMPLE PROGRAMS

            I've introduced a lot of concepts in this chapter; it's time to
          see how they fit together into programs.  The sample programs
          below are supplied on the Vanilla SNOBOL4 diskette.


          6.10.1 Word Counting

            The first program counts the number of words in the input file.
          Lines with an asterisk in the first column are comment lines---
          their contents are ignored by SNOBOL4.

               *   Simple word counting program, WORDS.SNO.
               *
               *   A word is defined to be a contiguous run of letters,
               *   digits, apostrophe and hyphen.  This definition of
               *   legal letters in a word can be altered for specialized
               *   text.
               *
               *   If the file to be counted is TEXT.IN, run this program
               *   by typing:
               *       B>SNOBOL4 WORDS /I=TEXT
               *
                       &TRIM  =  1
                       WORD   =  "'-"  '0123456789' &UCASE &LCASE
                       WPAT   =  BREAK(WORD) SPAN(WORD)
               
               NEXTL   LINE   =  INPUT                      :F(DONE)
               NEXTW   LINE WPAT =                          :F(NEXTL)
                       N      =  N + 1                      :(NEXTW)
               
               DONE    OUTPUT =  +N ' words'
               END

            After defining the acceptable characters in a word, the real
          work of the program is performed in the three lines beginning
          with label NEXTL.  A line is read from the input file, and stored
          in variable LINE.  The next statement attempts to find the next
          word with pattern WPAT.  BREAK streams by any blanks and punctua-
          tion, stopping just short of the word, which SPAN then matches.
          Both the word and any preceding punctuation are removed from LINE
          by replacement with the null string.

            When no more words remain in LINE, the failure transfer to
          NEXTL reads the next line.  If the match succeeds, N is incre-
          mented, and the program goes back to NEXTW to search for another
          word.  When the End-of-File is encountered, control transfers to
          DONE and the number of words is displayed.
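
            To watch the NEXTW statement at work, you can step through
          it by hand in CODE.SNO.  This is only a sketch, and the sample
          line is made up.  Each successful match strips one word, along
          with whatever precedes it, and the match fails once no word
          characters remain:

               ?       WORD = "'-" '0123456789' &UCASE &LCASE
               ?       WPAT = BREAK(WORD) SPAN(WORD)
               ?       LINE = 'Hello, world!'
               ?       LINE WPAT =
               Success
               ?       OUTPUT = LINE
               , world!
               ?       LINE WPAT =
               Success
               ?       OUTPUT = LINE
               !
               ?       LINE WPAT =
               Failure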

            It's simple to alter pattern WPAT to search for other things.
          For instance, if we wanted to count occurrences of double vowels,
          we could use:

                       WPAT = ANY('AEIOUaeiou') ANY('AEIOUaeiou')

            To count the occurrences of integers with an optional sign
          character, use:

                       WPAT = (ANY('+-') | '') SPAN('0123456789')

            Perhaps we want to count violations of simple punctuation
          rules: a period followed by only one blank, or a comma or
          semicolon followed by more than one blank:

                       WPAT = '. ' NOTANY(' ') | ANY(',;') ' ' SPAN(' ')

            Notice how closely WPAT parallels the English language descrip-
          tion of the problem.
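
            A quick trial of this last pattern, on subjects invented for
          the purpose.  The first subject breaks the period rule, the
          second breaks the comma rule, and the third obeys both:

               ?       WPAT = '. ' NOTANY(' ') | ANY(',;') ' ' SPAN(' ')
               ?       'END. NEXT' WPAT
               Success
               ?       'ONE,  TWO' WPAT
               Success
               ?       'END.  NEXT ONE, TWO' WPAT
               Failure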


          6.10.2 Word Crossing

            This program asks for two words, and displays all intersecting
          letters between them.  For example, given the words LOOM and
          HOME, the program output is:

                       H
                      LOOM
                       M
                       E
               
                        H
                      LOOM
                        M
                        E
               
                         H
                         O
                      LOOM
                         E

            A pattern match like this would find the first intersecting
          character:

                       HORIZONTAL ANY(VERTICAL) . CHAR
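
            With the two words from the example above, this can be
          tried directly in CODE.SNO (a sketch, entered by hand):

               ?       HORIZONTAL = 'LOOM'
               ?       VERTICAL = 'HOME'
               ?       HORIZONTAL ANY(VERTICAL) . CHAR
               ?       OUTPUT = CHAR
               O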

            However, we want to find all intersections, so will have to
          iterate our search.  In conventional programming languages, we
          might use numerical indices to remember which combinations were
          tried.  Here, we'll use place-holding characters like '*' and '#'
          to remove solutions from future consideration.  As seems to be
          the case with SNOBOL4, there are more comments than program
          statements:

               * CROSS.SNO - Print all intersections between two words
               
                       &TRIM = 1
               
               *  Get words from user
               *
               AGAIN   OUTPUT = 'ENTER HORIZONTAL WORD:'
                       H      = INPUT                       :F(END)
               
                       OUTPUT = 'ENTER VERTICAL WORD:'
                       V      = INPUT                       :F(END)
               
               *       Make copy of horizontal word to track position.
                       HC     = H
               
               *  Find next intersection in horizontal word.  Save
               *  the number of preceding horizontal characters in NH.
               *  Save the intersecting character in CROSS.
               *  Replace with '*' to remove from further consideration.
               *  Go to AGAIN to get new words if horizontal exhausted.
               *
               NEXTH   HC @NH ANY(V) . CROSS = '*'          :F(AGAIN)
               
               *  For each horizontal hit, iterate over possible
               *  vertical ones.  Make copy of vertical word to track
               *  vertical position.
               *
                       VC     = V
               
               *  Find where the intersection was in the vertical word.
               *  Save the number of preceding vertical characters in NV.
               *  Replace with '#' to prevent finding it again in that
               *  position.  When vertical exhausted, try horizontal again.
               *
               NEXTV   VC @NV CROSS = '#'                   :F(NEXTH)
               
               *  Now display this particular intersection.
               *  We make a copy of the original vertical word,
               *  and mark the intersecting position with '#'.
               *
                       OUTPUT =
                       PRINTV = V
                       PRINTV POS(NV) LEN(1) = '#'
               
               *  Peel off the vertical characters one-by-one.  Each will
               *  be displayed with NH leading blanks to get it in the
               *  correct column.  When the '#' is found, display the full
               *  horizontal word instead.
               *  When done, go to NEXTV to try another vertical position.
               *
               PRINT   PRINTV LEN(1) . C =                   :F(NEXTV)
                       OUTPUT = DIFFER(C,'#') DUPL(' ',NH) C :S(PRINT)
                       OUTPUT = H                            :(PRINT)
               END
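
            The placeholder technique is easy to follow by hand.  As a
          sketch, here are the NEXTH and NEXTV statements entered in
          CODE.SNO for the words LOOM and HOME, without their GOTOs.
          The OUTPUT statements are added only to display the results:

               ?       V = 'HOME'
               ?       HC = 'LOOM'
               ?       HC @NH ANY(V) . CROSS = '*'
               ?       OUTPUT = NH ' ' CROSS ' ' HC
               1 O L*OM
               ?       VC = V
               ?       VC @NV CROSS = '#'
               ?       OUTPUT = NV ' ' VC
               1 H#ME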



                        6.11 ANCHORED AND UNANCHORED MATCHING

            Most of the examples above let a match begin at any subject
          character, not just the first.  This is the "unanchored" mode of
          pattern matching.  Alternatively, we can "anchor" the pattern
          match by requiring it to begin at the first subject character.
          Setting keyword &ANCHOR to a nonzero value produces anchored
          matching.  Anchored matching is usually faster than unanchored,
          because many futile attempts to match are eliminated.

            Even when the desired item is not at the beginning of the sub-
          ject, it is often possible to simulate anchored matching by pre-
          fixing the pattern with a subpattern which will stream out to the
          desired object.  The stream function spans the gap from the first
          subject character to the desired item.  Use CODE.SNO to experi-
          ment with &ANCHOR:

               ?       DIGITS = '0123456789'
               ?       &ANCHOR = 1
               ?       'THE NEXT 43 DAYS' BREAK(DIGITS) SPAN(DIGITS) . N

            This will assign substring '43' to N, even in anchored mode.
          In unanchored mode, the test lines:

               ?       &ANCHOR = 0
               ?       'THE NEXT 43 DAYS' SPAN(DIGITS) . N

          would ultimately succeed, but only after SPAN failed on each of
          the characters preceding the '4'.  The efficiency difference is
          more pronounced if the subject does not contain any digits.  In
          the first formulation, BREAK(DIGITS) fails and the anchored match
          then fails immediately.  The second construction fails only after
          SPAN is tried at each subject character position.
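
            The effect of the keyword is easy to observe.  In this
          sketch, SPAN on its own cannot match in anchored mode, because
          the first subject character is not a digit.  The same state-
          ment succeeds once &ANCHOR is set back to zero:

               ?       DIGITS = '0123456789'
               ?       &ANCHOR = 1
               ?       'THE NEXT 43 DAYS' SPAN(DIGITS) . N
               Failure
               ?       &ANCHOR = 0
               ?       'THE NEXT 43 DAYS' SPAN(DIGITS) . N
               ?       OUTPUT = N
               43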

            When your program first begins execution, SNOBOL4 sets keyword
          &ANCHOR to zero, the unanchored mode.  If you can construct all
          your patterns as anchored patterns, you should set &ANCHOR
          nonzero for anchored matching.  Setting and resetting &ANCHOR
          throughout your program is error-prone and not advised.  Another
          alternative is to leave &ANCHOR set to 0, but to "pseudo-anchor"
          patterns by using POS(0) as the first pattern element.

            It usually takes less time for a pattern to succeed than to
          fail.  Failure implies an exhaustive search of all combinations,
          whereas success stops the pattern match early.  You should try to
          construct patterns with direct routes to success, such as the use
          of BREAK above.  Wherever possible, impose restrictions on the
          number of alternatives to be tried.  Combinatorial explosion is
          the price of loose pattern programming.



                                                                  Chapter 7


                                        ADDITIONAL OPERATORS AND DATA TYPES
          -----------------------------------------------------------------

            In this chapter we will explore some additional SNOBOL4 opera-
          tors and data types.  Many of these concepts are entirely absent
          from other programming languages.  Far from being esoteric, they
          fit quite naturally into SNOBOL4, and add to its conciseness and
          power of expression.  In the following examples, we will continue
          to use the CODE.SNO program to illustrate each new idea.


                               7.1 INDIRECT REFERENCE

            In conventional programming languages, a variable's name may be
          specified only at the time the program is written.  In fact, once
          the run-time storage has been allocated, the textual form of the
          name can be discarded.  This is not the case in SNOBOL4; you can
          create new variables during execution, and reference existing
          ones from names specified in character strings.

            The unary operator dollar sign ($) is the "indirect reference
          operator."  By applying it to a variable you instruct SNOBOL4 to
          use its contents as the name of another variable, and to continue
          on to reference that variable.  SNOBOL4 "goes through" the oper-
          and to reach the variable.  Try the following simple example:

               ?       DOG = 'BARK'
               ?       CAT = 'MEOW'
               ?       ANIMAL = 'CAT'
               ?       OUTPUT = $ANIMAL
               MEOW
               ?       ANIMAL = 'DOG'
               ?       OUTPUT = $ANIMAL
               BARK

            These statements make their indirect reference through the
          string contained in variable ANIMAL.  ANIMAL's contents are
          treated as a "pointer" to the final destination.  That is, using
          ANIMAL by itself retrieves the string 'DOG', while $ANIMAL refers
          to the variable DOG.

            New variables may also be created by using an indirect refer-
          ence as the object of an assignment.  Here, $DOG causes variable
          BARK to be created, and assigned the string 'RUFF':

               ?       $DOG = 'RUFF'
               ?       OUTPUT = BARK
               RUFF
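
            The operand of $ need not be a simple variable.  Any
          expression that yields a string will do.  As a sketch (the
          names I and ROW3 are made up for the purpose), a variable name
          can be assembled by concatenation before it is used:

               ?       I = 3
               ?       $('ROW' I) = 'THIRD'
               ?       OUTPUT = ROW3
               THIRD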

            Indirect referencing may proceed to any depth, provided the
          null string is never encountered as a variable name:

               ?       OUTPUT = $ANIMAL '-' $$ANIMAL
               BARK-RUFF
               ?       OUTPUT = $RUFF
               Execution error #4, Null string in illegal context

            In the first example, $ANIMAL produces the contents of variable
          DOG, while $$ANIMAL refers to the variable BARK.  The second ex-
          ample attempts to go through RUFF---which was not previously de-
          fined---and obtains the null string.  Of course, the null string
          is not a valid variable name.


          7.1.1 Associative Programming

            Indirect referencing provides a means of "programming by asso-
          ciation."  Suppose we want to write a program that allows the
          user to enter a state name and receive the state's capital in
          response.  We've provided a data file called CAPITAL.DAT, in
          which each line contains a state name, comma, and the capital.
          The first part of the program will read the file and set up an
          associative data base:

               *  Trim input, attach data file to variable INFILE
                       &TRIM = 1
                       INPUT('INFILE', 1, , 'CAPITAL.DAT')       :F(ERR)
               
               *  Read a line from file.  Start querying upon EOF
               READF   LINE = INFILE                             :F(QUERY)
               
               *  Break out state and capital from line
                       LINE BREAK(',') . STATE LEN(1) REM . CAPITAL :F(ERR)
               
               *  Convert state name into a variable, and assign the
               *  capital city string to it.  Then read next line.
                       $STATE = CAPITAL                          :(READF)
               
               ERR     OUTPUT = 'Illegal data file'              :(END)
               QUERY    . . .

            We attach the file, and associate variable INFILE with it.
          Successive file lines are read into variable LINE.  Pattern
          matching assigns the state name and capital city to variables
          STATE and CAPITAL respectively.  We use an indirect reference
          through $STATE to create a new variable with the state's name,
          and assign the capital city to it.  For example, the file line
          'COLORADO,DENVER' creates variable COLORADO, containing 'DENVER'.
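
            A single pass through this loop can be imitated in CODE.SNO.
          In this sketch the file line is typed in by hand rather than
          read from CAPITAL.DAT:

               ?       LINE = 'COLORADO,DENVER'
               ?       LINE BREAK(',') . STATE LEN(1) REM . CAPITAL
               ?       $STATE = CAPITAL
               ?       OUTPUT = COLORADO
               DENVER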

            Having established a data base, completing the program to ac-
          cess it is trivial:

               *  Read state name, access it as a variable
               QUERY   OUTPUT = $INPUT                           :S(QUERY)
               END

            An input line is read from the user, and used for an indirect
          reference.  If the user types a state name, treating it as a
          variable name obtains the state capital.  An invalid state name
          would reference a new variable, whose value is the null string,
          and a blank line would be output.  A more complete program might
          test for this null string and produce an error message.
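
            One way to make that test is sketched below.  It is a hypo-
          thetical replacement for the QUERY statement above, not part
          of the supplied program.  The variable NAME and the wording of
          the message are assumptions.  IDENT skips an empty input line,
          which would otherwise cause a null string error when used with
          $, and DIFFER fails if the variable reached through NAME holds
          the null string:

               *  Read a name; ignore empty lines; report unknown names
               QUERY   NAME   = INPUT                        :F(END)
                       IDENT(NAME)                           :S(QUERY)
                       OUTPUT = DIFFER($NAME) $NAME          :S(QUERY)
                       OUTPUT = 'No entry for ' NAME         :(QUERY)
               END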

            The addition of one statement to the program loop creating the
          data base allows us to enter either the state name or capital
          city, and obtain the other:

                       $STATE = CAPITAL
                       $CAPITAL = STATE                          :(READF)
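
            With both assignments in place, either name leads to the
          other.  A hand-entered sketch of one pair:

               ?       STATE = 'COLORADO'
               ?       CAPITAL = 'DENVER'
               ?       $STATE = CAPITAL
               ?       $CAPITAL = STATE
               ?       OUTPUT = DENVER
               COLORADO
               ?       OUTPUT = COLORADO
               DENVER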

            How would we solve this problem in a language like BASIC?
          States and capitals could be stored in an array.  We would then
          use a loop to sequentially compare the user's input string with
          the array elements.  If a match were found, the result would be
          displayed from another array element.  In SNOBOL4, we did it all
          with one statement: OUTPUT = $INPUT.  Associative programming can
          often replace a conventional linear search.


          7.1.2 Variable Names

            Earlier I said that variable names were composed of letters,
          digits, and the characters period and underscore.  These restric-
          tions apply only to variables which appear in program text.  Var-
          iable names created or referenced with the indirect reference
          operator may be composed of ANY nonnull string of characters, and
          may be as long as any other string.  If we set keyword &DUMP
          nonzero, we would see a list of states and capitals when the
          program terminated.  The variable names created by $STATE are in
          the left column, and their string contents in the right column:

               ALABAMA = 'MONTGOMERY'
               ALASKA = 'JUNEAU'
                     . . .
               NEW HAMPSHIRE = 'CONCORD'
                     . . .

            The dump reveals a variable named NEW HAMPSHIRE, which contains
          a "blank" within its name.  Clearly, you cannot directly say:

                       NEW HAMPSHIRE = 'CONCORD'

          since SNOBOL4 sees this as a pattern match statement: variable
          NEW is the subject, and variable HAMPSHIRE contains the pattern.
          To reference this variable, we must use:

                       $'NEW HAMPSHIRE' = 'CONCORD'

            Try CODE.SNO with some unconventional variable names:

               ?       $'"' = 'DOUBLE QUOTE'
               ?       $'$#@!*' = 53
               ?       OUTPUT = $'$#@!*' $'"'
               53DOUBLE QUOTE


          7.1.3 Indirect GOTOs

            Indirect referencing is not restricted to the main body of a
          statement.  It may be used in the GOTO field to transfer control
          to a label specified by a variable.  Suppose variable OP held the
          one-character string '+', '-', '*', or '/'.  This GOTO would
          transfer to one of four statements, labeled L+, L-, L*, or L/:

                       statement                            :($('L' OP))
               L+      statement
               L-      statement
                        . . .

            The string in OP is appended to string 'L', and the result is
          used with an indirect reference to obtain the final label name.

            Indirect referencing in the GOTO field is a more powerful ver-
          sion of the computed GOTO which appears in some languages.  It
          allows a program to quickly perform a multiway control branch
          based on an item of data.  Of course, the computed label name
          must be defined in the program.  SNOBOL4 provides an error mes-
          sage if your program transfers to an undefined label.
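
            A minimal sketch of such a four-way branch follows.  The
          variables A and B and their values are illustrative only, and
          are not part of the fragment above:

                       A = 12
                       B = 4
                       OP = '*'                             :($('L' OP))
               L+      OUTPUT = A + B                       :(END)
               L-      OUTPUT = A - B                       :(END)
               L*      OUTPUT = A * B                       :(END)
               L/      OUTPUT = A / B                       :(END)
               END

            The GOTO field is evaluated after OP has been assigned, so
          with OP set to '*' control passes to label L* and the program
          displays 48.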

            Indirect referencing may not be used in a statement's label
          field.  Dynamically changing the name of a statement during exe-
          cution is excessive even by SNOBOL4 standards.


                             7.2 UNEVALUATED EXPRESSIONS

            The pattern data type appears when a pattern structure is
          stored in a variable for subsequent use in a pattern match.  For
          example, a pattern to capture the next N characters after a
          colon, and store them in variable ITEM could be written as:

                       NPAT = ':' LEN(N) . ITEM

            Notice that a definition such as this is static.  NPAT captures
          the value of variable N at the time of pattern construction.  If
          we subsequently alter N in the program, NPAT retains N's original
          value.  One way to use the current value of N is to explicitly
          specify the pattern each time it is needed:

                       SUBJECT ':' LEN(N) . ITEM

            Now the pattern is being constructed anew whenever the state-
          ment is executed.  But reconstructing a pattern whenever it is
          used is inefficient, so a one-time definition is preferable.

            The "unevaluated expression" operator allows us to obtain the
          efficiency of the NPAT formulation, yet use the current value of
          N when NPAT is referenced.  It is a unary operator, whose graphic
          symbol is the asterisk (*).  Now we would specify NPAT like this:

                       NPAT = ':' LEN(*N) . ITEM

            The pattern is only constructed once, and assigned to NPAT.
          The current value of N is ignored at this time.  Later, when NPAT
          is used in a pattern match, the unevaluated expression operator
          tells SNOBOL4 to fetch the current value of N.

            The unevaluated expression operator may be used with the argu-
          ment of the pattern functions ANY, BREAK, LEN, NOTANY, POS, RPOS,
          RTAB, SPAN, or TAB.  It may also be applied to an alternate or
          subsequent clause or to an entire pattern.  Here's an example:

               ?       PAT = TAB(*I) . OUTPUT SPAN(*S) . OUTPUT
               ?       SUB = '123AABBCC'
               ?       I = 4
               ?       S = 'AB'
               ?       SUB PAT
               123A
               ABB
               ?       I = 3
               ?       SUB PAT
               123
               AABB

            It's worth noting that I and S were undefined when PAT was
          first constructed.  Later, we will apply this technique to con-
          struct recursive patterns.


                              7.3 IMMEDIATE ASSIGNMENT

            Our examples have made extensive use of the conditional assign-
          ment operator to capture matched substrings after a successful
          pattern match.  The "immediate assignment" operator allows us to
          capture intermediate results during the pattern match.

            Immediate assignment occurs whenever a subpattern matches, even
          if the entire pattern match ultimately fails.  Immediate assign-
          ment is a binary operator whose graphic symbol is the dollar sign
          ($).  Like conditional assignment, the matching substring on its
          left is assigned to the variable on its right.  Here are examples
          with CODE.SNO where we use variable OUTPUT to reveal the work of
          the pattern matcher:

               ?       S = 'ABCDEFG'
               ?       S 'A' ARB $ OUTPUT 'E'
               
               B
               BC
               BCD
               Success
               ?       S ('B' LEN(2) | 'C' LEN(3)) $ OUTPUT 'G'
               BCD
               CDEF
               Success
               ?


          7.3.1 Immediate Assignment and Unevaluated Expressions

            As useful as immediate assignment is for revealing the inner
          workings of a pattern match, a more powerful use is possible.  It
          can be used with the unevaluated expression operator to develop a
          new class of patterns.  An interesting substring at the beginning
          of the subject is immediately assigned to a variable, and the
          variable is then subsequently used in the very same pattern.

            Suppose a number at the beginning of the subject specifies the
          length of a variable width field that follows.  We would like to
          capture the number into variable N, then use it with the LEN
          function to transfer the data into variable FIELD.  When used
          with LEN, N must be preceded by the unevaluated expression opera-
          tor, so that its new value is retrieved.  For instance:

               ?       FPAT = SPAN('0123456789') $ N LEN(*N) . FIELD
               ?       '12ABCDEFGHIJKLMNOPQ' FPAT
               Success
               ?       OUTPUT = FIELD
               ABCDEFGHIJKL

            SPAN matched the field length, 12, and immediately assigned it
          to N.  LEN(*N) then matched the next 12 characters.  Another sub-
          ject, with a different field length, would update N appropri-
          ately.  Type conversion was working quietly behind the scenes
          here:  N was assigned the string '12', yet it appeared as integer
          12 to the LEN function.

            Now here is an example which provides a glimpse of just how
          powerful SNOBOL4's pattern matching can be.  Problem:  Examine a
          subject for an arbitrary three-character substring which is
          immediately followed by a second copy of itself, either bare or
          enclosed in parentheses.  Solution:

               ?       TWOPAT = LEN(3) $ X . OUTPUT *(X | "(" X ")")
               ?       'ABCDECDEFGH' TWOPAT
               CDE
               Success
               ?       'ABCDE(CDE)BA' TWOPAT
               CDE
               Success

            As you experiment with these types of patterns, you may dis-
          cover some which fail when they should succeed.  The problem is
          that SNOBOL4 stops matching when it believes further match at-
          tempts would be futile.  These "heuristics" are normally invisi-
          ble, and speed program execution.  At this time, we'll defer dis-
          cussing heuristics, and simply mention that they can be disabled
          with the statement:

                       &FULLSCAN = 1

            Let's take a break from pattern matching, and examine some
          other SNOBOL4 data types.


                                     7.4 ARRAYS


          7.4.1 Array Concepts

            Arrays in SNOBOL4 are similar to arrays in other programming
          languages.  They allow a single variable name to specify more
          than one data element; integer subscripts distinguish the indi-
          vidual members of an array.  Each array element may contain any
          data type, independent of the types in other array elements.

            A one-dimensional array is a "vector;" it is simply a list of I
          items.  A two-dimensional array is a "grid" composed of several
          adjacent vectors---an I by J array has I rows and J columns.  A
          three-dimensional array, I by J by K in size, is a rectangular
          solid consisting of K adjacent grids.  There's no limit to the
          number of dimensions allowed, but such arrays become increasingly
          difficult to visualize.

            In keeping with SNOBOL4's pliability, an array is defined dur-
          ing program execution, rather than at compilation time.  Its size
          and shape are specified by a string.  The definition of an array
          may be changed at any time, or the array may be deleted and its
          memory reused when it is no longer needed.


          7.4.2 Array Creation

            Arrays are created by the SNOBOL4 function ARRAY.  A program
          calls this function with a "prototype string" which specifies the
          number of dimensions and their sizes.  The function returns an
          "array pointer," which is stored in a variable; the array ele-
          ments are referenced by applying subscripts to this variable.
          Here are two statements for use with CODE.SNO.  They create one-
          and two-dimensional arrays named LIST and BOX respectively:

               ?       LIST = ARRAY('25')
               ?       BOX = ARRAY('12,3')

            LIST points to a vector of 25 elements.  BOX points to a grid,
          12 rows high and 3 columns wide, containing 36 elements.  The
          ARRAY function initializes all array elements to the null string.
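
            Because the prototype is an ordinary string, it can be computed
          while the program runs.  As a small sketch (the variable names N
          and GRID are illustrative only), the prototype can be assembled
          by concatenation just before the array is created, and the same
          variable can later be given a different array:

                       N = 5
                       GRID = ARRAY(N ',' N)
                       GRID = ARRAY('2,3')

            The first call builds a 5 by 5 array from the string '5,5'.
          The second assignment replaces it with a 2 by 3 array; provided
          no other variable still points to the original array, its memory
          becomes available for reuse.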


          7.4.3 Array Referencing

            Array subscripts are integer valued, and are specified by angu-
          lar or square brackets (<> or []).  Subscript values range from 1
          to the size of each dimension.  If you attempt to use a subscript
          outside this range, the array reference will fail, and the fail-
          ure may be detected in the GOTO portion of the statement.  Try
          some array references with CODE.SNO:

               ?       LIST<3> = 'MAPLE'
               ?       BOX[10,2] = 3
               ?       LIST[33] = 4
               Failure
               ?       OUTPUT = LIST[3] LIST[4] BOX<10,2>
               MAPLE3

            Angular and square brackets are interchangeable.  The reference
          to LIST[33] failed because the largest subscript allowed for that
          array is 25.  LIST[4] produced its initialized value, the null
          string, and had no effect on the concatenation.  The array
          pointer in LIST can be assigned to another variable:

               ?       B = LIST
               ?       OUTPUT = B[3]
               MAPLE
               ?       B<3> = 'WILLOW'
               ?       OUTPUT = LIST<3>
               WILLOW

            Assigning the pointer in LIST to B made both variables point to
          the same array.  Since there's but one actual array, array refer-
          ences made using LIST or B are equivalent.  The COPY function
          (Chapter 19) creates a duplicate copy of an entire array.

            Array elements may be used anywhere a variable name is
          allowed---expressions, patterns, function arguments, etc.  The
          fact that an array reference fails if a subscript is out-of-
          bounds can be used in a simple and natural way when scanning an
          array.  Rather than having to know an array's size, we simply
          loop until an array reference fails.  A program segment to dis-
          play the members of an array SCORE might look like this:

                       I = 0
               PRINT   I = I + 1
                       OUTPUT = SCORE[I]                    :S(PRINT)
                        . . .
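
            The same idea extends to more than one dimension.  As a sketch,
          assuming the 12 by 3 array BOX created earlier and the illustra-
          tive variables ROW, COL and LINE, each row can be gathered into
          one output line without knowing either dimension in advance:

                       ROW = 0
               NEXTR   ROW = ROW + 1
                       COL = 1
                       LINE = BOX[ROW,COL]                  :F(DONE)
               NEXTC   COL = COL + 1
                       LINE = LINE ' ' BOX[ROW,COL]         :S(NEXTC)
                       OUTPUT = LINE                        :(NEXTR)
               DONE     . . .

            A failing column reference ends the current row, and a failing
          reference to the first column of a row ends the scan.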


          7.4.4 Array Initialization

            Arrays may be created with an initial value other than the null
          string.  ARRAY accepts a second argument which specifies this
          initial value.  We can create a three-dimensional array with all
          elements initialized to the string 'PA-18' as follows:

               ?       A = ARRAY('2,3,4','PA-18')
               ?       OUTPUT = A[1,2,3]
               PA-18


          7.4.5 Other Array Bounds

            Ordinarily, subscripts range from 1 to the size of each dimen-
          sion.  However, if you find it more convenient, other subscript
          ranges may be used.  The prototype string for ARRAY's first
          argument has the general form:

                       'L1:H1,L2:H2,...,Ln:Hn'

            The L's and H's are integers specifying the lower and upper
          bounds of each dimension.  If the lower bound and colon are
          omitted from any dimension, the integer 1 is assumed.  Here is a
          five element vector, with allowed subscripts -2, -1, 0, 1 and 2:

               ?       A = ARRAY('-2:2','PIPER')
               ?       OUTPUT = A[-1]
               PIPER
               ?       OUTPUT = A[3]
               Failure
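
            When the lower bound is not 1, a loop over the array simply
          starts from that bound.  A brief sketch, with the illustrative
          names SQ and I:

                       SQ = ARRAY('-2:2')
                       I = -3
               FILL    I = I + 1
                       SQ[I] = I * I                        :S(FILL)
                       OUTPUT = SQ[-2] ' ' SQ[0] ' ' SQ[2]

            The assignment fails when I reaches 3, ending the loop, and the
          last statement displays 4 0 4.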

            Arrays are a traditional computer programming concept.  Now
          we'll see how SNOBOL4 takes the idea one important step further,
          with the concept of tables.


                                     7.5 TABLES


          7.5.1 Table Creation and Referencing

            A "table" is similar to a one-dimensional array, with two
          important differences.  First, a table's size is not fixed; it
          extends itself automatically whenever a new element is added to
          it.  Second, table subscripts are not limited to integers, but
          may be any SNOBOL4 data type.  Strings and patterns may be used
          as subscripts.  Tables combine the idea of associative program-
          ming with the data grouping of arrays.

            Tables are created by the SNOBOL4 function TABLE.  No arguments
          are required, since a table's size is not fixed.  The function
          returns a table pointer, which you store in a variable.  Like
          arrays, table elements are referenced by applying subscripts to
          the variable.  Try this example with CODE.SNO:

               ?       T = TABLE()
               ?       T['ROSE'] = 'RED'
               ?       T['N'] = 6
               ?       OUTPUT = T['N'] T['THE'] T['ROSE']
               6RED
               ?       FLOWER = 'ROSE'
               ?       T[FLOWER] = T[FLOWER] ',THORNS'
               ?       OUTPUT = T[FLOWER]
               RED,THORNS

            Here, strings have been used as table subscripts.  The concept
          of an "out-of-bounds" subscript does not exist with tables.  The
          reference to T['THE'] created a new entry, and assigned it the
          null string.  Unlike arrays, no initial value for new entries may
          be specified in the call to TABLE; new table entries are always
          initialized to the null string.


          7.5.2 Conversion between Tables and Arrays

            In the above example, we know what values were used as table
          subscripts.  But if the table were constructed from data in a
          file, how can we determine what items were placed in the table?
          We need to know the subscripts to view the table, but the sub-
          scripts themselves are part of the table.  If this were an array,
          we could run an integer subscript over the array to see the data.
          Applying integer subscripts to a table only creates more entries.

            SNOBOL4 provides a simple solution to this dilemma---a method
          to convert a table to an array.  An N row by 2 column array can
          be created from a table.  The first array column contains the
          subscripts that were used to create the table.  The second column
          contains the data items stored with the corresponding table sub-
          script.  N is the number of table entries with nonnull values.

            Once the table is in array form, integer subscripts can be
          applied to the array to display the subscripts and their values.
          A table is converted to an array with the CONVERT function, which
          accepts a table argument and the word 'ARRAY', and returns a
          pointer to the new array.  Continuing with the previous example:

               ?       A = CONVERT(T, 'ARRAY')
               Success
               ?       OUTPUT = A[1,1] '-' A[1,2]
               ROSE-RED,THORNS
               ?       OUTPUT = A[2,1] '-' A[2,2]
               N-6

            As you would expect with SNOBOL4, the inverse operation---con-
          version of an array to a table---is also possible.  The array
          must be rectangular, N rows by 2 columns.  The array entries in
          the first column become the table subscripts.  The array's second
          column becomes the table entry values:

               ?       W = CONVERT(A, 'TABLE')
               Success
               ?       OUTPUT = W['ROSE']
               RED,THORNS


          7.5.3 Counting Word Usage with a Table

            Tables are useful when we want to record a number of pair asso-
          ciations, where each half of the pair might have any data type.
          A classic example of a table's utility is a word usage program.
          Earlier, we developed a program to count the total number of
          words in a file.  We will modify that program to count the number
          of times each unique word appears.  The program begins like this:

               *       Simple word usage program, WORDU.SNO.
               *
               *  A word is defined to be a contiguous run of letters,
               *  digits, apostrophe and hyphen.  This definition of legal
               *  letters in a word can be altered for specialized text.
               *
               *  If the file to be counted is TEXT.IN, run as follows:
               *       B>SNOBOL4 WORDU /I=TEXT
               *
                       &TRIM  = 1
               
               *  Define the characters which comprise a 'word'
                       WORD   = "'-"  '0123456789' &LCASE
               
               *  Pattern to isolate each word and assign it to ITEM:
                       WPAT   = BREAK(WORD) SPAN(WORD) . ITEM
               
               *  Create a table to maintain the word counts
                       WCOUNT = TABLE()
               
               *  Read a line of input and obtain the next word
               NEXTL   LINE   = REPLACE(INPUT, &UCASE, &LCASE)   :F(DONE)
               NEXTW   LINE WPAT =                               :F(NEXTL)
               
               *  Use word as subscript, update its usage count
                       WCOUNT<ITEM> = WCOUNT<ITEM> + 1           :(NEXTW)
               DONE     . . .

            We'll convert the input to lower case, so words like 'The' and
          'the' are counted together.  WPAT has been changed to store each
          word in variable ITEM.  When a word is identified, it is used as
          a subscript for table WCOUNT.  When ITEM contains a new word, the
          first reference to WCOUNT<ITEM> creates a new table entry and
          returns the null string.  Integer 1 is added to the null string,
          and the result, 1, is stored back into WCOUNT<ITEM>.  If the same
          word is encountered again, WCOUNT<ITEM> for that word will be
          incremented to 2.

            The program reads the input file, building a table with entries
          for each unique word.  When End-of-File is read, control trans-
          fers to label DONE, where we display the words and their respec-
          tive counts.  We convert WCOUNT to an array, and use integer
          subscripts to retrieve the words and their counts.  Conversion
          fails if the table is empty.  Continuing with this program:

               *  Convert table to array.  Fail if table is empty
               DONE    A = CONVERT(WCOUNT, 'ARRAY')              :F(EMPTY)
               
               *  Scan array, printing words and counts
                       I = 0
               PRINT   I = I + 1
                       OUTPUT = A<I,1> '--' A<I,2>     :S(PRINT) F(END)
               
               EMPTY   OUTPUT = 'No words'
               END

            The table subscripts were the file's words, and have been
          placed in the first column of the array, A<I,1>.  The count for
          each word was the table entry, now in the second column, A<I,2>.
          Tables are very convenient for recording information about data
          items, while conversion to an array makes it easy to systemati-
          cally examine the recorded information.


                                7.6 THE NAME OPERATOR

            The unary name operator provides the address or location in
          memory where a variable is stored.  Its graphic symbol is the
          period (.).  We'll introduce it here through an example.

            Consider the indirect reference operator mentioned earlier.
          Suppose we want to use a variable to point to different elements
          of an array or table.  If we try the following, we immediately
          discover a problem:

               ?       A = ARRAY('10,10')
               ?       A[4,2] = 'DOG'
               ?       V = 'A[4,2]'
               ?       OUTPUT = $V
               
               Success

            The indirect reference operator treats the string 'A[4,2]' as a
          variable name, rather than an array element.  Remember, any char-
          acter sequence can be used indirectly to create a variable.
          SNOBOL4 creates a variable called A[4,2] that has absolutely no
          connection with array A.  The fact that this character sequence
          happens to look like an array reference to us is purely coinci-
          dental from SNOBOL4's point of view.

            To make this work, the name operator is applied to A[4,2] to
          obtain the address of that array element.  The address can be
          stored in variable V, and referenced with the indirect operator:

               ?       V = .A[4,2]
               ?       OUTPUT = $V
               DOG
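
            The same technique works for table elements.  A short sketch,
          using the illustrative names T and P:

                       T = TABLE()
                       P = .T['ROSE']
                       $P = 'RED'
                       OUTPUT = T['ROSE']

            Assigning through $P stores 'RED' in the table entry whose
          subscript is 'ROSE', so the last statement displays RED.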

            The name operator provides a general method for specifying the
          name of an object.  Both of these statements are correct for
          specifying the first argument to the INPUT function:

                       INPUT('INFILE', 1, , 'CAPITAL.DAT')
                       INPUT(.INFILE,  1, , 'CAPITAL.DAT')

            Either form, 'INFILE' or .INFILE, tells the INPUT function the
          name of the variable to be input associated.  However, using the
          name operator allows us to associate a file with an array or
          table element:

                       INPUT('A[4,2]', 1, , 'CAPITAL.DAT')  (incorrect)
                       INPUT(.A[4,2],  1, , 'CAPITAL.DAT')

            Note that alternating applications of the indirect reference
          and name operators "cancel" one another, so

               ?       OUTPUT = $(.($(.A[4,2])))
               DOG

          is simply a reference to A[4,2].


                                                                  Chapter 8


                                                    PROGRAM-DEFINED OBJECTS
          -----------------------------------------------------------------

            SNOBOL4 is a very large and rich language, providing a diverse
          assortment of built-in features.  It is also an extensible lan-
          guage; it allows you to define new data types, functions, and
          operators.  You can, by creating your own entities, obtain
          another level of conciseness and power of expression.

            We will begin with program-defined functions because they allow
          a program to be partitioned into smaller, more manageable seg-
          ments.  As functions tend to be just a few lines long, transfers
          of control within them are usually obvious and manageable.  If
          your main program has complex, intertwined GOTOs, consider how
          the use of functions would clarify things.

            Functions also allow us to postpone the complete development of
          an algorithm.  We can design the overall program structure, using
          function names for components which will be developed later.
          Furthermore, if a particular function proves inefficient, it can
          be replaced later with an improved version.


                            8.1 PROGRAM-DEFINED FUNCTIONS

            The concept of a function should be clear from all the examples
          of SNOBOL4's built-in functions.  A function accepts some number
          of arguments, performs a computation based on their values, and
          returns a result and a success signal.  A function can also sig-
          nal failure, and not return any value.


          8.1.1 Function Definition

            We can define a new function by specifying its name and argu-
          ments.  The definition will be composed of "dummy arguments"---
          place holders that show how the arguments are to be used in the
          function.  Later, when the function is called, the actual argu-
          ments will replace the dummy arguments in the computation.

            We define a new function in SNOBOL4 by using the built-in func-
          tion DEFINE.  We call it with a "prototype string" containing the
          new function's name and arguments.  DEFINE makes the new func-
          tion's name known to SNOBOL4, so it can be used subsequently.

            Suppose we want to create a new function called SHIFT, which
          would circularly rotate a string through a specified number of
          character positions.  We'll define all rotations as being to the
          left---characters removed from the front of the string are placed
          back on the end.  For example, SHIFT('ENGRAVING',3) would return
          the string 'RAVINGENG'.



       Tutorial                        - 63 -         Program-Defined Objects





            We will begin by defining the function name and its dummy argu-
          ments, S and N.  Any names of your choosing can be used for dummy
          arguments.  In a program, it would look like this:

                       DEFINE('SHIFT(S,N)')

            It is important to realize that the DEFINE function must be
          executed for the definition to occur.  Most other programming
          languages process function definitions when a program is com-
          piled.  SNOBOL4's system is more flexible; the prototype string
          can itself be the result of other run-time computations.  In an
          extreme case, data input to a program could determine the names
          and kinds of functions to be defined.
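
            As a small illustration of a computed prototype (the variable
          FNAME is hypothetical), the function name could be held in a
          variable and concatenated with the argument list:

                       FNAME = 'SHIFT'
                       DEFINE(FNAME '(S,N)')

            This has the same effect as DEFINE('SHIFT(S,N)').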


          8.1.2 The Function Body

            Having declared the function name and dummy arguments, we need
          to provide the statements which will implement the function.  A
          very simple convention applies:

               When the function is used, SNOBOL4 transfers control to
               a statement label with the same name as the function.

            In this case, the first statement of the function would be
          labeled SHIFT.  There is no limit to the number of statements
          comprising the function body.


          8.1.3 Returning Function Results

            Two conventions govern the return from a function.  First, a
          function may return a value by assigning it to a variable with
          the same name as the function.  If no assignment occurs, the
          result is the null string.

            Second, the function must tell SNOBOL4 that it is finished, and
          that control should return back to the caller.  It does this by
          transferring to the special label RETURN.

            The label RETURN should not be used to label any statement in
          your program.  It is a special name, reserved by SNOBOL4 for just
          this purpose.

            With this information, we can now write our SHIFT function.  We
          will remove the first N characters from the beginning of the ar-
          gument string, and place them on the end.  The function body
          looks like this:

               SHIFT   S LEN(N) . FRONT REM . REST
                       SHIFT = REST FRONT                   :(RETURN)

            Each time SHIFT is called, the particular arguments used are
          placed in S and N.  The first statement splits S into two parts,
          assigning them to variables FRONT and REST.  The second statement
          reassembles them in the shifted order, and assigns them to vari-
          able SHIFT, to be returned as the function result.  The GOTO then
          transfers to label RETURN to return back to the caller.
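
            Putting the pieces together, here is a sketch of a complete
          program.  The label MAIN and the unconditional GOTO on the DEFINE
          statement are one possible arrangement, assumed here so that
          execution does not run into the function body when the program
          starts:

                       DEFINE('SHIFT(S,N)')                 :(MAIN)
               SHIFT   S LEN(N) . FRONT REM . REST
                       SHIFT = REST FRONT                   :(RETURN)
               MAIN    OUTPUT = SHIFT('ENGRAVING',3)
               END

            The call displays RAVINGENG, the result described earlier.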


          8.1.4 Function Failure

            What happens if we try the function call SHIFT('PEAR',7)?  As
          the function is defined above, the pattern match would fail,
          since LEN(7) is longer than the subject string.  The assignment
          to FRONT and REST would not take place, and the function would
          return an erroneous result built from whatever values FRONT and
          REST already held.

            Now we could extend the definition of SHIFT to cycle the
          argument string multiple times.  In general, though, we want to
          develop a convenient method that allows a function to signal an
          exceptional condition back to the caller.  Function failure
          allows us to do just that.  Another convention is provided:

               Transferring to the special label FRETURN returns from
               a function signaling failure to the caller.  No value
               is returned as the function result.

            We can now rework the function body to signal failure when N is
          too large.  In this case, the pattern match fails, and we detect
          the failure in the GOTO field:

               SHIFT   S LEN(N) . FRONT REM . REST          :F(FRETURN)
                       SHIFT = REST FRONT                   :(RETURN)

            In general, the transfer to FRETURN does not need to be the
          result of the failure of a particular statement.  Any success or
          failure could be tested to produce a transfer to FRETURN.  For
          example, if we decided to explicitly test the length of S, the
          function could begin with:

               SHIFT   GT(N, SIZE(S))                       :S(FRETURN)
                        . . .
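
            As a sketch of how a caller might react to the failure
          signal (the label SHIFTED and the message text are invented
          for illustration), the success or failure of the call can be
          tested in the GOTO field like that of any other statement:

               *  Assumes SHIFT was defined with DEFINE('SHIFT(S,N)')
               *  and uses the FRETURN version of the function body.
                       OUTPUT = SHIFT('PEAR', 7)            :S(SHIFTED)
                       OUTPUT = 'N is too large for this string'
               SHIFTED

            Because SHIFT('PEAR',7) fails, the assignment to OUTPUT is
          not performed, and the statement that follows reports the
          problem instead.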


          8.1.5 Local Variables

            FRONT and REST were used in this function as temporary vari-
          ables to rearrange the argument string.  If they had appeared
          elsewhere in your program, their old values would be destroyed.
          Such inadvertent conflicts become harder to avoid as your func-
          tion library grows.  The prototype string used with DEFINE can
          specify "local variables" to be protected when the function is
          called.  For our SHIFT function, the call would look like this:

                       DEFINE('SHIFT(S,N)FRONT,REST')

            The local variables appear after the argument list.  When SHIFT
          is called, any existing values for FRONT and REST will be saved
          on a pushdown stack.  FRONT and REST are set to the null string,
          and control is transferred to the first statement of the function
          body.  When the function returns, FRONT and REST are restored to
          their previous values.

            Since the same potential problem exists for dummy arguments S
          and N, SNOBOL4 automatically saves their values before assigning
          the actual arguments to them.  And just like local variables,
          when the function returns, the dummy arguments are restored to
          their original values.
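
            The protection can be observed with a short test program.
          This is only a sketch: the values 'DOOR' and 99 are arbitrary,
          and the GOTO to SHIFT_END simply keeps execution from falling
          into the function body after the DEFINE:

                       DEFINE('SHIFT(S,N)FRONT,REST')       :(SHIFT_END)
               SHIFT   S LEN(N) . FRONT REM . REST          :F(FRETURN)
                       SHIFT = REST FRONT                   :(RETURN)
               SHIFT_END
               *  Give a local variable and a dummy argument values
               *  of our own before calling the function.
                       FRONT = 'DOOR'
                       N = 99
                       OUTPUT = SHIFT('COTTON', 4)
                       OUTPUT = FRONT
                       OUTPUT = N
               END

            The program prints ONCOTT, DOOR, and 99.  Inside SHIFT,
          FRONT held 'COTT' and N held 4, but both were restored when the
          function returned.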


          8.1.6 Using Functions

            Once a function has been defined, it may be used in exactly the
          same manner as a built-in function.  It may appear in a statement
          anywhere its value is needed---in the subject, pattern, or
          replacement fields.  If used with the indirect reference opera-
          tion, functions may even be used in the GOTO field.  Of course, a
          function may be used as the argument of another function.

            The value returned by a function is not restricted to strings.
          Any SNOBOL4 data type, including patterns, may be returned.  Ear-
          lier, in the pattern match chapter, we showed how simple patterns
          could be tailored to our needs by using them in more complicated
          clauses.  The specific example was a variation of the BREAK pat-
          tern which would not match the null string.  Let's use a program-
          defined function to create a new function, BREAK1, with this
          property.  The definition statement might look like this:

                       DEFINE('BREAK1(S)')

          and the function body, like this:

               BREAK1  BREAK1 = NOTANY(S) BREAK(S)          :(RETURN)

            This function can now be used directly in a pattern match.  For
          example, BREAK1('abc') constructs a pattern which matches a non-
          null string, up to the occurrence of the letters 'a', 'b', or
          'c'.  Of course, the pattern returned by a function can be as
          complex as desired, giving us an elegant method to define our own
          pattern matching primitives.
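
            A brief sketch of BREAK1 in use follows.  The subject
          strings are arbitrary, and the labels BREAK1_END and NOMATCH
          are our own additions:

                       DEFINE('BREAK1(S)')                  :(BREAK1_END)
               BREAK1  BREAK1 = NOTANY(S) BREAK(S)          :(RETURN)
               BREAK1_END
               *  Matches 'xyz', the non-null run before the first 'a'.
                       'xyzabc' POS(0) BREAK1('abc') . OUTPUT
               *  Fails: the subject begins with 'a', and BREAK1 will
               *  not match the null string.  BREAK('abc') would succeed.
                       'abc' POS(0) BREAK1('abc')           :F(NOMATCH)
                       OUTPUT = 'Matched'                   :(END)
               NOMATCH OUTPUT = 'No match'
               END

            The program prints xyz, followed by No match.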


          8.1.7 Organizing Functions

            SNOBOL4 does not know or care which statements belong to a
          particular function.  There is no explicit END statement for
          individual functions.  To keep programs readable, we'll have to
          impose some discipline of our own.  Also, having to execute the
          DEFINE function is a mixed blessing.  It offers tremendous flexi-
          bility, but requires us to place all our DEFINE's at the begin-
          ning of a program.  Here is the system proposed by Gimpel, which
          I like to use to manage functions and their definitions:

            We keep the function definition, any one-time initialization,
          and the function body together as a unit.  A GOTO transfers con-
          trol around the function body after the definition and initial-
          ization statements are executed.  Also present are comments de-
          scribing its use and any exceptional conditions.  Rewriting the
          SHIFT function in this form, and taking this opportunity to avoid
          rebuilding the pattern each time the function is called, it looks
          like this:

               * SHIFT(S,N)  -  Shift string S left N character positions.
               *  As characters are removed from the left side of the
               *  string, they are placed on the end.
               *
               *  The function fails if N is larger than the size of S.
               
                       DEFINE('SHIFT(S,N)FRONT,REST')
                       SHIFT_PAT = LEN(*N) . FRONT REM . REST :(SHIFT_END)
               
               SHIFT   S SHIFT_PAT                          :F(FRETURN)
                       SHIFT = REST FRONT                   :(RETURN)
               SHIFT_END

            Now this group of lines can be incorporated as a unit into the
          beginning of any program that wants to use it.  When execution
          begins, the first statement defines the SHIFT function.  Next we
          define a pattern, called SHIFT_PAT, for use when the function is
          called.  The pattern definition is only executed once, so we use
          the unevaluated expression operator (*N) to obtain the current
          value of N on each function call.  After defining the pattern, we
          "jump around" the function body, to label SHIFT_END.  (Remember,
          we are defining the function now, not executing it; falling into
          the function body would be an error.)  The function is now de-
          fined, and ready to be used.

            In general, functions should be prepared in this form:

               * Fname  - Description of use
               
                       DEFINE('Fname(arg1,...,argn)local1,...,localn')
                        . . .
               *       Any one-time initialization for Fname
                        . . .                               :(Fname_END)
               
               Fname   Function body
                        . . .
               Fname_END

            If you place your functions in individual disk files, they can
          be included in new programs as necessary.  By preparing functions
          in this form, they will all be defined and initialized when exe-
          cution begins.

            When discussing pattern matching, we used a pattern to convert
          a character to its ASCII decimal value.  In BASIC, two functions
          are provided for similar operations: ASC and CHR$.  We can create
          SNOBOL4 equivalents like this:

               * ASC(S) - Return the ASCII code for the first character of
               *          string S.
               *
               *       The value returned is an integer between 0 and 255.
               *       The function fails if S is null.
               
                       DEFINE('ASC(S)C')
                       ASC_ONE = LEN(1) . C
                       ASC_PAT = BREAK(*C) @ASC             :(ASC_END)
               
               ASC     S ASC_ONE                            :F(FRETURN)
                       &ALPHABET ASC_PAT                    :(RETURN)
               ASC_END

               * CHR(N) - Converts an integer ASCII code to a one character
               *          string.
               *
               *       The argument N is an integer between 0 and 255.
               *       The function fails if N is greater than 255.
               
                       DEFINE('CHR(N)')
                       CHR_PAT = TAB(*N) LEN(1) . CHR       :(CHR_END)
               
               CHR     &ALPHABET CHR_PAT              :S(RETURN) F(FRETURN)
               CHR_END

            Note that both functions were written to work correctly regard-
          less of the anchoring mode in use by the calling program.

            (The CHR function is shown here as an example only.  Vanilla
          SNOBOL4 provides a built-in function, CHAR(N), for this purpose.
          See Chapter 19, "Built-in Functions.")
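
            Assuming both definitions above have been executed, a
          quick check of the two functions might look like this (the
          test values are arbitrary):

               *  Prints 65, the ASCII code for 'A'.
                       OUTPUT = ASC('A')
               *  Prints B, the character whose code is one greater.
                       OUTPUT = CHR(ASC('A') + 1)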


          8.1.8 Call by Value and Call by Name

            Function calls in SNOBOL4 transmit the "value" of the argument
          to the function.  Variables used in the function call cannot be
          harmed by the function.  This type of function usage is referred
          to as "call by value."  Occasionally, we might want the function
          to access the argument variables themselves.  The name operator
          introduced in the previous chapter provides this ability.  The
          function call still transmits a value, but the value used is the
          "name" of a variable.

            Consider a function called SWAP, which will exchange the con-
          tents of two variables.  If we wanted to exchange the contents of
          variables COUNT and OLDCOUNT, we would say SWAP(.COUNT,
          .OLDCOUNT).  The function looks like this:

               * SWAP(.V1, .V2) - Exchange the contents of two variables.
               *  The variables must be prefixed with the name operator
               *  when the function is called.
               
                       DEFINE('SWAP(X,Y)TEMP')              :(SWAP_END)
               
               SWAP    TEMP = $X
                       $X = $Y
                       $Y = TEMP                            :(RETURN)
               SWAP_END

            The name operator allows us to access the argument variables.
          If we had not used it, the function would be called with the
          variables' values, with no indication of where they came from.
          Calls to SWAP are not limited to simple variable arguments.  Any-
          thing capable of receiving the name operator, such as array and
          table elements, could be used:  SWAP(.A<4,3>, .T<'YOU'>).

            There are certain situations where call by name occurs implic-
          itly.  If the argument is an array or table name, or a program-
          defined data type (discussed below), it points to the actual data
          object, which can then be modified by the function.  For example,
          if FILL were a function which loads an array with values read
          from a file, the statements

                       A = ARRAY(25)
                       FILL(A)

          would cause array A to be altered.
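
            FILL itself is not shown here.  As a sketch, the following
          hypothetical version stores the squares 1, 4, 9, ... instead
          of reading a file, so that it can be run as is.  It relies on
          the fact that an array reference fails when the subscript is
          out of bounds:

               * FILL(ARR) - Store I * I in element I of array ARR.

                       DEFINE('FILL(ARR)I')                 :(FILL_END)

               FILL    I = I + 1
                       ARR<I> = I * I               :S(FILL) F(RETURN)
               FILL_END

                       A = ARRAY(5)
                       FILL(A)
                       OUTPUT = A<3>
               END

            The program prints 9.  FILL was given only the value of
          A, yet that value is a pointer to the array itself, so the
          elements stored by the function are visible to the caller.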


          8.1.9 Functions and CODE.SNO

            The CODE.SNO program was provided to allow interactive experi-
          ments with SNOBOL4 statements.  If you create functions using the
          preceding format, they also can be tested using CODE.SNO.

            Use your text editor to create a disk file containing the SHIFT
          function.  (Be certain to include the GOTO that transfers around
          the function body.)  Call the file SHIFT.SNO.  Now, start the
          CODE.SNO program, and type the following:

               ?       SLOAD('SHIFT.SNO')
               Success
               ?       OUTPUT = SHIFT('COTTON',4)
               ONCOTT
               ?       OUTPUT = SHIFT('OAK',4)
               Failure


          8.1.10 Recursive Functions

            The statements that comprise a function are free to call any
          functions they choose, including the function they are defining.
          Of course, for this to make sense, they must call themselves with
          a simplified version of the original problem, or an endless loop
          would result.  Eventually, the function calls itself with an arg-
          ument so simple that it can return an answer without any further
          recursive calls.  It's like winding a clock spring up.  The
          central, non-recursive answer to the innermost call provides an
          answer to the next turn out, with the recursive calls unwinding
          until the original problem can be solved.

            There is no explicit declaration for recursion; any function
          can be used recursively if it is designed properly.  However, all
          local variables should be declared in the DEFINE function so they
          will be saved and restored during recursive calls.

            Sometimes, recursion can produce dramatically smaller programs.
          "Algorithms in SNOBOL4" provides an example with the recursive
          function, ROMAN.  It converts an integer in the range 0 to 3999
          to its Roman numeral equivalent.  Two premises are required:

            1. We know the Roman numerals for the numbers 0 to 9 (null, I,
               II, ..., IX), and can perform this conversion with a simple
               pattern match.

            2. We can use the REPLACE function to "multiply" a number in
               Roman form by 10 by replacing I by X, V by L, X by C, etc.

            The function uses these two rules to produce a recursive solu-
          tion for some integer N.  The algorithm looks like this:

               The rightmost digit is removed from the argument and
               converted by premise 1.  Removing the digit effectively
               divides the argument by 10, simplifying the problem.

               The reduced argument is then converted by calling ROMAN
               recursively and "multiplying" the result by 10 accord-
               ing to premise 2.

               The previously converted units digit is appended to
               the result.

          Here's the function (note that a "plus sign" in column one allows
          a statement to be continued over several lines):

               * ROMAN(N) - Convert integer N to Roman numeral form.
               *
               *  N must be positive and less than 4000.
               *
               *  An asterisk appears in the result if N >= 4000.
               *
               *  The function fails if N is not an integer.
               
                       DEFINE('ROMAN(N)UNITS')              :(ROMAN_END)
               
               *  Get rightmost digit to UNITS and remove it from N.
               *  Return null result if argument is null.
               ROMAN   N RPOS(1) LEN(1) . UNITS =           :F(RETURN)
               
               *  Search for digit, replace with its Roman form.
               *  Return failing if not a digit.
                       '0,1I,2II,3III,4IV,5V,6VI,7VII,8VIII,9IX,'  UNITS
               +         BREAK(',') . UNITS                 :F(FRETURN)
               
               *  Convert rest of N and multiply by 10.  Propagate a
               *  failure return from recursive call back to caller.
                       ROMAN = REPLACE(ROMAN(N), 'IVXLCDM', 'XLCDM**')
               +               UNITS            :S(RETURN) F(FRETURN)
               ROMAN_END

            The first call to ROMAN may have an integer argument.  The
          statement labeled ROMAN causes N to be converted to a string, and
          subsequent recursive calls use a string argument.  The recursive
          calls cease when reducing N finally produces a null string
          argument---the match at statement ROMAN fails, and the function
          returns immediately with a null result.
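
            To follow the recursion, it may help to trace a call by
          hand.  For the arbitrarily chosen call ROMAN(1987), the
          successive calls and the values they return are:

               Call            UNITS     REPLACE result    Returns
               ROMAN(1987)     VII       MCMLXXX           MCMLXXXVII
               ROMAN('198')    VIII      CXC               CXCVIII
               ROMAN('19')     IX        X                 XIX
               ROMAN('1')      I         null              I
               ROMAN('')       (match fails)               null

            Reading from the bottom up, each level "multiplies" the
          result below it by 10 and appends its own digit.  In
          CODE.SNO, with the function defined:

               ?       OUTPUT = ROMAN(1987)
               MCMLXXXVII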


                           8.2 PROGRAM-DEFINED DATA TYPES

            With the exception of arrays and tables, a variable may have
          only one item of data in it at a time.  In many applications, it
          is convenient if several data items can be associated with a
          variable.  For example, if we wanted to work with complex num-
          bers, a variable should contain two numbers---the real and imagi-
          nary parts.  In an inventory system, an individual product might
          require values such as name, price, quantity, and manufacturer.

            Program-defined data types enlarge SNOBOL4's repertoire to
          include new objects such as COMPLEX or PRODUCT.  SNOBOL4 only
          provides a system for managing these new types; defining a data
          type does not magically invest SNOBOL4 with a knowledge of com-
          plex arithmetic or inventory accounting.  It is still up to you
          to provide the computational support for each new type.


          8.2.1 Data Type Definition

            A program-defined data type will consist of a number of fields,
          each containing an individual data element.  We begin by select-
          ing names for the data type and fields.  An inventory system
          might use the data type name PRODUCT, and field names NAME,
          PRICE, QUANTITY, and MFG.

            A data type is defined by providing a prototype string to the
          built-in DATA function.  The prototype assumes a form similar to
          a function call, with the data type taking the place of the func-
          tion name, and the field names replacing the arguments.  The form
          of the prototype string is:

                       'TYPENAME(FIELD1,FIELD2,...,FIELDn)'

            Blanks are not permitted within a prototype.  Try creating a
          new data type using the CODE.SNO program:

               ?       DATA('PRODUCT(NAME,PRICE,QUANTITY,MFG)')
               Success

            The DATA function tells SNOBOL4 to define an object creation
          function with the new data type's name:

                       PRODUCT(arg1, arg2, arg3, arg4)

            This new function can be called whenever we wish to create a
          new object with the PRODUCT data type.  Its arguments are the
          initial values to be given to the four fields which comprise a
          PRODUCT.  The function returns a pointer to the new object, which
          can be stored in a variable, array, or table.  Try creating two
          new objects as follows:

               ?      ITEM1 = PRODUCT('CAPERS', 2, 48, 'BRINE BROTHERS')
               ?      ITEM2 = PRODUCT('PICKLES', 1, 72, 'PETER PIPER INC.')


          8.2.2 Data Type Use

            The defining call to the DATA function also created several
          field reference functions.  In this case, their names would be:

               NAME(arg)    PRICE(arg)    QUANTITY(arg)    MFG(arg)

            The argument used with each function is an object created by
          the PRODUCT function.  Try accessing ITEM1's fields:

               ?       OUTPUT = MFG(ITEM1)
               BRINE BROTHERS
               ?       OUTPUT = PRICE(ITEM1) * QUANTITY(ITEM1)
               96

            We can alter the value of a field after an object is created.
          Field reference functions can also be used as the object of an
          assignment, so:

               ?       QUANTITY(ITEM2) = QUANTITY(ITEM2) - 12

          changes the QUANTITY field of ITEM2 from 72 to 60.


          8.2.3 Copying Data Items

            It is important to recognize that variables like ITEM1 and
          ITEM2 contain "pointers" to the data.  Assigning ITEM1 to another
          variable, say LASTITEM, merely copies the pointer; both variables
          still point to the same physical packet of data in memory.
          Altering the QUANTITY field of ITEM1 would alter the QUANTITY
          field of LASTITEM.  This is the same behavior observed earlier
          for array and table names.
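
            The sharing is easy to demonstrate with CODE.SNO.  The
          names ITEM3 and ALIAS, and the product itself, are invented
          for this sketch so that ITEM1 and ITEM2 are left untouched:

               ?       ITEM3 = PRODUCT('OLIVES', 3, 10, 'BRINE BROTHERS')
               ?       ALIAS = ITEM3
               ?       QUANTITY(ITEM3) = 9
               ?       OUTPUT = QUANTITY(ALIAS)
               9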

            The built-in COPY function creates a unique copy of an
          object---one which is independent of the original.  Try using it
          with CODE.SNO:

               ?       LASTITEM = COPY(ITEM1)
               ?       QUANTITY(ITEM1) = 24
               ?       OUTPUT = QUANTITY(LASTITEM)
               48


          8.2.4 Creating Structures

            Our inventory example used string and integer values as the
          field contents.  In fact, any SNOBOL4 data type may be stored in
          a field, including pointers to other program-defined types.  Com-
          plex structures, such as queues, stacks, trees, and arbitrary
          graphs may be created.

            For example, if we wanted to link together all products made by
          the same manufacturer, PRODUCT could be defined with an addi-
          tional field.  We won't go through the exercise with CODE.SNO,
          but will sketch out the changes:

               DATA('PRODUCT(NAME,PRICE,QUANTITY,MFG,MFGLINK)')

            As each product is defined, we will determine if we have
          another product from the same manufacturer.  If so, MFGLINK is
          set to point to that other product.  If not, it is set to the
          null string.  A table M provides a convenient way to keep track
          of manufacturers.  Assume variable COMPANY contains the manufac-
          turer's name as each product is defined.  Then all of the requi-
          site searching and linking can be accomplished in one statement:

               M<COMPANY> = PRODUCT(..., ..., ..., COMPANY, M<COMPANY>)

            If this is the company's first appearance, it is not in the
          table, and the last argument to the PRODUCT function sets MFGLINK
          to the null string.  The assignment statement uses the company as
          the table subscript, and the entry points to the current product.
          If another product definition uses the same company, MFGLINK will
          point to the previous product, and the table will be updated to
          point to the current product.  In this manner, all products from
          a manufacturer will be threaded together.  Each thread starts
          with a table entry, and goes through each product's MFGLINK
          field, ending with a null string in the last product's MFGLINK.

            Now if we wanted to display all products supplied by a particu-
          lar manufacturer, we select and follow the appropriate thread:

                       X      =  M<COMPANY>
               LOOP    OUTPUT =  DIFFER(X) NAME(X)          :F(DONE)
                       X      =  MFGLINK(X)                 :(LOOP)
               DONE
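
            Putting the pieces together, the following sketch builds
          one thread and then follows it.  The product OLIVES and its
          values are invented for illustration:

                       DATA('PRODUCT(NAME,PRICE,QUANTITY,MFG,MFGLINK)')
                       M = TABLE()
                       COMPANY = 'BRINE BROTHERS'
               *  First product: M<COMPANY> is null, so MFGLINK is null.
                       M<COMPANY> = PRODUCT('CAPERS', 2, 48, COMPANY,
               +                    M<COMPANY>)
               *  Second product: MFGLINK points to the CAPERS object.
                       M<COMPANY> = PRODUCT('OLIVES', 3, 10, COMPANY,
               +                    M<COMPANY>)

                       X      =  M<COMPANY>
               LOOP    OUTPUT =  DIFFER(X) NAME(X)          :F(DONE)
                       X      =  MFGLINK(X)                 :(LOOP)
               DONE
               END

            The program prints OLIVES and then CAPERS, since the
          table entry always points to the most recently defined
          product.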


          8.2.5 The DATATYPE Function

            The DATATYPE function allows you to learn the type of data in a
          particular variable.  It is useful when the kind of processing to
          be performed depends on the data type.  The formal data type name
          is returned as an upper-case string:

               ?       OUTPUT = DATATYPE(54)
               INTEGER
               ?       OUTPUT = DATATYPE(ITEM1)
               PRODUCT
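
            A few more cases, as a sketch with arbitrarily chosen
          arguments.  Note that quoted digits are a string, not an
          integer:

               ?       OUTPUT = DATATYPE('54')
               STRING
               ?       OUTPUT = DATATYPE(LEN(3))
               PATTERN
               ?       OUTPUT = DATATYPE(ARRAY(3))
               ARRAY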


                            8.3 PROGRAM-DEFINED OPERATORS

            If you can define new functions and data types, why not new
          operators too?  Indeed, SNOBOL4 provides this feature, although
          most programs can be written without it.  For the sake of com-
          pleteness, we'll provide a brief discussion.


          8.3.1 Operators and Functions

            Unary or binary operators can be thought of as functions of one
          or two arguments.  For example, A + B can be written in func-
          tional form as PLUS(A,B), where PLUS is some function which im-
          plements addition.  Operators can be redefined by specifying a
          function to replace them.  We still write our program in terms of
          the operator's graphic symbol, but SNOBOL4 will use the specified
          function whenever the operator must be performed.

            The built-in function OPSYN creates synonyms and new defini-
          tions for operators.  Synonyms permit different names or symbols
          to be used in place of a function or operator.  The general form
          of OPSYN is:

                       OPSYN(new name, old name, i)

            The new name is defined as a synonym of the old name.  The
          third argument is 0, 1, or 2 if we are defining functions, unary
          operators, or binary operators respectively.


          8.3.2 Function Synonyms

            We can make the name LENGTH a synonym for the SIZE function:

               ?       OPSYN('LENGTH', 'SIZE', 0)
               ?       OUTPUT = LENGTH('RABBIT')
               6

            The word synonym is not quite an accurate description of OPSYN.
          The name LENGTH becomes associated with the "code" that imple-
          ments the SIZE function, not with the word SIZE per se.  If SIZE
          were subsequently redefined---perhaps as a program-defined
          function---LENGTH would continue to return the number of
          characters in a string.


          8.3.3 Operator Synonyms

            Take a moment to examine the tables in Chapter 15, "Operators,"
          in the reference section.  Note that in each table there are a
          number of operator symbols whose definition is <none>.

            If you use an undefined binary operator, you'll get an error:

               ?       OUTPUT = 1 # 1
               Execution error #5, Undefined function or operation

            However, we could make this operator synonymous with the DIFFER
          function (which also uses two arguments) and use it instead:

               ?       OPSYN('#', 'DIFFER', 2)
               ?       OUTPUT = 1 # 1
               Failure

            Conversely, we can define a function in place of an operator:

               ?       OPSYN('PLUS', '+', 2)
               ?       OUTPUT = PLUS(4, 5)
               9

            Unary operators can be similarly treated, using 1 as the third
          argument:

               ?       OPSYN('!', 'ANY', 1)
               ?       'ABC321' !'3C' . OUTPUT
               C

            Operators can be created to maintain a stack, or navigate
          around a tree.  The full generality of functions and program-
          defined data types is available to your operators.  Through this
          technique you can make SNOBOL4 speak the language of your
          particular problem.
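
            As a sketch of a program-defined operator, the following
          stand-alone program makes the undefined binary operator #
          compute the larger of two integers.  The function MAX and
          the choice of symbol are ours, purely for illustration:

               * MAX(A,B) - Return the larger of integers A and B.

                       DEFINE('MAX(A,B)')                   :(MAX_END)

               MAX     MAX = A
                       MAX = GT(B, A) B                     :(RETURN)
               MAX_END

                       OPSYN('#', 'MAX', 2)
                       OUTPUT = 3 # 8
                       OUTPUT = 12 # 7
               END

            The program prints 8 and then 12.  When GT fails, the
          second assignment is skipped and the value A stored by the
          first statement is returned.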


                                                                  Chapter 9


                                                            ADVANCED TOPICS
          -----------------------------------------------------------------

            The material presented so far allows you to write powerful
          SNOBOL4 programs.  In this chapter, we will examine other inter-
          esting and useful features of the language.


                               9.1 THE ARBNO FUNCTION

            This function produces a pattern which will match zero or more
          consecutive occurrences of the pattern specified by its argument.
          As its name implies, ARBNO is useful when an arbitrary number of
          instances of a pattern may occur.  For example, ARBNO(LEN(3))
          matches strings of length 0, 3, 6, 9, ....  There is no restric-
          tion on the complexity of the pattern argument.

            Like the ARB pattern, ARBNO is shy, and tries to match the
          shortest possible string.  Initially, it simply matches the null
          string.  If a subsequent pattern component fails to match,
          SNOBOL4 backs up, and asks ARBNO to try again.  Each time ARBNO
          is retried, it supplies another instance of its argument pattern.
          In other words, ARBNO(PAT) behaves like

                       ( '' |  PAT | PAT PAT | PAT PAT PAT | ... )

            Also like ARB, ARBNO is usually used with adjacent patterns to
          "draw it out."  Let's consider a simple example.  We want to
          write a pattern to test for a list.  We'll define a list as being
          one or more numbers separated by commas, and enclosed by paren-
          theses.  Use CODE.SNO to try this definition:

               ?       ITEM = SPAN('0123456789')
               ?       LIST = POS(0) '(' ITEM  ARBNO(',' ITEM) ')' RPOS(0)
               ?       '(12,345,6)' LIST
               Success
               ?       '(12,,34)' LIST
               Failure

            ARBNO is retried and extended until its subsequent, ')', fi-
          nally matches.  POS(0) and RPOS(0) force the pattern to be ap-
          plied to the entire subject string.

            Alternation may be used within ARBNO's argument.  This pattern
          matches any number of pairs of certain letters:

               ?       PAIRS = POS(0) ARBNO('AA' | 'BB' | 'CC') RPOS(0)
               ?       'CCBBAAAACC' PAIRS
               Success
               ?       'AABBB' PAIRS
               Failure



       Tutorial                        - 77 -                 Advanced Topics





                               9.2 RECURSIVE PATTERNS

            This is the pattern analogue of a recursive function---a pat-
          tern is defined in terms of itself.  The unevaluated expression
          operator makes the definition possible.

            Suppose we wanted to expand the previous definition of a list
          to say that a list item may be a span of digits, or another list.
          The definition proceeds as before, except that the unevaluated
          expression operator is used in the first statement; the concept
          of a list has not yet been defined:

               ?       ITEM = SPAN('0123456789') | *LIST
               ?       LIST = '(' ITEM  ARBNO(',' ITEM) ')'
               ?       TEST = POS(0) LIST RPOS(0)
               ?       '(12,(3,45,(6)),78)' TEST
               Success
               ?       '(12,(34)' TEST
               Failure

            Recursion occurs because LIST is defined in terms of ITEM,
          which is defined in terms of LIST, and so on.  Note that func-
          tions POS(0) and RPOS(0) were "moved out one level," to TEST, be-
          cause LIST must now match substrings within the subject.

            In our previous discussion of recursive functions, we said they
          work because successive calls present the function with progres-
          sively simpler problems, until the problem can be solved without
          further recursion.  Similarly, patterns ITEM and LIST are applied
          to successively smaller substrings, until ITEM can use its SPAN()
          alternative instead of invoking LIST again.

            In general, you will need an alternative somewhere in the re-
          cursive loop to allow the pattern matcher "a way out."  Also, you
          should place recursive objects last in a series of alternatives,
          so that the simpler, nonrecursive patterns are attempted first
          and "recursive plunges" can be avoided.
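
            As an illustration of both guidelines, here is a sketch of a
          recursive pattern for simple arithmetic expressions.  The names
          TERM, EXPR, and CHECK are ours, chosen for this example.  In
          TERM, the nonrecursive SPAN alternative comes first, and the
          recursive reference to EXPR appears last:

               ?       TERM = SPAN('0123456789') | '(' *EXPR ')'
               ?       EXPR = TERM ARBNO(ANY('+-*/') TERM)
               ?       CHECK = POS(0) EXPR RPOS(0)
               ?       '12+(3-4)*5' CHECK
               Success
               ?       '12+(3-' CHECK
               Failure

            The recursion ends whenever a TERM can be matched by digits
          alone.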

            SNOBOL4 saves information on a "pattern stack" during the pat-
          tern match process.  Heavily recursive patterns and long subject
          strings can sometimes result in stack overflow.  If this occurs,
          you should break the problem apart into several smaller pattern
          matches.

            As recursive patterns use the unevaluated expression operator,
          it is sometimes necessary to disable SNOBOL4's heuristics by
          setting &FULLSCAN = 1.


                             9.3 QUICKSCAN AND FULLSCAN

            Pattern matching can be very time-consuming because of the num-
          ber of possibilities which must be attempted.  In the normal
          "quickscan" mode, SNOBOL4 stops searching for a match when it
          thinks further efforts would be futile.  The heuristics are com-
          plex, but can be summarized as follows: pattern matching fails
          when there are insufficient subject characters to satisfy the re-
          maining pattern components.

            The cursor operator can be used to demonstrate at what point
          SNOBOL4 gives up.  For example, in the pattern match

               ?       'ABCD' @OUTPUT 'X' LEN(3)
               0
               Failure

            SNOBOL4 does not attempt to match 'X' against 'B' because fewer
          than 3 subject characters remain after it, and LEN(3) could never
          succeed.

            A second type of heuristic is the "one character assumption"
          for unevaluated expressions.  SNOBOL4 assumes that unevaluated
          expressions will match at least one character.  This heuristic
          was originally provided to break recursive loops, but can cause
          programming problems when an unevaluated expression must match
          the null string.  Consider a pattern which succeeds if 'B' is at
          least 4 character positions beyond an 'A' in the subject:

               ?       P = 'A' ARB $ X 'B' *GE(SIZE(X), 4)
               ?       'A12345BC' P
               Success
               ?       'A12345B' P
               Failure

            The characters between 'A' and 'B' are matched by ARB, and im-
          mediately assigned to X.  The size of X is then compared to 4 by
          the GE function, which succeeds and returns the null string.
          This null string result should not interfere with the pattern
          match, but we find the pattern misbehaves when 'B' is the last
          character of the subject.  The unevaluated expression operator
          made SNOBOL4 assume a one character length for the GE function,
          and matching 'B' against the last subject character was never at-
          tempted.

            For most pattern matching, heuristics are invisible.  However,
          there are circumstances when we would like SNOBOL4 to be exhaus-
          tive in its match attempts.  We can disable heuristics and enter
          "fullscan" mode by setting keyword &FULLSCAN nonzero:

               ?       &FULLSCAN = 1
               ?       'A12345B' P
               Success
               ?       'ABCD' @OUTPUT 'X' LEN(3)
               0
               1
               2
               3
               4
               Failure

            The quickscan mode can be reinstated by setting &FULLSCAN = 0.
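
            If only one statement needs exhaustive matching, the keyword
          can be saved and restored around it.  A minimal sketch, in which
          the variable SAVE and the labels NEAR and DONE are our own
          inventions, and P is the pattern defined above:

                       SAVE = &FULLSCAN
                       &FULLSCAN = 1
                       'A12345B' P                          :F(NEAR)
                       OUTPUT = 'B is far enough from A'    :(DONE)
               NEAR    OUTPUT = 'B is too close to A, or absent'
               DONE    &FULLSCAN = SAVE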


                            9.4 OTHER PRIMITIVE PATTERNS

            We can accomplish quite a lot with just the primitive patterns
          ARB and REM.  However, there are five additional patterns which
          you should be aware of:

          -----------------------------------------------------------------

          ABORT                            End pattern match

            The ABORT pattern causes immediate failure of the entire pat-
          tern match, without seeking other alternatives.  Usually a match
          succeeds when we find a subject sequence which satisfies the pat-
          tern.  The ABORT pattern does the opposite: if we find a certain
          pattern, we will abort the match and fail immediately.  For exam-
          ple, suppose we are looking for an 'A' or 'B', but want to fail
          if '1' is encountered first:

               ?       '--AB-1-' (ANY('AB') | '1' ABORT)
               Success
               ?       '--1B-A-' (ANY('AB') | '1' ABORT)
               Failure

            The last example may be confusing because the ANY function ap-
          pears as the first alternative, fostering the illusion that it
          will find the 'B' in the subject before the other pattern alter-
          native is tried.  However, that is not the order of pattern
          matching; ALL pattern alternatives are tried at cursor position
          zero in the subject.  If none succeed, the cursor is advanced by
          one, and all alternatives are tried again.  When the cursor is in
          front of subject character '1', ANY still does not match, but the
          second alternative now does.  As the '1's match, ABORT is
          reached, causing failure.

          -----------------------------------------------------------------

          BAL                              Match balanced string

            The BAL pattern matches the shortest nonnull string in which
          parentheses are balanced.  (A string without parentheses is also
          considered to be balanced.)  These strings are balanced:

               (X)      Y      (A!(C:D))      (AB)+(CD)      9395

            These are not:

               )A+B     (A*(B+)      (X))

            BAL is concerned only with left and right parentheses.  The
          matching string does not have to be a well-formed expression in
          the algebraic sense; in fact, it needn't be an algebraic expres-
          sion at all.  Like ARB, BAL is most useful when constrained on
          both sides by other pattern components.
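
            As a sketch of such a constrained use, the following statement
          extracts the argument list of a function call.  The function
          name SUM and the variable ARGS are hypothetical.  BAL is
          extended until the right parenthesis that follows it can match,
          so the inner parentheses stay paired:

               ?       'SUM(A,(B+C))' 'SUM(' BAL . ARGS ')'
               Success
               ?       OUTPUT = ARGS
               A,(B+C)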

          -----------------------------------------------------------------

          FAIL                             Seek other alternatives

            The FAIL pattern signals failure of this portion of the pattern
          match, causing the pattern matcher to backtrack and seek other
          alternatives.  FAIL will also suppress a successful match, which
          can be very useful when the match is being performed for its side
          effects, such as immediate assignment.  For example, in unan-
          chored mode, this statement will display the subject characters,
          one per line:

                       SUBJECT LEN(1) $ OUTPUT FAIL

            LEN(1) matches the first subject character, and immediately as-
          signs it to OUTPUT.  FAIL tells the pattern matcher to try again,
          and since there are no other alternatives, the entire match is
          retried at the next subject character.  Forced failure and re-
          tries continue until the subject is exhausted.
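
            Entered under CODE.SNO with a literal subject, in unanchored
          mode, we would expect the statement to behave like this.  The
          match itself fails, because FAIL rejects every attempt, but the
          side effects remain:

               ?       'ABC' LEN(1) $ OUTPUT FAIL
               A
               B
               C
               Failure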

          -----------------------------------------------------------------

          FENCE                            Prevent match retries

            Pattern FENCE matches the null string and has no effect when
          the pattern matcher is moving left to right in a pattern.  How-
          ever, if the pattern matcher is backing up to try other alterna-
          tives, and encounters FENCE, the match fails.

            FENCE can be used to "lock in" an earlier success.  Suppose we
          want to succeed if the first 'A' or 'B' in the subject is fol-
          lowed by a plus sign.  In the following example, the 'A's match,
          we go through the FENCE, and find '+' does not match the next
          subject character, 'B'.  SNOBOL4 tries to backtrack, but is
          stopped by the FENCE and fails:

               ?       '1AB+' ANY('AB') FENCE '+'
               Failure

            If FENCE were omitted, backtracking would match ANY to 'B', and
          then proceed forward again to match '+'.

            If FENCE appears as the first component of a pattern, SNOBOL4
          cannot back up through it to try another subject starting posi-
          tion.  This results in an anchored pattern, even if the &ANCHOR
          keyword specifies unanchored mode:

               ?       'ABC' FENCE 'B'
               Failure

          -----------------------------------------------------------------

          SUCCEED                          Match always

            This pattern matches the null string and always succeeds.  If
          the scanner is backtracking when it encounters SUCCEED, it re-
          verses and starts forward again.  Placing a pattern between
          SUCCEED and FAIL causes the pattern matcher to oscillate.


                                 9.5 OTHER FUNCTIONS

            I'd like to briefly point out a few more built-in functions.
          See Chapter 19 for a complete description of their use.

             APPLY            Allows an indirect call to a function through
                              a variable.

             CONVERT          Provides explicit conversion from one data
                              type to another.  Chapter 17, "Data Types and
                              Conversion," describes the conversions
                              possible.

             ENDFILE          Closes a file and detaches all variables
                              associated with it.

             ITEM             Allows an indirect reference to an array or
                              table.

             LPAD & RPAD      These are padding functions, which will pad a
                              string on its left or right side with blanks
                              or a given character.  Padding is provided to
                              a specified width, and is useful when produc-
                              ing columnar output.
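
            The following CODE.SNO session is a brief sketch of some of
          these functions.  The variable names F and A are arbitrary, and
          the vertical bar is included only to make the padding visible:

               ?       OUTPUT = LPAD(42, 6) '|'
                   42|
               ?       OUTPUT = RPAD('ABC', 6, '.') '|'
               ABC...|
               ?       F = 'SIZE'
               ?       OUTPUT = APPLY(F, 'HELLO')
               5
               ?       OUTPUT = CONVERT('12', 'INTEGER') + 1
               13
               ?       A = ARRAY(3)
               ?       ITEM(A, 2) = 'TWO'
               ?       OUTPUT = A<2>
               TWO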







                              9.6 OTHER UNARY OPERATORS


          Operation:     Negation
          Symbol:        ~ (tilde)

            The negation operator, or tilde (~), inverts the success or
          failure result of its operand.  If the expression X succeeds,
          then ~X fails.  Conversely, if X fails, ~X succeeds and returns
          the null string.


          Operation:     Interrogation
          Symbol:        ? (question mark)

            Unary question mark is called the interrogation operator, al-
          though "value annihilation" might be more descriptive.  If X is
          an expression which fails, ?X also fails.  However, if X suc-
          ceeds, ?X also succeeds, and returns the null string.  In other
          words, any value component of X is replaced by the null string.
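
            Both operators are typically used to control a statement
          without contributing to its value.  A minimal sketch:

               ?       OUTPUT = ~EQ(1, 2) 'NOT EQUAL'
               NOT EQUAL
               ?       OUTPUT = ?SIZE('ABC') 'SIZE SUCCEEDED'
               SIZE SUCCEEDED

            In the first statement EQ fails, so ~EQ succeeds and supplies
          a null string.  In the second, SIZE returns 3, but the question
          mark discards that value.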


                              9.7 RUN-TIME COMPILATION

            The two functions described below are among the most esoteric
          features, not just of SNOBOL4, but of all programming languages
          in existence.  While your program is executing, the entire
          SNOBOL4 compiler is just a function call away.

            A SNOBOL4 program is nothing more than a string of characters.
          The functions EVAL and CODE let you supply the compiler with
          character strings from within the program itself.


          9.7.1 The EVAL Function

            This function is used to evaluate an expression.  Its argument
          may take a number of forms:

            1. If the argument is an integer, or a number in string form,
               the number is returned as the function result:

               ?       OUTPUT = EVAL(19)
               19

            2. If the argument is an unevaluated expression, it is evalu-
               ated using current values for any variables it might con-
               tain.  EVAL returns the expression's value as its result:

               ?       E = *('N SQUARED IS ' N ** 2)
               ?       N = 15
               ?       OUTPUT = EVAL(E)
               N SQUARED IS 225

               This is similar to our earlier use of unevaluated expres-
               sions with patterns.  In this case, however, the unevaluated
               expression operator (*) must be applied to the entire ex-
               pression to create an object with the EXPRESSION data type.

            3. If the argument is a string (other than a simple number),
               EVAL tries to compile it as a SNOBOL4 expression.  Only an
               expression is permitted---not an entire SNOBOL4 statement:

               ?       OUTPUT = EVAL('3 * N + 2')
               47

               If the string compiles without error, EVAL then evaluates
               the expression and returns the result.

            It is this last use of EVAL---to compile a string---which is
          the most interesting.  Here is a trivial program which behaves
          like a simple desk calculator.

               LOOP    OUTPUT = EVAL(INPUT)                 :S(LOOP)
               END

            You can easily try it by placing a semicolon after the GOTO to
          protect it from CODE.SNO's own machinations:

               ?LOOP   OUTPUT = EVAL(INPUT)                 :S(LOOP);
               4 * (5 - 2) / 2
               6
               N + 1
               16
               ^Z

            The program reads a line of input, compiles and evaluates it,
          and displays the result.  Each expression you enter must be well-
          formed according to SNOBOL4's syntax rules.  In particular, this
          means there must be blanks around the binary operators.

            The BNF program included with Vanilla SNOBOL4 demonstrates that
          EVAL's power is useful even if the input data does not conform to
          SNOBOL4 syntax.  It reads a definition of a grammar from a file,
          and converts it to SNOBOL4 patterns.

            EVAL fails if evaluation of the argument fails, or if the argu-
          ment contains a syntax error.  The SNOBOL4 keyword &ERRTEXT will
          contain a string describing the error.
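
            Because the calculator loop above simply stops when EVAL
          fails, a more forgiving version must tell the two causes of
          failure apart.  In this sketch, the variable LINE and the
          message text are our own choices.  End-of-file ends the program,
          while a bad expression produces a message and the loop goes on:

               LOOP    LINE = INPUT                         :F(END)
                       OUTPUT = EVAL(LINE)                  :S(LOOP)
                       OUTPUT = 'Cannot evaluate: ' &ERRTEXT   :(LOOP)
               END

            If EVAL fails because the expression itself failed, rather
          than because of a syntax error, &ERRTEXT may not describe the
          current line.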

            The expressions used with EVAL may return any SNOBOL4 data
          type, not just numbers.  For instance, the expression might con-
          struct a new pattern, and return it as the result:

                       ITEM = EVAL('SPAN("0123456789") | *LIST')

            Note that EVAL can only call the compiler with a string argu-
          ment.  If we used a pattern as the argument, we would produce an
          execution error:

                       ITEM = EVAL(SPAN("0123456789") | *LIST)  (incorrect)


          9.7.2 The CODE Function

            CODE accepts a string argument containing one or more state-
          ments to be compiled.  Multiple statements are separated by
          semicolons (;).  Statements may be labeled, and can include all
          the usual components---subject, pattern, replacement, and GOTO.
          However, comment and continuation statements are not permitted.

            The CODE function compiles the statements, and returns a poin-
          ter to the resulting object code.  It fails if any statement
          contains an error, and places an error message in &ERRTEXT.

            There are two ways to execute the new object code.

            1. Transfer to a label which is defined in the new code:

               *  Compile a sample piece of code:
                       S = 'L OUTPUT = N; N = LT(N,10) N + 1  :S(L)F(DONE)'
                       CODE(S)
               *  Transfer to a label in it:
                                                              :(L)
               *  Come here when the new code transfers back.
               DONE     . . .

               Notice how we placed a GOTO from the new code back to label
               DONE in the main program.  If we had not done this, SNOBOL4
               would terminate when execution "fell out of the bottom" of
               the new code block.

            2. The pointer returned by the CODE function can be used in a
               "direct GOTO" to transfer to the first statement in the code
               block.  A direct GOTO is performed by enclosing the pointer
               in angle brackets in the GOTO field:

               *  Compile a sample piece of code:
                       S = 'L OUTPUT = N; N = LT(N,10) N + 1  :S(L)F(DONE)'
                       C = CODE(S)
               *  Transfer to the first statement in the block:
                                                              :<C>
               DONE     . . .

            Labels contained in the new program fragment override any
          labels of the same name in your main program.  This provides the
          ability to write self-modifying SNOBOL4 programs, and makes the
          division between "code" and "data" far less distinct than in
          other high-level languages.
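
            Since CODE can fail, it is prudent to test its result before
          transferring to the new code.  In this sketch the labels BACK
          and BADCODE and the message text are hypothetical.  Note the
          leading blank in S, which marks the statement as unlabeled:

                       S = ' OUTPUT = "HELLO FROM NEW CODE"  :(BACK)'
                       C = CODE(S)                          :F(BADCODE)
                                                            :<C>
               BACK    OUTPUT = 'BACK IN MAIN PROGRAM'      :(END)
               BADCODE OUTPUT = 'COMPILE FAILED: ' &ERRTEXT
               END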






                                                                 Chapter 10


                                           DEBUGGING AND PROGRAM EFFICIENCY
          -----------------------------------------------------------------


                             10.1 DEBUGGING AND TRACING

            You are probably well aware of the diversity of potential er-
          rors when writing computer programs.  They range from simple
          typographical errors made while entering a program, to subtle de-
          sign problems which may only be revealed by unexpected input
          data.

            Debugging a SNOBOL4 program is not fundamentally different from
          debugging programs written in other languages.  However,
          SNOBOL4's syntactic flexibility and lack of type declarations for
          variables produce some unexpected problems.  By way of compensa-
          tion, an unusually powerful trace capability is provided.

            Of course, there may come a time when you can't explain your
          program's behavior, and decide "the system" is at fault.  No
          guarantee can ever be made that SNOBOL4 is completely free of
          errors.  However, its internal algorithms have been in use in
          other SNOBOL4 systems since 1967, and all known errors have been
          removed.  Often the problem is a misunderstanding of how a func-
          tion works with exceptional data, and a close reading of the ref-
          erence section clears the problem up.  In short, suspect the
          system last.


          10.1.1 Compilation Errors

            Compilation errors are the simplest to find; SNOBOL4 displays
          the erroneous line on your screen with its statement number, and
          places a marker below the point where the error was encountered.
          The source file name, line number, and column number of the error
          are displayed for use by your text editor.  Only the first error
          in a statement is identified, so you should also carefully check
          the remainder of the statement.  A typical line looks like this:

               32              ,OUTPUT = CNT+ 1
                                ^
               test.sno(57,10) : Compilation Error : Erroneous statement

            Here, the comma preceding the word OUTPUT is misplaced.  The
          message indicates that ",OUTPUT" is not a valid language element.

            Programs containing compilation errors can still be run, at
          least until a statement containing an error is encountered.  When
          that happens, SNOBOL4 will produce an execution error message,
          and stop.




       Tutorial                        - 86 -        Debugging and Efficiency





            A complete description of error messages is provided in Chapter
          20, "System Messages."


          10.1.2 Execution Errors

            Once a program compiles without error, testing can begin.  Two
          kinds of errors are possible: SNOBOL4 detectable errors, such as
          an incorrect data type or calling an undefined function, and
          program logic errors that produce incorrect results.

            With the first type of error, you'll get a SNOBOL4 error mes-
          sage with statement and line numbers.  Inspecting the offending
          line will often reveal typing errors, such as a misspelled func-
          tion name, keyword, or label.  If the error is due to incorrect
          data in a variable---such as trying to perform arithmetic on a
          non-numeric string---you'll have to start debugging to discover
          how the incorrect data was created.  Placing output statements in
          your program, or using the trace techniques described below, will
          usually find such errors.

            Here are some common errors to look for first:

            1. Setting keywords &ANCHOR, &FULLSCAN, and &TRIM improperly.
               We may have written a program with anchored pattern matching
               in mind, but let an unanchored match slip in inadvertently.
               Forgetting to set &TRIM to 1 causes blanks to be appended to
               input lines, and they usually interfere with pattern match-
               ing and conversion of a string to an integer.

            2. Misspelled variable names.  Using PUTPUT instead of OUTPUT,
               as in:

                       PUTPUT = LINE1

               creates a new variable and assigns LINE1 to it.  Worse still
               is using a misspelled name as a value source, since it will
               return a null string value.

               The first type of error is relatively easy to find---produce
               an end-of-run dump by using the SNOBOL4 command line option
               /D.  You can study the list of variables for an unexpected
               name.  The second type of error is naturally much harder to
               find, because variables with null string values are omitted
               from the end-of-run dump.  In this case, you will have to
               study the source program closely for misspellings.

            3. Spurious spaces between a function name and its argument
               list.  A line like:

                       LINE = TRIM (INPUT)

               is not a call to the TRIM function.  The blank between TRIM
               and the left parenthesis is interpreted as concatenating
               variable TRIM with the expression (INPUT).  TRIM used as a
               variable is likely to be the null string, so INPUT is
               returned unchanged.

            4. No blank space after a binary operator.   SNOBOL4 sees a
               unary operator instead, with completely unexpected results.
               For instance:

                       X = Y -Z

               concatenates Y with the expression -Z.

            5. Confusion occurring when a variable contains a number in
               string form.  When used as an argument to most functions,
               conversion from string to number is automatic, and proper
               execution results.  However, functions IDENT and DIFFER do
               not convert their arguments, and seemingly equal values are
               thought to be different.  For example, if we want to test an
               input line for the number 3, the statements:

                       N = INPUT
                       IDENT(N, 3)                               :S(OK)

               are not correct.  N contains a string, which is a different
               data type from the integer 3.  This could be corrected by
               using IDENT(+N, 3), or EQ(N, 3).  Once again, &TRIM should
               be 1, or the blanks appended to N will prevent its conver-
               sion to an integer.

            6. Omitting the assignment operator when we wish to remove the
               matching substring from a subject, resulting in a program
               which loops forever.  For example, our word-counting program
               replaced each word with the null string:

               NEXTWRD LINE WRDPAT =                        :F(READ)

               However, by omitting the equal sign we would repeatedly find
               the same first word in LINE:

               NEXTWRD LINE WRDPAT                          :F(READ)

            7. Unexpected statement failure, with no provision for detect-
               ing it in the GOTO field.  For example, the CONVERT function
               fails if the table being converted is empty:

                       RESULT = CONVERT(TALLY, "ARRAY")

               RESULT will not be set if CONVERT fails, and a subsequent
               array reference to RESULT would produce an execution error.

            8. Failure can be detected but misinterpreted when there are
               several causes for it in a statement.  This statement fails
               when an End-of-File is read, or if the input line does not
               contain any digits:

                       INPUT SPAN('0123456789') . N         :F(EOF)

               In the latter case, if we want to generate an error message,
               the statement should be split in two:

                       N = INPUT                            :F(EOF)
                       N SPAN('0123456789') . N             :F(WARN)

            9. Using operators such as alternation (|) and conditional as-
               signment (.) for purposes other than pattern construction.
               Using them in the subject field will produce an 'Illegal
               data type' error message.  Using them in the replacement
               field produces a pattern, intended for subsequent use in a
               pattern match statement.  For example, this statement sets N
               to a pattern; it does not replace it with the words 'EVEN'
               or 'ODD', as was probably intended:

                       N = EQ(REMDR(N,2),0) 'EVEN' | 'ODD'

               We note in passing that SNOBOL4+, Catspaw's professional
               SNOBOL4 package, provides language extensions that allow
               just that:

                       N = (EQ(REMDR(N,2),0) 'EVEN', 'ODD')

           10. Forgetting that functions like TAB and BREAK bind subject
               characters.  This won't matter for simple pattern matching,
               but for matching with replacement, problems can appear.  For
               example, suppose we wanted to replace the 50th character in
               string S with '*'.  If we used:

                       S TAB(49) LEN(1) = '*'

               we would find the first 50 characters replaced by a single
               asterisk.  Instead, we should say:

                       S POS(49) LEN(1) = '*'

               or, even more efficiently:

                       S TAB(49) . FRONT LEN(1) = FRONT '*'

           11. Omitting the unevaluated expression operator when defining a
               pattern containing variable arguments.  For example, the
               pattern

                       NTH_CHAR = POS(*N - 1) LEN(1) . CHAR

               will copy the Nth subject character to variable CHAR.  The
               pattern adjusts automatically if N's value is subsequently
               changed.  Omitting the asterisk would capture the value of N
               at the time the pattern is defined (probably the null
               string).
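
            Several of these pitfalls can be avoided with a defensive
          coding style.  The following sketch counts duplicate input
          lines.  All variable and label names are ours.  It sets &TRIM,
          tests INPUT and CONVERT for failure separately, and gives each
          failure its own label:

                       &TRIM = 1
                       TALLY = TABLE()
               READ    LINE = INPUT                         :F(REPORT)
                       IDENT(LINE)                          :S(READ)
                       TALLY<LINE> = TALLY<LINE> + 1        :(READ)
               REPORT  ROWS = CONVERT(TALLY, 'ARRAY')       :F(EMPTY)
                       I = 0
               NEXT    I = I + 1
                       OUTPUT = ROWS<I,1> ': ' ROWS<I,2>    :S(NEXT)F(END)
               EMPTY   OUTPUT = 'No nonblank input lines were read'
               END

            The loop at NEXT ends when the array reference fails because
          I has passed the last row.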




          10.1.3 Simple Debugging

            These simple methods should find a majority of your bugs:

            1. Set keyword &DUMP nonzero, or use command line option /D to
               get an end-of-run dump.  Examine it closely for reasonable
               values and variable names.  Dumps can also be produced at
               any time during execution by calling the built-in function
               DUMP.

            2. Use keyword &STLIMIT to end execution after a fixed number
               of statements.

            3. Use the keyboard Control-C key to interrupt a program which
               is looping endlessly, and record the statement number.

            4. Use the GOTO :F(ERROR) to detect unexpected failures and
               data errors.  Do not define the label ERROR---SNOBOL4 will
               display the statement number of the error if an attempt is
               made to transfer to label ERROR.

            5. Assign values to OUTPUT to monitor data values.  Use immedi-
               ate assignment and cursor assignment (to OUTPUT) to observe
               the operation of a pattern match.

            6. Produce end-of-run statistics with the command line option
               /S.  Are the number and kind of operations reasonable?

            7. Use the CODE.SNO program to set up simple test cases.  This
               is particularly useful when pattern-matching statements do
               not behave as expected.
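
            A short sketch combining several of these techniques follows.
          The limit of 5000 statements is an arbitrary choice, and the
          label ERROR is deliberately left undefined:

                       &DUMP = 1
                       &STLIMIT = 5000
                       &TRIM = 1
               READ    N = CONVERT(INPUT, 'INTEGER')        :F(ERROR)
                       OUTPUT = N ' squared is ' N * N      :(READ)
               END

            The statement at READ fails when a line is not an integer,
          and also when the input is exhausted.  Either way, the attempt
          to transfer to ERROR identifies the statement.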

            More subtle errors can be pinpointed using SNOBOL4's trace fa-
          cility, described below.


                               10.2 EXECUTION TRACING

            Tracing the flow of control and data in a program is usually
          the best way to find difficult problems.  SNOBOL4 allows tracing
          of data in variables and some keywords, transfers of control to
          specified labels, and function calls and returns.  Two keywords
          control tracing: &FTRACE and &TRACE.


          10.2.1 Function Tracing

            Keyword &FTRACE is set nonzero to produce a trace message each
          time a program-defined function is called or returns.  The trace
          message displays the statement number where the action occurred,
          the name of the function, and the values of its arguments.  Func-
          tion returns display the type of return and value, if any.  Each
          trace message decrements &FTRACE by one, and tracing ends when
          &FTRACE reaches zero.  Typical trace messages look like this:

               STATEMENT 39: LEVEL 0 CALL OF SHIFT('SKYBLUE',3),TIME = 140
               STATEMENT 12: LEVEL 1 RETURN OF SHIFT = 'BLUESKY',TIME = 141

            The level number is the overall function call depth.  The pro-
          gram execution time in tenths of a second is also provided.
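
            A program along the following lines could produce messages
          like those above.  The body of SHIFT is our own guess at a
          function that rotates a string to the left; only its name and
          its call appear in the trace.  The label SHIFTEND is also ours:

                       DEFINE('SHIFT(S,N)FRONT')            :(SHIFTEND)
               SHIFT   S LEN(N) . FRONT REM . SHIFT         :F(FRETURN)
                       SHIFT = SHIFT FRONT                  :(RETURN)
               SHIFTEND
                       &FTRACE = 10
                       OUTPUT = SHIFT('SKYBLUE', 3)
               END

            Setting &FTRACE to 10 permits ten trace messages.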


          10.2.2 Selective Tracing

            Keyword &TRACE will also produce trace messages when it is set
          nonzero.  However, the TRACE function must be called to specify
          what is to be traced.  Tracing can be selectively ended by using
          the STOPTR function.  The TRACE function call takes the form:

                       TRACE(name, type, string, function)

            The name of the item being traced is specified using a string
          or the unary name operator.  Besides variables, it is also possi-
          ble to trace a particular element of an array or table:

                       TRACE('VAR1', ...
                       TRACE(.A<2,5>, ...
                       TRACE('SHIFT', ...

            "Type" is a string describing the kind of trace to be per-
          formed.  If omitted, a VALUE trace is assumed:

             'VALUE'          Trace whenever name has a value assigned to
                              it.  Assignment statements, as well as condi-
                              tional and immediate assignments within pat-
                              tern matching will all produce trace mes-
                              sages.

             'CALL'           Produce a trace whenever function name is
                              called.

             'RETURN'         Produce a trace whenever function name
                              returns.

             'FUNCTION'       Combine the previous two types: trace both
                              calls and returns of function name.

             'LABEL'          Produce a trace when a GOTO transfer to
                              statement name occurs.  Flowing sequentially
                              into the labeled statement does not produce a
                              trace.

             'KEYWORD'        Produce a trace when keyword name's value is
                              changed by the system.  The name is specified
                              without an ampersand.  Only keywords
                              &ERRTYPE, &FNCLEVEL, &STCOUNT, and &STFCOUNT
                              may be traced.

            When the first argument is specified with the unary name opera-
          tor, the third argument, string, will be displayed to identify
          the item being traced:

                       TRACE(.T<"zip">, "VALUE", "Table entry 'zip'")

            The last argument, function, is usually omitted.  Its use is
          described in the next section.

            The form of trace message displayed for each type of trace is
          listed in Chapter 20, "System Messages."

            Each time a trace is performed, keyword &TRACE is decreased by
          one.  Tracing stops when it reaches zero.  Tracing of a particu-
          lar item can also be stopped by function STOPTR:

                       STOPTR(name, type)
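
            A short sketch of selective tracing follows.  The variable
          names TOTAL and N and the labels LOOP and DONE are invented for
          this example:

                       &TRACE = 100
                       TRACE('TOTAL', 'VALUE')
                       TRACE('LOOP', 'LABEL')
                       N = 0
               LOOP    N = LT(N, 3) N + 1                   :F(DONE)
                       TOTAL = TOTAL + N                    :(LOOP)
               DONE    STOPTR('TOTAL', 'VALUE')
                       TOTAL = 0
               END

            Each assignment to TOTAL inside the loop should produce a
          VALUE trace, and each transfer back to LOOP a LABEL trace.  The
          first entry to LOOP is sequential, so it is not traced.  After
          STOPTR is called, the final assignment to TOTAL is not reported.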


          10.2.3 Program Trace Functions

            Normally, each trace action displays a descriptive message,
          such as:

               STATEMENT 371: SENTENCE = 'Ed ran to town',TIME = 810

            Instead, we can instruct SNOBOL4 to call our own program-
          defined function.  This allows us to perform whatever trace
          actions we wish.  We define the trace function in the normal way,
          using DEFINE, and then specify its name as the fourth argument of
          TRACE.  For example, if we want function TRFUN called whenever
          variable COUNT is altered, we would say:

                       &TRACE = 10000
                       TRACE('COUNT', 'VALUE', , 'TRFUN')
                       DEFINE('TRFUN(NAME,ID)')             :(TRFUN_END)
                        . . .

            TRFUN will be called with the name of the item being traced,
          'COUNT', as its first argument.  If a third argument was provided
          with TRACE, it too is passed to your trace function, as ID.
          (Here the argument was omitted.)  To use trace functions effec-
          tively, we must pause to describe a few more SNOBOL4 keywords:

             &LASTNO          The statement number of the previous SNOBOL4
                              statement executed.

             &STCOUNT         The total number of statements executed.
                              Incremented by one as each statement begins
                              execution.

             &ERRTYPE         Error message number of the last execution
                              error.

             &ERRLIMIT        Number of nonfatal execution errors allowed
                              before SNOBOL4 will terminate.

            The first three keywords are continuously updated by SNOBOL4 as
          a program is executed.

            Now, let's consider debugging a program where variable COUNT is
          inexplicably being set to a negative number.  Continuing with the
          previous example, the function body would look like this:

                       &TRACE = 10000
                       TRACE('COUNT', 'VALUE', , 'TRFUN')
                       DEFINE('TRFUN(NAME,ID)TEMP')         :(TRFUN_END)
               
               TRFUN   TEMP = &LASTNO
                       GE($NAME, 0)                         :S(RETURN)
                       OUTPUT = 'COUNT negative in statement ' TEMP  :(END)
               TRFUN_END

            The first statement of the function captures the number of the
          last statement executed---the statement that triggered the trace.
          We then check COUNT, and return if it is satisfactory.  If it is
          negative, we print an error message and stop the program.

            When a trace function is invoked, keywords &TRACE and &FTRACE
          are temporarily set to zero.  Their values are restored when the
          trace function returns.  There is no limit to the number of func-
          tions or items which may be traced.

            Tracing keyword &STCOUNT will call your trace function before
          every program statement is executed.

            Program CODE.SNO traces keyword &ERRTYPE to trap nonfatal exe-
          cution errors from your sample statements, and produce an error
          message.  Keyword &ERRLIMIT must be set nonzero to prevent
          SNOBOL4 from terminating when an error occurs.
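
            The same technique can be sketched in a few lines.  This is
          not the code from CODE.SNO; the function name ERRFUN and the
          deliberately faulty assignment to X are assumptions made for
          illustration:

                       &ERRLIMIT = 10
                       &TRACE = 10
                       TRACE('ERRTYPE', 'KEYWORD', , 'ERRFUN')
                       DEFINE('ERRFUN(NAME,ID)')            :(ERRFUN_END)
               ERRFUN  OUTPUT = 'Error ' &ERRTYPE ' at ' &LASTNO  :(RETURN)
               ERRFUN_END
                       X = 'ABC' + 1
                       OUTPUT = 'Execution continued'
               END

            The addition fails because 'ABC' is not numeric.  With
          &ERRLIMIT nonzero the error should be treated as statement
          failure:  ERRFUN reports the error number and statement, and
          execution continues with the next statement.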


                               10.3 PROGRAM EFFICIENCY

            To a greater extent than other languages, SNOBOL4 programs are
          sensitive to programming methods.  Often, there are many differ-
          ent ways to formulate a pattern match, and some will require many
          more match attempts than others.

            As you work with SNOBOL4, you will develop an intuitive feel
          for the operation of the pattern matcher, and will write more
          efficient patterns.  I can, however, start you off with some gen-
          eral rules:

            1. Try to use anchored, quickscan, and trim modes when possi-
               ble.  If operating unanchored, artificially anchor whenever
               possible by using POS(0) or FENCE as the first subpattern.

            2. Try to use BREAK and SPAN instead of ARB.

            3. Use ANY instead of an explicit list of one-character strings
               and the alternation operator.

            4. LEN, TAB and RTAB are faster than POS and RPOS.  The former
               "step over" subject characters in one operation; the latter
               continually fail until the subject cursor is positioned cor-
               rectly.  But be careful when using LEN, TAB and RTAB with
               replacement:  the characters they step over become part of
               the matched substring, so you may replace more than you
               expected.

            5. Use conditional assignment instead of immediate assignment
               in pattern matching.

            6. Use IDENT and DIFFER to compare strings for equality,
               instead of pattern matching.  Since each unique string is
               stored only once in SNOBOL4, these functions merely compare
               one-word pointers, regardless of string length.  By con-
               trast, pattern matching and functions such as LGT must
               perform character by character comparisons.

            7. Avoid ARBNO and recursion if possible.

            8. Pattern construction is time-consuming.  Preconstruct pat-
               terns and store them in variables whenever possible.

            9. Keep strings modest in length.  Although SNOBOL4 allows
               strings to be thousands of characters long, operating upon
               them is very time-consuming.  They use large amounts of
               memory, and force SNOBOL4 to frequently rearrange storage.

           10. Use functions to modularize a program and make it easier to
               understand and maintain.

           11. Avoid algorithms that make a linear search of an array or
               list.  The algorithms can usually be rewritten using tables
               and indirect references for associative programming.
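
            Several of these rules can be seen together in a small sketch.
          The variable names and the sample string are invented for the
          example.  It prints each blank-terminated word of LINE that
          begins with a vowel:

               * Rules 2, 5 and 8: build the pattern once, using BREAK
               * and conditional assignment.
                       WORDPAT = BREAK(' ') . WORD ' '
               * Rules 1 and 3: anchor with POS(0), and let ANY replace
               * 'A' | 'E' | 'I' | 'O' | 'U'.
                       VOWEL = POS(0) ANY('AEIOU')
                       LINE = 'ORANGE JUICE '
               LOOP    LINE WORDPAT =                       :F(END)
                       WORD VOWEL                           :F(LOOP)
                       OUTPUT = WORD                        :(LOOP)
               END

            Each pass removes one word from the front of LINE.  With the
          sample string shown, only ORANGE is printed.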

            Efficiency should not be measured purely in terms of program
          execution time.  With the relatively low cost of microcomputers,
          the larger picture of time spent designing, coding, and debugging
          a program also must be considered.  A direct approach, empha-
          sizing simplicity, robustness, and ease of understanding, usually
          outweighs the advantages of tricky algorithms and shortcut tech-
          niques.  (But we admit that tricky pattern matching is fun!)



                                                                 Chapter 11


                                                         CONCLUDING REMARKS
          -----------------------------------------------------------------

            For much of this tutorial we've been concerned with the de-
          tailed mechanics of pattern matching---the functions, primitive
          patterns, and heuristics of applying a pattern to a character
          string.  SNOBOL4 provides so many primitive functions and opera-
          tions that it's easy to get lost in the forest.  Let's step back
          and consider SNOBOL4's larger significance.

            It would be a mistake to think of SNOBOL4 only as a text pro-
          cessing language.  For example, programmers in the artificial
          intelligence field think in terms of lists, and have used the
          LISP language for some time.  As Shafto demonstrates, SNOBOL4 can
          be made to emulate LISP, and go well beyond it, using pattern
          matching, backtracking, and associative programming (see file
          SNOBOL4.DOC for information on Shafto's report on AI SNOBOL4
          programming).

            SNOBOL4's pattern matching provides a very powerful and com-
          pletely general recognition system, in which character strings
          happen to be the medium of expression.  Other recognition prob-
          lems can be solved by mapping the object to be examined into a
          subject string, and the recognition criteria into SNOBOL4
          patterns.

            In the past, use of SNOBOL4 has been hindered by the high cost
          and inconvenience of running it on mainframe computers.  Now it's
          on your desk top, with computer time essentially free.

            What new insights can SNOBOL4 bring to your problems?  Can you
          find other general applications for SNOBOL4's unique abilities?

            The future of the language is in your hands.



                                                                 Chapter 12


                                           REFERENCE MANUAL -- INTRODUCTION
          -----------------------------------------------------------------

            The reference section describes the SNOBOL4 system.  It will
          tell you how to create and run SNOBOL4 programs, and catalogs all
          the standard language features.  The tutorial section can be con-
          sulted for illustrative uses of various functions and operators.

            Vanilla SNOBOL4 is a full implementation of the powerful
          development language SNOBOL4 for the IBM PC and the entire
          8086/286/386 family of computers.  It has all the features of
          mainframe SNOBOL4, plus numerous useful extensions.
          Compatibility with mainframe SNOBOL4 is achieved by basing this
          product on the Macro Implementation used on such mainframes as
          the IBM 370 and the CDC 7600.  Thus, it incorporates a thoroughly
          tested implementation in its entirety.  All SNOBOL4 string and
          pattern matching facilities available in the mainframe
          environment are now available to the personal computer user.

            The SNOBOL4 program contains both a compiler and interpreter.
          They are inseparable, and share many common routines.  Your
          source program is compiled into a compact internal notation,
          which is interpreted during execution.  More information on the
          internal code may be found in Griswold's "The Macro Implementa-
          tion of SNOBOL4"; see file SNOBOL4.DOC for ordering information.


                              12.1 LANGUAGE BACKGROUND

            In 1962, several researchers at Bell Telephone Laboratories
          (BTL) were applying computers to problems such as factoring mul-
          tivariate polynomials and symbolic integration.  Available tools
          were the Symbolic Communication Language (SCL), an internal BTL
          product for processing symbolic expressions, and COMIT, designed
          for natural-language analysis.  Both proved inadequate, and
          frustration with them led the researchers to attempt the design
          of a new language.

            The original SNOBOL was developed by David J. Farber, Ralph E.
          Griswold, and Ivan P. Polonsky, and was first implemented on an
          IBM 7090 computer in 1963.  The name, SNOBOL, came after the im-
          plementation, and ostensibly stands for StriNg Oriented symBOlic
          Language.

            It was soon discovered that SNOBOL was applicable to a much
          wider range of problems.  In fact, the language proved more in-
          teresting than the problems it was intended to solve.  As more
          people used it, new features such as recursive functions were
          added, and its generality grew.  By 1964, it had become SNOBOL3,
          and was available on such machines as the IBM 7094, CDC 3600, SDS
          930, Burroughs 5500, and the RCA 601.  Because these implementa-
          tions were all written from scratch, each machine introduced its
          own dialect of the language.

            SNOBOL3 had only one data type, the string.  The desire for ad-
          ditional data types, more complex pattern matching, and other
          features led to a major redesign of the language in 1966, by
          Ralph Griswold, Jim Poage, and Ivan Polonsky.  The new lan-
          guage---SNOBOL4---was also designed to be portable to other
          machines.  Most of SNOBOL4 was completed by 1967, although some
          features, such as operator redefinition, did not appear until
          1969.  Portability was achieved by writing the system in a macro
          assembly language for an abstract machine, hence the name "Macro
          Implementation of SNOBOL4."  By 1970 it was available on nine
          different types of mainframes.  Currently, it is available on
          most large- and medium-scale computers.

            The SNOBOL4 language evolved on computers whose primary input/
          output devices were the card reader, card punch, and line
          printer.  The current breed of microcomputers is interactive,
          rather than batch-oriented.  Thus, SNOBOL4 contains slight alter-
          ations of the language to conform to the personal computer envi-
          ronment.  For example, the preassigned output variable PUNCH has
          been replaced by SCREEN.  Experienced SNOBOL4 programmers will
          find little incompatibility with familiar implementations.  Most
          existing SNOBOL4 programs should operate correctly using SNOBOL4
          with little or no change.



                                                                 Chapter 13


                                                  RUNNING A SNOBOL4 PROGRAM
          -----------------------------------------------------------------


                           13.1 BASIC COMMAND LINE FORMAT

            The format for the command line is:

               SNOBOL4 file options ;Comments

            Options are specified by a slash (/) or minus sign (-), and one
          or more option letters.  When the option requires a file name, an
          equal sign may be used between the option letter and file name
          for readability.

             File             The source file contains your SNOBOL4 pro-
                              gram.  If no file is specified, CON: is as-
                              sumed, and programs may be entered directly
                              from the keyboard.  Disk files will have ex-
                              tension .SNO supplied if none is specified.

            The source and input files may be assigned to any disk file or
          valid input device.  The listing, output, and error message files
          may be assigned to any disk file or valid output device.  If the
          output disk file does not exist, it will be created.

             /I=file          The input file is associated with the vari-
                              able INPUT when execution begins, as I/O unit
                              5.  The default is CON:, your keyboard.  Disk
                              files will have extension .IN supplied if
                              none is specified.

             /L=file          The listing file receives a listing of your
                              program, with assigned statement numbers.
                              Default is NUL:, that is, the listing is dis-
                              carded.  If /L appears without a file name,
                              the source program file name will be used,
                              with the extension changed to .LST.

             /O=file          The output file is associated with the vari-
                              able OUTPUT when execution begins.  This will
                              be I/O unit 6.  The default is CON:, which is
                              usually your computer's display screen.  Disk
                              files will have extension .OUT supplied if
                              none is specified.  Execution dumps and trac-
                              ings are sent to I/O unit 6.

             /E=file          A list of compilation and runtime error mes-
                              sages is written to this file.  Default is
                              CON:, that is, error messages are displayed
                              on the screen.  If /E appears without a file
                              name, the source program file name will be
                              used, with the extension changed to .ERR.

            In addition to the /I and /O options, the INPUT and OUTPUT
          variables may also be assigned to files by using the MS-DOS redi-
          rection operators < and > on the command line.

            Other I/O files may be specified explicitly within the INPUT
          and OUTPUT functions, or on the command line with a unit number:

             /n=file          The specified file becomes associated with
                              unit number n.  n must be an integer between
                              1 and 16.  If your program calls the INPUT or
                              OUTPUT function without a file name, the file
                              specified here will be used.  This command
                              line option merely makes an association; the
                              file is not opened or created until the INPUT
                              or OUTPUT function is called.

            File names may be a disk file, or any DOS device, such as NUL:,
          CON:, LPT2:, etc.
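
            For example, a unit number given on the command line can be
          picked up inside the program as sketched below.  The program
          name LISTER and the variable name STYLE are hypothetical:

               SNOBOL4 LISTER /2=STYLE.DAT

                       INPUT(.STYLE, 2)
               LOOP    OUTPUT = STYLE                       :S(LOOP)
               END

            Because INPUT is called without a file name, unit 2 is taken
          to be STYLE.DAT, and the file is opened at that point.  The loop
          copies it to OUTPUT until end-of-file causes the assignment to
          fail.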

            The remaining option switches alter SNOBOL4's behavior:

             /B               Termination messages and statistics are nor-
                              mally displayed via I/O unit 7 (SCREEN).  The
                              /B (batch) option instead directs them to I/O
                              unit 6 (OUTPUT).

             /C               SNOBOL4 defaults to case-folding, making
                              lower and upper case alphabetics equivalent
                              for names and labels.  Specifying this option
                              inhibits case-folding:  upper and lower case
                              names are unique and distinct.

             /D               Sets the &DUMP keyword to 1.  This is useful
                              when you decide you want an end-of-run vari-
                              able dump, and don't want to edit the source
                              file.

             /H               Displays summary of options and Vanilla
                              SNOBOL4 license information.

             /NX              No execution after compilation.

             /NP              Suppress column position information in error
                              messages.

             /P               Displays additional product information.

             /S               Provide statistics upon termination.

            Vanilla SNOBOL4 works very nicely with text editors that allow
          a program to be compiled from within the editor.  If a compila-
          tion or runtime error occurs, you are returned to your editor
          with the cursor positioned on the troublesome statement.  To use
          Vanilla SNOBOL4 with your editor, you will need the command line
          option "/BE-".  This writes error messages to standard output,
          where they can be captured by your text editor.


                         13.2 PROVIDING YOUR OWN PARAMETERS

            The keyword &PARM contains the command line string.  It begins
          with the blank following the word SNOBOL4, and contains all char-
          acters up to the terminating carriage return.  Since SNOBOL4's
          command processor ignores all characters after a semicolon, com-
          ments placed there can easily communicate additional instructions
          to your program.  Break them out with the statement:

                       &PARM ';' REM . INSTRUCTIONS
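
            A slightly fuller sketch tests whether a comment was supplied
          at all.  The label NOPARM and the message text are assumptions
          for this example:

                       &PARM ';' REM . INSTRUCTIONS         :F(NOPARM)
                       OUTPUT = 'Instructions: ' INSTRUCTIONS  :(END)
               NOPARM  OUTPUT = 'No instructions on command line'
               END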


                             13.3 COMMAND LINE EXAMPLES

            The command line:

               SNOBOL4 PROG

          will compile and run a source program from file PROG.SNO, discard
          the listing, and run it with keyboard input and screen output.
          The command line:

               SNOBOL4 CONVERT /I=DATA /O=RESULT /2=STYLE.DAT ;DRAFT

          will run a program that presumably transforms input file DATA.IN
          to output file RESULT.OUT according to program option 'DRAFT'.
          I/O unit number 2 is associated with the file STYLE.DAT.  The
          program can use the variable SCREEN to post error and status mes-
          sages to the user, regardless of the reassignment of the input
          and output files.

               SNOBOL4 SOURCE /I=SOURCE.SNO /L=OUTPUT /O=OUTPUT.LST /BCS

          sets up a "conventional" batch job, with source program and input
          data on file SOURCE.SNO (following the END statement), listing
          and program output to OUTPUT.LST, no case-folding, and end-of-run
          statistics.



                                                                 Chapter 14


                                                                 STATEMENTS
          -----------------------------------------------------------------

            Each line of input to SNOBOL4 consists of a sequence of ASCII
          characters, terminated by a carriage return.

            Comment and control statements are always one line long.  How-
          ever, a program statement may occupy several lines if necessary.
          A continuation mark (plus sign or period) is placed in the first
          column of the additional lines.
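
            As a sketch, the following statement occupies two lines.  The
          plus sign stands in column one of the second line (shown here,
          like labels elsewhere in this manual, at the left edge of the
          program text):

                       OUTPUT = 'This statement is continued '
               +                'onto a second line.'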


                               14.1 COMMENT STATEMENTS

            An asterisk (*) in character position one denotes a comment
          statement.  All text through the end-of-line is copied to the
          listing file, but is otherwise ignored by SNOBOL4.


                               14.2 CONTROL STATEMENTS

            Control statements provide instructions to the SNOBOL4 com-
          piler.  They begin with a minus (-) in character position one.
          Controls may be specified in upper- or lower-case, regardless of
          the current state of case-folding.  Unrecognized controls are
          ignored.

             -CASE n          Fold lower-case names to upper-case if n is
                              nonzero.  Treat upper- and lower-case names
                              as distinct if n is zero or absent.

             -EJECT           Start a new page on the listing file.

             -LIST            Equivalent to -LIST LEFT.

             -LIST LEFT       Turn on list output, produce statement num-
                              bers at left end of line.

             -LIST RIGHT      Turn on list output, produce statement num-
                              bers at right end of line.

             -UNLIST          Turn off list output.  Errors are not shown
                              on the screen.

            SNOBOL4 defaults to -LIST LEFT and -CASE 1.
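
            A sketch of control statements heading a program follows.  The
          variable names are invented; with case-folding turned off by
          -CASE 0, Count and COUNT are two different variables:

               -LIST RIGHT
               -CASE 0
                       Count = 1
                       COUNT = 2
                       OUTPUT = Count ' ' COUNT
               END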


                               14.3 PROGRAM STATEMENTS

            If a line is not a control or comment statement, it is consid-
          ered SNOBOL4 program text.  A SNOBOL4 statement may have up to
          five components.  The general form of a statement is:

               LABEL  SUBJECT PATTERN = REPLACEMENT              :GOTO

            Statement elements are separated by blank or tab.

            Ignoring the LABEL and GOTO fields for a moment, the remaining
          elements may appear in various combinations to create different
          types of statements:


          Evaluate expression: SUBJECT

            The expression comprising the subject is evaluated.  It may in-
          voke primitive and program-defined functions.


          Assignment statement: SUBJECT = REPLACEMENT

            The value on the right is assigned to the variable on the left.
          If failure occurs when evaluating the subject or replacement com-
          ponents, the assignment does not occur.


          Pattern match: SUBJECT PATTERN

            The subject and pattern expressions are evaluated, and the
          specified pattern is applied to the subject string, producing
          success or failure.


          Pattern match with replacement: SUBJECT PATTERN = REPLACEMENT

            If the pattern match succeeds, the replacement expression is
          evaluated and replaces the portion of the subject matched.  Only
          the matched portion is replaced; characters adjacent to the
          matching substring are not disturbed.

            If the equal sign (=) is present but the replacement field is
          absent, the null string is assumed as the value of the replace-
          ment field.

            The GOTO field provides two-way branching to test the success
          or failure of the preceding statement elements.
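
            The four statement types can be sketched in a short program.
          The variable TEXT and its contents are invented for the example:

                       TEXT = 'THE QUICK BROWN FOX'
                       GT(SIZE(TEXT), 5)                    :F(END)
                       TEXT 'QUICK'                         :F(END)
                       TEXT 'BROWN' = 'RED'
                       OUTPUT = TEXT
               END

            The first statement is an assignment, the second merely
          evaluates an expression, the third is a pattern match, and the
          fourth is a pattern match with replacement.  Only BROWN is
          replaced, so the program prints THE QUICK RED FOX.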


          14.3.1 Label Field

            If a label is present, it must begin with the first character
          of the line.  Labels provide a name for the statement, and serve
          as the target for transfer of control from the GOTO field of any
          statement.  Labels must begin with a letter or digit, optionally
          followed by an arbitrary string of characters.  The label field
          is terminated by the character blank, tab, or semicolon.  If the
          first character of a line is blank or tab, the label field is
          absent.

            If case-folding is in effect, lower-case letters are converted
          to upper-case before defining the label.


          14.3.2 Subject Field

            The subject field specifies the string which will be the sub-
          ject of pattern matching.  It also specifies the left side of a
          simple assignment statement if pattern matching is absent.

            In an assignment statement, the subject must be a variable
          name, an unprotected keyword, or a field-reference function from
          a program-defined data type.  If a string is produced by evaluat-
          ing an expression, the indirect ($) operator must be used to
          reference the underlying variable.

            If the subject appears in pattern matching without replacement,
          the subject must evaluate to a string.  The string is scanned
          left to right during the pattern match.  If the subject evaluates
          to an integer, it is automatically converted to a string.  If re-
          placement is present, the same subject restrictions of assignment
          statements apply.  Thus, a literal string is a valid subject only
          if replacement is absent.

            If the expression comprising the subject contains the concate-
          nation operator, the subject must be surrounded by parentheses.
          This allows SNOBOL4 to distinguish concatenation blanks within
          the subject from the blank between subject and pattern.
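
            The indirect operator and the parenthesized subject are both
          sketched below; the variable names are assumed for the example.
          The parentheses make the concatenation of FIRST and 'BOL4' the
          subject, and the indirect operator turns the computed string
          'TOTAL' into a variable reference:

                       FIRST = 'SNO'
                       (FIRST 'BOL4') 'BOL'                 :F(END)
                       NAME = 'TOT'
                       $(NAME 'AL') = 10
                       OUTPUT = TOTAL
               END

            The match succeeds because the subject is SNOBOL4, and the
          final statement prints 10.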


          14.3.3 Pattern Field

            The pattern may be a simple string, or a complex expression in-
          volving primitive pattern functions.  The pattern specifies one
          or more strings which are systematically searched for in the sub-
          ject.  The pattern match succeeds if a match is found, and fails
          otherwise.  The &FULLSCAN keyword determines whether the search
          is exhaustive, or if heuristics will be applied to prevent futile
          match attempts.

            The pattern may assign various matching components to variables
          with the binary assignment operators dot and dollar sign (., $).


          14.3.4 Replacement Field

            In an assignment statement, there are very few restrictions on
          the replacement field.  If the subject is an unprotected keyword,
          the replacement field must evaluate to an integer value.  If the
          subject is a variable, the replacement field is assigned directly
          to it, without type conversion.

            If there is pattern matching on the left side of the statement,
          the replacement field must evaluate to a string, so that it may
          be inserted into the matched portion of the subject string.

            Replacement occurs only if evaluation of the subject, pattern,
          and replacement succeed.  Primitive functions which return suc-
          cess or failure may be used in the replacement field as predicate
          functions.  Since they return the null string, they do not alter
          the replacement value.  However, their failure can prevent re-
          placement from occurring, and can be tested in the GOTO field.
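
            For instance, the following loop sketch (label and variable
          names are illustrative) uses the predicate LT in the replacement
          field.  While N is less than 5, LT returns the null string, N is
          incremented, and the statement succeeds; once LT fails, no
          assignment is made and control falls through:

                       N = 0
               LOOP    OUTPUT = N
                       N = LT(N,5) N + 1                    :S(LOOP)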


          14.3.5 GOTO Field

            Statement execution normally proceeds sequentially from one
          statement to the next.  The GOTO field allows this flow to be al-
          tered by directing the SNOBOL4 system to continue execution else-
          where.  The GOTO field is set off from the preceding statement
          elements by blank or tab, and colon (:).  It may assume three
          forms:  unconditional, conditional, and direct.

            The "unconditional GOTO" causes control to be transferred to
          the specified labeled statement.  The label is enclosed in paren-
          theses, and may be a name, or the result of evaluating an expres-
          sion and applying the indirect operator ($).  Transfer is made to
          the labeled statement regardless of the success or failure out-
          come of the earlier parts of the statement.

            The "conditional GOTO" similarly specifies control transfer to
          a labeled statement, but it depends on the success or failure of
          the statement.  The letter S precedes the parenthesized label
          where control goes next if the statement succeeds.  The letter F
          specifies the branch to be taken if the statement fails.  For
          example:

             :S(LOOP)         Branches to label LOOP if the statement suc-
                              ceeds.

             :F(ERROR)        Branches to label ERROR if the statement
                              fails.

             :S(OK) F(NOGO)   Branches to label OK on success, to NOGO on
                              failure.

             :(AGAIN)         Unconditionally transfers control to label
                              AGAIN.

             :($('VAR' N))    Branches to the label obtained by concatenat-
                              ing the string 'VAR' with the value of vari-
                              able N.

            The "direct GOTO" is used to branch to a block of code compiled
          with the CODE function.  If the code contains labels, a regular
          GOTO could branch to the label and begin execution in the code
          block.  The direct GOTO will branch to the start of the code
          block, labeled or not.  A direct GOTO is specified by placing in
          angle brackets the name of the variable which points to the code
          block.

            Direct GOTOs may be made conditional by preceding them with S
          or F.  They may also appear with regular GOTOs:

                       VAR = CODE(string)         :S<VAR> F(COMPILE_ERROR)

            The lower-case letters "s" and "f" may be used interchangeably
          with "S" and "F", regardless of case-folding.

            The GOTO field may appear on a line without any subject, pat-
          tern, and replacement.  The absent SNOBOL4 statement is assumed
          to have succeeded.


                            14.4 CONTINUATION STATEMENTS

            A SNOBOL4 statement may be divided across several lines by
          placing a plus (+) or period (.) in character position one of the
          successive lines.  There is no limit to the number of continua-
          tion statements allowed.  The statement must be divided at a
          point where a blank or tab could appear as an operator or separa-
          tor; it cannot be split in the middle of a name or quoted string.

            Very long strings may be entered on multiple lines, using the
          implicit blank between lines as a concatenation operator:

                       LONG_STRING = "This is an example of a very long "
               +   "string that wends its way across multiple continua"
               +   "tion statements.  There is an implicit blank at the "
               +   "beginning of each line that provides the concatenation"
               +   " operator between segments."


                              14.5 MULTIPLE STATEMENTS

            The semicolon character may be used to place several statements
          on one line.  Each semicolon terminates the current statement and
          behaves like a new "column one" for the statement which follows.
          Only program statements are permitted after the semicolon; con-
          trol and continuation statements are not allowed.  Here are some
          examples:

                       I = 1;     J = 2;      S PAT = 'HENRI'       :S(YES)
                       I = 1;OUT  OUTPUT = A<I>  :F(END);  I = I + 1 :(OUT)

            Because of its poor readability, placing labels in the middle
          of a line is strongly discouraged.

            As a language extension, Vanilla SNOBOL4 permits a comment
          statement after the semicolon.  This provides a simple device for
          end-of-line comments:

               PARA    NEXT = GETNEXT() :F(FRETURN) ;* Return if EOF
                       IDENT(NEXT)      :S(RETURN)  ;* Return on empty line
                       PARA = PARA NEXT :(PARA)     ;* Splice line


                               14.6 THE END STATEMENT

            The last statement in a program must be an END statement.  The
          word END appears in the label field, beginning in column one.
          Normally, it is the only word on the line:

                       . . .
                       OUTPUT = 'All done'
               END

            After reading the END statement, compilation ends, and execu-
          tion begins immediately with the very first program statement.
          When the program is done, it should flow into the END statement,
          or use a GOTO to transfer to it.

            Occasionally, we would like to begin execution at other than
          the first statement.  If we place a statement label in the sub-
          ject field of the END statement, execution will begin there.  For
          example, this statement will cause execution to begin at the
          statement labeled START:

               END     START
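
            A minimal complete program illustrating this, with arbitrary
          label names, might read:

                       OUTPUT = 'Skipped: execution does not begin here'
               START   OUTPUT = 'Execution begins here'
               END     START

          Here the first statement is never executed, because control is
          given to START and flows from there into END.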




                                                                 Chapter 15


                                                                  OPERATORS
          -----------------------------------------------------------------

            Following are lists of all the unary and binary operators in
          SNOBOL4.  Unused operators may be attached to program-defined
          functions using the OPSYN function.  Unary operators have equal
          precedence among themselves, and higher precedence than binary
          operators.  Operators of higher precedence are performed first,
          unless reordered by parentheses.  Where several instances of
          operators with the same priority appear, associativity specifies
          which one is performed first.


                                15.1 UNARY OPERATORS

            All unary operators are right-associative: if several appear
          together, they are performed right-to-left, starting with the
          operator nearest the operand.

          Graphic        Name                  Definition
          =======  =================    ==============================
            +      plus                 arithmetic positive
            -      minus                arithmetic negative
            .      period               name of object (address)
            $      dollar sign          indirect reference through object
            *      asterisk             unevaluated expression
            &      ampersand            keyword
            ~      tilde                negation of success/failure
            ?      question mark        interrogation
            @      at sign              cursor position assignment
            /      slash                <none>
            ^, !   caret, exclamation   <none>
            %      percent              <none>
            #      pound sign           <none>
            |      vertical bar         <none>



          15.1.1 Indirect Reference and Case-Folding

            The indirect reference operator ($) converts a string to a
          variable name.  When case-folding is in effect, the string char-
          acters are treated as upper-case letters when producing the name.
          The string itself is not modified.  Thus,

               $('abc')

          references variable ABC when case-folding, and variable abc when
          not.
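
            A brief sketch, assuming case-folding is in effect:

                       $('abc') = 'Assigned indirectly'
                       OUTPUT = ABC

          Under that assumption, the second statement displays the value
          stored by the first, since both refer to variable ABC.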


                                15.2 BINARY OPERATORS

          Graphic    Name            Definition       Precedence Associates
          ======= =================  ===============  ========== ==========
            ~     tilde              <none>                 12      right
            ?     question mark      <none>                 12      left
            .     period             conditional assignment 13      left
            $     dollar sign        immediate assignment   13      left
            ^, !  caret, exclamation exponentiation         12      right
            **    double asterisk    exponentiation         12      right
            %     percent            <none>                 11      left
            *     asterisk           multiplication         10      left
            /     slash              division                9      left
            #     pound sign         <none>                  8      left
            +     plus               addition                7      left
            -     minus              subtraction             7      left
            @     at sign            <none>                  6      left
            blank blank              concatenation           5      left
            tab   tab                concatenation           5      left
            |     vertical bar       alternation             4      left
            &     ampersand          <none>                  3      left
            =     equal sign         assignment              1      right
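
            The following sketch shows how precedence and associativity
          from the table group some simple expressions; the comments give
          the equivalent fully parenthesized forms:

                       OUTPUT = 2 + 3 * 4          ;* 2 + (3 * 4)
                       OUTPUT = 2 ** 3 ** 2        ;* 2 ** (3 ** 2)
                       OUTPUT = 1 + 2 3 + 4        ;* (1 + 2) (3 + 4)

          In the last statement, both additions are performed before the
          blank concatenates their results.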




                                                                 Chapter 16


                                                                   KEYWORDS
          -----------------------------------------------------------------

            Keywords allow a program to communicate with SNOBOL4.  Their
          names are set apart from other variables by the unary operator
          ampersand (&).  Protected keywords cannot be changed by a pro-
          gram, while unprotected keywords can.

            Several protected keywords can be traced using the TRACE func-
          tion: &ERRTYPE, &FNCLEVEL, &STCOUNT, and &STFCOUNT.  Tracing oc-
          curs each time SNOBOL4 alters their value.  For example, tracing
          keyword &STCOUNT produces a trace after every SNOBOL4 statement
          is executed.
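
            As a sketch of how such a trace might be requested (the count
          of 50 is an arbitrary choice), keyword tracing is enabled by
          giving &TRACE a nonzero value and naming the keyword, without
          its ampersand, in a call to TRACE:

                       &TRACE = 50
                       TRACE('STCOUNT','KEYWORD')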


                               16.1 PROTECTED KEYWORDS

            Among these keywords are several which serve as read-only
          repositories of fundamental system patterns and values, such as
          &ARB.  The nonkeyword form (ARB) may be changed by a program, and
          later restored to its original value by assigning it the corre-
          sponding keyword.
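
            For example, in this sketch ARB is deliberately redefined and
          then restored from its keyword:

                       ARB = 'ANY TEXT'         ;* ARB is now a plain string
                       ARB = &ARB               ;* ARB is a pattern again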

             &ABORT           The primitive pattern ABORT.

             &ALPHABET        String of 256 ASCII character values in as-
                              cending order.

             &ARB             The primitive pattern ARB.

             &BAL             The primitive pattern BAL.

             &ERRTEXT         String containing most recent system gener-
                              ated error text.

             &ERRTYPE         Integer code of the last execution error to
                              occur.  This keyword may be traced with func-
                              tion TRACE().

             &FAIL            The primitive pattern FAIL.

             &FENCE           The primitive pattern FENCE.

             &FNCLEVEL        Integer depth of program-defined function
                              calls.  It is initially zero, and incremented
                              by one for each function call, and decre-
                              mented for each function return.  This key-
                              word may be traced.

             &LASTNO          Integer statement number of the previous
                              statement executed.

             &LCASE           The 26 lower-case alphabetic letters.

             &PARM            The command string used to invoke SNOBOL4.
                              Begins with the blank following the word
                              SNOBOL4.

             &REM             The primitive pattern REM.

             &RTNTYPE         Contains a string describing the type of re-
                              turn most recently made by a program-defined
                              function, either 'RETURN', 'FRETURN', or
                              'NRETURN'.

             &STCOUNT         Integer count of the number of statements
                              executed.  This keyword may be traced.  Since
                              integers are 16-bit quantities, executing
                              more than 32,767 statements will cause this
                              keyword to overflow.  No harm results, and
                              the keyword may still be traced, but its
                              value will be a large negative number.

             &STFCOUNT        Integer count of the number of statements
                              which failed.  This keyword may be traced.
                              The same overflow problem discussed for
                              &STCOUNT occurs with this keyword.

             &STNO            Integer statement number of the current
                              statement being executed.

             &SUCCEED         The primitive pattern SUCCEED.

             &UCASE           The 26 upper-case alphabetic letters.
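
            As a small sketch of how these read-only values may be used,
          the following statement converts a string to upper case with
          the REPLACE function, taking the two alphabets from keywords:

                       OUTPUT = REPLACE('Vanilla', &LCASE, &UCASE)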


                              16.2 UNPROTECTED KEYWORDS

            These keywords may be set to integer values to modify SNOBOL4's
          behavior.

             &ANCHOR          Nonzero for anchored pattern match.  Ini-
                              tially 0, unanchored.

             &CASE            Zero to prevent case-folding during compila-
                              tion with the functions CODE and EVAL.  Ini-
                              tially 1, causing case-folding to occur.

             &CODE            The end-of-job code is an integer value in
                              the range 0 to 255 returned to the operating
                              system.  It can be tested with the DOS Batch
                              condition ERRORLEVEL.  Initially 0.

             &DUMP            Nonzero to list unprotected keywords and
                              variables with nonnull values at program ter-
                              mination.  A positive value causes the list
                              to be sorted; negative values leave them un-
                              sorted.  Initially 0.  The dump is produced
                              to I/O unit 6 (OUTPUT).

             &ERRLIMIT        Determines the number of conditionally fatal
                              execution errors permitted before terminating
                              a program.  The Execution Error Messages sec-
                              tion of Chapter 20, "System Messages," de-
                              scribes the errors which are conditionally
                              fatal.  Initially 0, causing SNOBOL4 to stop
                              if any error occurs.

             &FTRACE          Nonzero value causes each call and return of
                              a program-defined function to be listed.
                              Decremented for each trace.  Initially 0.

             &FULLSCAN        Nonzero to disable pattern matching heuris-
                              tics.  Initially 0, the quickscan mode of
                              pattern matching.

             &INPUT           Zero to disable all input.  When disabled,
                              using variable INPUT (or other input-associ-
                              ated variables) does not read data from the
                              file.  Initially 1, input is enabled.

             &MAXLNGTH        Maximum string length.  Initially 5000, maxi-
                              mum value is 32767.  Memory limitations in
                              Vanilla SNOBOL4 will limit actual strings to
                              a smaller size.

             &OUTPUT          Zero to disable all output.  When disabled,
                              assigning data to OUTPUT or SCREEN (or other
                              output-associated variables) does not write
                              data to the file.  Initially 1, output is en-
                              abled.

             &STLIMIT         The number of statements allowed to execute.
                              If positive, it is decremented for each
                              statement executed; execution terminates when
                              it reaches 0.  If negative, there is no
                              limit, and it is not decremented.  Initially
                              -1.

             &TRACE           Nonzero to permit tracing with the TRACE
                              function.  Initially 0, it is decremented for
                              each trace performed.

             &TRIM            Nonzero to strip trailing blanks from lines
                              read from ASCII files.  This is faster than
                              using the TRIM function.  It does not strip
                              trailing tab characters.  Initially 0: blanks
                              are not removed and short records are blank
                              padded to the file's standard record length.
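
            As an illustration of how an unprotected keyword alters
          behavior, this sketch (with arbitrary label names) attempts the
          same match with &ANCHOR on and then off:

                       &ANCHOR = 1
                       'ABCD' 'BC'                          :S(FOUND)
                       OUTPUT = 'No match when anchored'
                       &ANCHOR = 0
                       'ABCD' 'BC'                          :F(END)
               FOUND   OUTPUT = 'Match found'
               END

          The anchored attempt fails because 'BC' does not begin the
          subject; the unanchored attempt succeeds.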


                                 16.3 SPECIAL NAMES

            The following names have special meaning to SNOBOL4.  If case-
          folding is in effect, they may appear with any combination of
          upper- and lower-case letters.

             END              This is a special label which denotes the
                              last statement of the user's program.  An op-
                              tional label may follow the word END (in the
                              subject field) to denote where program execu-
                              tion is to begin.  A program should terminate
                              execution by transferring to label END.

             FRETURN          Transfer to this label to return from a
                              program-defined function with a failure indi-
                              cation.

             INPUT            Variable associated with input from unit
                              number 5.

             NRETURN          Transfer to this label to return successfully
                              from a program-defined function by name,
                              rather than by value.  The function name
                              should be assigned a name result (usually
                              with the period (.) unary operator).  This
                              permits a function call to be the object of
                              an assignment operation.

             OUTPUT           Variable associated with output to unit
                              number 6.

             RETURN           Transfer to this label to return from a
                              program-defined function with a success indi-
                              cation.  A value may be returned as the func-
                              tion's result; simply assign it to a variable
                              with the same name as the function before
                              transferring to RETURN.
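
            The following sketch shows RETURN and FRETURN in use.  HALF is
          a hypothetical function that returns half of an even integer,
          and fails for an odd one:

                       DEFINE('HALF(N)')                    :(HALF_END)
               HALF    EQ(REMDR(N,2),0)                     :F(FRETURN)
                       HALF = N / 2                         :(RETURN)
               HALF_END
                       OUTPUT = HALF(10)
                       OUTPUT = HALF(7)                     :S(END)
                       OUTPUT = '7 is odd'
               END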




                                                                 Chapter 17


                                                  DATA TYPES AND CONVERSION
          -----------------------------------------------------------------

            Most other programming languages require the user to explicitly
          declare the type of data to be stored in a variable.  In SNOBOL4,
          any variable may contain any data type.  Furthermore, the vari-
          able's type may be freely altered during program execution.
          SNOBOL4 remembers what kind of data is in each variable.


                                17.1 DATA TYPE NAMES

            The formal name of a data type is specified by an upper-case
          string (or lower-case if case-folding is in effect), such as
          'INTEGER', or 'ARRAY'.  It is used with the CONVERT function to
          specify the data type conversion desired.  The formal name is
          also the string returned when the DATATYPE() function is used to
          determine an object's type.
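
            For example, the following statements (a simple sketch)
          display the formal names INTEGER, STRING, ARRAY, and TABLE:

                       OUTPUT = DATATYPE(42)
                       OUTPUT = DATATYPE('42')
                       OUTPUT = DATATYPE(ARRAY(10))
                       OUTPUT = DATATYPE(TABLE())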

          -----------------------------------------------------------------

          ARRAY                            N-dimensional array

          The primitive function ARRAY() creates an array storage area, and
          returns a pointer with this data type.  If this pointer is stored
          in a variable, the variable is said to be of type ARRAY, and may
          then be subscripted to access the elements of the array.

          -----------------------------------------------------------------

          CODE                             Compiled SNOBOL4 code

          The primitive function CODE() compiles a string containing
          SNOBOL4 statements, and returns a pointer to the resulting object
          code block.  If this pointer is stored in a variable, the vari-
          able is said to be of type CODE.  The variable may then be used
          with a direct GOTO by enclosing it in angle brackets.

          -----------------------------------------------------------------

          EXPRESSION                       Unevaluated expression

          When the unevaluated expression operator (*) is applied to an ex-
          pression, the result has the data type EXPRESSION.  Such expres-
          sions are not evaluated when they are defined, only when they are
          referenced.

                       E = *(LEN(K) POS(M))

          defines E as an unevaluated expression.  When this statement is
          executed, the code to concatenate two function calls is compiled,
          but not executed.  It is only when E is referenced in a subse-
          quent pattern match or appears as the argument of the EVAL
          function that the code is executed to produce a pattern.

            The unevaluated expression operator must be at the outermost
          level to create an object of type EXPRESSION.  If buried within
          the expression, the execution results may appear to be similar,
          but the object's data type differs.  That is, the two statements

                       P = *LEN(N)
                       P = LEN(*N)

          produce identical results when P is used in a pattern match (if
          LEN is not redefined).  However, the first statement produces P
          as type EXPRESSION, while the second produces P as type PATTERN.
          Expressions may also be produced explicitly with the CONVERT()
          function (see below).
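
            The difference can be observed with DATATYPE, as in this
          sketch:

                       N = 3
                       P = *LEN(N)
                       Q = LEN(*N)
                       OUTPUT = DATATYPE(P)          ;* EXPRESSION
                       OUTPUT = DATATYPE(Q)          ;* PATTERN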

          -----------------------------------------------------------------

          EXTERNAL                         Created by external function

          External assembly language functions may create new data types
          whose structure is known only to them.  This feature is only
          available in SNOBOL4+, Catspaw's enhanced implementation of the
          SNOBOL4 language.

          -----------------------------------------------------------------

          INTEGER                          Integer number

          A decimal number in the range -32767 to +32767.  No fractional
          part may appear.  One computer word (16 bits) is used to contain
          an integer value.

          -----------------------------------------------------------------

          NAME                             Name of a variable

          When the unary name operator (.) is applied to a variable, two
          results are possible.  If the variable's name is a simple string
          (a "natural variable"), such as ABC, the variable's name is
          returned as type STRING.  For example, .ABC has the value 'ABC'.
          However, if the variable is a created variable, such as a table
          or array element, the NAME data type results.  In either case,
          the result of the name operator can be thought of as the
          "address" or "storage location" of the variable.  When the indi-
          rect reference operator ($) is applied to such a result, the
          original, underlying object is obtained.  That is, $(.A) is the
          same as using the variable A.

            For natural variables, SNOBOL4 has the surprising property that
          the string 'XYZ' is the address (or name) of variable XYZ, so
          $'XYZ' is equivalent to XYZ.
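
            As a sketch of the created-variable case (the names are only
          illustrative), the name of an array element can be saved and
          later used to store into that element:

                       A = ARRAY(5)
                       N = .A<2>
                       $N = 'Stored through a name'
                       OUTPUT = A<2>
                       OUTPUT = DATATYPE(N)          ;* NAME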

          -----------------------------------------------------------------

          PATTERN                          Pattern match structure

          A pattern is created by an expression containing any of the fol-
          lowing:  other patterns, primitive patterns, pattern functions,
          the alternation operator (|), the conditional or immediate as-
          signment operator (. or $), or the cursor position operator (@).
          A simple string is not a pattern data type, even though it may
          appear in the pattern portion of a statement.  The following are
          examples of the pattern data type:

                       POS(0) "A" LEN(1)
                       "COLUMN A" | "COLUMN B"
                       "ZIP" . X
                       "MATCH" @Y

          -----------------------------------------------------------------

          Program-defined data type        Created by DATA() function

          Up to 899 new data types may be created with the primitive func-
          tion DATA.  The name specified in the prototype string becomes a
          new data type in SNOBOL4.  Any object created with the data
          type's creation function is given this name as its data type.

               DATA('COMPLEX(REAL, IMAG)')      ;* Define new type COMPLEX
               NUM = COMPLEX(2, -4)             ;* Create a COMPLEX object
               OUTPUT = DATATYPE(NUM)           ;* Print string 'COMPLEX'

          -----------------------------------------------------------------

          REAL                             Real number

          A floating-point decimal number in the range 2.3E-308 to
          1.7E+308.  Reals are only available in SNOBOL4+, Catspaw's
          enhanced implementation of the SNOBOL4 language.

          -----------------------------------------------------------------

          STRING                           Character string

          A sequence of characters.  Each character occupies one memory
          byte, and may contain any of the 256 possible bit combinations.
          A string of length zero is called the null string.  Maximum
          length of a string is determined by the keyword &MAXLNGTH
          (default 5000).  Memory restrictions in Vanilla SNOBOL4 will
          limit the longest string possible to less than the 32767
          characters allowed in SNOBOL4+, Catspaw's enhanced SNOBOL4
          implementation.

          -----------------------------------------------------------------

          TABLE                            Associatively referenced table

          The primitive function TABLE() creates a table storage area, and
          returns a pointer with this data type.  If this pointer is stored
          in a variable, the variable is said to be of type TABLE.  The
          variable may then be subscripted to access the elements of the
          table.  A table may be thought of as a one-dimensional array in
          which the array subscripts may be any SNOBOL4 data type.  Arrays
          require integer subscripts, but table subscripts such as
          T<"TALLY"> or T<13.52> are acceptable.


                              17.2 DATA TYPE CONVERSION

            Data may be implicitly or explicitly converted from one type to
          another.


          17.2.1 Implicit Conversion

            Implicit conversion occurs automatically when SNOBOL4 requires
          a certain data type, and your program provides it in another
          form.  Conversion to the correct data type will be attempted, and
          an error message given if conversion is not possible.
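
            Two simple cases, given as a sketch:

                       OUTPUT = '12' + 3          ;* String '12' to integer
                       OUTPUT = 12 3              ;* Integers to strings

          The first statement displays 15, the second 123.  A statement
          such as  OUTPUT = 'TWELVE' + 3  would instead produce an error
          message, since 'TWELVE' cannot be converted to a number.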


          17.2.2 Explicit Conversion

            A program may use the CONVERT() function to explicitly convert
          an object to another data type.  The first argument is the object
          to be converted; the second is a string containing the formal
          name of the desired data type.  The formal name must be in upper-
          case (lower-case allowed if case-folding).  If conversion is
          possible, the function succeeds and returns the converted object.
          If not, the function fails.  The call looks like this:

                       NEWTYPE = CONVERT(OBJECT, "DESIRED TYPE")
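
            As a sketch of testing the outcome (the label names are
          arbitrary), the first conversion below succeeds and the second
          fails:

                       N = CONVERT('123', 'INTEGER')        :F(BAD)
                       OUTPUT = DATATYPE(N) ' ' N
                       N = CONVERT('12X', 'INTEGER')        :S(END)
               BAD     OUTPUT = 'Conversion not possible'
               END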


          17.2.3 Permissible Conversions

          -----------------------------------------------------------------

          ARRAY to STRING                  

          The formal name "ARRAY" is produced.  The defining array dimen-
          sion string is appended if less than 20 characters:

                       A = ARRAY('1:50,6')
                       OUTPUT = A

          produces the string "ARRAY('1:50,6')".

          -----------------------------------------------------------------

          CODE to STRING                   

          The formal name "CODE" is produced:

                       C = CODE(' PIT2 = .OPPIT4 :(RETURN)')
                       OUTPUT = C

          displays the string "CODE".

          -----------------------------------------------------------------

          EXPRESSION to PATTERN            

          This occurs implicitly within a pattern match, or by using the
          EVAL function.  The deferred expression is evaluated, using
          current values for any variables which appear.  Example:

                       LASTN = *(RTAB(N) REM . LCHARS)
                        . . .
                       N = 4
                       SUBJECT LASTN                        :F(TOO_SHORT)

          -----------------------------------------------------------------

          EXPRESSION to STRING             

          The formal name "EXPRESSION" is produced.  For example,

                       LASTN = *(RTAB(N) REM . LCHARS)
                       OUTPUT = LASTN

          produces the string "EXPRESSION".

          -----------------------------------------------------------------

          INTEGER to PATTERN               

          This only occurs implicitly within a pattern match.  The integer
          is converted to a string, and the string converted to a pattern.
          Example:

                       SUBJECT 19 = ''

          -----------------------------------------------------------------

          INTEGER to STRING                

          Leading zeros are suppressed, and a minus sign appears if the
          integer was negative.  Integer zero is converted to the string
          "0".  For example,

                       A = -23;  B = 0;  C = 92
                       OUTPUT = A B C

          produces the string "-23092".

          -----------------------------------------------------------------

          NAME to STRING                   

          The formal name "NAME" is produced:

                       N = .A[2]
                       OUTPUT = N

          displays the string "NAME".

          -----------------------------------------------------------------

          PATTERN to STRING                

          The formal name "PATTERN" is produced.  For example,

                       WPAT = BREAK(LETTERS) SPAN(LETTERS) . WORD
                       OUTPUT = WPAT

          produces the string "PATTERN".

          -----------------------------------------------------------------

          DEFINED DATA TYPE to STRING      

          The formal name from the defining DATA function call is returned.

                       DATA('COMPLEX(REAL,IMAG)')
                       R1 = COMPLEX(2, 3)
                       OUTPUT = R1

          produces the string "COMPLEX".

          -----------------------------------------------------------------

          STRING to INTEGER                

          The string must not have any leading or trailing blanks.  A lead-
          ing plus or minus sign is allowed, but must be followed by at
          least one digit.  Leading zeros are allowed, and the resulting
          value must be in the legal range for integer values.  A null
          string is converted to integer zero.

                       RESULT = ("-14" + "") / "2"

          stores integer -7 in RESULT.

          -----------------------------------------------------------------

          STRING to PATTERN                

          This only occurs implicitly within a pattern match.  The pattern
          created will match the specified substring:

                       SUBJECT "HOPE"

          -----------------------------------------------------------------

          TABLE to ARRAY                   

          This occurs only when the CONVERT function is used.  Example:

                       T = TABLE(100)
                        . . .
                       A = CONVERT(T, "ARRAY")         :F(EMPTY)

            The table is converted to a rectangular array.  Null table
          entries are omitted, and there must be at least one nonnull entry
          or the function fails.  An N by 2 array is created, where N is
          the number of nonnull table values.  The first array column con-
          tains the table subscripts, the second column contains the entry
          values.
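
            One common use of this conversion is to visit every entry of
          a table.  The following is a minimal sketch; the labels PRINT
          and EMPTY and the sample entries are our own.  It relies on an
          out-of-range array reference failing, which ends the loop.  The
          order of the rows should not be relied upon:

                       T = TABLE()
                       T<"APPLE"> = 3
                       T<"PEAR"> = 5
                       A = CONVERT(T, "ARRAY")             :F(EMPTY)
                       I = 1
               * Column 1 holds the subscript, column 2 the entry value
               PRINT   OUTPUT = A<I,1> ": " A<I,2>         :F(END)
                       I = I + 1                           :(PRINT)
               EMPTY   OUTPUT = "TABLE IS EMPTY"
               END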

          -----------------------------------------------------------------

          TABLE to STRING                  

          The formal name "TABLE" is returned with the present size of the
          table and its expansion increment.  For example,

                       T = TABLE(10,10)
                        . . .
               ; Insert 45 nonnull elements into T
                        . . .
                       OUTPUT = T

          produces the string "TABLE(50,10)" (because table segments in
          this case are allocated in multiples of 10).

            The following matrix indicates conversions with CONVERT():

                          |      Result Type            E
                          |                             X
                          |                             P
                          |     I   P                   R   D
                          | S   N   A                   E   E
                          | T   T   T       A   T       S   F
                          | R   E   T   N   R   A   C   S   I
                          | I   G   E   A   R   B   O   I   N
                 Argument | N   E   R   M   A   L   D   O   E
                   Type   | G   R   N   E   Y   E   E   N   D
               -----------+-----------------------------------
                   STRING | *   I   P               C   E
                  INTEGER | S   *   P
                  PATTERN | F       *
                     NAME | F           *
                    ARRAY | A               *   1
                    TABLE | T               2   *
                     CODE | F                       *
               EXPRESSION | F       P                   *
                  DEFINED | F                               *

            *  The argument object is returned unchanged.

            A  The formal data type name "ARRAY" is returned with the
               defining prototype string if it is less than 20 characters.

            C  CONVERT(string,"CODE") behaves exactly like CODE(string).

            E  Produces an unevaluated expression that may subsequently be
               used in a pattern or evaluated with the EVAL() function.

            F  The formal data type name is returned.

            I  Numeric conversion is conditioned on magnitude and syntax
               restrictions. No leading or trailing blanks are permitted.

            P  Occurs implicitly within a pattern match.

            S  A number may always be converted to its string form.

            T  The string "TABLE" is returned with the present size of the
               table and its expansion increment:  "TABLE(50,10)".

            1  The array must be rectangular, with a second dimension of 2
               (N rows by 2 columns).  A table with N entries is created.
               The table subscripts are taken from the first column of the
               array; the table values are copied from the second column.

            2  The table is converted to a rectangular array.  Null table
               entries are omitted, and there must be at least one nonnull
               entry or the function fails.  An N by 2 array is created,
               where N is the number of nonnull table values.  The first
               array column contains the table subscripts, the second col-
               umn contains the entry values.
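
            As an illustration of note 1, the sketch below (the array
          contents are our own) fills a 2 by 2 array and converts it to a
          table.  The final statement should display "2":

                       A = ARRAY("2,2")
               * First column: table subscripts.  Second column: values.
                       A<1,1> = "RED"
                       A<1,2> = 1
                       A<2,1> = "GREEN"
                       A<2,2> = 2
                       T = CONVERT(A, "TABLE")
                       OUTPUT = T<"GREEN">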



                                                                 Chapter 18


                                             PATTERNS AND PATTERN FUNCTIONS
          -----------------------------------------------------------------

            The SNOBOL4 pattern matcher is called the "scanner."  The
          "cursor" is the scanner's pointer into the subject string; it
          points between subject characters (no relation to your CRT cur-
          sor).  It is initially zero when positioned to the left of the
          subject, and is incremented as the scanner moves to the right in
          the subject.


                               18.1 PRIMITIVE PATTERNS

            These variables initially contain the primitive patterns of the
          same name.  They may be set to other values by a program, and re-
          stored to their original value from the corresponding protected
          keywords.

             ABORT            Causes immediate failure of the entire pat-
                              tern match, without seeking alternatives.

             ARB              Matches zero or more characters of the sub-
                              ject string.  It matches the shortest possi-
                              ble substring.

             BAL              Matches any nonnull string which is balanced
                              with respect to parentheses.  A string with-
                              out parentheses is considered balanced.  BAL
                              matches the shortest string possible.

             FAIL             Causes failure of this portion of the pattern
                              match, causing the scanner to backtrack and
                              try alternatives.

             FENCE            Matches the null string and succeeds when the
                              scanner is moving left to right in a pattern,
                              but fails if the scanner has to back up
                              through it, seeking alternatives.

             REM              Matches zero or more characters from the cur-
                              rent cursor position to the end of the sub-
                              ject string.

             SUCCEED          Matches the null string and always succeeds.

            Altering these primitive patterns can produce very confusing
          programs, unless the new value encompasses the old, like this:

                       ARB = &ARB . OUTPUT
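
            The sketch below uses subject strings and variable names of
          our own choosing, and assumes the primitive patterns still hold
          their original values.  It shows ARB, BAL and REM capturing what
          they match, and should display "UNT", "(A+B)" and "UNTAIN":

               * ARB grows only as far as needed to reach the first "A"
                       "MOUNTAIN" "O" ARB . X "A"
                       OUTPUT = X
               * BAL takes the shortest nonnull balanced string
                       "(A+B)*C" POS(0) BAL . Y
                       OUTPUT = Y
               * REM takes everything to the end of the subject
                       "MOUNTAIN" "O" REM . Z
                       OUTPUT = Z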





                          18.2 PRIMITIVE PATTERN FUNCTIONS

            These functions produce a pattern based on the argument sup-
          plied.  The argument data type is shown below---other data types
          or expressions will be converted to the required type if
          possible.

            Pattern functions may be combined with other primitive pat-
          terns, functions, and strings using the alternation and concate-
          nation operators to produce larger patterns.

          -----------------------------------------------------------------

          ANY(string)                      Match one character from set

          Matches exactly one character from the set of characters speci-
          fied by the argument string.

          -----------------------------------------------------------------

          ARBNO(pattern)                   Match repeated pattern

          Matches zero or more consecutive occurrences of the string
          matched by the argument pattern.  ARBNO matches the shortest
          string possible--initially the null string--and only tries to
          match pattern if other pattern components in the statement re-
          quire it.
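
            As a sketch of ARBNO in use, the pattern below (the names
          ITEM and LIST are our own) accepts a comma-separated list of
          unsigned integers.  The first match succeeds and should display
          "10,20,30"; the second fails, so nothing is displayed:

                       ITEM = SPAN("0123456789")
                       LIST = POS(0) ITEM ARBNO("," ITEM) RPOS(0)
                       "10,20,30" LIST . OUTPUT
                       "10,,30" LIST . OUTPUT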

          -----------------------------------------------------------------

          BREAK(string)                    Match characters not in set

          Matches zero or more characters provided they are not in the set
          of characters in the argument string.  That is, it matches up to,
          but not including, a character from the argument string.

          -----------------------------------------------------------------

          LEN(integer)                     Match fixed length string

          Matches a string of the specified length.  There are no restric-
          tions on the subject string characters.  An argument of zero will
          match the null string.

          -----------------------------------------------------------------

          NOTANY(string)                   Match one character not in set

          Matches exactly one character provided it is not in the set of
          characters specified by the argument string.

          -----------------------------------------------------------------

          POS(integer)                     Verify scanner position

          Succeeds if the scanner's current cursor position in the subject
          string is equal to the specified integer value.  This function
          merely verifies scanner position---it does not consume or match
          any subject characters.  POS(0) as the first component of a pat-
          tern produces an anchored pattern match.

          -----------------------------------------------------------------

          RPOS(integer)                    Verify scanner position from end

          Succeeds if the scanner's current cursor position in the subject
          string is the specified number of characters from the end of the
          string.  Like POS(), it verifies scanner position but does not
          consume any characters.  RPOS(0) as the last component of a pat-
          tern forces the pattern to match to the end of the subject
          string.

          -----------------------------------------------------------------

          RTAB(integer)                    Match through position counting
                                           from end

          Matches all characters from the current cursor position up to the
          specified cursor position, counting from the end of the subject
          string.  RTAB(N) matches characters up to, but not including, the
          final N characters of the subject.

          -----------------------------------------------------------------

          SPAN(string)                     Match characters in set

          Matches one or more characters from the set of characters speci-
          fied by the argument string.  SPAN will not match the null
          string; at least one character from the argument string must be
          found in the subject.
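
            BREAK and SPAN are often used together.  In this sketch (the
          subject string is our own), BREAK skips up to the first letter
          and SPAN then collects the run of letters, so the final state-
          ment should display "MONKEYS":

                       LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                       WPAT = BREAK(LETTERS) SPAN(LETTERS) . WORD
                       "12 MONKEYS, 3 CATS" WPAT
                       OUTPUT = WORD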

          -----------------------------------------------------------------

          TAB(integer)                     Match through fixed position

          Matches all characters from the current cursor position up to the
          specified cursor position.  TAB(N) matches characters up to and
          including the Nth character of the subject.  TAB will
          match the null string if the target position and current cursor
          position are the same.  The function fails if the current scanner
          position is to the right of the target position.
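
            The positional functions can be combined to split a string
          at fixed columns.  In this sketch (the variable names are our
          own), TAB(3) takes the first three characters, RTAB(2) takes
          everything up to the last two, and REM takes the rest.  It
          should display "ABC/DEF/GH":

                       S = "ABCDEFGH"
                       S POS(0) TAB(3) . HEAD RTAB(2) . MID REM . TAIL
                       OUTPUT = HEAD "/" MID "/" TAIL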







                                                                 Chapter 19


                                                         BUILT-IN FUNCTIONS
          -----------------------------------------------------------------

            In this chapter, the following items are used to indicate the
          required argument type.  Other types may be used, and will be
          automatically converted to the required type, if possible.  Inte-
          ger suffixes will be used to distinguish multiple arguments of
          the same type.

             arg              A generic argument of any SNOBOL4 data type.

             array            An array.

             i                An integer number.

             name             The name of a variable, function or label,
                              such as .VAR or 'VAR'.  When case-folding,
                              'VAR' and 'var' are equivalent as names.

             s                Any SNOBOL4 string.

             table            A table.

             unit             I/O unit; an integer between 1 and 16.

            If an argument is omitted in a function call, SNOBOL4 supplies
          the null string instead.

          -----------------------------------------------------------------

          APPLY(name, arg1, arg2,...,argn) Indirect call to a function

          Call function name with the specified arguments.  Since name may
          be a variable containing a function name, it allows an indirect
          call to a function, similar to the :($VAR) construct in the GOTO
          field.
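
            A minimal sketch of an indirect call follows.  The function
          DOUBLE, the label DBL_END and the variable F are our own.  The
          function name is held in F as a string, and the last statement
          should display "42":

                       DEFINE("DOUBLE(X)")                 :(DBL_END)
               DOUBLE  DOUBLE = X + X                      :(RETURN)
               DBL_END F = "DOUBLE"
                       OUTPUT = APPLY(F, 21)
               END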

          -----------------------------------------------------------------

          ARG(name, i)                     Get dummy argument name from
                                           function definition

          Returns a string which is the Ith argument from the formal defi-
          nition of program-defined function name.  ARG fails if i is
          greater than the number of arguments in name's definition.  ARG
          is useful when one function is used to trace another.  The trace
          function can access the actual argument used with the function
          being traced with an indirect reference: $ARG(name, i).

          -----------------------------------------------------------------

          ARRAY(s, arg)                    Create an array

          S is a prototype which specifies the dimensions of the array cre-
          ated, and the optional arg is the value used to initialize all
          array elements.  The form of the prototype string is:

                       "L1:H1,L2:H2,...,Ln:Hn"

          where L and H are integers giving the lower and upper bounds of
          each dimension.  Blanks are not permitted.  If the lower bound
          and colon are omitted from any dimension, '1:' is assumed.  ARRAY
          returns a pointer to the new array, which should be assigned to a
          variable.  The variable can then be subscripted to access the
          array elements.

            A common error when defining a multidimensional array is to use
          integers instead of a string for the prototype:

                       ARRAY(3,4) instead of ARRAY("3,4")

            The first example defines a 3-element, one-dimensional array,
          with elements initialized to integer 4.  The second defines a
          rectangular array, 3 rows by 4 columns.
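
            The following sketch (the bounds and values are our own)
          creates an array with rows 0 through 2 and columns 1 through 3,
          with every element initialized to "X".  The first OUTPUT state-
          ment should display "YX".  The second refers to a row outside
          the declared bounds, so the reference fails and nothing is dis-
          played:

                       A = ARRAY("0:2,3", "X")
                       A<0,1> = "Y"
                       OUTPUT = A<0,1> A<2,3>
                       OUTPUT = A<3,1>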

          -----------------------------------------------------------------

          CHAR(i)                          Convert integer to ASCII
                                           character

          Converts an integer ASCII code to a one-character string.  The
          argument must be in the range 0 to 255, otherwise the function
          fails.

          -----------------------------------------------------------------

          CLEAR()                          Clear all variables

          The null string is assigned to all variables in the system
          (including primitive patterns, such as ARB).  These patterns and
          names may be restored from the protected keywords with the same
          names (e.g., ARB = &ARB).

            CLEAR does not modify variables which are currently saved on
          the function call stack.

          -----------------------------------------------------------------

          CODE(s)                          Compile a string

          Returns a pointer to the object code compiled from the SNOBOL4
          statements in string s.  This pointer can be assigned to a vari-
          able, and the code executed with the direct GOTO :<variable>.
          CODE fails if it finds a syntax error, and places an error mes-
          sage string in keyword &ERRTEXT.  Individual statements in s are
          separated by a semicolon (;).  The first character following a
          semicolon must be a blank, tab, the start of a label, or a com-
          ment.  Control and continuation statements are not allowed in s.
          Statements may be any length; the 120 character limit when com-
          piling from a file does not apply.  Case-folding of names is con-
          trolled by keyword &CASE.
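
            A short sketch of compiling and running a string follows; the
          labels BACK and ERR are our own.  The compiled statement begins
          with a blank because it has no label, and it ends by transfer-
          ring to the label BACK in the main program.  The program should
          display "HELLO" followed by "BACK IN MAIN PROGRAM":

                       C = CODE(' OUTPUT = "HELLO" :(BACK)')   :F(ERR)
               * Transfer control to the compiled code with a direct GOTO
                                                   :<C>
               BACK    OUTPUT = "BACK IN MAIN PROGRAM"         :(END)
               ERR     OUTPUT = &ERRTEXT
               END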

          -----------------------------------------------------------------

          COLLECT(i)                       Regenerate storage

          This function calls SNOBOL4's garbage collection routine, which
          reclaims all unused storage.  It returns an integer result that
          is the number of free descriptors remaining in the work space (a
          descriptor contains 5 bytes of storage).  If there are fewer than
          i free descriptors after regeneration, the function fails.
          SNOBOL4 automatically calls COLLECT whenever memory becomes full.

          -----------------------------------------------------------------

          CONVERT(arg, s)                  Convert to specified data type

          The argument is converted to the specified data type and returned
          as the value of the function.  If conversion is not possible, the
          function fails.  S is a data type name string, such as 'STRING',
          'TABLE', etc.  Data type names may be lower case if case-folding
          is active.  Chapter 17, "Data Types and Conversion," lists allow-
          able conversions.

          -----------------------------------------------------------------

          COPY(arg)                        Make copy of argument

          Returns a distinct copy of arg.  The argument may be an array,
          code block, pattern, or program-defined data type.  If A is an
          array, the statement

                       B = COPY(A)

          creates a new array B, whose initial contents are the same as
          array A.  Their elements are independent; altering element A<I>
          does not affect element B<I>.  In contrast, the assignment B = A
          makes A and B alternate names for the same array.
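
            The difference can be seen in a small sketch (the variable C
          is our own).  B is a distinct copy and keeps its old element,
          while C is another name for A and sees the change, so the last
          statement should display "0 99":

                       A = ARRAY(3, 0)
                       B = COPY(A)
                       C = A
                       A<1> = 99
                       OUTPUT = B<1> " " C<1>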

          -----------------------------------------------------------------

          DATA(s)                          Create new data type

          Defines a new data type according to the prototype in string s.
          The prototype assumes a form similar to a function call, with the
          data type taking the place of the function name, and the field
          names replacing the arguments.  The form of the prototype string
          is

                       "NEWTYPE(FIELD1,FIELD2,...,FIELDn)"

            The DATA function implicitly defines a new function and n new
          field variables:

             NEWTYPE(ARG1,ARG2,...,ARGn)     Object creation function.

             FIELD1(x)        Reference to field variable 1.

             . . .

             FIELDn(x)        Reference to field variable n.

          where x is an object created with the NEWTYPE function.

            The fields may be of any data type, including pointers to other
          program-defined data items.
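
            As a sketch, the statements below define a hypothetical NODE
          type with fields VAL and NEXT (names of our own choosing) and
          build a two-element linked list.  A field reference may appear
          on either side of an assignment.  The two OUTPUT statements
          should display "1 20" and "NODE":

                       DATA("NODE(VAL,NEXT)")
               * The omitted second argument of NODE(2) is the null string
                       LIST = NODE(1, NODE(2))
                       VAL(NEXT(LIST)) = 20
                       OUTPUT = VAL(LIST) " " VAL(NEXT(LIST))
                       OUTPUT = DATATYPE(LIST)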

          -----------------------------------------------------------------

          DATATYPE(arg)                    Get data type of argument

          Returns a string specifying the data type of the argument.  Some
          typical arguments and their data types are:

             12               INTEGER

             'ABCD'           STRING

             POS(2) 'C' LEN(3)     PATTERN

             .Q<3>            NAME

             *PAT             EXPRESSION

            If the argument is a program-defined data type, the name from
          the creating DATA() function is returned.
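
            DATATYPE can also show the effect of an implicit conversion.
          In this sketch of our own, the three statements should display
          "STRING", "INTEGER" and "TABLE":

                       OUTPUT = DATATYPE("12")
                       OUTPUT = DATATYPE("12" + 0)
                       OUTPUT = DATATYPE(TABLE())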

          -----------------------------------------------------------------

          DATE()                           Get current date and time

          Returns a 20-character string of the form:

                       'MM-DD-YY HH:MM:SS.CC'

          representing month, day, year, hour, minute, second, and cen-
          tisecond respectively.  The centisecond field can only be approx-
          imated, since many personal computer clocks are only updated
          every 55 milliseconds.

          -----------------------------------------------------------------

          DEFINE(s, name)                  Create program-defined function

          This function creates a new, program-defined function.  S is a
          prototype string specifying the function's name, arguments, and
          local variables, if any.  Name is optional, and specifies the
          label of the first statement of the function body.  If absent, a
          label with the same name as the function is the assumed entry
          point.  The form of the prototype string is

                       "FNAME(ARG1,ARG2,...,ARGn)LOCAL1,LOCAL2,...,LOCALm"

          where FNAME is the name of the function, and ARGi are names of
          formal arguments to the function.  Blanks are not permitted in
          the prototype.  The values of variables specified in the list of
          locals are saved prior to function entry, and restored upon func-
          tion return.

            Functions may return a value or variable name by assigning the
          result to a variable with the same name as the function.  Func-
          tions return by transferring to one of the reserved labels
          RETURN, NRETURN, or FRETURN to return by value, by name, or to
          fail respectively.
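
            A minimal sketch of a function with one argument and one
          local variable follows; the names TWICE, TMP and TW_END are our
          own.  The GOTO on the DEFINE statement skips over the function
          body.  Because TMP is a local, its outer value is restored on
          return, so the program should display "ABAB" and then "SAVED":

                       DEFINE("TWICE(S)TMP")               :(TW_END)
               TWICE   TMP = S S
                       TWICE = TMP                         :(RETURN)
               TW_END  TMP = "SAVED"
                       OUTPUT = TWICE("AB")
                       OUTPUT = TMP
               END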

          -----------------------------------------------------------------

          DETACH(name)                     Remove I/O association

          Removes any input or output unit associated with the variable
          name.  The underlying file is not affected in any way.  Remember
          that name is the address of the variable (e.g. .X or 'X'), not
          the variable itself.

          -----------------------------------------------------------------

          DIFFER(arg1, arg2)               Check if arguments are different

          Succeeds and returns the null string if and only if arg1 and arg2
          are different.  Strings and integers are different if they have
          unequal values.  Other data types contain pointers to the actual
          data object, and differ only if the pointers are different.  If
          arg2 is omitted, DIFFER succeeds if arg1 is not null.

          -----------------------------------------------------------------

          DUMP(i)                          Dump variables

          This function causes all natural variables with nonnull values to
          be listed on the file associated with I/O unit 6 (normally
          OUTPUT).  If i is zero, the dump does not occur.

          -----------------------------------------------------------------

          DUPL(s, i)                       Duplicate string

          Returns the argument string s repeated i times.  The function
          returns the null string if i is zero, and fails if i is negative.

          -----------------------------------------------------------------

          ENDFILE(unit)                    Close file

          The file attached to the specified I/O unit is closed, and the
          file buffer is flushed and released.  All variables which have
          been associated with this unit have their association removed.
          Upon program termination, SNOBOL4 will automatically perform an
          ENDFILE function on all open units.

          -----------------------------------------------------------------

          EQ(i1, i2)                       Equality test for numbers

          This function succeeds and returns the null string if the two
          integer arguments are equal.  I1 and i2 must evaluate to integer
          values.  The function fails if i1 is not equal to i2.

          -----------------------------------------------------------------

          EVAL(s or n)                     Compile and evaluate expression

          If the argument is a string, it should contain a valid SNOBOL4
          expression to be compiled and evaluated.  The evaluation result
          is returned as the value of the function.  EVAL fails and sets
          &ERRTEXT to an error message string if s contains a syntax error.
          If the argument is a number, n, it is returned unchanged.
          If the argument is an unevaluated expression, it is evaluated,
          and the result returned.
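
            Both argument forms appear in the sketch below (the variable
          names are our own).  The string is compiled and evaluated with
          N equal to 5, and the unevaluated expression E is evaluated
          after N has been changed to 10, so the two OUTPUT statements
          should display "26" and "11":

                       N = 5
                       OUTPUT = EVAL("N * N + 1")
                       E = *(N + 1)
                       N = 10
                       OUTPUT = EVAL(E)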

          -----------------------------------------------------------------

          FIELD(s, i)                      Get field name of defined data
                                           type

          Returns a string which is the Ith field name from the formal def-
          inition of the program-defined data type whose name is in string
          s.  FIELD fails if i is greater than the number of fields in the
          data type's definition.

          -----------------------------------------------------------------

          GE(i1, i2)                       Greater than or equal test for
                                           numbers

          This function succeeds and returns the null string if the two
          integer arguments satisfy the relationship i1 >= i2.  I1 and i2
          must evaluate to integer values.  The function fails if i1 is
          less than i2.

          -----------------------------------------------------------------

          GT(i1,i2)                        Greater than test for numbers

          This function succeeds and returns the null string if the two in-
          teger arguments satisfy the relationship i1 > i2.  I1 and i2 must
          evaluate to integer values.  The function fails if i1 is less
          than or equal to i2.
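
            Because these functions return the null string on success,
          they are often concatenated with a value to make a statement
          conditional.  The sketch below (the label LOOP is our own)
          should display 3, 2 and 1 on successive lines, then stop when
          GT fails:

                       I = 3
               LOOP    OUTPUT = GT(I, 0) I                 :F(END)
                       I = I - 1                           :(LOOP)
               END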

          -----------------------------------------------------------------

          IDENT(arg1, arg2)                Check if arguments are identical

          Succeeds and returns the null string if and only if arg1 and arg2
          are identical.  Strings and integers are identical if they have
          the same values.  Other data types contain pointers to the actual
          data object, and are identical only if they point to the same
          object.  If arg2 is omitted, IDENT succeeds if arg1 is the null
          string.
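
            The pointer comparison can be seen in the following sketch
          (the variable names are our own).  B is another name for array
          A, while C is a separate copy, and X has never been assigned a
          value.  All three messages should be displayed:

                       A = ARRAY(2)
                       B = A
                       C = COPY(A)
                       OUTPUT = IDENT(A, B) "A AND B ARE THE SAME ARRAY"
                       OUTPUT = DIFFER(A, C) "A AND C ARE DIFFERENT ARRAYS"
                       OUTPUT = IDENT(X) "X IS NULL"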

          -----------------------------------------------------------------

          INPUT(Name, Unit, i, s)          Open file for input

          This function opens a file for input, and associates it with a
          variable.  Data may then be read from the file by using the vari-
          able in an expression or an assignment statement.

            The file designated by string S is opened for input and given
          the specified unit number.  I is an optional record length.  The
          variable specified by Name is associated with this unit.

            The first argument, Name, specifies a SNOBOL4 variable, typi-
          cally as a quoted string or with the unary period operator:

                       INPUT('IN', ...
                       INPUT(.IN, ...
                       X = 'IN'
                       INPUT(X, ...

            The second argument, Unit, must evaluate to an integer value in
          the range 0 to 16 inclusive.  Unit 0 (or omitting the unit argu-
          ment) will select the default input unit, 5.

            The third argument, I, contains the record length in charac-
          ters.  It must satisfy 0 < I <= &MAXLNGTH.  If omitted, the
          default is 80.

            The fourth argument, S, is a string containing the name of the
          file to be opened.  If the file is a disk file, S may contain an
          optional drive letter and pathname in addition to the filename.
          Besides disk files, MS-DOS device names such as NUL:, CON:,
          COM2:, etc., are permitted.

            If S is absent or null, and this unit is not currently open,
          the SNOBOL4 command line is searched for a file to use with this
          unit (/n:file).  If S is absent, but the unit is already open,
          the INPUT call serves only to establish another association
          between a variable and the unit.  If S is not null, any file pre-
          viously associated with this unit number is first closed by
          SNOBOL4 with an implicit ENDFILE(unit).

            An error message is generated for an illegal unit number.  The
          INPUT function fails (with no printed error message) if the file
          cannot be opened.

            The record length I (or its default value, 80) determines the
          number of characters returned in a string when the associated
          variable is referenced.  ASCII files will return I characters or
          fewer if an end-of-line condition is encountered.  End-of-line is
          defined as either a carriage return, or a carriage return fol-
          lowed by a line feed.  If I characters are read from an ASCII
          file without encountering an end-of-line, additional characters
          are read from the file and discarded until the end-of-line char-
          acter(s) are found.  That is, long lines are truncated.  If fewer
          than I characters are read from an ASCII file, and keyword &TRIM
          is zero, the line will be padded with blank characters until
          length I is obtained.

            A read operation will terminate on the last character of a disk
          file, returning a short record.  Reading past the End-of-File
          will cause statement failure.  If the file is ASCII, reading a
          control-Z character will be treated as an End-of-File.

            Note:  When a program begins execution, the variable INPUT is
          associated with unit 5.  Unit 5 is normally device CON:, the key-
          board, unless redirected elsewhere by the /I=file command line
          option, or the MS-DOS redirection operation (<file).
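
            As an illustration, the following sketch opens a disk file on
          unit 10 with a record length of 132 and copies it line by line to
          the standard output.  The file name DATA.TXT, the variable INFILE
          and the labels are assumptions made for the example:

          ```snobol4
          * DATA.TXT, INFILE, COPY and NOFILE are hypothetical names.
                    INPUT(.INFILE, 10, 132, 'DATA.TXT')     :F(NOFILE)
          * Each reference to INFILE reads one record; failure means EOF.
          COPY      OUTPUT = INFILE                         :S(COPY)
                    ENDFILE(10)                             :(END)
          NOFILE    OUTPUT = 'Cannot open DATA.TXT'
          END
          ```

            The failure branch on the INPUT call matters because, as noted
          above, an unopenable file produces no printed error message.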

          -----------------------------------------------------------------

          INTEGER(arg)                     Check if argument is an integer

          Succeeds and returns the null string if arg is an integer, or a
          string which can be converted to an integer.  If the argument is
          not an integer, the function fails.
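
            A brief sketch of the test, using literal strings chosen only
          for illustration:

          ```snobol4
          * Only the first assignment is expected to take place.
                    OUTPUT = INTEGER('42') '42 is an integer'
                    OUTPUT = INTEGER('4X2') '4X2 is an integer'
          ```

            The string '4X2' cannot be converted to an integer, so the
          second statement fails and prints nothing.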

          -----------------------------------------------------------------

          ITEM(array, i1, i2, ..., in)     Get array element

          ITEM(table, arg)                 Get table element

          Returns the specified array or table element.  I1, i2, ..., in
          are array subscripts, and arg is a table subscript.  Since the
          first argument may be a function which returns an array or table
          name, it allows an indirect reference in situations that would
          not be syntactically valid.  ITEM is an analog of the APPLY func-
          tion.  For example, if F(X) is a program-defined function which
          returns an array name,

                       ITEM(F(X), 20)

          references the 20th element of that array, whereas F(X)<20> is
          not acceptable.

          -----------------------------------------------------------------

          LE(i1, i2)                       Less than or equal test for
                                           numbers

          This function succeeds and returns the null string if the two
          integer arguments satisfy the relationship i1 <= i2.  I1 and i2
          must evaluate to integer values.  The function fails if i1 is
          greater than i2.

          -----------------------------------------------------------------

          LGT(s1, s2)                      Lexically greater than test for
                                           strings

          This function succeeds and returns the null string if s1 is lexi-
          cally greater than s2 (according to their alphabetic ordering).
          The two strings are compared left to right, character by charac-
          ter.  If one string is exhausted before the other---with all
          characters equal---the longer string is lexically greater than
          the shorter string.  The null string is lexically less than any
          other non-null string.  If there is a character mismatch at the
          same position in both strings, the relationship between the char-
          acters determines the relationship of the strings.  Strings are
          equal only if they are the same length, and are identical charac-
          ter by character.
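
            The rules above can be checked with a few literal strings.
          This is a sketch only; the strings are arbitrary and the ordering
          of individual characters is assumed to follow the ASCII sequence:

          ```snobol4
          * The longer string wins when all compared characters are equal.
                    OUTPUT = LGT('ABC', 'AB') 'ABC follows AB'
          * A mismatch in the first position decides the result.
                    OUTPUT = LGT('B', 'ABC') 'B follows ABC'
          * Equal strings are not lexically greater, so this one fails.
                    OUTPUT = LGT('AB', 'AB') 'AB follows AB'
          ```

            The first two statements should print their messages; the
          third should print nothing.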

          -----------------------------------------------------------------

          LOCAL(name, i)                   Get local variable name from
                                           function definition

          Returns a string which is the Ith local variable from the formal
          definition of program-defined function name.  LOCAL fails if i is
          greater than the number of local variables in name's definition.
          LOCAL is useful when one function is used to trace another.  The
          trace function can access the local variables used with the func-
          tion being traced with an indirect reference: $LOCAL(name, i).

          -----------------------------------------------------------------

          LPAD(s1, i, s2)                  Pad left end of string

          This function is useful for right-justifying columnar output.  It
          returns s1 padded on its left end until its total size is i char-
          acters.  The pad character used is the first character of s2 if
          present, otherwise a blank is used if s2 is absent or null.  If i
          is less than or equal to the length of s1, s1 is returned un-
          changed.
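
            For example, the following sketch pads the arbitrary string
          '42' to a width of six characters:

          ```snobol4
          * Blank padding, then zero padding, then a string that is
          * already longer than the requested width.
                    OUTPUT = LPAD('42', 6)
                    OUTPUT = LPAD('42', 6, '0')
                    OUTPUT = LPAD('123456789', 6)
          ```

            From the description above, the three lines written should be
          '    42' (four leading blanks), '000042', and '123456789' un-
          changed.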

          -----------------------------------------------------------------

          LT(i1, i2)                       Less than test for numbers

          This function succeeds and returns the null string if the two in-
          teger arguments satisfy the relationship i1 < i2.  I1 and i2 must
          evaluate to integer values.  The function fails if i1 is greater
          than or equal to i2.

          -----------------------------------------------------------------

          NE(i1, i2)                       Not equal test for numbers

          This function succeeds and returns the null string if the two in-
          teger arguments are not equal.  I1 and i2 must evaluate to inte-
          ger values.  The function fails if i1 is equal to i2.

          -----------------------------------------------------------------

          OPSYN(s1, s2, i)                 Create operator synonym

          The function or operator name s1 becomes a synonym for s2.  If i
          is absent or 0, both strings are assumed to be function names.
          If i is 1 or 2, then the strings are assumed to be unary or
          binary operators, respectively.  Other values for i are illegal.
          Operators are specified by using their graphic symbol in a quoted
          literal, such as:

                       OPSYN('#', '/', 2)

            The concatenation operator is specified as a one-character
          string containing a blank: ' '.  The implicit pattern match oper-
          ator between subject and pattern cannot be OPSYNed.
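
            A short sketch of both forms follows.  LENGTH is a name in-
          vented for the example, and the second part assumes that '#' has
          no other definition in the program:

          ```snobol4
          * Make the hypothetical name LENGTH a synonym for SIZE.
                    OPSYN('LENGTH', 'SIZE')
                    OUTPUT = LENGTH('SNOBOL4')
          * Make binary # a synonym for integer division.
                    OPSYN('#', '/', 2)
                    OUTPUT = 12 # 4
          ```

            The two OUTPUT statements should write 7 and 3 respectively.
          As with any binary operator, # must be surrounded by blanks.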

          -----------------------------------------------------------------

          OUTPUT(name, unit, i, s)         Open file for output

          This function opens a file for output, and associates it with a
          variable.  Data may then be written to the file by assigning val-
          ues to the variable.

            The description of the OUTPUT function parallels that of the
          INPUT function, and will not be duplicated here.  The differences
          are noted below.

            If the output file already exists, it is deleted and recreated
          anew.  Facilities for updating existing files (direct-access
          files) are not present in Vanilla SNOBOL4; they are contained in
          SNOBOL4+, Catspaw's enhanced implementation of the SNOBOL4 lan-
          guage.

            When an output variable is assigned a string value, the string
          is written to the associated file.  A carriage return and line
          feed are appended to the string.  If the string is longer than the
          record length (i, or the default, 80), a carriage return and line
          feed will be inserted every i characters.  That is, long strings
          will create multiple output lines.

            Note:  When a program begins execution, the variable OUTPUT is
          associated with unit 6.  Unit 6 is normally device CON:, the dis-
          play, unless redirected elsewhere by the /O: command line option
          or the MS-DOS redirection operation (>file).  The variable SCREEN
          is associated with unit 7, which is also attached to device CON:.
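
            The following sketch writes two lines to a new disk file on
          unit 11.  The file name REPORT.TXT, the variable REPORT and the
          label NOFILE are assumptions made for the example, and OUTPUT is
          assumed to fail in the same way as INPUT if the file cannot be
          opened:

          ```snobol4
          * REPORT.TXT, REPORT and NOFILE are hypothetical names.
                    OUTPUT(.REPORT, 11, 132, 'REPORT.TXT')  :F(NOFILE)
          * Each assignment writes one line to the file.
                    REPORT = 'First line of the report'
                    REPORT = 'Second line'
                    ENDFILE(11)                             :(END)
          NOFILE    OUTPUT = 'Cannot create REPORT.TXT'
          END
          ```

            Remember that an existing REPORT.TXT would be deleted and re-
          created by the OUTPUT call.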

          -----------------------------------------------------------------

          PROTOTYPE(array)                 Get prototype which created an
                                           array

          Returns the prototype string of dimensions used to create the
          specified array.  If the array was created by the ARRAY function,
          then the string returned is identical to the first argument of
          the original ARRAY function call.  If the array was produced from
          a table by the CONVERT function, the string has the form 'N,2',
          where N is the integer number of rows in the array.
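
            Both cases are shown in the sketch below.  The names A and T
          and the table contents are illustrative only:

          ```snobol4
          * An array made by ARRAY returns its original prototype.
                    A = ARRAY('3,4')
                    OUTPUT = PROTOTYPE(A)
          * A two-entry table converted to an array has two rows.
                    T = TABLE()
                    T['X'] = 1
                    T['Y'] = 2
                    OUTPUT = PROTOTYPE(CONVERT(T, 'ARRAY'))
          ```

            The expected output is the string 3,4 followed by 2,2.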

          -----------------------------------------------------------------

          REMDR(i1, i2)                    Get remainder after division

          REMDR returns the integer remainder resulting from i1 divided by
          i2, that is, i1 modulus i2.  The result has the same sign as i1.
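
            The sign rule is easiest to see with a few small values, chosen
          here purely for illustration:

          ```snobol4
          * The remainder takes the sign of the first argument.
                    OUTPUT = REMDR(7, 3)
                    OUTPUT = REMDR(-7, 3)
                    OUTPUT = REMDR(7, -3)
          ```

            These statements should write 1, -1 and 1.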

          -----------------------------------------------------------------

          REPLACE(s1, s2, s3)              Replace characters in string

          This function returns s1 transformed according to a translation
          specified by s2 and s3.  Each character of s1 found in s2 is re-
          placed by the corresponding character in s3.  S2 and s3 must be
          the same length.  If duplicate characters appear in s2, the
          rightmost one is used to obtain the mapping character from s3.
          Normally, s2 and s3 are thought of as parameters, and REPLACE
          performs character substitutions on the variable s1.  For
          instance:

                       REPLACE(S, 'aeiouAEIOU', '1234512345')

          replaces all upper- and lower-case vowels in S with the digits 1
          through 5.  It is possible to use REPLACE as a "transposition"
          function if s1 and s2 are considered parameters, and s3 is allowed
          to vary.  If s1 and s2 are the same length, a simple positional
          transformation results.  For example,

                       REPLACE('123456', '214365', S)

          returns the six character string S with adjacent pairs of charac-
          ters interchanged ('ABCDEF' becomes 'BADCFE').  S1 and s2 can be
          different lengths---only s2 and s3 must be the same size.  If s2
          contains characters not in s1, the corresponding characters in s3
          are dropped from the result.  If s1 contains characters not in
          s2, they will appear in the result.  The function call

                       REPLACE('Yy/Mm/Dd', 'Mm-Dd-Yy xx:xx:xx.xx', DATE())

          returns the date in the form YY/MM/DD (e.g., 87/07/28).  Dupli-
          cate characters in s1 are permitted, so:

                       REPLACE('aaabbbccc', 'abc', '(1)')

          produces '(((111)))'.

          -----------------------------------------------------------------

          RPAD(s1, i, s2)                  Pad right end of string

          This function is useful for left-justifying columnar output.  It
          returns s1 padded on its right end until its total size is i
          characters.  The pad character used is the first character of s2
          if present, otherwise a blank (ASCII character 32) is used if s2
          is absent or null.  If i is less than or equal to the length of
          s1, s1 is returned unchanged.
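
            RPAD and LPAD are often combined to lay out a simple two-column
          listing.  In this sketch the item names, quantities and column
          widths are arbitrary:

          ```snobol4
          * Left-justify the name in 12 columns, filled with periods,
          * then right-justify the quantity in 5 columns.
                    OUTPUT = RPAD('Apples', 12, '.') LPAD('42', 5)
                    OUTPUT = RPAD('Pears', 12, '.') LPAD('7', 5)
          ```

            The two lines written should be:

                       Apples......   42
                       Pears.......    7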

          -----------------------------------------------------------------

          SIZE(s)                          Get length of string

          The function SIZE returns an integer value which is the number of
          characters in its argument string.  A null string argument
          returns 0.

          -----------------------------------------------------------------

          STOPTR(name, type)               Stop trace

          Discontinues the type of trace of the named item.  Consult the
          TRACE() function for a list of tracing types available.

          -----------------------------------------------------------------

          TABLE(i1, i2)                    Create a table

          A table is similar to a one-dimensional array, but the subscripts
          may be any SNOBOL4 data type.  The TABLE function creates a table
          and returns a pointer to it.  The integer i1 specifies the ini-
          tial number of entries in the table.  Integer i2 specifies the
          size by which the table is increased whenever it becomes full,
          and additional table space is required.  If either is omitted, 10
          is used as a default value.
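
            A minimal sketch of table use follows.  The variable name COUNT
          and the subscripts are assumptions made for the example:

          ```snobol4
          * An entry that has never been assigned is null, which is
          * treated as zero in arithmetic.
                    COUNT = TABLE()
                    COUNT['apple'] = COUNT['apple'] + 1
                    COUNT['apple'] = COUNT['apple'] + 1
          * Subscripts need not be strings.
                    COUNT[17] = 'seventeen'
                    OUTPUT = COUNT['apple']
          ```

            The OUTPUT statement should write 2.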

          -----------------------------------------------------------------

          TIME()                           Get execution time

          Returns the number of tenths of a second elapsed since the start
          of program execution, including all I/O wait time.
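
            TIME is typically called twice and the results subtracted.  In
          the sketch below, the variables START and I and the label LOOP
          are illustrative, and the loop merely stands in for the work to
          be timed:

          ```snobol4
                    START = TIME()
                    I = 0
          * Count to 1000; the statement fails when I reaches 1000.
          LOOP      I = LT(I, 1000) I + 1             :S(LOOP)
                    OUTPUT = 'Elapsed tenths: ' (TIME() - START)
          ```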

          -----------------------------------------------------------------

          TRACE(name1, type, s, name2)     Trace an entity

          The item name1 is traced according to the action specified by
          type.  Trace output is written to the file associated with I/O
          unit 6.

            Name1 is the name of a variable, function, statement label, or
          keyword.  It may be given as a string, or specified with the
          unary name operator (.).

            Type is a string that determines the type of trace desired.  It
          must be one of these values:

             'VALUE'          When value of name1 is changed (default if
                              type omitted).

             'CALL'           When function name1 is called.

             'RETURN'         When function name1 returns.

             'FUNCTION'       When function name1 is called, or returns.

             'LABEL'          When control is transferred to label name1.

             'KEYWORD'        When the value of keyword &name1 is changed.
                              Note that the ampersand character (&) is not
                              included in the first argument, name1.

            S is an optional identifying tag that is added to the trace
          output line when name1 is a created object, such as an array or
          table element.

            Name2 is an optional name of a program-defined function.
          Instead of producing a trace output line, this function is called
          when the trace action occurs.  The function is called with name1
          as the first argument, and string s as the second argument.

            Tracing will only occur when the keyword &TRACE is nonzero.
          Each trace will decrement &TRACE by one.  Tracing ends when it
          becomes zero.
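
            A simple value trace might be set up as in the following
          sketch.  The variable name TOTAL and the initial &TRACE value of
          100 are arbitrary choices for the example:

          ```snobol4
          * Allow up to 100 trace events, then trace changes to TOTAL.
                    &TRACE = 100
                    TRACE('TOTAL', 'VALUE')
                    TOTAL = 1
                    TOTAL = TOTAL + 1
          * After STOPTR, assignments to TOTAL are no longer reported.
                    STOPTR('TOTAL', 'VALUE')
                    TOTAL = 3
          ```

            The first two assignments to TOTAL should each produce a trace
          line on unit 6; the third should not.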

          -----------------------------------------------------------------

          TRIM(s)                          Remove trailing blanks

          Returns the argument string with trailing blanks removed.  Trail-
          ing tab characters are not affected.  If the argument string was
          read from an input file, it is more efficient to set keyword
          &TRIM nonzero than to use TRIM(INPUT).

            By combining function TRIM with REPLACE, any trailing character
          can be removed.  The desired character is temporarily exchanged
          with blank, trimmed, then exchanged back.  For example, this
          expression returns string S with trailing zeros removed:

                       REPLACE(TRIM(REPLACE(S,'0 ',' 0')),'0 ',' 0')

          -----------------------------------------------------------------

          UNLOAD(name)                     Remove function definition

          The function name becomes undefined.

          -----------------------------------------------------------------

          VALUE(name)                      Get value of an object

          The VALUE function returns the value of the variable name,
          behaving like the unary indirect operator ($).


                                                                 Chapter 20


                                                            SYSTEM MESSAGES
          -----------------------------------------------------------------

            This chapter lists all messages displayed by SNOBOL4.


                                20.1 INITIAL MESSAGES

            When SNOBOL4 begins execution, this title is displayed:

               Vanilla SNOBOL4      Version 2.14.
               (c) Copyright 1984,1988 Catspaw, Inc. All Rights Reserved.

            Additional messages which may appear:

          Cannot open file: name
          The file specified in the command line cannot be opened.

          Command line error:
          A syntactic error was detected in the SNOBOL4 command line.  The
          command line is displayed on two lines.  The line break shows
          where the error occurred.

          Errors detected in source program
          There were compilation errors in the source program.  Execution
          will proceed until a statement with a compilation error is
          encountered.

          Insufficient storage for initialization
          Not enough memory was available to initialize the SNOBOL4 system.

          No errors
          Compilation is complete, and without error.  Execution begins
          immediately.


                              20.2 TERMINATION MESSAGES

            Termination messages are normally produced on I/O unit 7, which
          defaults to the user's display screen.  If the /B option was used
          in the invoking command line, they are produced on I/O unit 6,
          associated with variable OUTPUT.  Dump messages are always pro-
          duced to unit 6.

          Normal termination at level LL
          The program transferred to the label END.  LL is the current
          program-defined function call depth.  This message is produced
          only if the /S command line option (statistics) was used.

          filename(XXX) : Last statement executed was NNN
          NNN is the statement number of the last statement executed, XXX
          is its source line number.  It is the statement that transferred
          to the END statement.  If this was a normal termination, it is
          only displayed if the /S option was used.

          filename(XXX) : Warning: Interrupted in statement NNN at level LL
          Execution was interrupted when you pressed the BREAK or control-C
          key.  The interruption occurred before the specified statement
          was executed.  LL is the current call depth of program-defined
          functions.

          Incomplete storage regeneration.  Terminal dump not possible
          Stack overflow occurred during storage regeneration, and the
          &DUMP keyword was nonzero.  Memory is in an indeterminate form,
          and a dump listing cannot be produced.

          Dump of variables at termination
          Natural variables
          Unprotected keywords
          These headings will appear if a termination dump was requested by
          setting the &DUMP keyword nonzero.  Variables are listed only if
          they contain a nonnull value.  The variable names will be sorted
          if the &DUMP keyword is positive; they are unsorted if it is
          negative.


          20.2.1 Job Statistics

            End-of-run statistics on program execution are provided if the
          /S command line option is used.  Compilation and execution times
          are in tenths of a second.  Times are wall-clock values, and
          include all I/O wait time, such as delays for keyboard input:

                       SNOBOL4 statistics summary-
                       NN tenths of a second compilation time
                       NN tenths of a second execution time
                       NN statements executed, NN failed
                       NN arithmetic operations performed
                       NN pattern matches performed
                       NN regenerations of dynamic storage
                       NN reads performed
                       NN writes performed


                              20.3 COMPILATION MESSAGES

            SNOBOL4 syntax errors are detected during compilation.  State-
          ment compilation ceases at the point where the error was de-
          tected.  The error message contains a marker which indicates the
          valid portion of the statement accepted by the compiler---the
          error occurred after this point.  Only the first error in a
          statement is detected.  The erroneous statement is compiled with
          an internal error code which produces an error message if the
          statement is executed.  Compilation resumes with the next state-
          ment.  Compilation ceases and SNOBOL4 terminates if more than 50
          errors are found.

            When compiling without a list file (/L: command line option),
          the compiler will attempt to display the erroneous line on your
          screen.  If a statement is continued over several lines, only the
          line in error is displayed.  Several errors cannot be detected
          until the absolute end-of-statement is found.  This may require
          reading the next line, and finding it is NOT a continuation
          statement.  In this case, the single line displayed will be the
          NEXT line, with the error marker in the first character position.

            The CODE function may be used to compile SNOBOL4 statements
          that have been concatenated into a long string.  The CODE func-
          tion fails if a syntax error is found, and the keyword &ERRTEXT
          contains the error message string for the error encountered.
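
            The following sketch shows one way this might be used.  The
          variable C and the labels BAD and DONE are illustrative, and the
          direct GOTO form :<C> is assumed to be available as in other
          SNOBOL4 implementations:

          ```snobol4
          * Compile a one-statement string; the statement body begins
          * with a blank, as it would in a source file.
                    C = CODE(' OUTPUT = "Hello" :(DONE)')   :F(BAD)
          * Transfer control to the newly compiled code.
                    :<C>
          BAD       OUTPUT = 'Syntax error: ' &ERRTEXT      :(END)
          DONE      OUTPUT = 'Back in the main program'
          END
          ```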

          Binary operators must be surrounded by blanks
          Omitting a blank will often cause this error.  An illegal or
          undefined binary operator will also produce this error.

          Error in GOTO
          There is a syntactic error in the GOTO field.

          Erroneous END statement
          The END statement contains a syntactic error, or the label speci-
          fied in the subject field for initial transfer could not be
          found.

          Erroneous integer
          An integer number appears which is too large for the SNOBOL4 sys-
          tem.  The allowable range for magnitude values is 0 to 32767.

          Erroneous label
          The first character of a statement must be blank, tab, alphanu-
          meric, * (comment), + or . (continuation), or - (control).

          Erroneous or missing break character
          A character which separates language elements occurs in an ille-
          gal context, or an expression is not balanced with respect to
          parentheses.

          Erroneous subject
          A compiler break character appears before the statement subject
          field.  The break characters are comma, equal sign, right paren-
          thesis, right square bracket (]), and right angular bracket (>).

          Illegal character in element
          A character was found which was incorrect for the type of lan-
          guage object being compiled.  This often occurs when a blank is
          omitted between elements, causing them to run together.

          Improperly terminated statement
          The source statement terminated with an incomplete language con-
          struction.

          Limit on compilation errors exceeded
          More than 50 compilation errors were found in the source program.

          No END statement in source file
          End-of-File was encountered in the source file without an END
          statement.

          Previously defined label
          A duplicate label appears.  The first definition is retained;
          subsequent definitions are discarded.

          Unclosed literal
          The closing quotation mark from a literal string is missing.
          This error also occurs if the closing quotation mark (single or
          double) was different from the opening mark.


                            20.4 EXECUTION ERROR MESSAGES

            Most program logic errors can only be detected during program
          execution.  Some are unconditionally fatal, and cause the SNOBOL4
          system to terminate.  Others are conditionally fatal---the system
          terminates if the value of the keyword &ERRLIMIT is zero.  If
          &ERRLIMIT is nonzero, the keyword &ERRTYPE is set to the error
          message number, &ERRTEXT is set to the message text, &ERRLIMIT is
          decremented, and execution continues.

            The protected keyword &ERRTYPE may be traced, permitting a
          program-defined function to gain control when a conditional error
          occurs.  The program CODE.SNO provides an example of how to do
          this.  The initial value of the unprotected keyword &ERRLIMIT is
          zero, forcing program termination upon any error.
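
            A minimal sketch of recovering from a conditionally fatal error
          is shown below.  The names D, N and CAUGHT are illustrative, and
          the sketch assumes that, as in other SNOBOL4 implementations, the
          offending statement fails when &ERRLIMIT is nonzero:

          ```snobol4
          * Permit one conditionally fatal error before termination.
                    &ERRLIMIT = 1
                    D = 0
          * Division by zero is an error in arithmetic operation.
                    N = 10 / D                        :F(CAUGHT)
                    OUTPUT = 'No error occurred'      :(END)
          CAUGHT    OUTPUT = 'Error ' &ERRTYPE ': ' &ERRTEXT
          END
          ```

            Under that assumption, the program should report error number 2
          with its message text and then end normally.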

            Errors 1-16 are conditionally fatal.  Errors 17-28 are uncondi-
          tionally fatal.  When execution terminates due to an error, the
          following is displayed:

               filename(XXX) : Error NN, -- description --
               In statement NNN, at level LL

          NN is the error number below.  NNN is the statement number
          assigned in the compiler list file, XXX is the absolute line
          number in the source file.  LL specifies the current program-
          defined function call depth (0 is the normal main-program level).

          1. Illegal data type
          The data type of an operand was incorrect for the type of opera-
          tion attempted.  This occurs most frequently with arithmetic op-
          erations, when one operand is a string which cannot be converted
          to a number.

          2. Error in arithmetic operation
          An arithmetic operation upon integer values produced a result
          which was out of range, or was undefined, such as division by
          zero.

          3. Erroneous array or table reference
          An array or table reference was made to a variable which did not
          contain an array or table pointer.

          4. Null string in illegal context
          The null string appeared where it is not permitted, such as the
          object of an indirect reference.

          5. Undefined function or operation
          A function was called before it was defined, or an undefined
          operator was used.

          6. Erroneous prototype
          A syntactic error occurred in the prototype string used with the
          functions ARRAY, DATA or DEFINE.  Note that the blank and tab
          characters are not permitted within the prototype string.

          7. Unknown keyword
          The keyword specified is unknown to the SNOBOL4 system.

          8. Variable not present where required
          A variable name must be used as the subject of an assignment
          statement, or as the argument of the unary cursor, name, or key-
          word operator (@, ., &), or the binary pattern match assignment
          operators (., $).

          9. Entry point of function not label
          At the time a program-defined function was first called, its
          entry point label did not appear as the label of any SNOBOL4
          statement.

          10. Illegal argument to primitive function
          An illegal value was used as an argument to the function ARG,
          FIELD, LOCAL, OPSYN, STOPTR, or TRACE, or an illegal value was
          specified in the third argument to INPUT or OUTPUT.

          11. Reading error
          An error condition was returned when reading from a file.

          12. Illegal I/O unit
          Allowable unit numbers are 1 through 16 (inclusive).  (Unit 0 is
          allowed in functions INPUT and OUTPUT, and is converted to units
          5 and 6 respectively.)

          13. Limit on defined data types exceeded
          SNOBOL4 allows 899 different program-defined data types.

          14. Negative number in illegal context
          A negative number was used incorrectly as the argument of the
          function LEN, POS, TAB, or RTAB.

          15. String overflow
          The program attempted to create a string larger than &MAXLNGTH
          characters.

          16. Overflow during pattern matching
          The internal SNOBOL4 stack overflowed during pattern matching.
          This can happen when a recursive or looping pattern is incor-
          rectly specified.

          17. Error in SNOBOL4 system
          This message indicates an internal SNOBOL4 system error.

          18. Return from level zero
          An attempt was made to transfer to the function return label
          RETURN, FRETURN, or NRETURN outside of any function call.

          19. Failure during GOTO evaluation
          The expression used for an indirect transfer within the GOTO
          field failed when evaluated.

          20. Insufficient storage to continue
          All available memory has been used.  Vanilla SNOBOL4 is limited
          to 30K bytes for program and data.  SNOBOL4+, Catspaw's enhanced
          version, allocates 300K bytes for program and data.

          21. Stack overflow
          The SNOBOL4 internal stack has overflowed.  This may be caused by
          excessive function recursion, or may occur during garbage
          collection.

          22. Limit on statement execution exceeded
          The number of statements executed was greater than the value in
          the keyword &STLIMIT.  &STLIMIT is initially -1, specifying
          unlimited execution.

          23. Object exceeds size limit
          The program attempted to create an object larger than the maximum
          size allowed.

          24. Undefined or erroneous GOTO
          A transfer was attempted to an undefined label, or an expression
          in a GOTO field evaluated to a string rather than a label name.
          This is usually the result of omitting the indirect operator ($).

          25. Incorrect number of arguments
          A primitive function was called with too many arguments.

          28. Execution of statement with compilation error
          Execution proceeded to a statement that contained a compilation
          error.
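
            As an illustration of message 24, the following sketch shows an
          indirect transfer written correctly.  The variable NEXT and the
          label DONE are hypothetical names chosen for this example only.

          *  NEXT holds the name of a label as a string, so the GOTO
          *  field needs the indirect operator ($) to reach that label.
                    NEXT = 'DONE'
                    OUTPUT = 'BRANCHING'                :($NEXT)
                    OUTPUT = 'NOT REACHED'
          DONE      OUTPUT = 'ARRIVED'
          END

          The program displays BRANCHING and then ARRIVED.  Had the GOTO
          field been written :(NEXT), SNOBOL4 would have attempted a
          transfer to a label named NEXT, and because no such label
          exists, error 24 would have been reported.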



                            20.5 EXECUTION TRACE MESSAGES

            Tracing is provided for variables, certain keywords, label
          transfers, and function calls and returns.  A trace message is
          output to I/O unit 6 for each trace occurrence.  Program execu-
          tion time, in tenths of a second, is appended to each message.

            Tracing normally occurs only if the keyword &TRACE is nonzero.
          However, another keyword, &FTRACE, may be set nonzero to trace
          all function calls and returns independently of keyword &TRACE.
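
            A minimal sketch of function tracing through &FTRACE follows.
          The function SQUARE is a hypothetical example.  A positive value
          is assigned on the assumption that, as in other SNOBOL4
          implementations, the keyword is reduced by one for each trace
          message produced.

                    DEFINE('SQUARE(N)')                 :(SQEND)
          SQUARE    SQUARE = N * N                      :(RETURN)
          SQEND
          *  Trace the calls and returns of every function, without any
          *  call to TRACE and without setting &TRACE.
                    &FTRACE = 10
                    OUTPUT = SQUARE(7)
          END

          Besides the result, 49, a call message and a return message for
          SQUARE are written to unit 6.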

          STATEMENT NN: <vname> = <value>,TIME = TT
          Value trace; produced by the function call TRACE('vname',
          'VALUE'), where vname is the name of the variable to be traced.

          STATEMENT NN: &<keyname> = <value>,TIME = TT
          Keyword trace; produced by the function call TRACE('keyname',
          'KEYWORD'), where keyname is the upper case keyword name, without
          the leading ampersand.

          STATEMENT NN: TRANSFER TO <labname>,TIME = TT
          Label trace; produced by the function call TRACE('labname',
          'LABEL'), where labname is the desired label name.  Tracing only
          occurs on a transfer of control; it does not occur if the labeled
          statement is flowed into.

          STATEMENT NN: LEVEL LL CALL OF <fname>(arg1,...,argn),TIME = TT
          Call trace; produced by the function call TRACE('fname', 'CALL'),
          where fname is the name of the function to be traced.  The func-
          tion's arguments at the time of the call are evaluated and dis-
          played.

          STATEMENT NN: LEVEL LL RETURN OF <fname> = <value>,TIME = TT
          STATEMENT NN: LEVEL LL NRETURN OF <fname> = <value>,TIME = TT
          STATEMENT NN: LEVEL LL FRETURN OF <fname>,TIME = TT
          Return trace; produced by the function call TRACE('fname',
          'RETURN'), where fname is the name of the function whose return
          is to be traced.  The type of return that occurred is displayed
          in the trace message.

          ***Print request too long***
          An internal buffer is used to display trace messages and variable
          values during dumps.  If the required display is longer than
          1,800 characters, this error message is produced instead.
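
            The trace requests described in this section might be combined
          as in the following sketch.  The function ADD, the variable TOTAL
          and the label SHOW are hypothetical names.  &TRACE is given a
          positive value on the assumption that, as in other SNOBOL4
          implementations, it is reduced by one for each message produced.

                    DEFINE('ADD(A,B)')                  :(ADDEND)
          ADD       ADD = A + B                         :(RETURN)
          ADDEND    &TRACE = 20
                    TRACE('TOTAL', 'VALUE')
                    TRACE('ADD', 'CALL')
                    TRACE('ADD', 'RETURN')
                    TRACE('SHOW', 'LABEL')
          *  The call of ADD, its return, and the assignment to TOTAL
          *  are each traced.
                    TOTAL = ADD(2, 3)                   :(SHOW)
          *  SHOW is reached by a transfer, so a label trace is produced.
          SHOW      OUTPUT = TOTAL
          END

          Each message takes the corresponding form listed above.  If the
          GOTO field :(SHOW) were removed, the statement labeled SHOW would
          be flowed into and no label trace would appear.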












