
The scripting language


1.0.0 Introduction

This section describes the Basic language used by SPSS scripts.
It is not an exhaustive description of the language; treat it as a starting point for writing scripts in Sax Basic.
If you already have some VBA programming experience (for example, writing Excel macros), you can skip this section entirely and go directly to the SPSS objects section.
In my opinion, it is easier to learn something when you have examples to look at.
The examples here therefore refer to SPSS and solve problems that an SPSS script may encounter.

2.0.0 Scripts vs. Syntax

What is a programming language? It is a way of describing to the computer how to perform a specific task, following a defined syntax.
You may think of SPSS syntax as a programming language: you describe what has to be accomplished by using the commands SPSS makes available.

The difference from the scripting language is that SPSS syntax can do powerful things in fewer lines of code. For example, if you need to sort your data file, SPSS syntax does it with a single line of code, using the SORT CASES command.
A similar operation (such as sorting a list of values) can require many more lines of code in the scripting language.

The drawback is that SPSS syntax is limited to predefined actions on the data file. Have you ever thought of writing a new statistical procedure in SPSS syntax? It's impossible!
With a script, on the other hand, you can at least in theory write your own statistical procedure.

Syntax works only on the SPSS data file: it is "connected" to the SPSS processor.
Scripts can be "connected" to the SPSS processor, to the Viewer window, and to the syntax window. With a script you can do everything syntax can do, and much more!

Generally speaking, a script can do almost anything, but the scripting language is harder to learn than SPSS syntax.
Once you have learned it, however, you have a lot of power in your hands: you can automate operations that you would otherwise perform manually.

3.0.0 The approach

The goal of a script is to perform a task. Writing a script means defining the steps the computer must follow to accomplish that task. For example, suppose you need to write a script that sends to the printer all pivot tables having a certain title.
The script must perform the following sequence of actions:

  1. identify the active output viewer
  2. look at the first output item
  3. if the item is a pivot table containing the desired title, then select it
  4. look at the next output item
  5. repeat steps 3, 4 until the last output item has been reached
  6. print the selected output

You may think of the above actions as macro steps: before they can be converted into a script, they need more detail, especially step 3. It can be split into the following simpler steps:

  1. is the item a pivot table?
  2. if no, then skip the following steps
  3. if yes, retrieve the table's title
  4. is the title equal to the desired title?
  5. if yes, select the item

Now, let's look at the full sequence of actions:

  1. identify the active output viewer
  2. look at the first output item
  3. is the item a pivot table?
  4. if no, then go to step 8
  5. if yes, retrieve the table's title
  6. is the title equal to the desired title?
  7. if yes, select the item
  8. look at the next output item
  9. repeat steps 3...8 until the last output item has been reached
  10. print the selected output
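
The full sequence can be sketched as a script. This is only an illustration: the object types and members used here (ISpssOutputDoc, ISpssItems, ISpssItem, PivotTable, GetDesignatedOutputDoc, SPSSType, SPSSPivot, ActivateTable, TitleText, Selected) are assumed from the SPSS OLE automation interface and are not described in this section, and DESIRED_TITLE is a hypothetical constant. Check every name against the documentation of your SPSS version before relying on it.

```vb
' Sketch only: all SPSS object and member names below are assumptions.
Const DESIRED_TITLE = "My Title"    ' hypothetical title to look for

Sub Main
    Dim objOutputDoc As ISpssOutputDoc
    Dim objItems As ISpssItems
    Dim objItem As ISpssItem
    Dim objPivot As PivotTable
    Dim IsMatch As Boolean
    Dim i As Integer

    ' identify the active output viewer
    Set objOutputDoc = objSpssApp.GetDesignatedOutputDoc
    Set objItems = objOutputDoc.Items

    ' visit every output item, from the first to the last
    For i = 0 To objItems.Count - 1
        Set objItem = objItems.GetItem(i)
        ' is the item a pivot table? If not, go on to the next item
        If objItem.SPSSType = SPSSPivot Then
            ' retrieve the table's title and compare it with the desired one
            Set objPivot = objItem.ActivateTable
            IsMatch = (objPivot.TitleText = DESIRED_TITLE)
            objItem.Deactivate
            ' select the item when the title matches
            If IsMatch Then objItem.Selected = True
        End If
    Next i

    ' last step: print the selected output (print call not shown here)
End Sub
```

The For...Next loop covers the "look at the next item" and "repeat" steps in one construct, which is why the script has fewer visible steps than the list.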

Are you surprised by the number of steps needed to accomplish such a simple task?
If you think about it carefully, it isn't so strange: these are exactly the actions you would perform if you carried out the task manually.
They are so simple that you don't notice them; you perform them implicitly.
But to explain the task to the computer, you need to define every step in great detail!

3.1.0 Top-down approach

How can you avoid getting lost in the details? You need to break the problem down rationally.
First, divide the task into big steps. Then divide each big step into several sub-steps. If necessary, split these sub-steps further, until the maximum level of detail has been reached.
The maximum level of detail is reached when each step can be expressed using only the statements made available by the language.

This approach is called the top-down method: it consists of breaking the problem down into more elementary sub-problems.
So, when you start writing a program, the first thing to do is analyse the problem, in order to fully understand it and the goals to be met.

4.0.0 Script data

In order to accomplish a task, a script generally needs some intermediate data. We know that a program (in our case, a script) is a sequence of actions (an algorithm) describing what the program must do.

But every action needs to act on something, and this "something" is the intermediate data, which differ from permanent data stored in a file such as the SPSS data file.
The lifetime of script data is limited to the execution of the script itself: script data are not permanent and are discarded once the script terminates.
This implies that data must be defined in the script code, in a dedicated part called the data definition section.

A script is therefore composed of two main parts: the data definition and the algorithm, that is, the definition of the actions to execute.

Depending on the task, the algorithm can use different types of data, for example numeric data or data representing a sequence of characters.
Every datum is identified by a type and a name. The name is a short description of what the datum represents in the algorithm, and it distinguishes that piece of information from the other data.

For example, suppose we need to write a script that selects all pivot tables contained in the Output Navigator.
To accomplish this, the script must scan every Output Navigator item and select the item if it is a pivot table.

The data the algorithm needs are: the number of items contained in the Output Navigator, the current item number, and the numeric code that distinguishes a pivot table item from other Output Navigator items.
We could name the first datum NumOfItems, the second CurrentItemNo, and the last PivotTableCode. All three are numeric data.

CurrentItemNo is the name of the datum representing the current item number. Since the algorithm scans every Output Navigator item, this datum takes different values during the execution of the script, depending on the item currently being scanned.
Its value therefore changes dynamically while the script runs. For this reason this datum is called a script variable.
NumOfItems is also a variable: the script must determine the current number of items, which may differ each time the script is executed.
PivotTableCode, on the other hand, is NOT a variable: its value is always the same, both during the script execution and from one run to the next. PivotTableCode is a constant: its value never changes.

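In conclusion: a variable is a datum whose value can change while the script runs or from one run to the next, while a constant is a datum whose value never changes.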
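
As a rough sketch, the three data of the example could be defined as follows. The value given to PivotTableCode and the value assigned to NumOfItems are placeholders chosen for illustration, not documented SPSS values; a real script would obtain the item count from SPSS.

```vb
' Constant: the value 5 is only a placeholder for illustration
Const PivotTableCode = 5

Sub Main
    Dim NumOfItems As Integer       ' variable: may differ at every run
    Dim CurrentItemNo As Integer    ' variable: changes during the scan

    NumOfItems = 12                 ' hypothetical item count
    For CurrentItemNo = 0 To NumOfItems - 1
        ' here the type code of the current item would be
        ' compared with PivotTableCode
    Next CurrentItemNo
End Sub
```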

4.1.0 Variable types

Both variables and constants can contain data of different types.
A script cannot have two variables with the same name but different types, nor a constant and a variable with the same name.
The type of a constant or variable can optionally be declared by ending its name with a trailing character that specifies the type. For example, a variable named A$ is a String variable: the trailing $ character identifies the String type.
A constant or variable name must start with a letter. The following characters may be letters, underscores or digits.

If the name has no trailing type character, the type is declared by specifying the full type name, with the syntax described in the declaration part.

If the declaration specifies no type and the name has no trailing type character, the variable is considered to be of Variant type.

Sax Basic Script can handle the following data types:

Boolean
This data type can assume only two values: True or False.
This type is used for logical operations or to indicate a status.
No trailing character.
Byte
This is an 8-bit unsigned numeric data type.
It can hold only whole numbers ranging from 0 to 255.
No trailing character.
Integer
This is a 16-bit signed numeric data type.
It can assume only numbers with no fractional part ranging from -32,768 to 32,767.
Trailing character: %
Long
This is a 32-bit signed numeric data type.
It can assume only numbers with no fractional part ranging from -2,147,483,648 to 2,147,483,647.
Trailing character: &
Single
This is a 32-bit floating point real number data type.
Trailing character: !
Double
This is a 64-bit floating point real number data type.
Trailing character: #
Currency
This is a 64-bit fixed point real number data type.
This type is generally used to represent numbers where accuracy in calculations is important.
Trailing character: @
Date
This is a 64-bit real value. The whole part represents the date, while the fractional part is the time of day (December 30, 1899 = 0).
A literal date value in an expression must be written as #date#, that is, enclosed between # characters.
String
This type represents a sequence of characters of arbitrary length.
To assign a sequence of characters to a string variable, enclose it in double quotes (").
For example, the text This is a string is assigned to the A variable as follows:
    A = "This is a string"
Trailing character: $
String*n
This type represents a fixed-length sequence of characters, whose length is specified by n.
Trailing character: $
Variant
This is a generic type having a particular property: it can represent any of the previous types.
It is used when the same variable has to hold values of different types during the program.
It is possible to assign to a Variant variable a String value, and then an Integer value.
Conversions between types are made automatically, in case an expression includes Variant variables containing values of different types.
Suppose that a Variant variable contains a number in string format (like "1"): it is possible to define an expression which makes mathematical operations with this variable. For clarification, look at the following example, where A is a Variant variable:
  A = "1"
  B = A + 1

After the above code has been executed, the B variable contains 2: the "1" string held by A is automatically converted to a number and added to 1.
Note that any operation on a Variant variable takes more execution time than the same operation on a variable of a specific type, and a Variant variable occupies more memory.
Variant variables can make coding easier, but they can reduce code readability.
A variable declared with no trailing character and without an explicit type declaration is considered by default to be of Variant type.
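
A few of the types above can be tried together in a short sketch. It assumes that Sax Basic follows the usual VBA behaviour for Debug.Print, CDbl and Len, which this section does not cover; the variable names are made up for the example.

```vb
Sub Main
    Dim Count%                   ' Integer, declared through its trailing character
    Dim Birthday As Date
    Dim Code As String * 5       ' fixed-length string of 5 characters
    Dim Anything                 ' no type given: Variant

    Count% = 32767               ' largest value an Integer can hold
    Birthday = #12/30/1899#      ' date literal enclosed between # characters
    Code = "AB"                  ' stored in a 5-character string
    Anything = "1"               ' a Variant holding a string...
    Anything = Anything + 1      ' ...converted to a number when added to 1

    Debug.Print CDbl(Birthday)   ' numeric value behind the date
    Debug.Print Len(Code)        ' length of the fixed-length string
    Debug.Print Anything
End Sub
```

According to the descriptions above, the three printed values should be 0, 5 and 2.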

5.0.0 Declaration part

Sax Basic is a structured language: a script is made of elements, which may be grouped in blocks.
A script is mainly composed of two parts: a declaration part and an execution part.

5.1.0 Constants declaration

Constants are data whose values are known before the script runs and do not change during execution. Like a variable, a constant has a name and a type, but its value is fixed.

A constant is declared with the following syntax:

CONST <constant-name> = <value>

where <constant-name> is the name of the constant and <value> is the value assigned to it.

The type of a constant is determined by the value assigned to it. For example, if a number is assigned to the constant, its type is numeric; if a string is assigned, its type is String.

In accordance with syntax above, the TITLE constant (containing the "My Title" string value) is declared in the following way:

CONST TITLE="My Title"

It is possible to declare more than one constant with a single CONST keyword, separating the declarations with commas:

CONST TITLE="My Title", WIDTH=100, HEIGHT=80

The example above declares the two numeric constants WIDTH and HEIGHT in addition to the TITLE string constant.

5.2.0 Variables declaration

A script variable is declared with the DIM keyword in the following way:

DIM <variable-name> AS <type>

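where <variable-name> is the name of the variable and <type> is one of the data types described in the Variable types chapter.

If <variable-name> ends with a trailing character specifying the variable type (as explained in the Variable types chapter), the AS <type> part must be omitted.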

For example, to declare variable Count as Integer and variable Txt as String, the script syntax is:

DIM Count AS Integer
DIM Txt AS String 

With the use of trailing characters, we define variables Count% (% defines an Integer) and Txt$ ($ defines a String) in the following way:

DIM Count%
DIM Txt$

If no trailing character is used and there is no explicit type declaration (the AS <type> part is missing), the variable is considered to be of Variant type.

It is possible to declare more than one variable with a single DIM keyword, separating the declarations with commas:

DIM Count AS Integer, Txt AS String

5.3.0 Enumerated types

There are cases where a datum can take only a limited number of values, each with a special meaning.
To describe such data properly, Sax Basic provides the enumerated data type, built by enumerating all the values that variables or constants of that type can take.

An enumerated type is useful, for example, when a variable must store a person's sex. The only values this variable can contain are MALE (enumerated as 1) and FEMALE (enumerated as 2).

Here is the declaration syntax of an enumerated type:

ENUM <enumerated type name>
    <first  constant name> = <1st constant-value>
    <second constant name> = <2nd constant-value>
    <third  constant name> = <3rd constant-value> 
    ...
    <Nth constant name> = <Nth constant-value>
END ENUM

In accordance with the above syntax scheme, the ESex enumerated type is declared as follows:

ENUM ESex
    MALE=1
    FEMALE=2
END ENUM

Within the same enumerated type, all values must be of the same type. It isn't possible to declare an enumerated type having, for example, both an integer and a string constant.
Therefore the following declaration is wrong:

ENUM Sex
   MALE="Male"
   FEMALE=2
END ENUM

(the MALE constant is of string type, but the FEMALE constant is a number).

Enumerated types can only be declared globally, not within a processing block. Global data, local data and processing blocks are explained in chapter 6 "Processing blocks".
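
Once declared, an enumerated type can be used like any other type. The sketch below assumes that Sub Main is the starting block of the script and that Debug.Print is available; RespondentSex is a hypothetical variable name.

```vb
' Global declaration of the enumerated type
Enum ESex
    MALE = 1
    FEMALE = 2
End Enum

Sub Main
    Dim RespondentSex As ESex    ' hypothetical variable name

    RespondentSex = FEMALE
    If RespondentSex = MALE Then
        Debug.Print "Male"
    Else
        Debug.Print "Female"     ' this branch runs
    End If
End Sub
```

Using the names MALE and FEMALE instead of the bare numbers 1 and 2 keeps the meaning of each value visible in the code.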

5.4.0 Arrays

Sometimes we come across problems where a large amount of data must be kept in memory.
A variable can contain only one value at a time, so if our program needs, say, 30 different values at the same time, we would have to define 30 variables, each with a different name.
This organization makes the data hard to handle in the code that works with these variables.

A solution is to store the data in memory within a structure. If all the data are homogeneous (that is, of the same type), we can use a structure called an array.
An array occupies a predefined contiguous memory area that holds all the homogeneous data; each datum is addressed by an index value. A datum addressed by an index is called an array element.

In the code, an array element is identified by the array name followed by the index number enclosed in parentheses.
For example, the 3rd element of an array named TestArray is written as:

	TestArray(2)

We used index 2 instead of index 3 because by default the first array element has index 0, so the 2nd and 3rd elements are addressed as 1 and 2 respectively.
An array element can be treated as a normal variable.

Arrays are declared with the same DIM keyword used to declare variables.
A static array is declared differently from a dynamic array.
When declaring a static array it is necessary to define its number of elements, so the syntax is the following:

DIM <array-name>(<maximum index value>) AS <type>

where <array-name> is the name of the array, <maximum index value> is the index of its last element, and <type> is the type of its elements.

Here is the declaration of the TestArray static array containing 10 integer elements:

DIM TestArray(9) AS Integer

It is possible to change the index of the first array element by using the following alternative syntax:

DIM <array-name>(<minimum index value> TO <maximum index value>) AS <type>

where <minimum index value> is the index of the first element and <maximum index value> is the index of the last one.

So, to declare the TestArray static array with a first element of index 1 and a last element of index 10 (10 integer elements), we use the following syntax:

DIM TestArray(1 TO 10) AS Integer
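
A short sketch shows why an array is easier to handle than many separate variables: one loop can reach every element through its index. LBound and UBound are the standard Basic functions returning the first and last index of an array; they are not introduced in this section, and Debug.Print is assumed to be available.

```vb
Sub Main
    Dim TestArray(1 To 10) As Integer
    Dim i As Integer

    ' Visit every element, from the first index to the last one
    For i = LBound(TestArray) To UBound(TestArray)
        TestArray(i) = i * i
    Next i

    Debug.Print TestArray(3)    ' 9, the square of 3
End Sub
```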

A dynamic array contains no elements when it is declared, so its declaration differs from that of a static array:

DIM <array-name>() AS <type>

Notice the empty parentheses () following the <array-name>: they mean that the array size is undefined. The array is dimensioned (and redimensioned) later, in the execution part of the code, with the REDIM keyword:

REDIM <dynamic-array-name>(<maximum index value>) AS <type>

for 0-based dynamic arrays, or:

REDIM <dynamic-array-name>(<minimum index value> TO <maximum index value>) AS <type>

for dynamic arrays whose first index is specified explicitly.

The <type> must be the same one used in the dynamic array declaration with the DIM keyword.
A dynamic array cannot be used until it has been dimensioned with the REDIM keyword.

When dimensioning a dynamic array, both <minimum index value> and <maximum index value> can be given as variables, so that the array size can depend on the value of a script variable. It is therefore possible to dimension a dynamic array in the following way:

DIM Size AS Integer
DIM TestArray() AS Integer
Size=10
REDIM TestArray(1 TO Size) AS Integer

The Size=10 statement sets the variable Size to 10, so that the array is dimensioned with Size elements.
With static arrays this isn't possible: only constants or literal numbers can be used for <minimum index value> and <maximum index value>.

Another advantage of dynamic arrays is that their size can change at run time. Once the array has been dimensioned with the REDIM keyword, its size can be increased or reduced, with the option of keeping or discarding the values of its elements.

Redimensioning is done with the REDIM keyword again:

REDIM <dynamic-array-name>(<new maximum index value>) AS <type>

or:

REDIM <dynamic-array-name>(<new minimum index value> TO <new maximum index value>) AS <type>

The above syntax doesn't preserve the array values. To keep the values when redimensioning, use the PRESERVE keyword:

REDIM PRESERVE <dynamic-array-name>(<new maximum index value>) AS <type>

or:

REDIM PRESERVE <dynamic-array-name>(<minimum index value> TO <new maximum index value>) AS <type>

A limitation of redimensioning while preserving the array values is that only the <maximum index value> can be increased or reduced; the <minimum index value> must stay equal to the one defined when the dynamic array was first dimensioned.

Here is the declaration and dimensioning of the TestDynArray dynamic array, with integer elements from index 1 to index 10:

DIM TestDynArray() AS Integer
REDIM TestDynArray(1 TO 10) AS Integer

To add 5 more integer elements to TestDynArray while preserving the existing values, the syntax is the following:

REDIM PRESERVE TestDynArray(1 TO 15) AS Integer
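
The effect of PRESERVE can be checked with a small sketch: the array is filled, enlarged, and one of the old elements is read back. Debug.Print and UBound are assumed to behave as in VBA.

```vb
Sub Main
    Dim TestDynArray() As Integer
    Dim i As Integer

    ' First dimensioning: elements 1 to 10
    ReDim TestDynArray(1 To 10) As Integer
    For i = 1 To 10
        TestDynArray(i) = i * 10
    Next i

    ' Enlarge to 15 elements while keeping the existing values
    ReDim Preserve TestDynArray(1 To 15) As Integer

    Debug.Print TestDynArray(10)       ' 100: the old value was kept
    Debug.Print UBound(TestDynArray)   ' 15: the new last index
End Sub
```

Without the PRESERVE keyword the second REDIM would discard the ten values assigned in the loop.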

5.4.1 Bidimensional arrays (matrixes)

Array elements can be simple or complex.
A simple element contains a single value of a predefined type.
If the elements of an array are themselves arrays (which we can call component arrays), the array is called a multidimensional array and contains complex elements.
If the elements of the component arrays are of simple type, the array is called a matrix or bidimensional array.

To address an element within a matrix, two indexes are needed: a row index and a column index.
Usually the first index identifies the row and the second the column, but since this is a convention rather than a rule, the order can be inverted.

If our matrix is named myMatrix and its indexes start from 0, the element located at the 2nd row, 5th column is identified with:

	myMatrix(1, 4)

The total number of elements a bidimensional array contains is equal to the number of rows (1st dimension) multiplied by the number of columns (2nd dimension) of the matrix.

Like a one-dimensional array, a bidimensional array is declared with DIM (for static arrays) or with DIM/REDIM (for dynamic arrays).
The only difference is that two dimensions must be defined instead of one.

Here follows the declaration of the StatMatrix integer static bidimensional array, having 3 rows (3 elements for the 1st dimension), and 4 columns (4 elements for the 2nd dimension):

DIM StatMatrix(2, 3) AS Integer

In the above matrix, the first element of each dimension has index 0. To declare the same array with the first dimension starting at index 1, the declaration is:

DIM StatMatrix(1 TO 3, 3) AS Integer

The first element of dimension 1 has index 1, and the first element of dimension 2 has index 0.
(The rule is: when a dimension is declared with only the <maximum index value>, rather than with the <minimum index value> TO <maximum index value> syntax, the <minimum index value> is considered to be 0.)

In the same way, let's declare a dynamic 4x2 string matrix named DynMatrix, with the first element of each dimension at index 1:

DIM DynMatrix() AS String
REDIM DynMatrix(1 TO 4, 1 TO 2) AS String

With the PRESERVE keyword (used to keep the array values while redimensioning), only the <maximum index value> of the last dimension can be changed. If the matrix values must be preserved, the first dimension cannot be redimensioned.
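
A matrix is usually processed with two nested loops, one per dimension. The sketch below fills DynMatrix and then adds a third column while preserving the values; RowNo and ColNo are hypothetical variable names, and Debug.Print is assumed to be available.

```vb
Sub Main
    Dim DynMatrix() As String
    Dim RowNo As Integer
    Dim ColNo As Integer

    ReDim DynMatrix(1 To 4, 1 To 2) As String

    ' Outer loop on the rows, inner loop on the columns
    For RowNo = 1 To 4
        For ColNo = 1 To 2
            DynMatrix(RowNo, ColNo) = "r" & RowNo & "c" & ColNo
        Next ColNo
    Next RowNo

    ' Only the last dimension changes, so the values can be preserved
    ReDim Preserve DynMatrix(1 To 4, 1 To 3) As String

    Debug.Print DynMatrix(4, 2)    ' r4c2
End Sub
```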

5.4.2 Multidimensional arrays

Sax Basic can also handle arrays with more than two dimensions (multidimensional arrays).
As in a bidimensional array, a multidimensional array element is addressed by using as many indexes as the array has dimensions.

Therefore, if a tridimensional array is used to identify a point in space, a three-coordinate system (x, y, z) addresses the point.

Counting coordinates from 1 in a 0-based array, the element at x=2, y=10, z=1 of the my3D array is addressed with the following expression:

	 my3D(1, 9, 0)

Multidimensional arrays can take up a lot of memory: the total number of elements is the product of the number of elements of each dimension.

Multidimensional arrays are declared in the same way as bidimensional arrays; the only difference is that three or more dimensions must be defined instead of two.
Let's define the my3D integer dynamic array of 5x10x3 elements:

DIM my3D() AS Integer
REDIM my3D(4, 9, 2) AS Integer

With the previous declaration, each dimension of the my3D array is 0-based (its first element has index 0).
Again, the PRESERVE keyword keeps the values when a dynamic array is redimensioned.
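
The element count of my3D can be verified with a short sketch. It relies on the two-argument form of LBound and UBound (array, dimension number) found in VBA-style Basic, which this section does not describe, and on the default 0-based indexing.

```vb
Sub Main
    Dim my3D() As Integer
    Dim Total As Long
    Dim DimNo As Integer

    ReDim my3D(4, 9, 2) As Integer

    ' Multiply the number of elements of each of the three dimensions
    Total = 1
    For DimNo = 1 To 3
        Total = Total * (UBound(my3D, DimNo) - LBound(my3D, DimNo) + 1)
    Next DimNo

    Debug.Print Total    ' 5 * 10 * 3 = 150
End Sub
```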

5.5.0 User-defined types

A problem's data may be too complex to be described by simple variables or arrays.
For example, data about an individual are usually made up of several attributes that together identify the person: first name, last name, age, sex, and so on.

In such cases we need more than a simple description of a series of data: we need a suitable way to describe classes of data sharing common characteristics, called attributes.

This is when we create our own data types, each identifying a class with its own attributes. Every attribute has a type and a name (like a variable), and a class is the container of these attributes.

Program data can be composed of several classes of data. A class is the definition of a complex data structure, composed of a finite number of logically connected elements of any type; each element (or attribute) contains a value.

A user-defined type represents a class with a name, and this name identifies a new type.
A user-defined type variable is the stored representation of a class.

A user-defined type is declared with the TYPE...END TYPE structure, containing a list of one or more elements:

TYPE <user-defined type name>
    <first element name> AS <type>
    <second element name> AS <type>
    <third element name> AS <type>
    ...
    <Nth element name> AS <type>
END TYPE

Like enumerated types, user-defined types can only be declared globally, not within a processing block. Global data, local data and processing blocks are explained in chapter 6 "Processing blocks".

If our data need a class containing an individual's identification information, such as first name, last name, age and sex, we could define a user-defined type named TIdentification, containing the following elements:

User-defined type: TIdentification
This user-defined type contains data which identify a person

Element name    Element type    Element description
FirstName       String          First name
LastName        String          Last name
Age             Integer         Age
Sex             Integer         Sex: 1=Male; 2=Female

The above class is defined with the following syntax:

TYPE TIdentification
    FirstName AS String
    LastName AS String
    Age AS Integer
    Sex AS Integer
END TYPE

TIdentification is the user-defined type. A user-defined type variable is declared like a normal variable but with a user-defined type:

DIM Identification AS TIdentification

Class elements are addressed with the user-defined type variable name, followed by a dot (.) and the element's name.
The following expression addresses the LastName element of the Identification user-defined type variable:

Identification.LastName

We declared the Sex element as Integer, but it can only take two definite values, Male and Female. We could therefore use an enumerated type instead:

' enumerated-type declaration:
ENUM ESex
    Male=1
    Female=2
END ENUM

' user-defined type declaration:
TYPE TIdentification
    FirstName AS String
    LastName AS String
    Age AS Integer
    Sex AS ESex
END TYPE

' user-defined type variable declaration:
DIM Identification AS TIdentification

Sax Basic allows a class to contain elements representing other classes: it is possible to define an element whose type is a user-defined type.
This is necessary only when the problem requires very complex data.

Extending the example above, the TIdentification class only defines the basic identification of an individual, but we may want to store other information about that individual, for example the address.
To describe this case we define 3 classes: the TIdentification class, the TAddress class (whose elements may be the street and the city name), and the TIndividual class, whose elements are the Identification and Address user-defined type variables.
Here one class is used as an element of another class.

Here is the full declaration:

' enumerated-type declaration:
ENUM ESex
    Male=1
    Female=2
END ENUM

' user-defined type declaration:
TYPE TIdentification
    FirstName AS String
    LastName AS String
    Age AS Integer
    Sex AS ESex
END TYPE

TYPE TAddress
     Street AS String
     CityName AS String
END TYPE

TYPE TIndividual
     Identification AS TIdentification
     Address AS TAddress
END TYPE

' user-defined type variable declaration:
DIM Individual AS TIndividual

To address a subclass element, we first identify the main class, then the subclass, and finally the subclass element.
For example, the Age element of the Identification subclass stored in the Individual variable is addressed with the following expression:

	 Individual.Identification.Age

A user-defined type element may be an array, but only static arrays are allowed: a dynamic array cannot be a class element.

For an individual we may also need to store a list of phone numbers: this information can be stored in an array, which becomes another element (named PhoneNo here) of the TIndividual user-defined type.
The individual's 2nd phone number is accessed with the expression:

	 Individual.PhoneNo(1)

(remember that by default, the first array element is addressed as 0).
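
Putting the pieces together, a possible declaration of TIndividual with the PhoneNo array element, followed by a few assignments, is sketched below. The size chosen for PhoneNo (three numbers) and all the sample values are assumptions made for the example, and Debug.Print is assumed to be available.

```vb
Enum ESex
    Male = 1
    Female = 2
End Enum

Type TIdentification
    FirstName As String
    LastName As String
    Age As Integer
    Sex As ESex
End Type

Type TAddress
    Street As String
    CityName As String
End Type

Type TIndividual
    Identification As TIdentification
    Address As TAddress
    PhoneNo(2) As String    ' static array element: indexes 0 to 2
End Type

Sub Main
    Dim Individual As TIndividual

    ' Sample values only
    Individual.Identification.FirstName = "Ada"
    Individual.Identification.Age = 36
    Individual.Identification.Sex = Female
    Individual.Address.CityName = "London"
    Individual.PhoneNo(0) = "555-0100"
    Individual.PhoneNo(1) = "555-0101"

    Debug.Print Individual.Identification.Age    ' 36
    Debug.Print Individual.PhoneNo(1)            ' the 2nd phone number
End Sub
```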

A Sax Basic limitation doesn't allow the definition of an array (static or dynamic) of user-defined types. This means that we can't define an array of individuals, only a number of separate variables of the TIndividual type.

User-defined types can only be declared globally, not within a processing block. Global data, local data and processing blocks are explained in chapter 6 "Processing blocks".

5.6.0 Object types

Objects are particular types. An object is a representation of a certain entity.
This representation contains information about the entity and the necessary knowledge to act on this internal status. This knowledge is exposed to the program in the form of a set of actions the object can perform on its status.

A program sees an object as a bunch of properties and methods. Properties are data the object decide to expose, while methods are the exposed actions.
An object can contain hidden data or actions: it is the object itself which decide what data or action to expose to the program or not. There could be read-only or write-only properties, in order to secure object's internal status.

Property
An property can be considered as a datum, encapsulated in the object, having its own type. A program can read and/or change a property value, depending on what the object permits.
Method
A method can be considered as an action to perform on the object, such as the action of moving or printing the object itself. A method can have one or more parameters, which give the additional information necessary to perform the action.

Basically, an object can be seen as a container where both data and code are put together. A class is an object definition: it specifies the object data, properties and methods. A class definition also include the code which implements the actions.

It is necessary to create an object before using it, so that an object can initialize itself.
In case of SPSS OLE objects, the ISpssApp object contains all SPSS OLE objects (any SPSS OLE object can be accessed by starting from this object's methods/properties), and it is has already been created by SPSS, referenced by the objSpssApp global variable.
Therefore SPSS OLE objects can be used within an SPSS script starting from the objSpssApp variable, without taking care of object creation.

An object variable is a reference to the object itself, so there could be two or more object variables representing the same object.

To make this clearer, suppose we have an object called ObjectA and three variables, objA_var1, objA_var2 and objA_var3, which are declared with the following code and then all set to refer to that same object:

DIM objA_var1 AS ObjectA
DIM objA_var2 AS ObjectA
DIM objA_var3 AS ObjectA

What is stored in memory appears in the following figure:

[Figure: Object variables]

Notice that if we change prop1 of ObjectA through variable objA_var1 and then read the same property through objA_var2 or objA_var3, we read the changed value, even though the change was made through objA_var1.

This is because objA_var1, objA_var2 and objA_var3 are all references to the same object, so all of them access the same ObjectA properties and methods.

The difference is clearest if we compare the declaration of object variables with the declaration of non-object variables (such as integer variables):

DIM int_var1 AS Integer
DIM int_var2 AS Integer
DIM int_var3 AS Integer

This is what appears in memory:

[Figure: Standard variables]

In this case int_var1, int_var2 and int_var3 take up different memory locations. This means that they are independent of each other: unlike object variables, they don't share the same memory location.
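
The contrast between the two kinds of variables can be sketched as follows. The object part uses an SPSS output item (through the objSpssApp, GetDesignatedOutputDoc, Items, GetItem and Selected names described later in this section) instead of the imaginary ObjectA; the variable names, the AS Object declarations and the use of Debug.Print are assumptions made for illustration, and the output document is assumed to contain at least one item.

```vb
SUB MAIN
    DIM item_var1 AS Object
    DIM item_var2 AS Object
    DIM int_var1 AS Integer
    DIM int_var2 AS Integer

    ' Two object variables referring to the same output item
    SET item_var1 = objSpssApp.GetDesignatedOutputDoc.Items.GetItem(0)
    SET item_var2 = item_var1
    item_var1.Selected = TRUE
    Debug.Print item_var2.Selected    ' the change made through item_var1 is visible here

    ' Two integer variables: the assignment copies the value
    int_var1 = 1
    int_var2 = int_var1
    int_var1 = 5
    Debug.Print int_var2              ' int_var2 still holds 1
END SUB
```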


6.0.0 Processing blocks

Sax Basic is a structured language. This means that a script is made of elements, possibly grouped in blocks.
A script is mainly composed of two parts: a declaration part and an execution part.

To make the top-down approach easier to implement, a script can be subdivided into several processing blocks, identified by name, each one representing a fragment of the problem to solve.
Each processing block has both a declaration part and an execution part. The execution part of a block can contain references to other blocks, so that a block can execute other blocks.

A script can only work if it contains a block representing the execution starting point. This processing block contains references (calls) to other blocks, and in this way it defines the script execution flow.
The name of this starter-block must be MAIN, and no other block can have this name. If a block isn't called by this main block, either directly or indirectly (that is, called by a block referenced in the MAIN block), it will never be executed.

Since each processing block contains both a declaration part and an execution part, and there can be several blocks, a script has both global and local data.
Global data are visible to every processing block; local data, on the other hand, are visible only inside their own block and cannot be seen from other blocks.

A datum is visible when it can be read and/or modified (a constant, of course, can never be modified).

Global data lifetime lasts the entire script execution.
Local data lifetime is limited to the corresponding block execution lifetime. After the block has terminated its execution, its local data are lost.

Global data are declared outside processing blocks; local data, on the other hand, are declared within a processing block.
Enumerated types and user-defined types can only be declared globally.

To sum up, any processing block can read global data (and modify them, in the case of variables but not constants) in addition to its local data.
Local data are private to the block, so two different processing blocks can declare variables with the same name.
If a block has a local datum with the same name as a global datum, the local one takes precedence and makes the global datum inaccessible to the block. As a matter of good programming practice, though, it's preferable not to use this "feature".
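
The visibility rules above can be sketched with a short script. The names Counter and ChangeLocalCounter are hypothetical, and Debug.Print (writing to the Immediate window) is assumed to be available; SUB blocks are described in the Subprograms chapter.

```vb
DIM Counter AS Integer          ' global: declared outside any block

SUB MAIN
    Counter = 1                 ' MAIN works on the global Counter
    ChangeLocalCounter          ' call another processing block
    Debug.Print Counter         ' the global value is unchanged: 1
END SUB

SUB ChangeLocalCounter
    DIM Counter AS Integer      ' local: hides the global Counter inside this block
    Counter = 99                ' only the local datum changes, and it is lost when the block ends
END SUB
```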

Usually, data are defined globally when they are used in many processing blocks, and they can be considered the main data of the script.
Local data, on the other hand, are used only within one block, and exist only to let that block's execution part accomplish its task.

Besides global and local data, a block can access parameter data. These are data passed to the block when it is called from another block. They can determine how the block must behave, or they can be the data the block must work on and then return to the caller.

For example, if the task of a processing block is to sort data, one parameter could tell the block whether to sort in ascending or descending order. Another parameter could pass the data to be sorted.
So in this case a single sorting block can sort different data, passed to it as parameters.
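
A sorting block of this kind could look like the sketch below. It is only an illustration: the names SortValues, Values, Descending and Scores are hypothetical, a simple bubble sort was chosen for brevity, and the SUB syntax used here is the one described in the Subprograms chapter.

```vb
SUB MAIN
    DIM Scores(2) AS Double         ' three elements, indexes 0..2
    Scores(0) = 3.5
    Scores(1) = 1
    Scores(2) = 2.25

    SortValues Scores, FALSE        ' ascending: 1, 2.25, 3.5
    SortValues Scores, TRUE         ' same block, descending: 3.5, 2.25, 1
END SUB

' Values is the data to sort; Descending tells the block how to behave
SUB SortValues(Values() AS Double, BYVAL Descending AS Boolean)
    DIM i AS Integer
    DIM j AS Integer
    DIM Temp AS Double              ' local data, used only inside this block
    DIM NeedSwap AS Boolean

    FOR i = LBOUND(Values) TO UBOUND(Values) - 1
        FOR j = LBOUND(Values) TO UBOUND(Values) - 1 - (i - LBOUND(Values))
            IF Descending THEN
                NeedSwap = Values(j) < Values(j + 1)
            ELSE
                NeedSwap = Values(j) > Values(j + 1)
            END IF
            IF NeedSwap THEN
                Temp = Values(j)
                Values(j) = Values(j + 1)
                Values(j + 1) = Temp
            END IF
        NEXT j
    NEXT i
END SUB
```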

Depending on their meanings, parameter data can be passed in different ways, but this is explained in the Subprograms chapter.


7.0.0 Execution part: language basic structure

Sax Basic is a structured language. This means that a script is made of elements, possibly grouped in blocks.
Within each element you always find one or more of the following main logic components: assignment, conditional execution and cycles.


7.1.0 Assignment

This logic component assigns the result of an expression to a variable.
An expression is a combination of operands and operators which returns a result. This result is then stored in the variable.
In Sax Basic an assignment is written in the following way:

[LET] <variable-name> = <expression>

The keyword LET can be omitted: this is why it is enclosed in square brackets.

Square brackets must not be included in the code: in a syntax definition they indicate that what they enclose (here the keyword LET) is optional, and they are not part of the syntax itself.
Angle brackets indicate that the word(s) they enclose is (are) only an indication of what should be inserted. In the above case, you need to replace <variable-name> with a valid variable name (e.g. Mean) and <expression> with a valid expression (e.g. Sum / N).
So, following the above syntax definition, a valid line of code could be:

LET Mean = Sum / N

A variable used in our code is different from a variable used in SPSS syntax. You can think of a "Sax Basic variable" as a box containing only one value. A variable used in SPSS syntax, on the other hand, is a "representation" of one column of the active data file.

Each variable contains a value of a predefined type. This means that a variable which holds, for example, a number will only ever be able to contain numbers. Such a variable is called a Numeric type variable, or simply a numeric variable.

Sax Script has several variable types. A special variable type is the Object type. For object variables, the assignment syntax is different:

SET <object-variable-name> = <object-expression>

In this case the SET keyword cannot be omitted. An object variable can contain a reference to any SPSS object; these references are what allow a script to automate SPSS.
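
The two forms of assignment can be put side by side in a small sketch. The values 42 and 4 are arbitrary, and declaring OutputDoc AS Object is an assumption made for illustration; GetDesignatedOutputDoc is described later in this section.

```vb
SUB MAIN
    DIM Sum AS Double
    DIM N AS Integer
    DIM Mean AS Double
    DIM OutputDoc AS Object

    Sum = 42
    N = 4
    LET Mean = Sum / N          ' ordinary assignment, with the optional LET keyword
    Mean = Sum / N              ' the same assignment with LET omitted

    ' Object assignment: the SET keyword is required
    SET OutputDoc = objSpssApp.GetDesignatedOutputDoc
END SUB
```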

Return to Language basic structure index

7.2.0 Conditional execution

In order to solve a problem, it is sometimes necessary to decide which action should be performed.
The decision depends on whether a certain condition occurs or not. In Sax Basic this situation can be described with these structures: If...Then...Else and Select...Case.


If...Then...Else

A decision in the Sax Script can be described in the following way:

IF <condition> THEN
    <sequence 1>
[ELSE
    <sequence 2>]
END IF

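It means: if the <condition> is TRUE, <sequence 1> is executed; otherwise <sequence 2> is executed, when the ELSE part is present.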

The logical <condition> is an expression returning the boolean result TRUE or FALSE.
The ELSE construct is optional: it is used only when it is also necessary to define an action when the logical <condition> is FALSE.
Suppose you want to select a pivot table's item if the table's title is equal to CROSSTAB. You can define the operation in this way:

IF table.TitleText = "CROSSTAB" THEN
    table_item.Selected = TRUE
END IF

The above example doesn't use the ELSE construct.
table is an object variable referring to the pivot table object; TitleText is a property of the pivot table object, containing the table's title (notice the dot "." which separates the object variable from its property).
An object property can be considered as a datum, encapsulated in the object, having its own type. The TitleText property contains a String type value (therefore it's a string property).
table_item is an object variable referring to an SPSS Output Viewer item object; Selected is a property of the item object, specifying whether the item is selected (TRUE) or not (FALSE). This is a boolean property.

Now, imagine you need to extend the previous code so that it also deselects the table's item when the pivot table's title is NOT equal to CROSSTAB.
In other words, if the table's title is equal to CROSSTAB you want to select the item, otherwise you want to deselect it. You can define the entire operation by writing this code:

IF table.TitleText = "CROSSTAB" THEN
    table_item.Selected = TRUE
ELSE
    table_item.Selected = FALSE
END IF

Like any other construct, the IF...ELSE...END IF construct can be nested one inside another.
The above examples suppose that you have already identified a pivot table item. But the SPSS Output Viewer can contain other kinds of items besides pivot tables, so you need to tell pivot table items apart from the others. This can be done by using an IF structure again:

IF item.SPSSType = SPSSPivot THEN
    SET table = item.GetTableOLEObject
    IF table.TitleText = "CROSSTAB" THEN
        item.Selected = TRUE
    ELSE
        item.Selected = FALSE
    END IF
END IF

Now item is a general item object: its SPSSType property specifies the type of the output item; SPSSPivot identifies a pivot table item. When SPSSType is equal to SPSSPivot, the general item is a pivot table item, containing a pivot table object.
The GetTableOLEObject method returns a reference to the item's not-activated pivot table object (a table is activated when it has been double-clicked). Notice the use of the SET keyword to assign the pivot table object to the table object variable.
Look at the indentation of the second IF nested inside the first IF: writing scripts this way makes the code easier to understand.

Now we want to add to our code the capability of selecting all text items. Here is an implementation of this added functionality (SPSSText identifies a text item):

IF item.SPSSType = SPSSPivot THEN
    SET table = item.GetTableOLEObject
    IF table.TitleText = "CROSSTAB" THEN
        item.Selected = TRUE
    ELSE
        item.Selected = FALSE
    END IF
ELSE
    IF item.SPSSType = SPSSText THEN
        item.Selected = TRUE
    END IF
END IF

And how can we write the code if we also want to select all chart items? Here is the answer:

IF item.SPSSType = SPSSPivot THEN
    SET table = item.GetTableOLEObject
    IF table.TitleText = "CROSSTAB" THEN
        item.Selected = TRUE
    ELSE
        item.Selected = FALSE
    END IF
ELSE
    IF item.SPSSType = SPSSText THEN
        item.Selected = TRUE
    ELSE
        IF item.SPSSType = SPSSChart THEN
            item.Selected = TRUE
        END IF
    END IF
END IF

Does the code seem to be getting a little confusing? Don't worry. Fortunately the IF syntax has another form that helps in situations like this:

IF <condition> THEN
    <sequence 1>
[ELSEIF <condition> THEN
    <sequence 2>]...
[ELSE
    <else-sequence>]
END IF

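It means: the conditions are tested in order, from top to bottom. The sequence following the first condition that is TRUE is executed and the remaining branches are skipped; if no condition is TRUE, the <else-sequence> is executed (when the ELSE part is present).

By using this structure, we can write our code in a clearer way: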

IF item.SPSSType = SPSSPivot THEN
    SET table = item.GetTableOLEObject
    IF table.TitleText = "CROSSTAB" THEN
        item.Selected = TRUE
    ELSE
        item.Selected = FALSE
    END IF
ELSEIF item.SPSSType = SPSSText THEN
    item.Selected = TRUE
ELSEIF item.SPSSType = SPSSChart THEN
    item.Selected = TRUE
END IF

Select...Case

When you need to perform a task depending on the result of a certain expression, Sax Basic offers another facility besides the IF...ELSEIF...ELSE...END IF structure.
This situation can also be described by using the SELECT CASE...END SELECT statement:

SELECT CASE <expression>
   [CASE <case-expression>[, ...]
        <sequence>]...
   [CASE ELSE
        <else-sequence>]
END SELECT

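It means: the <expression> is evaluated and compared with each <case-expression> in turn. The <sequence> of the first CASE that matches is executed; if no CASE matches, the <else-sequence> is executed (when the CASE ELSE part is present).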

In a <case-expression> it is possible to use the keyword IS to make comparisons, the keyword TO to give a range, or a comma-separated list of values:

Expression                   Description
Is < expr                    Execute if less than expr
Is <= expr                   Execute if less than or equal to expr
Is > expr                    Execute if greater than expr
Is >= expr                   Execute if greater than or equal to expr
Is <> expr                   Execute if not equal to expr
expr1 To expr2               Execute if greater than or equal to expr1 and less than or equal to expr2
expr1, expr2, ... , exprN    Execute if equal to any of expr1, expr2, ... , exprN
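
As a small sketch of these case expressions at work, the script below classifies a number of cases. The variable N, the thresholds and the messages are all hypothetical, and Debug.Print is assumed to be available for writing to the Immediate window.

```vb
SUB MAIN
    DIM N AS Integer
    N = 25

    SELECT CASE N
    CASE IS < 0                     ' comparison with IS
        Debug.Print "invalid number of cases"
    CASE 0                          ' single value
        Debug.Print "empty file"
    CASE 1 TO 29                    ' range with TO: this branch runs when N = 25
        Debug.Print "small sample"
    CASE 30, 50, 100                ' list of values
        Debug.Print "one of the sizes of interest"
    CASE ELSE                       ' none of the above
        Debug.Print "large sample"
    END SELECT
END SUB
```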

Now, let's use the SELECT CASE instruction to write our code in another way:

SELECT CASE item.SPSSType
CASE SPSSPivot
    SET table = item.GetTableOLEObject
    IF table.TitleText = "CROSSTAB" THEN
        item.Selected = TRUE
    ELSE
        item.Selected = FALSE
    END IF
CASE SPSSText, SPSSChart
    item.Selected = TRUE
END SELECT

It is interesting to notice that the last CASE lists both the SPSSText and SPSSChart values (separated by a comma): whether item.SPSSType is equal to SPSSText or to SPSSChart, the action to perform (item.Selected = TRUE) is the same.


7.3.0 Cycles

Sometimes it is necessary to execute a sequence of actions many times, until a certain condition occurs. This is called a cycle.
Sax Basic has three ways of implementing a cycle: Do...Loop <condition>, Do <condition>...Loop and For...Next.


Do...Loop <condition>

The DO...LOOP <condition> syntax structure is:

DO
    <sequence>
LOOP [UNTIL|WHILE] <condition>

The pipe (|) character means that you must choose exactly one of the neighboring keywords; here the square brackets only group the two alternatives. In the above definition, [UNTIL|WHILE] stands for either:

LOOP UNTIL <condition>

OR

LOOP WHILE <condition>

Therefore you can encounter the following situations:

DO
    <sequence>
LOOP UNTIL <condition>

    The code is repeatedly executed until the <condition> becomes TRUE.
    In other words, the <sequence> code execution stops when the <condition> becomes TRUE.

DO
    <sequence>
LOOP WHILE <condition>

    The <sequence> code is repeatedly executed while the <condition> is TRUE.
    In other words, the <sequence> code execution continues if the <condition> is TRUE.

With the DO...LOOP <condition> cycle, the <sequence> code is executed at least once, because the <condition> is tested at the end of the loop.


Do <condition>...Loop

The DO <condition>...LOOP syntax structure is:

DO [UNTIL|WHILE] <condition>
    <sequence>
LOOP

Therefore you can encounter the following situations:

DO WHILE <condition>
    <sequence>
LOOP

    The <sequence> code is executed as long as the <condition> is TRUE.
    The WHILE...WEND syntax structure is an alias of this structure.

DO UNTIL <condition>
    <sequence>
LOOP

    The <sequence> code is executed as long as the <condition> is FALSE.

With the DO <condition>...LOOP cycle, the <sequence> code may never be executed, because the <condition> is tested at the beginning of the loop.
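
The difference between testing the condition at the end and at the beginning of the loop can be seen in this sketch, where the condition is FALSE from the start. The variables i and Runs are hypothetical, and Debug.Print is assumed to be available.

```vb
SUB MAIN
    DIM i AS Integer
    DIM Runs AS Integer

    ' Condition tested at the end: the body runs once even though i < 5 is FALSE
    i = 10
    Runs = 0
    DO
        Runs = Runs + 1
        i = i + 1
    LOOP WHILE i < 5
    Debug.Print Runs            ' 1

    ' Condition tested at the beginning: the body never runs
    i = 10
    Runs = 0
    DO WHILE i < 5
        Runs = Runs + 1
        i = i + 1
    LOOP
    Debug.Print Runs            ' 0
END SUB
```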

In your scripts you will probably use only one or two of these cycling structures. For example, I rarely use the DO <condition>...LOOP syntax structure. They all exist to give you the flexibility needed to code easily the different situations you can come across.


7.3.1 Do...Loop example

Now, with the Do...Loop construct, we know all the Sax Basic syntax constructs needed to write the code that sends to the printer all pivot tables having the CROSSTAB title.
We have already defined in the Approach chapter the steps the script must follow:

   1. identify the active output viewer
   2. look at the first output item
   3.     is the item a pivot table?
   4.     if no, then go to step 8
   5.     if yes, retrieve the table's title
   6.     is the title equal to CROSSTAB?
   7.     if yes, select the item
   8. look at the next output item
   9. repeat steps 3...8 until the last output item has been reached
  10. print the selected output

We have already seen the following constants/properties/methods:

SPSSPivot
    A predefined constant identifying a pivot table item.
Item.SPSSType
    This property specifies the type of the output item referenced by the Item object.
Item.GetTableOLEObject
    This method returns a reference to the not-activated pivot table object (a table is activated when it has been double-clicked) identified by the Item object.

There are some SPSS object properties/methods used in this example that we haven't seen yet:

objSPSSApp
    The main SPSS object: it contains all other objects.
objSPSSApp.GetDesignatedOutputDoc
    Returns a reference to the currently active output document.
OutputDoc.Items.Count
    Returns the number of items in the output document.
OutputDoc.Items.GetItem(<ItemNo>)
    Returns a reference to the Item object at the <ItemNo> index in the output document (0 is the first item).
OutputDoc.PrintRange(<Range>)
    Specifies the range of items to print. The code below uses the value 1, which stands for the selected items.
OutputDoc.PrintDoc
    Sends to the printer the items specified by OutputDoc.PrintRange.

You may feel confused by all these objects, properties and methods.
For now it isn't necessary to understand them thoroughly: you only need to know what they do, not what they are. Try to focus on the script syntax alone.

Looking at the SPSS objects we have, the list of steps we defined isn't yet detailed enough for a script, so we need to refine it:

   1. identify the active output viewer
   2. initialize the item index variable (ItemNo) to the first output document's item (0=first item)
   3.   get the output document's item of ItemNo index
   4.     is the item a pivot table?
   5.     if no, then go to step 10
   6.     if yes, get the table object
   7.     retrieve the table's title
   8.     is the title equal to CROSSTAB?
   9.     if yes, select the item
  10.   increment the item index variable (ItemNo) by 1 in order to look at the next item
  11. repeat steps 3..10 until the item index (ItemNo) has passed the last output item (that is, until ItemNo is equal to the number of items)
  12. set the printing range to Selected items
  13. send to the printer the items specified by the printing range

Finally, here is the code to put into a script. For clarity, each code line ends with a comment giving the corresponding step number(s):

SET OutputDoc = objSPSSApp.GetDesignatedOutputDoc       ' step 1
ItemNo = 0                                              ' step 2
DO
    SET Item = OutputDoc.Items.GetItem(ItemNo)          ' step 3
    IF Item.SPSSType = SPSSPivot THEN                   ' steps 4, 5
        SET Table = Item.GetTableOLEObject              ' step 6
        IF Table.TitleText = "CROSSTAB" THEN            ' steps 7, 8
            Item.Selected = TRUE                        ' step 9
        END IF
    END IF
    ItemNo = ItemNo + 1                                 ' step 10
LOOP UNTIL ItemNo = OutputDoc.Items.Count               ' step 11
OutputDoc.PrintRange(1)                                 ' step 12
OutputDoc.PrintDoc                                      ' step 13

For...Next

This cycling structure is used when the number of cycles to perform is known in advance and doesn't depend on the code executed at every cycle.

In a DO...LOOP structure the number of cycles may not be known in advance: it can be determined dynamically by the actions executed inside the cycle.
On the other hand, in a FOR...NEXT structure the number of cycles is fixed when the cycle starts, and is not meant to be changed by the code executed at every cycle.

The FOR...NEXT cycling structure uses an index variable whose value is incremented at each cycle. The code repeated at every cycle can use this variable.
Although the index variable can be modified by the code inside the cycle, changing its value isn't good practice: unless you are sure of what you're doing, your code may behave in a way you don't expect.

Looking at the Do...Loop example above, we know the number of times the code inside the DO...LOOP is run: it is equal to the number of items contained in the output document, that is OutputDoc.Items.Count.
Therefore, the same example is suitable to be coded with a FOR...NEXT cycle.

And now let's look at the syntax of the FOR...NEXT structure:

FOR <index-variable> = <start-value> To <last-value> [Step <increment-value>]
    <sequence>
NEXT [<index-variable>]

It means:

  1. Initialize the <index-variable> with <start-value>;
  2. If the <increment-value> is greater than 0 (or omitted) and the <index-variable> is greater than the <last-value>, leave the cycle; likewise, if the <increment-value> is less than 0 and the <index-variable> is less than the <last-value>, leave the cycle;
  3. Execute the sequence of actions specified by <sequence>;
  4. Increment the <index-variable> by the <increment-value>.
    If the <increment-value> is omitted, then increment the <index-variable> by 1;
  5. Return to step 2.

<start-value>, <last-value> and <increment-value> can each be a constant, a literal value, a variable or an expression.

Like any construct, a FOR...NEXT loop can be nested into any other construct.

In general, <start-value> is less than <last-value>. This is the case when you want to increment the <index-variable>.
But if you need the cycle to decrement the <index-variable>, <start-value> should be greater than <last-value>, and you should specify <increment-value> as a negative number.

If you omit the <increment-value> while <start-value> is greater than <last-value>, the <sequence> is never executed: the missing <increment-value> is taken as 1, so the <index-variable> is already past the <last-value> when the test in step 2 is first made.
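
The following sketch shows a cycle counting up, a cycle counting down, and a cycle written with <start-value> greater than <last-value> and no increment. It assumes the usual behaviour of VBA-compatible Basic dialects, where the end test is made before each pass; the variable i and the messages are hypothetical, and Debug.Print is assumed to be available.

```vb
SUB MAIN
    DIM i AS Integer

    ' Counting up: the increment is omitted, so it is taken as 1
    FOR i = 1 TO 3
        Debug.Print "up: " & i          ' runs for i = 1, 2, 3
    NEXT i

    ' Counting down: the start value is greater and the increment is negative
    FOR i = 10 TO 0 STEP -5
        Debug.Print "down: " & i        ' runs for i = 10, 5, 0
    NEXT i

    ' Start value greater than last value with no increment given:
    ' the body is skipped altogether
    FOR i = 3 TO 1
        Debug.Print "not reached"
    NEXT i
END SUB
```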

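So, the FOR...NEXT cycle can be used in two situations: counting up from a lower <start-value> to a higher <last-value> with a positive <increment-value>, or counting down from a higher <start-value> to a lower <last-value> with a negative one.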

7.3.2 For...Next example

Now, we can modify the Do...Loop example with the following code which makes use of the FOR...NEXT code structure instead:

SET OutputDoc = objSPSSApp.GetDesignatedOutputDoc
FOR ItemNo = 0 TO OutputDoc.Items.Count-1
    SET Item = OutputDoc.Items.GetItem(ItemNo)
    IF Item.SPSSType = SPSSPivot THEN
        SET Table = Item.GetTableOLEObject
        IF Table.TitleText = "CROSSTAB" THEN
            Item.Selected = TRUE
        END IF
    END IF
NEXT ItemNo

Notice that for the <last-value> we have used OutputDoc.Items.Count-1 instead of OutputDoc.Items.Count: the index of the OutputDoc.Items.GetItem method is zero-based, so the first item has index 0 and the last item has index Items.Count minus 1.


8.0.0 Subprograms

As explained in the Processing blocks chapter, in order to make the top-down approach implementation easier, a script can be subdivided into several processing blocks, identified by name, each one representing a fragment of the problem to solve.

During a problem analysis and in the following program writing, there are some situations that could come up:

In each of these cases, subprograms are used. A subprogram is a part of a program which solves a subproblem.
A subprogram is the declaration of a processing block.

A subprogram is identified by a name, which usually represents the meaning of the action the subprogram performs.
Obviously, a program cannot contain two subprograms with the same name.

Every program must contain a subprogram named MAIN: this is the subprogram that calls the other subprograms, and it defines the program's starting point.

To avoid rewriting the same algorithm, whether to execute the same code again or to apply it to different variables, it is possible to pass parameters to a subprogram.

Sax Basic distinguishes two types of subprograms: subroutines and functions. A description of the two follows.

8.1.0 Subroutines

A subroutine is a group of instructions whose purpose is to solve a problem.
It is declared in the following way:

SUB <subroutine-name> [(<parameter-list>)]
    <instruction-sequence>
END SUB

where:

8.1.0 Procedures and functions

  • A procedure is a synonym of a subprogram which doesn't return any value.
  • A function is a subprogram which contains an algorithm returning a value
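
The difference between the two kinds of subprogram can be sketched as follows. The FUNCTION...END FUNCTION form is not described in this section: it is shown here as it is written in VBA-compatible Basic dialects, which is an assumption, and the names ShowMessage, ComputeMean, Message, Total and N are hypothetical.

```vb
SUB MAIN
    DIM Mean AS Double

    ShowMessage "Computing the mean"    ' a procedure is called as a statement
    Mean = ComputeMean(42, 4)           ' a function is used inside an expression
    Debug.Print Mean                    ' 10.5
END SUB

' A procedure (subroutine): performs an action, returns no value
SUB ShowMessage(BYVAL Message AS String)
    Debug.Print Message
END SUB

' A function: the value to return is assigned to the function's own name
FUNCTION ComputeMean(BYVAL Total AS Double, BYVAL N AS Integer) AS Double
    ComputeMean = Total / N
END FUNCTION
```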