This section of the site describes the Basic language used by SPSS scripts.
It isn't an exhaustive description of the language: consider it
a starting point for writing a script in Sax Basic.
If you already have some VBA programming experience (for example Excel macros),
you can skip this section of the site entirely and go directly to the SPSS objects section.
It's my opinion that it's easier to learn something if you have some
examples to look at.
Therefore examples here will refer to SPSS, by solving problems an SPSS
script may encounter.
What is a programming language? It's a description given to the
computer of how to perform a specific task, by following a syntax scheme.
You may think of SPSS syntax as a programming language: you describe
the things to be accomplished by using the commands made available by SPSS.
The difference from the scripting language is that SPSS
syntax can do powerful things in fewer lines of code.
For example, if you need to sort your data file, with SPSS syntax you
can do it with a single line of code, by using the SORT command.
Accomplishing a similar operation (such as sorting a list of
values) with the scripting language can require many more lines
of code.
The drawback is that SPSS syntax is limited to performing pre-defined actions
on the data file: have you ever thought of writing a new statistical procedure
in SPSS syntax? It's impossible!
On the other hand, with a script you can, in theory, write your own
statistical procedure.
Syntax only works on the SPSS data file: it is somehow "connected" to the
SPSS processor.
Scripts can be "connected" to the SPSS processor, to the viewer window,
to the syntax window. With a script you can do everything syntax is
able to do, and much more!
Generally speaking, with a script you can do almost anything, but it
is more difficult to learn than SPSS syntax.
But once you've learned it, you have a lot of power in your hands:
you can automate operations that, without a script, you would
perform manually.
The goal of a script is to perform a task. Writing a script
means defining steps the computer must follow in order to accomplish that task.
For example, suppose you need to write a script which sends to the printer
all pivot tables containing a certain title.
The sequence of actions (i.e. the steps described above) the script
must perform is the following:
You may think of the above actions as macro steps: in fact, in order to be converted into a script, they need to be more detailed, especially step 3. It can be split into the following simpler steps:
Now, let's look at the full sequence of actions:
Are you surprised by the number of steps necessary to accomplish a simple
task like this?
If you think about it carefully, it isn't so strange. In fact the above
actions are exactly the ones you would perform if you carried out that
task manually.
You don't notice them because they are so simple
that you perform them implicitly.
But in order to explain the task to the computer, you need to define
every step in great detail!
How can you avoid getting lost in the details?
You need to rationalize the problem.
First, break the task down into big steps. After that,
divide each big step into several sub-steps. If necessary,
split these newly defined sub-steps further, until the
maximum level of detail has been reached.
The maximum level of detail is reached when every step can be expressed
using only the commands made available by the language.
This approach is called the top-down method: it consists of
breaking the problem down into more elementary sub-problems.
So, when you start writing a program, the first thing you should do
is analyze the problem, in order to fully understand it and the goals
to meet.
Generally, in order to accomplish a task, a script needs to use some intermediate data. We know that a program (in our case a script) is made of a sequence of actions (an algorithm) which form the description of what the program must do.
But every action needs "to act" on something, and this "something" is represented by intermediate
data, which are different from the permanent data stored in a file such as the SPSS data file.
Script data lifetime is limited to the execution time of the script itself. Script data don't need to be
permanent: they are discarded once the script terminates.
This implies that data must be defined in the script code, in a dedicated section called
the program data definition section.
Therefore a script is composed of two main parts: the data definition and the action definition, or algorithm, which defines the actions to execute.
Depending on the task to perform, the algorithm can use different types of data, for example numeric data or
data representing a sequence of characters.
Every datum needs to be identified by a type and a name. The name is a short description of what the
datum represents in the algorithm, and it is necessary for the identification of a piece of information among data.
For example, suppose we need to write a script which selects all pivot tables contained in the
Output Navigator.
To accomplish this task, our script needs to scan every Output Navigator item and select it
if it represents a pivot table.
The data the algorithm needs to use are: number of items contained in the Output Navigator, current item number,
and a number which is the code distinguishing the pivot table item among other Output Navigator items.
We could name the first datum as NumOfItems, the second one as CurrentItemNo, and the last datum as
PivotTableCode. All are numeric data.
In conclusion:
Both variables and constants can contain data of different types.
In a script it isn't possible to have two variables with the same name but with different type, and it
isn't possible to have both a constant and a variable with the same name.
Optionally, the type of a constant or variable can be declared by ending its name with a trailing character which
specifies the type. For example, a variable named A$ is a String variable:
notice the trailing $ character, which identifies the String type.
A constant/variable name must start with a letter. The following characters may be letters, underscores or digits.
If the name doesn't end with a type character, the type is defined by specifying the full type name. The syntax is described in the declaration part.
If the declaration doesn't specify a type and the name doesn't end with a type character, the variable is considered to be of Variant type.
Sax Basic Script can handle the following data types:
True or False.
A = "This is a string"
A = "1"
B = A + 1
Variable A contains the "1" string, which has been added to the number 1.
A Sax Basic script is a structured language. This means that the
language is made of elements, possibly grouped in blocks.
A script is mainly composed of two parts: a declaration and an
execution part.
Constants are data already known before the script runs which don't change during the script execution. Like a variable, a constant is represented by a name and a type, and its value is fixed.
Constant declaration is done with the following syntax:
CONST <constant-name> = <value>
where:
<constant-name>: name of the constant to
declare
<value>: constant value
Constant type is defined by the value assigned to the constant. For example, if a number is assigned to the constant, its type is numeric; if a string is assigned to the constant, its type is string.
In accordance with syntax above, the TITLE constant (containing the "My Title" string value) is declared in the following way:
CONST TITLE="My Title"
It is possible to declare more than one constant with the CONST keyword,
by separating each constant with a comma:
CONST TITLE="My Title", WIDTH=100, HEIGHT=80
The example above declares the two numeric constants WIDTH and HEIGHT besides the TITLE string constant.
A script variable is declared with the DIM keyword in the following
way:
DIM <variable-name> AS <type>
where:
<variable-name>: is the name of the variable to declare
<type>: is the variable type: this can be any of the script data
types, listed in the Variable types chapter.
If <variable-name> contains a trailing character specifying the
variable type (as explained in the Variable types chapter),
then the
AS <type>
part must be omitted.
For example, to declare variable Count as Integer and variable Txt as String, the script syntax is:
DIM Count AS Integer
DIM Txt AS String
With the use of trailing characters, we define variables Count% (% defines an Integer) and Txt$ ($ defines a String) in the following way:
DIM Count%
DIM Txt$
If no trailing character is used and there isn't an explicit type declaration
(AS <type> part missing), then the variable is considered to be
of Variant type.
It is possible to declare more than one variable with the DIM keyword,
by separating each variable with a comma:
DIM Count AS Integer, Txt AS String
There are cases where data can assume only a limited number of values, each one
with a special meaning.
In order to properly describe these data, Sax basic provides
the enumerated data type, built by enumerating all values that variables
or constants of that type can assume.
An example where we could use an enumerated data type is when we need
to store a person's sex in a variable. In this case the only values this variable can
contain are MALE (enumerated as 1) and FEMALE (enumerated as 2).
Here is the declaration syntax of enumerated type:
ENUM <enumerated type name>
<first constant name> = <1st constant-value>
<second constant name> = <2nd constant-value>
<third constant name> = <3rd constant-value>
...
<Nth constant name> = <Nth constant-value>
END ENUM
In accordance with the above syntax scheme, the ESex enumerated type is declared as follows:
ENUM ESex
MALE=1
FEMALE=2
END ENUM
Within the same enumerated type, all values must be of the same type. It isn't
possible to declare an enumerated type having, for example, both an integer and a
string constant.
Therefore the following declaration is wrong:
ENUM Sex
MALE="Male"
FEMALE=2
END ENUM
(the MALE constant is of string type, but the FEMALE constant is a number).
Enumerated types can only be globally declared, and not within a processing block. Global, local data and processing blocks are explained in chapter 6 "Processing blocks".
Sometimes we come across problems where it is necessary to keep a large amount of data in memory.
We know that a variable can contain only one value at a time; therefore, if our program needs, say, 30 different data at the same
time, we should define 30 variables, each one with a different name.
But this organization makes the data hard to handle when we implement the code which works
with these variables.
A solution is to put the data in memory within a structure. If all data are homogeneous (that is, of the same type), it is possible to use
a structure called an array.
An array takes up a predefined contiguous memory area, used to hold all the homogeneous data: each datum is addressed
by an index value. A datum addressed by an index is called an array element.
An array element within the code is identified by the array name, followed by the index number enclosed in brackets.
For example, the 3rd element of an array named TestArray is represented as:
TestArray(2)
We used index 2 instead of index 3: this is because by default the first array
element is indexed as 0, therefore the 2nd and 3rd elements are
addressed as 1 and 2 respectively.
An array element can be treated as a normal variable.
Array declarations are done with the same DIM keyword used to declare
variables.
A static array is declared differently from a dynamic array.
When declaring a static array it is necessary to define
its number of elements. Therefore the syntax is the following:
DIM <array-name>(<maximum index value>) AS <type>
where:
<array-name>: is the array name
<maximum index value>: is a number specifying the last array
index: it defines the array size.
<type>: is the type of array elements: this can be any of the
script data types, listed in the Variable types chapter.
Here follows the declaration of the TestArray static array containing 10 integer elements:
DIM TestArray(9) AS Integer
It is possible to change the index number of the 1st array element, by using the following alternative syntax:
DIM <array-name>(<minimum index value> TO <maximum index value>) AS <type>
where:
<minimum index value>: is the index of the first array element
<maximum index value>: is the index of the last array
element
So, if we want to declare our TestArray static array with the first element at index 1 and the last element at index 10 (10 integer elements), we should use the following syntax:
DIM TestArray(1 TO 10) AS Integer
Dynamic arrays, when defined, don't contain any elements. Therefore their declaration differs from that of static arrays:
DIM <array-name>() AS <type>
Notice the empty brackets () following the <array-name>: they
mean that array size is undefined. Array dimensioning and redimensioning is
done later in the code execution part with the REDIM keyword:
REDIM <dynamic-array-name>(<maximum index value>) AS <type>
for 0-based dynamic arrays or:
REDIM <dynamic-array-name>(<minimum index value> TO <maximum index value>) AS <type>
for dynamic arrays whose first index is specified explicitly (it may be 0 or any other value).
The <type> must be the same used in the dynamic array
declaration with the DIM keyword.
A dynamic array cannot be used until it has been dimensioned with the REDIM keyword.
When a dynamic array is dimensioned, both <minimum index value>
and <maximum index value> can be given as variables, so that an array can
be dimensioned depending on the value of a script variable. So it is possible to dimension a
dynamic array in the following way:
DIM Size AS Integer
DIM TestArray() AS Integer
Size=10
REDIM TestArray(1 TO Size) AS Integer
The Size=10 statement sets variable Size to value 10,
so that the array is dimensioned with Size elements.
With static arrays this isn't possible: only
constants or explicit numbers can be used for <minimum index value>
and <maximum index value>.
Another advantage of dynamic arrays is that they can change
their size at run-time. Once the array has been dimensioned with the REDIM keyword, it
is possible to increase or reduce its size, with the option of keeping or discarding the values
of its elements during the redimensioning process.
Redimensioning is done with the REDIM keyword again:
REDIM <dynamic-array-name>(<new maximum index value>) AS <type>
or:
REDIM <dynamic-array-name>(<new minimum index value> TO <new maximum index value>) AS <type>
The above redimensioning syntax doesn't preserve the array values. If you need to keep
the values when redimensioning, you must use the PRESERVE keyword:
REDIM PRESERVE <dynamic-array-name>(<NEW maximum index value>) AS <type>
or:
REDIM PRESERVE <dynamic-array-name>(<minimum index value> TO <NEW maximum index value>) AS <type>
A limitation of redimensioning while preserving array values is that only
the <maximum index value> can be increased or reduced; the
<minimum index value> must be the same as when the
dynamic array was first dimensioned.
Here follows the declaration and dimensioning of the TestDynArray dynamic array with integer elements from index 1 to index 10:
DIM TestDynArray() AS Integer
REDIM TestDynArray(1 TO 10) AS Integer
To add 5 more integer elements to TestDynArray while preserving the existing values, the syntax is the following:
REDIM PRESERVE TestDynArray(1 TO 15) AS Integer
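The effect of PRESERVE can be checked with a small script. The following is only a sketch: the array name Scores and its sample values are invented for illustration, and the FOR loop, the UBound function and the Sub Main block are used here ahead of their own explanations.

```vb
' Sketch only: Scores and its values are hypothetical.
Sub Main
    Dim Scores() As Integer
    Dim I As Integer

    ' First dimensioning: elements 1 to 3
    ReDim Scores(1 To 3) As Integer
    For I = 1 To 3
        Scores(I) = I * 10
    Next I

    ' Grow the array to 5 elements; the values at indexes 1 to 3 are kept
    ReDim Preserve Scores(1 To 5) As Integer
    Scores(4) = 40
    Scores(5) = 50

    ' Report the last index and one of the preserved values
    MsgBox "Last index: " & UBound(Scores) & ", Scores(2) = " & Scores(2)
End Sub
```

Replacing REDIM PRESERVE with a plain REDIM in this sketch would discard the three values stored before the array was resized.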
Array elements can be simple or complex.
A simple element contains only a value of a predefined type.
If an array contains elements representing other arrays (which can be called component arrays), it
is called a multidimensional array, containing complex elements.
If the elements of a component array are of simple type, the array is called a matrix or
bidimensional array.
In order to address an element within a matrix, 2 indexes are needed: a row index and
a column index.
Usually the first index identifies the row and the second one the column, but since this is just an accepted
custom, not a rule, the order can be inverted.
If the name of our matrix is myMatrix, and we are looking for the element located at the 2nd row, 5th column, it is identified with:
myMatrix(1, 4)
The total number of elements a bidimensional array contains is equal to the number of rows (1st dimension) multiplied by the number of columns (2nd dimension) of the matrix.
Like a monodimensional array, a bidimensional array is declared with DIM
(for static arrays) or with DIM/REDIM
(for dynamic arrays).
The only difference is that it is necessary to define two dimensions instead of one.
Here follows the declaration of the StatMatrix integer static bidimensional array, having 3 rows (3 elements for the 1st dimension), and 4 columns (4 elements for the 2nd dimension):
DIM StatMatrix(2, 3) AS Integer
In the above matrix, the first element of each dimension has index 0. If we want to declare the same array with the first dimension starting at index 1, the declaration is done in this way:
DIM StatMatrix(1 TO 3, 3) AS Integer
The first element of dimension 1 has index 1, and the first element of dimension 2 has index 0
(the rule is: when a dimension size is declared without the
<minimum index value> TO <maximum index value> syntax,
using only the
<maximum index value>,
then the <minimum index value> is considered to be 0).
In the same way, let's declare a dynamic 4x2 string matrix named DynMatrix, with the 1st element of each dimension at index 1:
DIM DynMatrix() AS String
REDIM DynMatrix(1 TO 4, 1 TO 2) AS String
With the PRESERVE keyword (used to keep array values while redimensioning),
it is only possible to change the <maximum index value> of the
last dimension. If the matrix values need to be preserved, the first dimension
cannot be redimensioned.
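The restriction on PRESERVE can be illustrated with a short sketch. The matrix name Grid and the stored value are hypothetical, and UBound with a second argument (the dimension number) is assumed to behave as in other VBA-compatible dialects.

```vb
' Sketch only: Grid and its contents are hypothetical.
Sub Main
    Dim Grid() As Integer

    ' 4 rows by 2 columns, both dimensions starting at index 1
    ReDim Grid(1 To 4, 1 To 2) As Integer
    Grid(4, 2) = 7

    ' With PRESERVE only the upper bound of the last dimension may change:
    ' here a third column is added and the existing values are kept
    ReDim Preserve Grid(1 To 4, 1 To 3) As Integer

    MsgBox "Grid(4, 2) = " & Grid(4, 2) & ", last column index: " & UBound(Grid, 2)
End Sub
```

Trying to change the first dimension in the REDIM PRESERVE line (for example 1 TO 5 rows) is the case the text above rules out.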
Sax Basic is also capable of handling arrays having more than two dimensions
(multidimensional arrays).
As in a bidimensional array, a multidimensional array element is addressed by using as
many indexes as the array has dimensions.
Therefore, if we have a tridimensional array used to identify a point in space, we will use a three-coordinate system (x,y,z) to address the point.
The element at coordinate x=2, y=10, z=1 (counting coordinates from 1) of the array my3D is addressed with the following expression:
my3D(1, 9, 0)
Multidimensional arrays can take up a lot of memory: the total number of elements is the product of the number of elements of every dimension.
Multidimensional arrays are declared in the same way as bidimensional arrays;
the only difference is that it's necessary to define three or more dimensions instead of only
two.
Let's define the my3D integer
dynamic array of 5x10x3 elements:
DIM my3D() AS Integer
REDIM my3D(4, 9, 2) AS Integer
With the previous declaration, each dimension of the my3D array is 0-based (the first
element of each dimension has index 0).
Again, the PRESERVE keyword allows values to be kept when redimensioning dynamic
arrays.
A problem's data could be too complex to be described by simple variables
or arrays.
For example, data about an individual are usually composed of a number of attributes
which together identify the person, i.e. the first name, the last name, the age, sex, etc.
Therefore, there are cases when we need more than a simple description of a series of data: we need to find the most suitable way to describe classes of data with characteristics in common, called attributes.
This is the case when we need to create our own data types, each one identifying a class with its own attributes. Every attribute has a type and a name (like a variable), and a class is the container of these attributes.
Program data can be composed of several classes of data. A class is the definition of a complex data structure: it is composed of a finite number of logically connected elements of any type; each element (or attribute) contains a value.
A user-defined type represents a class with a name; defining it
creates a new type identified by that name.
A user-defined type variable is the stored representation of a class.
A user-defined type is declared with the TYPE....END TYPE
structure, containing a list of one or more elements:
TYPE <user-defined type name>
<first element name> AS <type>
<second element name> AS <type>
<third element name> AS <type>
...
<Nth element name> AS <type>
END TYPE
Like enumerated types, user-defined types can only be globally declared, and not within a processing block. Global, local data and processing blocks are explained in chapter 6 "Processing blocks".
If our data need a class containing an individual's identification information, like First Name, Last Name, Age, Sex, we could define a user-defined type named TIdentification, which contains the data identifying a person, with the following elements:
| Element name | Element type | Element description |
|---|---|---|
| FirstName | String | First name |
| LastName | String | Last name |
| Age | Integer | Age |
| Sex | Integer | Sex: 1=Male; 2=Female |
The above class is defined with the following syntax:
TYPE TIdentification
FirstName AS String
LastName AS String
Age AS Integer
Sex AS Integer
END TYPE
TIdentification is the user-defined type. A user-defined type variable is declared like a normal variable but with a user-defined type:
DIM Identification AS TIdentification
Class elements are addressed with the user-defined type variable name, followed by a dot (.)
and the element's name.
The following expression addresses the LastName element of the Identification user-defined
type variable:
Identification.LastName
We declared the Sex element as an integer, but notice that it can only assume two definite values: Male and Female. Therefore we could use an enumerated type instead:
' enumerated-type declaration:
ENUM ESex
Male=1
Female=2
END ENUM
' user-defined type declaration:
TYPE TIdentification
FirstName AS String
LastName AS String
Age AS Integer
Sex AS ESex
END TYPE
' user-defined type variable declaration:
DIM Identification AS TIdentification
Sax Basic allows a class to contain elements representing other classes. More precisely,
it is possible to define an element whose type is a user-defined type.
This is necessary only when the problem requires very complex data.
Extending the example above, the TIdentification class only defines the basic identification of
an individual, but we may want to store other information on that individual,
for example their address.
To describe this case we should identify 3 classes: the TIdentification class,
the TAddress class (whose elements may be the street and city name), and the TIndividual class,
whose elements are the Identification and Address user-defined type variables.
Here a class element is itself of another class.
Here follows the full declaration:
' enumerated-type declaration:
ENUM ESex
Male=1
Female=2
END ENUM
' user-defined type declaration:
TYPE TIdentification
FirstName AS String
LastName AS String
Age AS Integer
Sex AS ESex
END TYPE
TYPE TAddress
Street AS String
CityName AS String
END TYPE
TYPE TIndividual
Identification AS TIdentification
Address AS TAddress
END TYPE
' user-defined type variable declaration:
DIM Individual AS TIndividual
In order to address a subclass element, we first identify the main class,
then the subclass and finally the subclass element.
For example, the Age element of the Identification subclass stored in the Individual class is addressed with the following
expression:
Individual.Identification.Age
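As a sketch of how the nested elements might be filled in and read back, the script below repeats the declarations above and adds a Sub Main block. The sample values are invented for illustration.

```vb
' Sketch only: the declarations repeat the ones above; the values are made up.
Enum ESex
    Male = 1
    Female = 2
End Enum

Type TIdentification
    FirstName As String
    LastName As String
    Age As Integer
    Sex As ESex
End Type

Type TAddress
    Street As String
    CityName As String
End Type

Type TIndividual
    Identification As TIdentification
    Address As TAddress
End Type

Sub Main
    Dim Individual As TIndividual

    ' Main class, then subclass, then subclass element
    Individual.Identification.FirstName = "Ann"
    Individual.Identification.LastName = "Smith"
    Individual.Identification.Age = 34
    Individual.Identification.Sex = Female
    Individual.Address.CityName = "Springfield"

    MsgBox Individual.Identification.FirstName & " " & _
           Individual.Identification.LastName & ", age " & _
           Individual.Identification.Age & ", " & Individual.Address.CityName
End Sub
```

The types are declared outside Sub Main because, as noted earlier, enumerated and user-defined types can only be declared globally.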
A user-defined type element may be an array, but only static arrays are allowed: a dynamic array cannot be a class element.
For example, for an individual we may also need to store a list of phone numbers:
this information can be stored in an array, which could be another element of
the TIndividual user-defined type.
The individual's 2nd phone number is accessed with the expression:
Individual.PhoneNo(1)
(remember that by default, the first array element is addressed as 0).
A Sax Basic limitation doesn't allow the definition of an array (static or dynamic) of user-defined types. This means that we can't define an array of individuals, but only a number of separate variables of TIndividual type.
User-defined types can only be globally declared, and not within a processing block. Global, local data and processing blocks are explained in chapter 6 "Processing blocks".
Objects are particular types. An object is a representation of a certain entity.
This representation contains information about the entity (its internal status) and the knowledge
necessary to act on this internal status. This knowledge is exposed to the program in the form of
a set of actions the object can perform on its status.
A program sees an object as a collection of properties and methods.
Properties are the data the object decides to expose, while methods are the exposed actions.
An object can contain hidden data or actions: it is the object itself which decides which
data or actions to expose to the program. There can be read-only or write-only
properties, in order to protect the object's internal status.
Basically, an object can be seen as a container where both data and code are put together. A class is an object definition: it specifies the object data, properties and methods. A class definition also includes the code which implements the actions.
An object must be created before it is used, so that it can
initialize itself.
In the case of SPSS OLE objects, the
ISpssApp object contains all SPSS OLE objects
(any SPSS OLE object can be accessed by starting from this object's methods/properties),
and it has already been created by SPSS and is referenced by the objSpssApp global
variable.
Therefore SPSS OLE objects can be used within an SPSS script
starting from the objSpssApp variable, without having to create the object.
An object variable is a reference to the object itself, so there could be two or more object variables representing the same object.
To make this clearer, suppose we have an object called ObjectA. We define three variables, objA_var1, objA_var2, objA_var3, with the following code:
DIM objA_var1 AS ObjectA
DIM objA_var2 AS ObjectA
DIM objA_var3 AS ObjectA
What is stored in memory appears in the following figure:

Notice that if we change prop1 of ObjectA by using variable objA_var1, and then use objA_var2 or objA_var3 to read the same property, we will read the changed value, even though it was modified through objA_var1!
This is because objA_var1, objA_var2 and objA_var3 are all references to the same object, so all of them access the same ObjectA properties/methods.
The difference is clearest if we compare the declaration of object variables with the declaration of non-object variables (like integer variables):
DIM int_var1 AS Integer
DIM int_var2 AS Integer
DIM int_var3 AS Integer
This is what appears in memory:

In this case, int_var1, int_var2 and int_var3 take up different memory locations: they are independent of each other and don't share a memory location as object variables do.
A Sax Basic script is a structured language. This means that the
language is made of elements, possibly grouped in blocks.
A script is mainly composed of two parts: a declaration and an
execution part.
In order to make the top-down approach
easier to implement, a script can be subdivided into several processing blocks,
identified by name, each one representing a fragment of the problem to solve.
These processing blocks are composed of both the declaration and the
execution part.
In a block's execution part there can be references to other blocks, so that
a block can execute other blocks.
A script can only work if it contains a block representing the
execution starting point. This processing block contains references (calls)
to other blocks, and in this way it defines the script execution flow.
The name of this starter-block must be MAIN, and no other block can have
this name. If a block isn't called by this main block, either directly or
indirectly (that is, called by a block referenced in the MAIN block), it will
never be executed.
Since a processing block contains both the declaration and execution part, and
there can be several blocks, a script has both global and local
data.
Global data are visible to any processing block; local
data, on the other hand, are visible only within their block and cannot be seen from other blocks.
A datum is visible when it can be read and/or modified (a constant can't be modified, anyway).
Global data lifetime lasts the entire script execution.
Local data lifetime is limited to the execution of the corresponding block.
After the block has terminated its execution, its local data are lost.
Global data are declared outside execution blocks;
local data, on the other hand, are declared within execution blocks.
Enumerated types and
User-defined types can only be declared globally.
To sum up, any processing block can read (and modify, in case of variables, not constants)
global data in addition to its local data.
Local data are private to the block, so two different processing blocks can
have in their declaration part variables defined with the same name.
In the particular situation of a block having a local datum with the same name as a
global datum, the local one takes precedence and makes the global
datum inaccessible to the block. But, as good programming practice, it's
preferable not to use this "feature".
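The name clash described above can be seen in a minimal sketch. It assumes that processing blocks are written with the Sub ... End Sub form (covered in the Subprograms chapter); the names Counter and ChangeLocalCounter are hypothetical.

```vb
' Sketch only: Counter and ChangeLocalCounter are hypothetical names.
Dim Counter As Integer          ' global: declared outside any block

Sub ChangeLocalCounter
    Dim Counter As Integer      ' local: hides the global Counter in this block
    Counter = 99                ' changes only the local datum
End Sub

Sub Main
    Counter = 1                 ' no local Counter here, so this is the global one
    ChangeLocalCounter
    MsgBox "Global Counter is still " & Counter
End Sub
```

Because the block works on its own local Counter, the global one keeps the value 1 assigned in the MAIN block, which is exactly why relying on such name clashes makes a script harder to follow.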
Usually, data are defined globally when they are used in many processing blocks, and
they can be considered like the main script data.
Local data, on the other hand, are used only within a block,
to allow the processing block's execution part to accomplish its task.
Besides global and local data, a block can access parameter data. These are data passed to the block when it's called from another block. They can determine how the block must behave, or they can be the data the block must work with and then return to the caller.
For example, if a processing block's task is to sort data, a parameter could tell
the block whether to sort in ascending or descending order. Another parameter could
pass the data to be sorted.
So in this case, with only one sorting block, it is possible to sort different data,
passed as parameters to the sorting block.
Depending on their meanings, parameter data can be passed in different ways, but this is explained in the Subprograms chapter.
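To give an idea of what such a sorting block could look like, here is a sketch under several assumptions: the block name SortValues, its parameter names and the simple bubble sort it uses are all illustrative choices, and the parameter-passing syntax is the VBA-compatible one (its details belong to the Subprograms chapter).

```vb
' Sketch only: SortValues, Numbers and Ascending are hypothetical names.
' Numbers is sorted in place; Ascending selects the sort direction.
Sub SortValues(Numbers() As Integer, ByVal Ascending As Boolean)
    Dim I As Integer, J As Integer, Tmp As Integer
    Dim NeedSwap As Boolean

    ' Plain bubble sort: each pass moves one more element to its final place
    For I = LBound(Numbers) To UBound(Numbers) - 1
        For J = LBound(Numbers) To UBound(Numbers) - 1 - (I - LBound(Numbers))
            If Ascending Then
                NeedSwap = Numbers(J) > Numbers(J + 1)
            Else
                NeedSwap = Numbers(J) < Numbers(J + 1)
            End If
            If NeedSwap Then
                Tmp = Numbers(J)
                Numbers(J) = Numbers(J + 1)
                Numbers(J + 1) = Tmp
            End If
        Next J
    Next I
End Sub

Sub Main
    Dim Scores(1 To 4) As Integer
    Scores(1) = 3
    Scores(2) = 1
    Scores(3) = 4
    Scores(4) = 2

    SortValues Scores, True     ' same block, ascending order requested
    MsgBox "Smallest: " & Scores(1) & ", largest: " & Scores(4)
End Sub
```

The same block could be called again with False as the second parameter, or with a different array, without writing a second sorting routine.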
A Sax Basic script is a structured language. This means that the
language is made of elements, possibly grouped in blocks.
Within each element you always find one or more of the following main
logic components
This logic component allows the assignment of an expression result to
a variable.
An expression is a concatenation of operands and operators which returns a result.
This result is then stored in the variable.
In Sax Script an assignment is defined in the following way:
[LET] <variable-name> = <expression>
The keyword LET can be omitted: that is why it is enclosed
in square brackets here.
Square brackets must not be included in the code: they are used in
a syntax definition to indicate that the keyword LET
is optional, and they are not part of the syntax itself.
Angle brackets indicate that the word(s) they enclose are only an
indication of what should be inserted. In the above case, in
your code you need to replace <variable-name> with a valid
variable name (e.g.: Mean), and <expression> with a valid
expression (e.g.: Sum / N).
So, by following the above syntax definition, a valid code could be:
LET Mean = Sum / N
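Placed in a complete (if tiny) script, the assignment could look like the sketch below. The values 42 and 4 are made up for the example, and the Dim declarations are my own addition.

```vb
Sub Main
    Dim Sum As Double, N As Double, Mean As Double

    Sum = 42            ' assignment with LET omitted
    Let N = 4           ' assignment with LET written out: same effect
    Mean = Sum / N      ' the expression Sum / N is evaluated, then stored in Mean

    ' Mean now holds 10.5
    MsgBox "Mean = " & Mean
End Sub
```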
A variable used in our code is different from a variable used in SPSS syntax. You can think of a "Sax Basic variable" as a box containing only one value. A variable used in SPSS syntax, on the other hand, is a "representation" of one column of the active data file.
Each variable contains a value of a predefined type. This means that once you have assigned, for example, a number to a variable, that variable will from then on only be able to contain numbers. This allows you to call it a Numeric type variable or simply a numeric variable.
Sax Script has several variable types. A special variable type is the Object type. For object variables, the assignment syntax is different:
SET <object-variable-name> = <object-expression>
In this case the SET keyword cannot be omitted.
An object variable can contain a reference to any SPSS
object, which allows a script to automate SPSS.
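As a minimal sketch of an object assignment, the code below stores a reference to the designated output document, using the objSPSSApp object and the GetDesignatedOutputDoc, Items and Count members that appear in the examples later in this section. Declaring the variable with the generic Object type is my own assumption, and the code is meant to run inside the SPSS script window only.

```vb
Sub Main
    Dim OutputDoc As Object

    ' SET is mandatory: OutputDoc receives a reference to an object, not a plain value.
    Set OutputDoc = objSPSSApp.GetDesignatedOutputDoc

    MsgBox "Items in the designated output document: " & OutputDoc.Items.Count
End Sub
```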
In order to solve a problem, sometimes it is necessary to decide what action
should be performed.
This decision depends on whether a certain condition occurs or doesn't
occur. In Sax Script this situation can be described with two
structures: IF...THEN...ELSE and SELECT CASE.
A decision can be described with the IF structure in the following way:
IF <condition> THEN
    <sequence 1>
[ELSE
    <sequence 2>]
END IF
It means:
- if <condition> is TRUE, then execute the sequence of actions specified by <sequence 1>;
- if <condition> is FALSE, then execute the sequence of actions specified by <sequence 2>.
The logical <condition> is an expression returning the boolean
result TRUE or FALSE.
The ELSE construct is optional: it is used only when it is also necessary to
define an action when the logical <condition> is FALSE.
Suppose you want to select a pivot table's item if the table's title is
equal to CROSSTAB.
You can define the operation in this way:
IF table.TitleText = "CROSSTAB" THEN
    tableItem.Selected = TRUE
END IF
The above example doesn't use the ELSE construct.
table is an object variable referring to the pivot table object;
TitleText is a property of the pivot table object,
containing the table's title (notice the dot "." which separates
the object variable from its property).
An object property can be considered as a datum,
encapsulated in the object, having its own type. The TitleText
property contains a String type value (therefore it's a string
property).
tableItem is an object variable referring to the object of an SPSS
Output Viewer item; Selected is a property of the item object,
specifying whether the item is selected (TRUE) or
not (FALSE). This is a boolean property.
Now, imagine you need to extend the previous
code so that it deselects the table's item if the
pivot table's title IS NOT equal to CROSSTAB.
In other words, if the table's title is equal to CROSSTAB you want to select
the item, otherwise you want to deselect it.
You can define the entire operation by writing this code:
IF table.TitleText = "CROSSTAB" THEN
    tableItem.Selected = TRUE
ELSE
    tableItem.Selected = FALSE
END IF
Like any other construct, the IF...ELSE...END IF construct can be
nested one inside another.
The above examples suppose that you already have identified a
pivot table item. But in the SPSS Output Viewer there could be different
items besides pivot table items. Therefore you need to discriminate pivot table
items from other items. This can be done by using an IF structure again:
IF item.SPSSType = SPSSPivot THEN
    SET table = item.GetTableOLEObject
    IF table.TitleText = "CROSSTAB" THEN
        item.Selected = TRUE
    ELSE
        item.Selected = FALSE
    END IF
END IF
Now item is a general item object: its SPSSType
property specifies the type of the output item;
SPSSPivot identifies a pivot table item. When SPSSType is equal
to SPSSPivot, it means that the general item is a pivot table item,
containing a pivot table object.
The GetTableOLEObject method returns a reference
to the item's not-activated pivot table object (a table is activated when it has been double-clicked).
Notice the use of the SET keyword to assign the pivot table's object to
the table object variable.
Look at the indentation of the second IF nested inside
the first IF: writing scripts this way makes the code
easier to understand.
Now, in our code we want to add the capability of selecting all text items. Here follows an implementation of this added functionality (SPSSText identifies a text item):
IF item.SPSSType = SPSSPivot THEN
    SET table = item.GetTableOLEObject
    IF table.TitleText = "CROSSTAB" THEN
        item.Selected = TRUE
    ELSE
        item.Selected = FALSE
    END IF
ELSE
    IF item.SPSSType = SPSSText THEN
        item.Selected = TRUE
    END IF
END IF
And how can we write the code if we also want to select all chart items? Here is the answer:
IF item.SPSSType = SPSSPivot THEN
    SET table = item.GetTableOLEObject
    IF table.TitleText = "CROSSTAB" THEN
        item.Selected = TRUE
    ELSE
        item.Selected = FALSE
    END IF
ELSE
    IF item.SPSSType = SPSSText THEN
        item.Selected = TRUE
    ELSE
        IF item.SPSSType = SPSSChart THEN
            item.Selected = TRUE
        END IF
    END IF
END IF
Does it seem to you that the code is becoming a little confusing?
Don't worry. Fortunately the IF syntax has another
structure that can help coding situations like this:
IF <condition> THEN
    <sequence 1>
[ELSEIF <condition> THEN
    <sequence 2>]...
[ELSE
    <else-sequence>]
END IF
It means:
- each <condition> (the IF one first, then every ELSEIF one) is checked in turn;
- the first TRUE logical <condition> causes its corresponding <sequence> to be executed;
- if all the <conditions> are FALSE, then the ELSE's statements (here specified by <else-sequence>) are executed.
By using this structure, we can write our code in a clearer way:
IF item.SPSSType = SPSSPivot THEN
    SET table = item.GetTableOLEObject
    IF table.TitleText = "CROSSTAB" THEN
        item.Selected = TRUE
    ELSE
        item.Selected = FALSE
    END IF
ELSEIF item.SPSSType = SPSSText THEN
    item.Selected = TRUE
ELSEIF item.SPSSType = SPSSChart THEN
    item.Selected = TRUE
END IF
When you need to perform a task depending on the result of a certain
expression, Sax basic gives another facility besides the
IF...ELSEIF...ELSE...END IF structure.
This situation can also be described by using the SELECT CASE...END SELECT
statement:
SELECT CASE <expression>
[CASE <case-expression>[, ...]
    <sequence>]...
[CASE ELSE
    <else-sequence>]
END SELECT
It means:
- find the first <case-expression> which matches <expression>;
- execute the <sequence> corresponding to the matching <case-expression>;
- if no <case-expression> matches, then execute the CASE ELSE's statements.
In the <case-expression> it is possible to use the keyword
IS to make comparisons:
| Expression | Description |
|---|---|
| Is < expr | Execute if less than |
| Is <= expr | Execute if less than or equal to |
| Is > expr | Execute if greater than |
| Is >= expr | Execute if greater than or equal to |
| Is <> expr | Execute if not equal to |
| expr1 To expr2 | Execute if greater than or equal to expr1 and less than or equal to expr2 |
| expr1, expr2, ... , exprN | Execute if equal to any of expr1, expr2, ... , exprN |
Now, let's use the SELECT CASE instruction to write our code in
another way:
SELECT CASE item.SPSSType
    CASE SPSSPivot
        SET table = item.GetTableOLEObject
        IF table.TitleText = "CROSSTAB" THEN
            item.Selected = TRUE
        ELSE
            item.Selected = FALSE
        END IF
    CASE SPSSText, SPSSChart
        item.Selected = TRUE
END SELECT
Where:
- item.SPSSType is the <expression> we want to test.
- SPSSPivot, SPSSText, SPSSChart are the values of the <expression> for which we want to define different actions.
It is interesting to notice that in the last CASE we used both
SPSSText and SPSSChart values (separated by a
comma): in fact, whether item.SPSSType is equal to
SPSSText or SPSSChart, the action to perform
(item.Selected = TRUE) is always the same.
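The IS and TO forms listed in the table are not used by the SPSS example, so here is a small sketch that does use them. It has nothing to do with SPSS objects: the variables Age and AgeGroup, the value 37 and the age limits are all hypothetical and serve only to show how a case is matched.

```vb
Sub Main
    Dim Age As Integer
    Dim AgeGroup As String

    Age = 37   ' made-up value

    Select Case Age
        Case Is < 0              ' comparison with the IS keyword
            AgeGroup = "invalid"
        Case 0 To 17             ' range of values
            AgeGroup = "minor"
        Case 18, 19, 20          ' list of values
            AgeGroup = "young adult"
        Case 21 To 64
            AgeGroup = "adult"
        Case Else                ' none of the above matched
            AgeGroup = "senior"
    End Select

    ' With Age = 37 the matching case is 21 To 64, so AgeGroup is "adult".
    MsgBox "Age group: " & AgeGroup
End Sub
```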
Sometimes, it is necessary to execute a sequence of actions many times,
until a certain condition occurs.
This is what is called a cycle.
Sax basic has three ways of implementing a cycle: DO...LOOP <condition>, DO <condition>...LOOP and FOR...NEXT.
The DO...LOOP <condition> syntax structure is:
DO
    <sequence>
LOOP [UNTIL|WHILE] <condition>
The pipe (|) character means that you can use only one of the neighboring
keywords.
In the above example, [UNTIL|WHILE] can be translated to:
LOOP UNTIL <condition>
OR
LOOP WHILE <condition>
Therefore you can encounter the following situations:
| Code | Description |
|---|---|
| DO ... LOOP UNTIL <condition> | The <sequence> code is repeatedly executed until the <condition> becomes TRUE. In other words, stop the <sequence> code execution when the <condition> becomes TRUE. |
| DO ... LOOP WHILE <condition> | Repeatedly execute the <sequence> code while the <condition> is TRUE. In other words, continue the <sequence> code execution if the <condition> is TRUE. |
With the DO...LOOP <condition> cycle, the <sequence> code
is executed at least one time, because the <condition>
is tested at the end of the loop.
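A tiny sketch can make the "at least one time" behaviour visible. The variable Counter and its starting value are hypothetical: the value is chosen so that the condition is already TRUE before the cycle starts.

```vb
Sub Main
    Dim Counter As Integer
    Counter = 10

    Do
        ' Executed once even though Counter >= 10 was already TRUE on entry.
        Counter = Counter + 1
    Loop Until Counter >= 10

    ' Counter is now 11: the sequence ran one time before the first test.
    MsgBox "Counter = " & Counter
End Sub
```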
The DO <condition>...LOOP syntax structure is:
DO [UNTIL|WHILE] <condition>
    <sequence>
LOOP
Therefore you can encounter the following situations:
| Code | Description |
|---|---|
| DO WHILE <condition> ... LOOP | If the <condition> is TRUE then continue the <sequence> code execution. The WHILE...WEND syntax structure is an alias of this structure. |
| DO UNTIL <condition> ... LOOP | If the <condition> is FALSE then continue the <sequence> code execution. |
With the DO <condition>...LOOP cycle, it is possible that
the <sequence> code is never executed, because the
<condition> is tested at the beginning of the loop.
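Here is the same hypothetical Counter example written with the condition at the beginning of the loop, as a sketch of the case in which the sequence is never executed.

```vb
Sub Main
    Dim Counter As Integer
    Counter = 10

    Do While Counter < 10
        ' Never reached: the condition is FALSE at the very first test.
        Counter = Counter + 1
    Loop

    ' Counter is still 10: the sequence was skipped entirely.
    MsgBox "Counter = " & Counter
End Sub
```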
In your scripts you will probably use only one or two of these cycling structures.
For example, I rarely use the DO <condition>...LOOP syntax structure. They all exist to give the
flexibility needed to make it easier to code the diverse situations
you can come across.
Now, with the Do...Loop construct, we know all the
Sax basic syntax constructs necessary to write the code which sends to the
printer all pivot tables having the CROSSTAB title.
We have already defined in the Approach chapter the steps the script
must follow:
We have already seen the following constants/properties/methods:
We still don't know some SPSS objects properties/methods to use in this case:
- GetItem(<ItemNo>): the item at the <ItemNo> index in the output document (0 is the first item)
- PrintRange(<Range>): if <Range> is 0, then all expanded items will be printed; if <Range> is 1, then only selected items will be printed
You may get confused by all this stuff about objects / properties / methods.
For now it isn't necessary to understand them clearly: you only need
to know what they do and not what they are. You should try to focus on the script syntax solely.
By looking at the SPSS objects we have, the list of steps we defined isn't ready for a script, and again we need to give more detail:
SET OutputDoc = objSPSSApp.GetDesignatedOutputDoc      ' step 1
ItemNo = 0                                             ' step 2
DO
    SET Item = OutputDoc.Items.GetItem(ItemNo)         ' step 3
    IF Item.SPSSType = SPSSPivot THEN                  ' steps 4, 5
        SET Table = Item.GetTableOLEObject             ' step 6
        IF Table.TitleText = "CROSSTAB" THEN           ' steps 7, 8
            Item.Selected = TRUE                       ' step 9
        END IF
    END IF
    ItemNo = ItemNo + 1                                ' step 10
LOOP UNTIL ItemNo = OutputDoc.Items.Count              ' step 11
OutputDoc.PrintRange(1)                                ' step 12
OutputDoc.PrintDoc                                     ' step 13
The FOR...NEXT cycling structure is used when the number of cycles to perform is known and doesn't depend on the code executed every cycle.
In a DO...LOOP structure, the number of cycles may not be known in advance:
it can be determined dynamically by the actions executed by the
code inside the cycle.
In a FOR...NEXT structure, on the other hand, the number of cycles is fixed when the cycle starts,
and is not meant to be modified by the code executed every cycle.
The FOR...NEXT cycling structure uses an index variable whose value is incremented
at each cycle. The code repeated every cycle can use this variable.
Although the index variable can be modified by the code inside the cycle, changing its value isn't
good practice: unless you are sure of what you're doing,
your code may work in a way you don't expect.
Looking at the Do...Loop example above, we know the
number of times the code inside the DO...LOOP is run: it
is equal to the number of items contained in the output document,
that is OutputDoc.Items.Count.
Therefore, the same example is suitable to be coded with a FOR...NEXT
cycle.
And now let's look at the FOR...NEXT structure syntax:
FOR <index-variable> = <start-value> To <last-value> [Step <increment-value>]
    <sequence>
NEXT [<index-variable>]
It means:
1. initialize the <index-variable> with <start-value>;
2. execute the <sequence>;
3. increment the <index-variable> by the <increment-value>. If <increment-value> is omitted, then increment the <index-variable> by 1;
4. if <increment-value> is greater than 0, then return to step 2 if the <index-variable> is less than or equal to the <last-value>;
5. if <increment-value> is less than 0, then return to step 2 if the <index-variable> is greater than or equal to the <last-value>.
<start-value>, <last-value> and <increment-value> can each be a constant, a literal value, a
variable or an expression.
Like any construct, a FOR...NEXT loop can be nested into any other construct.
In general, <start-value> is less than <last-value>. This
happens in case you want to increment the <index-variable>.
But if you need the cycle to decrement the <index-variable>,
<start-value> should be greater than <last-value>, and you
should specify <increment-value> as a negative number.
If you omit the <increment-value> while <start-value> is
greater than <last-value>, the <sequence> is never executed: the
<increment-value>, when missing, is considered as 1, and the comparison with
<last-value> is also made before the first cycle, so the cycle ends
immediately because the <index-variable> is already greater than <last-value>.
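As a sketch of a decrementing cycle, the code below counts down from 5 to 1 in steps of 2. The variables I and Trace and the chosen limits are hypothetical.

```vb
Sub Main
    Dim I As Integer
    Dim Trace As String

    ' Counting down: start-value greater than last-value, negative increment.
    For I = 5 To 1 Step -2
        Trace = Trace & I & " "   ' visits 5, then 3, then 1
    Next I

    MsgBox Trace
End Sub
```

Notice that the last value (1) is still processed: the cycle stops only when the index goes past it.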
So, the FOR...NEXT cycle can be used in 2 situations:
- to increment the <index-variable>, by using a <start-value> less than <last-value> and by specifying a positive <increment-value>;
- to decrement the <index-variable>, by using a <start-value> greater than <last-value> and by specifying a negative <increment-value>.
Now, we can rewrite the Do...Loop example with the following code, which
makes use of the FOR...NEXT code structure instead:
SET OutputDoc = objSPSSApp.GetDesignatedOutputDoc
FOR ItemNo = 0 TO OutputDoc.Items.Count-1
    SET Item = OutputDoc.Items.GetItem(ItemNo)
    IF Item.SPSSType = SPSSPivot THEN
        SET Table = Item.GetTableOLEObject
        IF Table.TitleText = "CROSSTAB" THEN
            Item.Selected = TRUE
        END IF
    END IF
NEXT ItemNo
Notice that for the <last-value> we have used OutputDoc.Items.Count-1 instead of
OutputDoc.Items.Count because the index of the OutputDoc.Items.GetItem method is zero-based:
the first item is indexed as 0, and the last item is indexed by the Items.Count property value
minus 1.
As explained in the Processing blocks chapter, in order to make the top-down approach implementation easier, a script can be subdivided into several processing blocks, identified by name, each one representing a fragment of the problem to solve.
During a problem analysis and in the following program writing, there are some situations that could come up:
In every one of these cases, subprograms are used. A subprogram is a
part of a program which solves a subproblem.
A subprogram represents a processing block declaration.
A subprogram is identified by a name, which usually represents the meaning of the
action the subprogram performs.
Obviously, in a program it isn't possible to have two subprograms
with the same name.
Every program must always contain a subprogram named MAIN: this
is the subprogram that calls other subprograms, and defines the program starting point.
To avoid rewriting the same algorithm, not only when the same code must be executed again but also when it must be applied to different variables, it is possible to pass parameters to a subprogram.
Sax basic distinguishes two types of subprograms: subroutines and functions. A description of the two follows.
A subroutine is a group of instructions whose purpose is to solve a problem.
It is declared in the following way:
SUB <subroutine-name> [(<parameter-list>)]
<instruction-sequence>
END SUB
where:
<subroutine-name> is the name given to the subroutine
[(<parameter-list>)] is the optional list of parameters passed
to the subroutine
<instruction-sequence> is the sequence of variable declarations and
processing instructions that make up the subroutine
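To make the declaration concrete, here is a minimal sketch of a subroutine with a parameter list, called twice from MAIN with different data. The names ShowMean, Sum, N, Total and Count, as well as the values used, are hypothetical.

```vb
Sub Main
    Dim Total As Double, Count As Double

    ' Main is the starting point: it calls the subroutine with different data.
    Total = 10
    Count = 4
    ShowMean Total, Count

    Total = 9
    Count = 3
    ShowMean Total, Count
End Sub

' Sum and N are the parameters; the same instructions serve both calls.
Sub ShowMean(Sum As Double, N As Double)
    Dim Mean As Double   ' local datum, used only by this subroutine
    Mean = Sum / N
    MsgBox "Mean = " & Mean
End Sub
```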