IBM InfoSphere Streams Version 4.1.0

Side-effects

Code like y = (x = 5) + x--; is hard to read and brittle because it has multiple side-effects in a single statement, so its meaning depends on the evaluation order inside the statement. The situation is even worse if the same expression calls multiple functions with side-effects. For example, the meaning of y = foo(x, 5) + bar(x); depends not only on evaluation order but also on the definitions of foo and bar: x might be a list, and foo and bar might be push and pop. Statements with multiple side-effects are hard for humans to understand and hard for compilers to optimize. In the absence of side-effects, compilers often optimize by reordering or even parallelizing independent code and by eliminating redundant code. Fortunately, even in imperative languages like C and Java™, expression-internal side-effects are uncommon, and referential transparency is common. Unfortunately, without language support, this property is hard for a compiler to establish. SPL is designed to make side-effects more explicit and to encourage a coding style where side-effects are less common.
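The following C++ sketch shows the push/pop hazard concretely. C++ is used here, rather than SPL, because it leaves the order of evaluating the two operands of + unspecified. The sketch assumes, as suggested above, that x is a list and that foo and bar behave like push and pop. The helper names push, pop, unsequencedSum, and sequencedSum are hypothetical. The single-statement form can legitimately produce either of two values. Putting each side-effect in its own statement fixes the order.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for foo: appends e to v and returns the new size.
int push(std::vector<int>& v, int e) {
    v.push_back(e);
    return static_cast<int>(v.size());
}

// Hypothetical stand-in for bar: removes and returns the last element of v.
int pop(std::vector<int>& v) {
    int top = v.back();
    v.pop_back();
    return top;
}

// Both side-effects in one expression: C++ may call either function first.
// push-then-pop yields 2 + 5 == 7; pop-then-push yields 1 + 1 == 2.
int unsequencedSum() {
    std::vector<int> x{1};
    return push(x, 5) + pop(x);
}

// One side-effect per statement: the order is fixed, so the result is always 7.
int sequencedSum() {
    std::vector<int> x{1};
    int a = push(x, 5);  // x == {1, 5}, a == 2
    int b = pop(x);      // x == {1},    b == 5
    return a + b;
}
```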

Several SPL features and design decisions curb side-effects:

Mutable composite data is never aliased. Since SPL has no pointer type (see topic Composite types), and since assignments make a deep copy even for composite types (see topic Value semantics), there is no aliasing inside composite data. That way, a side-effect on one composite variable cannot silently corrupt another composite variable.
Variables are immutable by default (see topic Statements). C++ and Java allow you to declare variables immutable explicitly with const or final. Most variables are never modified after initialization and could be declared that way, but programmers typically do not make that explicit. SPL inverts the default, making mutable an explicit modifier. Variables without that modifier are deeply immutable. That way, side-effect freedom is more common and easier to establish for humans and compilers alike.
Collections in for-loops are immutable (see topic Statements). While a for-loop iterates over a collection, that collection becomes immutable. That prevents common mistakes where the loop body has an unintended side-effect on the loop control.
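C++ standard containers also copy by value on assignment, so they can illustrate the first point. In this sketch, Matrix, CopyResult, and copyThenMutate are illustrative names, not SPL APIs. Copying a nested std::vector duplicates the inner lists as well, so a side-effect on the copy leaves the original intact. The text above states that SPL applies this deep-copy rule to all composite types and has no pointer type that could reintroduce aliasing.

```cpp
#include <cassert>
#include <vector>

// Hypothetical composite type: a list of lists.
using Matrix = std::vector<std::vector<int>>;

struct CopyResult {
    Matrix original;
    Matrix copy;
};

CopyResult copyThenMutate() {
    Matrix original{{1, 2}, {3}};
    Matrix copy = original;  // deep copy: each inner vector is duplicated, not shared
    copy[0][0] = 99;         // modifies only copy's first inner vector
    copy.push_back({7});     // modifies only copy's outer vector
    return {original, copy};
}
```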

In addition, SPL has the following rules to curb side-effects:

Function parameters are immutable by default. In practice, functions that mutate their parameters are infrequent. They are mostly used to make a small modification to a large data structure. In SPL, mutable parameters must be explicitly annotated with the mutable modifier, and all other parameters are deep-immutable. Thanks to this information, the compiler can produce helpful errors and even perform optimizations. For example:
```
void test(float64 x, list<float64> z) {
  for (float64 y in z) {
    print(x);
    print((x * 100.0) / y);
  }
}
```

Since function print does not modify x, a compiler can hoist the loop-invariant expression x * 100.0 out of the loop:

```
void test(float64 x, list<float64> z) {
  float64 loopInvariantTmp = x * 100.0;
  for (float64 y in z) {
    print(x);
    print(loopInvariantTmp / y);
  }
}
```

Besides enabling optimizations, making function parameters immutable by default also makes code easier to read and maintain.

Mutable function parameters are never aliased. The aliasing prevention described so far has one potential loophole: the same data can be passed to multiple function parameters. Consider, for example, a function copy(count, srcList, srcIdx, mutable dstList, dstIdx) that copies count elements of srcList, starting at srcIdx, to dstList, starting at dstIdx. If the two lists are the same, the copy might overwrite some elements before it reads them. For example, a call like copy(length(x) - 1, x, 0, x, 1) would be brittle, because both srcList and dstList are aliased to x, and dstList is mutable. Therefore, SPL disallows any mutable parameter to be aliased with any other parameter in the same function call.
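The following C++ sketch shows why the aliased call is brittle. copyElements is a hypothetical implementation of the copy function described above. It copies front to back, one element at a time. When the source and destination lists are distinct, the elements shift as intended. When both refer to x, each write lands on an element that the next iteration reads, so the first element is smeared across the list instead of being shifted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical copy(count, srcList, srcIdx, mutable dstList, dstIdx):
// copies count elements front to back, one element at a time.
void copyElements(std::size_t count, const std::vector<int>& srcList, std::size_t srcIdx,
                  std::vector<int>& dstList, std::size_t dstIdx) {
    assert(srcIdx + count <= srcList.size() && dstIdx + count <= dstList.size());
    for (std::size_t k = 0; k < count; ++k) {
        // If srcList and dstList are the same vector, this write can clobber
        // an element that a later iteration still needs to read.
        dstList[dstIdx + k] = srcList[srcIdx + k];
    }
}
```

With distinct lists, copying {1, 2, 3, 4} into a zeroed list at offset 1 yields {0, 1, 2, 3}. With x passed as both lists, {1, 2, 3, 4} becomes {1, 1, 1, 1} rather than the intended {1, 1, 2, 3}. A memmove-style implementation could detect the overlap and copy back to front. SPL instead rules out the aliased call altogether.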

Functions are stateless by default. A stateful function is a function that is not referentially transparent or has side-effects. A function is not referentially transparent if it does not consistently yield the same result each time it is called with the same inputs. A function has side-effects if it modifies state observable outside the function. For the purposes of this definition, “state observable outside the function” includes global variables in native code and I/O to the console, files, the network, and so on, but excludes mutable parameters. Mutable parameters are handled separately because, as the loop-invariant code motion example shows, they enable separate optimization opportunities. print is stateful, but because its parameter is not mutable, the compiler knows that print(x) leaves x unchanged and can still hoist x * 100.0 out of the loop. Here is an example that illustrates how code that uses stateless functions is easier to understand and optimize:
```
int32 ackermann(int32 m, int32 n) { /* do something expensive */ return 0; }
int32 test(int32 m, int32 n) {
  int32 x = ackermann(m, n);
  int32 y = ackermann(m, n);
  return x + y;
}
```

If the ackermann function is stateless and has immutable parameters, then a compiler might eliminate one of the calls:

```
int32 ackermann(int32 m, int32 n) { /* do something expensive */ return 0; }
int32 test(int32 m, int32 n) {
  int32 x = ackermann(m, n);
  int32 y = x;
  return x + y;
}
```
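This rewrite is valid only because ackermann is stateless. The following C++ sketch uses hypothetical functions to make the contrast explicit. square is stateless, so reusing the first result gives the same answer as calling it twice. nextTicket updates a hidden counter, so two identical calls return different values. Replacing the second call with y = x would therefore change the program's behavior.

```cpp
#include <cassert>

// Hypothetical stateless function: the result depends only on the argument.
int square(int n) { return n * n; }

// Hypothetical stateful function: each call reads and updates hidden state,
// much like a global variable in native code.
int nextTicket() {
    static int counter = 0;
    return ++counter;
}

// Mirrors test(m, n): two identical calls whose results are added.
int sumOfTwoSquares(int n) { int x = square(n); int y = square(n); return x + y; }

// The rewrite with one call eliminated; valid because square is stateless.
int sumOfTwoSquaresOptimized(int n) { int x = square(n); int y = x; return x + y; }

// For a stateful function the two calls disagree, so the same rewrite is invalid.
bool ticketsDiffer() { int x = nextTicket(); int y = nextTicket(); return x != y; }
```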

To make statelessness easy to determine, all functions in SPL are stateless unless they are explicitly annotated with the stateful modifier.

Note: Functions that are stateless and have no mutable parameters are pure. Immutable parameters curb context-specific side-effects, whereas statelessness curbs context-independent side-effects.

When SPL was designed, outlawing stateful functions altogether was considered. However, some stateful functions are useful, for example print, or functions that interact with external resources such as databases. Furthermore, stateful functions can yield better performance through memoization. Therefore, it was decided to permit them in SPL, but the language design encourages writing mostly stateless functions.
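The following C++ sketch illustrates memoization. fibonacci and fibonacciComputations are hypothetical names. The function stores results in a cache that persists across calls, which makes it stateful in the sense used above, although callers always get the same result for the same input. With the cache, each distinct argument is computed only once, instead of the exponential number of recursive calls that plain recursion would make.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical counter: how many values fibonacci actually computes (cache misses).
int fibonacciComputations = 0;

// Memoized Fibonacci: the cache outlives each call, so the function is stateful.
std::uint64_t fibonacci(int n) {
    static std::unordered_map<int, std::uint64_t> cache;
    auto it = cache.find(n);
    if (it != cache.end()) {
        return it->second;  // cache hit: no recomputation
    }
    ++fibonacciComputations;
    std::uint64_t result = n < 2 ? static_cast<std::uint64_t>(n)
                                 : fibonacci(n - 1) + fibonacci(n - 2);
    cache[n] = result;
    return result;
}
```

A first call to fibonacci(50) computes each of the 51 values fibonacci(0) through fibonacci(50) exactly once. A repeated call is answered from the cache.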

State that is written by a statement must not be used elsewhere in the same statement. Consider again the examples from the beginning of this topic: this rule disallows code like y = (x = 5) + x--;, since x is written in one part of the statement and used in another. The rules related to functions also enable the SPL compiler to check this rule for statements that involve function calls. For example, y = foo(x, 5) + bar(x); is not allowed if either foo or bar has a mutable parameter. This restriction makes code more readable, prevents common programming mistakes, and might lead to more optimization opportunities.
State that is written in an SPL output clause must not be used elsewhere in the same output clause. In the following example, the values of a and b are undefined, because the evaluation order is not defined in C++:
```
stream<int32 a, int32 b> A = Beacon() {
     logic state : mutable int32 i = 0;
     param iterations : 10000;
     output A : a = i++, b = i++;
}
```
The undefined behavior might include unexpected output from the operator. This problem affects only SPL output clauses in which a value is written more than once, or is written in one part of the output clause and read in another.
To resolve the problem, rewrite the output clause to remove the undefined behavior. For the Beacon operator, either use the IterationCount() custom output function in the output clause, or replace the Beacon operator with a Custom operator that uses a logic onProcess clause.
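Both fixes can be sketched in C++. This is not SPL. Tuple, BeaconState, nextTuple, and tupleForIteration are hypothetical names, and the sketch assumes that a = i++, b = i++ was meant to give a the current counter value and b the next one. The first option keeps the mutable counter but updates it in separate statements, as a logic onProcess clause could. The second option removes the mutable state entirely and derives both attributes from an iteration number, in the spirit of IterationCount().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical output tuple with the two attributes from the example.
struct Tuple {
    std::int32_t a;
    std::int32_t b;
};

// Option 1: keep a mutable counter, but update it in separate statements.
struct BeaconState {
    std::int32_t i = 0;
};

Tuple nextTuple(BeaconState& state) {
    std::int32_t a = state.i++;  // this statement, including the increment, completes first
    std::int32_t b = state.i++;  // so b always sees the value after a's increment
    return {a, b};
}

// Option 2: no mutable state; both attributes derive from the iteration number.
Tuple tupleForIteration(std::uint64_t iteration) {
    return {static_cast<std::int32_t>(2 * iteration),
            static_cast<std::int32_t>(2 * iteration + 1)};
}
```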

Together, these rules mean that for most statements, the compiler is free to choose any evaluation order for the expressions inside the statement, and the user cannot observe the difference. The only exception is expressions that involve floating-point numbers, which the compiler must always evaluate left-to-right, because floating-point arithmetic is not associative: regrouping the operands can change the rounded result.
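The following minimal C++ check shows why the order matters. The operand values are arbitrary, and any terms of very different magnitude show the same effect. Evaluated left-to-right, 1e20 + -1e20 + 1.0 gives 1.0. Regrouping the same operands gives 0.0, because 1.0 is far below the precision of a double near 1e20 and is rounded away when it is added to -1e20.

```cpp
#include <cassert>

// Left-to-right: (a + b) + c. The large terms cancel first, so c survives.
double leftToRight(double a, double b, double c) { return (a + b) + c; }

// Regrouped: a + (b + c). c is lost when it is rounded into b.
double regrouped(double a, double b, double c) { return a + (b + c); }
```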