## What is a Variable?

When you do math or computer science one of the crucial concepts there is **variable**. These variables are so common and necessary that I wondered how to give a simple definition. Since I needed to write today some introduction into formal testing, I tried to get my thoughts clear on it.

Let’s start with a very basic observation.

A variable is a symbol.A very famous symbol for a variable is the letter

**x**. At least in math. In coding one might argue that

**x**is a very bad name for a variable since it does not say anything about its meaning. A better name might be

**numberOfBooks**or whatever. But a name is not enough.

A variable has a type.Sometimes the type is also called the

**sort**or the

**domain**of the variable. So what is a type? A type is a set. The elements of the set are called

**values**.

So how does a variable relate to its type? A programmer might answer that

*a variable always has some value from its type*. This might be somewhat true when executing programs, but it is not true in math or logic. Take this example:

```
x > y
```

When x and y are natural numbers (including 0), we can e.g. deduce here:

```
x > y ==> x is not 0
```

So we use variables here not as a symbol for a single value, but for a subset of values. And this subset is defined by a context.

A variable always exists in a context.

A context can be formalised by a logical formula. I use here first-order logic to do so. The context restricts the values the variable is allowed to take. An unrestricted variable **x** has the context:

```
x = x
```

If for instance **x** is natural number, the context **x = x** allows **x** to be any natural number. Together we have:

A variable represents a subset of its type, determined by its context.When having more than one variable, the variables together represent a subset of the cartesian product of their types (i.e., tuples).

Seeing it like this allows to deal with variables in a unified way, no matter where they show up. We have already seen an example from math:

```
Context "x > y" has the solution set {(x,y) | x > y}
```

Another one:

```
Context "forall x > 5 : x > y" has the solution set {(x, y) | y < 6}
```

Let’s finish with an example from a programming. Say we have this program:

```
1: int a, b, c
2: printf("Enter two numbers to add\n");
3: scanf("%d%d", &a, &b);
4: c = a + b;
5: printf("Sum of the numbers = %d\n", c);
```

We can analyze the program in two ways. Either *statically*, meaning we analyze it without running it. This is also called a *symbolic* analysis — probably because we deal here with variable symbols without knowing their single value they get when executing the program. When we execute the program we know the single value the variables have at each step in execution. This is also called a dynamic analysis. In both cases there is the special situation when a variable is defined without having assigned any value yet (see line 1). Compilers deal differently with this situation, some assign some arbitrary but fixed value out of the variable domain, or they assign a special value not being part of any domain. Such an element is usually called **nil** or **null**. It basically says that the variable has not been assigned any value of its domain. Let’s start with the symbolic analysis:

```
1: int a, b, c
Context a = nil & b = nil and c = nil has the solution set {(nil, nil, nil)}
2: printf("Enter two numbers to add\n");
3: scanf("%d%d", &a, &b);
Context a = a & b = b and c = nil has the solution set int x int x {nil}
4: c = a + b;
Context a = a & b = b and c = a + b has the solution set {(a, b, c) | c = a + b}
5: printf("Sum of the numbers = %d\n", c);
```

Since the user input in line 3 allows the user to enter any integer, we get an unrestricted value for **a** and **b**. This is different when we actually run the program. When running programs, variables always have exactly one value. That is why we can also call them a *point*. If we assume that the user enters the two numbers 4 and 6 in line 3, the situation looks like this:

```
1: int a, b, c
Context a = nil & b = nil and c = nil has the solution set {(nil, nil, nil)}
2: printf("Enter two numbers to add\n");
3: scanf("%d%d", &a, &b);
Context a = 4 & b = 6 and c = nil has the solution set {(4, 6, nil)}
4: c = a + b;
Context a = 4 & b = 6 and c = a + b has the solution set {(4, 6, 10)}
5: printf("Sum of the numbers = %d\n", c);
```

You see here also a bit why a symbolic analysis is more difficult (even NP-complete) since you need to deal with many possible execution paths, not just one (see e.g. here).