Presented is a review of functions and "functional thinking". The S and K combinators are introduced and used to construct the set of integers and arithmetic operations.
A function is an abstract "action" that receives an input and produces an output. Depending on one's path of education, the notion of "functions" tends to first arise in one of two disciplines: mathematics or computer programming.
In mathematics, one learns early about graphing equations, such as straight lines:
y = mx + b
or circles:
x^2 + y^2 = r^2
It's all candy and cupcakes until the teacher introduces the dreaded f(x) notation, where the notion of a "function" f replaces the y-variable in some equations, such as that of a line, but not all equations, such as that of a circle. In the notation f(x), the quantity x is input, also known as the argument. The output of the function is just f(x).
When it comes to sorting out just which mathematical equations correspond to functions, always stick to the definition: one output for one input. This means all non-vertical lines are functions:
if (m <> +/- infinity):
y = f(x) = mx + b
On the other hand, the equation of the circle can only be half-captured by one function. To see this, solve the equation of the circle for y, giving:
y = +/- sqrt(r^2 - x^2)
Note there are two "branches" to the y-equation, one positive and the other negative. This means that for a given x on the circle, there are two y-values, so y as written cannot be a function. To deal with this, the circle must be described by two functions, one for each side of the x-axis:
f(x) = +sqrt(r^2 - x^2)
g(x) = -sqrt(r^2 - x^2)
In contrast to a mathematical introduction, a student of computer programming tends to encounter "functions" as a means of code reuse. The student might be told "if you're typing the same code over and over, build a function instead so you only type the hard part once". For example, if it's required to capitalize the first letter of a string and then append an exclamation point to the end, one might come up with QB64 code along these lines:
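FUNCTION h$ (s AS STRING)
    ' one way to do it: uppercase the first letter, keep the rest, add the "!"
    h$ = UCASE$(LEFT$(s, 1)) + MID$(s, 2) + "!"
END FUNCTION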
Inside the body of the function, s is called a parameter. (Recall that outside the function, the value that s represents was an argument.) The body of the function uses QB64 language primitives (also functions) to perform operations on s (without editing it!), storing the result in the output string. In practice, one may use the above function to write h$("hello") to receive "Hello!" back.
Another structure for code reuse is the "subroutine", or "subprogram". Using a call statement, control is passed away from the main program, into the subprogram, and then back to the main program at the line under the call statement. The main program should call the subroutine more than once to truly claim "code reuse", however subroutines also appeal to code organization if this isn't the case.
The similarity between functions and subroutines stops here, however, or at least it should. It is tempting (but wrong) to believe, whether by one's own devices or through institutional misinformation, that functions should be used just like subroutines, especially when you want a value returned. On the contrary, functions should be regarded as additional "keywords" that the programmer adds to the language.
Going back to the mathematical view of a function, f(x) is considered a pure function, which is to (re-)state the fact that the same input always corresponds to the same output. It must not matter how long the calculation takes, what else is going on in the program, which phase the moon is in - nothing.
A pure function is logically equivalent to a lookup table. Such a lookup table would be infinitely tall if the inputs x are unconstrained, but nonetheless we can imagine a pure function as addresses (x-values) pointing to data (f(x)-values).
It would be hard to conceive of an impure function in a mathematical framework, but this is a no-brainer in computer programming. (Emphasis on "no-brainer". We will develop a bias against impure functions as we go on.) Starting off easy, the simplest example of an impure function is one that returns the time. In QB64 it might look something like this:
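FUNCTION t$
    ' returns the current system time, e.g. "13:45:59" - a new answer on every call
    t$ = TIME$
END FUNCTION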
As innocent as it looks, the function t$(), which simply returns the system time, is not a pure function. The input is defined as always empty (no problem yet), however the output is always changing. If this function were in a pipeline of otherwise pure functions, the entire pipeline loses purity. Impure functions can get a lot worse than this, too. Time to talk about "state".
The state of a program refers to everything the program has in memory. Every variable, function, array, stack, heap, and so on, is part of the state. Now we can make an important distinction: in computer programming, a pure function cannot change the program state. On the other hand, an impure function might change program state. Let's see a real impure function; in QB64 it might look something like this:
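DIM SHARED c
FUNCTION impure (x, y)
    c = x * y      ' side effect: the shared (global) variable c is overwritten
    impure = c     ' the return value here is a guess; the troublesome part is the line above
END FUNCTION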
In the above we have c defined as a global variable. Inside the function however, the line c = x * y assigns a new value to c. After the function closes, c remains changed. Any other part of the program depending on c needs to contend with c changing whenever impure(x,y) is called. The function impure(x,y) thus changes program state, which is to say the function has side effects, in technical jargon. While plenty of BASIC programmers use side effects as a feature, we pursue a discipline that classifies side effects as spaghetti code.
To convert the above to a pure function, simply remove any reference to c from the function body and replace the helper variable with something privately-scoped (the new function name below is arbitrary):
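FUNCTION tidy (x, y)
    DIM p          ' p is local to this function - nothing outside can see it change
    p = x * y
    tidy = p       ' same result as before, but no shared state is touched
END FUNCTION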
While it's usually true that no complete piece of software can be written using only pure functions, the burden is on the programmer to try. This is for reasons beyond keeping code tidy, avoiding a mess with global variables, or avoiding other pitfalls unique to BASIC-family languages. I urge this discipline because the mind takes the shape of its container, and high-minded concepts are easier to digest if you start leaning hard-right on functions.
Soon we'll step into the weeds by working with high-order functions. By definition, a high-order function is a function that (i) takes a function as input, (ii) returns a function as its output, or both. This seems innocent enough, perhaps mundane or odd, until you try to think of an example.
In ordinary algebra, it's awkward to think of passing a function to a function. For instance, writing f(cos) doesn't seem to mean anything at all. You'd rather see f(cos(x)) with a numerical argument. In fact, the first mathematical application of "passing functions to functions" occurs in the calculus of variations, beyond the scope of this study.
Passing around a construction like f(cos) is slightly more natural in computer programming, but not in all languages. No BASIC-family language would know what to do with f(cos). For the remainder of this study, in fact, we must leave QB64 behind and embrace more abstract means and methods. Let us set computer languages aside for a moment and think abstractly.
Consider a function f that receives an object (synonym for "piece of data") x as an input, and outputs f(x) as the same type of object. Calling the output object y, we may write:
y = f(x)
Since y is an object of the type that f "wants" as input, we calculate f(y) to produce a new output z:
z = f(y)
If we were after z all along, it turns out y was just a middleman, so it's perfectly valid to write z in terms of the original x by layering two calls of f as shown:
z = f(f(x))
Kicking things up a notch, we now introduce the compound function:
g(x) = f(f(f(x)))
Compound functions provide a way of condensing complexity into fewer symbols. For instance, consider another function:
h(x) = f(f(g(x)))
... where g(x) is the same as above. By substituting g(x) for its equivalent in layers of f(x) (namely three), we discover h(x) = f(f(f(f(f(x))))), which is five applications of f to the argument x.
Continuing the above example, it's easy to see that the function h(x) can also be made by swapping the order of calls to f and g. For instance, an equivalent representation is h(x) = f(g(f(x))), or just as equivalently, h(x) = g(f(f(x))). Each of these resolves to h(x) = f(f(f(f(f(x))))).
To put a name to what we're doing, let's say that functions like h(x) exhibit linear complexity. This is to acknowledge that h is indeed a compound function, and the maximum "depth", or maximum number of function calls, is equal to the number of f-symbols in the fully expanded definition. For another example, using the same g(x) as above, we may write k(x) = g(g(g(x))). Remembering that each call to g equals three calls of f, we right away know k(x) is nine calls of f:
k(x) = g(g(g(x))) = f(f(f(f(f(f(f(f(f(x)))))))))
Let's take a second look at the compound function g(x) = f(f(f(x))). While this construction for applying f three times is efficient enough, wouldn't it be nice to build in the "three-ness" some other way? What if we need (far) more than three layers of f( )? To begin handling all of this, let me propose a new way to write g(x):
g(x) = e(f,x) = f(f(f(x)))
The function e(f,x) is a high-order function. The second argument sent to e just carries x into the function. The first argument is a bare function, with no (x)-appendage. In practice, this means that the f's appearing in f(f(f( ))) are treated as a variable, and may be swapped out for another function. To see the power of this, watch how easily we write k(x):
k(x) = e(g,x) = g(g(g(x))) = f(f(f(f(f(f(f(f(f(x)))))))))
The high-order function e(f,x) breaks the linearity ceiling. To keep going, we find e(k,x) contains 27 applications of f. The complexity is growing in powers of three:
e(k,x) = f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(f(x)))))))))))))))))))))))))))
As you can see, this can become explosive rather quickly. Note too that the function f really ought to be a pure function, otherwise the side effects will get out of hand.
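A quick aside for anyone who wants to run these: here is a sketch in Haskell, a functional language we'll mention again later. The trick of using "+1" as the innermost f is mine; it makes the output literally count how many times f was applied, which makes the powers-of-three growth visible.

e :: (a -> a) -> a -> a
e f x = f (f (f x))             -- three applications of f

main :: IO ()
main = do
  let f = (+ 1) :: Int -> Int   -- with f = "+1", the result counts applications of f
      g = e f                   -- g(x) = f(f(f(x)))
      k = e g                   -- k(x) = g(g(g(x)))
  print (g 0)                   -- 3
  print (k 0)                   -- 9
  print (e k 0)                 -- 27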
To solve a real problem with this apparatus, consider the function:
f(x) = x - tan(x)
... along with the high-order function:
e(f,x) = f(f(f(x)))
Now, watch what happens when we evaluate e(f,2) (the 2 came out of nowhere; it's only a guess).
e(f,2) = f(f(f(2)))
e(f,2) = f(f(2-tan(2)))
e(f,2) = f(2-tan(2)-tan(2-tan(2)))
e(f,2) = 2-tan(2)-tan(2-tan(2))-tan(2-tan(2)-tan(2-tan(2)))
e(f,2) = 3.26618627756911
In case it wasn't obvious, our f- and e-functions used together apply an iterative algorithm, Newton's method, for approximating pi. To home in on more accuracy, we could feed the result 3.26618627756911 back into e(f,x), but that's old news. Let's do it in one line by taking the e-function and sending it g instead of f:
e(g,2) = g(g(g(2)))
e(g,2) = f(f(f(f(f(f(f(f(f(2)))))))))
e(g,2) = 3.141592653589793
This time, the answer converges to the full precision of the Double number type in QB64. If you're paying attention, though, you'll notice that QB64 can't be used as-is to test these constructions. A programming language needs first-class functions, the hallmark of a functional language, to properly handle high-order functions.
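For the record, the whole pi example fits in a few lines of Haskell, one language that does qualify. This is only a sketch; the names mirror the ones used above.

e :: (Double -> Double) -> Double -> Double
e f x = f (f (f x))          -- apply f three times

f :: Double -> Double
f x = x - tan x              -- one Newton step toward a root of sin

g :: Double -> Double
g = e f                      -- three Newton steps at once

main :: IO ()
main = do
  print (e f 2)              -- about 3.266..., matching the hand calculation
  print (e g 2)              -- 3.141592653589793, nine steps total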
There is a saying among programmers that goes something like "style is the part of your program ignored by the computer". While this statement is literally true, I point out that the computer doesn't let you ignore the consequences of style, particularly bad style. The programmer is welcome to make his own mess and eat it too.
Certain languages foresee bad coding style and "engineer it out" to a certain limit. For instance, the Java language attempts to quell spaghetti code by not providing a goto statement. Or, the Haskell language avoids side effects by not allowing variables to be changed once they're defined. Meanwhile, BASIC-family languages are notorious for installing no safeguards, allowing all kinds of bad habits to be compiled and shipped by a clueless programmer.
While it's useful to classify individual languages on the spectrum of "pureness", we can't always gravitate to the purest languages to write the best programs. Sometimes, especially from an engineering perspective, you simply "need" to work in a paradigm that supports mutable objects, pointer math, or goto. There isn't always room for eggheaded purism. The language itself can't always protect us, so the burden falls doubly back on the programmer to write the best code possible, despite having potential to make a huge mess.
One way to avoid making a mess is by employing functional style, which is the main "political leaning" in this study. This means to treat functions as the most important, and perhaps the only building blocks of a program. Complexity, if it must exist, should arise from compounding small functions rather than bulking up one large function. Each function should do one job, and do it perfectly well with no side effects.
As an exercise in "purely functional" programming, and to get prepared for some upcoming abstract ideas, let us use nothing but functions to build something concrete, such as the positive integers. Apart from the full apparatus of functions we've been discussing, only two pieces of information are required to get started: (i) the number zero as the initial input, and (ii) the notion of the "+1" operation, which adds one to any integer. This means we begin with:
x_0 = 0
f(x) = x + 1
With this setup, we may calculate the rest of the positive integers. This should stand out as rather trivial, for if we nest x_0 inside n layers of f(x), the result is the integer x_n:
x_0 = 0
x_1 = f(0) = 1
x_2 = f(f(0)) = 2
x_3 = f(f(f(0))) = 3
x_4 = f(f(f(f(0)))) = 4
x_5 = f(f(f(f(f(0))))) = 5
While f(x) alone gets the job done, one quickly sees the problem with writing all those f's, especially for high integers. In fact, this approach is old news (linearly complex), and doesn't take advantage of any functional trickery. To update this, introduce the high-order function:
e(f,x) = f(f(x))
Also, let's write down a flurry of ordinary functions that employ e:
g(x) = e(f,x) = f(f(x))
h(x) = e(g,x) = f(f(f(f(x))))
i(x) = e(h,x) = f(f(f(f(f(f(f(f(x))))))))
j(x) = e(i,x) = ... (16 layers)
k(x) = e(j,x) = ... (32 layers)
As we list the functions g, h, i, etc., the number of f-layers doubles with each iteration. This is surely because the original high-order e function contains two calls to f, and we can expect a certain "two-ness" feeling as we proceed.
Now we're ready to re-create the list of integers. Carefully puzzling out each case, the following list emerges (stopping at 40):
As a visual aid, let's remove all parentheses and zeros to highlight the function calls, and also space each letter into its own column. By doing this, notice that the pattern of f's, g's, h's, etc. represents the corresponding integer x_n in binary!
Maybe you wouldn't build the integers this way in a practical scenario, but hopefully the overall point is still shining through. Functions, which we usually think of as algorithmic objects, can be used to represent data. The way the "data" is represented is strongly dependent on the function(s) beneath. For example, instead of using e(f,x) = f(f(x)) with two layers, we could use e(f,x) = f(f(f(x))) with three layers, and the resulting set of integers would have a base-three representation in the symbols f, g, h, etc.
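To make the doubling (and the binary pattern) concrete, here is a small Haskell sketch of my own. The letters follow the list above; the choice of 40 and 23 as test values is arbitrary.

e :: (a -> a) -> a -> a
e f x = f (f x)          -- two applications of f

f :: Int -> Int
f = (+ 1)                -- the "+1" building block

g, h, i, j, k :: Int -> Int
g = e f                  --  2 layers of f
h = e g                  --  4 layers
i = e h                  --  8 layers
j = e i                  -- 16 layers
k = e j                  -- 32 layers

main :: IO ()
main = do
  print (k (i 0))          -- 40 = 32 + 8,         binary 101000
  print (j (h (g (f 0))))  -- 23 = 16 + 4 + 2 + 1, binary  10111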
Now we jump into some deep water. So far, we've witnessed the utility of high-order functions in their capacity to condense data and algorithmic complexity into a unified notation. The next thing we might wonder is just how extreme this kind of thinking can go. That is, can we write a set of functions that can accommodate any data structure and represent any algorithm? How many functions would be needed to do this?
It was this kind of thinking that ultimately gave rise to Turing machines, lambda calculus, and the whole field of computation as we know it. Before all of that though, another universal model of computation was proposed, called the S and K combinators, also known as the SK calculus. An excellent video presentation on the history of the S and K combinators was given by Stephen Wolfram on Dec 7, 2020, the 100th anniversary of their introduction. The corresponding article Combinators: A Centennial View explores this topic using Wolfram Language constructions.
The S and K combinators, in their full majesty, occur as follows:
s[x][y][z] = x[z][y[z]]
k[x][y] = x
Looking at the k combinator first: k is a function that receives two arguments x and y, and all k does is "forget" about y, returning only x. Note that unlike the ordinary f(x,y) notation, multiple arguments are not grouped inside a single pair of brackets; each argument gets its own. The quantity k[x][y] is not equivalent to k[[x][y]].
The symbols x, y, z can stand for anything: numbers, strings, trees, functions, objects, entire programs - really anything. The square brackets aren't like ordinary parentheses in the way they simplify. Never assume that something like [[x]] is equivalent to [x] in the same way that ((7)) simplifies to (7).
The s combinator takes three arguments, and returns an awkward combination x[z][y[z]].
...And that's it. Amazingly, this setup is all we need to compute anything imaginable!... Well, this is technically true, but keep in mind we're going down this road for conceptual purposes. Only a real madlad would actually use this stuff to build something.
Some introductions to SK calculus actually call it SKI calculus, where I stands for Identity. Indeed, there is a third combinator called the "identity", taking its name from linear algebra. The Identity is the combinator that simply returns its argument in the same way it was received. It's the abstract version of "multiplying by one" to not change a number.
The Identity combinator is in fact not fundamental, but can be expressed in terms of the S and K combinators. (This had better be the case!) It turns out that the identity combinator is written:
i = s[k][k]
Testing this out on the enclosed object [a], we see the I combinator does its job:
i[a] = s[k][k][a]
i[a] = k[a][k[a]]
i[a] = a
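As an aside, a language with first-class functions can express s, k, and i almost verbatim, since currying plays the role of the bracket notation. A Haskell sketch (not needed for what follows):

s :: (a -> b -> c) -> (a -> b) -> a -> c
s x y z = x z (y z)        -- s[x][y][z] = x[z][y[z]]

k :: a -> b -> a
k x y = x                  -- k[x][y] = x

i :: a -> a
i = s k k                  -- the identity, built from s and k alone

main :: IO ()
main = print (i 7)         -- prints 7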
We shall explore the utility of S and K combinators by retracing an example detailed by Wolfram (link above). Suppose we are handed the abstract object [a][b][c]. Without specifying anything else about this object, S and K combinators allow us to rearrange it into any other object made from a, b, and c. To be definite, suppose we start with [a][b][c] and need to end up with a[b[a][c]].
To achieve this, take a look at the rather ugly operator:
o = s [s[k[s]][s[k[k]][s[k[s]][k]]]] [s[k[s[s[k][k]]]][k]]
... which has the qualitative format s[x][y]. Since there is no third argument, the operator o will remain static until something is appended to the right.
The plan now is to append [a][b][c] to the right of o, and simplify according to the S and K rules. Letting a machine do the actual work, the computation goes as follows:
In case the above is hard to follow, Wolfram offers this visual representation of each step:
To summarize this example, we started with an initial chunk of data [a][b][c], and then the operator o was applied to the left. After applying the S and K combinator rules recursively, the string boils all the way down to a[b[a][c]]. As a side comment, it seems that the meaning of the word "combinator" can be lifted from this example. The S and K combinators generate different "combinations" of the input data. With the right operations, the S and K combinators can in fact represent any data structure or any algorithm.
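Incidentally, the bookkeeping in examples like this can be handed to a machine with only a few lines of code. The sketch below (my own, in Haskell; the type and helper names are arbitrary) builds SK expressions as trees, applies the two rules repeatedly, and confirms that o[a][b][c] lands on a[b[a][c]].

data C = S | K | V String | App C C deriving (Eq, Show)

-- apply one S- or K-rule at the leftmost, outermost position that has one
step :: C -> Maybe C
step (App (App K x) _)         = Just x
step (App (App (App S x) y) z) = Just (App (App x z) (App y z))
step (App a b) = case step a of
  Just a' -> Just (App a' b)
  Nothing -> App a <$> step b
step _ = Nothing

-- keep stepping until no rule applies anywhere
nf :: C -> C
nf e = maybe e nf (step e)

-- bracket-style application: x # y means x[y]
(#) :: C -> C -> C
(#) = App

-- the operator o from above
o :: C
o = S # (S # (K # S) # (S # (K # K) # (S # (K # S) # K)))
      # (S # (K # (S # (S # K # K))) # K)

main :: IO ()
main = print (nf (o # V "a" # V "b" # V "c"))
-- prints App (V "a") (App (App (V "b") (V "a")) (V "c")), i.e. a[b[a][c]]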
Let us conceive of some way to represent the integers using S and K combinators.
Begin by considering the expression s[k][s][k], which readily reduces to k[k][s[k]], and then to simply k. For reasons that will become clear(ish) momentarily, let us interpret the number zero as the concatenation of the "zero-prefix" s[k] and the suffix [s][k].
0 = s[k][s][k] = k
To come up with the next integer, namely one, let us anticipate 1=s[k] as a final form. To build this form, we begin with:
1 = p[s[k]][s][k]
... which has an operator p acting on the so-called zero-prefix, and then the whole quantity is suffixed by [s][k]. The obvious question, of course, is what should p look like? This takes some time to figure out, but with enough persistence, one finds:
p = s[s[k[s]][k]]
Putting it all together and simplifying, the expression indeed boils down to what we want after seven steps:
The operator p=s[s[k[s]][k]] will be recycled for each new integer we make. In other words, p is analogous to the +1 operation in a traditional paradigm.
Continuing the pattern that gave us 1=s[k], we build 2=s[s[k]] by applying the increment operator s[s[k[s]][k]] to the one-prefix p[s[k]], and then appending [s][k] to the right:
2 = p[p[s[k]]][s][k]
This calculation takes twelve steps to settle down, but it works:
The pattern should be evident at this point. Summarizing and projecting a few notches forward, we have, after all that work, a way to represent the integers. In the list below a space has been added before [s][k] as a visual aid to separate the number-prefix from the (universal) suffix.
0 = s[k] [s][k]
1 = p[s[k]] [s][k]
2 = p[p[s[k]]] [s][k]
3 = p[p[p[s[k]]]] [s][k]
4 = p[p[p[p[s[k]]]]] [s][k]
5 = p[p[p[p[p[s[k]]]]]] [s][k]
...
p = s[s[k[s]][k]]
Once simplified, the above reduces down to:
0 = k
1 = s[k]
2 = s[s[k]]
3 = s[s[s[k]]]
4 = s[s[s[s[k]]]]
5 = s[s[s[s[s[k]]]]]
...
Now that the integers are in our pocket, it's time for addition. The way addition works is sketched as follows:
a + b = (add. op.)[a prefix][b prefix][s][k]
The addition operator would be very hard to guess on the first try, so here it is:
(add. op.) = s[k[s]][s[k[s[k[s]]]][s[k[k]]]]
Picking an easy but nontrivial example, let us calculate 1+2, which ought to simplify to 3=s[s[s[k]]]. Plugging everything into the formula above, we have, after 30 steps:
Almost miraculously, the result boils down to s[s[s[k]]] as expected. It follows that any two integers can be added this way, but the number of steps grows in turn. To calculate 3+4=7 requires 50 steps, and to calculate 30+40=70 requires 365 steps.
The product of two integers can be calculated using the same setup as the sum calculation:
a * b = (prod. op.)[a prefix][b prefix][s][k]
The product operator is much simpler than the addition operator. Still not trivial, so here you go:
(prod. op.) = s[k[s]][k]
Picking an easy but nontrivial example, let us calculate 2*3, which ought to simplify to 6=s[s[s[s[s[s[k]]]]]]. Plugging everything into the formula above, we have, after 50 steps:
S and K combinators can deal with exponents in a similar spirit:
a ^ b = (exp. op.)[a prefix][b prefix][s][k]
The exponent operator is:
(exp. op.) = s[k[s[s[k][k]]]][k]
Picking an easy but nontrivial example, let us calculate 3^2, which ought to simplify to 9=s[s[s[s[s[s[s[s[s[k]]]]]]]]]. Plugging everything into the formula above, we have, after 116 steps:
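To close the loop, the same sort of reducer can check the whole arithmetic story end to end. The sketch below (Haskell again; every name besides the operators themselves is my own choice) encodes the prefixes, the suffix [s][k], and the three operators exactly as written above, and verifies 1+2, 2*3, and 3^2.

data C = S | K | App C C deriving (Eq, Show)

-- one S- or K-rule at the leftmost, outermost position
step :: C -> Maybe C
step (App (App K x) _)         = Just x
step (App (App (App S x) y) z) = Just (App (App x z) (App y z))
step (App a b) = case step a of
  Just a' -> Just (App a' b)
  Nothing -> App a <$> step b
step _ = Nothing

nf :: C -> C
nf e = maybe e nf (step e)

(#) :: C -> C -> C
(#) = App                                -- x # y means x[y]

p :: C
p = S # (S # (K # S) # K)                -- the "+1" operator s[s[k[s]][k]]

prefix :: Int -> C
prefix n = iterate (p #) (S # K) !! n    -- p[p[...p[s[k]]...]], n layers of p

num :: Int -> C
num n = prefix n # S # K                 -- the integer n: its prefix followed by [s][k]

addOp, mulOp, expOp :: C
addOp = S # (K # S) # (S # (K # (S # (K # S))) # (S # (K # K)))
mulOp = S # (K # S) # K
expOp = S # (K # (S # (S # K # K))) # K

arith :: C -> Int -> Int -> C            -- (op)[a prefix][b prefix][s][k]
arith op a b = op # prefix a # prefix b # S # K

main :: IO ()
main = do
  print (nf (num 3))                          -- App S (App S (App S K)), i.e. s[s[s[k]]]
  print (nf (arith addOp 1 2) == nf (num 3))  -- True:  1 + 2 = 3
  print (nf (arith mulOp 2 3) == nf (num 6))  -- True:  2 * 3 = 6
  print (nf (arith expOp 3 2) == nf (num 9))  -- True:  3 ^ 2 = 9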