Short version:
In order to make single-assignment style work reliably in Java, you'd need (1) some kind of immutable-friendly infrastructure, and (2) compiler- or runtime-level support for tail-call elimination.
We can write much of the infrastructure, and we can arrange things to try to avoid filling the stack. But as long as each call takes a stack frame, there will be a limit on how much recursion you can do. Keep your iterables small and/or lazy, and you shouldn't have major issues. At least most of the problems you'll run into don't require returning a million results at once. :)
Also note, since the program has to actually effect visible changes in order to be worth running, you can't make everything immutable. You can, however, keep the vast majority of your own stuff immutable, using a tiny subset of essential mutables (streams, for example) only at certain key points where the alternatives would be too onerous.
Long version:
Simply put, a Java program cannot totally avoid variables if it wants to do anything worth doing. You can contain them, and thus restrict mutability to a huge degree, but the very design of the language and API -- along with the need to eventually change the underlying system -- make total immutability infeasible.
Java was designed from the start as an imperative, object-oriented language.
- Imperative languages nearly always depend on mutable variables of some kind. They tend to favor iteration over recursion, for example, and nearly all iterative constructs -- even while (true) and for (;;)! -- are utterly dependent on a variable somewhere changing from iteration to iteration.
- Object-oriented languages pretty much envision every program as a graph of objects sending messages to each other, and in nearly all cases, responding to those messages by mutating something.
The end result of those design decisions is that without mutable variables, Java has no way to change the state of anything -- even something as simple as printing "Hello world!" to the screen involves an output stream, which involves sticking bytes in a mutable buffer.
So, for all practical purposes, we're limited to banishing the variables from our own code. OK, we can kinda do that. Almost. Basically, what we'd need is to replace almost all iteration with recursion, and all mutation with recursive calls returning the changed value. Like so:
class Ints {
    final int value;
    final Ints tail;
    public Ints(int value, Ints rest) {
        this.value = value;
        this.tail = rest;
    }
    public Ints next() { return this.tail; }
    public int value() { return this.value; }
}

public Ints take(int count, Ints input) {
    if (count <= 0 || input == null) return null;
    return new Ints(input.value(), take(count - 1, input.next()));
}

public Ints squares_of(Ints input) {
    if (input == null) return null;
    int i = input.value();
    return new Ints(i * i, squares_of(input.next()));
}
Basically, we build a linked list, where each node is a list in itself. Each list has a "head" (the current value) and a "tail" (the remaining sublist). Most functional languages do something akin to this, because it's very amenable to efficient immutability. A "next" operation just returns the tail, which is typically passed to the next level in a stack of recursive calls.
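For instance, sitting in the same class as the take and squares_of above, a quick sanity check might look like this (the variable names are mine, just for illustration):

final Ints oneTwoThree = new Ints(1, new Ints(2, new Ints(3, null)));
final Ints squares = take(2, squares_of(oneTwoThree));
// squares.value()        == 1
// squares.next().value() == 4
// squares.next().next()  == null
// ...and oneTwoThree is completely untouched. Every "change" is a new list.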
Now, this is an extremely oversimplified version of this stuff. But it's good enough to demonstrate a serious problem with this approach in Java. Consider this code:
public void doStuff() {
    final Ints integers = ...somehow assemble list of 20 million ints...;
    final Ints result = take(25, squares_of(integers));
    ...
}
Although we only need 25 ints for the result, squares_of doesn't know that. It is going to return the square of every number in integers. Recursion 20 million levels deep causes pretty big problems in Java.
See, the functional languages you'd typically do wackiness like this in have a feature called "tail call elimination". What that means is, when the compiler sees that a function's last act is to call itself (returning the result, if the function's non-void), it reuses the current call's stack frame instead of setting up a new one, and does a "jump" instead of a "call" -- so the stack space used remains constant. In short, it goes about 90% of the way toward turning tail recursion into iteration. It could deal with those 20 million ints without overflowing the stack. (It'd still eventually run out of memory, but a 20-million-node linked list is going to cost you a serious chunk of memory on a 32-bit system anyway.)
Java doesn't do that, in most cases. (It depends on the compiler and runtime, but Oracle's implementation doesn't do it.) Each call to a recursive function eats up a stack frame's worth of memory. Use up too much, and you get a stack overflow. Overflowing the stack all but guarantees the death of the program. So we have to make sure not to do that.
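To make that concrete, here's a hypothetical tail-recursive sum over the Ints list above, next to the loop a TCE-capable compiler would effectively turn it into. (sum is my own example, not part of the code above.)

// Tail-recursive: the call to sum() is the very last thing that happens,
// so a TCE-capable compiler could reuse the current stack frame.
// Java won't, so a 20-million-node list means 20 million frames.
public int sum(Ints list, int acc) {
    if (list == null) return acc;
    return sum(list.next(), acc + list.value());
}

// What TCE effectively compiles it into: same logic, constant stack space.
// Note the mutable locals -- in Java, *we* have to write this version.
public int sumIterative(Ints list, int acc) {
    while (list != null) {
        acc = acc + list.value();
        list = list.next();
    }
    return acc;
}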
One semi-workaround: lazy evaluation. We still have the stack limitations, but they're now tied to factors we have more control over. We don't have to calculate a million ints just to return 25. :)
So let's build us some lazy-evaluation infrastructure. (This code was tested a while back, but I've modified it quite a bit since then; read the idea, not the syntax errors. :))
// Represents something that can give us instances of OutType.
// We can basically treat this interface like a list.
interface Source<OutType> {
    public Source<OutType> next();
    public OutType value();
}

// Represents an operation that turns an InType into an OutType.
// Note, these can be the same type. We're just flexible like that.
interface Transform<InType, OutType> {
    public OutType appliedTo(InType input);
}

// Represents an action (as opposed to a function) that can run on
// every element of a sequence.
abstract class Action<InType> {
    abstract void doWith(final InType input);
    public void doWithEach(final Source<InType> input) {
        if (input == null) return;
        doWith(input.value());
        doWithEach(input.next());
    }
}
// A list of Integers.
class Ints implements Source<Integer> {
    final Integer value;
    final Ints tail;
    public Ints(Integer value, Ints rest) {
        this.value = value;
        this.tail = rest;
    }
    public Ints(Source<Integer> input) {
        this.value = input.value();
        // Stop copying when the source runs out.
        this.tail = (input.next() == null) ? null : new Ints(input.next());
    }
    public Source<Integer> next() { return this.tail; }
    public Integer value() { return this.value; }
    public static Ints fromArray(Integer[] input) {
        return fromArray(input, 0, input.length);
    }
    public static Ints fromArray(Integer[] input, int start, int end) {
        if (input == null || end == start) return null;
        return new Ints(input[start], fromArray(input, start + 1, end));
    }
}
// An example of the spiff we get by splitting the "iterator" interface
// off. These ints are effectively generated on the fly, as opposed to
// us having to build a huge list. This saves huge amounts of memory
// and CPU time, for the rather common case where the whole sequence
// isn't needed.
class Range implements Source<Integer> {
    final int start, end;  // end is exclusive
    public Range(int start, int end) {
        this.start = start;
        this.end = end;
    }
    public Integer value() { return start; }
    public Source<Integer> next() {
        if (start + 1 >= end) return null;
        return new Range(start + 1, end);
    }
}
// This takes each InType of a sequence and turns it into an OutType.
// This *takes* a Transform, rather than just *implementing* Transform,
// because the transforms applied are likely to be specified inline.
// If we just let people override `value()`, we wouldn't easily know what type
// to return, and returning our own type would lose the transform method.
static class Mapper<InType, OutType> implements Source<OutType> {
    private final Source<InType> input;
    private final Transform<InType, OutType> transform;
    public Mapper(Transform<InType, OutType> transform, Source<InType> input) {
        this.transform = transform;
        this.input = input;
    }
    public Source<OutType> next() {
        if (input.next() == null) return null;  // end of the underlying sequence
        return new Mapper<InType, OutType>(transform, input.next());
    }
    public OutType value() {
        return transform.appliedTo(input.value());
    }
}
// ...

public <T> Source<T> take(final int count, final Source<T> input) {
    if (count <= 0 || input == null) return null;
    return new Source<T>() {
        public T value() { return input.value(); }
        public Source<T> next() { return take(count - 1, input.next()); }
    };
}
(Keep in mind that if this were actually viable in Java, code at least somewhat like the above would already be part of the API.)
Now, with an infrastructure in place, it's rather trivial to write code that doesn't need mutable variables and is at least stable for smaller amounts of input.
public Source<Integer> squares_of(Source<Integer> input) {
    final Transform<Integer, Integer> square = new Transform<Integer, Integer>() {
        public Integer appliedTo(final Integer i) { return i * i; }
    };
    return new Mapper<>(square, input);
}

public void example() {
    final Source<Integer> integers = new Range(0, 1000000000);
    // and, as for the author's "bet you can't do this"...
    final Source<Integer> squares = take(25, squares_of(integers));
    // Just to make sure we got it right :P
    final Action<Integer> printAction = new Action<Integer>() {
        public void doWith(Integer input) { System.out.println(input); }
    };
    printAction.doWithEach(squares);
}
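(If all went well, that prints the squares of 0 through 24 -- 0, 1, 4, 9, ... 576 -- and the rest of the billion-entry range never even gets generated.)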
This mostly works, but it's still somewhat prone to stack overflows. Try take-ing 2 billion ints and doing some action on them. :P You'll get a StackOverflowError no matter how much RAM the machine has, because the problem isn't the heap: the amount of a program's memory that's reserved for its stack is not that big. It's typically between 1 and 8 MiB. (You can ask for more -- on the JVM, that's the -Xss option -- but it doesn't matter all that much how much you ask for; call take(1000000000, someInfiniteSequence) and consume the result, and you will get an exception.) Fortunately, with lazy evaluation, the weak spot is in an area we can better control. We just have to be careful about how much we take().
It'll still have lots of problems scaling up, though, because our stack usage increases linearly: each call handles one element and passes the rest off to another call. Now that I think about it, there is one trick we can pull that might gain us quite a bit more headroom: turn the chain of calls into a tree of calls. Consider something more like this:
public <T> void doSomethingWith(T input) { /* magic happens here */ }

public <T> Source<T> workWith(Source<T> input, int count) {
    if (count < 0 || input == null) return null;
    if (count == 0) return input;
    if (count == 1) {
        doSomethingWith(input.value());
        return input.next();
    }
    return workWith(workWith(input, count / 2), count - count / 2);
}
workWith basically breaks up the work into two halves and assigns each half to another call to itself. Since each call halves the size of the working list rather than shaving one element off it, the stack depth grows logarithmically rather than linearly: even 2 billion elements need only about 31 levels of recursion, instead of 2 billion.
Problem is, this function wants a count -- and with a linked list, getting the length requires traversing the whole list. That's easily solved, though: simply don't care how many entries there are. :) The above code would work with something like Integer.MAX_VALUE as the count, since a null stops the processing anyway. The count is mostly there so we have a solid base case. If you anticipate having more than Integer.MAX_VALUE entries in a list, check workWith's return value -- it should be null at the end. Otherwise, recurse.
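In code, that last suggestion might look something like this (doWithAll is my name for it, not anything standard):

public <T> void doWithAll(Source<T> input) {
    // Each call chews through up to Integer.MAX_VALUE elements at a stack
    // depth of ~31 frames, then hands any remainder to the next call.
    // This outer recursion costs one frame per ~2 billion elements.
    final Source<T> rest = workWith(input, Integer.MAX_VALUE);
    if (rest != null) doWithAll(rest);
}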
Keep in mind, this touches as many elements as you tell it to. It's not lazy; it does its thing immediately. You only want to do this for actions -- that is, thingies whose sole purpose is to apply themselves to every element in a list. As I'm thinking it over right now, it seems to me that sequences would be a lot less complicated if kept linear. That shouldn't be a problem, since sequences don't call themselves anyway -- they just create objects that call them again.