Modern computers have several layers of cache memory in addition to a large, but slow, main memory system. One can make dozens of accesses to the fastest cache memory in the time required to read or write one byte from the main memory system. Thus, accessing one location a thousand times is much faster than accessing 1,000 (or even 100) independent locations once each. Because most applications repeatedly allocate and deallocate small amounts of memory near the top of the stack, the locations at the top of the stack are used and reused constantly, with the result that the vast majority (99%+ in a typical application) of stack accesses can be satisfied from cache memory.
By contrast, if an application were to repeatedly create and abandon heap objects to store continuation information, every version of every activation frame that was ever created would eventually have to be written out to main memory. Even if the vast majority of such objects would be completely useless by the time the CPU wanted to recycle the cache lines they started out in, the CPU would have no way of knowing that. Consequently, the CPU would have to waste a lot of time performing slow memory writes of useless information. Not exactly a recipe for speed.
Another thing to consider is that in many cases it's useful to know that an object reference passed to a routine will not be used once the routine exits. If parameters and local variables are passed via the stack, and if inspection of the routine's code reveals that it does not persist a copy of the passed-in reference, then the code that calls the routine can be sure that if no outside reference to the object existed before the call, none will exist afterward. By contrast, if parameters were passed via heap objects, concepts like "after a routine returns" become somewhat more nebulous, since if code kept a copy of the continuation, it would be possible for the routine to "return" more than once following a single call.