Because in order to free memory as soon as the reference count hits zero, you have to keep a reference count for every object and update it every time a reference is created or destroyed. And that doesn't come for free. Typically, it limits your throughput.
There are generally two major strategies for implementing garbage collectors: tracing collectors and reference counting collectors. (There are others, but those are the ones in use by most mainstream automatic memory management systems.)
Reference counting GCs tend to have worse throughput but better (and more predictable) latency than tracing collectors, whereas tracing collectors have better throughput but higher and less predictable latency. Another big problem with reference counting garbage collectors (at least with simple implementations) is that they can't collect cycles: objects that refer to each other never see their counts drop to zero, even when nothing else can reach them. So you typically need to run a tracing collector in conjunction with a reference counting collector anyway (that's what CPython does, for example).
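To make the cycle problem concrete, here is a small sketch that you can run on CPython. It assumes CPython's behaviour specifically (reference counting plus the optional cycle collector exposed by the standard `gc` module); other Python implementations may free objects at different times. The `Node` class and the `demo` function are made up for illustration.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self, name):
        self.name = name
        self.other = None


def demo():
    gc.disable()  # switch off the automatic cycle collector; refcounting stays on
    try:
        gc.collect()  # start from a clean slate

        # Acyclic case: dropping the last reference frees the object at once.
        a = Node("a")
        a_ref = weakref.ref(a)  # a weak reference does not keep `a` alive
        del a
        freed_by_refcount = a_ref() is None

        # Cyclic case: b and c keep each other's count above zero.
        b = Node("b")
        c = Node("c")
        b.other = c
        c.other = b
        b_ref = weakref.ref(b)
        del b, c
        cycle_survives_refcount = b_ref() is not None

        # An explicit run of the cycle collector finds and frees the cycle.
        gc.collect()
        cycle_freed_by_collector = b_ref() is None
    finally:
        gc.enable()
    return freed_by_refcount, cycle_survives_refcount, cycle_freed_by_collector


if __name__ == "__main__":
    print(demo())
```

On CPython I would expect all three values to be true: the lone object goes away immediately, the cycle lingers until the collector runs, and only then is it reclaimed.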
Practically speaking, all modern industrial-strength high-performance automatic memory management systems are tracing collectors: all of the collectors in Oracle JDK, Oracle JRockit and most other JVMs, Microsoft CLR, Mono, most ECMAScript implementations, all Ruby implementations, almost all Python implementations, all Smalltalk implementations, all Lisp implementations, etc. So there is a bit of a self-reinforcing feedback loop here: more money gets put into research on tracing GCs because they are popular, and they become more popular because that research makes them better … and so on.