Data access and persistence/storage layers are irresistibly natural places for caching. They're doing the I/Os, which makes them a handy, easy place to insert caching. I daresay that almost every DAL or persistence layer will, as it matures, be given a caching function--if it isn't designed that way from the very start.
The problem is intent. DAL and persistence layers deal with relatively low-level constructs--for example, records, tables, rows, and blocks. They don't see the "business" or application-layer objects, and they have little insight into how those objects are being used at higher levels. When they see a handful of rows or a dozen blocks being read or written, it's not clear what those represent. "The Jones account we're currently analyzing" doesn't look much different from "some basic taxation-rate reference data the app needs just once, and to which it won't refer again." At this layer, data is data is data.
Caching at the DAL/persistence layer risks having the "cold" tax reference data sitting there, pointlessly occupying 12.2MB of cache and displacing some account information that will, in fact, be intensively used in just a minute. Even the best cache managers have scant knowledge of the higher-level data structures and connections, and little insight into what operations are coming soon, so they fall back on guesstimation algorithms such as least-recently-used (LRU) eviction.
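To make that concrete, here's a minimal Python sketch of the DAL's predicament, assuming a simple byte-budgeted cache; the keys and sizes are illustrative, not any real system's. The layer sees only opaque keys and byte counts, so the best policy available to it is a generic recency heuristic like LRU:

```python
from collections import OrderedDict

class DalCache:
    """DAL-level cache sketch: it sees only opaque keys and byte sizes
    ("data is data is data"), so all it can do is track recency."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> (value, size), oldest first

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None:
            self.entries.move_to_end(key)  # recency is the only signal we have
        return entry[0] if entry else None

    def put(self, key, value, size):
        if key in self.entries:
            self.used -= self.entries.pop(key)[1]
        # Evict least-recently-used entries until the newcomer fits --
        # with no idea whether the victims are about to be hot.
        while self.entries and self.used + size > self.capacity:
            _, (_, victim_size) = self.entries.popitem(last=False)
            self.used -= victim_size
        self.entries[key] = (value, size)
        self.used += size

# The failure mode described above: a one-shot reference-data load
# displaces account data the app is about to use intensively.
cache = DalCache(capacity_bytes=16_000_000)
cache.put("row:jones_account", b"...", size=8_000_000)
cache.put("block:tax_rates", b"...", size=12_200_000)  # read once, never again
assert cache.get("row:jones_account") is None          # evicted anyway
```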
In contrast, application- or business-layer caching isn't nearly so neat. It requires inserting cache-management operations or hints in the middle of other business logic, which makes the business code more complex. But the tradeoff is that, knowing how macro-level data is structured and what operations are coming up, it has a much better opportunity to approximate optimal ("clairvoyant," or Bélády MIN) caching efficiency.
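As a rough sketch of what those in-line hints might look like--every name here is hypothetical, not any particular library's API--business code can pin data it knows it's about to hammer and flag data it knows is one-shot, foresight the DAL can never have:

```python
from collections import OrderedDict

class HintedCache:
    """Application-layer cache sketch: business code supplies hints, so
    eviction can approach Bélády's clairvoyant MIN instead of guessing."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> (value, size)
        self.pinned = set()           # keys the app has declared hot

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None:
            self.entries.move_to_end(key)
        return entry[0] if entry else None

    def put(self, key, value, size, transient=False):
        if transient:
            return value  # app says "used once, never again": don't cache it
        if key in self.entries:
            self.used -= self.entries.pop(key)[1]
        while self.entries and self.used + size > self.capacity:
            # Evict the oldest *unpinned* entry; pinned data is known-hot.
            victim = next((k for k in self.entries if k not in self.pinned), None)
            if victim is None:
                break  # everything left is pinned; the working set is too big
            self.used -= self.entries.pop(victim)[1]
        self.entries[key] = (value, size)
        self.used += size
        return value

    def pin(self, key):
        """Hint: this key is about to be used intensively."""
        self.pinned.add(key)

    def unpin(self, key):
        self.pinned.discard(key)

def analyze_jones_account(cache, dal):
    """Hypothetical business logic; note the cache hints threaded through it."""
    cache.pin("account:jones")
    account = cache.get("account:jones") or cache.put(
        "account:jones", dal.read("account:jones"), size=4096)
    # Tax reference data is needed exactly once -- don't let it displace anything.
    rates = cache.put("ref:tax_rates", dal.read("ref:tax_rates"),
                      size=12_200_000, transient=True)
    result = compute_taxes(account, rates)  # hypothetical computation
    cache.unpin("account:jones")
    return result
```

The efficiency gain is real, but so is the clutter: cache-management calls are now woven through the business logic itself.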
Whether inserting cache-management responsibility into business/application code makes sense is a judgment call, and will vary by application. In many cases, while it's known that DAL/persistence layers won't get it "perfectly right," the tradeoff is that they can do a pretty good job, that they do so in an architecturally "clean" and much more intensively testable way, and that low-level caching avoids increasing the complexity of business/app code.
Lower complexity encourages higher correctness and reliability, and faster time-to-market. That is often considered a great tradeoff--less perfect caching, but better-quality, more timely business code.