It is quite possible to understand and optimize for caches. It starts with understanding the hardware and continues with being in control of the system. The less control you have over the system, the less likely you are to succeed; Linux or Windows running a bunch of applications/threads that are not idling is an example of a system where you have very little control.
Most caches are somewhat similar in their properties: they use some part of the address field to look for hits, and they have a depth (ways) and a width (cache line). Some have write buffers, some can be configured to write through or bypass the cache on writes, etc.
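To make the "some part of the address field" idea concrete, here is a minimal sketch of how a set-associative cache might split an address. The geometry (32 KiB, 2 ways, 64-byte lines, hence 256 sets) and the names `split_address` and `struct cache_addr` are assumptions picked for illustration, not any particular part; check the reference manual for your core to get the real numbers.

```c
#include <stdint.h>

/* Hypothetical geometry, for illustration only:
   32 KiB total / (2 ways * 64-byte lines) = 256 sets. */
#define LINE_BYTES 64u
#define NUM_WAYS   2u
#define NUM_SETS   256u

struct cache_addr {
    uint32_t offset; /* byte position within the cache line */
    uint32_t set;    /* which set the hit/miss check looks in */
    uint32_t tag;    /* compared against the tags stored in that set */
};

/* Split an address into the three fields the cache logic uses. */
struct cache_addr split_address(uint32_t addr)
{
    struct cache_addr a;
    a.offset = addr % LINE_BYTES;
    a.set    = (addr / LINE_BYTES) % NUM_SETS;
    a.tag    = addr / (LINE_BYTES * NUM_SETS);
    return a;
}
```

Under these assumptions, two addresses compete for the same ways whenever their `set` fields match, no matter how far apart they are in memory.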
You need to be acutely aware of all the memory transactions going on that are hitting that cache (some systems have independent instruction and data caches making the task easier).
You can easily make a cache useless by not carefully managing your memory. For example, suppose you have multiple data blocks you are processing and hope to keep in cache, but they sit in memory at addresses that are even multiples relative to the cache's hit/miss checking (that is, they all map to the same place in the cache), say 0x10000, 0x20000, 0x30000, and you have more of these than ways in the cache. You may very quickly end up with something that runs quite slow with the cache on, slower than it would with the cache off. But change that to perhaps 0x10000, 0x21000, 0x32000 and that might be enough to take full advantage of the cache, reducing the evictions.
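A rough way to check a layout like the one above is to compute which set each block starts in and count how many blocks pile onto one set. The sketch below assumes a hypothetical 2-way cache with 64-byte lines and 256 sets; `set_index` and `worst_set_load` are made-up helper names, and only the first line of each block is considered (equal-sized blocks that start in the same set collide line for line).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry, for illustration only. */
#define LINE_BYTES 64u
#define NUM_WAYS   2u
#define NUM_SETS   256u

/* Set that the cache line containing addr maps to. */
unsigned set_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_SETS;
}

/* Largest number of the given start addresses that share a single set.
   If this exceeds NUM_WAYS, those blocks cannot all stay cached at once. */
unsigned worst_set_load(const uint32_t *addrs, size_t n)
{
    unsigned count[NUM_SETS] = {0};
    unsigned worst = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s = set_index(addrs[i]);
        count[s]++;
        if (count[s] > worst)
            worst = count[s];
    }
    return worst;
}
```

With this assumed geometry, 0x10000, 0x20000 and 0x30000 all land in set 0, so three blocks fight over two ways, while 0x10000, 0x21000 and 0x32000 land in sets 0, 64 and 128 and do not conflict. On a cache with different dimensions the staggered addresses could still collide, so the offsets have to be chosen against the real geometry.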
Bottom line, the key to optimizing for a cache (well, other than knowing the system quite well) is to keep all of the things you need performance for in the cache at the same time, organizing that data such that it is possible to have it all in the cache at once, and preventing things like code execution, interrupts, and other regular or random events from evicting significant portions of the data you are using.
The same goes for code. It is a little harder, though, as you need to control the locations where the code lives to avoid collisions with other code you want to keep in the cache. When testing/profiling any code that goes through a cache, be aware that adding a single line of code here and there, or even a single nop, anything that shifts or changes the addresses where the code lives from one compile to another for the same code, changes where the cache lines fall within that code and changes what gets evicted and what doesn't for critical sections.
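The effect of a small shift in code placement can be sketched with a little arithmetic. The helper below, `lines_touched`, is a hypothetical name, and the 64-byte line size and 4-byte nop are assumptions (a fixed 32-bit instruction width); it simply counts how many cache lines a region of code covers for a given start address.

```c
#include <stdint.h>

/* Assumed cache line size, for illustration only. */
#define LINE_BYTES 64u

/* Number of cache lines covered by `size` bytes starting at `start`. */
unsigned lines_touched(uint32_t start, uint32_t size)
{
    if (size == 0)
        return 0;
    uint32_t first = start / LINE_BYTES;
    uint32_t last  = (start + size - 1) / LINE_BYTES;
    return (unsigned)(last - first + 1);
}
```

Under these assumptions, a 64-byte loop placed at 0x8000 fits in one line, but the same loop pushed to 0x8004 by one extra nop earlier in the code straddles two lines, which means two fetches and two candidates for eviction instead of one.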