I'm not an expert in FPGAs (as you will obviously see), but these could be some (probably naïve) ideas:
1) Try to cause voltage toggling in as many CMOS structures as you can, inside the FPGA, and at the fastest possible rate. Options for that:
1.1) Create as many inverters as possible in your FPGA, connect them in cascade, and drive all of them with an external clock running at the highest tolerable frequency. The synthesis software will probably try to optimize a plain chain of inverters away, so you may have to tell it explicitly to keep them.
1.2) If synthesizing inverters alone doesn't use up all the general-purpose fabric logic, look at how that logic is implemented and work out how to make as much of it toggle as possible.
1.3) If the FPGA has RAM, devote a small part of the previous logic to flipping every RAM bit at the highest possible rate.
1.4) If the FPGA has dedicated DSP structures, stress them as well.
All of this would be synchronized to the external clock.
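To make idea 1.3 a bit more concrete, here is a rough software model (Python, not HDL) of one way to flip every RAM bit on every pass: rewrite the whole memory with alternating complementary patterns (0101... then 1010...). The names `count_bit_flips` and `stress_ram`, and the choice of pattern, are my own assumptions for illustration; a real design would do the same thing with an address counter and a write port in the fabric.

```python
def count_bit_flips(old: int, new: int) -> int:
    """Number of bit positions that differ between two words."""
    return bin(old ^ new).count("1")


def stress_ram(depth: int, width: int, passes: int) -> int:
    """Model a RAM that is rewritten with alternating complementary patterns.

    Returns the total number of bit flips over all passes.
    `depth`, `width` and `passes` are hypothetical parameters.
    """
    mask = (1 << width) - 1
    # 0101... pattern, trimmed to the word width
    pattern = int("01" * ((width + 1) // 2), 2) & mask
    ram = [0] * depth  # assumed to start out as all zeros
    flips = 0
    for p in range(passes):
        # Even passes write the pattern, odd passes write its complement
        word = pattern if p % 2 == 0 else (~pattern & mask)
        for addr in range(depth):
            flips += count_bit_flips(ram[addr], word)
            ram[addr] = word
    return flips
```

In this model, after the first pass every subsequent pass flips all `depth * width` bits, which is the worst case that 1.3 is aiming for.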
To make all of that gradually controllable, from 0% to 100%, vary the frequency of that external clock. Alternatively, devote a small portion of the fabric logic to a divide-by-N counter, and let the output of that counter drive everything described above.
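As a sanity check of the divide-by-N idea, here is a small Python model (again, not HDL) of a counter that lets the stressed logic toggle only once every N input clocks. The function names and the single `state` bit standing in for "everything that toggles" are assumptions of mine, and I'm assuming the load scales roughly with the toggle rate.

```python
def simulate_divider(n: int, cycles: int) -> int:
    """Return how many times the stressed logic toggles in `cycles` input clocks."""
    if n < 1:
        raise ValueError("n must be >= 1")
    counter = 0
    state = 0    # stand-in for every toggling node in the design
    toggles = 0
    for _ in range(cycles):
        if counter == n - 1:
            # Terminal count reached: wrap around and let the logic toggle
            counter = 0
            state ^= 1
            toggles += 1
        else:
            counter += 1
    return toggles


def activity_percent(n: int, cycles: int = 1000) -> float:
    """Toggle activity relative to toggling on every input clock."""
    return 100.0 * simulate_divider(n, cycles) / cycles
```

With this model, N = 1 gives full activity and larger N gives 1/N of it, so the steps are coarse near the top (100%, 50%, 33%, ...) and only approach 0% for large N. If finer control near full load matters, varying the external clock frequency directly is probably the better of the two options.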