Superheterodyne and heterodyne are effectively the same principle. The "super-" comes from "supersonic heterodyne" and merely indicates that the intermediate frequency is above the audible range (ultrasonic). See partway through the History section of Wikipedia's Superheterodyne receiver article.
A double-superheterodyne receiver uses two frequency conversion stages, each with its own intermediate frequency. It is used in analog TV to recover the audio band, and in radios with a very wide tuning range to improve image rejection. See the Advanced designs section of the Wikipedia article.
Why 10.7 MHz? Besides converting the desired signal to the intermediate frequency (IF), the mixer converts a second, unwanted frequency to the same IF: the image, which lies twice the IF away from the desired signal. The image is removed by a bandpass filter ahead of the mixer; raising the IF moves the image further from the desired signal, so the filter attenuates it better. At the time FM radio was developed, ~10 MHz was near the limit of what was practical.
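10.7 MHz was chosen over 10 MHz so that local oscillator radiation falls between stations. The US FM allocation places stations every 0.2 MHz, on odd tenths of a MHz (88.1, 88.3, ...), so a local oscillator offset by 10.7 MHz from the tuned station lands on an even tenth, midway between two channels. With a 10.0 MHz IF it would land on top of another channel.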
(summarized from the radio board's Why 10.7MHz IF?)
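The arithmetic behind the 10.7 MHz choice can be checked with a short sketch. This is only an illustration, not taken from the linked discussion: the function names are made up here, high-side injection (local oscillator above the station) is assumed, and the channel list assumes US stations from 88.1 to 107.9 MHz in 0.2 MHz steps. Frequencies are in kHz so the integer arithmetic stays exact.

```python
# Hypothetical helper names; frequencies in kHz to keep the arithmetic exact.
# Assumed US FM channel plan: 88.1 MHz to 107.9 MHz in 0.2 MHz steps.
FM_CHANNELS_KHZ = range(88_100, 108_000, 200)


def lo_and_image(station_khz, if_khz):
    """Return (local oscillator, image) for a tuned station.

    Assumes high-side injection: the LO sits one IF above the station,
    so the image sits one IF above the LO, i.e. 2 x IF above the station.
    """
    lo = station_khz + if_khz
    image = station_khz + 2 * if_khz
    return lo, image


def stations_with_lo_on_channel(if_khz):
    """List the stations whose LO lands exactly on another FM channel."""
    channels = set(FM_CHANNELS_KHZ)
    return [s for s in FM_CHANNELS_KHZ
            if lo_and_image(s, if_khz)[0] in channels]


if __name__ == "__main__":
    for if_khz in (10_000, 10_700):
        clashes = stations_with_lo_on_channel(if_khz)
        print(f"IF {if_khz / 1000:.1f} MHz: "
              f"{len(clashes)} tuned stations put the LO on a channel")
```

Under these assumptions, a 10.7 MHz IF never puts the local oscillator on a channel, while a 10.0 MHz IF does so for every station whose LO still falls inside the band.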
Yes, the heterodyne principle may also be used in modulation. See Wikipedia's Heterodyne page.
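As a rough illustration of why mixing works for modulation as well as reception, the sketch below assumes an ideal multiplying mixer (real mixers are messier) and checks numerically that the product of two cosines equals the sum of a difference-frequency and a sum-frequency component. The function names and the 1 MHz / 1 kHz values are arbitrary choices for this example.

```python
import math


def mixer_output(f1_hz, f2_hz, t):
    """Ideal multiplying mixer (an assumption): product of two cosines."""
    return math.cos(2 * math.pi * f1_hz * t) * math.cos(2 * math.pi * f2_hz * t)


def sum_and_difference(f1_hz, f2_hz, t):
    """The same signal written as components at f1 - f2 and f1 + f2."""
    diff = math.cos(2 * math.pi * (f1_hz - f2_hz) * t)
    summ = math.cos(2 * math.pi * (f1_hz + f2_hz) * t)
    return 0.5 * (diff + summ)


def max_mismatch(f1_hz, f2_hz, n_samples=1000, duration_s=1e-3):
    """Largest difference between the two forms over the sampled interval."""
    return max(
        abs(mixer_output(f1_hz, f2_hz, t) - sum_and_difference(f1_hz, f2_hz, t))
        for t in (k * duration_s / n_samples for k in range(n_samples))
    )


if __name__ == "__main__":
    # Arbitrary example: 1 MHz carrier mixed with a 1 kHz tone.
    print(max_mismatch(1_000_000, 1_000))
```

In a receiver the difference component is kept as the IF; in a modulator the same products appear as sidebands around the carrier.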