The answer depends heavily on which kind of system you were discussing.
In PCs, the following relations hold:
- Moving the data is the simplest and shortest procedure. Usually it involves only pointer changes, without actually modifying the memory's contents.
- Writing new data and copying usually take the same order of magnitude of operations. The CPU must allocate memory for the new data, program the DMA engine with the source and destination addresses, and program it with additional parameters (amount of data to transfer, various flags, etc.). Once the DMA transfer is launched, the CPU is no longer needed.
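To make the difference between moving and copying concrete, here is a minimal sketch in C. The `buffer` type and the `buffer_move` / `buffer_copy` helpers are hypothetical names introduced only for illustration, and the copy is done by the CPU with `memcpy` rather than by a DMA engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer handle: a pointer to the payload plus its length. */
typedef struct {
    unsigned char *data;
    size_t len;
} buffer;

/* "Move": hand the payload from src to dst by changing pointers only.
 * The payload bytes are never touched, so the cost does not depend
 * on how much data the buffer holds. */
void buffer_move(buffer *dst, buffer *src)
{
    dst->data = src->data;
    dst->len = src->len;
    src->data = NULL;   /* src no longer refers to the payload */
    src->len = 0;
}

/* "Copy": duplicate the payload into storage the caller has already
 * allocated. Here the CPU transfers every byte itself, so the cost
 * grows with the payload size. */
void buffer_copy(buffer *dst, unsigned char *storage, size_t capacity,
                 const buffer *src)
{
    assert(src->len <= capacity);   /* destination must be large enough */
    memcpy(storage, src->data, src->len);
    dst->data = storage;
    dst->len = src->len;
}
```

After `buffer_move`, the destination refers to the very same bytes the source used to own. After `buffer_copy`, there are two independent copies of the data at different addresses.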
My intuition says that on every architecture copying and writing will take comparable CPU time, with or without DMA. This is because writing to memory is essentially copying from another location.
However, the discussion above concerns only the number of operations the CPU performs. Your friend may have meant runtime when he spoke about "load on microprocessor". There is little point in discussing runtime in the presence of a DMA engine, since it is not the CPU that performs the majority of the work. Without DMA, moving is still the fastest (assuming it is not a very simple microprocessor that executes a move as an actual transfer of data), but the relation between copying and writing depends on the source from which the written data is acquired: if this source has lower latency than the memory, the write will be faster; if it has higher latency than the memory, the write will be slower.
In any case, it is difficult to say whether your friend is correct without getting the whole context of your discussion.
Regards