Nice project. Here are a few hints, though it is difficult to generalize them to every project.
Start with the computational requirements
This is what will tell you what kind of core you need and the general performance level of the MCU. I suggest you start with this, since computing power obviously can't be extended with external components, unlike peripherals.
First, it seems you use heavy mathematical operations with large integers within the loop. So, as you suggested, a 32-bit core would be useful here, which makes ARM an ideal candidate. As for the frequency of operation: currently, you're using an Arduino MEGA2560 (running at 16MHz, I assume) and you can make 10 loops/s. If you want to achieve 100 loops/s, you should be fine with a Cortex-M3/M4 in the range of 100MHz or more (rough estimation). Note that the Cortex-M4F has a floating-point unit.
That already narrows down the selection.
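The rough estimate above can be made explicit. This is only a sketch: it assumes the loop is purely CPU-bound, and `cycle_ratio` is a hypothetical factor for how many times fewer cycles per loop a 32-bit core needs than the 8-bit AVR. That factor has to be measured, it is not something known in advance.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles the current MCU spends on one loop iteration,
 * assuming the loop is CPU-bound (no waiting on peripherals). */
static uint32_t cycles_per_loop(uint32_t clock_hz, uint32_t loops_per_s)
{
    assert(loops_per_s > 0);
    return clock_hz / loops_per_s;
}

/* Clock needed on the new core to reach the target loop rate.
 * cycle_ratio is an ASSUMED gain: how many times fewer cycles the
 * new core needs per loop (1 = no gain at all). */
static uint32_t required_clock_hz(uint32_t cycles_old,
                                  uint32_t target_loops_per_s,
                                  uint32_t cycle_ratio)
{
    assert(cycle_ratio > 0);
    return (uint32_t)(((uint64_t)cycles_old * target_loops_per_s) / cycle_ratio);
}
```

With 16MHz and 10 loops/s, `cycles_per_loop` gives 1,600,000 cycles per loop. For 100 loops/s, `required_clock_hz` gives 160MHz with a ratio of 1, and 80MHz if the 32-bit core turns out to need half the cycles, which is why something around 100MHz is a reasonable starting point.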
Memory requirements
This one is easy: for the prototype, choose the MCU which has the most RAM/Flash of its range. Once you have validated the prototype, switch to the MCU from the same range that has just enough RAM/Flash, now that you know your exact requirements.
Note that I don't think your application needs amazing amounts of memory.
Now, the peripherals
You absolutely need an ADC. All MCUs of the range we are looking at have one, so it's not a useful criterion. Neither are digital inputs/outputs, unless you need a very large number of them (which doesn't seem to be your case).
You seem to need a DAC. However, that is not something you will find easily, and requiring it would narrow down the candidates too much. So we drop that requirement and stay with a PWM and a low-pass filter (which is certainly acceptable).
You don't mention any communication interface, except the LCD (more on that later). Anyway, all MCUs have I2C/SPI/UART/... if you need some.
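To give an idea of what the PWM and low-pass option involves, here is a minimal sketch. The function names, the 3.3V reference, the timer period and the R/C values are all illustrative assumptions, not taken from your design.

```c
#include <assert.h>
#include <stdint.h>

/* Timer compare value for a desired average output voltage.
 * Average PWM voltage = vref * compare / period. */
static uint32_t pwm_compare_for_mv(uint32_t target_mv, uint32_t vref_mv,
                                   uint32_t pwm_period)
{
    assert(vref_mv > 0);
    if (target_mv > vref_mv)
        target_mv = vref_mv;                 /* clamp to full scale */
    /* Rounded integer division, done in 64 bits to avoid overflow. */
    return (uint32_t)(((uint64_t)target_mv * pwm_period + vref_mv / 2) / vref_mv);
}

/* -3 dB cutoff of a first-order RC low-pass: fc = 1 / (2*pi*R*C). */
static double rc_cutoff_hz(double r_ohm, double c_farad)
{
    assert(r_ohm > 0.0 && c_farad > 0.0);
    return 1.0 / (2.0 * 3.14159265358979323846 * r_ohm * c_farad);
}
```

For example, `pwm_compare_for_mv(1650, 3300, 1000)` gives 500 (50% duty), and `rc_cutoff_hz(1e3, 1e-6)` is about 159Hz. The PWM frequency should sit well above the cutoff so the ripple is filtered out, and the cutoff should stay above the rate at which the output has to change.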
The LCD
This one is trickier, because there are a lot of different solutions that put completely different requirements on the MCU. But don't choose the LCD depending on the MCU. Choose the LCD you want for your product and then select the MCU that will drive it efficiently.
- If you want a character LCD: then the easiest and least constraining option for the MCU is to talk to it through some serial interface (often SPI). This way it won't use too many pins, you can use smaller/cheaper MCUs and speed is not an issue.
- If you want a graphic TFT LCD: if it's a small one, the serial link can still be appropriate. However, for 320x200 or larger, and if you want a nice graphical interface, you'll start wanting a parallel interface. In this case, either you use some GPIO (but that will put more load on the MCU because you'll have to bit-bang the control lines) or you choose an MCU that has a dedicated LCD interface (which is often the same as an external memory interface). This last one puts a strong constraint on the MCU choice, but you don't have other strong constraints, so...
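To see why bit-banging a parallel LCD loads the CPU, here is a sketch of byte writes on an 8-bit bus in the common 8080 style, where the display latches the data on the rising edge of the write strobe. The `gpio_*` helpers and pin variables are hypothetical stand-ins for real port registers, backed by plain variables so the sketch compiles on a PC. Check the timing of your actual LCD controller in its datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for GPIO port registers (plain variables here). */
static uint8_t  data_port;     /* 8 data lines D0..D7 */
static uint8_t  wr_pin = 1;    /* write strobe, idle high */
static uint8_t  dc_pin;        /* 0 = command, 1 = data */
static uint8_t  latched;       /* byte captured by the simulated LCD */
static uint32_t gpio_writes;   /* counts CPU accesses to the pins */

static void gpio_set_data(uint8_t v) { data_port = v; gpio_writes++; }
static void gpio_set_dc(uint8_t v)   { dc_pin = v; gpio_writes++; }
static void gpio_set_wr(uint8_t v)
{
    if (wr_pin == 0 && v == 1)
        latched = data_port;   /* rising edge: LCD latches the data */
    wr_pin = v;
    gpio_writes++;
}

/* One byte on the bus costs four pin accesses in this sketch. */
static void lcd_write_byte(uint8_t is_data, uint8_t value)
{
    gpio_set_dc(is_data);
    gpio_set_data(value);
    gpio_set_wr(0);
    gpio_set_wr(1);
}

/* Fill a w x h area with a 16-bit colour, two bytes per pixel. */
static void lcd_fill(uint16_t w, uint16_t h, uint16_t colour)
{
    assert(w > 0 && h > 0);
    for (uint32_t i = 0; i < (uint32_t)w * h; i++) {
        lcd_write_byte(1, (uint8_t)(colour >> 8));
        lcd_write_byte(1, (uint8_t)(colour & 0xFF));
    }
}
```

Filling a 320x200 screen this way takes 128,000 byte writes, or 512,000 pin accesses in this sketch, all done by the CPU. With a dedicated LCD or external memory interface, the hardware typically generates the strobes itself and each write becomes a single store.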
Now, you choose
Go to the ST Micro / NXP / Atmel websites and use their MCU selection tools. You'll spend lots of time reading datasheets, too. Take this time. It's not wasted. Anything you learn here, even if you don't use it specifically for this project, can be useful.
At this point, you also need to look at the number of pins you'll actually need and check the multiplexing scheme of the candidate MCUs to verify that you can use all the pin functions you need at the same time. Because obviously, you'll want to take the MCU with the lowest number of pins that fulfills your requirements (for cost/PCB real estate reasons).
Check the prices/availability on Mouser/Digikey. But you shouldn't need something particularly expensive here. Maybe 5€ or so.
Last thing regarding the LCD control
It seems the update of the LCD is part of your main loop. It shouldn't be. Especially if you're looping 100 times a second, it's useless. Make the control loop compute everything and adjust the motor command on each iteration, but just store the values to display somewhere in memory. Then, have another loop with lower priority display this information to the user when there's nothing more important to do.
Yeah, ideally, it requires some task switching and stuff. A real OS, actually (look up FreeRTOS, CooCox CoOS, NuttX, ... those are very small, are widely used on Cortex-M, and provide the required multitasking mechanisms).
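Short of a full RTOS, the split between control and display can be sketched as below. All names and the divider value are hypothetical. The loop only simulates the two rates: on real hardware, `control_step` would be called from a timer interrupt (or a high-priority task) so that a slow `display_step` can be preempted.

```c
#include <assert.h>
#include <stdint.h>

/* Values the control loop publishes for the display (hypothetical fields). */
struct display_data {
    int32_t speed;
    int32_t command;
};

static struct display_data shared;  /* written by control, read by display */
static uint32_t control_runs;
static uint32_t display_runs;

/* Placeholder for the real computation and motor command update. */
static void control_step(uint32_t tick)
{
    shared.speed   = (int32_t)tick;      /* dummy values for the sketch */
    shared.command = (int32_t)tick * 2;
    control_runs++;
}

/* Placeholder for the slow LCD update. It works on a snapshot; with a
 * real interrupt, the copy should be made with interrupts disabled. */
static void display_step(void)
{
    struct display_data snapshot = shared;
    (void)snapshot;                      /* would be drawn on the LCD here */
    display_runs++;
}

/* Simulate n_ticks control periods, refreshing the display only
 * every display_divider ticks. */
static void run(uint32_t n_ticks, uint32_t display_divider)
{
    assert(display_divider > 0);
    for (uint32_t tick = 1; tick <= n_ticks; tick++) {
        control_step(tick);
        if (tick % display_divider == 0)
            display_step();
    }
}
```

Calling `run(100, 10)` models one second at 100 loops/s: the control step runs 100 times while the display is refreshed 10 times, which is plenty for a human reader.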