I am working in the field of real-time simulation for power electronics. The simulator is based on the most recent Intel and AMD processors. The simulation consists of a loop of code executed as fast as possible, together with some I/O accesses to connect to real-world devices. In our current scheme, we use a custom real-time Linux OS and we shield some CPU cores to obtain maximum speed. With this OS-based approach, however, we cannot get execution cycles below 1-2 microseconds, because of some variation in the OS tasks (we believe). Access to I/O also limits the performance.
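For reference, here is roughly what our current OS-based scheme looks like. This is only a minimal sketch with illustrative names (the solver step, the core number, the timing check are placeholders, not our actual code): a thread pinned to a shielded/isolated core, busy-looping on the compute step and the I/O access.

```c
/* Minimal sketch of the current OS-based scheme (illustrative only):
 * pin the simulation thread to an isolated core and busy-loop. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() */

static void solve_one_step(void)
{
    /* placeholder for the power-electronics solver step */
}

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(3, &set);                      /* core 3 assumed shielded via isolcpus */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    for (;;) {
        uint64_t t0 = __rdtsc();
        solve_one_step();                  /* compute */
        /* ... I/O access to real-world devices goes here ... */
        uint64_t t1 = __rdtsc();
        (void)(t1 - t0);                   /* cycle time we try to keep under 1-2 us */
    }
}
```

Even with the core shielded like this, we still see the occasional 1-2 µs jitter mentioned above.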
We can actually achieve sub-microsecond simulation steps using FPGAs, but FPGA programming is difficult, and our FPGA computing structure slowly tends to mimic a CPU ALU, so I am saying to myself: why not use an Intel ALU and benefit from 50 years of optimization!
So, I am looking to get past this 1-2 µs limit by using some kind of bare-metal approach on our Intel processors. I have read that this is really difficult (and not recommended) on recent Intel processors.
But I wish to insist a little, just to get started with a proof-of-concept case: for example, toggling one bit in a forever loop, with some output to any I/O.
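To make that concrete, here is the kind of first step I have in mind. It is a user-space sketch, not bare metal yet, and the choice of the legacy parallel-port data register at 0x378 and of ioperm()/outb() is just an assumption to illustrate "toggle a bit and push it to some I/O"; any port-mapped or memory-mapped register the hardware exposes would do.

```c
/* Proof-of-concept sketch (assumption: a legacy port at 0x378 exists;
 * requires root for ioperm): toggle one bit forever and write it out. */
#include <stdio.h>
#include <sys/io.h>      /* ioperm(), outb() - x86 Linux/glibc */

int main(void)
{
    const unsigned short port = 0x378;     /* assumed parallel-port data register */

    if (ioperm(port, 1, 1) != 0) {         /* request access to that I/O port */
        perror("ioperm (run as root)");
        return 1;
    }

    unsigned char bit = 0;
    for (;;) {                             /* forever loop: toggle and emit */
        bit ^= 1;
        outb(bit, port);                   /* port I/O itself is slow, so the real
                                              question is what the bare-metal
                                              equivalent of this loop looks like */
    }
}
```

The question is essentially how to run something like this loop without the OS underneath it, or with the OS reduced to the point where the jitter disappears.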
Could someone point me toward the best starting direction?