I have worked on commercial KVM units and can attest to the fact that they are way more than a trivial exercise to design and get working.
One of the challenges in such a design is capturing the analogue waveforms of the VGA/SVGA/XGA/WXGA video signals from an arbitrary computer and converting them into a digital format that can then be processed in the digital domain. The good news is that there are chips available that can do this job for you. One such chip is the ADV7604 from Analog Devices. This particular part can select one of four video sources and digitize it into three 12-bit parallel data streams, one each for R, G and B. The part supports digitizing at up to 170 MHz.
Another challenge in designing a KVM unit is capturing the high speed digital pixel information into a memory buffer, where it can be processed before being sent over the network to the remote site. It is necessary to use something like a high performance FPGA connected to an SDRAM for frame capture and video compression, because real time transfer of the complete video frame information is just not practical over public networks. For the video alone, 170 MHz * 36 bits corresponds to a raw data rate of 6.12 gigabits per second. Successful KVM units work by storing the previous video frame(s), comparing them to the current video frame, and computing just the differences from frame to frame. It is those differences that are then sent over the network, along with sync information and the captured keyboard and mouse signals.
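To make the arithmetic and the frame-differencing idea concrete, here is a minimal Python sketch. It is only an illustration of the principle and not how any particular KVM product does it: a real unit does this in FPGA logic or dedicated hardware, and the 16-pixel tile size and the names raw_bit_rate and changed_tiles are assumptions of mine.

```python
# Illustrative sketch only: a real KVM does this in FPGA/ASIC logic, not Python.
# The tile size and function names are assumptions, not from any product.

PIXEL_CLOCK_HZ = 170_000_000   # maximum pixel clock quoted above
BITS_PER_PIXEL = 3 * 12        # three 12-bit channels: R, G and B


def raw_bit_rate(pixel_clock_hz=PIXEL_CLOCK_HZ, bits_per_pixel=BITS_PER_PIXEL):
    """Raw, uncompressed video data rate in bits per second."""
    return pixel_clock_hz * bits_per_pixel


def changed_tiles(prev, curr, width, height, tile=16):
    """Return (tile_x, tile_y) for every tile that differs between two frames.

    Frames are flat, row-major sequences of pixel values, each of length
    width * height. Tiles on the right and bottom edges may be smaller
    than `tile` pixels.
    """
    changed = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            x_end = min(tx + tile, width)
            y_end = min(ty + tile, height)
            # Compare the tile one row slice at a time; stop at the first mismatch.
            if any(
                prev[y * width + tx : y * width + x_end]
                != curr[y * width + tx : y * width + x_end]
                for y in range(ty, y_end)
            ):
                changed.append((tx // tile, ty // tile))
    return changed
```

With a 64 x 32 frame in which a single pixel at x=40, y=5 changes, changed_tiles reports only the one tile containing that pixel, so only that tile's data would need to go over the network instead of the whole frame.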
Since it takes quite a bit of special processing to prepare the video difference data, plus the sync and key/mouse information, into packets for transport over Ethernet, it is necessary to use a special processor device that sits between your FPGA and the network. A number of companies make these processors as specialty products that KVM manufacturers embed into their KVM units. Some of these devices may actually contain custom logic to replace the high speed FPGA mentioned previously. It is common for these KVM processors to use an ARM9 class CPU with dedicated DMA engines to move the video difference data from the capture buffers to the network port.
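As a rough idea of what "preparing packets" means, here is a small Python sketch that wraps one changed block of video in a header before it goes out over the network. The header layout (a "KVMD" magic, a frame sequence number, tile coordinates and a payload length) is entirely made up for illustration. Commercial KVM processors use their own, typically proprietary, formats and do this work with DMA hardware rather than in software like this.

```python
import struct

# Hypothetical wire format, invented for this example:
#   4-byte magic, 32-bit frame sequence, 16-bit tile x, 16-bit tile y,
#   16-bit payload length, all in network (big-endian) byte order.
HEADER = struct.Struct("!4sIHHH")
MAGIC = b"KVMD"


def pack_tile(seq, tile_x, tile_y, payload):
    """Prefix one tile's (already compressed) bytes with the example header."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for 16-bit length field")
    return HEADER.pack(MAGIC, seq, tile_x, tile_y, len(payload)) + payload


def unpack_tile(packet):
    """Parse a packet built by pack_tile; returns (seq, tile_x, tile_y, payload)."""
    magic, seq, tile_x, tile_y, length = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("bad magic")
    payload = packet[HEADER.size : HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return seq, tile_x, tile_y, payload
```

Sync and keyboard/mouse events would need their own message types alongside the video tiles. This sketch leaves those out, along with everything a real protocol needs for loss handling, ordering and encryption.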
KVM units often capture the mouse and keyboard as USB signals, which have to be converted into an appropriate format to be included with the video data sent over the network. The KVM type processors include the USB ports to support this capture.
As you may have come to realize by now, the high performance CPU in the KVM processor needs a good bit of software embedded into it once you get past the task of getting all the hardware parts of the design together. On projects that I've worked on, it took a talented team of multiple software developers a year or more to get all the software honed for a successful KVM.
I hope this helps you see what steps you may be in for if you decide to embark on the exciting path of making your own remote KVM unit in order to save the few hundred dollars that a ready made one would cost. If you decide to forgo the design work and purchase one in the end, that is OK too. May I suggest that you search out and take a look at the Lantronix Spider product. This is a good IP type KVM unit that I have used and bundled in with another product that I had designed. Note that I have no affiliation with Lantronix other than being a user/customer.