As an interview question, it's usually asked just about the technical bits of doing an in-place swap of 8-bit items to reverse their order (regardless of what characters those might actually represent).
At the same time, especially if you're interviewing a relatively senior person, you could at least hope to hear some questions about the specification and the exact form of the input. Even if you direct them back to the simple case of just swapping 8-bit items, knowing whether or not they think in broader terms than that may be valuable.
If you do have to deal with a broad range of inputs, you just about have to think in terms of a "stack", a bit like a network stack. You have to build your software in a number of layers, each of which applies a fairly specific set of transforms in a specific order. This lets you keep each part of the transformation simple enough that you can keep it under control, and stand a reasonable chance of making it meet its requirements.
I'll outline one possibility that I have found at least somewhat workable. I'm the first to admit that there may be others who have better ideas though. At least to me, this seems a bit like brute-force engineering, with little real elegance.
You normally want to start by converting any other representation to UCS-4 (aka UTF-32). For this, you'd generally prefer to rely on input from the user rather than attempting to figure out the encoding on your own. In some cases, you can be sure a particular sequence of octets does not follow the rules of a particular encoding scheme, but you can rarely (if ever) be sure that it does follow a particular one.
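In Python terms, that decoding step might look something like this (a sketch; `decode_input` is just a name I'm using here, and the encoding is taken from the caller rather than guessed):

```python
# Decode raw octets into a sequence of Unicode code points.
# The encoding should come from the caller (or reliable metadata),
# not from guesswork: an invalid byte sequence can *disprove* an
# encoding, but a valid one never proves the encoding is right.
def decode_input(octets: bytes, encoding: str) -> str:
    # errors="strict" makes invalid sequences raise UnicodeDecodeError
    # instead of being silently replaced with U+FFFD.
    return octets.decode(encoding, errors="strict")
```

For example, `decode_input(b"\xc3\x81", "utf-8")` yields the single code point U+00C1, while a truncated sequence such as `b"\xc3"` raises `UnicodeDecodeError`.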
The next step is optional: you can normalize the input to one of the four Unicode normalization forms. In this case, you'd probably want to apply the "NFKC" transformation: compatibility decomposition followed by canonical composition. This will (where possible) convert combining diacritical forms (such as the U+0301 that Jon mentioned) into single code points (e.g., an "A" followed by a U+0301 would be converted to "Latin capital A with acute", U+00C1).
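Python's `unicodedata` module exposes exactly this transformation, so the step is a one-liner:

```python
import unicodedata

# "A" followed by COMBINING ACUTE ACCENT (U+0301) composes under NFKC
# into the single code point U+00C1, LATIN CAPITAL LETTER A WITH ACUTE.
decomposed = "A\u0301"
composed = unicodedata.normalize("NFKC", decomposed)
assert composed == "\u00c1"
assert len(composed) == 1
```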
You then walk through all the characters from beginning to end, breaking the string into actual (user-perceived) characters -- and if there are (still) combining diacritic marks, keeping them with the characters they modify. The result of this will typically be an index of the actual characters in the string, such as the position and length of each.
You then reverse the order of those complete characters, typically by using the index you created in the previous step.
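A sketch of these two steps in Python (stdlib only). Note this is a deliberate simplification: treating "base plus trailing nonzero-combining-class code points" as one character covers the diacritic case discussed here, but full grapheme clustering per UAX #29 handles more (ZWJ sequences, Hangul jamo, etc.):

```python
import unicodedata

def reverse_preserving_marks(s: str) -> str:
    """Reverse s while keeping combining marks attached to their base.

    Simplified approximation of grapheme segmentation: any code point
    with a nonzero combining class is treated as a mark belonging to
    the preceding base character.
    """
    clusters = []  # index of "actual characters": (start, length) pairs
    for i, ch in enumerate(s):
        if clusters and unicodedata.combining(ch) != 0:
            # Extend the current cluster to cover this combining mark.
            start, length = clusters[-1]
            clusters[-1] = (start, length + 1)
        else:
            clusters.append((i, 1))
    # Walk the index backwards to reverse the order of whole clusters.
    return "".join(s[start:start + length]
                   for start, length in reversed(clusters))
```

So `reverse_preserving_marks("abcA\u0301")` gives `"A\u0301cba"` -- the acute accent stays with its "A" instead of migrating onto the "c".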
You then (again, optionally) apply another Unicode normalization, such as NFD (canonical decomposition). This will turn the aforementioned "Latin capital A with acute" back into two code points -- a "Latin capital A" and a "combining acute accent". If your input happened to contain a U+00C1 to start with, however, it would be converted into two code points as well.
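Again via `unicodedata`, and this makes the second point concrete: NFD decomposes a precomposed character regardless of how it entered the string:

```python
import unicodedata

# Canonical decomposition splits precomposed characters back apart:
# U+00C1 "LATIN CAPITAL LETTER A WITH ACUTE" becomes
# U+0041 "A" followed by U+0301 COMBINING ACUTE ACCENT.
nfd = unicodedata.normalize("NFD", "\u00c1")
assert nfd == "A\u0301"
assert len(nfd) == 2
```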
You then encode the sequence of UCS-4 code points into the desired encoding (UTF-8, UTF-16, etc.).
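In Python this final step is just `str.encode` with the target codec:

```python
# Encode the reversed code-point sequence into the target encoding.
text = "A\u0301cba"
utf8_bytes = text.encode("utf-8")        # U+0301 becomes 0xCC 0x81
utf16_bytes = text.encode("utf-16-le")   # 2 bytes per BMP code point
```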
Note that the Unicode normalization steps can/will change the number of code points needed to store the string, so if you include those, you can no longer plan on the result string fitting into the original storage. Obviously enough, the resulting code points may not correspond directly to the input code points either.
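To see that size change concretely -- normalization alters both the code-point count and the encoded byte count:

```python
import unicodedata

s = "A\u0301"                            # two code points
nfc = unicodedata.normalize("NFC", s)    # one code point: U+00C1
assert len(s) == 2 and len(nfc) == 1

# The UTF-8 byte lengths differ too:
assert len(s.encode("utf-8")) == 3       # "A" (1 byte) + U+0301 (2 bytes)
assert len(nfc.encode("utf-8")) == 2     # U+00C1 is 2 bytes in UTF-8
```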