UTF-8 was deliberately designed to be forwards- and backwards-compatible with ASCII. Specifically, it has these two properties:
- the encoding of characters within the ASCII character set is the same in UTF-8 as it is in ASCII
- all other codepoints are encoded as a sequence of 2-4 octets (the original design allowed up to 6, but RFC 3629 restricts UTF-8 to 4), all of which have their high-order bit (8th bit) set; since ASCII only uses 7 bits and always leaves the 8th bit unset, a single-octet ASCII character can never be mistaken for part of a multi-octet sequence, and vice versa
So, assuming that newlines work reliably for you using ASCII, they will also work reliably using UTF-8.
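To illustrate that second property, a small Python check (the byte values involved are standard UTF-8, nothing here is specific to Python):

```python
# Pure ASCII text encodes to bytes with the high-order bit unset...
ascii_bytes = "newline:\n".encode("utf-8")
assert all(b < 0x80 for b in ascii_bytes)

# ...while every octet of a multi-octet sequence has the high bit set.
multi = "é€".encode("utf-8")        # a 2-octet and a 3-octet sequence
assert all(b >= 0x80 for b in multi)

# Consequently, byte 0x0A in a UTF-8 stream is always a literal '\n',
# so splitting on it can never cut a multi-octet character in half.
data = "héllo\nwörld".encode("utf-8")
assert data.split(b"\n") == ["héllo".encode("utf-8"), "wörld".encode("utf-8")]
```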
You will still have to deal with the different newline conventions of different operating systems, either by accepting all of `\r\n` (DOS, Windows), `\r` (Classic Mac OS), and `\n` (Unix), or by settling on exactly one (the Internet Standards all use `\r\n`, because every OS treats it as a newline, possibly with some additional garbage attached). And this is not even taking into account the various non-ASCII newline characters defined in Unicode.
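If you do want to accept all conventions, many languages have a helper for this; in Python, for example, `str.splitlines` recognizes all three plus the Unicode-only newline characters:

```python
text = "dos\r\nmac\rlinux\nunicode\u2028end"
assert text.splitlines() == ["dos", "mac", "linux", "unicode", "end"]

# A naive split on '\n' misses '\r', '\r\n', and Unicode line breaks
# such as U+2028 LINE SEPARATOR:
assert text.split("\n") == ["dos\r", "mac\rlinux", "unicode\u2028end"]
```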
However, there is a problem: newlines are valid characters in JSON. They can appear between any two tokens and are ignored as whitespace, so you cannot rely on a newline to mark a document boundary.
AFAICS, it is not that easy to find a character that is guaranteed not to appear in JSON. The spec is a bit vague: it talks about "whitespace" being allowed, but it does not specify what "whitespace" actually means.
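The problem is easy to demonstrate: any pretty-printed JSON document contains literal newlines as whitespace, so splitting a stream on `\n` produces fragments that are not valid JSON. A sketch in Python:

```python
import json

doc = {"message": "hello"}
pretty = json.dumps(doc, indent=2)   # a perfectly valid JSON document...
assert "\n" in pretty                # ...containing raw newlines as whitespace

# Splitting a stream of such documents on newlines yields fragments:
fragment = (pretty + "\n" + pretty).split("\n")[0]   # just "{"
try:
    json.loads(fragment)
    complete = True
except json.JSONDecodeError:
    complete = False
assert not complete
```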
One way to get around this is to enclose the JSON documents in a JSON array, essentially making each JSON document just an element of an outer array.
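A minimal Python sketch of that approach:

```python
import json

docs = [{"id": 1}, {"id": 2, "note": "spans\nlines"}]
stream = json.dumps(docs)       # one outer array holding every document
assert json.loads(stream) == docs
```

The downside is that a consumer generally has to read and parse the entire array before it can access any element, which works poorly for unbounded streams.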
Another way would be to switch to a different language: as of version 1.2, YAML is a proper superset of JSON, meaning that every valid JSON document is also a valid YAML document. One of the features YAML has that JSON doesn't is a document end marker (a line consisting of just `...`) that allows you to put multiple documents into the same byte stream. So, if you insert a YAML document end marker between your JSON documents, you have a valid stream consisting of multiple YAML documents.
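A sketch of that scheme in Python, splitting on the marker by hand and parsing each chunk as plain JSON rather than pulling in a YAML library (a real YAML 1.2 parser would accept the stream as-is); the helper names are made up for illustration:

```python
import json

def write_stream(docs):
    # Terminate each JSON document with the YAML document end marker "...".
    return "".join(json.dumps(d) + "\n...\n" for d in docs)

def read_stream(stream):
    # A bare "..." line cannot occur inside these JSON documents (newlines
    # inside JSON strings are escaped as \n), so splitting on it is safe.
    return [json.loads(c) for c in stream.split("\n...\n") if c.strip()]

docs = [{"id": 1}, {"text": "spans\nlines"}]
assert read_stream(write_stream(docs)) == docs
```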