8

In my UART communication I need to know the start byte and the stop byte of the message sent. The start byte is easy, but the stop byte not so much. I have implemented two stop bytes at the end of my message, namely \n and \r (10 and 13 decimal). UART transmits bytes with values 0-255, so how fail-safe is this? I can imagine, though with low probability, that my message might contain the values 10 and 13 one after the other when they are not the stop bytes.

Is there a better way to implement this?

Nick Alexeev
C. K.
  • 7
    To send arbitrary data you either have to go to using packets or byte stuffing. In your case the probability of the pattern appearing in a certain location is 1/65536. Which gets to 1 if you have a long enough random data stream. – Oldfart Apr 18 '19 at 10:36
  • 4
    Can you provide context please. Stop bits are part of UART communication but stop bytes? This sounds like a pure software issue and depends what has been agreed by the sender and receiver. – Warren Hill Apr 18 '19 at 10:39
  • If your implementation is text *only*, use a single null (0) terminator. Otherwise use a packet structure like @Oldfart mentions. If you're concerned about integrity, then consider [error correction](https://en.wikipedia.org/wiki/Error_detection_and_correction) – RamblinRose Apr 18 '19 at 10:39
  • @WarrenHill Yes, stop bits are part of the UART communication, but I will be sending a string of data, and I need to know when that string stops. Or rather, I will be sending many strings of data, and I need to know when each string stops. – C. K. Apr 18 '19 at 10:44
  • @RamblinRose my data are number values, that can easily be 0. Then a 0 would terminate the string, right? – C. K. Apr 18 '19 at 10:45
  • 2
    @MariusGulbrandsen if your data is truly arbitrary and not strictly text (think ASCII) then null termination will not work; you will have to implement a packet. – RamblinRose Apr 18 '19 at 10:47
  • 1
    @MariusGulbrandsen as this is strictly a processing/software question I suggest searching on StackOverflow, e.g. "UART packet" – RamblinRose Apr 18 '19 at 10:51
  • @RamblinRose I will do that, thanks. I was also thinking: I'm using a MCU to communicate UART to Bluetooth. I can program my BT module like a microcontroller with extra IO pins. Can I use an IO pin to set "start" when High and "stop" when Low? Is this a good implementation as well? – C. K. Apr 18 '19 at 10:54
  • Given your application is bottlenecked by BLE serial characteristic change notification, which is far, far slower than the UART linking the MCUs, escaping or hex encoding everything on the UART would be a simple solution. Of course the real best approach may be to use the Nordic BLE chip (especially one of its more recent descendants) to solve your entire problem, eliminating the need to communicate with a second processor by UART. – Chris Stratton Apr 18 '19 at 14:33
  • @RamblinRose: I have encountered text streams with null bytes. Perhaps byte 0x1C? – Joshua Apr 18 '19 at 15:24
  • @Joshua the reality is any value might be chosen as a stop byte IFF it will never occur in the stream as data. It really depends on the programmer to decide how to implement. A {[length][data][terminal]} format is a fundamental thing. – RamblinRose Apr 18 '19 at 15:49
  • 4
    BTW: The common practice is to put the carriage return _before_ the line feed: `"\x0D\x0A"`. – Adrian McCarthy Apr 18 '19 at 21:16
  • 3
    @AdrianMcCarthy I think the point of reversing it is to minimize the odds of it being a valid sequence. That said, two Windows line-endings in a row would give you `\r\n\r\n` which contains the `\n\r` sequence in the middle... – Mike Caron Apr 19 '19 at 16:05
  • 2
    I'd go back in history and use something that is proven and just works. Like the [XMODEM](https://en.wikipedia.org/wiki/XMODEM) protocol. It works. Anything that you try to create yourself is unlikely to work, unless you have a deep understanding of the subject or are extremely clever. – gnasher729 Apr 19 '19 at 11:43
  • 2
    Or [Kermit](https://en.wikipedia.org/wiki/Kermit_(protocol)). It is still in use today due to its simplicity. – Peter Mortensen Apr 19 '19 at 12:38
  • related: Serial protocol delimiting/synchronization techniques https://electronics.stackexchange.com/questions/186254/serial-protocol-delimiting-synchronization-techniques – davidcary Apr 23 '19 at 01:03

5 Answers

15

There are different ways to prevent this:

  • Make sure you never send a 10/13 combination in your regular messages (so only as stop bytes). E.g. to send 20 21 22 23 24 25:

20 21 22 23 24 25 10 13

  • Escape 10 and 13 (or all non-ASCII characters) with an escape character, e.g. 1b. So to send 20 21 10 13 25 26, send the following (credit to Dan W's comment):

20 21 1b 10 1b 13 25 26

  • Define a packet when sending messages. E.g. if you want to send the message 20 21 22 23 24 25, then first send the number of data bytes that follow, so the packet is:

< nr_of_data_bytes > < data >

If your messages are at most 255 bytes long (the largest value a single length byte can hold), send:

06 20 21 22 23 24 25

So you know after receiving 6 data bytes that is the end; you don't have to send a 10 13 afterwards. And you can send 10 13 inside a message. If your messages can be longer, you can use 2 bytes for the data size.
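The length-prefix idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `encode_frame` and `decode_frames` are made up for this example, the length field is a single byte, and the stream is assumed to start on a frame boundary with no lost bytes.

```python
def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with a single length byte (0-255)."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    return bytes([len(payload)]) + payload


def decode_frames(stream: bytes) -> list:
    """Split a stream of back-to-back frames into their payloads.

    Assumes the stream starts on a frame boundary and no byte was lost.
    """
    frames = []
    i = 0
    while i < len(stream):
        length = stream[i]
        end = i + 1 + length
        if end > len(stream):
            break  # incomplete frame: wait for more bytes
        frames.append(stream[i + 1:end])
        i = end
    return frames


# The payloads may freely contain 10 and 13 (0x0A, 0x0D).
stream = encode_frame(bytes([0x20, 0x0A, 0x0D, 0x23])) + encode_frame(b"\n\r")
print(decode_frames(stream))
```

Because the receiver counts bytes instead of looking for a terminator, no payload value needs special treatment. The price is that a single lost byte shifts every later length field, so some resynchronisation mechanism is usually added on top.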

Update 1: Another way of defining packets

Another alternative is to send commands which have a specific length and can have many variances, e.g.

10 20 30 (Command 10 which always has 2 data bytes)

11 30 40 50 (Command 11 which always has 3 data bytes)

12 06 10 11 12 13 14 15 (Command 12 + 1 byte for the number of data bytes that follow)

13 01 02 01 02 03 ... (Command 13 + 2 bytes for the length: 01 02 means 256 + 2 = 258 data bytes follow)

14 80 90 10 13 (Command 14 that is followed by an ASCII string ending with 10 13)
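A parser for such command-based packets could look roughly like the sketch below. It assumes the example values are hexadecimal, that the two-byte length is big-endian (as the 01 02 = 258 example suggests), and it leaves out the terminated-string command. The table names and `parse_message` are hypothetical.

```python
# Hypothetical command tables, mirroring the examples above.
FIXED_LENGTHS = {0x10: 2, 0x11: 3}       # command -> fixed number of data bytes
LENGTH_FIELD_SIZES = {0x12: 1, 0x13: 2}  # command -> size of the length field


def parse_message(buf: bytes):
    """Return (command, data, bytes_consumed), or None if buf is incomplete."""
    if not buf:
        return None
    command = buf[0]
    if command in FIXED_LENGTHS:
        header, length = 1, FIXED_LENGTHS[command]
    elif command in LENGTH_FIELD_SIZES:
        size = LENGTH_FIELD_SIZES[command]
        if len(buf) < 1 + size:
            return None  # length field not fully received yet
        header = 1 + size
        length = int.from_bytes(buf[1:header], "big")
    else:
        raise ValueError(f"unknown command 0x{command:02x}")
    end = header + length
    if len(buf) < end:
        return None  # data not fully received yet
    return command, buf[header:end], end


print(parse_message(bytes([0x12, 0x06, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15])))
```

The caller would keep appending received bytes to a buffer, call `parse_message`, and drop `bytes_consumed` bytes from the front whenever a message is returned.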

Update 2: Bad connection/byte losses

All of the above only work when the UART line is sending bytes correctly. If you want to use more reliable ways of sending, there are also many possibilities. Below are a few:

  1. Send a checksum within the packet (search for CRC: Cyclic Redundancy Check). If the CRC is OK, the receiver knows, with high probability, that the message arrived intact.
  2. If a message needs to be resent, an acknowledgement (ACK/reply) mechanism is needed: e.g. the sender sends something, the receiver gets corrupt data and replies with a NACK (not acknowledged), and the sender can then send again.
  3. Timeout: if the sender does not get an ACK or NACK in time, the message needs to be resent.

Note that all of the above mechanisms can be as simple or as complicated as you want (or need) them to be. When messages are resent, a mechanism for identifying messages is also needed (e.g. adding a sequence number to the packet).
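To make the checksum, ACK/NACK, and timeout ideas concrete, here is a small simulated sketch. Everything in it is an assumption for illustration: the packet layout (sequence number, length, payload, 16-bit CRC), the use of 0x06/0x15 as ACK/NAK reply bytes, and the `channel` callback that stands in for the real UART and returns `None` on a timeout. The CRC comes from Python's standard `binascii.crc_hqx`.

```python
import binascii

ACK, NAK = 0x06, 0x15  # reply bytes chosen for this sketch


def build_packet(seq: int, payload: bytes) -> bytes:
    """Layout: [seq][length][payload...][CRC high][CRC low]."""
    body = bytes([seq & 0xFF, len(payload)]) + payload
    return body + binascii.crc_hqx(body, 0xFFFF).to_bytes(2, "big")


def receive(packet: bytes) -> int:
    """Receiver side: reply ACK if the CRC matches, otherwise NAK."""
    body, crc = packet[:-2], int.from_bytes(packet[-2:], "big")
    return ACK if binascii.crc_hqx(body, 0xFFFF) == crc else NAK


def send_reliably(seq, payload, channel, max_tries=3):
    """Resend until an ACK arrives; returns the number of attempts used.

    `channel(packet)` returns the reply byte, or None on a timeout.
    """
    packet = build_packet(seq, payload)
    for attempt in range(1, max_tries + 1):
        if channel(packet) == ACK:
            return attempt
    raise TimeoutError("no ACK received")


def make_flaky_channel():
    """Simulated link: corrupts the 1st packet, loses the 2nd, delivers the 3rd."""
    calls = {"n": 0}

    def channel(packet):
        calls["n"] += 1
        if calls["n"] == 1:
            damaged = bytes([packet[0] ^ 0x01]) + packet[1:]
            return receive(damaged)  # CRC mismatch -> NAK
        if calls["n"] == 2:
            return None              # nothing came back -> timeout
        return receive(packet)

    return channel


print(send_reliably(7, b"\n\rdata", make_flaky_channel()))  # succeeds on attempt 3
```

The sequence number is what would let a real receiver discard a duplicate when its ACK got lost and the sender retransmitted; that duplicate check is not shown here.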

Michel Keijzers
  • 1
    "Make sure you never send a 10/13 combination in your regular messages (so only as stop bytes)." – you've not said how to send data which *does* include a 10/13 combination – you need to escape it. So "20 10 13 23 10 13" might be sent as "20 1b 10 1b 13 23" with 1b as your escape character. – Dan W Apr 18 '19 at 18:33
  • 1
    Note that using a length field as proposed, you’ll get in trouble when your serial link is bad and loses a single byte. Everything will go out of sync. – Jonas Schäfer Apr 18 '19 at 19:55
  • @DanW If you use the first one or 2 bytes as the number of data bytes, it does not matter if 10 or 13 are part of those data... So 20 10 13 23 10 13 can be send as 06 20 10 13 23 10 13 where 06 is the number of data bytes that follow. – Michel Keijzers Apr 18 '19 at 22:08
  • @MichelKeijzers - yes, but that’s the second solution you mention. Your first solution is missing an explanation of escape sequences to prevent the stop bytes being transmitted. – Dan W Apr 18 '19 at 22:25
  • Both approaches work, and are commonly used, but they have different advantages and disadvantages, which you could add if wanted, though it’s beyond what the OP asked for. – Dan W Apr 18 '19 at 22:27
  • @DanW Ah now I understand you ... yes, it is under the assumption that only ASCII characters are sent (otherwise the second option could be used, but your way would be another alternative. I will add it too and thanks for your input. – Michel Keijzers Apr 18 '19 at 22:40
14

How fail-safe is \n\r as stop bytes?

If you send arbitrary data -> probably not fail-safe enough.

A common solution is to use escaping:

Let's define that the characters 0x02 (STX - frame start) and 0x03 (ETX - frame end) need to be unique within the transmitted data stream. This way the start and the end of a message can be safely detected.

If one of these characters needs to be sent within the message frame, it is replaced by an escape character (ESC = 0x1b) followed by the original character plus 0x20.

Original character replaced by

0x02 -> 0x1b 0x22  
0x03 -> 0x1b 0x23  
0x1b -> 0x1b 0x3b  

The receiver reverses this process: any time it receives an escape character, that character is dropped and 0x20 is subtracted from the next character.

This only adds some processing overhead but is 100% reliable (assuming no transmission errors occur, which you could/should verify by additionally implementing a checksum mechanism).
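As a rough sketch of the escaping scheme described above, using the same STX, ETX, ESC values and the 0x20 offset (the function names `stuff_frame` and `unstuff_frame` are made up for this example):

```python
STX, ETX, ESC = 0x02, 0x03, 0x1B
RESERVED = {STX, ETX, ESC}
OFFSET = 0x20


def stuff_frame(payload: bytes) -> bytes:
    """Wrap payload in STX ... ETX, escaping any reserved byte inside it."""
    out = bytearray([STX])
    for b in payload:
        if b in RESERVED:
            out += bytes([ESC, b + OFFSET])  # e.g. 0x02 -> 0x1b 0x22
        else:
            out.append(b)
    out.append(ETX)
    return bytes(out)


def unstuff_frame(frame: bytes) -> bytes:
    """Reverse stuff_frame: drop each ESC and subtract 0x20 from the next byte."""
    if len(frame) < 2 or frame[0] != STX or frame[-1] != ETX:
        raise ValueError("not a complete STX ... ETX frame")
    out = bytearray()
    escaped = False
    for b in frame[1:-1]:
        if escaped:
            out.append(b - OFFSET)
            escaped = False
        elif b == ESC:
            escaped = True
        else:
            out.append(b)
    if escaped:
        raise ValueError("frame ends with a dangling escape character")
    return bytes(out)


payload = bytes([0x02, 0x41, 0x1B, 0x03])
frame = stuff_frame(payload)
print(frame.hex(" "))  # 02 1b 22 41 1b 3b 1b 23 03
```

After stuffing, 0x02 and 0x03 appear only at the frame boundaries, so a receiver that loses sync can simply wait for the next STX.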

Rev
  • 1
    Nice answer. The common escape character used for ASCII protocols was `'\x10'` DLE (Data Link Escape). Some of the Wikipedia pages suggest that DLE was often used in the opposite way: to say that the next byte was a control character rather than a data byte. In my experience, that's generally the opposite meaning for an escape. – Adrian McCarthy Apr 18 '19 at 21:23
  • 2
    One thing to watch out for here is that your worst case buffer size doubles. If memory is really tight that might not be the best solution. – TechnoSam Apr 18 '19 at 21:23
  • 1
    @Rev What's the rationale for adding 0x20 to the original character? Wouldn't the escaping scheme work without that just as well? – Nick Alexeev Apr 19 '19 at 21:02
  • 1
    @NickAlexeev: It is easier/faster to identify the actual frame boundaries if you remove any other occurrence of the reserved chars from the stream. That way, you can separate frame reception and frame parsing (including the un-escaping). This may be especially relevant if you have a very slow controller without FIFO and/or high data rates. So you can just copy the incoming bytes (between STX/ETX) into the frame buffer as they arrive, mark the frame as complete and do the processing with lower priority. – Rev Apr 21 '19 at 18:08
  • @TechnoSam: Good point. – Rev Apr 21 '19 at 18:08
5

You know, ASCII already has bytes for these functions.

  • 0x01 : start of heading -- start byte
  • 0x02 : start of text -- end headers, begin payload
  • 0x03 : end of text -- end payload
  • 0x04 : end of transmission -- stop byte
  • 0x17 : end of transmission block -- message continues in next block

It also has codes for various uses inside the payload.

  • 0x1b : escape (escape the next character -- use in payload to indicate next character is not one of the structure describing codes used in your protocol)
  • 0x1c, 0x1d, 0x1e, 0x1f : file, group, record, and unit separator, respectively -- used as simultaneous stop and start byte for parts of hierarchical data

Your protocol should specify the finest granularity of ACK (0x06) and NAK (0x15), so that negative acknowledged data can be retransmitted. Down to this finest granularity, it is wise to have a length field immediately after any (unescaped) start indicator and (as explained in other answer(s)) it is wise to follow any (unescaped) stop indicator with a CRC.

Eric Towers
  • I will be sending arbitrary data, I guess it might have been confusing to use "\n\r" in my question when I'm not sending ASCII data. Even though, I like this answer, it's very informative on sending ASCII over UART – C. K. Apr 19 '19 at 21:50
  • @MariusGulbrandsen : As long as your protocol establishes where payload is and which codes must be escaped in each payload section, you can send anything, not just text-ish data. – Eric Towers Apr 19 '19 at 22:57
4

UART is not fail-safe by its very nature - we are talking about 1960s technology here.

The root of the problem is that UART only syncs once per 10 bits, allowing a lot of gibberish to pass between those sync points, unlike for example CAN, which samples every individual bit multiple times.

Any double bit error occurring inside the data will corrupt a UART frame and pass undetected. Bit errors in start/stop bits may or may not get detected in the form of framing errors.

Therefore, no matter if you use raw data or packets, there is always a probability that bit flips caused by EMI result in unexpected data.

There exist numerous ways of "traditional UART quackery" to improve the situation ever so slightly. You can add sync bytes, sync bits, parity, double stop bits. You could add checksums that count the sum of all bytes (and then invert it - because why not) or you could count the number of binary ones as a checksum. All of this is widely used, wildly unscientific and has a high probability of missing errors. But this was what people did from the 1960s to the 1990s, and lots of weird things like these live on today.

The most professional way to deal with safe transmission over UART is to have a 16 bit CRC checksum at the end of the packet. Everything else isn't very safe and has a high probability of missing errors.
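A small comparison illustrates the difference between an inverted byte-sum checksum and a 16-bit CRC. This is a sketch, not a recommendation of a particular polynomial: it uses the CRC-CCITT variant that Python's standard `binascii.crc_hqx` provides, with 0xFFFF as an assumed initial value, and the helper names are made up.

```python
import binascii


def sum_checksum(data: bytes) -> int:
    """Inverted 8-bit sum of all bytes, as in many home-brewed protocols."""
    return (~sum(data)) & 0xFF


def crc16(data: bytes) -> int:
    """CRC-CCITT as computed by the standard library (initial value assumed)."""
    return binascii.crc_hqx(data, 0xFFFF)


good = bytes([0x20, 0x21, 0x22, 0x23])
swapped = bytes([0x21, 0x20, 0x22, 0x23])  # first two bytes exchanged

print(sum_checksum(good) == sum_checksum(swapped))  # True: the error goes unnoticed
print(crc16(good) == crc16(swapped))                # False: the CRC catches it
```

A plain sum cannot see reordered bytes or errors that cancel each other out, whereas a CRC-16 detects any error burst that fits within 16 bits.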

Then on the hardware level you can use differential RS-422/RS-485 to drastically improve ruggedness of the transmission. This is a must for safe transmission over longer distances. TTL level UART should only be used for on-board communication. RS-232 should not be used for any other purpose but backwards compatibility with old stuff.

Overall, the closer to the hardware your error detection mechanism is, the more effective it is. In terms of effectiveness, differential signals add the most, followed by checking for framing/overrun etc errors. CRC16 adds some, and then "traditional UART quackery" adds a little bit.

Lundin
  • 7
    This advice is fairly tangential - you haven't actually addressed the question asked. In particular, your proposed solutions may solve other problems, **but they do not solve the basic problem of the question on this page**, which is confusion between framing bytes and payload bytes. At most, your proposal would reject *valid* data embedding a framing byte due to CRC or similar failure, with no way to communicate such. – Chris Stratton Apr 18 '19 at 14:28
  • 3
    In fact, this answer makes it worse. The original had just data bytes and stop bytes. This adds a third category, CRC bytes. And as presented here, those can take on any value, including {10,13}. – MSalters Apr 18 '19 at 15:53
  • 1
    @MSalters: The CRC can be ASCII encoded hex to prevent this issue. Another trick that I've seen on RS485 is to set bit 7 on the start / address byte. – Transistor Apr 18 '19 at 16:03
  • Re *"CAN which samples every individual bit multiple times."*: The actual sampling of the bit value is only once per bit. What are you referring to here? Some kind of error checking, like by the sender? Clock synchronisation? – Peter Mortensen Apr 19 '19 at 12:33
  • The inverting of the checksum was done so that summing the entire block of data would result in a zero, which is a bit easier to code and a bit faster to execute. Also, CRC is much better than you make it out to be, look it up in the Wikipedia. – toolforger Apr 19 '19 at 22:04
  • @ChrisStratton The question asks how fail safe a certain UART protocol is, so it is pretty important to point out that no matter what you do, it is not particularly safe at all. Or else the OP might run off to use it for something mission-critical, as we've seen over and over again. Since the site is about _engineering_, we have a responsibility to point this out, since historically, most home-brewed UART protocols were based on quackery rather than probability theory. – Lundin Apr 23 '19 at 06:46
  • Well, you misunderstand this site. You are welcome to make *additional* recommendations in an answer, but it is **not an answer** if it does not address the question which was actually asked. And your posting *simply ignores* what the asker actually wanted to know. At a practical level, the poster's scheme *will not work at all* until they solve what they want to know. While many practical schemes *do not have to deal with* what you are trying to add. – Chris Stratton Apr 23 '19 at 06:47
  • @ChrisStratton Q: "How fail-safe is \n\r as stop bytes?" A: Not at all safe for any purpose. – Lundin Apr 23 '19 at 06:48
  • Well, unlike your proposal they actually work *most of the time*. Your proposal will not work at all, **because you ignore the actual issue the question is seeking to solve**. – Chris Stratton Apr 23 '19 at 06:50
  • @Transistor No, you cannot send a FCS as ASCII. The CRC polynomial is designed to meet the probability of various errors not only in the payload but in the FCS itself. If you ASCII-encode that, a single bit error might screw over the "hamming distance", so that a single bit error results in drastic changes of data. – Lundin Apr 23 '19 at 06:56
  • @PeterMortensen Well, once per bit edge and once per sync point. The SJW allowing it to differ a bit. – Lundin Apr 23 '19 at 06:57
0

... I can imagine, though low probability, that my message might contain the values "10 and 13" after each other when they are not the stop bytes.

A situation when a portion of data is equal to terminating sequence should be considered when designing the format of a serial data packet. Another thing to consider is that any character can get corrupted or lost during transmission. A start character, a stop character, a data payload byte, a checksum or CRC byte, a forward error correction byte aren't immune to corruption. The framing mechanism has to be able to detect when a packet has corrupt data.

There are several ways to approach all this.

I'm making the working assumption that packets are framed only with the serial bytes. Handshake lines aren't used for framing. Time delays aren't used for framing.

Send packet length

Send the length of the packet in the beginning, instead of [or in addition to] the terminating character at the end.

pros: Payload is sent in an efficient binary format.

cons: Need to know the packet length at the start of the transmission.

Escape the special characters

Escape the special characters when sending the payload data. This is already explained in an earlier answer.

pros: Sender doesn't need to know the length of the packet at the beginning of the transmission.

cons: Slightly less efficient, depending on how many payload bytes need to be escaped.

Payload data encoded such that it can't contain start and stop characters

The payload of the packet is encoded such that it can't contain the start or stop characters. Usually, this is done by sending numbers as their ASCII or Hex-ASCII representation.

pros: Human-readable with common terminal programs. No need for code to handle escaping. No need to know the length of the packet at the start of the transmission

cons: Lower efficiency. For one byte of payload data, several bytes are sent.
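The Hex-ASCII approach is short enough to sketch directly. The example assumes the asker's "\n\r" stop bytes as the terminator and sends two hex digits per payload byte; `encode_line` and `decode_stream` are hypothetical names.

```python
TERMINATOR = b"\n\r"  # the stop bytes from the question


def encode_line(payload: bytes) -> bytes:
    """Send each payload byte as two hex digits, then the terminator."""
    return payload.hex().upper().encode("ascii") + TERMINATOR


def decode_stream(stream: bytes) -> list:
    """Split on the terminator and turn each hex line back into raw bytes."""
    return [
        bytes.fromhex(chunk.decode("ascii"))
        for chunk in stream.split(TERMINATOR)
        if chunk
    ]


line = encode_line(bytes([10, 13, 0, 255]))
print(line)                 # b'0A0D00FF\n\r'
print(decode_stream(line))  # [b'\n\r\x00\xff']
```

Since the encoded payload only ever contains the characters 0-9 and A-F, the bytes 10 and 13 can no longer appear inside a message, at the cost of doubling the number of bytes on the wire.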

Nick Alexeev