Data Exchange in Embedded Systems – JSON vs Protocol Buffers

Embedded Engineer

Dubravko Tuksar

Published Mar 15, 2023

Both JSON and Protocol Buffers can be used to exchange data between systems. The main difference between the two is that JSON is just text, while Protocol Buffers are binary.

This difference can have a significant impact on the performance and speed of information transfer between different devices. In this blog post, the speed of information transfer refers mainly to the size of the transmitted message, while the underlying protocol used to transmit and receive messages is a separate issue currently out of our scope. Basically, the smaller the message, the faster the transmission and reception.

However, the differences go beyond performance and speed of transfer, and another important aspect to consider is the ease of use. Depending on system requirements and priorities, either JSON or Protocol Buffers might prove to be the better option.

We’ll use the following message in C++ as the basis for our comparison:

#include <cstdint>

class TemperatureSensorReading
{
public:
    TemperatureSensorReading(std::uint32_t id, float value) : sensorId(id), sensorValue(value) {}

    std::uint32_t sensorId;
    float sensorValue;
};

// serialize_and_send() stands in for whichever serializer and transport are used.
TemperatureSensorReading sensorReading(32, 24.5f);
serialize_and_send(&sensorReading);

JSON

JSON is a human-readable data interchange format based on key-value pairs contained inside an object. A JSON message is typically a single root object that contains key-value pairs, and the values can themselves be objects. There are several data types in JSON, and the value of a key-value pair can be any one of them:

object

array

string

number

boolean

null

The key of a key-value pair must always be a string.

To send messages using JSON, we first need to create the data object. In C, this is a structure filled with values. The next step is serializing this object into a string. This string is then sent using the communication protocol of our choice.

To receive messages using JSON, we have to parse the received strings into objects that represent data in JSON messages. This means that in order to use JSON in your application, you only need a JSON parser that parses and serializes JSON messages.

Most popular programming languages have native or third-party support for JSON. For example, Python ships a JSON parser in its standard library, while C and C++ rely on third-party libraries.
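The send and receive steps described above can be sketched with Python's standard json module. This is a minimal illustration rather than production code: the dictionary stands in for the data object, and the variable names are made up for the example.

```python
import json

# Sender side: build the data object, then serialize it into a string.
reading = {"sensorId": 32, "sensorValue": 24.5}
wire_text = json.dumps(reading)

# ... wire_text would be transmitted over the link of your choice ...

# Receiver side: parse the received string back into an object.
received = json.loads(wire_text)
print(received["sensorId"], received["sensorValue"])
```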

If we want to use JSON to send the message from our example, we’ll first have to serialize the object representing the message into a JSON-formatted string. This string is then transmitted. If we use a logic analyzer to listen for data on the transmission lines, we’ll see the following string being transmitted:

{"sensorId": 32,"sensorValue": 24.5}

This string is 36 characters long, but the information content of the string is only 6 characters long. This means that about 17% of transmitted data is actual data, while the rest is metadata. The share of useful data in the message grows when keys get shorter or values get larger, for example, when a value is a string or an array.
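The character count can be checked with a few lines of Python. This is only a sketch: the separators argument is chosen so that the output matches the string shown above exactly, and the variable names are illustrative.

```python
import json

# The reading from the example, as a plain dictionary.
reading = {"sensorId": 32, "sensorValue": 24.5}

# separators=(",", ": ") reproduces the exact spacing of the string above.
message = json.dumps(reading, separators=(",", ": "))

# Count only the characters that carry the values themselves.
payload_chars = sum(len(json.dumps(value)) for value in reading.values())

ratio = payload_chars / len(message)
print(message)
print(f"{payload_chars} of {len(message)} characters are data ({ratio:.1%})")
```

Passing separators=(",", ":") instead drops the space after each colon and shortens the message to 34 characters, which is an easy saving when nobody needs to read the output by eye.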

Protocol Buffers

Protocol Buffers are Google’s mechanism for serializing structured data that uses a binary format to transfer messages. Because of their small memory footprint, Protocol Buffer messages can also be used for data storage, especially on devices with limited memory, like embedded systems.

For example, if an embedded system needs to buffer messages in the flash or EEPROM memory because the connection is down, it’s convenient to store Protocol Buffer messages right in that memory. There’s no need to convert them into more basic data structures as they are already designed to be compact.

Using Protocol Buffers in your code is slightly more complicated than using JSON. With JSON, all we need to do is include a library, unless of course we are writing the parser ourselves, while Protocol Buffers require a couple more steps. The user must first define a message using the .proto file. This file is then compiled using Google’s protoc compiler, which generates source files that contain the Protocol Buffer implementation for the defined messages.

This is how our message would look in the .proto definition file:

message TemperatureSensorReading {
    optional uint32 sensor_id = 1;
    optional float sensor_value = 2;
}

Google supports most common languages like Python, Java, and C++. Unfortunately, there is no official support for C even though it’s a language widely used for embedded development, but there are third-party implementations that offer Protocol Buffers for C.

When we serialize the message from our example, it’s only 7 bytes long. This can be confusing at first because we would expect uint32 and float to be 8 bytes long when combined. However, Protocol Buffers won’t use all 4 bytes for uint32 if they can encode the data in fewer bytes. In this example, the sensor_id value can be stored in 1 byte. It means that in this serialized message, 1 byte is metadata for the first field, and the field data itself is only 1 byte long. The remaining 5 bytes are metadata and data for the second field; 1 byte for metadata and 4 bytes for data because float always uses 4 bytes in Protocol Buffers. This gives us 5 bytes or 71% of actual data in a 7-byte message.
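To make the 7 bytes visible, here is a hand-rolled Python sketch of the wire encoding for this one message. It is a stand-in for the code protoc would generate, not a replacement for it, and the helper names (encode_varint, encode_tag, encode_reading) are invented for this illustration.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """The tag packs the field number and the wire type into one varint."""
    return encode_varint((field_number << 3) | wire_type)


WIRE_VARINT = 0   # wire type used by uint32
WIRE_FIXED32 = 5  # wire type used by float


def encode_reading(sensor_id: int, sensor_value: float) -> bytes:
    """Illustrative encoder for the TemperatureSensorReading message."""
    return (
        encode_tag(1, WIRE_VARINT) + encode_varint(sensor_id)
        + encode_tag(2, WIRE_FIXED32) + struct.pack("<f", sensor_value)
    )


encoded = encode_reading(32, 24.5)
print(encoded.hex(" "))  # 08 20 15 00 00 c4 41
print(len(encoded))      # 7
```

The bytes 08 and 15 are the two metadata bytes, 20 is the sensor ID, and the last four bytes are the little-endian float.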

Field types

The field can have any of the data types from the list below. These data types determine how the data is encoded, or in other words, how many bytes the specified field takes up in the encoded message.

float, double

int32, int64

uint32, uint64

sint32, sint64

fixed32, fixed64

sfixed32, sfixed64

bool

string

bytes

enum

Each of the above types is best suited for a certain type of data. For example, both int32 and sint32 can encode signed integers, but int32 always spends 10 bytes on a negative value, while sint32 uses ZigZag encoding that keeps small negative numbers short. If a field is likely to hold negative values, it is better to use sint32 or sint64. If the values are almost always non-negative, int32 or int64 is the better fit, because ZigZag encoding makes positive values slightly larger. The enum type allows us to specify a list of possible values.
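The difference between int32 and sint32 is easiest to see by counting bytes. The sketch below computes encoded value sizes by hand in Python. The function names are made up for this example, and it counts only the value bytes, not the field tag.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a varint."""
    size = 1
    while value > 0x7F:
        value >>= 7
        size += 1
    return size


def int32_size(value: int) -> int:
    """int32: a negative value is sign-extended to 64 bits before encoding."""
    return varint_size(value & 0xFFFFFFFFFFFFFFFF)


def sint32_size(value: int) -> int:
    """sint32: ZigZag first maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ..."""
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return varint_size(zigzag)


for value in (1, -1, 100, -300):
    print(value, int32_size(value), sint32_size(value))
```

Here -1 costs 10 bytes as int32 and 1 byte as sint32, while 100 costs 1 byte as int32 and 2 bytes as sint32, which is the price ZigZag pays on positive values.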

Depending on the language used, the types in the generated code may also differ. For example, the bytes type maps to str in Python 2 and to bytes in Python 3. It maps to string in C++ and PHP, and to ByteString in Java.

Most of the above integer types are variable-length. This means that even if the type is named int32, it might not take 4 bytes. This is the property that enables Protocol Buffer messages to have a small memory footprint. The exceptions are fixed32, sfixed32, and float, which always take 4 bytes, and fixed64, sfixed64, and double, which always take 8 bytes.
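As a rough illustration of what variable-length means in practice, the following Python sketch prints how many bytes a uint32 value needs at different magnitudes and compares that with the constant 4 bytes of fixed32. The helper name is an assumption made for this example.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a varint."""
    size = 1
    while value > 0x7F:
        value >>= 7
        size += 1
    return size


FIXED32_SIZE = 4  # fixed32 always uses 4 bytes, whatever the value

for value in (1, 127, 128, 16_383, 16_384, 2**28 - 1, 2**28, 2**32 - 1):
    print(f"{value:>10}  uint32: {varint_size(value)}  fixed32: {FIXED32_SIZE}")
```

Each varint byte carries 7 bits of the value, so values of 2^28 and above need 5 bytes. For fields that usually hold such large numbers, fixed32 ends up smaller than uint32.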

Protocol Buffers allow for message nesting, which means that we can define one message and use it as a field type in a different one.

Field rules

These rules define how to fill the fields when creating the message. If we fill the fields in a different way from the one specified in the field rules, the message won’t be well-formed, and depending on the version of the encoder, encoding might fail. There are three types of field rules:

required

optional

repeated

The optional rule is the most common one because it offers the most flexibility. With this rule, messages can be sent without setting some of the fields. If we have a message with a number of fields and we rarely need to send all of them, we can fill only those with changing values while leaving the rest unfilled. This will, in turn, decrease the encoded message size.

Another use case for this rule is the situation when a field is no longer needed. Instead of updating the firmware, we can just stop sending that one field.

The required rule must be used carefully because a required field is difficult to change or remove once devices depend on it. A good candidate is a version field that you add when defining your own message. It lets you change the Protocol Buffer message structure later on and still receive old versions of a message, because the field tells us which version the device is using. The repeated rule marks a field that can appear any number of times, including zero, which is how lists of values are sent.

Field Numbers

Field numbers are used to identify fields in the received message since the message is in binary format. Once a field number is assigned to a field, it can’t be used for another field. If the field is removed, its field number should not be reused for another field.

The reasoning is that devices running older firmware may still send a message containing the removed field. As long as its number stays unused, the receiver treats it as an unknown field and skips it. If the number were reused, the old data would be decoded as the new field. The same behavior allows us to add new fields to a message and send it to a device without updating the firmware of the receiving device. The new fields will simply be ignored by the receiving device.
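The way a receiver ignores fields it does not know can be sketched with a small hand-written decoder in Python. This is an illustration under simplifying assumptions: it handles only the two wire types used by the example message, the function names are invented, and field number 3 is a hypothetical field added in a newer version of the message.

```python
import struct


def read_varint(data: bytes, pos: int):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_reading(data: bytes) -> dict:
    """Decode fields 1 and 2 and skip every other field number."""
    reading = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 5:  # 4 bytes, read as a little-endian float here
            (value,) = struct.unpack_from("<f", data, pos)
            pos += 4
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number == 1:
            reading["sensor_id"] = value
        elif field_number == 2:
            reading["sensor_value"] = value
        # Any other field number is unknown to this receiver and is dropped.
    return reading


old_message = bytes.fromhex("0820150000c441")
# The same message with a hypothetical field 3 (varint value 1) appended.
new_message = old_message + bytes.fromhex("1801")

print(decode_reading(old_message))
print(decode_reading(new_message))
```

Both calls produce the same dictionary, because the receiver reads past field 3 without storing it.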

Field numbers from 1 to 15 take 1 byte to encode and should be used for the most frequently used fields. Numbers from 16 to 2047 take 2 bytes and should be used for less frequent fields or very long ones, like long strings, where the one extra byte doesn’t contribute much to the overall message size. This keeps the one-byte field numbers free for frequently used fields we might need in the future.
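These boundaries follow from how the field number is stored: it shares a varint with a 3-bit wire type, so only 4 bits of field number fit in the first byte. A short Python sketch, with a helper name chosen for this example, confirms the limits:

```python
def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Bytes needed for the tag (field number plus 3-bit wire type) as a varint."""
    tag = (field_number << 3) | wire_type
    size = 1
    while tag > 0x7F:
        tag >>= 7
        size += 1
    return size


for field_number in (1, 15, 16, 2047, 2048):
    print(field_number, tag_size(field_number))
```

Field numbers 1 and 15 need 1 byte, 16 and 2047 need 2 bytes, and 2048 is the first number that needs 3.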

JSON vs Protocol Buffers

When comparing the use of JSON and Protocol Buffers for communication purposes, we can observe some key differences. Getting started with JSON is simpler than with Protocol Buffers. To start using JSON, just download the library and start using it in the code. Protocol Buffers are more complex at first because they require us to define messages in .proto files, install the protoc compiler, compile these messages to generate headers and sources, and import libraries like nanopb for C. On the other hand, serialized JSON messages take up more space than the equivalent encoded Protocol Buffers messages.

Protocol Buffers offer a way to convert JSON to a Protocol Buffer message and vice versa. This allows for more flexibility when dealing with devices that communicate with each other, some using JSON and some using Protocol Buffers.

If we have a battery-powered embedded device, server, and web client, we can use Protocol Buffers for communication between the embedded device and the server, then convert the Protocol Buffer message to JSON on the server side, and then use it in communication between the server and the web client.

If we assume that the server application is written in Python and uses the message defined above, then we can receive a message, convert it into JSON, and save it in the NoSQL database, where we can later read it when a web client requests data.

# Import the Protocol Buffer <-> JSON conversion package.
from google.protobuf import json_format
# This module is generated by protoc from the .proto file.
import temperature_sensor_reading_pb2
...
while True:
    # Receive a Protocol Buffer message from the sensor.
    temperatureSensorReading = temperature_sensor_reading_pb2.TemperatureSensorReading()
    if receive_message(temperatureSensorReading):
        # Convert the message to JSON.
        temperatureSensorReadingJSON = json_format.MessageToJson(temperatureSensorReading)

        # Save the JSON to the NoSQL database.
        save_to_database(temperatureSensorReadingJSON)
...
def GET_Handler():
    # Read from the NoSQL database directly in JSON format.
    temperatureSensorReadingJSON = read_from_database()
    # Send the response.
    send_resp(temperatureSensorReadingJSON)
...

The choice is yours

Since both formats are valid options, the choice between JSON and Protocol Buffers will depend on the use case. We should first think about the needs of our embedded device that communicates with the app or the backend, such as how much data is transmitted and what kind of data it is.

With embedded devices that do not communicate heavily with applications or the backend, it doesn’t matter much which format you choose since only a small portion of processing power is diverted to the communication itself. Of course, there can be exceptions, like when the device is battery-powered and communicates using some wireless method. In that case, it would be preferable to use Protocol Buffers to minimize radio transmission time, as this operation can consume a significant amount of energy.