Protocol Framing: Teaching Mobile Engineers to Speak Firmware
Making sense of headers, delimiters, and byte arrays when mobile apps talk to IoT devices

Seasoned Software Engineer with 11 years of experience specializing in native iOS development using Swift, SwiftUI and UIKit. Additional expertise in cross-platform mobile development with React Native and backend API development using Django REST Framework. Proficient in Swift, JavaScript, and Python.
Throughout my career, I have balanced roles as a team lead, mentor, code architect, individual contributor and solo developer. I have contributed to products across diverse environments, from early-stage startups to publicly listed companies, spanning industries such as AI, IoT, Travel & Hospitality, Ride Hailing, Navigation, E-commerce and Streaming.
Currently, I am exploring the emerging fields of AI and AR/VR by building Generative AI and visionOS applications as personal projects.
Introduction
When I was first tasked with building communication between an iOS application and an electric scooter over Bluetooth, I thought it would be straightforward. My plan was simple: use Apple’s Core Bluetooth framework to pair with the scooter and just send string commands like "lock" and "unlock". Easy, right?
Well, not quite.
Very quickly, I was pulled into the world of message encryption, authentication, and byte array communication - the nitty-gritty of IoT protocols. What I expected to be simple string messages turned out to be arrays of strange-looking numbers. To make sense of them, I had to revisit the foundations of software engineering and refresh my understanding of the binary world of communication.
It didn’t stop there. Working with these number arrays brought its own set of challenges. To me, they looked like a jumble of numbers. But to the firmware engineer, each array was a structured packet with strict rules. That disconnect led to hours of frustrating debugging, only to discover that the problem was a basic byte-ordering issue.
That’s when it hit me: understanding these fundamentals isn’t optional, it’s essential. Protocol framing and packet structure are the foundation of all communication protocols, whether in IoT, networking, or operating systems. Without this understanding, you’re bound to hit confusing roadblocks. You send what looks like perfectly valid data to a smart lock, a fitness tracker, or a connected lightbulb, only to have the device ignore you, or worse, respond in unexpected ways.
As mobile developers, we often work with frameworks that hide the complexities of networking. That abstraction is convenient, but it becomes a trap when we enter the world of IoT, where we may need to design custom protocols or work directly with firmware engineers. Here, the device only speaks the language of binary, and without understanding framing, we’re essentially speaking gibberish.
That is why learning protocol framing is so important when stepping into IoT. Framing is the grammar that turns raw bytes into meaningful conversations. Think of it as the difference between throwing random words at someone versus speaking in complete, structured sentences. Without proper framing, your carefully crafted commands are just noise. With it, you gain the ability not only to use byte arrays from documentation but also to design, debug, and reason about communication protocols like a firmware engineer.
Binary and Hexadecimal: The Native Tongues of Bytes
Before we dive into packet structures, let's establish our foundation. All digital communication, whether it's your React Native app talking to a server or your Swift code sending commands over Bluetooth, ultimately boils down to ones and zeros flowing through circuits.

Binary Basics: The Foundation of Everything Digital
Binary is the language of electronics. A single bit can be either 0 (off/low voltage) or 1 (on/high voltage). Group eight bits together, and you have a byte, the standard unit of data we work with. Here's what the number 202 looks like in different representations:
Binary: 11001010 (8 bits = 1 byte)
Decimal: 202 (what we humans prefer)
Hex: 0xCA (what engineers prefer)
Why does this matter? When your mobile app sends a "lock" command to a smart lock, it's not sending the string "lock". It's sending a specific pattern of bits that the firmware interprets as the lock command. For example, you and your firmware engineer might agree that the decimal number 1 (00000001 in binary, 0x01 in hex) means "lock". Understanding this helps you debug when things go wrong.
Hexadecimal: The Engineer's Shorthand
At the lowest level, computers speak only in binary - long streams of 0s and 1s. While machines thrive on that, humans don’t. Reading or debugging a binary sequence like 11001010 quickly becomes overwhelming.
That’s where hexadecimal (base 16) comes in. It acts as a shorthand, mapping neatly onto binary while remaining far more compact and readable:
1 hex digit = 4 bits (a nibble)
2 hex digits = 1 byte
0xCA = 1100 1010
C = 1100 (12 in decimal)
A = 1010 (10 in decimal)
This clean mapping makes hex perfect for representing byte arrays. When you see a BLE characteristic value like 0x0102030405, you're looking at 5 bytes of data. Compare these representations of the same 4-byte sequence:
Binary: 00000001 00000010 00000011 00000100 (hard to read!)
Decimal: [1, 2, 3, 4] (loses the "byte-ness")
Hex: [0x01, 0x02, 0x03, 0x04] (clean and clear)
The hex version is both shorter and perfectly aligned with byte boundaries, making it the preferred format when working with memory dumps, byte arrays, and packet data. This is why network analyzers like Wireshark, BLE debugging tools, and firmware documentation all prefer hex dumps. When you're debugging why your app can't communicate with a device, being comfortable reading hex will save you hours of frustration.
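To make these representations concrete, here is a minimal Swift sketch that prints a byte in binary and hex and renders a byte array as a hex dump. The helper names (hexString, binaryString, hexDump) are my own for illustration and are not part of any Apple framework.

```swift
// Format a byte as two uppercase hex digits, e.g. 202 -> "CA".
func hexString(_ byte: UInt8) -> String {
    let digits = String(byte, radix: 16, uppercase: true)
    return digits.count == 1 ? "0" + digits : digits
}

// Format a byte as eight binary digits, e.g. 202 -> "11001010".
func binaryString(_ byte: UInt8) -> String {
    let digits = String(byte, radix: 2)
    return String(repeating: "0", count: 8 - digits.count) + digits
}

// Render a byte array the way a hex dump would: "01 02 03 04".
func hexDump(_ bytes: [UInt8]) -> String {
    return bytes.map(hexString).joined(separator: " ")
}

let value: UInt8 = 202
print(binaryString(value))                 // 11001010
print(hexString(value))                    // CA
print(hexDump([0x01, 0x02, 0x03, 0x04]))   // 01 02 03 04
```

A helper like hexDump is worth keeping around in any BLE project: logging every outgoing and incoming value in this form makes it easy to compare against what the firmware engineer sees on their side.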
Byte Array Communication: Speaking in Packets
At its core, IoT communication is about sending sequences of bytes (byte arrays) between devices. These aren't random bytes; they're carefully structured packets that both sender and receiver understand.
Let's start with a simple example. Imagine you're building an app to control a smart lock over BLE. The lock might expect commands like this:
Lock command: [0x01, 0x02, 0x03, 0x04]
Unlock command: [0x05, 0x06, 0x07, 0x08]
Status query: [0x09, 0x0A, 0x0B, 0x0C]
Your Swift code might look something like this:
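import CoreBluetooth

// Sending a lock command.
// `peripheral` is a connected CBPeripheral and `characteristic` is the
// writable CBCharacteristic found during service discovery.
let lockCommand: [UInt8] = [0x01, 0x02, 0x03, 0x04]
peripheral.writeValue(Data(lockCommand),
                      for: characteristic,
                      type: .withResponse)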
This works fine for simple commands, but what happens when you need to send more complex data? What if you want to update the lock's firmware, send a schedule for automatic locking, or transmit user access codes? Suddenly, you need more structure. You need protocol framing.
Other IoT devices follow similar patterns. A smart bulb might expect color commands as [0xC0, R, G, B] where 0xC0 identifies it as a color command followed by RGB values. A fitness tracker might send heart rate data as [type, timestamp_byte1, timestamp_byte2, bpm], where the first byte is an agreed-upon identifier for heart rate messages. The key insight is that these aren't arbitrary; they follow agreed-upon structures.
Why Protocol Framing is Needed
Without protocol framing, byte streams are like receiving a book with no punctuation, spaces, or paragraph breaks. Imagine trying to read: "thequickbrownfoxjumpsoverthelazydog" versus "The quick brown fox jumps over the lazy dog." The same principle applies to byte communication.
Protocol framing ensures that both sender and receiver agree on three critical things: where messages start, where they end, and what structure they follow. Without this agreement, the receiver can't distinguish between one command ending and another beginning, can't verify if data arrived correctly, and can't handle variable-length data efficiently.
Framing Strategies with Examples
Different scenarios call for different framing strategies. Let me walk you through the four main approaches you'll encounter:
1. Fixed-Length Framing
Every message is exactly the same size. Simple but inflexible.
// Every command is exactly 4 bytes
Lock: [0x01, 0x00, 0x00, 0x00]
Unlock: [0x02, 0x00, 0x00, 0x00]
Status: [0x03, 0x00, 0x00, 0x00]
This works great for simple commands but falls apart when you need to send variable data like strings or firmware updates.
2. Length-Based Framing
Include the message length at the beginning.
// [Length][Type][Data...]
Small command: [0x04, 0x01, 0xAA, 0xBB] // 4 bytes total
Large command: [0x10, 0x02, 0xAA, 0xBB, 0xCC, ...] // 16 bytes total
This is extremely common in IoT because it handles variable-length data elegantly. The receiver reads the length byte first, then knows exactly how many more bytes to expect.
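Here is a rough sketch of what the receiving side of length-based framing can look like. It assumes, as in the example above, that the length byte counts the whole frame including itself; the function name extractFrames is hypothetical, and a real protocol may define the length differently.

```swift
// Pull every complete [Length][Type][Data...] frame out of a receive buffer.
// Assumption: the length byte counts the whole frame, including itself.
// Bytes belonging to an incomplete frame stay in the buffer for the next call.
func extractFrames(from buffer: inout [UInt8]) -> [[UInt8]] {
    var frames: [[UInt8]] = []
    var offset = 0
    while offset < buffer.count {
        let length = Int(buffer[offset])
        // A valid frame needs at least the length byte and a type byte.
        if length < 2 {
            // Malformed length: discard the rest rather than loop forever.
            offset = buffer.count
            break
        }
        // Stop if the rest of this frame has not arrived yet.
        if offset + length > buffer.count { break }
        frames.append(Array(buffer[offset..<(offset + length)]))
        offset += length
    }
    buffer.removeFirst(offset)
    return frames
}

// Two frames arrive back to back, followed by the first part of a third.
var rx: [UInt8] = [0x04, 0x01, 0xAA, 0xBB, 0x03, 0x02, 0xCC, 0x05, 0x01]
let frames = extractFrames(from: &rx)
print(frames)  // [[4, 1, 170, 187], [3, 2, 204]]
print(rx)      // [5, 1]
```

Keeping the leftover bytes in the buffer matters on BLE, where one notification may carry half a frame and the next one the remainder.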
3. Delimiter-Based Framing
Use special bytes to mark start and end.
// Using 0x7E as start/end delimiter
[0x7E, 0x01, 0x02, 0x03, 0x04, 0x7E]
This approach needs escape sequences if your data might contain the delimiter value, adding complexity but providing clear message boundaries.
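One way to handle escaping is byte stuffing. The sketch below borrows the convention used by HDLC-style framing (escape byte 0x7D, escaped value XORed with 0x20); that choice is my assumption for illustration, and your device's protocol may define its own escape rules. The names encodeFrame and decodeFrame are hypothetical.

```swift
let frameDelimiter: UInt8 = 0x7E
let escapeByte: UInt8 = 0x7D
let escapeMask: UInt8 = 0x20

// Wrap a payload in delimiters, escaping any delimiter/escape bytes inside it.
func encodeFrame(_ payload: [UInt8]) -> [UInt8] {
    var frame: [UInt8] = [frameDelimiter]
    for byte in payload {
        if byte == frameDelimiter || byte == escapeByte {
            frame.append(escapeByte)
            frame.append(byte ^ escapeMask)
        } else {
            frame.append(byte)
        }
    }
    frame.append(frameDelimiter)
    return frame
}

// Reverse of encodeFrame. Returns nil if the frame is malformed.
func decodeFrame(_ frame: [UInt8]) -> [UInt8]? {
    guard frame.count >= 2,
          frame.first == frameDelimiter,
          frame.last == frameDelimiter else { return nil }
    var payload: [UInt8] = []
    var escaping = false
    for byte in frame[1..<(frame.count - 1)] {
        if escaping {
            payload.append(byte ^ escapeMask)
            escaping = false
        } else if byte == escapeByte {
            escaping = true
        } else if byte == frameDelimiter {
            return nil  // unescaped delimiter inside the frame
        } else {
            payload.append(byte)
        }
    }
    // A trailing escape byte with nothing after it is also malformed.
    return escaping ? nil : payload
}

let encoded = encodeFrame([0x01, 0x7E, 0x02])
print(encoded)  // [126, 1, 125, 94, 2, 126]
```

The payload byte 0x7E is sent as 0x7D 0x5E, so the only unescaped 0x7E values on the wire are the real frame boundaries.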
4. Hybrid Framing
Combine multiple strategies for robustness.
// [Start][Length][Type][Data...][CRC][End]
// Length 0x08 counts the bytes from the length field through the CRC
[0x7E, 0x08, 0x01, 0xAA, 0xBB, 0xCC, 0xDD, 0x45, 0x67, 0x7E]
This is what you'll often see in production IoT protocols. It provides multiple ways to verify message integrity and boundaries.
BLE Small Commands vs Firmware Updates
The difference between sending a simple "unlock" command and performing a firmware update illustrates why flexible framing matters. A lock command might be just 4 bytes and fit in a single BLE packet. But a firmware update could be 100KB or more, requiring sophisticated framing to handle chunking, sequencing, acknowledgments, and error recovery.
For firmware updates, you might see a protocol like:
// Start transfer command
[0xFF, 0x01, total_size_4_bytes, chunk_size_2_bytes]
// Each chunk
[0xFF, 0x02, chunk_number_2_bytes, chunk_data..., crc_2_bytes]
// End transfer command
[0xFF, 0x03, total_crc_4_bytes]
This structure allows the firmware to verify each chunk individually, request retransmission of corrupted chunks, and verify the entire update at the end.
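As a sketch of the chunking step on the app side, the function below splits a firmware image into numbered chunk packets following the [0xFF, 0x02, chunk_number, chunk_data...] layout. It assumes the chunk number is big-endian and leaves out the per-chunk CRC to stay short; makeChunkPackets is a hypothetical name, not an API from any SDK.

```swift
// Split a firmware image into [0xFF, 0x02, chunkNumber (2 bytes), data...] packets.
// Assumptions: chunk numbers are big-endian and start at 0; the per-chunk CRC
// from the layout is omitted in this sketch.
func makeChunkPackets(image: [UInt8], chunkSize: Int) -> [[UInt8]] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var packets: [[UInt8]] = []
    var chunkNumber: UInt16 = 0
    var start = 0
    while start < image.count {
        let end = min(start + chunkSize, image.count)
        var packet: [UInt8] = [0xFF, 0x02]
        packet.append(UInt8(chunkNumber >> 8))     // high byte first
        packet.append(UInt8(chunkNumber & 0xFF))   // then low byte
        packet.append(contentsOf: image[start..<end])
        packets.append(packet)
        chunkNumber &+= 1
        start = end
    }
    return packets
}

let image: [UInt8] = (0..<10).map { UInt8($0) }  // stand-in for a firmware binary
let packets = makeChunkPackets(image: image, chunkSize: 4)
for packet in packets { print(packet) }
// [255, 2, 0, 0, 0, 1, 2, 3]
// [255, 2, 0, 1, 4, 5, 6, 7]
// [255, 2, 0, 2, 8, 9]
```

In practice the chunk size is bounded by the negotiated BLE MTU, and the last chunk is usually shorter than the rest, as the third packet shows.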
Protocol Framing Beyond IoT: The OSI Layer View
Protocol framing isn't unique to IoT; it's the backbone of all network communication. Understanding how framing works across the networking stack helps you see the bigger picture and apply these concepts more broadly.
In the OSI model, each layer adds its own framing around the data from the layer above. It's like Russian nesting dolls, where each layer wraps the previous one with its own structure:
Data Link Layer (Ethernet) creates frames that look like this:
[Preamble][Destination MAC][Source MAC][Type][Payload][CRC]
Network Layer (IP) adds packet structure:
[Version][Header Length][Type of Service][Total Length][ID][Flags][Fragment Offset]
[TTL][Protocol][Header Checksum][Source IP][Destination IP][Payload]
Transport Layer (TCP) adds its own headers:
[Source Port][Destination Port][Sequence Number][Acknowledgment Number]
[Data Offset][Flags][Window][Checksum][Urgent Pointer][Payload]
Application Layer (HTTP) uses text-based framing:
POST /data HTTP/1.1\r\n
Host: example.com\r\n
Content-Length: 28\r\n
\r\n
{"temp": 25, "status": "ok"}
Notice how that Content-Length header in HTTP is doing exactly the same job as a length field in IoT framing? Whether it's BLE, TCP/IP, or HTTP, all communication boils down to protocol framing - defining message boundaries and fields so both sides agree on the structure.
When your iOS app makes an API call, it's actually creating frames at multiple levels. Your JSON payload gets wrapped in HTTP framing, which gets wrapped in TCP segments, which get wrapped in IP packets, which get wrapped in Ethernet frames. Each layer's framing serves a specific purpose, from routing (IP) to reliability (TCP) to application semantics (HTTP).
What Can Go Inside a Packet?
Now that we understand why framing matters, let's explore the components you'll commonly find in IoT packets. Think of these as the building blocks you can combine to create robust communication protocols:
Delimiters mark the boundaries of your message. Common choices include 0x7E (used in many serial protocols) or 0xAA55 (a pattern that's easy to spot in hex dumps).
Headers contain metadata about the message: things like protocol version, message type, or device ID. Headers help receivers quickly determine how to process the incoming data.
Length fields tell the receiver how many bytes to expect. This can be the total packet length or just the payload length, depending on your protocol design.
Type fields identify what kind of message this is. Is it a command, a response, a notification? Your firmware can quickly route messages to the right handler based on this field.
Nonces are random numbers that ensure each packet is unique, preventing replay attacks (we'll dive deeper into this soon).
Sequence numbers maintain message ordering and detect missing packets. Critical when you're sending multi-part messages or need guaranteed delivery.
Timestamps provide temporal context and can detect stale messages. Especially important for time-sensitive commands or sensor readings.
Key IDs or Session IDs specify which encryption key or session context to use. Essential when managing multiple simultaneous connections or rotating keys.
Payloads carry your actual data: the temperature reading, the lock command, the firmware chunk.
CRCs and Checksums detect transmission errors (but don't provide security).
MACs and Authentication Tags provide both integrity checking and authentication (cryptographically secure).
Here's what a complete packet might look like with all these components:
[Start: 0x7E] // Delimiter
[Length: 0x0B] // 11 bytes follow, up to the end delimiter
[Type: 0x01] // Command type: Lock
[Nonce: 0x1234ABCD] // Replay protection
[Sequence: 0x0001] // First message
[Payload: 0xAABB] // Actual command data
[Auth Tag: 0xDEAD] // Cryptographic verification
[End: 0x7E] // End delimiter
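To show how such a layout can be assembled in code, here is a minimal sketch of a packet type that serializes these fields. The LockPacket type and its field names are hypothetical, multi-byte fields are assumed to be big-endian, and the length is assumed to count every byte between the length field and the end delimiter.

```swift
struct LockPacket {
    var commandType: UInt8
    var nonce: UInt32
    var sequence: UInt16
    var payload: [UInt8]
    var authTag: [UInt8]

    // Serialize as [Start][Length][Type][Nonce][Sequence][Payload][Auth Tag][End].
    // Assumptions: big-endian multi-byte fields; Length counts the bytes
    // between the length field and the end delimiter.
    func serialized() -> [UInt8] {
        var body: [UInt8] = [commandType]
        body.append(UInt8(nonce >> 24))
        body.append(UInt8((nonce >> 16) & 0xFF))
        body.append(UInt8((nonce >> 8) & 0xFF))
        body.append(UInt8(nonce & 0xFF))
        body.append(UInt8(sequence >> 8))
        body.append(UInt8(sequence & 0xFF))
        body.append(contentsOf: payload)
        body.append(contentsOf: authTag)
        precondition(body.count <= 255, "body does not fit a 1-byte length")

        var packet: [UInt8] = [0x7E, UInt8(body.count)]
        packet.append(contentsOf: body)
        packet.append(0x7E)
        return packet
    }
}

let packet = LockPacket(commandType: 0x01, nonce: 0x1234ABCD, sequence: 0x0001,
                        payload: [0xAA, 0xBB], authTag: [0xDE, 0xAD])
print(packet.serialized())
// [126, 11, 1, 18, 52, 171, 205, 0, 1, 170, 187, 222, 173, 126]
```

Computing the length from the body, rather than hard-coding it, removes one of the easiest mistakes to make when a field is added or resized later.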
Big Endian vs Little Endian: Byte Order Matters
Here's a classic source of mobile-to-firmware miscommunication that's bitten every IoT engineer at least once: endianness. When you need to send a number larger than one byte, you have to decide which byte goes first.
Endianness determines how multi-byte values are stored and transmitted. Consider the 32-bit hexadecimal number 0x12345678. In memory or on the wire, this needs to be broken into four bytes. But which byte comes first?
Big-endian (also called network byte order) puts the most significant byte first:
Value: 0x12345678
Bytes: [0x12][0x34][0x56][0x78]
↑MSB LSB↑
Little-endian puts the least significant byte first:
Value: 0x12345678
Bytes: [0x78][0x56][0x34][0x12]
↑LSB MSB↑
This matters enormously in protocol framing. Let's say you're sending a length field with the value 16 (0x0010 in hex) to firmware that reads the field as big-endian:
Sent big-endian: [0x00][0x10] = read as 16 decimal ✅
Sent little-endian: [0x10][0x00] = read as 4096 decimal ❌
That's a huge difference! The firmware should expect 16 bytes, but your app just told it to expect 4096 bytes. The connection will likely time out or crash.
Here's a real-world IoT example. Your app needs to send a Unix timestamp (1693849200) to schedule when a smart lock should engage:
// Timestamp: 1693849200 = 0x64F61670
// Big-endian (network byte order)
[0x64, 0xF6, 0x16, 0x70]
// Little-endian (common in embedded systems)
[0x70, 0x16, 0xF6, 0x64]
If your mobile app sends big-endian but the firmware expects little-endian, the firmware reads 0x7016F664 (1880553060). Instead of scheduling the lock for September 4, 2023, you would schedule it for a date in August 2029!
The rule of thumb: your protocol specification must clearly state the endianness for every multi-byte field. When in doubt, network protocols typically use big-endian (that's why it's called network byte order), while many embedded systems use little-endian (especially ARM-based microcontrollers). Always verify with your firmware engineer or check the documentation carefully.
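The following sketch shows both byte orders for a 32-bit value and what happens when the two sides disagree, using the timestamp 1693849200. The helper names are hypothetical; I build the bytes with explicit shifts so the order on the wire does not depend on the CPU the code runs on.

```swift
// Encode a 32-bit value most significant byte first (network byte order).
func bigEndianBytes(_ value: UInt32) -> [UInt8] {
    return [UInt8(value >> 24), UInt8((value >> 16) & 0xFF),
            UInt8((value >> 8) & 0xFF), UInt8(value & 0xFF)]
}

// Encode a 32-bit value least significant byte first.
func littleEndianBytes(_ value: UInt32) -> [UInt8] {
    return Array(bigEndianBytes(value).reversed())
}

// Decode four bytes, treating the first byte as the most significant.
func decodeBigEndian(_ bytes: [UInt8]) -> UInt32 {
    precondition(bytes.count == 4, "expected exactly 4 bytes")
    return bytes.reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
}

let timestamp: UInt32 = 1_693_849_200
print(bigEndianBytes(timestamp))     // [100, 246, 22, 112]
let wire = littleEndianBytes(timestamp)
print(wire)                          // [112, 22, 246, 100]

// The sender wrote little-endian, but the receiver decodes big-endian:
print(decodeBigEndian(wire))         // 1880553060, not 1693849200
```

When a command is silently ignored or a value looks wildly wrong, decoding the same bytes in the opposite order is a quick first check.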
Nonce: The Number Used Once
A nonce might sound like cryptographic jargon, but it solves a very real problem in IoT: replay attacks. Imagine you're unlocking your smart door lock with your phone. An attacker with a BLE sniffer captures your unlock command: [0x05, 0x06, 0x07, 0x08]. Later, they replay that exact same command to unlock your door. Without a nonce, the lock can't tell the difference between your legitimate command and the attacker's replay.
A nonce ensures each packet is unique, even if it contains the same command. Here's how it works in practice:
// Without nonce (vulnerable to replay)
Unlock command: [0x05, 0x06, 0x07, 0x08]
Attacker replays: [0x05, 0x06, 0x07, 0x08] // Works! 😱
// With nonce (replay protected)
First unlock: [0x05, nonce=0x12345678, 0x06, 0x07, 0x08]
Second unlock: [0x05, nonce=0x87654321, 0x06, 0x07, 0x08]
Attacker replays: [0x05, nonce=0x12345678, 0x06, 0x07, 0x08] // Rejected! 🛡️
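The firmware keeps track of used nonces and rejects any command with a repeated nonce. Some systems use incrementing counters (each nonce must be higher than the last), while others use random numbers with a cache of recently seen values. Note that this only works when the nonce is covered by the packet's authentication tag; otherwise an attacker could swap in a fresh nonce and replay the rest of the command.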
Nonces become even more critical in scenarios like garage door openers, car key fobs, or any system where unauthorized replay could cause security breaches. Modern car keys use rolling codes (a type of nonce system) where each button press uses a different code, preventing thieves from recording and replaying your unlock signal.
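The counter-based variant is simple enough to sketch in a few lines. This is only an illustration of the receiver-side check, written in Swift for readability even though real firmware would more likely be in C; ReplayGuard is a hypothetical name, and the sketch assumes the packet has already passed authentication before the nonce is checked.

```swift
// Tracks the highest nonce accepted so far and rejects anything not newer.
// Assumptions: the sender uses a strictly increasing counter as its nonce,
// and the packet was authenticated before accept(_:) is called.
struct ReplayGuard {
    private var lastAccepted: UInt32? = nil

    mutating func accept(_ nonce: UInt32) -> Bool {
        if let last = lastAccepted, nonce <= last {
            return false  // replayed or out-of-date packet
        }
        lastAccepted = nonce
        return true
    }
}

var replayGuard = ReplayGuard()
print(replayGuard.accept(1))  // true
print(replayGuard.accept(2))  // true
print(replayGuard.accept(1))  // false (replay of an earlier packet)
```

A counter needs only a few bytes of state on the device, which is why it is popular on small microcontrollers; the trade-off is that the counter has to survive reboots, or the device must re-synchronize after one.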
Checksums, CRCs, MACs, and Authentication Tags
These terms often get confused, and while they serve different purposes in protocol design, they all belong to the category of Integrity & Authenticity mechanisms in protocol framing. They’re different techniques, but they all exist to answer one critical question: “Has this message arrived intact and unaltered, and do I know it really came from the right sender?”
Why This Category Is Needed in Protocol Framing
Just like delimiters mark boundaries, or a nonce prevents replay, integrity/authenticity fields protect the packet against two main risks:
Accidental corruption: Bits can flip during transmission due to noise, interference, or hardware faults. Example: you send 0xCA, the receiver reads 0xC8.
Malicious tampering: Attackers might try to alter commands or inject fake packets. Example: a scooter firmware update packet is intercepted and modified.
Without integrity checks, the receiver has no way to distinguish between valid packets and corrupted/malicious ones.
How the Solutions Differ Inside the Category
Checksums are the simplest form of error detection. They're typically just the sum of all bytes in the message, possibly with some simple operation applied. IPv4 headers use a checksum where you add up 16-bit words and take the one's complement. Checksums catch basic transmission errors but are easy to forge:
Data: [0x01, 0x02, 0x03]
Checksum: 0x01 + 0x02 + 0x03 = 0x06
Packet: [0x01, 0x02, 0x03, 0x06]
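A minimal sketch of this additive checksum, including the receiver-side check, might look like the following. The function names are hypothetical, and the sum is assumed to wrap around at 8 bits.

```swift
// 8-bit additive checksum: sum of all bytes, keeping only the low 8 bits.
func checksum8(_ bytes: [UInt8]) -> UInt8 {
    return bytes.reduce(0) { $0 &+ $1 }
}

// Sender side: append the checksum to form the packet.
func withChecksum(_ bytes: [UInt8]) -> [UInt8] {
    return bytes + [checksum8(bytes)]
}

// Receiver side: recompute over everything except the last byte and compare.
func hasValidChecksum(_ packet: [UInt8]) -> Bool {
    guard let received = packet.last else { return false }
    return checksum8(Array(packet.dropLast())) == received
}

print(withChecksum([0x01, 0x02, 0x03]))             // [1, 2, 3, 6]
print(hasValidChecksum([0x01, 0x02, 0x03, 0x06]))   // true
print(hasValidChecksum([0x01, 0x02, 0x07, 0x06]))   // false
// Weakness: reordering bytes leaves the sum unchanged.
print(hasValidChecksum([0x02, 0x01, 0x03, 0x06]))   // true
```

The last line shows why a plain sum is weak even against accidents: two swapped bytes go completely unnoticed.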
CRCs (Cyclic Redundancy Checks) provide stronger error detection using polynomial division. Ethernet frames, Modbus, and many IoT protocols use CRC-16 or CRC-32. CRCs catch more types of errors than checksums but still aren't secure—anyone can calculate a valid CRC:
Data: [0x01, 0x02, 0x03, 0x04]
CRC-16: 0x8B47 (illustrative; the exact value depends on the CRC-16 variant)
Packet: [0x01, 0x02, 0x03, 0x04, 0x8B, 0x47]
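"CRC-16" is really a family of algorithms that differ in polynomial, initial value, and bit order, so both sides must agree on the exact variant. As an illustration, here is a bitwise sketch of one common variant, CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF). Picking this variant is my assumption; your device may well use a different one, such as the Modbus variant.

```swift
// Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
// no bit reflection, no final XOR.
func crc16CCITTFalse(_ bytes: [UInt8]) -> UInt16 {
    var crc: UInt16 = 0xFFFF
    for byte in bytes {
        crc ^= UInt16(byte) << 8
        for _ in 0..<8 {
            if crc & 0x8000 != 0 {
                crc = (crc << 1) ^ 0x1021
            } else {
                crc = crc << 1
            }
        }
    }
    return crc
}

// The ASCII string "123456789" is the conventional check input for CRCs.
let check = crc16CCITTFalse(Array("123456789".utf8))
print(String(check, radix: 16, uppercase: true))  // 29B1
```

Before wiring a CRC into your packet code, run it against the check input and a sample packet from the firmware team. A mismatch here almost always means the two sides picked different variants or a different byte order for the CRC field.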
MACs (Message Authentication Codes) use a secret key to create a cryptographic tag over the message. HMAC-SHA256 is common in protocols like IPsec and older TLS versions. Only someone with the key can create or verify a MAC, providing both integrity and authentication:
Data: [0x01, 0x02, 0x03, 0x04]
Key: "secret_key_shared_by_both_sides"
HMAC-SHA256: [32 bytes of cryptographic hash]
Authentication Tags are the modern approach, built into AEAD (Authenticated Encryption with Associated Data) algorithms like AES-GCM or ChaCha20-Poly1305. The tag is generated during encryption and verified during decryption, providing encryption, integrity, and authentication in one operation:
Plaintext: [0x01, 0x02, 0x03, 0x04]
After AES-GCM: [encrypted_data] + [16-byte auth tag]
When should you use each one? If you just need to catch accidental corruption over a reliable channel, a CRC is fine. But for any security-sensitive application (which is most IoT), you need cryptographic protection. Modern best practice is to use AEAD modes that give you everything in one package.
Encrypting the Payload: Keeping Secrets in IoT
In IoT, encryption isn't optional - it's essential. Your devices are broadcasting data over radio waves (BLE, WiFi, LoRa) that anyone nearby can intercept. Without encryption, attackers can eavesdrop on sensor data, replay commands, or inject malicious packets.
Modern IoT encryption typically uses one of these approaches:
AES-CBC + HMAC (the older, two-step approach): First encrypt with AES in CBC mode, then add an HMAC for authentication. This works but requires careful implementation to avoid vulnerabilities:
// Encrypt, then MAC the IV and ciphertext together
ciphertext = AES_CBC_Encrypt(key1, iv, plaintext)
mac = HMAC_SHA256(key2, iv || ciphertext)
packet = [iv][ciphertext][mac]
// Verify first, and decrypt only if the MAC matches
mac_check = HMAC_SHA256(key2, iv || ciphertext)
if (constant_time_equal(mac_check, mac)) {
    plaintext = AES_CBC_Decrypt(key1, iv, ciphertext)
}
AES-GCM (modern AEAD approach): Handles encryption and authentication in one operation, reducing the chance of implementation errors:
// Encrypt with authentication
[ciphertext, auth_tag] = AES_GCM_Encrypt(key, nonce, plaintext, associated_data)
packet = [nonce][associated_data][ciphertext][auth_tag]
// Decrypt and verify in one step
plaintext = AES_GCM_Decrypt(key, nonce, ciphertext, auth_tag, associated_data)
// Throws exception if authentication fails
ChaCha20-Poly1305 (mobile and IoT friendly): Faster than AES on devices without hardware AES acceleration, making it perfect for battery-powered IoT devices:
// The plaintext header is passed as associated data so the tag covers it
[ciphertext, tag] = ChaCha20_Poly1305_Encrypt(key, nonce, plaintext, header)
packet = [header][nonce][ciphertext][tag]
The encrypted packet structure typically looks like this:
[Header (plaintext)][Nonce (plaintext)][Encrypted Payload][Auth Tag]
The header and nonce remain unencrypted so the receiver knows how to decrypt, but they're still authenticated by the tag to prevent tampering.
Putting It All Together: Example Packet in Hex
Let's walk through a complete example packet, byte by byte, to see how all these concepts combine in practice. Imagine this is a command from your mobile app to a smart lock:
7E 00 0C 01 12 34 AB CD 00 01 DE AD BE EF 5A 7E
Let's decode this:
7E - Start delimiter (beginning of packet)
00 0C - Length field (12 bytes following, up to and including the CRC; big-endian)
01 - Command type (0x01 = lock command)
12 34 AB CD - Nonce (replay protection)
00 01 - Sequence number (packet #1, big-endian)
DE AD - Encrypted payload (actual lock parameters)
BE EF - Authentication tag (shortened here for readability; real tags are typically 16 bytes)
5A - CRC-8 over the preceding bytes (illustrative value)
7E - End delimiter (end of packet)
When your mobile app constructs this packet, it follows these steps:
First, it generates a random nonce (0x1234ABCD) to ensure this packet is unique. Then it increments its sequence counter to get 0x0001. The actual command data gets encrypted with the shared key, producing the ciphertext 0xDEAD and authentication tag 0xBEEF. All of this gets wrapped with delimiters and a CRC for additional error detection.
When the smart lock receives this packet, it reverses the process. It finds the start delimiter, reads the length to know how many bytes to expect, checks the CRC for transmission errors, verifies the nonce hasn't been seen before, checks the sequence number is in order, decrypts and authenticates the payload using the tag, and finally executes the lock command.
This might seem like a lot of overhead for a simple command, but each component serves a purpose. The delimiters help resynchronize if bytes are lost, the length field handles variable-sized commands, the nonce prevents replay attacks, the sequence number detects missing packets, the encryption keeps your commands private, the authentication tag prevents tampering, and the CRC catches any corruption that might slip through.
Conclusion
When you started reading this, byte arrays might have looked like mysterious sequences of numbers. Now you can see them as structured conversations between devices, complete with their own grammar and syntax rules.
Mobile engineers often see arrays of numbers where firmware engineers see structured messages with headers, payloads, and checksums. This difference in perspective can lead to frustrating debugging sessions and communication breakdowns between teams. But once you understand protocol framing - the delimiters that mark boundaries, the length fields that size messages, the nonces that prevent replays, and the authentication tags that ensure integrity - you become bilingual in the language of IoT communication.
The next time you're working with a firmware engineer on an IoT project, you'll speak the same language. When they mention "big-endian length fields" or "AEAD encryption with nonce rotation," you'll know exactly what they mean and why it matters. When something goes wrong (and it will), you'll be able to capture packets, decode them byte by byte, and pinpoint whether it's an endianness issue, a missing delimiter, or an authentication failure.
Protocol framing is the bridge between the mobile and embedded worlds. It transforms raw bytes into reliable, secure communication channels. Whether you're building a fitness tracker, a smart home system, or an industrial IoT solution, these concepts remain the foundation of device communication.
Remember, every successful IoT product is built on clear communication, not just between devices, but between the engineers who build them. Understanding protocol framing makes you a more effective collaborator and a more capable IoT developer. The packets flowing between your app and those devices aren't just data anymore; they're conversations you can understand, debug, and improve.





