gyx1000.dev

How flatbuffers saved my day

Context

When I was working for a company providing TV services, we were asked to provide an electronic program guide (EPG) to our customers for different media (Web, Set-Top-Box, Mobile). The EPG allowed users to browse a 14-day show list and over 300 TV channels using a 2D grid. The entire program was generated once a day, so there were no temporal changes during the day.

Example of data
{
 "channel_idx1": [
  {
   "id": "SHOW0001",
   "start": "2024-01-06T09:10:00.000Z",
   "stop": "2024-01-06T09:30:00.000Z",
   "display": "Les bisounours en vacances avec Alf"
  }
 ],
 "channel_idx2": [...],
 ...
}

Issues

1. Data volume

Transmitting 14 days of data for over 300 channels was unthinkable. The memory required and the waiting time to receive the data were problematic.

My first approach (Brutforce) was to separate the timeline into 4-hour chunks in JSON format. This drastically reduced the amount of data to be processed. Depending on the size of the user's screen, the client could request, say, 3 chunks to fill the entire grid. As the user navigated through the grid, the client would simply request the missing chunks.

2. Parsing the response

During the client implementation, the UX was really unsatisfactory. Backend response times were fine (chunks could be cached overnight). The problem lay in the slow JSON parsing. Even if this work was done on another thread, the waiting time was not pleasant.

Solution

To avoid this unnecessary and slow parsing work, I needed a "zero-copy deserialization" solution ... Flatbuffers

Schema

I soon ran into a problem with flatbuffers. Unlike JSON, where I could make a channel->shows[] relationship, flatbuffers didn't allow this type of data structure. To overcome this, I decided to create the schema as follows:

Flatbuffers tables
namespace EPG;
 
table Show {
  id: string;
  start: string;
  end: string;
  display: string;
}
 
table ChannelShows {
  shows: [Show];
}
 
table EPGChunk {
  start: string;
  stop: string;
  channels: [string];
  channelshows: [ChannelShows];
}
 
root_type EPGChunk;

A chunk contains two lists.

  1. channels where all the distinct identifiers of the channels with a show in this chunk are present.
  2. channelshows contains the shows list per channel list. This list must be ordered in the same way as channels.

This data structure means that, before loading buffers, the client must know the list of all the user's channels. When a buffer is loaded, I've created a mapping table for each chunk. This is indeed extra work, but it's nothing compared to JSON parsing.

Chunk post processing (client-side)
 interface Chunk {
   fb: EPGChunk;
   channelsMapping: Record<number, string>;
 }
 
  // buffer is the backend (or cache) response
 function onChunkDataLoaded(buffer: ArrayBuffer) {
   const bytes = new Uint8Array(buffer);
   const fbBuffer = new flatbuffers.ByteBuffer(bytes);
   const chunk = {
     fb: EPGChunk.getRootAsEPGChunk(fbBuffer),
     channelsMapping: {}
   }
 
   for(let i=0; i<chunk.fb.channelsLength(); ++i) {
     chunk.channelsMapping[i] = chunk.fb.channels(i);
   }
 
   return chunk;
 }

Now, from the grid draw, simply iterate over the loaded and visibles chunks and place them in relation to the linked channel.

Sample draw call when chunks changed
 function draw() {
   visiblesChunks.forEach(chunk: Chunk => {
     for (let i=0; i<chunk.fb.channelshowsLength(); ++i) {
       const channelId = chunk.channelsMapping[i];
       for (let j=0; j<chunk.fb.channelshows(i).showsLength(); ++j) {
         drawShowOnChannel(
           chunk.fb.channelshows(i).shows(j),
           channelId
         );
       }
     }
   });
 }

Conclusion

As flatbuffers is ported to different languages, we've been able to use the mechanism in our javascript, swift and kotlin developments.

This post was my very first and I hope it was a pleasant read. Feel free to give me feedback on twitter or github.