JSON Compression by Rotating Data 90°

A 'simple' technique that provides impressive compression results.

Posted by Malcolm Hollingsworth on

This article has been updated to include GZip calculations, which are shown at the end.

I have been researching various ways to make small changes to how JSON is sent using APIs without introducing in-line or substitution compression variants.

I have worked out a simple technique that provides impressive lossless compression ratios when returning JSON objects.

The Basics

With a standard API response you often send multiple records that share the same key names. This is extra overhead, as the key names must be repeated for every record you include in the object array.

What if we could provide the same information in a different way - one that no longer repeats a key name for every value in the array, yet keeps the data just as accessible?

[
    { "firstname": "Tim", "lastname": "Cook", "company": "Apple" },
    { "firstname": "Satya", "lastname": "Nadella", "company": "Microsoft" },
    { "firstname": "Sundar", "lastname": "Pichai", "company": "Google" }
]
Example JSON array of 3 objects - 225 bytes
{
    "firstname": [ "Tim", "Satya", "Sundar" ],
    "lastname": [ "Cook", "Nadella", "Pichai" ],
    "company": [ "Apple", "Microsoft", "Apple" ]
}
Converted version of the same data - 152 bytes
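For clarity, here is a minimal sketch of how the rotation could be performed before sending, and reversed on the client if the original shape is preferred. The function names rotate and unrotate are my own illustration rather than an established library.

// Rotate an array of objects into an object of arrays (one array per key).
// Assumes every record shares the same set of keys - see the note on
// consistent key patterns later in the article.
function rotate(records) {
    var columns = {};
    for (var i = 0; i < records.length; i++) {
        for (var key in records[i]) {
            (columns[key] = columns[key] || []).push(records[i][key]);
        }
    }
    return columns;
}

// Reverse the process: turn the column object back into an array of objects.
function unrotate(columns) {
    var keys = Object.keys(columns);
    var rows = keys.length ? columns[keys[0]].length : 0;
    var records = [];
    for (var i = 0; i < rows; i++) {
        var record = {};
        for (var j = 0; j < keys.length; j++) {
            record[keys[j]] = columns[keys[j]][i];
        }
        records.push(record);
    }
    return records;
}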

Results

3 record comparison
Original      225 bytes
Compressed    152 bytes
Saving        73 bytes
Percentage    32%

To test the technique with more data, I duplicated the existing records 5 times to give a new sample set of 15 records.

If we then repeat the process - the results are even more impressive.

15 record comparison
Original      1,113 bytes
Compressed    496 bytes
Saving        617 bytes
Percentage    55%

Accessing the data

You can access the data almost exactly as you did before; simply shift the position of the index look-up. For example, a simple full-name alert:

alert(data[0].firstname + ' ' + data[0].lastname);
Original
alert(data.firstname[0] + ' ' + data.lastname[0]);
Compressed

Both of these examples provide the same information, and only minor changes to the data structure and look-up code are needed at both ends.

In order to loop over the data you need to find the number of records - this is also slightly different:

var rows = data.length;
Original
var rows = data.firstname.length;
Compressed

Tip: you can query the length of any root key, as each array will return the same answer.
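To illustrate, a loop over the rotated structure might look like this (the data variable is assumed to hold the converted object):

var rows = data.firstname.length;
for (var i = 0; i < rows; i++) {
    // Rebuild each record on the fly from the parallel arrays.
    console.log(data.firstname[i] + ' ' + data.lastname[i] + ' - ' + data.company[i]);
}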

Further Compression Options

There are other common ways to compress data:

  • White-space removal
    Obvious, but not always followed
  • Shortening key names
    Less used but very effective
    • firstname › f
    • lastname › l
    • company › c

Bandwidth reduction requires suitable time to plan and manage, but it can be very effective, especially when you include this new technique.
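As a rough sketch of how the techniques combine, key name shortening can be rolled into the rotation step with a simple look-up map. The map below mirrors the f/l/c abbreviations above, and the function name rotateShort is purely illustrative.

// Map of full key names to shortened key names (illustrative only).
var shortKeys = { firstname: 'f', lastname: 'l', company: 'c' };

// Rotate the records and shorten the key names in one pass.
function rotateShort(records) {
    var columns = {};
    for (var i = 0; i < records.length; i++) {
        for (var key in records[i]) {
            var shortKey = shortKeys[key] || key;
            (columns[shortKey] = columns[shortKey] || []).push(records[i][key]);
        }
    }
    return columns;
}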

Savings when the rotation technique is combined with the other techniques (15-record data set):

Method        Raw           White Space   Shortening    Both
Original      1,113 bytes   916 bytes     798 bytes     601 bytes
Compressed    496 bytes     425 bytes     475 bytes     404 bytes
Saving        617 bytes     491 bytes     323 bytes     197 bytes
Percentage    55%           54%           40%           33%
Keep in mind all these tests are like for like.

If we compare our original source data against a version using only white-space removal and key name shortening (without rotation), we end up with a 46% saving overall for just 15 records.

All Techniques Combined

Final savings
Original      1,113 bytes
Compressed    404 bytes
Saving        709 bytes
Percentage    64%

A total saving of 64% - not bad considering all the data remains just as readable as it was to start with.

If you only used white-space removal and key name shortening you would save 46%. But when rotating the data takes the total saving to 64% with little extra effort - why not?

One word of caution: this works best with object arrays that have consistent key patterns. You can of course provide a null for any value missing from the original object, but keep in mind that too many of these will reduce the benefits of the technique.
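For illustration (this two-record example is my own and not part of the data set below), a record missing its company key can be rotated with a null placeholder so that the indexes stay aligned:

[
    { "firstname": "Tim", "lastname": "Cook", "company": "Apple" },
    { "firstname": "Satya", "lastname": "Nadella" }
]
becomes
{
    "firstname": [ "Tim", "Satya" ],
    "lastname": [ "Cook", "Nadella" ],
    "company": [ "Apple", null ]
}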

How does this compare with GZip?

If we add in the benefits of GZip, are the benefits of my technique lost or improved upon?

After GZip, the original file was squeezed down to just 132 bytes; however, GZip took my rotated version down to just 99 bytes, saving a further 33 bytes.

This is even more impressive, as using my technique with GZip reduces the gzipped API payload by another 25%.

Final savings & GZip
Original         1,113 bytes
All compression  99 bytes
Saving           1,014 bytes
Percentage       91%
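If you want to verify the GZip figures yourself, a quick Node.js sketch along these lines will do it; zlib.gzipSync is part of Node's standard library, and the file names here are placeholders.

var fs = require('fs');
var zlib = require('zlib');

// Print the raw and gzipped size of each JSON variant in bytes.
['original.json', 'rotated.json'].forEach(function (file) {
    var raw = fs.readFileSync(file);
    var gzipped = zlib.gzipSync(raw);
    console.log(file + ': ' + raw.length + ' bytes raw, ' + gzipped.length + ' bytes gzipped');
});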

Source Data

[
    { "firstname": "Tim", "lastname": "Cook", "company": "Apple" },
    { "firstname": "Satya", "lastname": "Nadella", "company": "Microsoft" },
    { "firstname": "Sundar", "lastname": "Pichai", "company": "Google" },
    { "firstname": "Tim", "lastname": "Cook", "company": "Apple" },
    { "firstname": "Satya", "lastname": "Nadella", "company": "Microsoft" },
    { "firstname": "Sundar", "lastname": "Pichai", "company": "Google" },
    { "firstname": "Tim", "lastname": "Cook", "company": "Apple" },
    { "firstname": "Satya", "lastname": "Nadella", "company": "Microsoft" },
    { "firstname": "Sundar", "lastname": "Pichai", "company": "Google" },
    { "firstname": "Tim", "lastname": "Cook", "company": "Apple" },
    { "firstname": "Satya", "lastname": "Nadella", "company": "Microsoft" },
    { "firstname": "Sundar", "lastname": "Pichai", "company": "Google" },
    { "firstname": "Tim", "lastname": "Cook", "company": "Apple" },
    { "firstname": "Satya", "lastname": "Nadella", "company": "Microsoft" },
    { "firstname": "Sundar", "lastname": "Pichai", "company": "Google" }
]
Original data-set of 15 records.
{"f":["Tim","Satya","Sundar","Tim","Satya","Sundar","Tim","Satya","Sundar","Tim","Satya","Sundar","Tim","Satya","Sundar"],"l":["Cook","Nadella","Pichai","Cook","Nadella","Pichai","Cook","Nadella","Pichai","Cook","Nadella","Pichai","Cook","Nadella","Pichai"],"c":["Apple","Microsoft","Apple","Apple","Microsoft","Apple","Apple","Microsoft","Apple","Apple","Microsoft","Apple","Apple","Microsoft","Apple"]}
Compression using my rotation technique plus white-space and key name shortening.