CS: Data Compression

CS: Data Compression

It was not too long ago that if you purchased a computer or phone, a primary concern was the amount of memory or space available on the device. This is still a factor in purchasing a device, especially for mobile devices, and if you plan on storing a lot of media. Now that so much data and information comes from the web, developers have to constantly think of how they can keep the data usage size down so all of the memory on a device is used up or a lot of expensive data use charges incurred.

Information can be compressed a couple of ways. For text, you can use acronyms like 3D or DNA or abbreviations like TTYL, assoc, orig, or gov. Search for Arecibo message to see an attempt to communicate crucial pieces of human culture and information in a very small amount of space. You might have heard the phrase, A picture is worth a thousand words and images can truly convey a significant amount of information about a complex scene or emotion.

You use images in web pages, social network posts, and messages. Often, when you do that, it requires an upload and download of that information. Depending on the size and quality of the image or video, this can lead to a long time of waiting for an image and potentially high data costs. When developing software that uses media, such as text, images, video, decisions are made about how much quality can be offered for the amount of data recipients are willing to download or stream.

An image is represented on a screen as a collection of pixels. The pixels of an image are bits of data and to reduce the memory of the image, a bitmask might be used. Below, you should see two images that look identical and have a number indicating how many bytes (the amount of memory) the image would take up if you were to download this image to a device. In fact, these two images are identical because nothing has been done to reduce the number of bits represented in the image.



A bitmask applies some logic to the bits of a pixel. This activity uses an & (and) bitmask which will force a bit to be 1 only if both bits are 1. Change the bitmask on line 26 from 0xFFFFFFFF to 0xFF000000 and press play Pencil Code UI Play Button to run the code again.

Note: the first FF refer to the bits for the pixel's opacity and can be ignored for this activity to leave it fully opaque so you can see the resulting image. The prefix 0x indicates the number is a hexadecimal (base16).

Background Information

Representing pixels as binary and hexidecimal numbers

An image is represented on a screen as a collection of pixels, each of which can be represented in many ways. Each pixel contains some information about its red, green, blue, and opacity (how transparent an image is) values. Here are a few ways of representing the color data for a pixel:

  • a binary value represents the color value numerically in base2, where each binary digit (or bit) is represented either as a 0 or a 1


Red

1111 1111 0000 0000 0000 0000

Green

0000 0000 1111 1111 0000 0000

Blue

0000 0000 0000 0000 1111 1111



  • a hexadecimal (hex) value represents the color value numerically in base16, where each decimal number from 0-15 can be represented from 0 through F (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F).


Red

FF0000

Green

00FF00

Blue

0000FF



By modifying the characters, you can make all of the colors you with which you are familiar. When you are changing the color on a website you can alter the color value of the background or perhaps the text. If you have ever set the background color on a webpage to white, you likely used the hex code #FFFFFF which sets the the red, green, and blue values of each of the background pixels on the website to 100%, which results in a white color on the screen. Conversely, if you have set the text on the website to black you used #000000, which sets the red, green, and blue values of the pixel to 0%.

If you are interested in learning more about representing numbers in different base systems like base2 (binary) or base16 (hexadecimal), check out this page on Wikipedia about Hexadecimal and the Math is Fun page on number systems.


With a bitmask of 0xFF000000, the red, green and blue bits of the bitmask are set to 0. After the bitmask is applied, each bit is set to 0 and the overall results would be no color information left in the pixel as you can see in the table below. This is because for a & binary operation the result is 1 if and only if both of the inputs are 1. So 0 & 0 = 0, 0 & 1 = 0, 1 & 1 = 1. The resulting image should be essentially blank with a few bytes remaining to indicate that it is an image.


Red

1111 1111 0000 0000 0000 0000

& Bitmask

0000 0000 0000 0000 0000 0000

Result

0000 0000 0000 0000 0000 0000



If you change the bitmask back to 0xFFFFFFFF, as it was at the beginning, there would be no change to the information originally in the pixel. This is why the size of the image is the same as the original. This is what you saw in at the beginning of this activity.


Red

1111 1111 0000 0000 0000 0000

& Bitmask

1111 1111 1111 1111 1111 1111

Result

1111 1111 0000 0000 0000 0000



Now, you have seen the two extreme possibilities. When a bitmask of 0xFF000000 is used, none of the bits are able to pass through the bitmask and the result is an image with a large reduction in memory, but only by sacrificing all of it's quality and information. When the bitmask of 0xFFFFFFFF is applied, all bits are passed through and there is no reduction in memory.

Try a few other combinations on your own by changing the value of the bitmask. You are trying to find a pattern that will enable you to make a decision about what is the best tradeoff between memory size and quality. If you need help converting your binary into hexadecimal search for binary into hexadecimal converter or binary into hexadecimal chart.

Now that you have tried a few combinations, what patterns did you recognize? Did you find a bitmask that kept sufficient quality while reducing the size of the image by a significant amount?

Here are some combinations you might want to try. Keep track of the resulting memory and rate the quality on a scale of 1-10. After you make a change to the bitmask value. press Play Pencil Code UI Play Button to run the code again.


Bitmask value (base16)

Memory Size (bytes)

Quality (1-10)

0xFFFFFFFF

142845 (original size)

10

0xFF000000

1389

1

0xFFAAAAAA



0xFFCCCCCC



0xFFEEEEEE



0xFFA0A0A0



0xFFC0C0C0



0xFFF0F0F0





Which of the bitmask values did you prefer or did you find another value that you liked even more?

While the quality might subjectively vary among users, what patterns did you recognize as you tried out each of the bitmasks mentioned in the table? Come up with a generalization or abstraction of the patterns you found so you could explain to someone else how he or she could reduce the amount of information while maintaining a certain level of quality?

If you are developing software for sending photos to your friends, you might have different requirements for quality than if you were developing photo editing software. Change the URL for picture on line 2 to other types of photos to see whether this technique applies to high resolution photos, as well as black and white bitmaps. Understanding these general principles enables you to develop an algorithm and fine tune it to best suits your needs.

In this activity you applied computational thinking to look for patterns in memory and quality after applying the bitmask. These patterns helped you determine which values you prefer and made it possible to develop a general rule for applying bitmasks to reduce the memory of an image. If you are interested in exploring this topic further you could search the Internet for Arecibo message, bitmasks, bitmaps, color quantization, posterization, dither, media computation.

CS Unplugged has a text compression activity as well. Potential standards this activity could align with if used with students.