[Next] [Up] [Previous] [Index]

Steganography

The preceding section described a way of preparing the output from encipherment for transmission. There, the purpose was simply to allow the text to be transmitted properly.

Another form of ciphertext preparation, which can be considered an encipherment method in itself, is designed to conceal another secret: that there even is an encrypted message being stored or transmitted.

Many forms of steganography were devised in the era of paper-and-pencil encryption.

The first method listed above can lead to a stilted cover message that obviously has something wrong with it. Thus, improved forms of this method that involve concealing something subtler than a letter of the message in each word of the cover message have been devised.

As we saw in the section on fractionation, one can encode the letters of the alphabet as follows:

A 111    D 121    G 131    J 211    M 221    P 231    S 311    V 321    Y 331
B 112    E 122    H 132    K 212    N 222    Q 232    T 312    W 322    Z 332
C 113    F 123    I 133    L 213    O 223    R 233    U 313    X 323

and so convert a message composed of letters into one composed only of the digits 1 through 3. Then, a word of 1 or 4 or 7 syllables could stand for 1, a word of 2 or 5 syllables for 2, a word of 3 or 6 syllables for 3. This system, among others, is discussed in Helen Fouché Gaines' [Elementary] Cryptanalysis, which begins by discussing such systems, continues with transposition ciphers, and only afterwards introduces substitution ciphers. This is, in a way, a logical order for introducing the subject, since with each step the letters of the original message become less accessible. (But on the other hand, the case of simple substitution, because it is so familiar, is one that is attractive to discuss first.)

And, of course, a letter can also be converted into five digits which are all either 0 or 1. Long before Èmile Baudot, using a and b as his two symbols, Sir Francis Bacon discussed such a representation of the alphabet specifically for the purpose of steganography.

If we're discussing binary coding, we may as well address the computer era.

The fact that images can be usefully subjected to lossy compression methods has suggested that extra information could be concealed in them.

One popular format for storing pictures on a computer involves first determining 256 colors, each one expressed as three one-byte values for each of its Red, Green, and Blue components, which are optimal for rendering the image with as little change as possible, and then providing first a table of the 256 colors used, and then the image with one byte indicating the color from that table to be used for each pixel. The image in that form is compressed using a form of lossless compression.

A popular steganography program conceals a bit in every pixel by ordering the 256 colors so that similar colors are adjacent to each other in the table, and then modifying each pixel so that the least significant bit of the color code for that pixel has the right value.

For other forms of image encoding, one might wish to place one's message more directly in the image itself. One way of doing this might be as follows:

Compress the image using a lossy technique such as JPEG. Then, modify the original image so that the differences between it and the compressed image, on a pixel-by-pixel basis, contain the message.

Since the idea will be to send only the modified image, it will be necessary to check that the modified image, when compressed by the same technique, produces a compressed form which has the same differences from the modified image as the compressed form of the original image did. This requires a coding which does not disturb the compressed image excessively, and it requires a coding which is flexible, so that different ways of representing the hidden message are possible.

An example of the kind of coding that might work is noted below. 0 represents an unchanged pixel, + one that is made brighter, - one that is made less bright.

00000000 null     0000-00+ 00110    000-00+0 01110
                  0000+00- 01111    000+00-0 01111
000000-+ 00000    0000-0+0 01000    000-0+00 10000
000000+- 00001    0000+0-0 01001    000+0-00 10001
00000-0+ 00010    0000-+00 01010    000-+000 10010
00000+0- 00011    0000+-00 01011    000+-000 10011
00000-+0 00100    000-000+ 01100    00-0000+ 10100
00000+-0 00101    000+000- 01101    00+0000- 10101

0000--++ 00000000    000-0-++ 00000110
0000-+-+ 00000001    000-0+-+ 00000111
0000-++- 00000010    000-0++- 00001000
0000+--+ 00000011    000+0--+ 00001001
0000+-+- 00000100    000+0-+- 00001010
0000++-- 00000101    000+0+-- 00001100

The table is not complete. First, one can choose to leave all pixels undisturbed if necessary, and in that case one skips that group of eight pixels without encoding any data. If one can alter two pixels, and only combinations where the net alteration balances out are shown, then there are 56 possibilities, so five bits can be coded. If one can alter four pixels without altering the compressed form of the image, then there are 420 possibilities, so eight bits can be coded.

If JPEG compression does not work well to form a baseline image, one can simplify calculations by using a very simple lossy compression algorithm: averaging adjacent pixels in pairs.

Of the RGB components of an image, the Green component contributes the most to the brightness, and the Blue component the least, so using the blue part of the pixel to encode information makes sense.

Note that, due to the current interest in watermarking images, it is entirely likely that the technique described above may be covered by someone's patent at this time.


[Next] [Up] [Previous] [Index]

Table of Contents
Home Page