Major Technique: Compression
Version   04/12/99 18:11 - 7
How can you fit a quart of program and data into a pint pot of memory?  
* The memory requirements of the code and data exceed the memory available in the system - primary memory, secondary storage, read-only memory, or some combination.
* You need to transmit information across a communications link, and the memory requirements of the information exceed the capacity of the link.
* You cannot THINK SMALL and delete some of the data or code.
* You cannot choose SUITABLE DATA STRUCTURES to reduce the memory requirements further.
Sometimes, when you're programming for a small system, you just don't have enough memory to go around.  Most often, the program needs to store more data than the space available, but sometimes the executable code is too large to be stored, especially if there's a lot of program data too.
The other chapters in the book consider ways to reduce our program's requirements for main memory. For example, choosing a suitable DATA STRUCTURE ensures that the right amount of memory is allocated to store the data; SECONDARY STORAGE and READ-ONLY STORAGE move the data out of RAM memory and store it elsewhere. These techniques have one main limitation: they don't reduce the total amount of storage - of whatever kind - needed to store the underlying data.  Rather, they tailor memory allocation to reduce the amount of memory allocated in the program, or displace information that cannot be stored in main memory to secondary storage.  Tailoring memory allocation doesn't help if there really isn't enough memory to go around, and displacement may simply move the space problems from primary to secondary storage.
For example, the Strap-It-On wrist-mounted PC needs to store the data for the documents the user is working on.  It also needs sound files recorded by the internal microphone, data traces from optional body well-being monitors, and a large amount of executable code downloaded by the user to support "optional applications" (Tetris, Space Invaders, Qix, and Hunt-the-Wumpus).  This information exceeds the capacity of the Strap-It-On's primary memory, and will sorely test its secondary memory.  
In fact, no matter how much memory a system has, we can guarantee there will be users who need more.  Extra storage is expensive, so we're always under pressure to use what we have as effectively as possible. 
Therefore:  Use compression to reduce the memory required to store the information.
Store the information in a compressed form and decompress it when you need to access it.  There is a wide variety of compression algorithms and approaches you can choose from, each with different space and time trade-offs.  In particular, there are smaller ad hoc techniques such as run length encoding and byte codes, and larger-scale techniques such as adaptive compression algorithms.
For example, the Strap-It-On PC uses a variety of compression techniques to reduce its memory requirements. The optional applications are stored as bytecodes, the sound files and data traces are stored using sequence compression, and document texts are stored using string compression.  The device drivers for Strap-It-On's secondary storage use adaptive file compression algorithms to compress all of the user's data files.
Consequences
Your memory requirements decrease because some code or data is compressed.
However:  the compression scheme will make the program more complex, requiring programmer effort to implement and discipline to use.  
The decompression process reduces time performance and may require extra temporary memory, increasing the possibilities for failure.  
The compression process also takes time, although this time can often be spent when the program is written rather than at run-time.
Compressed information may be more difficult to process from within the program.  Some compression techniques permit only sequential access to the compressed information, so they cannot be used if the information must be accessed randomly.  Even purely sequential access is impractical with some more extreme compression algorithms, so the compressed information must be decompressed before it can be accessed, requiring enough temporary memory to store all the decompressed information, in addition to the temporary memory needed by the decompression algorithm.
Implementation
The key idea behind compression is that most program data contains a large amount of redundancy - information that is included in the data set but which is not strictly required.  Character sets commonly used to encode Western languages are classic examples.  The ASCII code defines around 100 printable characters, all of which could be stored in seven bits, but most text files use eight-bit bytes to facilitate processing on eight-bit machines.  If each eight-bit byte could be replaced by a seven-bit code, this would reduce the number of bits needed to store a text string.  The amount of compression is expressed as the compression ratio - the compressed size divided by the decompressed size.  Storing characters in seven bits rather than eight-bit bytes would give a compression ratio of 7/8, or 87.5%.
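As a sketch of removing just this one redundant bit, the following Java helper (a hypothetical illustration - the class and method names are not from the pattern text) packs seven-bit ASCII codes end to end:

```java
// Illustrative sketch: pack 7-bit ASCII codes tightly, dropping the
// redundant eighth bit.  Eight characters fit into seven bytes, a
// compression ratio of 7/8 = 87.5%.
class SevenBitPacker {
    static byte[] pack(String s) {
        int bits = s.length() * 7;              // total bits needed
        byte[] out = new byte[(bits + 7) / 8];  // round up to whole bytes
        int bitPos = 0;
        for (int i = 0; i < s.length(); i++) {
            int code = s.charAt(i) & 0x7f;      // keep only seven bits
            for (int b = 6; b >= 0; b--) {      // emit bits, high bit first
                if ((code & (1 << b)) != 0)
                    out[bitPos / 8] |= (byte)(0x80 >>> (bitPos % 8));
                bitPos++;
            }
        }
        return out;
    }
}
```

Unpacking simply reverses the bit arithmetic; eight eight-bit characters pack into seven bytes.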
Techniques that ensure that the result of decompressing some compressed information is exactly the same as the original information (before compression) are known as lossless compression techniques.  Lossless compression works by removing redundancy; either removing it altogether, as above, or by storing redundant information once, and replacing further copies with references to the previously stored version.
Lossy compression techniques are sometimes suitable alternatives.  Lossy compression produces an approximation to the original information, rather than an exact copy.  So it is important that the approximations do not affect the use to which the decompressed data will be put.  We can think of lossy compression as making a generalisation of the original input file to increase its redundancy, then compressing the generalised version.  The compression patterns in this chapter are lossless; however, several can be adapted to provide lossy compression when it is required.
Specialised Patterns
The rest of this chapter contains six specialised patterns that describe a range of compression techniques.  The patterns form a sequence starting with simple patterns which can be implemented without too much programmer effort or time or space costs and progressing to more complicated patterns.  Each of these patterns removes different kinds of redundancy from programs or data, in different ways, and with different consequences for accessing the compressed data.

Figure 1: Compression Patterns

The patterns are as follows:
* STRING COMPRESSION reduces the number of bits used to store each character of a string of text.  Most character encodings are partially redundant, because they can store many more characters than are really needed. For example, English text requires only about seventy characters, but commonly used character sets support 128, 256, or even sixty-five thousand characters.  Reducing the number of bits used to store common characters reduces the memory required for storing the whole text.
* SEQUENCE COMPRESSION is similar to string compression, but addresses data series or sequences, rather than text strings. Rather than storing the absolute value of each data item, Sequence Compression stores deltas - differences between the current item and the previous item.  If the values of the data items are similar across the series, storing relative item values will require less memory than storing the absolute value of each item.  Furthermore, if many adjacent items have exactly the same absolute value, storing the value repeatedly is redundant, so sequence compression stores the value once, together with a count of the number of repeated items.
* FILE COMPRESSION can store large amounts of bulk data very efficiently.  File compression is based on adaptive compression algorithms that find common sequences across entire data sets, and then store these sequences only once, typically working at the level of individual bits to remove many kinds of redundancy.  File compression algorithms need significant programmer effort to implement (and high levels of specialised skills to implement well), however many effective adaptive compression algorithms are available in commercial or open source libraries.  Compared with other compression techniques, adaptive compression algorithms have high processor time and temporary memory costs.
* BYTE CODING can store program code in a form that is still executable. Real machine codes include a large amount of redundancy to ensure that they can be efficiently decoded and implemented in hardware.  Bytecodes are instruction sets designed to be executed in software, and to minimise memory requirements.  Typically you use an interpreter to execute Byte codes, although some environments translate them to machine code, an approach which takes much more programmer effort.  Bytecodes can also increase the portability of software, because the interpreter insulates the program from the architecture of the underlying machine - any machine with a suitable interpreter can execute the software.
The figures below contrast these patterns in two different ways.  One shows the typical memory gains you might get from using each pattern as against the amount of programming required.  The other contrasts the local memory required to decode the compression as against the compression achieved.  As they show, you get the greatest benefits from using File Compression, but the run-time overhead required for each use often makes it unsuitable in practice.

See Also
Rather than (or as well as) compressing information, you may be able to store it in SECONDARY STORAGE.  On the other hand, THINKING SMALL, making a MEMORY BUDGET, and MAKING THE USER WORRY may make your program small enough that you don't need to use compression.
Known Uses
Compression is used widely through the industry.  Operating systems use compression to store more information on secondary storage, communications protocols use compression to transmit information more quickly, and programming languages use compression to reduce the memory requirements of programs.
______________________________
String Compression Pattern
Also known as: Nibble Coding, Character Coding.
How do you reduce the memory taken up by static text strings in a program?
* You have lots of small-to-medium sized strings in your program - all different.
* You need to reduce your program's memory requirements.
* You need random access to individual strings.
* You don't want to expend too much extra programmer effort, memory space, or processing time on managing the strings.
Many programs need lots of strings - to display information or messages to the user, to supply keys or query expressions for databases, to describe fields or attributes in self-describing data formats, or to retrieve resources for window systems. All these strings can take up significant amounts of memory, increasing the program's memory requirements.
Programs need to be able to perform typical string operations such as determining their length and initial characters, concatenating strings, or substituting parameter strings into constant format strings.  Each string in a collection of strings needs to be individually accessible.
Strings can cause problems even when not stored in main memory.  Strings stored in READ-ONLY MEMORY or SECONDARY STORAGE can also occupy too much space, especially when the ROM or Secondary Storage capacity is limited, or is needed to store other parts of the program or data.
Finally, although storing strings is important, it is seldom the most significant memory use in the system.  Typically, you don't want to put too much programmer effort into the problem.
One tempting approach to this problem is to apply a large-scale FILE COMPRESSION algorithm to all the strings in the program; you could use adaptive text compression algorithms, which are well known, available, and provide good compression ratios.  Unfortunately file compression algorithms are seldom appropriate for program strings.  First, they can only access the compressed data sequentially, so if all the strings are compressed together, all must be decompressed together before a particular string can be retrieved.  Second, few file compression algorithms work well on short to medium length strings. File compression algorithms usually need to store auxiliary information to guide decoding along with the compressed data, so it is infeasible to use them to compress many small strings individually.  Third, file compression algorithms typically require a reasonable amount of processing time and memory buffer space to execute. 
For example, the Strap-It-On PC needs to store and display a large number of information and error messages to the user.  The messages need to be stored in scarce main memory or read-only memory - and there isn't really enough space to store all the strings directly.  The programs must access each string individually, to show its message to the user.  Given that many of the strings describe exceptional situations such as a memory shortage, they need to be retrieved and displayed quickly, efficiently, and without requiring extra memory.
Therefore: store the strings using a compact encoding for each character.
The key to string compression is that there is a lot of redundancy implicit in the design of "standard" character set encodings.  First, each character is always stored in the same number of bits (typically six, seven, or eight).  This represents an underlying assumption that each character is equally likely to appear in a string. This is not the case: in most Roman languages, for example, some letters (such as "e", "a", and "t") appear much more frequently than others (such as "z" or "q").  Figure 2 below, for example, shows the frequency of lowercase letters used in this chapter.  Second, there is an even greater difference in frequency between characters and letters - capitals are used much less frequently than lower-case letters, and symbols such as "|" or "~" appear even less frequently than letters in most written texts.  Finally, all characters are often stored in more bits than they need - for example, large amounts of text are coded in standard ASCII, which supports only 128 characters and control codes and was designed to be stored in seven bits. These seven-bit characters are routinely padded by 15% to fit into the eight-bit bytes that can be processed conveniently by modern CPUs.

Figure 2 Distribution of Characters in this Chapter
We can reduce the memory requirements for strings by removing the redundancy from the string encoding.  The most important technique is to reduce the number of bits stored per character, typically down to six, five, or even four bits per character.  Since the reduced bit sizes only allow you to store 64, 32, or 16 distinct characters, some character codes are represented using special escape codes that change the interpretation of the following codes.
 Figure 3 Character distribution, sorted by frequency
For example, Figure 3 shows that a significant proportion of characters are one of the 15 most common characters (those to the right of character 'u').  So if we encode the characters such that these 15 most common characters take fewer bits, and the least common take more, that will give us a significant compression.
Consequences
Typically you get a reasonable compression for the strings themselves (a compression ratio of 70-75% of the original size), reducing the program's memory requirements.
Most operations on compressed strings execute almost as fast as operations on native strings, preserving time performance. 
String compression is quite easy to implement, so it does not take much programmer effort.
Each string in a collection of compressed strings can be accessed individually, without decompressing all preceding strings.
However: the total compression of the program data isn't high, so the program's memory requirements are not greatly reduced.
String operations that rely on random access to the characters in the string may execute up to an order of magnitude slower than the same operations on decompressed strings.  Because characters may have variable lengths (see below), you can only access a specific character by scanning from the start of the string.  It's even more complicated to implement operations that change the characters in the string; in practice the only realistic option is to uncompress the string and do the operations on that.
The compressed strings are more difficult to compile than strings in standard encodings, especially when they are part of the program's literal text. Compressed strings require either manual encoding or a string pre-processing pass, either of which increases complexity.
You have to test the compressed string operations, but these tests are quite straightforward.
Implementation 
There are three main techniques used to compress strings - reducing the number of bits used to store each character, using escape codes to switch between different character encodings, and storing variable length strings. 
Reducing the number of bits per character.  
If the underlying character set has only 128 characters, it certainly makes sense to store each character in seven bits, rather than eight, sixteen, or thirty-two bits.  But most text strings in programs use far fewer than 128 characters - for a start, the first 32 ASCII characters are unprintable control codes.  Many text strings in programs require only upper and lower-case letters, numbers, and a few punctuation characters - fewer than seventy characters in all, just slightly more than you can store in six bits.
Escape Codes
Escape codes change character encodings.  You can reduce the number of bits per character still further by using more than one encoding from stored codes to characters, and allocating one or more escape codes to change encodings.  For example, you could store uppercase and lowercase letters and numbers in five bits per character (five bits gives 32 codes) by first coding the 26 lowercase letters (using codes 0-25) and the most important punctuation characters (say space, period, comma, and return, in codes 26-29). The remaining two codes are used as escapes - meaning that the following code is to be interpreted either as an uppercase letter (code 30), or as a number or graphical symbol (code 31).
Code   Character      Code     Character      Code     Character
0      a              30 0     A              31 0     0
1      b              30 1     B              31 1     1
2      c              30 2     C              31 2     2
9      j              30 9     J              31 10    @
19     t              30 19    T              31 20    %
Using escape codes means that characters are now coded with variable lengths - some characters (say lowercase letters) are coded in five bits, others (uppercase letters or numbers) now occupy ten bits (five for the escape code, five for the main code).  Variable length coding reduces string's memory requirements compared with a uniform seven or eight-bit code, provided that the most frequent characters are coded with the shortest codes - that is, without using the escape codes. 
For example, here's how the sentence "I'm all right Jack!" would be represented in the five-bit code (assuming the apostrophe and the exclamation mark take symbol codes 12 and 14 after the escape):
I     '     m  [space] a l  l  [space] r  i g h t  [space] J    a c k  !
30 8  31 12 12 26      0 11 11 26      17 8 6 7 19 26      30 9 0 2 10 31 14
This encoding takes up 23 5-bit codes, or 115 bits - about 15 8-bit bytes - as against 19 8-bit bytes with a standard encoding.  Without the punctuation (which generally appears less frequently than in this example) we would need only 19 5-bit codes (storable in 12 8-bit bytes) versus 17 full 8-bit characters.
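The packing this requires - five-bit codes straddling byte boundaries - can be sketched in Java. This is a hypothetical helper, not part of the pattern's published code; it assumes the code assignments described above and simply maps any unsupported character to the space code:

```java
// Sketch of the five-bit encoding described above: letter codes 0-25,
// space 26, uppercase escape 30.  Codes are packed contiguously, so a
// single code may straddle a byte boundary.
class FiveBitEncoder {
    static final int SPACE = 26, UPPER_ESCAPE = 30;

    static byte[] encode(String s) {
        java.util.List<Integer> codes = new java.util.ArrayList<>();
        for (char c : s.toCharArray()) {
            if (c >= 'a' && c <= 'z') codes.add(c - 'a');
            else if (c >= 'A' && c <= 'Z') { codes.add(UPPER_ESCAPE); codes.add(c - 'A'); }
            else codes.add(SPACE);            // crude: anything else becomes a space
        }
        byte[] out = new byte[(codes.size() * 5 + 7) / 8];
        int bitPos = 0;
        for (int code : codes) {
            for (int b = 4; b >= 0; b--) {    // five bits per code, high bit first
                if ((code & (1 << b)) != 0)
                    out[bitPos / 8] |= (byte)(0x80 >>> (bitPos % 8));
                bitPos++;
            }
        }
        return out;
    }
}
```

Encoding "Jack" produces the codes 30 9 0 2 10, which pack into four bytes.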
Variable length strings
Sentences and phrases in natural languages have different lengths, and most strings in programs do not conform to an arbitrary eighty-character size limit.  Storing strings in fixed length tables will typically waste the storage between the end of the string and the end of the space allocated for it.  Choosing a suitable representation - a variable length data structure - to store the compressed strings will ensure that this memory is not wasted.
Compression can also be used to store string literals in program texts, and lossy compression can be used for larger amounts of text that will be reformatted later for display.
Lossy String Compression
Most string compression techniques are lossless - the result of decompressing their output is exactly the same as their input.  Lossy compression is not often used for short strings, because leaving out characters would change the meaning or validity of the text.  Some text compression algorithms are lossy with respect to white space (spaces and carriage returns), however. For example, you might be storing a paragraph with extra spaces to provide full left and right justification on an eighty-column output screen, carriage returns at the end of lines, and double carriage returns at the end of paragraphs.  But a better approach would be to encode the text using two special codes: one for white space and one for the end of paragraphs.  Then the decompression code to display the text can automatically insert extra spaces for justification and carriage returns to end lines and paragraphs.
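As a rough illustration of such lossy white-space handling (a hypothetical sketch - it uses plain characters rather than packed bit codes for the two markers):

```java
// Lossy white-space compression sketch: runs of spaces and single line
// breaks collapse to one word-gap marker; blank lines (paragraph breaks)
// become one paragraph-end marker.  The layout is lost, but the words
// survive and can be re-justified on display.
class WhitespaceSquash {
    static String squash(String text) {
        // Mark paragraph breaks first, then collapse remaining runs.
        return text.replaceAll("\\n\\s*\\n", "\u0001")  // paragraph end
                   .replaceAll("\\s+", " ")             // word gap
                   .replace('\u0001', '\n');            // display form
    }
}
```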
Examples
A nibble code is a string compression code where each character is coded into four bits.  A nibble code is a little easier to implement than the five-bit code described above, because a nibble is always half a byte (while a five-bit code must manage codes across byte boundaries). A basic nibble code works as follows: codes zero to fourteen represent the space character and the most common characters in the document (i.e. "etoasrinclmhdu", based on Figure 3 above).  Code fifteen is an escape, followed by two nibbles containing the eight-bit ASCII code of the next character. In this code "All the world's a stage" requires 32 nibbles, or 16 bytes - a compression of 70% compared with using standard bytes.
A     l l   t h e   w     o r l d '     s   a   s t a g     e
f 4 1 a a 0 2 c 1 0 f 7 7 3 6 a d f 2 7 5 0 4 0 5 2 4 f 6 7 1 f
Here is a Java class to create this nibble code.  It uses a byte array output stream to simplify creating the compressed string - the equivalent in C++ would use an ostrstream instance.
class StringCompression  {
    protected final String NibbleChars = " etoasrinclmhdu";
    protected final int NibbleEscape = 0xf;
    protected int lastNibble;
    protected ByteArrayOutputStream outStream;

    protected byte[] encodeString(String s) {
        outStream = new ByteArrayOutputStream();
        lastNibble = -1;
        for (int i = 0; i < s.length(); i++) {
           encodeChar(s.charAt(i));
        }
        if (lastNibble != -1) { // We've one left over
            putNibble( NibbleEscape );
        }
        byte [] result = outStream.toByteArray();
        outStream = null;
        return result;
      }
The most important routine, of course, is the one to encode a specific character:
    protected void encodeChar(int c) {
        int p = NibbleChars.indexOf(c);
        if (p != -1) {
            putNibble(p);
        } else {
            putNibble(NibbleEscape);
            putNibble( c >>> 4);   
            putNibble(c & 0xf);
        }
    }
    protected void putNibble(int n) {
        if (lastNibble == -1) {
            lastNibble = n;
        } else {
            outStream.write((lastNibble << 4) + n);
            lastNibble = -1;
        }
    }
}
Decoding is similar to encoding.  For convenience the decoding methods belong to the same class; they use a ByteArrayInputStream to retrieve data.
    protected ByteArrayInputStream inStream;

    protected String decodeString(byte [] s) {
        inStream = new ByteArrayInputStream(s);
        StringBuffer outString = new StringBuffer();
        lastNibble = -1;
        int charRead;
        while ((charRead = decodeChar()) != -1) {
            outString.append( (char)charRead );
        }
        return outString.toString();
      }
    protected int decodeChar() {
        int s = getNibble();
        if (s == -1) return -1;
        if (s != NibbleEscape) {
            return NibbleChars.charAt(s);
        } else {
            s = getNibble();
            if (s == -1) return -1;
            return (s << 4) + getNibble();
        }
    }
    protected int getNibble() {
        int result;
        int byteRead;
        if (lastNibble == -1) {
            byteRead = inStream.read();
            if (byteRead == -1) { return -1; }
            lastNibble = byteRead & 0xff;
            result = lastNibble >>> 4;
        } else {
            result = lastNibble & 0xf;
            lastNibble = -1;
        }
        return result;
    }
}
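As a quick check on the cost of this encoding, a standalone helper (hypothetical, not part of the class above) can count the nibbles a string needs without running the codec: one nibble per character in the table, three per escaped character, plus padding to a whole byte.

```java
// Count the nibbles the nibble code needs for a string: one for each
// character in the frequency table, three (escape + two ASCII nibbles)
// for any other character, rounded up to an even number of nibbles.
class NibbleCost {
    static int nibblesFor(String s) {
        String table = " etoasrinclmhdu";
        int nibbles = 0;
        for (char c : s.toCharArray())
            nibbles += (table.indexOf(c) != -1) ? 1 : 3;
        if (nibbles % 2 != 0) nibbles++;   // pad to a whole byte
        return nibbles;
    }
}
```

For "All the world's a stage" this gives 32 nibbles (16 bytes), matching the example above.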
Compressed String Literals
Compressed strings are more difficult to handle when programming.  While programming languages provide string literals for normal string encodings, they do not generally support compressed strings.  String literals in languages like C, C++, and Java support escape codes (such as "\x34") which allow arbitrary byte values to be stored in the string. These escape codes can be used to store compressed strings in standard string literals.  For example, here's a C string that stores the nibble codes for "All the world's a stage":
char* AllTheWorldsAStage = 
          "\xf4\x1a\xa0\x2c\x10\xf7\x73\x6a\xdf\x27\x50\x40\x52\x4f\x67\x1f";
You can also write a pre-processor to work through program texts, and replace standard encoded strings with compressed strings. This works particularly well when compressed strings can be written as standard string or array literals.   Alternatively, in systems that store strings in RESOURCE FILES, the resource file compiler can compress the string, and the resource file reader can decompress it.
UTF8 Encoding
To support internationalisation, an increasing number of applications do all their internal string handling using two-byte character sets - typically the UNICODE standard.  Given that for most European languages the character set requires fewer than 128 characters, the extra byte is clearly redundant.  So for storage and transmission, many environments encode their strings using the UTF8 encoding.
In UTF8, each UNICODE double byte is encoded into one, two or three bytes (though the standard supports further extensions).  The coding encodes the bits as follows:
UNICODE value        1st Byte    2nd Byte    3rd Byte
000000000xxxxxxx     0xxxxxxx
00000yyyyyxxxxxx     110yyyyy    10xxxxxx
zzzzyyyyyyxxxxxx     1110zzzz    10yyyyyy    10xxxxxx
So in UTF8, standard 7-bit ASCII characters are encoded in a single byte, and most common double byte characters encode as two bytes.
Of course there is a further benefit for UTF8 - the main purpose of its design - that it is suitable for transmission down a serial connection.  A terminal receiving UTF8 characters can always determine which byte represents the start of a UNICODE character. Any UTF8 bytes with the top bits equal to "10" are always 2nd or 3rd in sequence and should be ignored unless the terminal has received the initial byte.  
This same feature also means that every character starts and ends on a byte boundary.  Thus you can identify a substring within a larger buffer of UTF8 data using just a byte offset and a length.
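The byte-level rules in the table above can be sketched in Java for characters up to U+FFFF (an illustrative helper, not a full UTF8 implementation - surrogate pairs and longer forms are ignored):

```java
// Sketch of the UTF8 encoding rules tabulated above, for a single
// UNICODE character in the range U+0000 to U+FFFF.
class Utf8Sketch {
    static byte[] encode(char c) {
        if (c < 0x80)                         // 0xxxxxxx: one byte
            return new byte[] { (byte)c };
        if (c < 0x800)                        // 110yyyyy 10xxxxxx
            return new byte[] { (byte)(0xC0 | (c >> 6)),
                                (byte)(0x80 | (c & 0x3F)) };
        return new byte[] {                   // 1110zzzz 10yyyyyy 10xxxxxx
            (byte)(0xE0 | (c >> 12)),
            (byte)(0x80 | ((c >> 6) & 0x3F)),
            (byte)(0x80 | (c & 0x3F)) };
    }
}
```

'A' encodes as the single byte 0x41; U+00E9 encodes as 0xC3 0xA9; U+20AC encodes as 0xE2 0x82 0xAC.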
Known Uses
Nibble codes were widely used in versions of text adventure games for small machines [Blank+95].  Symbian's EPOC16 operating system used string compression in resource files.
Philip Gage used a similar technique to compress an entire string table [Gage97]. 
Early versions of MacWrite used string compression ubiquitously, both in main memory and in text files [Bell+90].
Slightly incompatible variants of the UTF8 encoding are used ubiquitously in Java, Plan 9, and Windows NT to store Unicode characters [Unicode/UTF8, Java-Unicode, Plan9HelloUnicodeWorld, and WinNT].
See Also
File Compression uses more powerful algorithms to achieve better compression ratios for large data items that do not need to be accessed randomly. Byte Codes use similar techniques to compress instructions.  Compressed strings can be stored in Resource Files in Secondary Storage or Read-Only Memory, as well as primary storage.
______________________________
Sequence Compression Pattern
Also known as: Run Length Encoding, Delta Coding, Communication Compression.
How do you reduce the memory taken up by sequences of data?
* You need to reduce your program's memory requirements
* You have large streams of data in your program 
* The data streams will be accessed sequentially.
* There are significant time or financial costs of transferring data.
Many programs need to store sequences or series of data - for example, sequential data such as audio files or animations, time series such as stock market prices, or the sequences of values read by a sensor in the outside world. All this data can take up significant amounts of memory, increasing the program's memory requirements.
This sort of streamed data is typically accessed sequentially, beginning at the first item and then processing each item in turn.  Random access into the middle of the data is required much less frequently than sequential access.
Because many programs need to deal with large amounts of sequential data, its memory requirements can cause problems even when the data is stored in Read-Only Memory or Secondary Storage. Even if the raw memory capacity is sufficient to store the data, you may prefer to allocate it to other uses. Sequential data can also cause problems when it has to be transmitted over slow communication links.
Finally, although storing the data is important, it often isn't the largest problem you have to face - gathering the data is often much more work than simply storing it.  Typically, you don't want to devote too much programmer effort, processing time, or temporary memory space to managing this data.
Aiming to keep things simple generally rules out the idea of applying File Compression algorithms to the data - although well known, adaptive file compression algorithms are complex, and can have quite large time and space overheads. Compact variable-length character-based encodings, as used in String Compression, also do not perform well for data series, because the data is typically made up of streams of multibyte quantities rather than characters, with no obvious frequency distribution of the items making up the data set.
For example, the Strap-It-On PC needs to store results collected from the Snoop-Tronic series of body wellbeing monitors.  These monitors are attached onto strategic points on the wearer's body, and regularly measure and record various physiological, psychological, psychiatric and psychotronic metrics (heartbeats, blood-sugar levels, alpha-waves, influences from the planet Gorkon, etc).  This information needs to be stored in the background while the Strap-It-On is doing other work, so the recording process cannot require much processor time or memory space.  The recording is continuous, gathering data whenever the Strap-It-On PC and Snoop-Tronic sensors are worn and the wearer is alive, so large amounts of data are recorded.  The memory requirements to store these data must be reduced somehow.
Therefore: store data items based on their differences from the preceding items.
Continuous data sequences are rarely truly random - the recent past is often an excellent guide to the near future, because the values stored change little between adjacent items. Data series often contain runs of items with exactly the same value.  Because of this, knowing the value of one data item makes it much easier to predict the following item.  Storing the complete value of every element in a data series introduces redundancy, because it makes it just as easy to store widely differing values (which are very unlikely) as similar values (which are quite common).  We can reduce the memory requirements for sequences by removing this redundancy from the sequence encoding, minimising the memory required to store series of similar values.  The most important technique is to reduce the number of bits stored per item by storing the differences between adjacent items or the number of repeated items, rather than the absolute value of each and every item.
For example, the data recorded by the Snoop-Tronic monitors is very suitable for sequence compression.  Each monitor is the source of a continuous data stream, and the values of adjacent items in the data stream are very close or the same for long periods of time.  The Strap-It-On PC's driver for the Snoop-Tronic sensors uses sequence compression techniques on the data streams as they arrive from each sensor, buffers the data, and stores it to secondary storage - without imposing a noticeable overhead on the performance of the system.  If the user wishes to keep the data traces for long periods, file compression can be used explicitly on the data files in secondary storage.
Consequences
Sequence compression achieves a reasonable compression ratio for data series (reducing data to 70-75% of its original size, or better), reducing the program's memory requirements.  Sequential operations on the compressed data can execute almost as fast as operations on uncompressed sequences (depending upon the encoding you choose), preserving time performance. Sequence compression is quite easy to implement, so it does not take much programmer effort, or extra temporary memory.
However: the total compression of the program data isn't high, so the program's memory requirements are not greatly reduced.
The compressed sequences are more difficult to manage than sequences of absolute data values - in particular, it is difficult to provide random access.
You have to test the compressed sequence operations, but these tests are quite straightforward.
Implementation 
There are two main techniques used to compress sequences - storing differences (delta coding) between item values, and storing a single item and the number of copies for runs of repeated items (run length encoding).
Delta Coding - Storing differences between items
Items in data series often follow each other closely.  Rather than storing the absolute value of each data item, store a delta - the difference between this item and the previous item.  This is known as delta coding. Delta coding saves memory space because deltas can be stored in smaller amounts of memory than absolute values. For example, you may be able to encode a slowly varying stream of sixteen-bit values using only eight-bit delta codes.  Of course, the range of values stored in the delta code is less than the range of the absolute item values (16-bit items range from 0 to 65535, while 8-bit deltas give you ±127).  So the difference between subsequent items can be larger than the range of delta values.  In these cases, emit an escape code (a special delta value, say -128 for an 8-bit code) and then the full absolute value of the data item.
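A minimal sketch of plain delta coding along these lines might look like the following; the class and method names, and the choice of -128 as the escape code, are illustrative, not prescriptive:

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch: encodes 16-bit samples as 8-bit deltas,
// falling back to an escape code (-128) plus the full value when
// the difference between adjacent items is out of range.
class DeltaCoder {
    static final int ESCAPE = -128;  // reserved delta value

    static byte[] encode(short[] samples) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        short last = 0;
        for (short s : samples) {
            int delta = s - last;
            if (delta > -128 && delta <= 127) {
                out.write((byte) delta);        // one byte per item
            } else {
                out.write((byte) ESCAPE);       // escape code...
                out.write((s >> 8) & 0xff);     // ...then the full
                out.write(s & 0xff);            // 16-bit value
            }
            last = s;
        }
        return out.toByteArray();
    }

    static short[] decode(byte[] coded, int count) {
        short[] result = new short[count];
        short last = 0;
        int i = 0;
        for (int n = 0; n < count; n++) {
            byte b = coded[i++];
            if (b == ESCAPE) {
                last = (short) (((coded[i++] & 0xff) << 8) | (coded[i++] & 0xff));
            } else {
                last += b;   // apply the delta to the previous value
            }
            result[n] = last;
        }
        return result;
    }
}
```

Note that -128 itself cannot be used as a delta, since its byte encoding would collide with the escape code; the encoder escapes in that case too.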
Run Length Encoding - Storing runs of repeated items.
You can reduce the storage required for many data series still further if the data contains many runs of repeated items with exactly the same value - this is known as run-length encoding.  Run-length encoding compresses runs by storing the value of the items in the run, and then the length of the run, rather than the value of each individual item.  For example, we can extend the delta code to compress runs by always outputting an absolute value and a count byte following the escape code. Runs of longer than four items (and less than 256 items) can be stored using the escape code, the absolute value of the repeated item, and the count.  Runs of longer than 256 items can be stored as separate runs of 256 items, plus one more run of the remainder.
For example, here's a data sequence of sixteen-bit values:
4067 4069 4072 4062 4064 4064 4064 4064 4064 4064 4066 4066 4059 4061 4062 4061
Here's the same sequence encoded with a sequence code.  The sequence code is a byte code, with most values being read as two's-complement delta values (-127 to +127). The code -128 is an escape code, which must be followed by a two-byte value (high byte first) and then a one-byte unsigned count (read as 0-255):
-128  15  227  1  +2  +3  -10  -128  15  224  6  +2  +0  -7  +2  +1  -1  
This encoding takes up 17 8-bit bytes - as against 16 16-bit words (or 32 bytes) for the standard representation.
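Pure run-length encoding on its own can be sketched as a list of (value, count) pairs; the class name here is invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of pure run-length encoding: each run of equal
// values is stored as a (value, count) pair.
class RunLength {
    static List<int[]> encode(short[] samples) {
        List<int[]> runs = new ArrayList<>();
        for (short s : samples) {
            int[] last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
            if (last != null && last[0] == s) {
                last[1]++;                      // extend the current run
            } else {
                runs.add(new int[] { s, 1 });   // start a new run
            }
        }
        return runs;
    }
}
```

On the data sequence above, the six repeated 4064 values collapse into a single (4064, 6) pair, while the non-repeating values each cost a whole pair - which is why the pattern combines run-length encoding with delta coding rather than using it alone.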
Lossy Sequence Compression
Delta coding and run length encoding are lossless compression techniques. Lossy compression can achieve better compression ratios by discarding some of the fine detail of the data series - by coding sequences of slightly different values as if they were runs of one single value.  For example, in the data series above, differences within a quarter of one percent of the absolute value of the data items may not be significant in the analysis.  Quite possibly they could be due to noise in the recording sensor or the ambient temperature when the data item was recorded. A quarter of one percent of 4000 is 10 - so the entire run of data could be coded as a single run of 16 items of value 4066, since the highest and lowest values in the sequence (4072 and 4059) are well within the tolerance of 4066 ± 10.
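A minimal sketch of this idea, assuming a simple greedy rule: a value joins the current run if it lies within the tolerance of the run's first value. (Unlike the hand-worked example above, which chose 4066 as the representative value, this sketch simply anchors each run on its first value.)

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative lossy run-length sketch: values within `tolerance` of the
// current run's representative value are coded as repeats of that value,
// discarding the fine detail of the series.
class LossyRuns {
    // Returns (value, count) pairs; each pair covers a run of inputs
    // that all lie within +/- tolerance of the pair's value.
    static List<int[]> encode(short[] samples, int tolerance) {
        List<int[]> runs = new ArrayList<>();
        for (short s : samples) {
            int[] current = runs.isEmpty() ? null : runs.get(runs.size() - 1);
            if (current != null && Math.abs(s - current[0]) <= tolerance) {
                current[1]++;                   // close enough: code as a repeat
            } else {
                runs.add(new int[] { s, 1 });   // start a new run
            }
        }
        return runs;
    }
}
```

With a tolerance of 10, the whole sixteen-item sequence above collapses to a single (value, 16) pair.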
Resynchronisation
Sequence compression algorithms are often used for broadcast communications - serial or network connections.  In many cases, particularly with multimedia or terminals, it doesn't matter very much if part of the sequence is lost, so long as later data can be read correctly.
So algorithms designed for this kind of data include resynchronisation information: every now and again the transmitter sends enough information that the receiver can reconstruct the entire data from it.
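One simple way to do this with delta coding is to force out a full, escape-coded value every N items, regardless of whether the delta would fit; a hypothetical sketch (the interval of 8, the class name, and the escape value are all invented for this example):

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch: a delta coder that forces a full (escape-coded)
// value every RESYNC_INTERVAL items, so a receiver joining mid-stream,
// or one that lost data, can recover at the next full value.
class ResyncDeltaCoder {
    static final int ESCAPE = -128;
    static final int RESYNC_INTERVAL = 8;   // assumed interval; tune to taste

    static byte[] encode(short[] samples) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        short last = 0;
        for (int i = 0; i < samples.length; i++) {
            int delta = samples[i] - last;
            boolean resync = (i % RESYNC_INTERVAL == 0);  // periodic full value
            if (resync || delta <= -128 || delta > 127) {
                out.write((byte) ESCAPE);            // escape code...
                out.write((samples[i] >> 8) & 0xff); // ...then the full
                out.write(samples[i] & 0xff);        // 16-bit value
            } else {
                out.write((byte) delta);             // ordinary one-byte delta
            }
            last = samples[i];
        }
        return out.toByteArray();
    }
}
```

The resynchronisation points cost two extra bytes each, trading a little compression for robustness - exactly the trade-off broadcast formats make.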
Specialist Algorithms
There are many more specialist algorithms for specific kinds of sequence data.  In particular, multimedia data (sound and video) is highly redundant; the eye and ear can detect only a fraction of the 'information' contained in a sound or video signal.  However, the most common internal formats of the data (sequences of samples of bitmap or sound intensity) are not particularly well suited to algorithmic compression, and much research and effort has gone into devising algorithms to handle them.
There are large numbers of algorithms for sound compression [Audio Formats], each with its own features and devotees.  
* GSM obtains a very high compression ratio indeed, but for speech only.  It requires only 9600 baud to transmit normal spoken speech.
* MP3 (MPEG audio) is suited to music - it gets very good compression.
* LAW encodings (MuLAW, ALAW) obtain lower compression ratios for music, but are easier to encode.
[Video compression, anyone?]
Encoding vs. Decoding
Some algorithms reduce the processing cost of decoding the data by putting a correspondingly higher load on the encoder.  This is particularly advantageous if there is only one encoder and many decoders, or if the encoder can have a much higher specification than the decoder - both common in broadcast systems.
MP3, for example, requires more effort to encode than decode, making it particularly suited to its purpose of encoding sound for transmission.
Examples
The following code compresses a sequence of two-byte (Java's short) values into a sequence of bytes using both difference compression and run length encoding.   The only escape sequence it supports contains both the complete value and the sequence length; thus we can use it to represent both changes too large for delta encoding, and sequences of identical values.  The bytes of the escape sequence are as follows:
<escape> <high byte of repeated value> <low byte> <sequence count>
The encodeSequence method takes a sequence of shorts, and passes each one to the encodeShort method, which will actually encode them:
class SequenceCompression
{
    protected final int SequenceEscape =  0xff;
    protected final int MaxSequenceLength =  0xFE;
    protected short lastShort;  // the last value read
    protected short runLength;  // how many of that value received

    protected void encodeSequence(short[] inputSequence) {
        lastShort = 0;
        runLength = 0;

        for (int i = 0; i < inputSequence.length; i++) {
           encodeShort(inputSequence[i]);
        }
        flushSequence();
    }
The encodeShort method does most of the work.  It first checks if its argument is part of a sequence of identical values, and if so, simply increases the run length count for the sequence - if the sequence is now the maximum length that can be represented, an escape code is written.  If its argument is within the range of the delta coding (between -128 and +126 of the last value), an escape code is written if necessary, and a delta code is written.  Finally, if the argument is outside the range, an escape code is written to terminate the current run length encoded sequence if necessary. In any event, the current argument is remembered in the lastShort variable.
protected void encodeShort(short s) {
        int delta = s - lastShort;
        if (s == lastShort) {
            runLength++;
            if (runLength >= MaxSequenceLength) {
                flushSequence();
            }
        } else if (delta >= -128 && delta <= 126) {
            // delta codes occupy bytes 0..254; 0xff is reserved as the escape
            flushSequence();
            writeEncodedByte(delta + 128);
        } else {
            flushSequence();
            runLength++;
        }
        lastShort = s;
    }
The function flushSequence() simply writes out the escape codes, if required, and resets the run length.  It is called whenever a sequence may need to be written out - whenever encodeShort detects the end of the current sequence, or that the current sequence is the longest that can be represented by the run length escape code. 
    protected void flushSequence() {
        if (runLength == 0) return;
        writeEncodedByte(SequenceEscape);
        writeEncodedByte((lastShort >>> 8) & 0xff);
        writeEncodedByte(lastShort & 0xff);
        writeEncodedByte(runLength);
        runLength = 0;
    }

    protected void writeEncodedByte( int b ) {
    // Writes a single byte to the encoded output stream
    }
}
The corresponding decoding functions are straightforward. If an escape code is read, a run of output values is written, and if a delta code is read, a single output is written which differs from the last output value by the delta.
    protected void decodeSequence(byte[] s) {
        ByteArrayInputStream inStream = new ByteArrayInputStream(s);
        int byteRead;
        lastShort = 0;

        while ((byteRead = inStream.read()) != -1) {
            byteRead = byteRead & 0xff;

            if (byteRead == SequenceEscape) {
                lastShort = (short) (((inStream.read() & 0xff) << 8) +
                                     (inStream.read() & 0xff));
                for (int c = inStream.read(); c > 0; c--) {
                    writeDecodedShort(lastShort);
                }
            } else {
                writeDecodedShort(lastShort += byteRead - 128);
            }
        }
    }

    protected void writeDecodedShort(short s) {
        // Writes a single short value to the output stream
    }
Known Uses
Smalltalk provides a RunArray that compresses arrays by storing counts of repeated characters [Goldberg].
Window systems often use a form of run length encoding to compress events. For example, the X Window System can return a single compressed window exposure event that represents a number of smaller exposure events - the compressed event contains a count of the number of uncompressed events it represents [X Window System].
Reuters' IDN system broadcasts the financial prices from virtually every financial exchange and bank in the world, aiming to deliver every update to every interested subscriber in under a second - and almost always succeeding.  To make this possible, IDN represents each 'instrument' as a logical data structure identified by a unique name (Reuters Identification Code); when the contents of the instrument (prices, trading volume etc.) change, IDN transmits only the changes.  To save expensive satellite bandwidth further, these changes are transmitted in binary form, using a carefully-tailored compression algorithm.  However, to ensure synchronisation of all the Reuters systems worldwide, the system also transmits a background 'refreshing' stream of complete images of every instrument.
Fax transmission protocols (xx.2 -  [fax]) use sequence compression to reduce the time taken to transmit a scanned image.
See Also
Byte Coding uses similar techniques to compress machine instructions, and String Compression to compress strings. File Compression uses more powerful algorithms to achieve better compression ratios for large data items that do not need to be accessed randomly. Compressed sequences can be stored in Secondary Storage or Read-Only Memory, as well as primary storage.
______________________________
File Compression Pattern
How can you compress a large amount of bulk information?
* You have a large amount of data to store
* You have transient memory space for processing the data
* You don't have persistent memory space to store the information long term, or you need to send the data across a slow telecommunications link
* You don't generally need random access to the data 
A high proportion of the memory requirements of many programs is devoted to bulk data. Some programs have to deal with large to very large amounts of data - a year's worth of rainfall measurements in the Amazon, the composition of the soil on Mars, a high-resolution picture of the Mona Lisa.  Other programs need to store program-specific resources, supposedly to enhance the user's experience - such as multimedia, images, sound, or animations.  For example, a digital book reader needs to store a large amount of text for the books to be read.  A multimedia information kiosk similarly needs to store a large amount of text, but also images, icons, sound files, maps, personalities of cute interactive cartoon characters, and so on.
Programs that are designed to handle bulk data usually have sufficient main memory to be able to process the data they need to work on at any given time.  If large amounts of bulk data are involved, they may not be able to afford enough main memory (or enough secondary storage or read only storage) to store all of the bulk data they will eventually need. For example, a book reader has enough memory to store and display individual chapters, and the kiosk has enough memory to display all the images on the screen, and to play five minutes of music.  But storing one chapter or one screen image is one thing: storing five hundred books or five thousand images is something else entirely. 
An important characteristic of this kind of bulk data is that most programs only need to process it sequentially. Random access into the middle of a large collection of bulk data is not generally required. For example, a sound file or animation is typically played from beginning to end; an image is either displayed in toto or not displayed at all.  Random access to images and sounds is required for specialised editing tasks, but these tasks are generally performed on special machines with more capacity than the machines used to play the sounds and images.
A similar problem arises with networked devices that need to communicate large amounts of data over a slow network link.  In these cases the problem is not the amount of primary or secondary storage needed to store the data once it arrives on the machine - if it were, there would be little point sending the data there - the problem is transmitting the data in the first place.  Bulk data transmission, especially over slow links, is, of course, sequential access par excellence - every byte of the data has to be transmitted, and only one byte can be transmitted at a time!
For example, the latest application planned for the Strap-It-On PC is ThemePark: UK, a tourist guide being produced in conjunction with the Unfriendly Asteroid travel consultancy.  Based on existing ThemePark products, which each guide users around one park in Southern California, ThemePark: UK is planned to treat the whole of the UK as a theme park.  Unfortunately, the whole of the UK is rather larger than most US theme parks, so the application will sorely test the machine's memory. The designers are considering supplying only basic information with the application (and even that will be difficult to cram into the machine) and then allowing the user to download more information over the Strap-It-On's wireless modem.  Furthermore, to enhance the user's experience of ThemePark: UK, the marketing division have decreed that the Strap-It-On PC needs to store a ten-minute video of the founder of the company, to be shown on the 2in wrist-mounted screen, describing how the Strap-It-On saved him from getting lost outside the Guildhall on his latest trip to Cambridge.  
Therefore: use an adaptive file compression algorithm.
The key idea behind file compression algorithms is that large data sets often have large amounts of redundancy, but this redundancy is distributed throughout the whole of the file. Text files, for example, have lots of repeated characters, repeated words, and repeated phrases.  Treating files purely as streams of characters can also reveal redundancy within and across words and paragraphs, and treating files as streams of bits can reveal even more redundancy. The particular details of the redundancy differ from file to file - for example, each text document will have its own peculiar repeated words and phrases.
Adaptive compression algorithms take advantage of this redundancy, by compressing the file based on the actual structure of the redundancies they detect within it - the details of the compression are tailored to the particular files being compressed.  Because they adapt to the data being coded, adaptive compression techniques provide better compression ratios than static algorithms that compress data based on a priori assumptions about their properties.  For example, a string compression algorithm may use a fixed table of character frequencies to decide which characters to code using a single nibble, and which characters to code using escape codes.  An adaptive compression algorithm would make these decisions based on the actual character frequencies of the data being compressed.
There are a number of standard adaptive compression algorithms suitable for bulk data.  Implementations of many algorithms are available publicly, either as free or open source software, or from commercial providers.  Choose one of these algorithms (preferably one for which you can get the implementation).  Compress data that is not needed immediately as it arrives. Decompress it into some transient memory when the data needs to be processed or displayed, and then delete or recompress the expanded version when processing is finished.  Use the algorithm to compress bulk data transmissions (always ensuring the receiver can handle the compression).
For example, Strap-It-On PC adopts the ZIP file compression standard. Its operating system has incorporated the standard lzip library [LZIP] and makes it available to all Strap-It-On applications.  Sound, image and other multimedia files can be automatically compressed when not in use (typically stored on flash RAM secondary storage) and are automatically decompressed when accessed.  The Strap-It-On stores Our Founder's video in the standard MPEG format, which intrinsically supports compression.
Consequences
Modern adaptive compression algorithms provide great compression ratios (reducing your memory requirements), are widely used, and are incorporated into popular industry standards for bulk data.
File compression can also reduce the secondary storage requirements for program code - for example, compressing 40 files in the MS-Windows Command directory with Zip reduced their storage requirements to about 55% of the decompressed files. 
However: File compression can require quite a large amount of processing time to compress and decompress large bulk data sets.  Some temporary memory (primary or secondary) will be necessary to store the decompressed results and to hold intermediate structures.
File Compression algorithms compress their input data in one monolithic lump.  The main advantage of adaptive file compression algorithms over simpler static algorithms is that the adaptive algorithms alter their compression behaviour based on the properties of their input. This means that file compression algorithms work sequentially, and to compress or decompress a particular portion of the data, they must have read all the preceding data.  Static algorithms always compress the same data in the same way (regardless of the context in which the data is found), so they do not make use of the context of a particular data set.
The performance of compression algorithms can vary depending on the type of data being compressed, so you have to select your algorithm carefully. If you cannot reuse an existing implementation you will need programmer effort to code up one of these algorithms, because they are quite complex.  Some of the most important algorithms are patented, although you may be able to use non-patented alternatives.
Implementation
File compression algorithms employ combinations of simpler compression techniques to provide very good compression ratios - reducing files to less than 50% of their original size [Bell].  Various adaptive compression algorithms work with individual bits rather than bytes or nibbles, make aggressive use of variable-length codes (in some cases, even coding in unary), and store common portions of the data to be compressed in dictionaries that are included with the compressed files and used to reproduce the common portions during decoding.  The algorithms also use more extreme versions of techniques we have seen before, such as delta coding and run-length encoding from Sequence Compression.  Designing efficient file compression algorithms is a very specialised task, especially as the compression algorithms must be tailored to the type of data being compressed.
For most practical uses, however, you do not need to design your own compression algorithms, as libraries of compression algorithms are available both commercially and under various open source licences.  Sun's Java, for example, now officially includes a version of the zlib compression library, implementing the same compression algorithm as the pkzip and gzip compression utilities.  In most programs, compressing and decompressing files or blocks of data is as simple as calling the appropriate routine from one of these libraries.
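As an illustration of how little code this takes, here is a sketch using the Deflater and Inflater classes from java.util.zip - a block-at-a-time alternative to the stream wrappers shown in the Examples section below (the class name BlockZip is invented for this example; error handling is minimal):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Minimal sketch of one-shot compression with java.util.zip's
// Deflater/Inflater, the same algorithm family as pkzip and gzip.
class BlockZip {
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();                 // no more input to come
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
        } catch (DataFormatException e) {
            throw new RuntimeException("corrupt compressed data", e);
        }
        inflater.end();
        return out.toByteArray();
    }
}
```

Redundant data - repeated words, runs of identical samples - compresses well; the adaptivity means no frequency tables or dictionaries need to be supplied by the caller.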
Specialised Algorithms
Graphics images contain a lot of redundancy, and can be compressed as a single file.  
The GIF compression algorithm uses forms of sequence compression, in two dimensions, to encode an image.  It is particularly effective with cartoons and diagrams - pictures with large blocks of identical colours, and lines.
JPEG analyses a graphical image in terms of its waveforms (similar to sound compression algorithms), and is much better suited to photographs.
Examples
Here's an example using the zlib libraries from Java.  Java's zlib support works as streams that tack onto existing streams.  This makes it easy to compress from or to anything that can be implemented as a stream - for example, a TCP/IP connection.  So, to compress some data, we open a stream on that data, and pass it through a compressing stream to an output stream on our destination.
protected static byte[] encodeSequence(byte[] inputSequence) throws IOException {
	InputStream is = new ByteArrayInputStream(inputSequence);
	ByteArrayOutputStream outputArrayStream = new ByteArrayOutputStream();
	GZIPOutputStream out = new GZIPOutputStream(outputArrayStream);
	
	byte[] buf = new byte[1024];
	int len;
	while ((len = is.read(buf)) > 0) {
	    out.write(buf, 0, len);
	}
	out.close();
	return outputArrayStream.toByteArray();
    }
In this model, decompressing is much like compressing.  This time, the compressing stream is on the reading side; but in all other respects the code is virtually the same.
protected static byte [] decodeSequence(byte [] s) throws IOException {
	GZIPInputStream is = new GZIPInputStream(new ByteArrayInputStream(s));
	ByteArrayOutputStream out = new ByteArrayOutputStream();
	
	byte[] buf = new byte[1024];
	int len;
	while ((len = is.read(buf)) > 0) {
	    out.write(buf, 0, len);
	}
	out.close();
	return out.toByteArray();
    }
Known Uses
Many graphics file formats use sequence compression. JPEG, for example, uses lossy sequence compression, but in two dimensions rather than the basic one-dimensional sequence compression we have described. [Jpeg format, other graphics formats]
Lempel-Ziv and variant compression algorithms are an industry standard, evidenced by the many PKZip and gzip file compression utilities. These algorithms have also been incorporated into many image file formats such as GIF [GIF87].  File compression is also used architecturally in many systems.  For example, Linux kernels can be stored compressed and are decompressed when the system boots, and Windows supports optional file compression for each disk volume [Linux, Windows].  Java's JAR format is essentially a restricted version of the standard zip format.  Some backup tape formats also use compression.
Text Compression [Bell+90] and Managing Gigabytes [Bell] describe many other examples, and provide a good overview of most of the common compression algorithms, including algorithms tailored for non-textual data.  Further information on the lzip library and the ZIP compression algorithms is available online.
The deflate compression format used by ZIP is an Internet standard, described in RFC 1950 and RFC 1951 [GabrielZip, RFC1950, RFC1951].
See Also
SEQUENCE COMPRESSION will also encode a file quite happily.
You may need DATA CHAINING with a temporary file to compress the data.
______________________________
Byte Coding Pattern
Also known as: byte-coded interpreter, virtual machine.
How can you compress machine code?
* The memory requirements for program code exceed the space available.
* You cannot do a FEATURECTOMY, deleting some features from the program's requirements and deleting the code that supports those features.
* SECONDARY STORAGE would be too slow, too expensive or too demanding of memory buffers.
* FILE COMPRESSION would require too much space to store the decompressed code in main memory. 
* You would like to be able to dynamically load code into your program.
* You would like to increase the portability of the program.
In order to be executed by a CPU, a program's machine code needs to be resident in main memory. But, for many programs, the memory requirements of the code are a large fraction of the memory requirements of the whole program.  Unfortunately, if you want to support the features in the program's requirements, they have to be embodied somehow in the code, and so take up memory space: There ain't no such thing as a free lunch. [Hacker's Dictionary] 
One option to reduce the memory requirements of the program would be to store programs on SECONDARY STORAGE until they were required.  While this reduces the storage needed for inactive programs, executing programs must be in main memory (this is what distinguishes main memory from secondary storage).  Sophisticated techniques like SEGMENTATION and PAGING can be used to give the illusion of executing code from secondary storage, but this still requires large amounts of memory for buffer space.  So, storing programs in secondary storage doesn't reduce their requirements for main memory.
For similar reasons, while FILE COMPRESSION can be used to reduce the size of the executable, it is not practical when the program's code will not fit into main memory.  Using file compression, the whole executable (or large portions of it) must be decompressed in one piece, and the resulting decompressed version must be stored in main memory while the program is running.
Another option is to store the program code in READ ONLY STORAGE such as ROM, because this is less expensive (and consumes less power) than main memory.  For read only storage to be a practical option, however, the program's code must still fit into the available read only storage capacity.  If a program is so large that it cannot be accommodated in main memory, it may also be impractical to store it in read-only storage.  Read-only storage is also harder to manage than secondary storage - precisely because it is read-only, it is very difficult to write to, and so to install or update programs stored there.
Finally, Dynamic loading and Binary code portability are becoming increasingly important for some kinds of small machines. As wireless networking and portable computing devices become more ubiquitous (machines like palmtops, telephones, intravenous computers, and wrist-mounted PCs) these small devices will need to be able to run a range of programs.  They will even be upgraded in the field by downloading code directly into the device.  Unlike the desktop environment, which has historically been dominated by one (or, at most two) competing machine architectures, small machines are built using a wide variety of computer architectures.  Programs for small devices need to be portable across a number of different architectures - dynamically and transparently.
For example, the Strap-It-On PC requires a large amount of executable code, both for its operating system and user programs. Primary storage needs to be used to store users' data, as well as code, so the Strap-It-On has very little main storage available.  Although the PC has secondary storage - primarily flash RAM attached to the strap, but also belt-mounted floppy drives (and hard drives in extreme cases) - access to secondary storage is slow and consumes large amounts of power.  In any event, the Strap-It-On's CPU cannot execute programs from secondary storage, so the program would need to be copied into main storage to execute - and the problem is precisely that there isn't really enough main storage.  Compressing executable files would save space on secondary storage, but, once again, this still doesn't solve the problem, as main storage is required to decompress, store and execute them.  So, how can we reduce a program's storage requirements while still allowing it to run?
Therefore:	Use byte codes to store the instructions, rather than full machine language.
Bytecodes are an intermediate representation for program code that is designed to occupy less space than traditional machine instructions, but can still be executed.  To save space, the most common instructions in a bytecode occupy only one byte - this is why they are called 'bytecodes'. Once a program's code is compiled into bytecodes, the bytecodes can be interpreted directly at runtime, with a reasonably minor memory overhead.
For example, the Strap-It-On PC uses a bytecode to store and execute many of its user programs. Programs are compiled into bytecodes by a compiler that runs on a desktop PC, and then the bytecodes are loaded down to the wrist-top via an infrared or wireless link.  The Strap-It-On's core operating system (compiled from C to native machine code) contains an interpreter that executes the byte codes from main memory.
Consequences:
Byte codes occupy much less space than native machine code or source code, reducing the program's memory requirements.  Byte codes can be executed directly by an interpreter, so little extra memory is required for execution. 
A single byte code instruction set can be portable across different machine architectures, once you have implemented an interpreter for the bytecodes on each architecture.
Because the machine executing the bytecodes is completely under your control, you can easily implement more advanced facilities, such as dynamic loading or paging, but also tracers, debuggers, and execution profilers.  It's easier to modify a program interpreting a bytecode instruction set, than it is to modify a real piece of silicon.
However: byte codes are typically slower to execute than native machine instructions, reducing time performance because they must be decoded and interpreted.  In some rare cases a well-designed byte code set may provide better end-to-end performance, for example, when the time to load the codes over a slow network is included. 
Byte codes are quite easy to understand and encode, so while they require programmer effort to implement and discipline to use, the cost is not overwhelming.  You have to test the byte code compiler and interpreter, but these tests are quite straightforward.
More advanced bytecode systems typically translate the bytecodes to native machine code on the fly, using a dynamic translator (also known as a Just-In-Time or JIT compiler). While this can certainly increase the program's time performance, it also increases the memory requirements, as memory is required to store the more complex dynamic translator/code generator, to provide temporary working space during the translation, and then to store the translated machine code.
Implementation
There are three main steps to implementing a byte coded system - designing an instruction set, implementing a compiler, and implementing an interpreter.
Designing a byte-code instruction set
The classical design for a bytecode instruction set is based on a stack machine. A stack machine has no registers; rather, temporary results of computations are stored on a stack.  Bytecode instruction sets for stack machines contain two types of instructions - memory instructions and operation instructions.  Memory instructions - loads and stores - simply read a value from memory and push it onto the stack, or, alternatively, pop an item off the top of the stack and store it back into memory.  Operation instructions do "real work", such as arithmetic and string operations, by popping one or more arguments off the top of the stack and then pushing the result of the computation back onto the stack.  Design your bytecode set so that it has enough memory instructions to access every kind of memory location you will need, and enough operation instructions to invoke the basic operations of the computing platform.
The key to making your bytecode compress executable instructions is to ensure all the most common instructions in programs can be coded as briefly as possible.  For example, reads from parameters, reads and assignments to local variables, and (in object-oriented languages) access to self are much more common in most programs than operations to compute the hyperbolic square root or restart a failing disk drive.  Use escape codes to represent uncommon operations in two or more bytes, to ensure that the most frequent instructions can be coded in the minimum of space.
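To make this concrete, here's a small sketch of an escape-code encoding.  All the opcode names and values are invented for illustration: common instructions cost one byte, while rare ones cost two (an escape byte plus an extension opcode).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding: frequent operations fit in a single byte;
   infrequent ones are prefixed with an ESCAPE byte and identified by
   a second byte, giving 256 extra opcodes at a two-byte cost. */
enum { OP_LOAD_LOCAL_0 = 0x01, OP_LOAD_SELF = 0x02, OP_ADD = 0x03,
       OP_ESCAPE = 0xFF };
enum { EXT_HYPERBOLIC_SQRT = 0x01 };   /* a rare operation */

/* Emit one opcode into buf; returns the number of bytes written. */
size_t emit_op(unsigned char *buf, int common, unsigned char op)
{
    if (common) {               /* frequent instruction: one byte */
        buf[0] = op;
        return 1;
    }
    buf[0] = OP_ESCAPE;         /* infrequent: escape prefix + opcode */
    buf[1] = op;
    return 2;
}
```

So loading a local variable costs one byte, while the hyperbolic square root costs two - a price paid only on the rare occasions the operation occurs.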
Overall though, the easiest way to design a bytecode instruction set is to adapt an existing bytecode set to suit your needs.  The specifications of the bytecode sets for Smalltalk [Goldberg], Java [JVM Spec], Limbo [Limbo Spec], and MUSH [Mush Spec] are all readily available.
Implementing a compiler
Once you have designed your bytecode set, you must somehow turn your programs into bytecodes. The most common solution is to write a compiler from a high-level language into the bytecodes. In theory a bytecode compiler needs all of the "front end" (lexical analyser, parser, and symbol table manager) of a traditional compiler.  Generally, however, it can make do with a much simpler "back end" code generator, because bytecodes are typically much simpler - and much more regular - than the instruction set of a real machine. In practice, working compilers for stack machines can be built very easily - every variable read or write becomes a load or store instruction, and every operation becomes an operation instruction.
For example, the following expression (x * x) + (y * y) is represented as the following parse tree:

To produce byte codes from a parse tree, simply make a post-order traversal of the tree: traverse the tree depth-first, visiting each node and emitting code for a node after visiting all its children. In the example, this would produce: 
x  x  *  y  y  *  +
To complete the compilation, you should emit bytecodes to load or store variables and perform operations. For example, here are the Java bytecodes for that sum of squares example.  Note how each bytecode corresponds directly to one node in the tree traversal.
   0 iload_1                push first argument on to the stack
   1 iload_1                and again
   2 imul                   multiply top two stack items, and deposit result
   3 iload_2                push second argument on to the stack
   4 iload_2                and again
   5 imul                   multiply top two stack items, deposit result
   6 iadd                   add top two stack items, deposit result
The techniques and tools for writing compilers are well known, well described elsewhere, and unrelated to byte codes per se.  For more information about building a compiler, see [Dragon Book][Some other books][APST].
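For illustration, the post-order traversal described above can be sketched in a few lines.  The node type here is hypothetical, and for simplicity it emits each token as text rather than as a real bytecode:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A hypothetical parse-tree node: leaves hold a variable name,
   interior nodes hold an operator and two children. */
struct node {
    const char *token;            /* "x", "*", "+", ... */
    struct node *left, *right;    /* NULL for leaves */
};

/* Post-order traversal: emit both children first, then the node
   itself, appending each token to out.  A real compiler would emit
   load or operation bytecodes here instead of text. */
void emit_postorder(struct node *n, char *out)
{
    if (n == NULL) return;
    emit_postorder(n->left, out);
    emit_postorder(n->right, out);
    strcat(out, n->token);
}
```

Run on the parse tree for (x * x) + (y * y), this produces the token sequence x x * y y * + shown above.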
Implementing an interpreter
A bytecode interpreter must take in bytecodes and interpret them (no kidding!).  The basic structure of an interpreter is shown in the missing figure below.  For a stack machine, the interpreter needs a stack, a stack pointer, and an instruction pointer into the bytecodes.  The instruction pointer iterates over the instructions, and the interpreter executes each one, typically either moving operands between memory and the stack, or performing some operation on the top few stack items.

Implementing a byte-code interpreter, and a compiler to the bytecodes, is generally quite easy.  Making it perform well is the difficult part. Typically, a bytecode interpreter will execute between 10 and 100 times slower than native machine code.  Most variations and complications aim to increase the execution speed of a bytecode interpreter. Unfortunately for us, they generally do this by trading off memory space for execution time.
Caching
One simple example, which can work well for object-oriented languages, is to cache the results of method lookups.  In an object-oriented language each message send could be received by objects of potentially hundreds of different classes, and the corresponding method has to be searched for in the receiving object's inheritance hierarchy.  Lookup caches, which simply record the results of the last few method-name/class lookups, can greatly speed up this search [Bits of History, Smalltalk book].
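A minimal sketch of such a lookup cache follows.  All the names are hypothetical, and full_lookup stands in for the slow search up the inheritance hierarchy:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-level method lookup cache: hash the
   (class, selector) pair and remember the method found last time.
   On a hit, the expensive hierarchy search is skipped entirely. */
#define CACHE_SIZE 256

struct cache_entry { int class_id; int selector; void *method; };
static struct cache_entry cache[CACHE_SIZE];

static int slow_lookups = 0;    /* counts full hierarchy searches */

/* Stand-in for the slow search up the inheritance hierarchy. */
static void *full_lookup(int class_id, int selector)
{
    slow_lookups++;
    return (void *)(size_t)(class_id * 1000 + selector);
}

void *lookup_method(int class_id, int selector)
{
    struct cache_entry *e = &cache[(class_id ^ selector) % CACHE_SIZE];
    if (e->method && e->class_id == class_id && e->selector == selector)
        return e->method;                 /* cache hit */
    e->class_id = class_id;               /* miss: search and remember */
    e->selector = selector;
    e->method = full_lookup(class_id, selector);
    return e->method;
}
```

Because most sends at a given point in a program go to objects of the same class, even this trivially small cache catches the great majority of lookups in practice.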
Brute-force macho programming techniques can also speed up interpreters, in some cases also reducing their memory footprint.  For example, storing the most crucial variables of an interpreter in registers (rather than in memory) and inlining any subprocedures into the interpreter main loop can help optimise an interpreter's performance.  The GCC compiler has a number of extensions to the C language to make these optimisations easier (global register variables and function inlining). The truly macho can read Eliot Moss's interesting paper about doing much the same by running sed on the C compiler's output.
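As one illustration of this kind of compiler-specific tuning, here is a sketch using another GCC extension, "labels as values" (the so-called computed goto), which replaces the interpreter's switch dispatch with a direct indirect jump after each instruction.  The three-instruction bytecode set is invented:

```c
#include <assert.h>

/* Threaded dispatch using GCC's labels-as-values extension: each
   opcode indexes a table of label addresses, so dispatch is a single
   indirect jump rather than a switch.  Opcode names are invented. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

int run(const unsigned char *code)
{
    void *dispatch[] = { &&do_inc, &&do_double, &&do_halt };
    int acc = 0;
    const unsigned char *ip = code;
    goto *dispatch[*ip];            /* jump straight to the first handler */
do_inc:    acc += 1;  goto *dispatch[*++ip];
do_double: acc *= 2;  goto *dispatch[*++ip];
do_halt:   return acc;
}
```

This is the dispatch technique used by many threaded-code Forth systems; it is non-portable, but on compilers that support it the saving per instruction can be substantial.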
Dynamic Translation to Machine Code
More recently, dynamic translators (also known as Just-In-Time or JIT compilers) have become more popular implementations for bytecodes.  This is because JIT compilers generally perform better - often much better - than a simple interpreter.  There is no theoretical reason why a JIT compiler should not be able to produce code just as good as a traditional, "static" compiler, although in practice the best JIT compilers produce code which runs about half as fast as that from a traditional optimising compiler. 
Of course, a JIT compiler is more complex to implement than a simple interpreter.  But because the bytecodes provide a portability layer, you can implement a simple interpreter initially, and then replace it with a dynamic compiler if the effort seems justified (and the system has enough memory to support it). 
If you plan on using a JIT compiler, you can design your bytecode to be easier to translate into machine code.  In particular, you can design a simple register machine, rather than a stack machine - that is, you implement instructions that explicitly address operands in registers (e.g. add r1 r2 r3), rather than implicitly addressing operands on the stack.  A register machine generally requires fewer instructions than a stack machine, but the instructions must be more complicated.  The Inferno system uses a "bytecode" based on a register machine, because register instructions are easier to map to the instruction sets of real machines, which, after all, have registers.
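As a sketch (with invented opcode names and a fixed four-byte instruction format), a three-address register instruction and its interpreter might look like this:

```c
#include <assert.h>

/* A hypothetical three-address register instruction: "add dst src1
   src2" names its two source registers and its destination
   explicitly, instead of working implicitly on a stack. */
enum { R_ADD, R_MUL, R_HALT };

struct rinsn { unsigned char op, dst, src1, src2; };

/* Execute instructions until R_HALT; the result is left in r0. */
int run_regs(const struct rinsn *code, int *reg)
{
    const struct rinsn *ip;
    for (ip = code; ; ip++) {
        switch (ip->op) {
        case R_ADD:  reg[ip->dst] = reg[ip->src1] + reg[ip->src2]; break;
        case R_MUL:  reg[ip->dst] = reg[ip->src1] * reg[ip->src2]; break;
        case R_HALT: return reg[0];
        }
    }
}
```

Note the trade-off: the sum-of-squares computation needs only three instructions here, where the stack machine version takes seven bytecodes - but each register instruction occupies four bytes rather than one.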
Polish Notation
A final variation makes compilation easier, rather than interpretation.  Most programming languages use an infix notation (such as a + b * c). Following the usual rules of mathematics, the multiplication must be computed before the addition, so this expression requires some effort to translate into bytecodes such as 
iload_2                    load b
iload_3                    load c
imul                       compute b * c
iload_1                    load a 
iadd                      compute a + (b * c)
A full compiler with a parser and code generator can perform this translation easily, but typical compiler designs often require quite a large amount of memory.  Languages based on other notations, however, can be easier to translate into bytecodes.  
The easiest languages to translate into bytecodes are based on so-called Polish notation, where operations are written either before (prefix) or after (postfix) their arguments.  For example, Cambridge Polish notation, used in Lisp, writes expressions within parentheses, beginning with the operator symbol. In Cambridge Polish notation, the expression a+b*c becomes
(+ (* b c) a)
This can be translated easily to the parse tree shown in above (each parenthesised expression makes a new node, and its internal expressions become its children), which can then be translated into bytecodes by the postfix traversal.
Another notation, called Reverse Polish Notation (or RPN), is even easier to translate to bytecodes. In RPN, operators are written after their operands (RPN is postfix), so the expression a+b*c becomes
b c * a +
RPN is used by the FORTH family of languages [Moore], including the FORTH derivative PostScript [Adobe], and a number of scientific calculators, precisely because it is so easy to translate to bytecodes for a stack machine. Each "word" in RPN becomes one bytecode instruction, and the order of the bytecodes matches the RPN expression exactly.
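The translation is almost degenerate, as this sketch shows.  The opcodes are hypothetical, and single-letter words stand for variable loads, so each character of the RPN string maps directly to one bytecode:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes, chosen to equal the characters of the RPN
   words they come from, to make the one-to-one mapping obvious. */
enum { B_LOAD_A = 'a', B_LOAD_B = 'b', B_LOAD_C = 'c',
       B_MUL = '*', B_ADD = '+' };

/* Translate a space-separated RPN string into bytecodes: one word
   becomes one bytecode, in the same order.  Returns the number of
   bytecodes emitted. */
size_t rpn_to_bytecodes(const char *rpn, unsigned char *out)
{
    size_t n = 0;
    for (; *rpn; rpn++)
        if (*rpn != ' ')
            out[n++] = (unsigned char)*rpn;
    return n;
}
```

There is no parser, no parse tree, and no code generator to speak of - which is exactly why FORTH systems fit comfortably into very small machines.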
Example
First, here's some Java code for a very simple method, which returns the sum of the squares of its two integer arguments. It also gratuitously calls the check method before returning the sum as the result of the method.
public int sumsq(int x, int y) {
    int z = (x * x) + (y * y);
    check(z);
    return z;
  }
Now, here are the resulting bytecodes (Sun's JDK includes a tool called javap which will print your bytecodes, amongst other information about your class files).
   0 iload_1                push first argument on to the stack
   1 iload_1                and again
   2 imul                   multiply top two stack items, and deposit result
   3 iload_2                push second argument on to the stack
   4 iload_2                and again
   5 imul                   multiply top two stack items, deposit result
   6 iadd                   add top two stack items, deposit result
   7 istore_3               store top of stack into location 3
   8 aload_0                push self onto the stack
   9 iload_3                push location 3 onto stack
  10 invokevirtual #7 <Method void check(int)>
                            call the check method, passing arguments
  14 iload_3                push location 3 onto the stack
  15 ireturn                return from the method,
                            with the top of the stack as result
The key point here is that the bytecodes act on a stack.  The load bytecodes push the contents of memory locations onto the stack.  The store bytecodes remove the top value from the stack and store it into memory.  And the other bytecodes typically rearrange the stack, popping off one or more elements, doing some computation, and pushing the result back on to the stack.  For example, here are the contents of the stack just before instruction 2 is to be executed:
    Bottom:                 x            contents of first argument
    Top:                    x            contents of first argument
And after executing the imul instruction (for integer multiply), the two arguments have been pulled off the stack and the result of the multiplication has been pushed on.
    Bottom:                 x * x         square of first argument
This value stays on the stack while two copies of the second argument are pushed onto the stack, 
    Bottom:                 x * x         square of first argument
                            y             contents of second argument    
    Top:                    y             contents of second argument
Then the second imul pops both off and replaces them with their product. 
    Bottom:                 x * x         square of first argument
    Top:                    y * y         square of second argument
Then the two products are popped off in their turn, and replaced with the sum of the products, 
    Top:                    (x*x) + (y*y)    sum of squares
which is popped off and stored into a temporary memory location. 
The key to a stack machine is, of course, the stack.  Generally a stack has to keep track of a number of items of different types.
union stackitem {
	char *string;
	int integer;
	double real;
};
A stack is just an array of these things, with operations to pop and push them.
union stackitem stack[100];
union stackitem *sp = stack;	/* sp points to the topmost item */
#define push_item(I)    (*++sp = (I))
#define pop_item()      (*sp--)
#define push_double(f)  ((++sp)->real = (f))
#define pop_double()    ((sp--)->real)
#define push_integer(i) ((++sp)->integer = (i))
#define pop_integer()   ((sp--)->integer)
Now, here's the core of an interpreter that could use the stack to execute these bytecodes. The interpreter is called with an array of bytecodes to execute, and an array of stack items as arguments to the method.  The core of the interpreter is a huge switch statement, with one entry for each bytecode. The interpreter itself just loops around the switch until a return bytecode is reached.
Here is the loop and switch, and code to handle three load and one store bytecodes:
union stackitem interpret(bytecode *method, union stackitem args[]){
	union stackitem stack[100];
	union stackitem *sp = stack;
	bytecode *ip = method;
	for (;;) {
		switch(*ip) {
		case ILOAD_1:	push_integer(args[1].integer); ip++; continue;
		case ILOAD_2:	push_integer(args[2].integer); ip++; continue;
		case ILOAD_N:	push_integer(args[*++ip].integer); ip++; continue;
		/* ... */
		case ISTORE_1:	args[1].integer = pop_integer(); ip++; continue;
		/* ... */
		}
	}
}
The ILOAD_1 and ILOAD_2 bytecodes are single-byte instructions, which push the first and second arguments onto the stack, respectively. The ILOAD_N bytecode is a two-byte instruction: the first byte is the ILOAD_N opcode, and the second byte is an argument index from 1 to 255.  Using ILOAD_1 and ILOAD_2 allows the first and second arguments to be pushed onto the stack using only one byte; ILOAD_N allows methods to access further arguments. This works because most methods use their first or second arguments much more frequently than the others (the actual Java bytecode set has short forms for up to NN arguments).  The ISTORE_1 bytecode is the store equivalent of ILOAD_1; there is a store equivalent to each of the other load instructions, and the Java bytecode set also includes very similar bytecodes to deal with the different data types that can be stored on the stack.  After doing the actual work of the bytecode, the instruction pointer is incremented (to point to the next instruction) and the loop is re-entered.
The operation bytecodes - imul, iadd and so on - can be implemented by popping their operands from the stack and pushing their result.
case IMUL:	{ int b = pop_integer(), a = pop_integer();
		  push_integer(a * b); ip++; continue; }
case IADD:	{ int b = pop_integer(), a = pop_integer();
		  push_integer(a + b); ip++; continue; }
case ISUB:	{ int b = pop_integer(), a = pop_integer();	/* right operand is on top */
		  push_integer(a - b); ip++; continue; }
The pop and return bytecodes similarly manipulate the stack - the return instruction, of course, simply returns from the method.
case POP:	pop_item(); ip++; continue;	/* discard top of stack */
case IRET:	return pop_item();	/* result is the top of stack */
Finally, the INVOKE-VIRTUAL instruction must call another method to execute.  Method execution can be very simple or tremendously complex, depending on the language you are implementing.  Details of method lookup are really unrelated to byte coding per se.  Here's one approach:
case INVOKE_VIRTUAL: {
			int a;
			method *meth = lookup_literal(*++ip);
			union stackitem *args = calloc(meth->argc, sizeof(union stackitem));
			for (a = meth->argc - 1; a >= 0; a--) {	/* top of stack holds the last argument */
				args[a] = pop_item();
			}
			push_item(interpret(meth->bytecodes,args));
			free(args);
			ip++;
			continue;
		}
A magic lookup method uses the next byte in the instruction stream to find the method to execute.  The method structure records the number of arguments the method needs, and these are popped from the stack into an argument array.  Then the bytecode interpreter is called recursively to execute the method's bytecodes, and the result is pushed back onto the stack.
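Pulling these fragments together, a stripped-down, integer-only interpreter is small enough to show whole.  The opcode values here are invented, not the real Java bytecodes:

```c
#include <assert.h>

/* A minimal, integer-only version of the interpreter above.
   Opcode values are hypothetical. */
enum { ILOAD_1, ILOAD_2, IMUL, IADD, IRET };

int interpret_ints(const unsigned char *method, const int args[])
{
    int stack[100];
    int *sp = stack;                 /* sp points one past the top item */
    const unsigned char *ip = method;
    for (;;) {
        switch (*ip) {
        case ILOAD_1: *sp++ = args[1]; ip++; continue;
        case ILOAD_2: *sp++ = args[2]; ip++; continue;
        case IMUL:    sp--; sp[-1] = sp[-1] * sp[0]; ip++; continue;
        case IADD:    sp--; sp[-1] = sp[-1] + sp[0]; ip++; continue;
        case IRET:    return *--sp;  /* result is the top of stack */
        }
    }
}
```

Fed the eight bytecodes of the sum-of-squares example, this computes x*x + y*y exactly as the stack traces earlier in the chapter describe.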
Known Uses
Bytecodes have been used in small (or smallish) systems for at least the last twenty years.  Most Smalltalk variants have been implemented using bytecodes, which are traditionally interpreted directly, although more modern Smalltalks use a dynamic compiler, and the original Smalltalk bytecode interpreter was implemented in microcode on a Xerox Dorado. Smalltalk's bytecode interpreter is described in detail in the Smalltalk "Blue Book", Smalltalk-80: The Language and its Implementation, complete with code - in Smalltalk.  Unfortunately, later editions of this book deleted the description of the interpreter. The Smalltalk instruction set is a classic stack-based bytecode set, and has remained mostly unchanged to this day.  Smalltalk: Bits of History, Words of Advice, which Addison-Wesley sadly sent to the landfill just as the second Smalltalk boom of the early 1990's was starting, contains lots of interesting observations about building bytecode interpreters in practice.
The source code to GNU Smalltalk is freely available, and includes a Smalltalk interpreter written in C but based closely on the Blue Book.  Going one better, the retro 'Squeak' dialect of Smalltalk, built by some of the original Smalltalk team, uses a Smalltalk-to-C translator (written in Smalltalk) to translate a version of the Blue Book interpreter into C, which is then compiled into machine code.  Squeak now also boasts a JIT compiler, also written in Smalltalk and translated into C.
The most famous early JIT compiler was written by Peter Deutsch and Alan Schiffman for Xerox Smalltalk in the late 1980's, and became the core of ObjectShare's VisualWorks Smalltalk system. At the time of writing (1999) the performance of this dynamic translator was still roughly comparable with that of the Java JIT compilers then in production, in spite of the fact that the Deutsch-Schiffman compiler was then over ten years old, and the Java bytecodes provide many more hints for speed optimisation than Smalltalk bytecodes. [DS89]
The ultimate dynamic compilers were those built for the Self programming language, which used a number of different compilers to inline, split, cache, and otherwise aggressively optimise the translated code.  These complex techniques are generally unusable in a small machine environment, since most small machines cannot dedicate 128 megabytes to their compilers.  [Self papers]
The most well-known modern systems based on bytecodes are Java and Inferno, which always compile code to a byte code format, minimising code size and transmission time, and incidentally ensuring these systems' cross-platform portability.  The Java bytecode is also a classic stack-based instruction set, although it includes rather more type information than the Smalltalk bytecodes.  Inferno, on the other hand, implements a register machine, rather than a stack machine, to facilitate dynamic translation to machine code.
Other retro systems which used bytecode sets to save memory included the UCSD P-system, an early Pascal environment used to write the first version of the Wizardry game, and offered as one of four alternative operating systems for the original IBM PC.  They also included BCPL [Richards], Emacs [Stallman], NewtonScript [Smith99], and the Zork series of adventure games [Zork].
Forth is the classic example of a system based on reverse polish notation and implemented by a threaded-code interpreter [Moore].  PostScript is also based on reverse polish notation.
The SubArctic toolkit uses byte codes to store the constraints used for interface layout [Hudson+96]. 
Antero Taivalsaari has described Java-In-Java - a Java Virtual Machine implemented within Java.  [Taivalsaari98]
See Also
Byte codes are interpreted by a variant of the INTERPRETER pattern [GOF]. STRING COMPRESSION uses similar techniques to compress strings. FILE COMPRESSION is an alternative means of compressing code held in secondary storage, but requires a time overhead to decompress the code, and, more importantly, RAM overhead to store the executing code. 
Because a bytecode interpreter is completely under your control, it is possible to implement patterns that need pervasive access to the environment (such as SEGMENTATION, PAGING, COPY ON WRITE, or SHARING) within a bytecode interpreter for a virtual machine.  This requires significantly less programmer effort than somehow patching them into an operating system or implementing them separately within each application.
______________________________
 
1 The largest single exception is the compression technique used for command names in the Unix system - it appears that many command names are compressed by omitting all vowels.  Thus "copy" becomes "cp", and "move" becomes "mv".  This produced the following quote in a signature file: "f y cn rd ths y mst b sng nx" ("if you can read this, you must be using Unix").


String Compression Pattern		UNTITLED by Weir, Noble



	(c) 1999 Charles Weir, James Noble 	Page 7





