(Let me preface this with some background: I've written a few format-specific compressors and have implemented a few standard compressors.)

Let's talk about *what* compression is. Compression is representing a theoretically larger range of data with a smaller range. Compression is removing "knowledge" about the data from the data, leaving only the "interesting" part that is needed to reconstitute the data based on that knowledge.

Think of it this way: if you have a text document and you KNOW you'll never need more than 6 bits to represent any character, then you can encode it using 6 bits per character instead of 8, saving 25% of the space.

Now, JPEG is a lossy compression. It takes knowledge about bitmap images and photography and makes assumptions about which aspects of the image are important. Specifically, it squashes the dynamic range of each pixel, but samples areas of the image to encode the range for multiple pixels at once. It then uses the reduced per-pixel information and the sampled range for the group of pixels to reconstitute each pixel. The compression comes from the reduced amount of data stored per pixel.

The idea, then, is to take knowledge of a numerical series, or knowledge about the data in general. The theoretical number range may be X, but in reality there is only Y variation, and if Y < X, the data should be compressible.

When you look at a ZIP-compressed file, it does not compress well using standard stream compressors. This is because it has few, if any, repeating patterns. The fact that it has few repeating patterns is what makes the stream interesting: it means the stream is less likely in the universe of possible streams, and if it is less likely, then it can be represented with fewer bits.

Standard compressors work on the notion that repeated patterns can be saved once and reused when they show up again. LZW actually uses more bits per byte for each individual byte, but many fewer bits for strings of repeated bytes.

A "re-compressor" is different: rather than exploiting repeated bytes, it would use information about special sequences of data that have little or no repetition and few recognizable patterns. This may be hugely expensive to compute and offer a compression gain of only a few percent; however, if it is repeatable, then it can iterate and iterate until the overhead of the compression is larger than the data. I'm not saying it is practical, workable, or really possible; I'm just saying that, in theory, it may be possible.

(A few small code sketches illustrating these points appear at the end of this message.)

On Aug 12, 2004, at 8:57 AM, markw at mohawksoft.com wrote:

> Jules has mentioned that he has a compressor that will compress
> compressed files. I looked at this a while back and it is a really
> cool idea.
>
> It is theoretically possible.
>
> Use conventional compression at first. Numerically speaking, a
> traditionally compressed file is interresting because it has
> virtually no repeating numbers. It is also unlikely to have any
> serious non-random seeming number sequences.
>
> As such, this file may be able to be described with fewer bites than
> it contains. That, my friends, is the nature of compression.

Sorry, your writing sounds a bit too much like Jules's. What is the nature of compression? Your writing jumps cleanly from introduction to conclusion without saying anything. How can a file with no repeating numbers be described with fewer 'bites' (sic)?
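To make the 6-bit text example concrete, here is a small Python sketch. It is my own illustration, not anything from the thread, and it assumes a hypothetical 64-symbol alphabet; because every symbol index fits in 6 bits, packing them tightly gives exactly the 25% saving described.

# Illustrative only: assumes the text is KNOWN to use just these 64 symbols,
# so every character fits in 6 bits instead of 8.
ALPHABET = ("abcdefghijklmnopqrstuvwxyz"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "0123456789 .")
assert len(ALPHABET) == 64
CODE = {ch: i for i, ch in enumerate(ALPHABET)}

def pack6(text):
    """Pack each character into 6 bits and return the resulting bytes."""
    acc, nbits, out = 0, 0, bytearray()
    for ch in text:
        acc = (acc << 6) | CODE[ch]      # append the 6-bit code
        nbits += 6
        while nbits >= 8:                # flush complete bytes
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
    if nbits:                            # pad the last partial byte with zeros
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)

sample = "this is a small test message"   # 28 characters
packed = pack6(sample)
print(len(sample), "bytes in,", len(packed), "bytes out")   # 28 in, 21 out: 25% saved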
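The JPEG description above is loose, but one concrete form of "encoding one sample for a whole area of pixels" is chroma subsampling. The sketch below is back-of-the-envelope arithmetic only; real JPEG also runs a DCT, quantization, and entropy coding, which is where most of its compression actually comes from.

# Rough arithmetic only, not the real JPEG pipeline.
# 4:2:0 chroma subsampling keeps full-resolution brightness (Y) but stores
# only one pair of colour samples (Cb, Cr) for each 2x2 block of pixels.
pixels = 4                           # one 2x2 block
raw_bits = pixels * 3 * 8            # 4 pixels x 3 channels x 8 bits = 96
sub_bits = pixels * 8 + 8 + 8        # 4 Y samples + 1 Cb + 1 Cr      = 48
saving = 1 - sub_bits / raw_bits
print(f"{raw_bits} -> {sub_bits} bits per block ({saving:.0%} smaller, before any entropy coding)")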
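The point that an already-compressed stream does not compress again with a standard stream compressor is easy to check. This sketch uses Python's zlib (my choice of tool, not something from the thread); exact byte counts depend on the input, but the second pass typically comes out slightly larger than its input.

import zlib

# Highly repetitive input compresses very well on the first pass.
data = b"the quick brown fox jumps over the lazy dog " * 200

once = zlib.compress(data, 9)        # first pass: large reduction
twice = zlib.compress(once, 9)       # second pass: compressing compressed data

print(len(data), "->", len(once), "->", len(twice))
# The deflated stream has almost no repeating patterns left, so the second
# pass usually grows a little instead of shrinking.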
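Finally, the LZW trade-off mentioned above (more bits per byte for each lone byte, many fewer bits for repeated strings) shows up even in a minimal encoder. In a typical implementation the output codes start at 9 bits wide, so an isolated byte costs more than 8 bits, while a long repeated run collapses into a handful of codes. This is a bare-bones sketch for illustration, without the variable-width bit packing or table-reset logic of a production LZW.

def lzw_encode(data: bytes):
    """Minimal LZW encoder: returns a list of integer codes."""
    table = {bytes([i]): i for i in range(256)}    # seed with all single bytes
    next_code = 256
    w = b""
    codes = []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc                                 # keep extending the current match
        else:
            codes.append(table[w])                 # emit code for the longest match
            table[wc] = next_code                  # learn the new string
            next_code += 1
            w = bytes([b])
    if w:
        codes.append(table[w])
    return codes

repetitive = b"ab" * 8            # 16 bytes with lots of repetition
unique     = bytes(range(16))     # 16 bytes with no repetition
print(len(lzw_encode(repetitive)), "codes for 16 repetitive bytes")   # 7 codes
print(len(lzw_encode(unique)), "codes for 16 non-repeating bytes")    # 16 codes

At 9 bits per code, the repetitive input costs about 63 bits against 128 raw, while the non-repeating input costs about 144 bits, more than it started with.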