Having read the link you submitted, I think there is a common misunderstanding between the people who think repeatable compression is possible and those who don't. Now, again, I'm not saying I believe it is possible; I'm saying that some of the arguments sound plausible and it may be possible.

The argument of the people who say it is impossible is this: you can't compress data beyond its universe, i.e. if you have 2 billion tokens without anything recognizable in the stream (repetitive tokens, known sequences of tokens, etc.), you cannot compress below 2 billion tokens. This is actually not dissimilar to the argument put forth by the repeatable-compression people; only what counts as "recognizable" is in question.

Now, the theory is this: standard compressors take repeating patterns and basically reduce the universe of a stream to a subset. A compressed stream of data, we know, occupies a smaller universe than its length alone would allow, because it is not normal for a stream of random data NOT to contain any repeating patterns. The crux of the argument is the quantification of how much smaller the universe is for the compressed stream.

It is inarguable that if, by pure random chance, the bytes in a compressed stream are identical to the output bytes of a specific random number generator with a specific seed, an astonishing amount of compression could be had. The premise now is to search any and all known (or new) pseudo-random sequence generators, compressors, and keys that may reproduce all of, or some limited runs within, the stream. Something like this:

A segment is a run within the stream that can be represented by a compressor. The stream is a data stream.

    while (stream) {
        best_compressor = NULL;
        best_seed = NULL;
        best_count = 0;

        /* Search every known generator and every seed for the longest
           run it can reproduce at the current position. */
        foreach (compressor) {
            foreach (seed) {
                bytes_represented = compare(stream, compressor, seed);
                if (bytes_represented > best_count) {
                    best_compressor = compressor;
                    best_seed = seed;
                    best_count = bytes_represented;
                }
            }
        }

        /* Only emit a (compressor, seed, count) segment if it pays for
           its own overhead; otherwise emit one literal byte. */
        if (best_count > segment_overhead) {
            put(best_compressor, best_seed, best_count);
            stream += best_count;
        } else {
            put(NULL, stream, 1);
            stream++;
        }
    }

As you can see, with any interesting number of compressors and any interesting range of seeds, this can be HUGELY expensive.
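The compare() and put() calls above are abstract, so here is a rough sketch of what the inner search might look like in C for a single hypothetical "compressor" family, a plain linear congruential generator. The constants and the names (lcg_next, compare_lcg, best_seed_for) are mine, purely for illustration, not anything anyone in this thread has actually built:

    /* Sketch of compare() for one hypothetical compressor family: an LCG
     * whose byte output is matched against the data at the current
     * position.  Names and constants are illustrative only. */
    #include <stddef.h>
    #include <stdint.h>

    /* Next output byte of a simple LCG (classic Numerical Recipes constants). */
    static uint8_t lcg_next(uint32_t *state)
    {
        *state = *state * 1664525u + 1013904223u;
        return (uint8_t)(*state >> 24);
    }

    /* bytes_represented = compare(stream, lcg, seed): how many leading
     * bytes of the stream does the generator reproduce from this seed? */
    static size_t compare_lcg(const uint8_t *stream, size_t len, uint32_t seed)
    {
        uint32_t state = seed;
        size_t n = 0;
        while (n < len && stream[n] == lcg_next(&state))
            n++;
        return n;
    }

    /* The inner foreach(seed) loop: brute-force every seed below max_seed.
     * A full 32-bit seed space is about 4 billion trials per stream
     * position, which is where the "HUGELY expensive" part comes from. */
    static size_t best_seed_for(const uint8_t *stream, size_t len,
                                uint32_t max_seed, uint32_t *best_seed)
    {
        size_t best = 0;
        for (uint32_t seed = 0; seed < max_seed; seed++) {
            size_t n = compare_lcg(stream, len, seed);
            if (n > best) {
                best = n;
                *best_seed = seed;
            }
        }
        return best;
    }

As a rough sanity check on the numbers: against random data, the longest run that the best of all 2^32 seeds reproduces is, with high probability, only about four bytes, which is roughly what it costs just to write down the winning seed, before any per-segment overhead.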
It is an interesting theory. Numerically, it sounds sort of fishy, but if you can get lucky enough, it works. I think the people who say it can't work are kind of closed-minded, like the scientists who said we couldn't travel faster than 60 MPH, break the sound barrier, or fly. Plus, there is a lot of computational power available today, as well as a lot of storage. If it takes 100 GB of data dictionaries and code and vast amounts of CPU on each system to encode and decode data, that's not automatically impractical, especially for long-range communications, say to Mars.

To their detriment, the recompression people are also a bit unreasonable. They ignore a lot of practical problems and generally do not seem to have a good grasp on the actual issues. As for me, you guys have called me a crank. That's fine, but I also remember when 100 Mbit over twisted pair at any practical length was considered impossible. I also remember scientists saying various barriers would never be broken for one reason or another. Chip manufacturers do a lot of things that seemed impossible 25 years ago. To dismiss it as impossible is silly, because that is based on the notion that everything you know right now is 100% correct, and we all know it isn't.

To believe it is possible is also silly, because no one has done it and the numbers are shaky, based on what we currently know. For me, it is interesting. Sort of like alchemy: turning lead into gold seems much less impossible today than at any other time in history.

Doesn't anyone else find this interesting? Or has it become impossible to discuss unproven theories without personal attacks?

On Fri, 2004-08-13 at 10:07, Seth Gordon wrote:
> markw at mohawksoft.com wrote:
> >
> > Not the point, if you have a repeatable compressor, then compression ratio
> > is largely unimportant. If you can compress something 3% over and over
> > again, then you can make it really small.
>
> Yes, and if you have a perpetual motion machine, then Middle Eastern oil
> reserves are largely unimportant.
>
> Assume that you have a "repeatable compressor" that will losslessly
> shrink any string of bits by 3% every time it is run. What happens when
> it receives a 100-bit input? There are 2^100 possible inputs, but only
> 2^97 possible outputs. Therefore, there must be *some* inputs for which
> this compressor cannot produce a 97-bit output--indeed, there must be
> some inputs for which this compressor cannot produce a 99-bit output.
> Therefore, this repeatable compressor cannot exist. QED.

Seth is on-target. The above is stated with much greater rigor at:

    http://www.faqs.org/faqs/compression-faq/part1/ [Section 9.1]

Apparently, cranks have been flirting with this topic for some time and just can't seem to let it go. I wish this group would let it go. Please?

Ed

--
Edward H. Hill III, PhD
office: MIT Dept. of EAPS; Rm 54-1424; 77 Massachusetts Ave.
        Cambridge, MA 02139-4307
emails: eh3 at mit.edu  ed at eh3.com
URLs:   http://web.mit.edu/eh3/  http://eh3.com/
phone:  617-253-0098
fax:    617-253-4464
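For what it's worth, the counting argument in the quoted message can be checked mechanically for small sizes. A minimal sketch (my own, with an arbitrary 16-bit loop bound, and counting the empty string as a possible output):

    /* Pigeonhole check: for n-bit inputs there are 2^n strings, but only
     * 2^n - 1 strings of any shorter length (lengths 0 through n-1), so a
     * lossless compressor cannot shrink every n-bit input. */
    #include <stdio.h>

    int main(void)
    {
        for (int n = 1; n <= 16; n++) {
            unsigned long inputs  = 1UL << n;        /* strings of exactly n bits */
            unsigned long shorter = (1UL << n) - 1;  /* strings of 0..n-1 bits    */
            printf("n=%2d  inputs=%6lu  shorter outputs=%6lu  -> some input cannot shrink\n",
                   n, inputs, shorter);
        }
        return 0;
    }

The same 2^n versus 2^n - 1 gap is what the quoted message applies at n = 100.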