Date: Fri, 7 Jan 2005 16:06:34 -0500 (EST)
From: "Rich Braun" <richb at pioneer.ci.net>

"Robert L Krawitz" <rlk at alum.mit.edu> challenged me:
> Think about a financial services company that issues credit cards, and
> they need to store data on every single transaction for years. They
> *absolutely* need that backup. Think about however many billions of
> transactions we're talking about every month.

OK, do some math. How much data per transaction? Let's call it 2 KB, probably less (a hundred bytes is enough for the name, card number and other minimal stuff). A gigabyte can hold 500,000 such transactions. 500,000 transactions is 500 transactions * 1000 stores. 500 transactions in a day works out to about 40 per hour, assuming that the store's open 12 hours per day. That's not a whole lot.

If you're doing the backups for the Wal-Mart store chain then I agree with you. But most of us work for a non-retail establishment that doesn't issue credit cards. In fact I'd hazard a guess that there are only a couple hundred companies in the entire world that have to handle more than a million transactions per day.

It has to be more than that. Every fast food chain, every decent-size department store chain, etc. will easily exceed that number. Major credit card issuers will exceed it by orders of magnitude.

So if we assume a typical company that does 100,000 or fewer transactions per day, then a month's worth of data would take about 6 gigabytes. *Not* terabytes. And you only need to keep a few months of data online; older data can be put in an archive that doesn't need daily backup.

There's no such thing as "typical"; company sizes are all over the map. Maybe a smaller company doesn't have that much in the way of storage requirements, but plenty of bigger companies do.
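The per-store and per-month arithmetic above is easy to check in a few lines of Python. This is only a sketch, not anything from the original posts: it assumes decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), the 2 KB/transaction estimate, a 30-day month, and the hypothetical 1,000-store chain open 12 hours a day from the example.

```python
# Back-of-envelope storage sizing using the figures quoted in the thread.
# Assumption: decimal units, 1 KB = 1,000 bytes and 1 GB = 10**9 bytes.
KB = 1_000
GB = 1_000 ** 3

BYTES_PER_TXN = 2 * KB   # generous per-transaction estimate from the post
STORES = 1_000           # hypothetical chain size from the example
HOURS_OPEN = 12          # store open 12 hours per day


def txns_per_gb(bytes_per_txn: int = BYTES_PER_TXN) -> int:
    """Number of transaction records that fit in one gigabyte."""
    return GB // bytes_per_txn


def hourly_rate_per_store(txns_per_day: int, stores: int = STORES,
                          hours_open: int = HOURS_OPEN) -> float:
    """Average transactions per hour at each store."""
    return txns_per_day / stores / hours_open


def monthly_gb(txns_per_day: int, days: int = 30,
               bytes_per_txn: int = BYTES_PER_TXN) -> float:
    """Storage needed for one month of transactions, in gigabytes."""
    return txns_per_day * days * bytes_per_txn / GB


if __name__ == "__main__":
    print(txns_per_gb())                               # 500000
    print(round(hourly_rate_per_store(500_000), 1))    # 41.7
    print(monthly_gb(100_000))                         # 6.0
```

Changing `BYTES_PER_TXN` to 100 (the "hundred bytes" minimal record) shrinks the monthly figure twentyfold, which is why the per-transaction size dominates the whole estimate.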
If you want to do personalized customer service based on history (oh, sir, I see you're buying more batteries to go with the camera you bought last month; would you like an extended warranty to go with that?), you do need quite a bit of data easily accessible. You need a fantastic number of spindles to make that possible (lots of 36 GB 15000 RPM drives work a lot better than a few 250 GB 7200 RPM models). Don't sneer; this is what retailers want to do. Loyalty cards writ large.

Think about a credit card issuer trying to prevent fraud by matching up a transaction against someone's history and flagging a suspicious transaction. That needs to compare a transaction against long-term history on the fly, and that means online data -- lots and lots of it. Everything from the size of the purchase to what's being purchased, where it's being purchased, the time of day, and where I've bought other things recently. When you think about a credit card company that may have hundreds of millions of cards, with each card being used for 200 transactions/year, it all adds up in a hurry. If we're talking about 1E11 transactions/year, your 2K/transaction (which offhand doesn't feel too far off the mark, although I don't know the exact number) comes out to 2E14 bytes -- 200 terabytes -- per year, and growing. That's roughly 500 gigabytes/day. And that's just the transactional data, not the analytics.

Think about Google: currently over 8 billion web pages indexed, cached, etc.

Now, if you simply don't *like* the fact that companies want to do that, that's your right, but don't say that you don't see the need for it.

Does your office do 100,000 transactions per day? I'm still trying to come up with a reason for *terabytes* of online storage.

I'm in software development, swinging around software builds that might take a gigabyte each for a large number of developers. Some developers have multiple build trees for different projects.

> You'd be surprised (or maybe not, if you reflect on it) ...
Are you surprised at my analysis? I have reflected on this and am simply amazed at how much data companies are storing. It just can't be useful.

I'm not "surprised" at your analysis; I just think it's a bit too narrowly focused. Companies also want to store as much data as they can about everything they do so that they can use it in the future, even if they aren't using it now.
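The credit card issuer estimate in this exchange can be checked the same way. Again a sketch under stated assumptions, not a figure from either poster: "hundreds of millions of cards" is taken as a hypothetical 500 million (the value that yields 1E11 transactions/year at 200 transactions per card), with 2 KB per transaction, decimal units, and a 365-day year.

```python
# Yearly and daily transaction volume for a large card issuer.
# Assumptions: decimal units; CARDS is a hypothetical stand-in for
# "hundreds of millions of cards"; 365-day year.
KB = 1_000
GB = 1_000 ** 3
TB = 1_000 ** 4

CARDS = 500_000_000
TXNS_PER_CARD_PER_YEAR = 200
BYTES_PER_TXN = 2 * KB


def issuer_volume(cards: int = CARDS,
                  txns_per_card: int = TXNS_PER_CARD_PER_YEAR,
                  bytes_per_txn: int = BYTES_PER_TXN) -> tuple[int, float, float]:
    """Return (transactions/year, terabytes/year, gigabytes/day)."""
    txns = cards * txns_per_card
    total_bytes = txns * bytes_per_txn
    return txns, total_bytes / TB, total_bytes / 365 / GB


if __name__ == "__main__":
    txns, tb_year, gb_day = issuer_volume()
    print(f"{txns:.0e} txns/yr, {tb_year:.0f} TB/yr, {gb_day:.0f} GB/day")
```

The daily figure comes out near 548 GB, consistent with the "roughly 500 gigabytes/day" in the post, and it covers raw transaction records only, before any indexes or analytics.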
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.