
BLU Discuss list archive



Backups was Restoring MBR - Solved



   Date: Fri, 7 Jan 2005 16:06:34 -0500 (EST)
   From: "Rich Braun" <richb at pioneer.ci.net>

   "Robert L Krawitz" <rlk at alum.mit.edu> challenged me:
   > Think about a financial services company that issues credit cards, and
   > they need to store data on every single transaction for years.  They
   > *absolutely* need that backup.  Think about however many billions of
   > transactions we're talking about every month.

   OK do some math.  How much data per transaction?  Let's call it 2
   KB, probably less (a hundred bytes is enough for the name, card
   number and other minimal stuff).  A gigabyte can hold 500,000 such
   transactions.

500,000 transactions is 500 transactions * 1000 stores.  500
transactions in a day works out to maybe 40 per hour, assuming that
the store's open 12 hours per day.  That's not a whole lot.
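
As a rough check on the figures above, here is a minimal Python sketch.
It assumes decimal units (1 GB = 10^9 bytes, 2 KB = 2,000 bytes), and
the function names are made up for illustration.

```python
def transactions_per_gigabyte(bytes_per_txn=2_000, bytes_per_gb=10**9):
    """How many fixed-size transaction records fit in one gigabyte."""
    return bytes_per_gb // bytes_per_txn


def hourly_rate(txns_per_day, hours_open):
    """Average transactions per hour for a store open hours_open hours a day."""
    return txns_per_day / hours_open


total = transactions_per_gigabyte()   # records in 1 GB at 2 KB each
per_store = total // 1000             # spread across 1000 stores
print(total, per_store, round(hourly_rate(per_store, 12), 1))
# prints: 500000 500 41.7
```

So "maybe 40 per hour" is about right for a 12-hour day.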

   If you're doing the backups for the Wal-Mart store chain then I
   agree with you.  But most of us work for a non-retail establishment
   that doesn't issue credit cards.  In fact I'd hazard a guess that
   there are only a couple hundred companies in the entire world that
   have to handle more than a million transactions per day.

It has to be more than that.  Every fast food chain, every decent-sized
department store chain, etc. will easily exceed that number.  Major
credit card issuers will exceed that by orders of magnitude.

   So if we assume a typical company that does 100,000 or fewer
   transactions per day, then a month's worth of data would take about
   6 gigabytes.  *Not* terabytes.  And you only need to keep a few
   months of data online; older data can be put in an archive that
   doesn't need daily backup.
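
The 6-gigabyte figure can be reproduced with a short sketch, assuming
2,000 bytes per transaction, a 30-day month, and decimal gigabytes.
The function name is hypothetical.

```python
def monthly_storage_gb(txns_per_day, bytes_per_txn=2_000, days=30):
    """Raw transaction storage for one month, in decimal gigabytes."""
    return txns_per_day * bytes_per_txn * days / 10**9


print(monthly_storage_gb(100_000))    # prints: 6.0
print(monthly_storage_gb(1_000_000))  # prints: 60.0
```

The second line shows the same estimate at a million transactions per
day, the threshold mentioned earlier in the thread.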

There's no such thing as "typical"; company sizes are all over the
map.  Maybe a smaller company doesn't have that much in the way of
storage requirements, but plenty of bigger companies do.  If you want
to do personalized customer service based on history (oh, sir, I see
you're buying more batteries to go with the camera you bought last
month; would you like an extended warranty to go with that?), you do
need quite a bit of data easily accessible.  You need a fantastic
number of spindles to make that possible (lots of 36 GB 15000 RPM
drives work a lot better than a few 250 GB 7200 RPM models).  Don't sneer;
this is what retailers want to do.  Loyalty cards writ large.

Think about a credit card issuer trying to prevent fraud by matching
up a transaction against someone's history and flagging a suspicious
transaction.  That needs to compare a transaction against long term
history on the fly, and that means on line data -- lots and lots of
it.  Everything counts: the size of the purchase, what's being
purchased, where it's being purchased, the time of day, and where I've
bought other things recently.  When you think about a credit card
company that may have hundreds of millions of cards with each card
being used for 200 transactions/year, it all adds up in a hurry.  If
we're talking about 1E11 transactions/year, your 2K/transaction (which
offhand doesn't feel too far off the mark, although I don't know the
exact number) comes out to 2E14 -- 200 terabytes -- per year, and
growing.  That's roughly 550 gigabytes/day.  And that's just the transactional data,
not the analytics.
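
A minimal sketch of that estimate follows.  The card count of 500
million is an assumption standing in for "hundreds of millions"; the
other inputs come from the paragraph above, with decimal units
throughout.

```python
cards = 500_000_000            # assumed stand-in for "hundreds of millions"
txns_per_card_per_year = 200   # from the text
bytes_per_txn = 2_000          # the 2K/transaction estimate

txns_per_year = cards * txns_per_card_per_year   # 1E11 transactions
bytes_per_year = txns_per_year * bytes_per_txn   # 2E14 bytes

tb_per_year = bytes_per_year / 10**12
gb_per_day = bytes_per_year / 365 / 10**9

print(f"{tb_per_year:.0f} TB/year, {gb_per_day:.0f} GB/day")
# prints: 200 TB/year, 548 GB/day
```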

Think about Google.  Currently over 8 billion web pages indexed,
cached, etc.

Now, if you simply don't *like* the fact that companies want to do
that, that's your right, but don't say that you don't see the need for
it.

   Does your office do 100,000 transactions per day?  I'm still trying
   to come up with a reason for *terabytes* of online storage.

I'm in software development, swinging around software builds that
might take a gigabyte each for a large number of developers.  Some
developers have multiple build trees for different projects.

   > You'd be surprised (or maybe not, if you reflect on it) ...

   Are you surprised at my analysis?  I have reflected on this and am
   simply amazed at how much data companies are storing.  It just
   can't be useful.

I'm not "surprised" at your analysis; I just think it's a bit too
narrowly focused.  Companies also want to store as much data as they
can about everything they do so that they can use it in the future,
even if they aren't using it now.




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org