Boston Linux & Unix (BLU) Home | Calendar | Mail Lists | List Archives | Desktop SIG | Hardware Hacking SIG
Wiki | Flickr | PicasaWeb | Video | Maps & Directions | Installfests | Keysignings
Linux Cafe | Meeting Notes | Blog | Linux Links | Bling | About BLU

BLU Discuss list archive


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

NoSQL vs SQL



On Tue, Dec 07, 2010 at 12:07:13AM -0500, Mark Woodward wrote:
> 
> The most important aspect of these scalability discussions, and one 
> which I frequently find lacking in the "NoSQL" camp is a critique of 
> just how many full scale transactions a SQL database can have.

...

> average head positioning time.  This gives us a worst case average of 
> 77  arbitrary write procedures per second.
> 
> The actual iops is higher based on seek time these days, but lets use 
> the worst case scenario. (We could use a RAID system and multiply 
> performance)
> 
> In an ACID database configuration, assume 1/2 maximum, i.e 36 writes per 
> second on the database. That's 36 transactions (read/write) per second. 
> Assuming 1 transaction per page view, that amounts to about 90 million 
> page views a month (on average) as a sustainable number.
> 
> So, a good SQL database with no scaling tricks on a bog stock modern PC 
> based server will serve a web site as busy as all but the very most 
> popular sites on the web. Someone, please tell me, what are the NoSQL 
> guys going on about with regard to scalability?

There are basically no sites that use a database and only want one record
view/update per page view.

Let's say we have, oh, a dating site. The login page may only need one DB
query. How about the second page, where they will helpfully show you the
names, photos and blurbs of ten or so eligible people, plus a check on
how many messages you've received, plus another on how many you've sent,
and... you can get into 50 queries pretty quickly. Add some billing and
logging records and you could have a couple of updates in there, too.

Now, a clever system will aggregate the most often requested data into
a single row, when it can... that helps with the second page, but not
with profile pages or a mail-analogue system. And a clever system will
cache as much frequently-requested data as possible.

But then there's Facebook, and anything even vaguely like it.  The obvious
first thing to do is to optimize time at the cost of storage: store a
copy of the new message with each of the recipients, so that when they
read their update page, that takes one transaction instead of as many
as they have friends. And pretty quickly you run out of disk space. So
you decide that you have a read-mostly environment, and have many, many
read-only slave databases to handle lookups, and most of those lookups
aren't needing anything near the power of SQL -- really, you could get
away with a key-value lookup, if only it were FAST -- and before you
know it, you've re-invented MongoDB.

By way of contrast, I know there are sites which do real work on
data people care about in sophisticated ways -- and they
generally use full SQL databases, because that's what they need.

SQL and ACID solve some tough problems. Key/value systems solve
some speed issues, as do caches. There are lots of solutions
because there are lots of problems.


-dsr-

-- 
http://tao.merseine.nu/~dsr/eula.html is hereby incorporated by reference.
You can't defend freedom by getting rid of it.






BLU is a member of BostonUserGroups
BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

Valid HTML 4.01! Valid CSS!



Boston Linux & Unix / webmaster@blu.org