[Discuss] Rob Conery's critique of MySQL?

Thu Aug 2 20:44:56 EDT 2012

On 8/2/2012 4:36 PM, Mark Woodward wrote:
> Not to be snide, but 8 million is not a big number.

That's 8 million patients.  Multiply that by everything that the VA has 
on each and every one of them and you get a very large data set.

It's not the largest data set that I'm aware of.  The largest is the 
data out of the LHC which is around 200 petabytes.  CERN went the other 
way.  They started with an object database but eventually dropped it 
due to the poor market development of OODBMSs.  They currently use relational 
databases for storing and retrieving metadata.  Bulk data is stored in 
flat files.

> Well, "billions" of transactions per day should be doable in a cluster.

That's what Ameritrade and Oracle thought but they couldn't make it work.

> If your oracle database is crashing, it is misconfigured.

The Oracle techs working with Ameritrade couldn't keep the cluster 
going.  They eventually gave up when Ameritrade wouldn't commit to 
replacing the entire cluster with bigger servers.

> Financial
> transactions are a dangerous thing, you really do need ACID for
> fiduciary responsibility.

Cache' delivers a full ACID guarantee.  I told you I wasn't talking about 
NoSQL/MongoDB.

> You are avoiding the topic, the "storage system," is separate from the
> implementation of the objects. The objects know how to serialize and
> restore themselves as well as upgrade. The storage and location of
> objects is not involved.

Of course I am.  It's not relevant to the topic, which is the technical 
merits of object vs relational databases.

> That is not a "how," it is an adjective and a plural noun. One does not
> need to use relations in a database, but one has them if they need them.
> An RDBMS is a tool not some kind of mandate.

Then why bother with a relational database at all?  The singular 
strength of a relational database is the relations between data.  If you 
don't use relations then the relational database is the wrong tool for 
the job.

> Yes, ok, that is done with the XML/JSON class description. What's the
> problem?

The problem is that you're stuck with tables.  You don't have an object.  
You have an object stored in a table.  Even if it is a table with a 
single column and a single row it's still a table.
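
For anyone following along, the arrangement being argued about looks 
roughly like this in Python with the standard sqlite3 module.  The table 
name `objects`, the `body` column, and the sample patient are all made up 
for illustration; this is only a sketch of a serialized object retrieved 
by ID, not anyone's actual schema.

```python
import json
import sqlite3

# In-memory database; "objects" and "body" are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, body TEXT)")

# The "object" is serialized to JSON and stored in a single column.
patient = {"last_name": "Able", "symptoms": ["cough", "fever"]}
conn.execute(
    "INSERT INTO objects (id, body) VALUES (?, ?)",
    (1, json.dumps(patient)),
)

# Retrieval by ID returns a row, which then has to be deserialized.
row = conn.execute("SELECT body FROM objects WHERE id = ?", (1,)).fetchone()
restored = json.loads(row[0])
```

What comes back from the database is a row holding text.  The object only 
exists again after the application deserializes it.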

> If I said the XML was stored in a binary polymorphic object file and it
> could be retrieved by its ID, would that make a difference? Because,
> that is exactly what is happening. For convenience, we call the
> polymorphic object file a "table."

Sure, that works.  Again, why bother with a relational database if you 
want to short-circuit all of the relational functions?  Which was my 
original point: why bother with inferior tools like relational databases 
when superior tools like object databases are available?

> Sorry, no. It is either a hash table, or they are hiding the index from
> you. Either way, it doesn't matter because databases have hash indexes.

Nope.  Binary trees or multidimensional arrays.  Typically, an object 
database has no index data to cache.  It caches objects.

> And if you say that objects don't need that kind of indexing, then you
> miss the real power of database. If you have 8 million objects, say
> patients in a database. How do you find them by social security numbers?
> How about by last name? How about by symptoms?

You walk a balanced B-tree.  The worst case for a balanced tree search is 
O(log n).  Then the patient object is loaded into cache and data access 
times drop to O(1).
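
To make the lookup pattern concrete, here is a minimal Python sketch.  It 
is not how Cache' or any particular object database is implemented: a 
sorted list searched with `bisect` stands in for the balanced tree, and a 
plain dict stands in for the object cache.  The `Patient` class, the 
`PatientStore` class, and the sample SSNs are all made up for illustration.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Patient:
    ssn: str
    last_name: str


class PatientStore:
    """Toy store: O(log n) keyed search, then O(1) access from a cache."""

    def __init__(self, patients):
        # Sorted keys stand in for a balanced tree index (an assumption).
        self._records = sorted(patients, key=lambda p: p.ssn)
        self._keys = [p.ssn for p in self._records]
        self._cache = {}  # ssn -> Patient, filled on first lookup

    def find(self, ssn):
        # Cache hit: average O(1) dict lookup.
        if ssn in self._cache:
            return self._cache[ssn]
        # Cache miss: binary search over the sorted keys, O(log n).
        i = bisect.bisect_left(self._keys, ssn)
        if i == len(self._keys) or self._keys[i] != ssn:
            return None
        patient = self._records[i]
        self._cache[ssn] = patient
        return patient


store = PatientStore([
    Patient("000-00-0002", "Baker"),
    Patient("000-00-0001", "Able"),
    Patient("000-00-0003", "Clark"),
])
first = store.find("000-00-0002")   # binary search, then cached
second = store.find("000-00-0002")  # served from the cache
```

For 8 million keys a binary search like this needs at most about 23 
comparisons, since 2^23 is roughly 8.4 million.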

>> Better performance,
> How? Prove it.

O(log n) typical worst case for object searches vs. O(log n) typical 
best case for relational searches.  In real applications object searches 
are 2-20 times faster than comparable relational searches.

>> greater scalability,
> How? Prove it.

Ameritrade.

>> faster deployment,
> How? Prove it

The VA Hospital's ahead-of-schedule and under-budget deployment.

>> easier
>> maintenance,
> How? Prove it

Admittedly it is company propaganda, but case studies from InterSystems' 
customers show that Cache' is easier and faster than Oracle for 
application development and support.

>> and typically at a lower cost for all of it.
> PostgreSQL is free. It doesn't get much lower in cost.

Hardware, sysadmins, DBAs, application developers, test teams.  All 
these cost money.  If you can deliver an application on leaner hardware 
then you reduce cost.  If you can deliver it in less time then you 
reduce cost.

> No, I needed a DNS system that could replicate, allow user access,
> manage rights and privileges, etc. I could cobble something together, or
> use a package that worked out of the package. It was a no brainer.

I implemented something similar at a previous gig using shell scripts. 
It worked perfectly.  It was a no-brainer.  And that's still my own 
confirmation bias speaking.

> I do have some expertise in PostgreSQL, sure, but I always try to find
> the best tool for the task. I have used SQLite and I have done a fair
> amount of storage systems where an RDBMS is not appropriate.

Consider this for your next project: a relational database is never 
appropriate.  Work from that.  I'm certain that you will be surprised, 
in a good way, at what you discover.

-- 
Rich P.