[Discuss] Rob Conery's critique of MySQL?

Thu Aug 2 22:57:25 EDT 2012

On 08/02/2012 08:44 PM, Richard Pieri wrote:
> On 8/2/2012 4:36 PM, Mark Woodward wrote:
>> Not to be snide, but 8 million is not a big number.
>
> That's 8 million patients.  Multiply that by everything that the VA 
> has on each and every one of them and you get a very large data set.
>
> It's not the largest data set that I'm aware of.  The largest is the 
> data out of the LHC which is around 200 petabytes.  CERN went the 
> other way.  They started with an object databases but eventually 
> dropped it due to poor market development OODBMSs. They currently use 
> relational databases for storing and retrieving metadata.  Bulk data 
> is stored in flat files.
>
>
>> Well, "billions" of transactions per day should be doable in a cluster.
>
> That's what Ameritrade and Oracle thought but they couldn't make it work.
>
>> If your oracle database is crashing, it is misconfigured.
>
> The Oracle techs working with Ameritrade couldn't keep the cluster 
> going.  They eventually gave up when Ameritrade wouldn't commit to 
> replacing the entire cluster with bigger servers.
Ahhh Bingo! We have the problem.
>
>> Financial
>> transactions are a dangerous thing, you really do need ACID for
>> fiduciary responsibility.
>
> Cache' delivers full ACID guarantee.  I told you I wasn't talking 
> about NoSQL/MongoDB.
>
Cool. Cache' is an interesting system, and there are pros and cons to 
using it of course. Suffice to say, if you don't need precisely what it 
provides, you wouldn't buy it.
>
>> You are avoiding the topic, the "storage system," is separate from the
>> implementation of the objects. The objects know how to serialize and
>> restore themselves as well as upgrade. The storage and location of
>> objects is not involved.
>
> Of course I am.  It's not relevant to the topic, which is the 
> technical merits of object vs relational databases.
You are arguing semantics. An RDBMS is just a data store as is an object 
based db. Now, clustering an RDBMS system has the same set of problems 
as clustering an object store. In the end, they are just storing data. 
The "scaling" part is, of course, product specific and each system 
implements clustering differently, but in the end its just data going on 
to disk in an order that is retrievable. Hopefully with ACID compliance.

>
>
>> That is not a "how," it is a adjective and a plural noun. One does not
>> need to use relations in a database, but one has them if they need them.
>> An RDBMS is a tool not some kind of mandate.
>
> Then why bother with a relational database at all?
Who's bothering?
> The singular strength of a relational database is the relations 
> between data.
This is No-SQL nonsense. The strength of an RDBMS is the man-centuries 
of work and science embodied in retrieving data. The relational 
capability is a very powerful tool, sure, but in the end the real 
science is finding the data you want.

> If you don't use relations then the relational database is the wrong 
> tool for the job.
This is absolutely incorrect. Relations are a feature not the sole 
purpose. Finding specific data in a large data set is no easy task. Take 
this SQL query:

select * from songs where artist = 'various';

A songs table can have many millions of entries, and a good percent of 
them will have artists as 'various' because the come from collections. 
Now, a good RDBMS will understand that it is probably faster to ignore 
the index. Now take this query:

select * from songs where artist = 'Joe Kidd'
Then the system will find only a few songs and the RDBMS will understand 
that it should use the index.

Note that these are not relational queries. There are no joins, but SQL 
is used to find the data.

Despite what you say, and object store is a data graveyard if it does 
not support aggregation or location of data. Data has no value if it can 
not be processed. Chances are the object stores will be dumped to an 
RDBMS for OLAP. My argument is why bother with the object store?

>
>
>> Yes, ok, that is done with the XML/JSON class description. What's the
>> problem?
>
> The problem is that you're stuck with tables.  You don't have an 
> object.  You have an object stored in a table.  Even if it is a table 
> with a single column and a single row it's still a table.
>
>
>> If I said the XML was stored in a binary polymorphic object file and it
>> could be retrieved by its ID, would that make a difference? Because,
>> that is exactly what is happening. For convenience, we call the the
>> polymorphic object file a "table."
>
> Sure, that works.
OK, then you see my point.
> Again, why bother with a relational database if you want to 
> short-circuit all of the relational functions?
Because A "Relational Database Management System" is a super set of 
"Database Management System." You need some sort of DBMS on which to 
base an object store, and PostgreSQL gives you the "R" for free.

> Which was my original point: why bother with inferior tools like 
> relational databases when superior tools like object databases are 
> available?
Because object stores are not superior, they are, by definition, 
inferior. A good RDBMS can do the job of an object store easily *AND* do 
far far more.

>
>
>> Sorry, no. It is either a hash table, or they are hiding the index from
>> you. Either way, it doesn't matter because databases have hash indexes.
>
> Nope.  Binary trees or multidimensional arrays.  Typically, an object 
> database doesn't cache index data which it doesn't have. It caches 
> objects.
Btrees are not O(1) they are O(log n). Multidimentional arrays are only 
O(1) if they are in memory, after that, they hit disk and that is what 
ever the file system gives you.

caching is a different topic all together *every* RDBMS caches both 
index and data. And, by the way, if your object store has a btree, it 
has an "index." It may not call it that, but that is what it is.

>
>> And if you say that objects don't need that kind of indexing, then you
>> miss the real power of database. If you have 8 million objects, say
>> patients in a database. How do you find them by social security numbers?
>> How about by last name? How about by symptoms?
>
> You walk a balanced b-tree.  The worst case for a binary tree search 
> is O(log n).  Then the patient object is loaded into cache and data 
> access times drop to O(1).
Ahh, yes, the magic of caching. *Every* RDBMS has that.
>
>>> Better performance,
>> How? Prove it.
>
> O(log n) typical worst case for object searches vs. O(log n) typical 
> best case for relational searches.
The O(log n) for object searches is typical for any object not in cache. 
That statistic I agree with. Worst case, yes, but also I bet, typical 
for initial find. Getting the same object multiple times out of an 
object store should be near O(1) as long as it remains in cache. Agreed.

" O(log n) typical best case for relational searches" Nice little 
linguistic game there. This is wrong.
The query:

select object_data from objects_table where object_id = 'id';

Will have an O(log n) for the initial find, just as an object store. 
Furthermore, it will have a near O(1) behavior the subsequent finds 
because RDBMS cache too.
> In real applications object searches are 2-20 times faster than 
> comparable relational searches.
Then you are doing it wrong, there is no technical reason to believe 
this assertion, in fact, many RDBMS have different indexing options to 
optimize this sort of search. Hash indexes will have near O(1) + I/O for 
objects out of cache.

>
>>> greater scalability,
>> How? Prove it.
>
> Ameritrade.
I would need a full analysis of what happened there to say whether or 
not it was a scalability issue and why. Yes, it is well understood that 
an application designed for a specific narrow task is generally better 
at that task than one which is designed to be general purpose, however, 
a general purpose tool is by definition a better tool for most tasks.

Scalability is a complex problem, there is no boiler plate solution.

>
>>> faster deployment,
>> How? Prove it
>
> The VA Hospital's ahead of schedule and under budget deployment.
Again, I've had plenty of projects that go faster because of PostgreSQL, 
but that is anecdotal evidence. What are the reasons why it is "faster 
deployment?" What features do an object store have that makes it faster?

>
>
>>> easier
>>> maintenance,
>> How? Prove it
>
> Admittedly it is company propaganda, but case studies from 
> InterSystems' customers show that Cache' is easier and faster than 
> Oracle for application development and support.
I've seen lots of company propaganda, I believe none of it. Speaking 
from experience, almost *anything* is easier to maintain than Oracle. 
Its a beast. However, that isn't an RDBMS issue, that is a stupid larry 
ellison product issue.

Why is it easier? What features make them easier?

>
>>> and typically at a lower cost for all of it.
>> PostgreSQL is free. It doesn't get much lower in cost.
>
> Hardware, sysadmins, DBAs, application developers, test teams. All 
> these cost money.  If you can deliver an application on leaner 
> hardware then you reduce cost.  If you can deliver it in less time 
> then you reduce cost.
Again, I don't agree. You need all these things for any system you 
deploy. Even if I agreed with the leaner system requirements, which I 
don't, that is a negligible cost differential and one which is made up 
by reduce learning curve of developers and test teams already knowing 
SQL and the tools available for SQL.
>
>
>> No, I needed a DNS system that could replicate, allow user access,
>> managed rights and privileges, etc. I could coble something together, or
>> use a package that worked out of the package. It was a no brainer.
>
> I implemented something similar at a previous gig using shell scripts. 
> It worked perfectly.  It was a no-brainer.  And that's still my own 
> confirmational bias speaking.
>
>
>> I do have some expertise in PostgreSQL, sure, but I always try to find
>> the best tool for the task. I have used SQLite and I have done a fair
>> amount of storage systems where an RDBMS is not appropriate.
>
> Consider this for your next project: a relational database is never 
> appropriate.  Work from that.  I'm certain that you will be surprised, 
> in a good way, at what you discover.
Again, you are wrong. Given all the goodness an RDBMS brings you, how 
could you ever say it is not appropriate? You have yet provide one 
documented advantage of an object store that is not provided for by an 
RDBMS, but there are far more advantages to using an RDBMS.

You claimed an object store is better, but an RDBMS can do that just as 
well. You say Ameritrade proves scalability, well 99.999% of the 
projects people will work on will not need that level of scalability and 
would be more than handled by an RDBMS running on a few machines. You 
wouldn't buy an 18 wheeler to commute to work every day. Even so, cache' 
can support SQL, so, there is no reason not to use an RDBMS.