Home
| Calendar
| Mail Lists
| List Archives
| Desktop SIG
| Hardware Hacking SIG
Wiki | Flickr | PicasaWeb | Video | Maps & Directions | Installfests | Keysignings Linux Cafe | Meeting Notes | Linux Links | Bling | About BLU |
On 08/02/2012 08:44 PM, Richard Pieri wrote: > On 8/2/2012 4:36 PM, Mark Woodward wrote: >> Not to be snide, but 8 million is not a big number. > > That's 8 million patients. Multiply that by everything that the VA > has on each and every one of them and you get a very large data set. > > It's not the largest data set that I'm aware of. The largest is the > data out of the LHC which is around 200 petabytes. CERN went the > other way. They started with an object databases but eventually > dropped it due to poor market development OODBMSs. They currently use > relational databases for storing and retrieving metadata. Bulk data > is stored in flat files. > > >> Well, "billions" of transactions per day should be doable in a cluster. > > That's what Ameritrade and Oracle thought but they couldn't make it work. > >> If your oracle database is crashing, it is misconfigured. > > The Oracle techs working with Ameritrade couldn't keep the cluster > going. They eventually gave up when Ameritrade wouldn't commit to > replacing the entire cluster with bigger servers. Ahhh Bingo! We have the problem. > >> Financial >> transactions are a dangerous thing, you really do need ACID for >> fiduciary responsibility. > > Cache' delivers full ACID guarantee. I told you I wasn't talking > about NoSQL/MongoDB. > Cool. Cache' is an interesting system, and there are pros and cons to using it of course. Suffice to say, if you don't need precisely what it provides, you wouldn't buy it. > >> You are avoiding the topic, the "storage system," is separate from the >> implementation of the objects. The objects know how to serialize and >> restore themselves as well as upgrade. The storage and location of >> objects is not involved. > > Of course I am. It's not relevant to the topic, which is the > technical merits of object vs relational databases. You are arguing semantics. An RDBMS is just a data store as is an object based db. Now, clustering an RDBMS system has the same set of problems as clustering an object store. In the end, they are just storing data. The "scaling" part is, of course, product specific and each system implements clustering differently, but in the end its just data going on to disk in an order that is retrievable. Hopefully with ACID compliance. > > >> That is not a "how," it is a adjective and a plural noun. One does not >> need to use relations in a database, but one has them if they need them. >> An RDBMS is a tool not some kind of mandate. > > Then why bother with a relational database at all? Who's bothering? > The singular strength of a relational database is the relations > between data. This is No-SQL nonsense. The strength of an RDBMS is the man-centuries of work and science embodied in retrieving data. The relational capability is a very powerful tool, sure, but in the end the real science is finding the data you want. > If you don't use relations then the relational database is the wrong > tool for the job. This is absolutely incorrect. Relations are a feature not the sole purpose. Finding specific data in a large data set is no easy task. Take this SQL query: select * from songs where artist = 'various'; A songs table can have many millions of entries, and a good percent of them will have artists as 'various' because the come from collections. Now, a good RDBMS will understand that it is probably faster to ignore the index. Now take this query: select * from songs where artist = 'Joe Kidd' Then the system will find only a few songs and the RDBMS will understand that it should use the index. Note that these are not relational queries. There are no joins, but SQL is used to find the data. Despite what you say, and object store is a data graveyard if it does not support aggregation or location of data. Data has no value if it can not be processed. Chances are the object stores will be dumped to an RDBMS for OLAP. My argument is why bother with the object store? > > >> Yes, ok, that is done with the XML/JSON class description. What's the >> problem? > > The problem is that you're stuck with tables. You don't have an > object. You have an object stored in a table. Even if it is a table > with a single column and a single row it's still a table. > > >> If I said the XML was stored in a binary polymorphic object file and it >> could be retrieved by its ID, would that make a difference? Because, >> that is exactly what is happening. For convenience, we call the the >> polymorphic object file a "table." > > Sure, that works. OK, then you see my point. > Again, why bother with a relational database if you want to > short-circuit all of the relational functions? Because A "Relational Database Management System" is a super set of "Database Management System." You need some sort of DBMS on which to base an object store, and PostgreSQL gives you the "R" for free. > Which was my original point: why bother with inferior tools like > relational databases when superior tools like object databases are > available? Because object stores are not superior, they are, by definition, inferior. A good RDBMS can do the job of an object store easily *AND* do far far more. > > >> Sorry, no. It is either a hash table, or they are hiding the index from >> you. Either way, it doesn't matter because databases have hash indexes. > > Nope. Binary trees or multidimensional arrays. Typically, an object > database doesn't cache index data which it doesn't have. It caches > objects. Btrees are not O(1) they are O(log n). Multidimentional arrays are only O(1) if they are in memory, after that, they hit disk and that is what ever the file system gives you. caching is a different topic all together *every* RDBMS caches both index and data. And, by the way, if your object store has a btree, it has an "index." It may not call it that, but that is what it is. > >> And if you say that objects don't need that kind of indexing, then you >> miss the real power of database. If you have 8 million objects, say >> patients in a database. How do you find them by social security numbers? >> How about by last name? How about by symptoms? > > You walk a balanced b-tree. The worst case for a binary tree search > is O(log n). Then the patient object is loaded into cache and data > access times drop to O(1). Ahh, yes, the magic of caching. *Every* RDBMS has that. > >>> Better performance, >> How? Prove it. > > O(log n) typical worst case for object searches vs. O(log n) typical > best case for relational searches. The O(log n) for object searches is typical for any object not in cache. That statistic I agree with. Worst case, yes, but also I bet, typical for initial find. Getting the same object multiple times out of an object store should be near O(1) as long as it remains in cache. Agreed. " O(log n) typical best case for relational searches" Nice little linguistic game there. This is wrong. The query: select object_data from objects_table where object_id = 'id'; Will have an O(log n) for the initial find, just as an object store. Furthermore, it will have a near O(1) behavior the subsequent finds because RDBMS cache too. > In real applications object searches are 2-20 times faster than > comparable relational searches. Then you are doing it wrong, there is no technical reason to believe this assertion, in fact, many RDBMS have different indexing options to optimize this sort of search. Hash indexes will have near O(1) + I/O for objects out of cache. > >>> greater scalability, >> How? Prove it. > > Ameritrade. I would need a full analysis of what happened there to say whether or not it was a scalability issue and why. Yes, it is well understood that an application designed for a specific narrow task is generally better at that task than one which is designed to be general purpose, however, a general purpose tool is by definition a better tool for most tasks. Scalability is a complex problem, there is no boiler plate solution. > >>> faster deployment, >> How? Prove it > > The VA Hospital's ahead of schedule and under budget deployment. Again, I've had plenty of projects that go faster because of PostgreSQL, but that is anecdotal evidence. What are the reasons why it is "faster deployment?" What features do an object store have that makes it faster? > > >>> easier >>> maintenance, >> How? Prove it > > Admittedly it is company propaganda, but case studies from > InterSystems' customers show that Cache' is easier and faster than > Oracle for application development and support. I've seen lots of company propaganda, I believe none of it. Speaking from experience, almost *anything* is easier to maintain than Oracle. Its a beast. However, that isn't an RDBMS issue, that is a stupid larry ellison product issue. Why is it easier? What features make them easier? > >>> and typically at a lower cost for all of it. >> PostgreSQL is free. It doesn't get much lower in cost. > > Hardware, sysadmins, DBAs, application developers, test teams. All > these cost money. If you can deliver an application on leaner > hardware then you reduce cost. If you can deliver it in less time > then you reduce cost. Again, I don't agree. You need all these things for any system you deploy. Even if I agreed with the leaner system requirements, which I don't, that is a negligible cost differential and one which is made up by reduce learning curve of developers and test teams already knowing SQL and the tools available for SQL. > > >> No, I needed a DNS system that could replicate, allow user access, >> managed rights and privileges, etc. I could coble something together, or >> use a package that worked out of the package. It was a no brainer. > > I implemented something similar at a previous gig using shell scripts. > It worked perfectly. It was a no-brainer. And that's still my own > confirmational bias speaking. > > >> I do have some expertise in PostgreSQL, sure, but I always try to find >> the best tool for the task. I have used SQLite and I have done a fair >> amount of storage systems where an RDBMS is not appropriate. > > Consider this for your next project: a relational database is never > appropriate. Work from that. I'm certain that you will be surprised, > in a good way, at what you discover. Again, you are wrong. Given all the goodness an RDBMS brings you, how could you ever say it is not appropriate? You have yet provide one documented advantage of an object store that is not provided for by an RDBMS, but there are far more advantages to using an RDBMS. You claimed an object store is better, but an RDBMS can do that just as well. You say Ameritrade proves scalability, well 99.999% of the projects people will work on will not need that level of scalability and would be more than handled by an RDBMS running on a few machines. You wouldn't buy an 18 wheeler to commute to work every day. Even so, cache' can support SQL, so, there is no reason not to use an RDBMS.
BLU is a member of BostonUserGroups | |
We also thank MIT for the use of their facilities. |