After my webinar on NoSQL/SQL for the Oracle Development Tools User Group last week (http://www.odtug.com/apex/f?p=500:1:0), I got an email from an attendee that was chock full of questions. I decided to answer them to clarify NoSQL and cloud systems for this fellow. I'm pretty happy with my answers, and I'd be glad to hear any thoughts on my replies.
Here are my responses (the fellow's name is also Patrick):
You are welcome! Thank you for attending. I put that together a bit hastily, but I thought it was a good topic to cover, as so many organizations are considering such an architecture.
Patrick Francois wrote:
> Thank you for the NoSQL Webinar!
> Not an easy theme. ..kind of "wide open".
> I have recently also tried getting more info on NoSQL and related systems and performance issues. ...and get confused, especially also with the "cloud systems".
Yeah, there's a lot of buzzword BS and reinventing the wheel going on. The key thing to remember is that NoSQL is not going to replace SQL any time soon, but the combination of SQL and NoSQL can have its advantages.
> I checked for example this document regarding benchmarking "cloud systems":
> (maybe you know that document as well)
> -> "Benchmarking Cloud Serving Systems with YCSB"
> (YCSB -> Yahoo! Cloud Serving Benchmark)
> -> straight link: http://www.brianfrankcooper.net/pubs/ycsb.pdf
> There they speak about an explosion of new systems for data storage and management "in the cloud"
> They mention there all these different NoSQL storage systems,
> and say for example also: Some systems are offered only as cloud services, either directly in the case of Amazon SimpleDB
Yeah, SimpleDB is a proprietary Amazon service, like S3. You can't download it and run it on your own cluster or private cloud.
> That's why I'm also a bit confused about "cloud systems" and "NoSQL".
Cloud systems - usually virtual machines that can be easily spun up for elasticity, so I can add more servers to my virtual network on the fly. This can be via either your own VMware Lab Manager setup or something like EC2. There are also real hardware clouds, using services such as those Rackspace offers.
Putting "cloud systems" and "NoSQL" together just means running these databases in an environment such as EC2, VMware, Rackspace, etc. You can have relational databases in the cloud too, such as MySQL or Drizzle.
You can also run NoSQL databases on your own systems. NoSQL and cloud systems are not mutually exclusive, and neither one requires the other.
> Basically if you speak about cloud systems, you also speak about scaling out same as you are speaking about "scaling out" when speaking about NoSQL.
Scaling out can happen in the cloud as well as outside it. It just means running applications and databases across several systems, versus growing one ever bigger server to scale as was done in the Good Old Days (TM).
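To make "scaling out" a little more concrete, here is a minimal Python sketch of spreading keys across several boxes instead of one big one. The node names and the ScaledOutStore class are made up for illustration, and plain dicts stand in for real servers, so treat it as a toy rather than how any particular database does it.

```python
import hashlib


def node_for_key(key, nodes):
    """Pick the node that owns a key by hashing the key (simple modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


class ScaledOutStore:
    """Toy key-value store: each 'node' is just an in-memory dict."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.data = {name: {} for name in self.nodes}

    def put(self, key, value):
        # The key's hash decides which node stores the value.
        self.data[node_for_key(key, self.nodes)][key] = value

    def get(self, key):
        # Reads go to the same node the hash picked for the write.
        return self.data[node_for_key(key, self.nodes)].get(key)


# Hypothetical three-node cluster; it works the same in a cellar or in EC2.
store = ScaledOutStore(["node-a", "node-b", "node-c"])
for i in range(30):
    store.put(f"user:{i}", i)
```

One caveat on the design: with plain modulo placement, adding a node moves most keys to a different node. Systems such as Cassandra use consistent hashing to limit that reshuffling.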
> So, does it imply that NoSQL systems are cloud systems?
No, they can be, but they don't have to be. NoSQL means a database system that doesn't use SQL to access data and is typically non-relational.
> How would you see the relation "NoSQL" / "Cloud systems".
They complement each other. NoSQL works well in a cloud paradigm.
> Even "cloud system" meaning as such is rather unclear.
> There was the question about "Cassandra server farm":
> If I create an own Cassandra server farm, can I then still speak about a "cloud system", or can I speak about a "cloud system" only when underlying servers are also geographically distributed?
Geographical distribution isn't the determinant in the definition of a cloud system. I could have a cloud system down in my cellar if I wanted to, as long as I have multiple real or virtual servers.
> There is also the "security" issue when it comes to "cloud systems" and some say security is bigger because data is distributed. I understand that in that way, you cannot really get hand (or get hacked) on all the data at once, eventually just a part of the data.
> But if I think on an own Cassandra farm, where eventually all machines are in the same machine room and network, I can imagine if you can get into one machine, you can get into all of them.
This issue requires you to do some homework about how to secure your servers. Set up a good image with everything locked down, and use something like Puppet to have each box come up with all the goodies you set up in that image. Define a list of must-haves that have to be in place before that box (virtual or real) goes on the network.
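As a rough illustration of the "list of must-haves" idea, here is a small Python sketch that checks a box against such a list before it is allowed on the network. The item names are hypothetical placeholders, not a recommended security baseline, and this is plain Python rather than Puppet syntax.

```python
# Hypothetical must-have list; the item names are illustrative only.
MUST_HAVES = {
    "firewall_enabled": True,
    "root_ssh_login_disabled": True,
    "security_updates_applied": True,
}


def missing_must_haves(box_state):
    """Return the must-have items that a box does not yet satisfy."""
    return sorted(
        name
        for name, required in MUST_HAVES.items()
        if box_state.get(name) != required
    )


def ready_for_network(box_state):
    """A box may join the network only when nothing is missing."""
    return not missing_must_haves(box_state)


# A freshly imaged box that only has its firewall configured so far.
new_box = {"firewall_enabled": True}
still_missing = missing_must_haves(new_box)
```

In practice the configuration management tool enforces the list for you. The point is just that the list exists and is checked for every box, real or virtual.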
> From the YCSB-document "some systems are offered only as cloud services",
> I understand that for using Amazon's SimpleDB, you could not create your own server farm, rather need to buy that service from Amazon?
Yeah, you're not going to run your own SimpleDB or S3. You can run other NoSQL databases or distributed file systems, though - take your pick with a Google search.
> That is causing me confusion between "NoSQL" and "Cloud systems".
NoSQL - as I defined in the presentation - schema-less, often non-relational, doesn't use SQL as the lingua franca of data access.
Cloud systems - multiple boxes, real or virtual, elastic, as I mention above.
Two different but mutually complementary concepts.
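If it helps, here is a small Python sketch contrasting the two access styles from the NoSQL definition above. It assumes SQLite (from the standard library) for the relational side and a plain dict standing in for a key-value store. The table, keys, and field names are made up for illustration.

```python
import json
import sqlite3

# Relational side: a fixed schema, accessed with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Patrick"))
sql_row = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchone()

# NoSQL-style side: a dict stands in for a key-value store.
# There is no schema: each value is whatever JSON document you hand it,
# and access is by key (get/put), not by a SQL query.
kv_store = {}
kv_store["user:1"] = json.dumps({"name": "Patrick"})
kv_store["user:2"] = json.dumps({"name": "Pat", "tags": ["odtug", "nosql"]})
doc = json.loads(kv_store["user:2"])
```

Notice that the second document carries a field the first one lacks, and nothing had to be altered up front to allow it. Neither side cares whether it runs in a cloud or on a box in your cellar.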
> Memcached and Membase sounded very interesting. I will check more on those.
Please do. And do join the mailing lists. I see Matt Ingenthron and Perry Krug answering emails every day!
> Can you eventually also provide the slides you used?
Certainly - let me make sure they are clean and I'll send them to you!