~mark/work/s3e/speakernotes.txt A Storage Engine for Amazon S3 MySQL Convention & Expo 2007 *** http://fallenpegasus.com/code You can go to the presentation link under mysql-awss3 to get a PDF of a handout *** Who is this guy? My name is Mark Atwood, and I'm a geek and a hacker. Up front, not employee Amazon or MySQL. You all are much better at using databases and at building apps and sites than I am. I have to read the manual to compose a basic SQL query. *** Storage Engine This session is on the Storage Engine track. Storage Engines are part of what make MySQL really special. But really, what is a storage engine? *** Protocol Translator It's a protocol translator. Transforms some other data protocol and some other data model into MySQL's presentation of a db, tables, rows, etc I'm here to talk about one particular protocol and model... But first, traditional storage engines *** Traditional Storage Engines use the Local Disk You buy disks, you set them up You choose MyISAM or InnoBase or Falcon or Solid or whatever You fill your disks with your data, and access it with your db servers That's great stuff But What's the storage engine really doing? Think about "protocol translation" It's translating the protocol of filesystem and block i/o system calls, into what MySQL server needs Instead of local OS protocol, how about... *** Network Storage Engines Network protocols! There are dozens, hundreds of network protocols Many of you have probably written tools and apps, that use the mysql client tools, to speak these protocls, and stuff the data into MyISAM or whatever Turn the problem inside out. Instead of copying the network into your database, Make the network look like a database! *** Federated, ODBC, HTTP, MemCasheD, and There are a number of existing ones. Federated There is ODBC... release real soon Brian has a HTTP and MemCacheD MemCacheD is in real use. And there is mine. *** S3 *** What is S3? Amazon Web Services Simple Storage Service Amazon built it for themselves, but now sell access HTTP based protocol Simple data model You have some "buckets", that you can fill with an infinite number of "items" Buckets If a bucket's name looks like a domain name you can access it at that domain name Bucket is full of Items Key looks kind of like a URL path, if you want it to Contents 1 byte, to 5 GiB, of Mime-Type'd data You can have billions of items Hundreds and hundreds of terabytes, petabytes *** Petabytes? Yes SmugMug.com, over 100, probably over 200, growing fast *** This can't be free It would be nice 15c month GiB storage 20c GiB xfer xfer to EC2 is free competition is on the way *** Why do I want to use it? because owning your own disks, sucks whats wrong with buying my own storage? because you have to buy it before you use it out of capital, not as an expense on revenue *** an empty disk costs as much as a full one datacenters are an expensive nightmare they cost capital, and a lot of it space, by the cubic inch power, cooling. there is a reason that they're getting built next to dams monitoring and maintenence RAID only makes hw failure survivable, it doesnt make it affordable if you cant afford to keep your raid fed, you cant play the game disaster recovery what do you do when your datacenter burns down, falls down, blows up, washes into the ocean, or BRSes backups bring you back online, LATER, not immediately and after you've spent all this money you STILL dont really own a durable capital good your disks are going to be painfully small in 5 years, and scrap in 10 your raid enclosurs are going to be EOLed you are going to have to spend all that money AGAIN Are you guys convienced? There is a catch *** Can I move my existing database over? Sadly, no. Not yet. All you EC2 users are a bit disappointed Wait to the end of the talk, or talk to me later. *** Translating S3 to MySQL S3 has a pretty simple data model Buckets, Items, Keys, Contents I make it look like MySQL, but I don't marry the bucket to MySQL This is useful for when other things than the engine want acces to the data An AWS account becomes a CREATE SERVER command An S3 bucket becomes a table An S3 item becomes a row An S3 item key becomes a primary key You could use something other than varchar, but... An S3 item contents becomes a BLOB or a VARCHAR You could use some other type, but... Let's look at some SQL statements *** CREATE SERVER If you have setup Federated, you know this statement Notice that the datawrapper is "AWS", not "MYSQL" username and password, you will get from AWS when you setup your account there *** CREATE TABLE this does not actually create a new bucket, it just makes an association via the local FRM file likewise, DROP TABLE doesnt destroy a bucket or its contents, just removes the local FRM file there are some good libraries & tools for bucket creation similar to federated creation/destruction of buckets in the engine are to be soon implemented look at the charset. s3 keys are utf-8 strings you could leave it out and stick to ASCII, but... look at the connection string bucket is named "DevilDictionary" and it uses the server we created in the last slide "The Devil's Dictionary" was written in the early 1900s, by the bitter humorist Ambrose Bierce. I grabbed a copy from gutenberg project, parsed it, and stuffed it into the bucket. *** SELECT Given that Mr Bierce was a professional Amercian humorist, let's see what he thought of wit. *** Ouch. I wonder if he included himself? *** INSERT I can respond in kind *** DELETE And then take it back. *** So, what this engine useful for? As I was building this thing, part of my motivation was *** If I build it, they will come Like I said, you all are much better at USING databases than I am. But some ideas have occured to me *** Saving EC2 work If you're an EC2 user you can still pack your results or workstate up, and stuff it into S3. Remember, EC2 -> S3 xfer is free - SQL CMS via S3 virtual hosting If you're doing S3 virtual hosting you can do CMS stuff *inside* MySQL - Huge list of persistant primary keys Maybe you're only marginally interested in the item contents, but you want to check for the existance of a key in a persistant distributed ginormous set. Even if you have a hundred billion keys, lookup is constant time. - Big slow blobs, to join against fast local tables Blob storage is hard on your local disks. Make it be Amazon's problem, not yours. - "The Image Server Problem" It's REALLY HARD to build a webserver and a backend database that can scalably handle images. Make it be Amazon's problem, not yours. *** Transactions? No. There is no way to globally lock a table or an item. Someone else can be writing to that bucket or item too. S3 has sync'ed clocks, and last PUT wins. By the way, you need to NTP too, or the auth will fail. Plus S3 is distributed, and makes no temporal guarantees *** "No Temporal Guarantees" What does that mean? S3 is running on many datacenters. Inside the datacenters there are many components. And they are all running asyncroniously. There can be a small delay between doing the PUT, and being able to GET it back. There will be a larger delay for that to propigate to the other datacenters. As they propigate, they can be reordered. Oh, and the datacenter you are talking to can be switched out from under you. Usually it's nearly instant, but S3 doesnt go down if the network is particianed. *** No WHERE clause means "MAKE MONEY FAST" for Amazon I'm sure he's a nice guy, but you need your money more than he needs your money. If you don't put a WHERE clause for the primary key, the table scan will download the entire bucket. This can blow your network, stall the query, and you will pay Amazon their twenty cents a gig. Don't do that. *** The Future Here's what I havn't done yet. *** Code Improvement there are lots of opportunites for improvement in the code *** S3 & HTTP metadata In addition to the the item key and contents, there is some metadata that S3 makes available. mime type, datestamp, md5, length, ACL and 2K of user define metadata, arb name value pairs *** Multiple Data Columns It would be nice to have more than just a blob. Plus http range'd get can pull out the right elements, instead of downloading the entire item. Could just keep it in UNIREG format, but that's a problem if anything else needs to use it. *** Sharing a solution with the HTTP and the MemCacheD engines The HTTP and the MemCacheD engines have the same problem. Brian and I have been scetching out a simplish XML solution, which will have a published schema and an open source library. *** Information Schema information schema is extensible in 5.1 Transfer counters, and the estimated bill. *** Security and Authentication S3 can use TLS HTTPS, and can use X.509 certs instead of the un/pw Also, take a good idea from the openssl tool, and be able to keep credentials in a file reference, envvar, executable callout, or cryptoprocessor. *** Transfer & Storage Compression transfer and storage cost money. gzip can save some of that money the archive storage engine already does it *** NO SLIDE Streaming Blob, like PBXT *** More Storage Engines I liked writing this storage engine. I want to do some more. *** A storage engine for AWS SQS SQS is another amazon service that stores data It's got an interesting data model *** There are more AWS services rolling out *** There are competitors to AWS rolling out *** Binlog distribution via S3 I've had an idea for using S3 to distribute binlogs Server only needs to write once, no matter how many slaves. master/slaves dont have to be quite so syncronized network partitican or outage would be a lot less painful and finally *** and the holy grail for AWS EC2 users... *** "A Generic Schema Storage Engine for Amazon S3" Maybe I can present it next year... *** Back to the present But back to the here and now... *** Where do I get it? It needs 5.1, because it needs the storage engine api. If you absolutely *must* have it in 5.0, talk to me later. It's GPLed. *** You never write, you never call... I want feedback. Words are good, dollars are very good *** Questions?