At Havana summit they were giving away a paper version of Joe Arnold's "Software Defined Storage with OpenStack Swift". Very useful book for anyone dealing with Swift, I would be glad to pay the cover price of $25. But even more interestingly than tips on care and feeding of Swift, Joe opens the whole book thus:
[...] a de-coupled management system so customers could achieve (1) amazing flexibility in terms of how (and where) they deployed their storage, (2) control of their data without being locked-in to a vendor and (3) private storage at public cloud prices.
These features are the essence of Software Defined Storage (SDS), a new term the meaning of which is being defined. [...] Key aspects of SDS are scalability, adaptability, and the ability to use most any hardware. Through this de-coupling, operators can now make choices on how their storage is scaled and managed and how users can store and access data — all driven programmatically for the entire storage tier, regardless of where the storage resources are deployed.
Parts of the above prompt questions. Firstly, what good is de-coupling in respect to lock-in? SwiftStack effectively locks in by owning the de-coupled management. Sure, you own your data and could, in theory, manage your Swift with another management plane... I do not expect anyone crazy enough to try switching by anything less than standing up a new cluster. In any case, that part is not important, IMHO. The important part is programmatic control.
The phrase "SDS" jumps off "Software-Defined Networking". When SDN came into OpenStack, I was quite skeptical about it. It seemed too much like vendor-driven marketing bullshit. However, as users deployed the Project Formerly Known as OpenStack Quantum, it became clear that SDN answers their needs. The chief need was the ability to shape networks programmatically, overlaid on top of the physical networking plant, in service of the VMs.
When all this cloud thing came about, practitioners also struggled with the definition of it, and in particular the difference from the plain old datacenter virtualization. The difference is the programmatic control throughout. RHEV (now oVirt) eventually grew an API, which blurred the lines. But in OpenStack it was the main feature from the start. So you can manage everything and anything programmatically, including, for example, running on bare hardware. One can say that cloud is "Software-Defined Computing".
So, how does this programmatic thing apply to Swift? Joe had interesting insights cunningly hidden in the book, like these:
In an SDS system, reliability is the responsibility of the software, not the hardware. Replication and data integrity tactics are used to ensure that data does not become corrupt and that lost data is recovered.
A crucial function of an SDS system is to orchestrate capacity — storage, networking, routing & services — for entire cluster.
Swift covers the first part well already. The second is missing, or "de-coupled".
For galactic fairness, he also wrote things that seem wrong-headed to me:
There is no application sharding or managing volumes which can drive operational knowledge and complexity into applications because the SDS system is one cohesive system. Users do not need to ask for or know 'which storage pool' should be used because there is only one namespace.
The problem with hiding the pools outside of namespace is that they become invisible to the programmatic control as well, and such control is essential to the very definition of SDS. Someone at Amazon made a brilliant decision to make buckets a unit of replication in S3, so they can be linked to a region. In effect this hides the complexity but exposes knowledge that an application needs. Thus, any S3 client can do what Joe coniders SDS, but without any de-coupling, through the namespace and inside the API (or it can chose not to do it and just use a default region, for simplicity).
Joe's employees are hard at work implementing the vision as he outlined it, using the concept of regions that are internal to Swift cluster. The problem for everyone else, however, is how the programmatic control of that stack is exclusive to SwiftStack (with some useful things leaking into Swift, such as changeable replica count).
So, in the end, today Swift offers a solid foundation and parts of an SDS system, but the orchestration is "de-coupled" away elsewhere. Seems like a clear challenge to OpenStack to (re-)create the missing pieces.
P.S. I'd love to see the missing parts inside the Swift API and even namespace, although we have a problem here. Our Accounts and Containers are not guaranteed to live anywhere specifically or even on the same nodes. Changing that would be a step that I prefer. But Joe prefers to give up on plugging programmatic orchestration into the Swift API and just "de-couple" the heck of it. John, our benevolent PTL, seems to toe that line. Maybe they are right.
P.P.S. The deal with the programmatic orchestration is something that "unified" storage projects have to address too. E.g. in GlusterFS a program can issue mkdir(2). Is this programmatic control? No, not enough. Okay, they have glusterfsd nowadays, I can create volfiles in there, is that SDS? That is getting closer!