Hi There,
Today one of my partners at the company I’m working for asked me the following: “How to migrate an entire application with its data from a relational database (let’s say, at least MySQL with InnoDB) to a non-relational one, such as CouchDB? It is an interesting question, especially for a database-oriented person like me. In this post, I’ll try to answer you all.
What is CouchDB?
First, let me explain what is CouchDB: It is a non-relational, partially structured document-oriented database, maintained by the Apache Foundation. It works via HTTP protocol, and it’s quite easy to perform replication tasks and all sorts of magic.
I want to make it clear: it is not relational, it is not an object-oriented database. It’s all about structured documents.
Sooo… how to create applications using it? There is no structure!
This affirmation is wrong from the beginning. Data must need a structure, but it does not imply that it has to be saved in schemas, tables and so on; LDAP does not have a table structure but a tree structure. Even some games like Need for Speed, Quake and Fifa 2010 need their data organized in structures.
If you think of it, you will learn that the way you store your data really matters: a relational database is not efficient to store maps for Need for Speed, a file is certainly the worst way to perform operations concurrently. No-SQL databases are far more efficient than SQL databases in retrieving data, but you will probably finish with a poor design if you have a software architecture that needs 500 document types. (And, no. Once upon a time, a teacher at my university said that nothing with more than 30 tables is useful; the reality is that when you have many applications that are related to others, you can finish up with a huge database containing more than 300 tables. Have you ever seen the database design behind OpenBravo?).
(Update: here you can find a better introduction to what can be achieved with NoSQL. It solved many of my doubts about it.)
That said, I’ll explain the traditional process to create software, in short terms:
- Understand the problem.
- Design a solution.
- Write code.
- Deploy.
- Get the money.
It does not change. You still have to design a solution, but now you do not have a relational database structure. Instead, you have to define structures (and document them properly) to represent your data.
And I must point it here: CouchDB is still a database. It is still required to have CRUD (Create, Retrieve, Update and Delete) methods for your data. Instead of creating complex SELECT, INSERT, UPDATE and DELETE statements, however, we serialize/unserialize data between the application and CouchDB.
What do I see here? On one hand, you don’t have to worry about inserting foreign keys correctly, instead of saying (1 => ‘Male’, 2 => ‘Female’), you can place the word directly to represent it. On the other hand, if your application has inconsistencies (‘MALE’ and ‘Male’ are not the same thing), you have to be careful with data consistency, something that you did never care of while developing applications.
As a result of this, if you need to migrate your data from a relational database (Let’s say, you had Oracle) to CouchDB, you’ll be in one of these situations:
- I have different models in the database: Remember that you won’t have a relational database. You will have to identify a document in CouchDB by its type (Person, Post, Project, Task, et al).
- I have some significant tables that must be in the new “model”: Migrate these fields from the relational database and create an unique identifier for each document on CouchDB (a name or slug may be enough). For “Person”, it’s ideal to provide an e-mail or a nickname.
- I have a one to one relationship in my relational database: Forget the rule about “the record, only the record and nothing else but the record“. Perform a SELECT * FROM table1, table2 WHERE table1.id = table2.id; on your relational database. This gives you an idea of what you want to send to the CouchDB resulting database.
- I have a one to many relationship in my relational database: Are you sure that that table is needed? Most of these tables are defined to hold parameters in the database (gender, city, marital_status et al). If that is the case, there is a human-readable identifier such as “Male“, “London“, “Single“, that needs to go in the SELECT. Otherwise, you can store collections in the database, such as “comments”: [ "commentx", "commenty", "commentz", "commentw"]. If it is not enough (by example, “project” to “task“, the project slug must be stored in the task. This will still represent a “relation”. Cough, I said that there are no relations in the database… The truth is that this relation only exists in your mind (that is what we database developers use to call a “weak relation”).
- I have a many to many relationship in my relational database: You can rethink it as an one to many relationship and model it on the new document type as described before if it’s not significant. Otherwise, you will need to store a document with that relationship.
Once you get used to it, you’ll learn that there are lots of paradigms, but these are not the important thing when speaking about data. The real son of Chuck Norris understands that being successful in computer software development is to handle data structures rather than stick on these paradigms.
Hopefully it will help you all. And, thanks to Álvaro Véliz for introducing me to CouchDB. It seems cool.
Regards!