As announced in my previous blog post, I’d like to tell the world a bit about the multi-model NoSQL database ArangoDB, and since my profession is to focus on users, I thought that the best way to start would be by talking to users. So I looked around in ArangoDB’s support Google Group and its IRC channel #arangodb for an active user who would like to do an interview with me. Lucky for me, the first user I happened to talk to on IRC was not only a very enthusiastic ArangoDB user, but also an active community contributor to the project (though he is in no way affiliated with ArangoDB GmbH, the company behind the software).
So let me introduce to you J. Patrick Davenport, a freelance Solutions Architect from Palatka, FL, USA, who… never mind, let’s let him do the talking now 🙂
Thomas: Hi Patrick, to start off, could you tell me a little about yourself (your background, the job in which you’re using ArangoDB, …)?
Patrick: About me: I’m a Solutions Architect, working for my own company, DeusDat Solutions. I’ve been working over 8 years as a software Bob Vila (This Old House, American reference). My clients call me in to renovate code bases that are near collapse. I’ve improved multiple Fortune 500 companies’ core business applications. Recently I designed half of the Medicare Fraud Detection System for the US Government. That was based on Hadoop.
Thomas: Ah, interesting! So, what’s the story of you and Arango?
Patrick: Given the above, it’s implied that I’ve worked in many a corporate office. With the poor chairs and fluorescent lighting comes corporate tool sets. Going off on my own, I decided to learn a whole new toolset. At the same time I decided to write a book on NoSQL and NewSQL. Those two dreams coalesced in me wanting to support ArangoDB. I didn’t like MongoDB. I thought that was too hyped and that they believed their own press. Writing the book gave me a whirlwind tour of the NoSQL world. What I wanted was a document store, with geo-index and a healthy feel of youthful vitality. Turns out there aren’t too many of those around. ArangoDB was the only system that provided all three. RethinkDB was close, but lacked the geo-indexes.
As I learned more about it, I saw that it was wide open for tools and drivers. There was one Clojure driver. That driver was a simple attempt, but violated many Clojure purisms, for example by relying on global state. So I wrote a driver called travesedo. I needed to work with some Hadoop tools to process a large CSV data store into documents. I wrote Guacaphant (yes, it’s in Java). I needed to migrate and deploy database changes. Clojure has a community API for that called Ragtime. I wrote Waller just this last week to use with ArangoDB.
Thomas: Thanks! I’d like to dig a bit deeper into some points, if that’s okay. So where did you first learn about ArangoDB?
Patrick: I first learned about Arango while researching Document Stores for my book. At the same time, I knew that I wanted to use a document store with geo-indexes for personal projects. I found ArangoDB in an overview site.
Now I’ve got a few projects that I plan to back with ArangoDB. One focuses on enabling cities to better respond to tourists. As a transplanted Floridian, I understand how much of the state’s economy and even my town depends on the flow of tourist dollars. When I look around at the economic development going on, I see that even large cities have yet to app-ify themselves. My goal is to change this. Geo-location is a huge part of that. I need a database that supports it. Another project focuses on home management. Arango’s multi-model structure allows me to naturally relate home processes via a graph structure. While Arango’s graph infrastructure is presently simple, I have faith that the development team will expand it into something more powerful, distributed, that’s able to take on Neo4J.
Thomas: Thanks! You said you were looking for a DB with “a healthy feel of youthful vitality”. What about ArangoDB made you feel that it has more of that than e.g. MongoDB?
Patrick: One thing about MongoDB is it’s slow to change, or to react to valid negative criticism. When mmap’ing issues were pointed out, MongoDB’s response was smoke and mirrors. When the global locks slowed things down, they waved their hands. When the benchmark numbers were shown to be inflated due to default drivers not waiting for confirmation of saving, MongoDB folk talked-talked-talked, but said nothing.
ArangoDB is different. They are open about their shortcomings. Mmap is clearly written out as a design constraint. They talk about how datasets have to be mostly in memory for at least the active pages. Questions on the Google Group are answered quickly, by core team members. The same is true of Stackoverflow. They also actively promote competition in the tooling space. When I first brought up the idea of creating a competing driver for Clojure, I was supported. Some made sure I did my homework before doing that, but I had support.
Thomas: Interesting! Could you tell me why it is important to you that the company behind a database is as open as ArangoDB GmbH?
Patrick: If I’m going to build a product on something, I want a relationship with the makers and users of that something. In every relationship I want honesty. Openness by a product maker is as close to honesty as I can get. I want a team that clearly says what they’re going to do, keeps the community apprised of their efforts (including failures) and requests dialogue. I believe that if a company does this, it can stand the test of time.
ArangoDB is not backed by Oracle. It’s not backed by IBM. I need to have faith in their ability to be there tomorrow with a good product. Openness is a sign that I’ve picked the right horse.
Thomas: Thanks! So I take it from your previous answer that from your experience, ArangoDB does better in that regard than most of their competition?
Patrick: Yep. I haven’t seen that type of life in any of the established NoSQLs. Cassandra feels tired. Voldemort feels vanquished. MongoDB seems stuck (I grant that they did release an improved secondary engine about 3 months ago). Rethink doesn’t have geo-indexes.
Thomas: Ok. So you said you’re using or are going to use document stores and graphs in your projects. Have you used key-value stores in ArangoDB as well, or are you going to use them in one of the projects you’re planning?
Patrick: I don’t presently have a need for the straight key-value stuff (slight tangent, arango-session for Ring does this against a simple collection). I find structured data a better fit for my modeling.
Thomas: Ah, okay. So what about other features of ArangoDB: Are you using joins? Transactions?
Patrick: Interestingly graphs seem to limit my need for joins. They are implicit joins. I want every product that I own. While I could model that as a Product with an attribute of “owner”, I can share the product definition in a products collection with everyone via the relationship. They are also implicitly transactional. I can’t modify relationships and get into an unstable state. Transactions aren’t a huge sell for me right now, but since I do contracting work, transactions will be a huge sell to future clients. ArangoDB is the only Document Store that I’ve seen that supports them.
Thomas: Could you expand a bit on which future clients you expect transactions to be a huge sell for?
Patrick: My goal is to get ArangoDB in the enterprise. I don’t know how the market looks in the EU right now, but NoSQL in the Fortune 1000 is pretty small outside of some niche uses like logging. Document stores provide an easy migration path into NoSQL, especially if you’re in a dynamic, weakly typed language like Clojure. Everything in JSON == map. Given this, I think the compelling arguments for Clojure development speedups, Arango’s natural modeling (in general and in Clojure) and finally the transaction support should make many pointy-haired bosses feel safe and their developers happy ‘cause there is a new toy in town. My understanding is that operations against an individual document are transactional. Since I will probably be doing that (set count to (dec count)), I won’t have bulk modifications.
Thomas: Have you used ArangoDB with other languages than Clojure (or are you planning to)?
Patrick: I’ve used it with Java for guacaphant, but that’s it. My usage there was 1) incredibly short (to just get a cursor to an AQL query) and 2) incredibly non-idiomatic Java. I didn’t use data classes like Person.java. I asked Arango to give me the cursor items as maps. So it’s very Clojure/Functional.
Thomas: Have you used Foxx yet? If so: What for? If not: Do you see a case where you might use it in the future?
Patrick: I haven’t used Foxx. I’m a bit torn by its existence. I understand that it could be used as a rapid prototyping API platform, but it seems to have scalability issues. Now I have to have my DB server and Application running on the same box? I don’t like it too much. Added to this is the fact that Clojure makes rapid prototyping easy too. All the benefits of the JVM without the deployment hassle.
Thomas: Is there a way in which Foxx could be improved so that it would be beneficial to you?
Patrick: I don’t think so. Perhaps someone using Foxx could come out and say, “We did X with Foxx”. That might start my creative juices flowing. Until then, I don’t see the need for it other than fronting Arango with a really, really thin veneer for CRUD APIs.
But, I like that Arango is trying something. If people take it up, great. I’m terrible at picking trends. If the community doesn’t really use it, as long as Arango pulls back, great. Experimentation is how we learn.
Thomas: Okay, thank you for that open and honest answer!
Patrick: You’re welcome. See, we’re partnering on this interview.
Thomas: Indeed! So now I’d like to learn a bit about your experience with learning and using ArangoDB. Was it easy to set up? Did you find AQL easy to learn? Were your expectations all fulfilled so far, or were some of them disappointed?
Patrick: Getting going with Arango was pretty easy. The documentation is well formatted, and great for starting. That said, I’ve found it hard to learn about advanced concepts like advanced AQL. I ask a lot of questions on the Google Groups about idiomatic AQL. What I’d like is a more in-depth write-up for CRUD web apps. Especially around the idea of using AQL for modification. Another point of weakness is that the documentation doesn’t discuss how to deploy in a clustered environment. I know that there is a simple walkthrough on that, but it feels more like a toy networking project than a how-to. I will guard that sentiment by saying I haven’t looked at the Puppet Scripts. They might show a better way.
As a driver implementor this is a huge gap in documentation. For example, there is nothing that says a call for the next cursor batch must be directed to the same node the last batch came from. This is important for the driver. Only my driver even attempts to support clustering and replicated instances. But it is really weak on preserving the batch call requirements (i.e. it doesn’t do that yet). I found out about this by reasoning about how I would implement a distributed system and then questioning the group.
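To illustrate the batch requirement Patrick describes, here is a minimal sketch in Python of how a driver might pin follow-up batch requests to the node that created a cursor. It is not taken from travesedo or any other ArangoDB driver, and it makes no network calls. The names `Cursor`, `StickyRouter`, `open_cursor` and `node_for_next_batch` are hypothetical, as is the round-robin strategy for new queries.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Cursor:
    """Hypothetical cursor handle that remembers which node created it."""
    cursor_id: str
    node: str


@dataclass
class StickyRouter:
    """Spreads new queries across nodes, but pins batch calls to the cursor's node."""
    nodes: list
    _next_node: object = field(init=False, repr=False)

    def __post_init__(self):
        # Assumption: simple round-robin is good enough for picking a node.
        self._next_node = cycle(self.nodes)

    def open_cursor(self, cursor_id):
        # A new query may go to any node; record which one was chosen.
        return Cursor(cursor_id, next(self._next_node))

    def node_for_next_batch(self, cursor):
        # Follow-up batches always go back to the node that owns the cursor.
        return cursor.node
```

A driver built this way would call `open_cursor` once per query and then ask `node_for_next_batch` before every follow-up request, so the load balancing only ever applies to new queries.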
Thomas: Have you read the articles about setting up ArangoDB on Google Compute Engine or Digital Ocean (both released this month) yet? If so: Did you find those useful? (they just came to my mind because they’re both about clustered environments)
Patrick: I haven’t.
Thomas: Just for reference: https://www.arangodb.com/2015/04/gce-cluster/ and https://www.arangodb.com/2015/04/digital-ocean-cluster/
Thomas: You already mentioned the immediate support via the Google Group and Stackoverflow as a plus for ArangoDB. Does that way of getting support work well for you in general, or would you prefer other means of communication to get support?
Patrick: It works well. I’ve tried IRC. Unfortunately there is a time gap. I’m several hours behind the core team, and probably most ArangoDB users. The Group and Stackoverflow make the gap feel less real.
Thomas: Do you see the fact that ArangoDB GmbH currently only has an office in Germany as a disadvantage for the US market?
Patrick: No. It seems like the disadvantage is that they aren’t covered much by American press like Techcrunch. MongoDB was a darling for a while with them.
Thomas: How was your experience writing drivers for ArangoDB (apart from the clustering problem you already mentioned)? Was it easy to get started with it? Was the effort you needed acceptable?
Patrick: The documentation for the HTTP API is pretty good. They show possible inputs and the expected output. Wrapping that in Clojure was easy. I’ve been implementing the features that I need first. So I focused on DB/Collection/Document creation. Since I’m starting to need graphs, I’m working on that now. When I found issues with the documentation not working or being vague, the ArangoDB team corrected it within an hour of me posting on the group (during one of those happy moments when I’m working early, and they are working late).
Thomas: Sounds great! Okay, so the last question I have on my list: Apart from improving the documentation on clustering, what would you like to see the ArangoDB development focus on in the near future?
Patrick: I would like to see their focus split into two performance enhancing tasks. A) distributed graphs. I know that they’re on the road map. I really want to see them. It would be fun to have another open source, heavy duty graph storage to compete with Neo4J. Neo4J doesn’t even really do distributed searches. B) I want to have arrays and sub-documents indexable. I don’t think Arango does this presently. This could make services like ratings based on locality quicker without the use of joins (but hey, Arango has them).
Thomas: Okay, thank you a lot for your input! It was really interesting. Is there anything you feel hasn’t been touched yet, or anything you’d like to add?
Patrick: Nope. Thanks for the opportunity.
The interview was slightly edited on a few occasions for the sake of improved readability, while carefully keeping the content and tone of the original statements intact. Some comments that were unrelated to ArangoDB were left out.
If you have any experience with ArangoDB so far, does it match Patrick’s experience, or was it different in some way? Is there anything you’d like to know from Patrick which was not covered in this interview?
Let us know in the comments, and maybe Patrick will answer you if he reads it (or I’ll point it out to him).
Full disclosure: I was sponsored by ArangoDB GmbH to write some blog posts to promote ArangoDB. ArangoDB GmbH does not exert any influence on the content of those posts, however, and they were not formally approved by them. Therefore, the content of this post is 100% J. Patrick Davenport’s and mine. EDIT: Just to clarify: I was sponsored for conducting the interview. Patrick did not receive compensation for the interview, in order to allow him to state his honest opinion without fearing negative financial consequences.
This post is licensed CC-BY 4.0