As a hacker/tinkerer type programmer I can understand why it might be fun to build something like this. All of the data structure fun with none of the distributed difficulty! NoSQL! I can even build in that cool new language I've been designing in my head! But like everyone else I'm straining to see a use case. "Serverless" NoSQL? If you've got something small enough that you want an embedded datastore, why use a NoSQL variant?
I understand that NoSQL literally means "no SQL," but typically it's a buzzword reserved for unconventional datastores used under heavy distribution.
Maybe you'd want a local embedded "NoSQL" to act as a local datacache? But then for that use case why create an impedance mismatch with the central datastore by rolling your own?
Jx9? So wait, I need another language to interact with the thing that's already embedded directly within my program?
Local offline data analysis? Now that makes sense. Does this have intelligent/optimized paging mechanisms? I have a feeling not, since the author seems to be couching this as an alternative to sqlite for people who like NoSQL.
I'd sincerely love to hear more about what's motivating this project. Was it just for yucks, or is there some problem that the author(s) needed to solve that wasn't well-solved by another tool?
> I'm straining to see a use case. "Serverless" NoSQL? If you've got something small enough that you want an embedded datastore, why use a NoSQL variant?
Serverless NoSQL embedded datastores aren't really that unusual. BerkeleyDB, Tokyo Cabinet, and LevelDB are all commonly used serverless NoSQL stores.
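For anyone who hasn't used one, the shape is basically a dictionary that happens to live on disk. A minimal sketch using Python's standard dbm module as a stand-in (not any of the libraries above; the key and value contents are made up), just to show the no-server, no-config idea:

import dbm

# Open (or create) an on-disk key/value store: no server, no setup.
db = dbm.open("cache", "c")
db[b"user:42"] = b'{"name": "Ada", "plan": "pro"}'
print(db[b"user:42"])
db.close()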
I think what's happening here is that NoSQL is just an anti-category.
When I think of NoSQL, I think of something like MongoDB or Cassandra which are designed to handle tons of data under heavy load. When I think of BerkeleyDB and friends, I think local persistence. It's a true statement that these too are NoSQL stores, but, at least for me, they're not of the variety that first comes to mind.
The problem isn't the author's lust for tinkering and improving his dev chops. The problem is with HN, which is designed to drive fads and constantly disturb our focus, and with immature developers who change tech stacks as if they were underwear.
The problem? There's no problem here. I find the OP very interesting.
Whether or not it's useful to myself or this community, it's something which someone very intelligent spent a significant amount of time to build. I'd argue that its usefulness is unrelated to how interesting it is. It genuinely makes me want to know more about the author(s) and what their motivation was.
"serverless" databases are usually called storage engines: LevelDB, BerkeleyDB, BitCask, InnoDB. There is a (good) trend towards modular software, separating the storage engine from the server is a natural consequence.
But yeah, requiring a new obscure language pretty much obliterates any reason you'd want to use this. Lua is fast, popular, made for this exact use case and already used by Redis.
I probably came off too harsh. I'm quite familiar with the idea of embedded datastores, SQL and otherwise, and I didn't mean to attack the concept. I was more trying to illustrate (albeit weakly) that the marketing seems a bit off.
Like I said elsewhere, to me NoSQL tends to mean "big, distributed storage thing that isn't RDBMS, usually with an emphasis on horizontal scalability." Others have correctly said that some also take it first to mean schemaless and relax the "big" requirement. However, I'm still confused about what value a general-purpose "schemaless" datastore provides in an embedded context. I put "schemaless" in quotes there because there's still an impedance mismatch between the format your application works with and the format the datastore works with.
All of these rely on some general-purpose abstraction. But your application, if designed properly, probably makes use of a number of varying data structures. What are you gaining by pushing all of these through some abstraction? Assuming that all you're after is simple persistence, I have to ask - is writing data to disk really that hard?
That said, I've used local persistence libraries plenty, usually as a persistent local cache, and/or as a tool to enforce a schema which allows for data migration between constantly updating versions of software.
When I was at Singly working on the locker project, this was actually the absolute perfect solution to how we wanted to store data (it just wasn't available then.) We wanted to store all the raw JSON that was coming down the pipe from various sources (Facebook, Twitter, etc.) and wanted it all to be queryable. Also, since each user's data was completely isolated, having a tiny embeddable solution made a lot more sense vs. firing up a MongoDB or something similar for every user.
I'd have to agree with you here... I mean, there are lots of interfaces for reading/writing JSON, and simply outputting JSON as a UTF-8 .json.gz file seems more appropriate... especially if your data can fit into memory.
I would be more inclined to simply have my object structure in memory, using a more convenient high-level language, and do a load/dump from .json.gz files as needed. It's fast enough, and worst case scenario, I can backup/extract and read/write in any number of programming languages.
If you need multiple records, having an index and using line-delimited flat JSON structures (breaking on \n) can work as well...
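Roughly what I have in mind, as a sketch (file names made up, and assuming Python 3's gzip text mode):

import gzip, json

records = [{"name": "Ada", "phones": ["555-0100", "555-0101"]},
           {"name": "Bob", "phones": ["555-0199"]}]

# Dump everything to a compressed JSON file, reload it when needed.
with gzip.open("data.json.gz", "wt", encoding="utf-8") as f:
    json.dump(records, f)
with gzip.open("data.json.gz", "rt", encoding="utf-8") as f:
    records = json.load(f)

# Or, for record-at-a-time access, one flat JSON object per line:
with open("records.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")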
This really seems like an also-ran; you could have abstracted out an SQLite DB with a single table of (key VARCHAR(100) PRIMARY KEY, value VARCHAR(4000)) or something similar.
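That is, something along these lines (a quick, untested sketch; the put/get helpers and file name are made up):

import sqlite3

db = sqlite3.connect("store.db")
db.execute("CREATE TABLE IF NOT EXISTS kv (key VARCHAR(100) PRIMARY KEY, value VARCHAR(4000))")

def put(key, value):
    db.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))
    db.commit()

def get(key):
    row = db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

put("user:42", '{"name": "Ada"}')
print(get("user:42"))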
Following the thread of building upon SQLite, Yelp (along with the fabulous mrjob library) also open-sourced sqlite3dbm (http://pythonhosted.org/sqlite3dbm/), which is a document store on top of SQLite. Python only (meant to give random-access data to your EMR jobs running with mrjob), but the same approach could be applied to any other language.
The licensing is strange as well. The BSD license is listed as a feature, but the embedded scripting language (Jx9) is SPL (their own GPL-alike/Sleepycat license).
Since the Jx9 language is fully embedded (it's even stated that "All C source code for UnQLite and Jx9 are combined into a single source file."), they seem to be contradicting their own licensing. The resulting library would also retain the copyleft SPL license.
The base code may be BSD, but incorporating the whole database into your code infects it with their SPL too.
From the UnQLite core developer:
No, Jx9 (the standalone library) is under the SPL, while UnQLite is 2-clause BSD (including the Jx9 core), so there is no worry about that, since the two pieces of software are developed by the same company. In other words, use UnQLite without restrictions.
"It furthermore states that commercial licence must be acquired for closed source applications."
I love the description of the scripting language as "Turing complete" based on "JSON". That is the first time I have ever heard "Turing complete" used to market a programming language!
The scripting language is incredible; I have never seen a more verbose way to program. It is a scripting language, yet it seems to combine JSON, PHP, and C standard library functions with C-style type casting. Look at the samples: http://unqlite.org/jx9_samples.html
I am speechless, the description of the project is the most vacuous collection of buzzwords imaginable.
"Built with a powerful disk storage engine which support O(1) lookup."
"Support Terabyte sized databases."
Of course, with a global lock, and after taking a look at the scripting language, I have reason to doubt the claims made.
If you lived through the XML boom, you'll probably recall the "Turing complete! based on XML!" languages (XSLT, Ant, and friends). This was before people re-re-learned the essential lesson "languages based on markup are painful as languages and unsafe as markup".
JX9 was what got me too. "Based on JSON..." didn't prepare me for the code samples. I can't think of a reason to invent a new language (that has to be learnt) as opposed to JavaScript or Lua.
It clearly states on the license page that it's two-clause BSD. It's only custom extensions and custom dev work done by the architect that require payment. Perfectly fair, and still open source.
Isn't BerkeleyDB the quintessential "embedded NoSQL" database engine? I'm an RDBMS+SQL kind of guy, so can someone enlighten me about the differences between bdb and UnQLite?
The site seems like a computer-generated mashup of keywords and phrases. It's a document-store database similar to MongoDB, Redis, CouchDB, etc., and a key-value store similar to BerkeleyDB, LevelDB, etc.?
> UnQLite is 100% hand-coded, written in ANSI C, Thread-safe, _Full reentrant_ ...
It also uses its own scripting language:
> [jx9] uses a clean and familiar syntax similar to C, JavaScript and JSON
Not that I am envious of the karma, but I wonder why this was not merged into my exactly identical submission 1 hour earlier (https://news.ycombinator.com/item?id=5749969) -- afaict there was no ?repost or similar stunt ...
Being that HN was designed to handle a metric shit ton of traffic on very little hardware, I'd imagine that it has adopted a weak consistency model for these types of things.
From the markup, that submission has a trailing slash while this one does not. This is one of few ways two URLs can differ but cannot ever point at separate resources, so more clever URL handling would have noticed it's a dup.
I am working on a similar project: a persistent database in Node for Node projects, with no external dependencies. That means you can use it with a simple require(), no external software needed.
This is useful for small projects that don't need the power of a behemoth like MongoDB and want to be installable with a simple git clone + npm install.
What an unfortunate name. I spent some confused minutes thinking that the author of SQLite had finally done a complete implementation of [1], which he has been involved in.
I imagine that Dr. Hipp won't be happy with the choice of project name when he finds out, if he hasn't already. When I first saw the name "UnQLite", I figured this was his project, since I knew that he was working on something called UnQL.
If I wasn't going to get the benefit of my data automatically being distributed across servers, why would I use this? In many cases, NoSQL solutions seem to be a compromise that you make, giving up ACID (yes I know this has it) and other nice query features of databases and in return gaining the ability to scale to much larger amounts of data and having redundancy without needing to think about it.
In this case though, it's all embedded so I don't gain any of the benefits.
The point being that if you need a local data store, sometimes you would rather have a document-based one where you don't have to write the code to serialize your objects into (e.g.) a SQLite database and reconstitute them afterwards.
A good example would be something like an address book data store, especially if you want it to be dynamic so that the user can add/remove fields (e.g. allowing the user to attach as many phone numbers as they like to a contact, rather than a static five). If you have to implement this in SQLite, then you have to develop the schema for it, plus the (de)serialization code. With a document store, you can just say "store these fields" and be done with it.
| giving up ACID (yes I know this has it)
If you know that this has ACID, then why are you talking about giving up ACID? The fact that many NoSQL implementations give up ACID doesn't have any bearing on this discussion.
Is this what Richard Hipp was working on? It doesn't sound like what I remember hearing about back in 2011, but maybe things changed and I missed the news?
No. Dr Hipp and Damien Katz were working on UnQL (http://www.unqlspec.org/display/UnQL/Home), which is unfortunately quite similar in name to this new UnQLite project.
Actually, Hipp has said publicly that he intended to create a new embedded database called UnQLite.
It seems this developer just blatantly ripped off the exact name Hipp was planning to use. He also ripped off some of the core SQLite code (the VFS, etc.), which is legal to do since SQLite is in the public domain, but still...
OK, actually, I emailed D. Richard Hipp a few months ago asking for permission to use the name UnQLite in a future open source project; here is a copy of Hipp's reply:
It would be good if you can make it clear on your website, somehow, that
yours is an unaffiliated project. Otherwise, people might go complaining
to me when they find bugs in your code. (Don't laugh - that sort of thing
happens a lot.)
Other than that, you are welcomed to use the name.
You might want to have a look at the LSM storage engine that Dan Kennedy is
working on for SQLite4. It is faster than the clunky and dated B-Tree used
by SQLite3. It is also faster than LevelDB. And it supports nested
transactions, with rollback. And concurrency. And it is more NAND-flash
friendly. See http://www.sqlite.org/src4/timeline for the latest code.
I'm yet to see a good replacement for Kyoto Cabinet in terms of both ease of use and performance. I feel it would be a better use of energy to pick up Kyoto Cabinet and maintain it than to rewrite something from scratch.
Although I'm not sure how difficult it would be to get a license; the author now works for Google and doesn't seem to answer that email address (when asked technical questions, anyway; offering money might get a different reaction).
I would love an embeddable nosql database engine but this isn't it. I'm looking for something like MongoDB but embeddable. RaptorDB comes close but it doesn't have full support for Mono yet. https://raptordb.codeplex.com/
Honestly, you could build a decent embedded document store on top of SQLite pretty easily. It's hard to beat SQLite's long and solid track record for embedded data persistence.
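Something along these lines, as a rough sketch (a collection table holding JSON text, with filtering done application-side; the insert/find helpers and field names are made up):

import json, sqlite3

db = sqlite3.connect("docs.db")
db.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT)")

def insert(doc):
    db.execute("INSERT INTO docs (body) VALUES (?)", (json.dumps(doc),))
    db.commit()

def find(predicate):
    # Full scan with app-side filtering; hot fields could be promoted to indexed columns.
    for (body,) in db.execute("SELECT body FROM docs"):
        doc = json.loads(body)
        if predicate(doc):
            yield doc

insert({"name": "Ada", "phones": ["555-0100", "555-0101"]})
print(list(find(lambda d: "555-0101" in d.get("phones", []))))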
Can I ask why? As I said in my other comment on this article, I don't see the use case for an embedded NoSQL store, but I'd like to.
Where and how would you use it? What, if anything, do you currently use in its place? If nothing, is there something you could use instead? Why is it better than serializing your own domain model (assuming you have a domain model), or other alternatives?
Probably a better description is that an embeddable document store is what I'm looking for.
I want to store my objects in the database without having to flatten out their nested structure. I also want to be able to search those nested structures. I want the database engine to handle creating the fields if they don't exist. I don't want to deal with schemas. Each document in a collection should not care whether its fields differ from the others'.
These are all things I'm used to using in MongoDB, and I don't want to go back to SQL-type databases. I thought I saw a SQLite driver that mimicked a MongoDB-like system, but I can't seem to find it.
Is there anything about the way you query your data that a traditional serialized object store wouldn't handle? Being that you want it embedded and therefore local, I'd imagine you're talking about relatively small amounts of data and your performance concerns aren't too extreme?
That would be perfect, except you need a commercial license if you're going to bundle it with a commercial product. I really wish they would just put the pricing on the site.
I'm starting to think you're right. Maybe a traditional serialized object store is the way to go.
Nice. I'm currently using SQLite for key-value storage, because there aren't better lightweight options afaik. Python's shelve module is totally useless with multi-gigabyte tables and millions of keys.
Afaik, all of those require additional setup. SQLite3 is already included in Python. I also tested several database options a year ago, and SQLite3 was the fastest one.
Maybe it depends on your workload, but that's not true if you just want a regular K-V store. LevelDB and LMDB are usually way faster: http://symas.com/mdb/microbench/
Of course, neither of these systems have indexing. If you want to use them directly, you have to organise your data so that you can translate your queries into range queries on an ordered set. The FoundationDB guys talk about this here: http://foundationdb.com/documentation/beta1/data-modeling.ht...
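The trick is basically composite keys over an ordered keyspace. A toy illustration (plain Python with bisect over a sorted list standing in for the KV store's sorted keys; names and key layout are made up, and it assumes ASCII keys for the "\xff" upper bound):

import bisect

# Pretend this sorted list is the ordered keyspace of a KV store (LevelDB, LMDB, ...).
store = sorted([
    ("user:1:email", "ada@example.com"),
    ("user:1:name", "Ada"),
    ("user:2:email", "bob@example.com"),
    ("user:2:name", "Bob"),
])
keys = [k for k, _ in store]

def scan_prefix(prefix):
    # "Everything about user 1" becomes a range scan over the sorted keys.
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\xff")
    return store[lo:hi]

print(scan_prefix("user:1:"))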
Perhaps a little tangential, but that reminds me I wrote a SQLite interface with a shelf-like API, for when I wanted incredibly simple data storage which may grow more complex and relational later:
from sqliteshelf import SQLiteShelf
d = SQLiteShelf("filename.sdb", "tablename")
# now use d like any other dictionary, but this one has
# persistent storage
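The wrapper itself doesn't need to be much. This isn't the actual sqliteshelf code, just a rough sketch of the shape (class and table names made up; pickled values are an assumption):

import pickle, sqlite3

class TinyShelf(object):
    """Dict-like persistence on top of a single SQLite table; values are pickled."""

    def __init__(self, filename, table="shelf"):
        self.db = sqlite3.connect(filename)
        self.table = table  # trusted name, not user input
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS %s (key TEXT PRIMARY KEY, value BLOB)" % table)

    def __setitem__(self, key, value):
        self.db.execute(
            "INSERT OR REPLACE INTO %s (key, value) VALUES (?, ?)" % self.table,
            (key, pickle.dumps(value)))
        self.db.commit()

    def __getitem__(self, key):
        row = self.db.execute(
            "SELECT value FROM %s WHERE key = ?" % self.table, (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])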
Like another commenter, I was also going to recommend Google's LevelDB. It has bindings for just about every major language and it's my understanding it works really well as a KV store.
This seems backwards to me. Sure, use the right tool for the job, but still. SQL server -> noSQL / embedded SQL -> embedded noSQL. At every step there's someone going "X is hard, let's not do X" and we end up with a serverless, configuration-free, noSQL transactional database. The next logical steps are presumably the removal of transactions (who needs them, anyway?) and volatile storage (RAM is fast, let's use that).
I can't wait for the day malloc appears on a page with bullet points citing all the latest buzzwords.
I think some folks are missing the point of being able to scale up when needed by just changing the connection string. If you are trying to prove a big data concept with small data, this might be a cheaper path. Now, if there were just standards for NoSQL query interfaces, this might be 100% true...
That's just it. Since it uses a specific scripting language, if you use it for anything more than direct CRUD operations, you very well may need to do quite a bit more than change the connection string. SQLite has less of a problem with this because SQL is standardized (but of course all the RDBMS have their variants).
It's unfortunate that it's called UnQLite. Richard Hipp, the creator of SQLite is now involved with the UnQL specification[1], which looks unrelated to this. In fact, there's an interview where Hipp states his plans to make an UnQlite[2], which makes this, intentional or not, a namespace grab.
Does anyone know how this compares in terms of performance to leveldb, lmdb and kyoto cabinet? I've been doing some potentially terabyte scale stuff and while another option is welcome I don't feel like re-integrating yet another local db to find out.
A very uncharitable reading of the features seems to suggest that this is some strange bastard child of a hashmap, a scripting language interpreter, and a JSON serializer.
I'm sure it's much more than that, but one wonders where this stacks up compared to existing solutions.
If that's the case, I'm surprised there's no obvious link to UnQL, so people know they're building off previous ideas. The article says Richard Hipp was planning on building a UnQLite embedded database; the website, though, says the sole developer is not Richard Hipp.
LevelDB is similarly self-contained, has no client/server API, and is extremely fast. Though it has a large speed advantage, its databases are not single files.
LevelDB holds neither a read-performance advantage over common alternatives in any scenario (LSM reads fundamentally require extra work by design), nor a write-performance advantage for anything but very short, benchmark-friendly bursts. In the meantime, its worst-case performance characteristics are orders of magnitude worse than just about any alternative's.
(Keep clicking "more" in my comment history for detail)
It looks like they started with the scripting language and this is a tactic of trying to get it out there (we all want to see our creations adopted, etc, so an understandable desire).
Unfortunately the pitch is hitting some of the wrong notes by being a bit loose with the truth in parts of the narrative (e.g. it implies that Berkeley DB doesn't have concurrency or transactions, and claims significant benchmark superiority with no benchmarks at all to back it up), not to mention the buzzword soup on a page (and for a product) targeted at buzzword-averse developers.
Indeed, BerkeleyDB is even implemented with MVCC, while as far as I can see they use a global lock. This means BerkeleyDB is superior with regard to concurrency.
Actually it seems like a replacement for the PreferenceManager; that's the key/value store. Lots of people are using SQLite effectively as a regular SQL DB in their apps, and it's not going anywhere.