As a hacker/tinkerer type programmer I can understand why it might be fun to build something like this. All of the data structure fun with none of the distributed difficulty! NoSQL! I can even build in that cool new language I've been designing in my head! But like everyone else I'm straining to see a use case. "Serverless" NoSQL? If you've got something small enough that you want an embedded datastore, why use a NoSQL variant?
I understand that NoSQL literally means "no SQL," but typically it's a buzzword reserved for unconventional datastores used under heavy distribution.
Maybe you'd want a local embedded "NoSQL" to act as a local datacache? But then for that use case why create an impedance mismatch with the central datastore by rolling your own?
Jx9? So wait, I need another language to interact with the thing that's already embedded directly within my program?
Local offline data analysis? Now that makes sense. Does this have intelligent/optimized paging mechanisms? I have a feeling not, since the author seems to be couching this as an alternative to sqlite for people who like NoSQL.
I'd sincerely love to hear more about what's motivating this project. Was it just for yucks, or is there some problem that the author(s) needed to solve that wasn't well-solved by another tool?
> I'm straining to see a use case. "Serverless" NoSQL? If you've got something small enough that you want an embedded datastore, why use a NoSQL variant?
Serverless NoSQL embedded datastores aren't really that unusual. BerkeleyDB, Tokyo Cabinet, and LevelDB are all commonly used serverless NoSQL stores.
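For anyone who hasn't used one, the shape is basically a dictionary that happens to live on disk. A minimal sketch using Python's standard dbm module as a stand-in (not any of the libraries above; the key and value contents are made up), just to show the no-server, no-config idea:

import dbm

# Open (or create) an on-disk key/value store: no server, no setup.
db = dbm.open("cache", "c")
db[b"user:42"] = b'{"name": "Ada", "plan": "pro"}'
print(db[b"user:42"])
db.close()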
I think what's happening here is that NoSQL is just an anti-category.
When I think of NoSQL, I think of something like MongoDB or Cassandra which are designed to handle tons of data under heavy load. When I think of BerkeleyDB and friends, I think local persistence. It's a true statement that these too are NoSQL stores, but, at least for me, they're not of the variety that first comes to mind.
The problem isn't the author's lust for tinkering and improving his dev chops. The problem is with HN, which is designed to drive fads and constantly disturb our focus, and with immature developers who change tech stacks as if they were underwear.
The problem? There's no problem here. I find the OP very interesting.
Whether or not it's useful to myself or this community, it's something which someone very intelligent spent a significant amount of time to build. I'd argue that its usefulness is unrelated to how interesting it is. It genuinely makes me want to know more about the author(s) and what their motivation was.
"serverless" databases are usually called storage engines: LevelDB, BerkeleyDB, BitCask, InnoDB. There is a (good) trend towards modular software, separating the storage engine from the server is a natural consequence.
But yeah, requiring a new obscure language pretty much obliterates any reason you'd want to use this. Lua is fast, popular, made for this exact use case and already used by Redis.
I probably came off too harsh. I'm quite familiar with the idea of embedded datastores, SQL and otherwise, and I didn't mean to attack the concept. I was more trying to illustrate (albeit weakly) that the marketing seems a bit off.
Like I said elsewhere, to me NoSQL tends to mean "big, distributed storage thing that isn't RDBMS, usually with an emphasis on horizontal scalability." Others have correctly said that some also take it first to mean schemaless and relax the "big" requirement. However, I'm still confused about what value a general-purpose "schemaless" datastore provides in an embedded context. I put "schemaless" in quotes there because there's still an impedance mismatch between the format your application works with and the format the datastore works with.
All of these rely on some general-purpose abstraction. But your application, if designed properly, probably makes use of a number of varying data structures. What are you gaining by pushing all of these through some abstraction? Assuming that all you're after is simple persistence, I have to ask - is writing data to disk really that hard?
That said, I've used local persistence libraries plenty, usually as a persistent local cache, and/or as a tool to enforce a schema which allows for data migration between constantly updating versions of software.
When I was at Singly working on the locker project, this was actually the absolute perfect solution to how we wanted to store data (it just wasn't available then.) We wanted to store all the raw JSON that was coming down the pipe from various sources (Facebook, Twitter, etc.) and wanted it all to be queryable. Also, since each user's data was completely isolated, having a tiny embeddable solution made a lot more sense vs. firing up a MongoDB or something similar for every user.
I'd have to agree with you here... I mean, there are lots of interfaces for reading/writing JSON, and simply outputting JSON as a UTF-8 .json.gz file seems more appropriate... especially if your data can fit into memory.
I would be more inclined to simply have my object structure in memory, using a more convenient high-level language, and do a load/dump from .json.gz files as needed. It's fast enough, and worst case scenario, I can backup/extract and read/write in any number of programming languages.
If you need multiple records, having an index and using line-delimited flat JSON structures (breaking on \n) can work as well...
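Roughly what I have in mind, as a sketch (file names made up, and assuming Python 3's gzip text mode):

import gzip, json

records = [{"name": "Ada", "phones": ["555-0100", "555-0101"]},
           {"name": "Bob", "phones": ["555-0199"]}]

# Dump everything to a compressed JSON file, reload it when needed.
with gzip.open("data.json.gz", "wt", encoding="utf-8") as f:
    json.dump(records, f)
with gzip.open("data.json.gz", "rt", encoding="utf-8") as f:
    records = json.load(f)

# Or, for record-at-a-time access, one flat JSON object per line:
with open("records.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")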
This really seems like an also-ran; you could have abstracted out an SQLite DB with a single table of (key VARCHAR(100) PRIMARY KEY, value VARCHAR(4000)) or something similar.
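That is, something along these lines (a quick, untested sketch; the put/get helpers and file name are made up):

import sqlite3

db = sqlite3.connect("store.db")
db.execute("CREATE TABLE IF NOT EXISTS kv (key VARCHAR(100) PRIMARY KEY, value VARCHAR(4000))")

def put(key, value):
    db.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))
    db.commit()

def get(key):
    row = db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

put("user:42", '{"name": "Ada"}')
print(get("user:42"))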
Following the thread of building upon SQLite, Yelp (along with the fabulous mrjob library) also open-sourced sqlite3dbm (http://pythonhosted.org/sqlite3dbm/), which is a document store on top of SQLite. Python only (meant to give random-access data to your EMR jobs running with mrjob), but the same approach could be applied to any other language.
The licensing is strange as well. The BSD license is listed as a feature, but the embedded scripting language (Jx9) is SPL (their own GPL-alike/Sleepycat license).
Since the Jx9 language is fully embedded (it's even stated that "All C source code for UnQLite and Jx9 are combined into a single source file."), they seem to be contradicting their own licensing. The resulting library would also retain the copyleft SPL license.
The base code may be BSD, but incorporating the whole database into your code infects it with their SPL too.
From the UnQLite core developer:
No, Jx9 (the standalone library) is under the SPL, while UnQLite is 2-clause BSD (including the Jx9 core), so there is no worry about that, since the two pieces of software are developed by the same company. In other words, use UnQLite without restrictions.
"It furthermore states that commercial licence must be acquired for closed source applications."
I love the description of the scripting language as "Turing complete" based on "JSON". That is the first time I have ever heard "Turing complete" used to market a programming language!
The scripting language is incredible; I have never seen a more verbose way to program. It is a scripting language, yet it seems to combine JSON, PHP, and C standard library functions with C-style type casting. Look at the samples: http://unqlite.org/jx9_samples.html
I am speechless, the description of the project is the most vacuous collection of buzzwords imaginable.
"Built with a powerful disk storage engine which support O(1) lookup."
"Support Terabyte sized databases."
Of course, with a global lock, and after taking a look at the scripting language, I have reason to doubt the claims made.
If you lived through the XML boom, you'll probably recall the "Turing complete! based on XML!" languages (XSLT, Ant, and friends). This was before people re-re-learned the essential lesson "languages based on markup are painful as languages and unsafe as markup".
JX9 was what got me too. "Based on JSON..." didn't prepare me for the code samples. I can't think of a reason to invent a new language (that has to be learnt) as opposed to JavaScript or Lua.
It clearly states on the license page that it's two-clause BSD. It's only custom extensions and custom dev work done by the architect that require payment. Perfectly fair, and still open source.
Isn't BerkeleyDB the quintessential "embedded NoSQL" database engine? I'm an RDBMS+SQL kind of guy, so can someone enlighten me about the differences between bdb and UnQLite?
The site seems like a computer-generated mashup of keywords and phrases. It's a document-store database similar to MongoDB, Redis, CouchDB, etc., and a key-value store similar to BerkeleyDB, LevelDB, etc.?
> UnQLite is 100% hand-coded, written in ANSI C, Thread-safe, _Full reentrant_ ...
It also uses its own scripting language:
> [jx9] uses a clean and familiar syntax similar to C, JavaScript and JSON
Not that I am envious of the karma, but I wonder why this was not merged into my exactly identical submission 1 hour earlier (https://news.ycombinator.com/item?id=5749969) -- afaict there was no ?repost or similar stunt ...
Being that HN was designed to handle a metric shit ton of traffic on very little hardware, I'd imagine that it has adopted a weak consistency model for these types of things.
From the markup, that submission has a trailing slash while this one does not. This is one of few ways two URLs can differ but cannot ever point at separate resources, so more clever URL handling would have noticed it's a dup.
I am working on a similar project: a persistent database in Node for Node projects, with no external dependencies. That means you can use it with a simple require(), no external software needed.
This is useful for small projects that don't need the power of a behemoth like MongoDB and want to be installable with a simple git clone + npm install.
What an unfortunate name. I spent some confused minutes thinking that the author of SQLite had finally done a complete implementation of [1], which he has been involved in.
I imagine that Dr. Hipp won't be happy with the choice of project name when he finds out, if he hasn't already. When I first saw the name "UnQLite", I figured this was his project, since I knew that he was working on something called UnQL.
If I wasn't going to get the benefit of my data automatically being distributed across servers, why would I use this? In many cases, NoSQL solutions seem to be a compromise that you make, giving up ACID (yes I know this has it) and other nice query features of databases and in return gaining the ability to scale to much larger amounts of data and having redundancy without needing to think about it.
In this case though, it's all embedded so I don't gain any of the benefits.
The point being that if you need a local data store, sometimes you would rather have a document-based one where you don't have to write the code to serialize your objects into (e.g.) a SQLite database and reconstitute them afterwards.
A good example would be something like an address book data store, especially if you want it to be dynamic so that the user can add/remove fields (e.g. allowing the user to attach as many phone numbers as they like to a contact, rather than a static five). If you have to implement this in SQLite, then you have to develop the schema for it, plus the (de)serialization code. With a document store, you can just say "store these fields" and be done with it.
| giving up ACID (yes I know this has it)
If you know that this has ACID, then why are you talking about giving up ACID? The fact that many NoSQL implementations give up ACID doesn't have any bearing on this discussion.
Is this what Richard Hipp was working on? It doesn't sound like what I remember hearing about back in 2011, but maybe things changed and I missed the news?
No. Dr Hipp and Damien Katz were working on UnQL (http://www.unqlspec.org/display/UnQL/Home), which is unfortunately quite similar in name to this new UnQLite project.
Actually, Hipp has said publicly that he intended to create a new embedded database called UnQLite.
It seems this developer just blatantly ripped off the exact name Hipp was planning to use. He also ripped off some of the core SQLite code (the VFS, etc.), which is legal to do since SQLite is in the public domain, but still...
OK, actually, I emailed D. Richard Hipp a few months ago asking for permission to use the name UnQLite in a future open source project; here is a copy of Hipp's reply:
It would be good if you can make it clear on your website, somehow, that
yours is an unaffiliated project. Otherwise, people might go complaining
to me when they find bugs in your code. (Don't laugh - that sort of thing
happens a lot.)
Other than that, you are welcomed to use the name.
You might want to have a look at the LSM storage engine that Dan Kennedy is
working on for SQLite4. It is faster than the clunky and dated B-Tree used
by SQLite3. It is also faster than LevelDB. And it supports nested
transactions, with rollback. And concurrency. And it is more NAND-flash
friendly. See http://www.sqlite.org/src4/timeline for the latest code.
I'm yet to see a good replacement for Kyoto Cabinet in terms of both ease of use and performance. I feel it would be a better use of energy to pick up Kyoto Cabinet and maintain it than to rewrite something from scratch.
Although I'm not sure how difficult it would be to get a license; the author now works for Google and doesn't seem to answer that email address (when asked technical questions, anyway; offering money might get a different reaction).
I would love an embeddable nosql database engine but this isn't it. I'm looking for something like MongoDB but embeddable. RaptorDB comes close but it doesn't have full support for Mono yet. https://raptordb.codeplex.com/
Honestly, you could build a decent embedded document store on top of SQLite pretty easily. It's hard to beat SQLite's long and solid track record for embedded data persistence.
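Something along these lines, as a rough sketch (a collection table holding JSON text, with filtering done application-side; the insert/find helpers and field names are made up):

import json, sqlite3

db = sqlite3.connect("docs.db")
db.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT)")

def insert(doc):
    db.execute("INSERT INTO docs (body) VALUES (?)", (json.dumps(doc),))
    db.commit()

def find(predicate):
    # Full scan with app-side filtering; hot fields could be promoted to indexed columns.
    for (body,) in db.execute("SELECT body FROM docs"):
        doc = json.loads(body)
        if predicate(doc):
            yield doc

insert({"name": "Ada", "phones": ["555-0100", "555-0101"]})
print(list(find(lambda d: "555-0101" in d.get("phones", []))))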
Can I ask why? As I said in my other comment on this article, I don't see the use case for an embedded NoSQL store, but I'd like to.
Where and how would you use it? What, if anything, do you currently use in its place? If nothing, is there something you could use instead? Why is it better than serializing your own domain model (assuming you have a domain model), or other alternatives?
Probably a better description is that an embeddable document store is what I'm looking for.
I want to store my objects in the database without having to flatten out their nested structure. I also want to be able to search those nested structures. I want the database engine to handle creating the fields if they don't exist. I don't want to deal with schemas. Each document in a collection should not care whether its fields differ from the others'.
These are all things I'm used to using in MongoDB, and I don't want to go back to SQL-type databases. I thought I saw a SQLite driver that mimicked a MongoDB-like system, but I can't seem to find it.
Is there anything about the way you query your data that a traditional serialized object store wouldn't handle? Being that you want it embedded and therefore local, I'd imagine you're talking about relatively small amounts of data and your performance concerns aren't too extreme?
That would be perfect, except you need a commercial license if you're going to bundle it with a commercial product. I really wish they would just put the pricing on the site.
I'm starting to think you're right. Maybe a traditional serialized object store is the way to go.
Nice. I'm currently using SQLite for key-value storage, because there aren't better lightweight options afaik. Python's shelve module is totally useless with multi-gigabyte tables and millions of keys.
Afaik, all of those require additional setup. SQLite3 is already included in Python. I also tested several database options a year ago, and SQLite3 was the fastest one.
Maybe it depends on your workload, but that's not true if you just want a regular K-V store. LevelDB and LMDB are usually way faster: http://symas.com/mdb/microbench/
Of course, neither of these systems have indexing. If you want to use them directly, you have to organise your data so that you can translate your queries into range queries on an ordered set. The FoundationDB guys talk about this here: http://foundationdb.com/documentation/beta1/data-modeling.ht...
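The trick is basically composite keys over an ordered keyspace. A toy illustration (plain Python with bisect over a sorted list standing in for the KV store's sorted keys; names and key layout are made up, and it assumes ASCII keys for the "\xff" upper bound):

import bisect

# Pretend this sorted list is the ordered keyspace of a KV store (LevelDB, LMDB, ...).
store = sorted([
    ("user:1:email", "ada@example.com"),
    ("user:1:name", "Ada"),
    ("user:2:email", "bob@example.com"),
    ("user:2:name", "Bob"),
])
keys = [k for k, _ in store]

def scan_prefix(prefix):
    # "Everything about user 1" becomes a range scan over the sorted keys.
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\xff")
    return store[lo:hi]

print(scan_prefix("user:1:"))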
Perhaps a little tangential, but that reminds me I wrote a SQLite interface with a shelf-like API, for when I wanted incredibly simple data storage which may grow more complex and relational later:
from sqliteshelf import SQLiteShelf
d = SQLiteShelf("filename.sdb", "tablename")
# now use d like any other dictionary, but this one has
# persistent storage
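The wrapper itself doesn't need to be much. This isn't the actual sqliteshelf code, just a rough sketch of the shape (class and table names made up; pickled values are an assumption):

import pickle, sqlite3

class TinyShelf(object):
    """Dict-like persistence on top of a single SQLite table; values are pickled."""

    def __init__(self, filename, table="shelf"):
        self.db = sqlite3.connect(filename)
        self.table = table  # trusted name, not user input
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS %s (key TEXT PRIMARY KEY, value BLOB)" % table)

    def __setitem__(self, key, value):
        self.db.execute(
            "INSERT OR REPLACE INTO %s (key, value) VALUES (?, ?)" % self.table,
            (key, pickle.dumps(value)))
        self.db.commit()

    def __getitem__(self, key):
        row = self.db.execute(
            "SELECT value FROM %s WHERE key = ?" % self.table, (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])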
Like another commenter, I was also going to recommend Google's LevelDB. It has bindings for just about every major language and it's my understanding it works really well as a KV store.
This seems backwards to me. Sure, use the right tool for the job, but still. SQL server -> noSQL / embedded SQL -> embedded noSQL. At every step there's someone going "X is hard, let's not do X" and we end up with a serverless, configuration-free, noSQL transactional database. The next logical steps are presumably the removal of transactions (who needs them, anyway?) and volatile storage (RAM is fast, let's use that).
I can't wait for the day malloc appears on a page with bullet points citing all the latest buzzwords.
I think some folks are missing the point of being able to scale up when needed by just changing the connection string. If you are trying to prove a big data concept with small data, this might be a cheaper path. Now, if there were just standards for NoSQL query interfaces, this might be 100% true...
That's just it. Since it uses a specific scripting language, if you use it for anything more than direct CRUD operations, you very well may need to do quite a bit more than change the connection string. SQLite has less of a problem with this because SQL is standardized (but of course all the RDBMS have their variants).
It's unfortunate that it's called UnQLite. Richard Hipp, the creator of SQLite is now involved with the UnQL specification[1], which looks unrelated to this. In fact, there's an interview where Hipp states his plans to make an UnQlite[2], which makes this, intentional or not, a namespace grab.
Does anyone know how this compares in terms of performance to leveldb, lmdb and kyoto cabinet? I've been doing some potentially terabyte scale stuff and while another option is welcome I don't feel like re-integrating yet another local db to find out.
A very uncharitable reading of the features seems to suggest that this is some strange bastard child of a hashmap, a scripting language interpreter, and a JSON serializer.
I'm sure it's much more than that, but one wonders where this stacks up compared to existing solutions.
If that's the case, I'm surprised there's no obvious link to UnQL, so people know they're building off previous ideas. The article says Richard Hipp was planning on building a UnQLite embedded database; the website, though, says the sole developer is not Richard Hipp.
LevelDB is similarly self-contained, has no client/server API, and is extremely fast. Though it has a large speed advantage, its databases are not single files.
LevelDB holds neither a read-performance advantage over common alternatives in any scenario (LSM reads fundamentally require extra work by design), nor a write-performance advantage for anything but very short, benchmark-friendly bursts. In the meantime, its worst-case performance characteristics are orders of magnitude worse than just about any alternative's.
(Keep clicking "more" in my comment history for detail)
It looks like they started with the scripting language and this is a tactic of trying to get it out there (we all want to see our creations adopted, etc, so an understandable desire).
Unfortunately the pitch is hitting some of the wrong notes by being a bit loose with the truth in parts of the narrative (e.g. it implies that Berkeley DB doesn't have concurrency or transactions, and claims significant benchmark superiority with no benchmarks at all to back it up), not to mention the buzzword soup on a page (and for a product) targeted at buzzword-averse developers.
Indeed, BerkeleyDB is even implemented with MVCC, while as far as I can see they use a global lock. This means BerkeleyDB is superior with regard to concurrency.
Actually it seems like a replacement for the PreferenceManager; that's the key/value store. Lots of people are using SQLite effectively as a regular SQL DB in their apps, and it's not going anywhere.