I’m versed enough in SQL and RDBMS that I can put things in the third normal form with relative ease. But the meta seems to be NoSQL. Backends often don’t even provide a SQL interface.

So, as far as I know, NoSQL is essentially a collection of files, usually JSON, paired with some querying capability.

  1. What problem is it trying to solve?
  2. What advantages over traditional RDBMS?
  3. Where are its weaknesses?
  4. Can I make queries with complex WHERE clauses?
18 points

NoSQL is best used as key-value storage, where the value can be non-tabular or mixed data. As an example, imagine you have a session cookie value identifying a user. That user might have many different groups, roles, claims, etc. If you wanted to store that data in an RDBMS you would likely need a table for every 1-to-many data point (Session -> SessionRole, Session -> SessionGroup, etc.). In NoSQL this would be represented as a single key with a JSON object that could look quite different from other Session JSON objects. If you then need to delete that session, it’s a single key delete, whereas in the RDBMS you would have to make sure that delete chained to the downstream tables.

These key-value lookups are often very fast, and they are also used as a caching layer for complex data calculations.

The big downside is indexing and querying the data by anything other than the primary key. It would be hard to find all users in a specific group, as you would need to scan every key-value pair. It looks like NoSQL databases have some indexing capabilities now, but when I first used them they did not.
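A minimal sketch of the session example above, using a plain Python dict as a stand-in for the key-value store (the key format and session fields are made up for illustration):

```python
import json

# A plain dict standing in for a key-value store; in an RDBMS the
# roles and groups below would each need their own child table.
kv_store = {}

# One session document holds all of its 1-to-many data inline.
session = {
    "user": "alice",
    "roles": ["admin", "editor"],   # would be a SessionRole table
    "groups": ["staff", "beta"],    # would be a SessionGroup table
}
kv_store["session:abc123"] = json.dumps(session)

# Retrieval and deletion are single-key operations -- no joins,
# no cascading deletes across child tables.
loaded = json.loads(kv_store["session:abc123"])
del kv_store["session:abc123"]
```

The flip side shows up immediately: answering "which sessions have the admin role?" would require scanning every value in `kv_store`.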

9 points

Let me see if I got it. It would be like a denormalized table with a flexible number of columns? So instead of multiple rows for a single primary key, you have one row (the file), whose structure is variable, so you don’t need to traverse other tables or rows to gather/change/delete the data.

The downsides are the usual downsides of a denormalized DB.

Am I close?

8 points

Pretty much. The advantage is not really the unstructuredness per se, but simply the speed at which you can get a single record and the throughput with which you can write. It’s essentially sacrificing some of the guarantees of ACID in return for parallelization/speed.

Like when you have a million devices that each send you their GPS position once a second. Possible with an RDBMS, but the larger your table gets, the harder it’ll be to get good insertion/retrieval speeds; you’d need to do a lot of tuning and would essentially end up at something like a NoSQL database anyway.

5 points

Yes. You can also have fields that weren’t defined when you created the “table”.

With something like Elasticsearch you also have tokenisation of text which obviously compresses it. If it’s logs (or similar) then you also only have a limited number of unique tokens which is nice. And you can do very fast text search. And everything is set up for other things like tf-idf.

4 points

Rather than trying to relate it to an RDBMS, think of it as a distributed hash map/associative array.

4 points

What I’m hearing is that they’re very different beasts for very different applications. A typical web app would likely need both.

13 points

A place where this type of DB really shines is messaging. For example, Discord uses NoSQL. Each message someone sends is a record, but each message can have reactions made on it by other users. In a SQL database there would be two tables: one for messages and one for reactions, with a foreign key to the message. But at Discord’s scale they can’t use a single SQL server, which means you can’t really have two tables and do a join to find reactions on a message. Obviously you could shard the databases. But in NoSQL you just look up the message, and the reactions are stored alongside it rather than in another table, making the problem simpler.
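A rough sketch of the two layouts described above (field names and shapes are hypothetical, not Discord’s actual schema):

```python
# Document layout: reactions live inside the message record, so a
# single key lookup returns everything (names are illustrative).
message_doc = {
    "id": "msg-1",
    "channel": "general",
    "text": "hello",
    "reactions": [
        {"emoji": ":thumbsup:", "user": "bob"},
        {"emoji": ":tada:", "user": "carol"},
    ],
}

# Relational layout: the same data split across two tables, joined
# on the foreign key (the message id) at query time.
messages = [("msg-1", "general", "hello")]
reactions = [("msg-1", ":thumbsup:", "bob"), ("msg-1", ":tada:", "carol")]

# The "join": collect all reactions belonging to one message.
joined = [r for r in reactions if r[0] == "msg-1"]
```

With sharding, the relational join gets awkward because the two tables may not live on the same server; the document layout sidesteps that entirely.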

5 points

Right, and you’d never do a search for messages with a particular reaction, so there’s no functionality loss in this use case.

4 points

It’s not really messaging that’s the differentiator here - it’s scale (specifically write scale). If you can’t have a single master database then sure you might need NoSQL. But you almost certainly aren’t anywhere near that scale. Giant sites like Stackoverflow and Shopify aren’t.

3 points

Part of the issue with comparing SQL vs NoSQL today is that SQL has continued to evolve and has actually taken steps to incorporate NoSQL-like paradigms.

A good example is JSON support. Initially, if you wanted to store or manage JSON objects, it was either stored as text in SQL or required a NoSQL database. Now the SQL standard has support for JSON.

Similarly, “Big Data” is a space for NoSQL; things like columnar databases were designed for more efficient storage/processing (although columnar indexes can now exist in SQL databases too, I believe).

Some spaces where NoSQL is still really important are things like graph databases and key-value stores (as others have mentioned). Graph databases require a different query language and backend.
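As a hedged illustration of that JSON support, here is what querying inside a JSON column can look like with SQLite’s JSON1 functions via Python’s `sqlite3` module (the table and document are invented for the example):

```python
import sqlite3

# Modern SQL engines can store and query JSON directly; this sketch
# uses SQLite's JSON1 functions, bundled with Python's sqlite3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO sessions VALUES (?, ?)",
    ("abc123", '{"user": "alice", "roles": ["admin", "editor"]}'),
)

# Query inside the JSON document with a normal WHERE clause.
row = conn.execute(
    "SELECT json_extract(doc, '$.user') FROM sessions "
    "WHERE json_extract(doc, '$.roles[0]') = 'admin'"
).fetchone()
```

So the "store a flexible document" use case no longer automatically forces you out of SQL.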

3 points

I spent 30 years working with derivatives of the Pick Operating System and its integrated DBMS. Notably Universe and Ultimate. Back in the day, it was very, very difficult to even explain how they worked to others because the idea of key/value wasn’t commonly understood, at least as it is today.

I was surprised at how similar MongoDB is to Pick in many, many respects. Basically, key/value with variant record structures. MongoDB uses something very close to JSON, while Pick uses variable-length delimited records. In either case, access to a particular record is near instantaneous given the record key, regardless of how large the file is. Back in the 1980s and earlier, this was a huge advantage over most of the RDBMS systems available, as storage was much slower than today. We could implement a system that would otherwise take a huge IBM mainframe on hardware that cost 1/10 the price.

From a programming perspective, everything revolves around acquiring and managing keys. Even index files, if you had them (and in the early days we didn’t, so we maintained our own cross-reference files), were just files keyed on some value from inside the records of the main data file. Each record in an index file was just a list of record keys into the main data file.

Yes, you can (and we did) nest data that would be multiple tables in an SQL database into a single record. This was something called “Associated Multivalues”. Alternatively, you could store a list of keys to a second file in a single field in the first file. We did both.
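The cross-reference idea above can be sketched in Python, with dicts standing in for the Pick files (record shapes and field names are illustrative):

```python
# The main data file: records addressed by key.
data_file = {
    "1001": {"name": "Smith", "city": "Denver"},
    "1002": {"name": "Jones", "city": "Denver"},
    "1003": {"name": "Brown", "city": "Austin"},
}

# Build a hand-maintained cross-reference "file" keyed on city;
# each entry is just a list of record keys into the main file.
city_index = {}
for key, record in data_file.items():
    city_index.setdefault(record["city"], []).append(key)

# A lookup reads the index, then fetches each record by key --
# no traversal of the whole main file.
denver = [data_file[k] for k in city_index["Denver"]]
```

The maintenance burden is the catch: every write to the main file has to update the cross-reference too, which is exactly what a modern database index automates.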

One thing that became very time/disk/cpu expensive was traversing an entire file. 99% of the time we were able to architect our systems so that this never happened in day to day processing.

A lot of stuff we did would horrify programmers used to SQL, but it was just a very different paradigm. Back in a time when storage and computing power were limited and expensive, the systems we built stored otherwise unthinkable amounts of data and accessed it with lightning speed on cheap hardware.

To this day, the SQL concepts of joins and normalization just seem like a huge waste of space and power to me.

1 point

This was super cool, thanks for sharing

1 point

This isn’t a sophisticated opinion or anything, but personally I find RDBMS to be a bad fit for how data is typically structured in your program. You will usually have an object, often with sub-objects, all built up like a tree. If you want to load that into an SQL DB, you need to split it up, equip lots of its parts with IDs, and then hope that you can reconstruct it when you take it back out.

On the other hand, JSON was directly designed for serializing programming objects. The chance of you being able to persist and load your object with hardly any structural changes is high.

Of course, this does have other downsides, like the data not being as flexible to access. Similarly, data in an RDBMS is very structured, whereas in many NoSQL databases you can have individual entries with different fields than the rest.
So, that’s perhaps a more general takeaway: SQL makes it hard to put something into the database, but easy to get it out. NoSQL often reverses this.
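A tiny sketch of that round-trip, assuming a made-up nested order object:

```python
import json

# A nested object tree like the one described: persisting it as JSON
# is a single call, with no IDs to assign and nothing to reassemble.
order = {
    "customer": {"name": "alice", "address": {"city": "Berlin"}},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}

# Serialize and load back: the structure round-trips unchanged.
restored = json.loads(json.dumps(order))
```

Relationally, the same object would typically become three tables (customers, addresses, order items) plus the keys to stitch them back together.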

3 points

I think this is a really good point, but it’s also kind of a missed opportunity for NoSQL. Object-relational mapping is easily the most annoying thing about using a relational database, and I think it’s what most people initially looking at NoSQL wanted to solve.

But what we ended up with is Mongo which solves that problem but also throws away pretty much every useful feature that relational databases have! No schemas, no type checking, no foreign keys, etc. etc. It’s just a big soup of JSON which is awful to work with.

I wonder if anyone has made a NoSQL database that avoids the object/table impedance mismatch but also manages to keep schemas, foreign keys, etc.

1 point

Right, using an RDBMS for object persistence is a pain. It’s meant for efficient data storage and retrieval. But I’d counter that a huge number of data problems are of that kind, and using object persistence for general database applications seems very contrived. I’m imagining loading a huge amount of data into memory just to filter out the things you need, essentially rolling your own DBMS. Am I missing something?

2 points

Well, for use-cases where an SQL database works well, I would recommend using an SQL database. NoSQL generally tries to provide a better alternative for the use-cases where SQL is suboptimal.

For example, I’m currently building a build system with caching. I need the cache to be persistent on disk between builds, but I just load the cache into memory on startup and if I have a breaking change in the format, I can just wipe the whole cache. So, all the strengths of SQL are irrelevant and the pain points are still there. I mean, truth be told, I’m not using an actual NoSQL DB, but rather just writing a JSON file to disk, but it’s still similar.
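The cache pattern described above might look roughly like this (the file name and cache layout are my own invention):

```python
import json
import os
import tempfile

# Sketch of the pattern: a cache persisted as one JSON file, loaded
# whole at startup, and simply wiped on a breaking format change.
cache_path = os.path.join(tempfile.gettempdir(), "build_cache.json")

cache = {"format_version": 2, "entries": {"main.c": "hash-abc"}}
with open(cache_path, "w") as f:
    json.dump(cache, f)

# On the next startup, load the whole cache back into memory...
with open(cache_path) as f:
    loaded = json.load(f)

# ...and throw it all away if the on-disk format has changed.
if loaded.get("format_version") != 2:
    os.remove(cache_path)
```

No schema migrations, no queries: the only operations are "load everything", "save everything", and "wipe".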

Another example: at $DAYJOB, our last project involved making lots of recordings and training a machine learning model on them. The recordings had to be created early on, long before our software was stable, and the data scientists who would work with that data would write all kinds of transformation scripts anyway. In that case, again, I don’t think an SQL database would have been the best choice, because we needed the flexibility to just push data into a heap and clean it up later. If an older format really became unusable, we could just have left that data behind, rather than constantly updating all the data to the newest database schema.

2 points

Gotcha. Thanks!


Learn Programming

!learn_programming@programming.dev
