
As recently as last year, I worked for a company where the Chief Architect, in his infinite wisdom, had decided that a database was a silly legacy thing. The future looked like Kafka streams, with each service being a function against Kafka streams, and data retention set to infinite.
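
For the unfamiliar, the pattern looked roughly like this. A minimal Kafka Streams sketch; the topic names and the toy transform are my own illustration, not the company's actual code:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class OrderEnricher {
        public static void main(String[] args) {
            // The whole "service" is a function from one topic to another.
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> orders = builder.stream("orders");   // hypothetical input topic
            orders.mapValues(v -> v.toUpperCase())                       // stand-in for real business logic
                  .to("orders-enriched");                                // hypothetical output topic

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-enricher");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            new KafkaStreams(builder.build(), props).start();
        }
    }

Do that for every service, keep every topic forever, and you have the architecture in question, minus everything a database normally gives you.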

Predictably, this setup ran into an interesting assortment of issues. There were no real transactions, no ensured consistency, and no referential integrity. There was also no authentication or authorization, because a default-configured deployment of Kafka from Confluent happily neglects such trivial details.

To call this a vast mess would be putting it lightly. It was a nightmare to code against once you left the fantasy world of functional programming nirvana and encountered real requirements. It meant pushing a whole series of concerns that isolation addresses into application code... or not addressing them at all. Teams routinely relied on one another's internal Kafka streams. It was a GDPR nightmare.

Kafka Connect was deployed to bridge between Kafka and some real databases. This was its own mess.
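
(For reference, that bridging is normally done with a sink connector configured roughly like the JSON below and posted to the Connect REST API. The connector name, topic, and connection details here are placeholders, and Confluent's JDBC sink is just one common choice.)

    {
      "name": "orders-to-postgres",
      "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "orders-enriched",
        "connection.url": "jdbc:postgresql://db.internal:5432/orders",
        "connection.user": "connect",
        "connection.password": "********",
        "insert.mode": "upsert",
        "pk.mode": "record_key",
        "auto.create": "true"
      }
    }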

Kafka, I have learned, is a very powerful tool. And like all shiny new tools, deeply prone to misuse.



Instead of single architects, companies need architecture boards that can vote on these ideas, so that no single individual becomes a single point of failure.

Expecting one person to make 100% correct decisions all the time is asking too much of them. People go down rabbit holes and come away with weird takeaways, like replacing all the databases with queues.


I agree in abstract, but in practice it's quite difficult to set up a successful democratic architecture board. You need teams or departments that all have architects, and an engineering organization where both managers and engineers accept a degree of centralized technical leadership. Getting there is, in my opinion, the work of years. It's especially challenging because spinning up such a board requires a person who can run it single-handedly.

In this particular company, the Chief Architect in theory had a group around him. They did nothing to check his poor decisions, and from the outside seemed primarily interested in living in the functional programming nirvana he promised them.


Thinking in terms of physical buildings: architects are often the visionaries and engineers are the realists. Architects come up with these crazy, incredible building designs based on their engineering understanding, but ultimately the design has to be vetted, proven, and implemented by engineers.

I have, however, always wondered why people should be seen as "only architects" or "only engineers". While the separation of duties is critical to ensuring the overall construction is sound, a person can be a visionary engineer: knowledgeable about how to do something in real terms while also dreaming about how to go beyond it.


It sounds like this would have made a better proof of concept than a commitment to the architecture.

The idea on the face of it is not a bad one per se, quite interesting even, but the implementations are perhaps not there yet to back it.

It's important as an architect to know when your vision for the architecture is outpacing reality, to know your time horizon, and to match the vision with the tools that help you implement the actual use cases you have in hand right now.

It sounds like this person might have had an interesting idea but not a working system. In another light, it could have been a good idea if all the technology had been in place to support it... but the timing and implementation don't sound like they were right, perhaps.

The old saying "use the right tool for the job" comes to mind, but that can be hard to see when the tools are changing so fast, and there is a risk to going too far onto the bleeding edge. Perhaps the saying should have been, "use the rightest tool you can find at the time, that gives you some room to grow, for the job"...


It was definitely an interesting proof of concept that needed some refinement. The core idea was functional services against nicely organized data streams on a flat network. Which is a really cool approach that works quite well for a lot of things.

Several of these points fell apart when credit card handling and PCI-DSS entered the picture.


At our company we use a ton of services that operate essentially as functions on a Kafka stream (well, they tend to read/write in batches for efficiency), but any event streams we want to query later get written into a regular database. It works out very well. The idea of our poor Kafka cluster having to field queries in addition to the load of acting as a transport layer is frightening. The 'superpower' Kafka gives you is the ability to turn back time if something goes wrong and the ability to orchestrate really big pipelines. You have to build or buy a fair bit of tooling to make it work, though.
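
A minimal sketch of that pattern, in case it helps anyone picture it (topic, table, and connection details are invented; ours are more involved): consume a batch, write it to the database, then commit offsets.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class EventArchiver {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "event-archiver");          // hypothetical consumer group
            props.put("enable.auto.commit", "false");          // commit only after the DB write succeeds
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/events")) {
                consumer.subscribe(List.of("payments"));        // hypothetical topic
                PreparedStatement insert =
                    db.prepareStatement("INSERT INTO payment_events (key, payload) VALUES (?, ?)");

                while (true) {
                    // Read a batch, write the whole batch, then commit offsets.
                    ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : batch) {
                        insert.setString(1, record.key());
                        insert.setString(2, record.value());
                        insert.addBatch();
                    }
                    insert.executeBatch();
                    consumer.commitSync();                      // queries go to the database, not to Kafka
                }
            }
        }
    }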


This particular org did its level best to think of Kafka as the regular database for queries.


> There were no real transactions, no ensured consistency

Which is the right way to do it, because transactions don't extend into the real world. If you need to wait for the consequences of a given event, wait for the consequences of that event. Otherwise, all you really care about is all events happening in a consistent order. It's a much more practical consistency model.

> and no referential integrity

The problem with enforcing referential integrity is how you handle violations of it. Usually you don't really want to outright reject something because it refers to something else that doesn't exist yet, so you end up solving the same problem either way.

> There was also no authentication or authorization, because a default-configured deployment of Kafka from Confluent happily neglects such trivial details.

Pretty common in the database world - both MySQL and PostgreSQL use plaintext protocols by default. Properly configured, Kafka uses TLS and/or SASL, has a good ACL system, and is as secure as anything else.
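
For anyone curious, "properly configured" amounts to a handful of client/broker settings plus ACLs, something along these lines (hostnames, principals, and passwords are placeholders):

    # client.properties
    security.protocol=SASL_SSL
    sasl.mechanism=SCRAM-SHA-512
    ssl.truststore.location=/etc/kafka/client.truststore.jks
    ssl.truststore.password=changeit
    sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
      username="orders-service" password="...";

    # grant the service read access to one topic and its consumer group
    kafka-acls --bootstrap-server broker.internal:9093 \
      --command-config client.properties \
      --add --allow-principal User:orders-service \
      --operation Read --topic orders --group order-enricher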

> It was a nightmare to code against once you left the fantasy world of functional programming nirvana and encountered real requirements. It meant pushing a whole series of concerns that isolation addresses into application code... or not addressing them at all.

My experience is just the opposite - ACID isolation sounds great until you actually use it in the real world, and then you find it doesn't address your problems and doesn't give you enough control to fix it yourself. It's like when you use one of those magical do-everything frameworks - it works great until you need to customise something slightly, then it's a nightmare. Kafka pushes more of the work onto you upfront - you have to understand your dataflow and design it explicitly - but that pays off immensely.

> It was a GDPR nightmare.

Really? I've found the exact opposite - teams that used an RDBMS had to throw away their customer data under GDPR, because even though they had an entry in their database saying that the customer had agreed, they couldn't tell you what the customer had agreed to or when. Whereas teams using Kafka in the way you describe had an event record for the original agreement, and could tell you where any given piece of data came from.


> Really? I've found the exact opposite - teams that used an RDBMS had to throw away their customer data under GDPR, because even though they had an entry in their database saying that the customer had agreed, they couldn't tell you what the customer had agreed to or when. Whereas teams using Kafka in the way you describe had an event record for the original agreement, and could tell you where any given piece of data came from.

This is absolutely wonderful! Unfortunately, this team decided to store data subject to GDPR deletion requests in Kafka, where deletion is quite difficult. It was a problem when trying to do deletion programmatically across many teams using the same set of topics.

The real nightmare came when this team, obsessed with the power of infinite retention periods, encountered PCI-DSS. You see, the business wanted to move away from Stripe and similar to dealing with a processor directly, in order to save on transaction fees. So obviously they could just put credit card data into Kafka...


Yeah, fair enough. I'd argue that this is kind of a double standard (a traditional RDBMS may well have copies of "deleted" data on dirty pages, and may well leave that data on the physical disk indefinitely, for much the same reasons as Kafka does - it just makes it a bit fiddlier for you to access it), but your legal team may decide that it's required.

I don't think your overall scorn is warranted - there are bigger problems that are endemic to RDBMS deployments, and the advantages of a stream-first architecture are very real - but there are genuine difficulties around handling data obliteration and it's something you need to design for carefully if you're using an immutable-first architecture and have that requirement.
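
The designs I've seen work are either crypto-shredding (encrypt each user's data with a per-user key and delete the key on request) or keeping personal data on log-compacted topics keyed by user, so that publishing a null-valued tombstone eventually removes it. A minimal tombstone sketch, with a made-up topic and key:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class GdprTombstone {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // On a topic created with cleanup.policy=compact, a null value marks the
                // key for removal; the data only disappears once compaction actually runs.
                producer.send(new ProducerRecord<>("customer-profile", "user-42", null));
                producer.flush();
            }
        }
    }

Neither approach is free, and both have to be designed in up front.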



