If it helps, you don't really have to type SQL key words in all capitals ;) I stopped years ago.
The ironic thing is that the way developers like to deal with data today is more like how they did in the early days of COBOL, the approach SQL was an improvement over.
The first computerized databases were navigational, which just means hierarchical objects. You "navigated" through the data to find the parts that were interesting for a given query, just like with JSON you might loop around through the properties. In 1973 Charles Bachman gave his Turing Award lecture, titled "The Programmer as Navigator."
These data structures were insidious, because: (1) you wind up with duplicated data, vulnerable to getting out of sync with itself, and (2) complex queries can get slow. For example, imagine an array of Customer objects. Each has an Orders field, which is an array of Order objects. Each of those has, among other things, fields for the item name and description, and so on. With this structure, it's easy to fetch all the orders of a certain customer, but it's slow and complex for other queries, like the total number of orders for each item. For that, you might duplicate the data into a different structure. It was just like NoSQL, only there was no SQL at the time. It was PreSQL.
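A minimal sketch of that shape, with hypothetical data, shows why one query is trivial and the other forces a full traversal:

```python
# Hypothetical nested "navigational" data: orders live inside customers.
customers = [
    {"name": "Alice", "orders": [
        {"item": "widget", "description": "A widget"},
        {"item": "gadget", "description": "A gadget"},
    ]},
    {"name": "Bob", "orders": [
        {"item": "widget", "description": "A widget"},  # description duplicated
    ]},
]

# Easy: all orders for one customer -- just follow the hierarchy.
alice_orders = next(c for c in customers if c["name"] == "Alice")["orders"]

# Awkward: total orders per item -- every customer must be traversed,
# and the duplicated descriptions can silently drift out of sync.
totals = {}
for c in customers:
    for o in c["orders"]:
        totals[o["item"]] = totals.get(o["item"], 0) + 1
# totals == {"widget": 2, "gadget": 1}
```

The second query is exactly the kind that tempts you to maintain a second, item-keyed copy of the same data.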
Programmers are immediately attracted to such data structures because they are amenable to the first few pages you have in mind to build. It's really easy to run those nested objects through a template and output HTML, and it's straightforward to take data from a form and save it as one of these objects. As your application grows and spirals, though, your original data structures become more and more cumbersome and less suited to the new pages you have to make.
This was a problem in the 1960s and 70s just as much as today, which is why E. F. Codd wrote his papers, most famously "A Relational Model of Data for Large Shared Data Banks." You might say, relational? Those old navigational objects sounded like they had lots of relationships. But it is a popular misconception that Relational here meant the relationships among tables (i.e., foreign keys). Dr. Codd was a mathematician, and he meant the mathematical term relation, which is essentially a grid of values, a table. So they were called relational databases not because you could relate one table to another but because they were databases made up of relations (tables).
Tabular data solves the speed problem in navigational data. But now fetching your data is even more tedious, if you have to navigate those tables by hand (Loop through Table A. If the value in cell 12 > 10, then save it to a temporary variable...). But in that very same paper, Codd also proposed a very high-level language for working with the tables. It wasn't called SQL. IBM came up with SQL specifically, after studying Codd's papers (Codd was in fact a researcher at IBM). Believe me, SQL was an improvement. Codd's original language, called Alpha, was mathematical hieroglyphics. The foundation was solid but the user friendliness was lacking. SQL was an attempt to have the same nature but resemble English instead of mathematics.
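The contrast is easy to see side by side. A toy sketch, using Python's built-in sqlite3 as a stand-in for any relational database (table and column names are made up):

```python
import sqlite3

# A toy table in an in-memory database.
db = sqlite3.connect(":memory:")
db.execute("create table a (id integer, value integer)")
db.executemany("insert into a values (?, ?)", [(1, 5), (2, 12), (3, 30)])

# Navigating the table "by hand": loop, test, collect.
big = [row for row in db.execute("select id, value from a") if row[1] > 10]

# The high-level language states the same thing declaratively.
also_big = db.execute("select id, value from a where value > 10").fetchall()

assert big == also_big  # both are [(2, 12), (3, 30)]
```

The loop tells the machine how to walk the table; the query only says what rows you want.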
But the two pillars, tabular data structures and a high-level query language, were introduced simultaneously and are both equally part of what makes SQL what it is. Which one would you like to remove?
Chesterton's Fence comes to mind when watching programmers meet SQL:
"In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, 'I don't see the use of this; let us clear it away.' To which the more intelligent type of reformer will do well to answer: 'If you don't see the use of it, I certainly won't let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.'" --- https://en.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fence
(The rest of your comment is very compelling; I just have one nitpick.)
> If it helps, you don't really have to type SQL key words in all capitals ;) I stopped years ago.
I continue to urge my team to capitalize SQL keywords. This is because the vast majority of SQL queries in our codebases are embedded in another language, as a string. Syntax highlighting is not available (I know there are some tools for this in some environments for some host languages, but it's not widely available or even remotely a solved problem). Static analysis tools for this scenario are generally hard to come by. Every syntactical hint is a godsend for reading and comprehending these queries. I also encourage quoting every identifier even if it isn't strictly required, and extensively using whitespace to make a query's structure more apparent.
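A hypothetical embedded query in that style (the table and columns are made up), showing how capitalized keywords, quoted identifiers, and whitespace carry the structure when no highlighter will:

```python
# Hypothetical embedded query: keywords capitalized, every identifier
# quoted, whitespace aligned so the clause structure is visible at a glance.
query = """
    SELECT "o"."item", COUNT(*) AS "order_count"
    FROM   "orders" AS "o"
    WHERE  "o"."created_at" >= ?
    GROUP  BY "o"."item"
    ORDER  BY "order_count" DESC
"""
```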
If I were writing and reading SQL under better circumstances, it's quite likely I would have different preferences.
Fair enough. The vast majority of SQL queries in my codebase is likewise embedded in another language as strings. Yet I instantly became happier when I started lowercasing everything. I also quote as few identifiers as possible. It's a matter of a clean look, which for me leads to clear thought. I will agree with you, though, that indentation is critical.
But I have taken it one step further and reaped a boon with a more advanced technique: I extract as much as I can of those multi-line SQL statements into database views and, if absolutely necessary, functions. I define these in .sql files, so first of all: syntax highlighting! The strings embedded in my code therefore become mostly one-liners:
select * from view where a = ?
etc.
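A sketch of the technique, again using sqlite3 as a stand-in (the table, view, and column names are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# In practice the schema and view definitions live in version-controlled
# .sql files, where an editor gives you real syntax highlighting.
db.executescript("""
    create table t (a integer, b text);
    insert into t values (1, 'x'), (2, 'y');

    -- The multi-line query logic moves into the view definition...
    create view v as
    select a, b
    from t
    where b is not null;
""")

# ...so the string embedded in application code shrinks to a one-liner.
rows = db.execute("select * from v where a = ?", (1,)).fetchall()
# rows == [(1, 'x')]
```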
I version-control the SQL files right along with the normal codebase. Migrations aren't complicated for me, the way that some people complain about them. Then again, I am super-comfortable writing raw SQL. Especially with Postgres, which lets you wrap BEGIN and ROLLBACK around definitions and redefinitions of tables, views, functions --- all DDL --- it's very safe to test and idempotent to run over and over.
This is something I've seriously considered, I just worry that one more layer of indirection creates one more opportunity for problems. (Also being maybe overly publicly blunt, I work with a number of juniors who don't have a great track record for following or even taking interest in why things are structured the way they are, and who also don't have great comfort with SQL.)
RE version control: yes, everything that interfaces with the database is in version control.
RE migrations: I also don't find them confounding, and I've mandated that my team write them all in raw SQL rather than reusing abstractions that were available in our environment, because what I discovered was that migrations were being altered unexpectedly over time where imported abstractions were changing. Everyone has benefited from this.
RE transactions: my only complaint is that (AFAIK, and would happily be corrected if I'm wrong), `BEGIN`, any operation, and `END` are separate statements. When I'm experimenting before testing a full migration, I'll often break down smaller problems and test them within a transaction in my preferred GUI (Postico). Which by default runs the statement where the cursor is placed, and only runs multiple statements if you (remember to) select them. I would love to be able to wrap an entire statement (or set of statements if need be) in a single `BEGIN ... RETURNING * ... ROLLBACK` statement.
The file, changes.sql in this case, might look like this:
begin;
alter table t1 add column c ...;
alter table t2 drop column d ...;
alter table t3 alter column e ...;
drop view if exists v;
create view v as
select ...
;
rollback;
I save the file in version control with the rollback statement, for safety. When I am ready to run it, I temporarily change "rollback" to "commit".
If I wanted to test out just some of the statements, I would comment out the others.
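The rollback-for-safety pattern can be demonstrated end to end. SQLite, like Postgres, supports transactional DDL, so a Python sketch with sqlite3 (hypothetical table and column names) shows the dry-run behavior:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("create table t1 (a integer); insert into t1 values (1);")

# The migration script ends in rollback while testing, so the DDL
# is a dry run: SQLite, like Postgres, can roll back ALTER TABLE.
db.executescript("""
    begin;
    alter table t1 add column c text;
    rollback;
""")
cols = [row[1] for row in db.execute("pragma table_info(t1)")]
assert cols == ["a"]  # the added column was rolled back

# Changing "rollback" to "commit" applies the migration for real.
db.executescript("""
    begin;
    alter table t1 add column c text;
    commit;
""")
cols = [row[1] for row in db.execute("pragma table_info(t1)")]
assert cols == ["a", "c"]
```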
EDIT:
You can inspect the changes in flight by inserting a select-statement:
begin;
alter table t1 alter column c ...;
select *
from t1
where ...
;
rollback;
(Interesting: Hacker News's user interface puts no reply link below your comment, so I could not reply directly. I guess it allows only so many nested replies. This is probably an optimization for readability or to avoid flamewars.)
That's great, but the statefulness of trying something is still a problem. It would still be nice to be able to wrap a whole change or set of changes with a guarantee that I'll be able to view the outcome without a state change in the database. This nicety is compounded by the fact that I'm often working with datasets that take a few hours to get set up before I can even evaluate. There are a ton of other guard rails I can set up to make that turnaround less shitty, but even cloning a backup database in the worst case is painful. There's no reason SQL's expressiveness couldn't encapsulate a whole transaction in a single expression. CTEs already do most of that; they just don't have any transactional capability.
Edit: I also don't just use psql, because the normal editing controls I'm used to in my operating system are muscle memory I rely on, and reaching for the wrong tool with the wrong muscle memory is error prone. I know how to move a caret around every single Mac app, but it works differently the moment I enter anything other than emacs in my terminal.
> You can inspect the changes in flight by inserting a select-statement
All true, but you still can't express a transaction as a single expression.
> Interesting, Hacker News's user interface puts no reply link below your comment. So I could not reply directly. I guess it allows only so many nested replies. This is probably an optimization for readability or to avoid flamewars
They just hide the reply link at a certain depth of threading. You can navigate to the parent comment (by parent link or datestamp depending) and reply.