Hmm... I'll be honest, I don't think the explanation in this blog post works for me. However, I understand that everyone thinks differently, and it's VERY important to proliferate "different viewpoints" when it comes to explanations, especially for concepts like SSA, which are extremely important to modern compiler theory.
So with that being said, here's how I'd change the post.
---------
I think what's missing here is the concept of "basic blocks": the way a compiler breaks code down into smaller pieces. SSA is an improvement on top of basic blocks, so it's just... odd... that the blog post tries to talk about the benefits of SSA without talking about basic blocks first.
A basic block is a straight-line run of code that executes from beginning to end, with no loops, if-statements, or other control flow inside it. There are compiler-specific details (does a function call end a basic block? Or do calls to "leaf functions" stay within the block?), but the general gist is to encode if/else and loop statements as the edges of a graph whose nodes are the blocks.
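To make "breaking code into basic blocks" concrete, here is a minimal sketch of the usual "find the leaders" approach: the first instruction starts a block, and so does every jump target and every instruction that follows a jump. The toy instruction set (OpKind, Instr) and both function names are made up for illustration; no real compiler represents code exactly like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction set, invented for this sketch. */
typedef enum { OP_PLAIN, OP_BRANCH, OP_JUMP, OP_RETURN } OpKind;

typedef struct {
    OpKind kind;
    size_t target; /* jump destination; only read for OP_BRANCH and OP_JUMP */
} Instr;

/* Marks the first instruction of every basic block in is_leader[]
 * and returns the number of blocks found. */
size_t find_leaders(const Instr *code, size_t n, bool *is_leader)
{
    for (size_t i = 0; i < n; i++)
        is_leader[i] = false;
    if (n == 0)
        return 0;

    is_leader[0] = true; /* the entry point always starts a block */
    for (size_t i = 0; i < n; i++) {
        if (code[i].kind == OP_PLAIN)
            continue;
        /* Whatever follows a control-flow instruction starts a new block. */
        if (i + 1 < n)
            is_leader[i + 1] = true;
        /* So does the destination of a branch or jump. */
        if (code[i].kind != OP_RETURN) {
            assert(code[i].target < n);
            is_leader[code[i].target] = true;
        }
    }

    size_t blocks = 0;
    for (size_t i = 0; i < n; i++)
        blocks += is_leader[i] ? 1 : 0;
    return blocks;
}

/* The body of helpful_open written as six toy instructions. The branch at
 * index 2 skips the "flags |= O_CREAT" step at index 3. */
size_t helpful_open_block_count(void)
{
    const Instr code[] = {
        { OP_PLAIN,  0 }, /* 0: flags = O_RDWR            */
        { OP_PLAIN,  0 }, /* 1: tmp = access(...)         */
        { OP_BRANCH, 4 }, /* 2: if tmp is nonzero, goto 4 */
        { OP_PLAIN,  0 }, /* 3: flags |= O_CREAT          */
        { OP_PLAIN,  0 }, /* 4: fd = open(...)            */
        { OP_RETURN, 0 }, /* 5: return fd                 */
    };
    bool is_leader[6];
    return find_leaders(code, 6, is_leader);
}
```

Calling helpful_open_block_count() reports three blocks, starting at instructions 0, 3 and 4. Function calls are treated as plain instructions here, which is one of the compiler-specific choices mentioned above.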
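Breaking up the "helpful_open" function (see the blogpost) into basic blocks, we have:
Block A: "int flags = O_RDWR; tmp = access(fname, F_OK); if (!tmp) goto B; else goto C;"
Block B: "flags |= O_CREAT;"
Block C: "fd = open(..., flags, ...); return fd;"
The graph in "graphviz" syntax is:
A -> B
A -> C
B -> C
Notice, there are two blocks that can lead to C. If there were a loop, a block could lead back into itself.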
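For what it's worth, the "two blocks lead to C" observation is easy to check mechanically. A rough sketch, assuming the three-block graph for helpful_open (A branches to B or C, and B falls through to C); the Edge type and the function names are hypothetical and only exist for this example:

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK_A, BLOCK_B, BLOCK_C, BLOCK_COUNT };

typedef struct {
    int from;
    int to;
} Edge;

/* The edge list of the graph: A -> B, A -> C, B -> C. */
static const Edge helpful_open_edges[] = {
    { BLOCK_A, BLOCK_B },
    { BLOCK_A, BLOCK_C },
    { BLOCK_B, BLOCK_C },
};

#define HELPFUL_OPEN_EDGE_COUNT \
    (sizeof helpful_open_edges / sizeof helpful_open_edges[0])

/* Counts how many edges in the list enter `block`. */
size_t count_predecessors(const Edge *edges, size_t n_edges, int block)
{
    assert(block >= 0 && block < BLOCK_COUNT);
    size_t count = 0;
    for (size_t i = 0; i < n_edges; i++) {
        if (edges[i].to == block)
            count++;
    }
    return count;
}

/* A block entered by more than one edge is a join point: the only kind of
 * place where two different definitions of a variable can meet. */
int is_join_point(const Edge *edges, size_t n_edges, int block)
{
    return count_predecessors(edges, n_edges, block) > 1;
}
```

With these edges, is_join_point() is true only for BLOCK_C. A loop would show up as an edge whose `to` block comes at or before its `from` block.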
-------------
Now that we've explicitly labeled the control-flow graph, it is obvious what the phi function does: it makes one variable (the "flags" variable in C) take its value from variables defined in OTHER blocks (either A or B), depending on which block control arrived from.
Furthermore: phi functions are ALWAYS at the start of basic blocks (in this case only C needs one, since it is the only block with two predecessors). It seems absolutely imperative to visualize the basic blocks explicitly if you wish to truly understand SSA.
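Plain C has no phi, but you can emulate one by remembering which block you arrived from. The sketch below is only an emulation under some assumptions: the names flags_0/flags_1/flags_2, the Pred enum and the phi() helper are made up, `tmp` stands in for the return value of access(fname, F_OK), and the function returns the flags instead of calling open() so it has no side effects.

```c
#include <assert.h>
#include <fcntl.h> /* O_RDWR, O_CREAT */

typedef enum { FROM_A, FROM_B } Pred;

/* Emulates flags_2 = phi(flags_0 coming from A, flags_1 coming from B). */
static int phi(Pred came_from, int value_from_a, int value_from_b)
{
    assert(came_from == FROM_A || came_from == FROM_B);
    return came_from == FROM_A ? value_from_a : value_from_b;
}

/* `tmp` plays the role of the result of access(fname, F_OK). */
int helpful_open_flags(int tmp)
{
    int flags_0;
    int flags_1 = 0; /* placeholder so C never reads an indeterminate value;
                        real SSA simply has no flags_1 on the A -> C path */
    int flags_2;
    Pred came_from;

    /* Block A */
    flags_0 = O_RDWR;
    came_from = FROM_A;
    if (tmp != 0)
        goto block_c; /* !tmp is false: skip block B */

    /* Block B */
    flags_1 = flags_0 | O_CREAT;
    came_from = FROM_B;

block_c:
    /* Block C: the phi is the very first thing in the block. */
    flags_2 = phi(came_from, flags_0, flags_1);
    return flags_2;
}
```

helpful_open_flags(0) takes the A -> B -> C path and returns O_RDWR | O_CREAT; any nonzero `tmp` takes A -> C and returns plain O_RDWR. A real compiler doesn't track `came_from` at run time, of course; the phi disappears when the code leaves SSA form.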
----------
But why do we make phi functions and SSA? Well, that's another question entirely. Long story short, once every variable is assigned exactly once, with phi functions merging the values where paths join, it's much easier to keep track of where each value came from. But that's a story for probably another day. Or really, that's a story that the blogpost covers pretty well, so I don't feel like critiquing that part of the post. :-)
You're right that, at least in LLVM, SSA form is tied inextricably to basic blocks (since phis encode their source blocks). However, the theory of operation for SSA itself isn't tied to any particular control flow representation: it only stipulates the two rules mentioned in the post.
My goal with this post was to introduce SSA at a high level and show why it makes a few optimizations simpler, without requiring readers to understand the other parts of an optimizing compiler (like why we reduce the CFG's representation to a graph of basic blocks). It's why I chose to use pseudo-C for some of the SSA reduction examples, rather than LLVM IR. But I certainly understand that this approach doesn't work for everybody :-)
I certainly appreciate the difficulty of choosing which subjects need to be explored: too many subjects and you become unfocused. Too few, and you run the risk of not providing a solid base to teach the subject matter.
It's a good blogpost overall, in any case.
--------
One more "criticism" (but not really).
> Rule #2: Whenever we need to choose a variable based on control flow, we use the Phi function (φ) to introduce a new variable based on our choice.
You might actually benefit from a more complicated example here, rather than the simpler example you chose.
A do-while loop:
    int x = 0;
    do {
        x = x + 1;
    } while (x < 100);
This has always been the hardest thing for me to understand, with respect to #2 and SSA in general.
    int x_0 = 0;
    do {
        x_1 = phi(x_0, x_2);
        x_2 = x_1 + 1;
    } while (x_2 < 100);
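The odd part is that the phi mentions x_2 before x_2 has ever been computed. It becomes less mysterious if you emulate the phi by recording which edge entered the loop body. A rough sketch in plain C; the LoopPred enum and the function name are invented for this example, and `limit` replaces the hard-coded 100:

```c
#include <assert.h>

typedef enum { LOOP_FROM_ENTRY, LOOP_FROM_BACKEDGE } LoopPred;

/* Runs the do-while loop in emulated SSA form and returns the final x_2. */
int count_up_to(int limit)
{
    const int x_0 = 0;
    int x_1;
    int x_2 = 0; /* placeholder: never selected on the first trip */
    LoopPred came_from = LOOP_FROM_ENTRY;

    do {
        /* x_1 = phi(x_0, x_2): take x_0 on the first trip, x_2 afterwards. */
        x_1 = (came_from == LOOP_FROM_ENTRY) ? x_0 : x_2;
        x_2 = x_1 + 1;
        came_from = LOOP_FROM_BACKEDGE;
    } while (x_2 < limit);

    return x_2;
}
```

Note that "single assignment" is a static property: x_1 and x_2 each have exactly one assignment in the program text, even though that assignment executes on every trip around the loop. As with any do-while, the body runs at least once, so count_up_to(100) returns 100 and count_up_to(0) returns 1.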
On the one hand: simple examples are helpful to the reader, because there's less to hold in your head at once. On the other hand, smacking the reader with the most complex case may suddenly spur inspiration and the ability to understand the whole problem.
I guess it's a style choice. But something to think about...