It's nice, but missing the most valuable (and simplest) take from computer vision: the Hough transforms.
Let's take the circle Hough transform as it's one of the most enlightening ones!
Say you are looking for a circle of a given diameter. After a binarization to make the edges stand out, make every edge point "vote" for the circle centers it could belong to.
The method is simple: using an accumulator matrix, for each edge point you +1 every cell that lies exactly one radius away from that point.
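A minimal sketch of that voting step, in plain Python with no libraries. The names `hough_circle_votes` and `argmax2d`, the `(y, x)` point format, and the 360 sampled angles are assumptions made for illustration, not the API of any particular library.

```python
import math

def hough_circle_votes(edge_points, shape, radius, n_angles=360):
    """Accumulate votes for circle centers at one fixed, known radius.

    edge_points: iterable of (y, x) pixel coordinates from the binarized image.
    shape: (height, width) of the accumulator matrix.
    """
    height, width = shape
    acc = [[0] * width for _ in range(height)]
    for y, x in edge_points:
        # Every possible center lies one radius away from this edge point.
        cells = set()
        for k in range(n_angles):
            theta = 2 * math.pi * k / n_angles
            cy = round(y + radius * math.sin(theta))
            cx = round(x + radius * math.cos(theta))
            if 0 <= cy < height and 0 <= cx < width:
                cells.add((cy, cx))
        # Assumption: an edge point votes at most once per cell.
        for cy, cx in cells:
            acc[cy][cx] += 1
    return acc

def argmax2d(acc):
    """Return (y, x, votes) of the most-voted accumulator cell."""
    votes, y, x = max((v, y, x) for y, row in enumerate(acc)
                      for x, v in enumerate(row))
    return y, x, votes
```

Calling `argmax2d(hough_circle_votes(points, (height, width), radius))` gives the most-voted center. Letting each point vote once per cell keeps the count bounded by the number of edge points, which makes the peak easier to interpret.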
Extension 1: if you don't know the radius, apply iteratively for a range of values, then again, take the max: if you imagine how it works (or code it as an example then animate the result), it's like doing a "mathematical" focus.
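Extension 1 might be sketched like this: repeat the vote for each candidate radius and keep the best peak overall. The function names and the `(votes, radius, y, x)` return format are hypothetical, and comparing raw vote counts across radii is a simplifying assumption.

```python
import math

def vote(edge_points, shape, radius, n_angles=360):
    """Circle Hough accumulator for a single radius (one vote per point per cell)."""
    height, width = shape
    acc = [[0] * width for _ in range(height)]
    for y, x in edge_points:
        cells = set()
        for k in range(n_angles):
            t = 2 * math.pi * k / n_angles
            cy = round(y + radius * math.sin(t))
            cx = round(x + radius * math.cos(t))
            if 0 <= cy < height and 0 <= cx < width:
                cells.add((cy, cx))
        for cy, cx in cells:
            acc[cy][cx] += 1
    return acc

def best_circle(edge_points, shape, radii):
    """Try every radius in `radii`; return (votes, radius, y, x) of the best peak."""
    best = None
    for r in radii:
        acc = vote(edge_points, shape, r)
        votes, y, x = max((v, y, x) for y, row in enumerate(acc)
                          for x, v in enumerate(row))
        # Strict comparison: on a tie, the first (smallest) radius wins.
        if best is None or votes > best[0]:
            best = (votes, r, y, x)
    return best
```

Plotting the peak vote count against the radius shows the "focus" effect: the accumulator is blurry at the wrong radii and collapses to a sharp point at the right one. On real images some normalization would probably be needed, since larger circles have more edge pixels to vote with.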
Extension 2: if it's too costly to do a dense exploration of the space of values for the radius, while you know there's only one circle, do a gradient descent on the increase.
Extension 3: If there is more than one circle, other techniques exist - the easiest to picture are based on the maximization of variance of the distribution of values in the matrix resulting from the binarization, but you can also use 2d lattices and other fun tricks.
I agree it's very cool, but I have found it to be surprisingly poor in certain scenarios.
A "faint" circle will often score worse than 2 high-contrast parallel lines that happen to be the right distance apart, since the lines manage to trigger pixels along 20% of a circle's arc and their contrast massively inflates their score compared to the faint circle (higher edge pixel density).
It seems like there should be a simple way to weight the results by how dispersed the pixels are along the circle's arc, but I never dug any further; after hitting this problem I had to move on.
My initial reaction is that adding an angle parameter would help. A circle should have votes from many angles while lines will vote from only a portion of the circle. With some added weight from angled convergence the faint circle could score higher.
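One way the angle idea could look, as a rough sketch rather than a tested remedy: split the candidate circle into angular bins and measure what fraction of them contain at least one edge pixel. The names `angular_coverage` and `weighted_score`, the 12 bins, and the 1-pixel tolerance are all assumptions picked for illustration.

```python
import math

def angular_coverage(edge_points, center, radius, n_bins=12, tol=1.0):
    """Fraction of angular bins around `center` that contain at least one
    edge point lying within `tol` pixels of the candidate circle."""
    cy, cx = center
    bin_width = 360.0 / n_bins
    hit_bins = set()
    for y, x in edge_points:
        dy, dx = y - cy, x - cx
        if abs(math.hypot(dy, dx) - radius) <= tol:
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hit_bins.add(int(angle // bin_width) % n_bins)
    return len(hit_bins) / n_bins

def weighted_score(votes, edge_points, center, radius):
    """Scale a raw vote count by how much of the circle's arc is supported."""
    return votes * angular_coverage(edge_points, center, radius)
```

With these settings, two vertical lines one diameter apart only reach a third of the bins around the point midway between them, while even a faint full circle reaches all of them. The multiplication in `weighted_score` is only one possible weighting.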
Hough is sweet and simple, but it's more or less the brute force method of CV: It comes with awful runtime and memory complexity even in straightforward cases. If it works it works, but more often it's neither efficient nor reliable.
I will take the opportunity to call out one of my favourite libraries, BoofCV (http://boofcv.org)
It comes with a wonderful demonstration tool that allows you to apply the various included algorithms to images and tweak the parameters in real-time – including the Hough transform. A great tool for helping to understand how these kinds of algorithms work!
> After a binarization to make the edge stand out ...
Edge binarization is dependent upon edge detection algorithm choice, threshold algorithm choice, and both of their respective parameters. It's often very difficult to find a set of parameters that aren't brittle due to occlusions, poor contrast, camera noise, etc.
Hough works great if you can do this part confidently. But in my experience, robust edge binarization for Hough is often not very feasible in the wild.
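To make the brittleness concrete, here is a toy edge binarization: a central-difference gradient followed by a fixed threshold. It is a deliberately naive stand-in (not Canny, not Sobel), and `edge_points` and `step_image` are hypothetical helper names.

```python
import math

def edge_points(image, threshold):
    """Return the set of (y, x) interior pixels whose gradient magnitude
    exceeds `threshold`. `image` is a list of rows of grey values."""
    height, width = len(image), len(image[0])
    edges = set()
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            if math.hypot(gx, gy) > threshold:
                edges.add((y, x))
    return edges

def step_image(contrast, height=6, width=8):
    """Synthetic test image: dark left half, right half at `contrast`."""
    left = width // 2
    return [[0] * left + [contrast] * (width - left) for _ in range(height)]
```

A threshold of 30 finds the edge in `step_image(100)` but returns nothing for `step_image(10)`. Dropping the threshold to 4 recovers the faint edge, but on a real image it would also let camera noise through. That is the trade-off described above.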
> It's even simpler to make artificial neurons vote for a circle center.
Is it?
It's not conceptually simpler: people can more easily imagine circles around points converging to a center, so they can also put that idea into code more easily.
> you can apply the method to other shapes as well.
Yes you can. Read about Hough.
I just presented the one that is the most enlightening.
I may be biased against neural network approaches and the like, because I see them as black boxes with failure modes that are hard to predict or work around: I prefer what I can understand and explain, and unfortunately that seems at odds with the current demographics of ML (cf https://news.ycombinator.com/item?id=27361812 ), who have no clue about what makes these black boxes tick, sometimes even after they get a PhD in the dark art of tweaking black boxes.
One of the nicer things about the Hough approach is also that you can get a bunch of other information from parameter space, like horizon lines and vanishing points.
A really ancient secret: one of the greybeards I learned a lot from early in my career told me how he got CV running on an Apple II way back in the day on the cheap. He decapped a DRAM and carefully stuck a lens on it. They're not just susceptible to cosmic rays; without the package, regular old visible light rays can cause bit flips too. If you look at CMOS sensors these days, they actually have quite a bit in common with DRAM.
There was a Byte Magazine article by Steve Ciarcia, in Sept 1983, entitled “A 64K-bit dynamic RAM chip is the visual sensor in this digital image camera,” describing how to do exactly this hack. https://archive.org/details/byte-magazine-1983-09/mode/2up
He was able to turn a RAM chip into a camera, allowing the computer to process a video "feed" simply by polling the right bits in RAM. On a device that would normally be considered much too primitive to do any image processing.
In terms of practical application (e.g. in industry), the biggest bang for your buck is "get the illumination right". Surprised this never appears in the course (at least from glancing over the syllabus and some slides).
Most CV tasks are borderline impossible if your input is acquired under uncontrollable lighting. Whereas the right illumination setup can often let you get away with nothing but a threshold binarization.
Countersignalling. By deviating from the “professional” look so ostentatiously, they signal that they are so good that they don’t need to use the usual look.
And then there are people like me, who seem to believe that if their resume spacing is off by a nanometer, then the entire world will view them as an unemployable failure.
I'd say it is a good example of how not to take yourself too seriously. The contrast is cool too! He's achieved quite a bit and expressed it in such a cool and playful way.
It helped him get an internship at Google, where he worked on computer vision. The person who hired him said it was a great resume and that he was the smartest guy he had ever worked with.
Whoa, sounds interesting! I always wondered what happened to him after he gave up on YOLO because he felt it was against his morals. I honestly give him props, because he probably could have capitalized on his work if he had wanted to and played his cards right.
I can attest to this. I have been referring back to this course a lot, and I have recommended it to a few people starting their CV journey. Joseph is also the creator of YOLO, so go figure!
Back in the day, before Vuforia implemented volumetric tracking, I developed my own Pepsi can detector for an augmented reality app, just processing filters.
While the content is definitely great, its outer looks are not so much. I am afraid I value whatever scraps of non-computer, human vision I still have left with me a tad more than learning those cool eldritch secrets... although the reader mode definitely helps.
Alternative opinion: The design of the site is a welcome breath of fresh air. Not every site needs gobs of whitespace, neutral colors and advertisements.
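Do this voting for every edge point, then take the max of the accumulator matrix: https://en.wikipedia.org/wiki/Circle_Hough_Transform
Simple, and it runs in bounded time: one pass over the edge points, with a fixed amount of voting work per point.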