Hacker News
Ancient secrets of computer vision (pjreddie.com)
274 points by bjourne on Nov 9, 2021 | 54 comments


It's nice, but missing the most valuable (and simplest) take from computer vision: the Hough transforms.

Let's take the circle Hough transform as it's one of the most enlightening ones!

Say you are looking for a circle of a given diameter. After a binarization to make the edge stand out, make all the potential points "vote" for a circle center.

The method is simple: in an accumulator matrix, you +1 every cell that lies exactly one radius away from the edge point, i.e. every center a circle through that point could have.

Do this for every point, and take the max: https://en.wikipedia.org/wiki/Circle_Hough_Transform

Simple, and works in guaranteed time.
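The voting step is short enough to sketch in NumPy (the helper name, the synthetic image, and the 100-angle sampling are my own illustrative choices, not from the comment):

```python
import numpy as np

def circle_hough(edges, radius, n_angles=100):
    """Circle Hough voting: every edge pixel +1's the accumulator
    cells lying exactly `radius` away from it; the cell with the
    most votes is the circle's center."""
    h, w = edges.shape
    acc = np.zeros((h, w), dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    ys, xs = np.nonzero(edges)                 # pixels from the binarization
    for y, x in zip(ys, xs):
        cy = np.round(y + radius * np.sin(thetas)).astype(int)
        cx = np.round(x + radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        acc[cy[ok], cx[ok]] += 1               # this pixel's votes
    return acc

# Synthetic binary image: one circle of radius 10 centered at (25, 25)
img = np.zeros((50, 50), dtype=np.uint8)
t = np.linspace(0.0, 2.0 * np.pi, 200)
img[np.round(25 + 10 * np.sin(t)).astype(int),
    np.round(25 + 10 * np.cos(t)).astype(int)] = 1

acc = circle_hough(img, radius=10)
cy, cx = np.unravel_index(acc.argmax(), acc.shape)  # vote peak ~ center
```

Each edge pixel traces a circle of candidate centers in the accumulator; where many of those circles overlap, the votes pile up.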

Extension 1: if you don't know the radius, apply iteratively for a range of values, then again, take the max: if you imagine how it works (or code it as an example then animate the result), it's like doing a "mathematical" focus.
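A sketch of this extension (again with made-up names and a synthetic image): stack one accumulator per candidate radius and take the global maximum over (radius, cy, cx).

```python
import numpy as np

def circle_hough_radii(edges, radii, n_angles=100):
    """Unknown radius: one 2-D accumulator per candidate radius;
    the global max over (radius, cy, cx) picks both the center and
    the radius, like the 'mathematical focus' described above."""
    h, w = edges.shape
    acc = np.zeros((len(radii), h, w), dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    ys, xs = np.nonzero(edges)
    for ri, r in enumerate(radii):
        for y, x in zip(ys, xs):
            cy = np.round(y + r * np.sin(thetas)).astype(int)
            cx = np.round(x + r * np.cos(thetas)).astype(int)
            ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
            acc[ri, cy[ok], cx[ok]] += 1
    return acc

# One circle of (pretend-unknown) radius 8 centered at (20, 20)
img = np.zeros((40, 40), dtype=np.uint8)
t = np.linspace(0.0, 2.0 * np.pi, 150)
img[np.round(20 + 8 * np.sin(t)).astype(int),
    np.round(20 + 8 * np.cos(t)).astype(int)] = 1

radii = list(range(5, 12))
acc = circle_hough_radii(img, radii)
ri, cy, cx = np.unravel_index(acc.argmax(), acc.shape)
best_r = radii[ri]      # the radius whose accumulator "focuses" hardest
```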

Extension 2: if it's too costly to do a dense exploration of the space of values for the radius, and you know there's only one circle, you can hill-climb instead: treat the accumulator's peak value as a function of the radius and follow it in the direction of increase.

Extension 3: if there is more than one circle, other techniques exist. The easiest to picture are based on maximizing the variance of the distribution of values in the matrix that results from the binarization, but you can also use 2d lattices and other fun tricks.


I agree it's very cool, but I have found it to be surprisingly poor in certain scenarios.

A "faint" circle will often score worse than 2 high-contrast parallel lines that happen to be the right distance apart, since the lines manage to trigger pixels along 20% of a circle's arc and their contrast massively inflates their score compared to the faint circle (higher edge pixel density).

It seems like there should be a simple way to weight the results by how dispersed along the circle's arc the triggered pixels are, but I've never dug any further; after hitting this problem I had to move on.


My initial reaction is that adding an angle parameter would help. A circle should have votes from many angles while lines will vote from only a portion of the circle. With some added weight from angled convergence the faint circle could score higher.
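One toy way to make "votes from many angles" concrete (my own illustrative variant, not an established algorithm): score each candidate center by how many distinct vote directions it receives rather than by raw vote count.

```python
import numpy as np

def arc_coverage(edges, radius, n_angles=60):
    """Score each candidate center by how many distinct angle bins
    its votes arrive from. A full circle covers nearly all bins;
    two parallel lines hit only bins near the perpendiculars."""
    h, w = edges.shape
    hit = np.zeros((h, w, n_angles), dtype=bool)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    bins = np.arange(n_angles)
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        cy = np.round(y + radius * np.sin(thetas)).astype(int)
        cx = np.round(x + radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        hit[cy[ok], cx[ok], bins[ok]] = True   # mark direction, not count
    return hit.sum(axis=2)

# A full circle of radius 8 ...
circle = np.zeros((40, 40), dtype=np.uint8)
t = np.linspace(0.0, 2.0 * np.pi, 150)
circle[np.round(20 + 8 * np.sin(t)).astype(int),
       np.round(20 + 8 * np.cos(t)).astype(int)] = 1

# ... versus two parallel lines the "right" distance (2 * radius) apart
lines = np.zeros((40, 40), dtype=np.uint8)
lines[12, :] = 1
lines[28, :] = 1

circle_score = arc_coverage(circle, 8).max()
lines_score = arc_coverage(lines, 8).max()   # far fewer directions covered
```

Raw vote counts can favor the dense parallel lines, but direction coverage ranks the circle higher.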


Hough is sweet and simple, but it's more or less the brute force method of CV: It comes with awful runtime and memory complexity even in straightforward cases. If it works it works, but more often it's neither efficient nor reliable.


The Hough is a good one! Also Invariant Moments: https://en.m.wikipedia.org/wiki/Image_moment


I will take the opportunity to call out one of my favourite libraries, BoofCV (http://boofcv.org)

It comes with a wonderful demonstration tool that allows you to apply the various included algorithms to images and tweak the parameters in real-time – including the Hough transform. A great tool for helping to understand how these kinds of algorithms work!


As someone who does computer vision for a living, you're going to need to explain how this is:

1. The most valuable take from computer vision

2. The simplest take from computer vision

Not to mention this is rarely useful unless you're in a specific context where you're looking for circles in an image.


As someone who has done a bit of CV too, I'd like to know: which 2 or 3 algorithms do you think are really useful?

(Personally, I was impressed by mixture-of-Gaussians background removal.)


> After a binarization to make the edge stand out ...

Edge binarization is dependent upon edge detection algorithm choice, threshold algorithm choice, and both of their respective parameters. It's often very difficult to find a set of parameters that aren't brittle due to occlusions, poor contrast, camera noise, etc.

Hough works great if you can do this part confidently. But in my experience, robust edge binarization for Hough is often not very feasible in the wild.
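To make the brittleness concrete, here's a toy gradient-magnitude binarization in plain NumPy (a crude stand-in for a real detector like Canny; the image and thresholds are invented). The same scene yields different edge maps depending on a single threshold:

```python
import numpy as np

def binarize_edges(img, threshold):
    """Sobel gradient magnitude + hard threshold: the simplest edge
    binarization, and the fragile step. Too high a threshold drops
    faint edges; too low passes noise."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    f = img.astype(float)
    for i in range(1, h - 1):              # naive convolution, for clarity
        for j in range(1, w - 1):
            patch = f[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    mag = np.hypot(gx, gy)
    return (mag > threshold).astype(np.uint8)

# A strong vertical step edge and a faint one in the same image
img = np.zeros((10, 20))
img[:, 5:] += 200      # strong edge at column 5
img[:, 15:] += 20      # faint edge at column 15

strong_only = binarize_edges(img, threshold=400)  # faint edge lost
both = binarize_edges(img, threshold=50)          # both edges kept
```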



Hough transform is awesome and it was patented in 1962!


> After a binarization to make the edge stand out, make all the potential points "vote" for a circle center.

It's even simpler to make artificial neurons vote for a circle center.

You don't need the binarization step, and you can apply the method to other shapes as well.


> It's even simpler to make artificial neurons vote for a circle center.

Is it?

It's not conceptually simpler: people can more easily imagine circles around points converging to a center, so they can also put that idea into code more easily.

> you can apply the method to other shapes as well.

Yes you can. Read about Hough.

I just presented the one that is the most enlightening.

I may be biased against neural network approaches and their likes, because I see them as black boxes with failure modes that are hard to predict or work around: I prefer what I can understand and explain. Unfortunately, that seems at odds with the current demographics of ML (cf https://news.ycombinator.com/item?id=27361812 ), many of whom have no clue about what makes these black boxes tick, sometimes even after getting a PhD in the dark art of tweaking black boxes.


One of the nicer things about the Hough approach is that you can also get a bunch of other information from parameter space, like horizon lines and vanishing points.


The Hough transform generalizes to other shapes as well.
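For example, the classic line version votes in (rho, theta) space; a rough NumPy sketch (the parameter choices and test image are mine):

```python
import numpy as np

def line_hough(edges, n_theta=180):
    """Line Hough: each edge pixel (x, y) votes, for every angle
    theta, for the line rho = x*cos(theta) + y*sin(theta)."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))        # bound on |rho|
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1
    return acc, thetas, diag

# Horizontal line y = 7 in a 20x20 image
img = np.zeros((20, 20), dtype=np.uint8)
img[7, :] = 1

acc, thetas, diag = line_hough(img)
r_idx, t_idx = np.unravel_index(acc.argmax(), acc.shape)
rho, theta = r_idx - diag, thetas[t_idx]   # recovers rho=7, theta ~ pi/2
```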


A really ancient secret, one of the grey beards I learned a lot from early in my career told me about how he got CV running on an Apple II way back in the day on the cheap. He decapped a DRAM, and carefully stuck a lens on it. They're not just susceptible to cosmic rays; without the package regular old visible light rays can cause bit flips too. If you look at CMOS sensors these days they actually have quite a bit in common with DRAM.


There was a Byte Magazine article by Steve Ciarcia, in Sept 1983, entitled “A 64K-bit dynamic RAM chip is the visual sensor in this digital image camera,” describing how to do exactly this hack. https://archive.org/details/byte-magazine-1983-09/mode/2up


I think I'm missing the point. What does any of this have to do with computer vision?


He was able to turn a RAM chip into a camera, allowing the computer to process a video "feed" simply by polling the right bits in RAM. On a device that would normally be considered much too primitive to do any image processing.


Oh my god. That's amazing. I would never have believed that's possible.


Right!? I was skeptical of the story until 'dougabug posted the link to Byte Magazine.


Oh god it's the xkcd but real.


In terms of practical application (e.g. in industry), the biggest bang for your buck is "get the illumination right". Surprised this never appears in the course (at least from glancing over the syllabus and some slides).

Most CV tasks are borderline impossible if your input is acquired under uncontrollable lighting. Whereas the right illumination setup can often let you get away with nothing but a threshold binarization.
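A toy illustration with invented numbers: if the part is backlit so that object and background occupy disjoint intensity ranges, one fixed threshold does the whole job.

```python
import numpy as np

def segment(img, threshold=128):
    """Under controlled illumination (e.g. a backlit part), object
    and background intensities don't overlap, so a single global
    threshold separates them reliably."""
    return (img > threshold).astype(np.uint8)

# Invented backlit scene: bright background (~230), dark silhouette (~20)
img = np.full((8, 8), 230, dtype=np.uint8)
img[2:6, 2:6] = 20          # the object's silhouette
mask = 1 - segment(img)     # 1 where the object is
```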


Seconded. Learned this the hard way when I bit off more than I could chew aka I took on the sun.


What's the best CV course nowadays that comes with videos, assignments, hws, etc.?

It used to be the one taught by Justin Johnson at UMich [0].

But the publicly available videos were last updated in 2019.

[0]: https://web.eecs.umich.edu/~justincj/teaching/eecs498/FA2020...


I like Andreas Geiger’s lectures at U. Tübingen [0]. Quite recent, and I think the topics they cover are good.

[0] https://uni-tuebingen.de/fakultaeten/mathematisch-naturwisse...


Content is solid. I'm afraid I can't ignore the author's resume link... https://pjreddie.com/static/Redmon%20Resume.pdf


Hm. Do you think this is deliberate to filter out people with certain prejudices? Or do they genuinely think it’s a good design?


Countersignalling. By deviating from the “professional” look so ostentatiously, they signal that they are so good that they don’t need to use the usual look.


It’s PJ Reddie after all, the YOLO guy.


For those who need a bit more context: YOLO, the object recognition ML model. He can work anywhere he wants.


If he wanted to work on computer vision, that is, instead of opening a vegan restaurant (or something similar, I forget exactly).


And then there are people like me, who seem to believe that if their resume spacing is off by a nanometer, then the entire world will view them as an unemployable failure.


YOLO was such a shake up of the computer vision space that he could probably get hired just about anywhere with a resume crudely written in crayon.


The charts in this paper are hilarious: https://pjreddie.com/media/files/papers/YOLOv3.pdf

Previous authors didn’t start their axes at 0, so he kept their axes and just put the timing for YOLO outside the original chart area.


I love section 4 — “things we tried that didn’t work”


the space in the filename does it for me. What a savage.


I'd say it is a good example of how not to take yourself too seriously. The contrast is cool too! He's achieved quite a bit and expressed it in such a cool and playful way.


It helped him get an internship at Google, where he worked on computer vision. The person who hired him said it was a great resume and that he was the smartest guy he'd ever worked with.


Solid - certainly memorable. I like it a lot better than those ones with meters that show your levels of various skills


For the jobs he's going to be successful at, he doesn't need a CV.


Well, it is almost unforgettable, isn't it? "Outstanding".


Whoa sounds interesting! I always wondered what happened to him after giving up on YOLO because he felt it was against his morals. I honestly give him props because he probably could of capitalized on his work if he wanted to and play his cards right.


From the lab homepage it seems that he eventually graduated with a PhD (https://raivn.cs.washington.edu/people.html).

A few years ago he said he’d thought about quitting research and opening a vegan cafe or something. Not sure what he’s planning to do now though.


Sorry, grammar pet peeve of mine: "could have" not "could of" :) cheers!


I can attest to this; I keep referring back to this course and have recommended it to a few people starting their CV journey. Joseph is also the creator of YOLO, so go figure!


Hausdorff distance is also a simple probabilistic technique that works quite well:

https://ecommons.cornell.edu/bitstream/handle/1813/6165/92-1...

In a different life I toyed with it a bit: http://pugoob.blogspot.com/2008/01/pugoob-image-search-tool....
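The plain symmetric Hausdorff distance fits in a few lines of NumPy (the point sets below are made up; Huttenlocher's paper actually uses a rank-based "partial" variant for robustness to outliers):

```python
import numpy as np

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two 2-D point sets: the
    worst-case distance from any point in one set to its nearest
    neighbour in the other."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
shifted = square + [0.0, 0.25]       # same shape, nudged slightly
far = np.array([[5.0, 5.0]])

near_d = hausdorff(square, shifted)  # small: shapes nearly coincide
far_d = hausdorff(square, far)       # large: shapes differ
```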


Back in the day, before Vuforia implemented volumetric tracking, I developed my own Pepsi can detector for an augmented reality app using just image processing filters.


Watched these lectures 2 years ago and they're pretty solid!


While the content is definitely great, its outer looks are not so much. I am afraid I value whatever scraps of non-computer, human vision I still have left with me a tad more than learning those cool eldritch secrets... although the reader mode definitely helps.


Alternative opinion: The design of the site is a welcome breath of fresh air. Not every site needs gobs of whitespace, neutral colors and advertisements.


Then you should definitely not go to YouTube and search for "vaporwave". Definitely definitely definitely.


That logo is exceptionally hideous.


In the best way, for me. Gives it personality.



