Hacker News
Ancient secrets of computer vision (pjreddie.com)
274 points by bjourne on Nov 9, 2021 | 54 comments


It's nice, but missing the most valuable (and simplest) take from computer vision: the Hough transforms.

Let's take the circle Hough transform as it's one of the most enlightening ones!

Say you are looking for a circle of a given diameter. After a binarization to make the edge stand out, make all the potential points "vote" for a circle center.

The method is simple: in an accumulator matrix, you +1 every cell that lies exactly one radius away from the edge point, i.e. every center a circle through that point could have.

Do this for every point, and take the max: https://en.wikipedia.org/wiki/Circle_Hough_Transform

Simple, and works in guaranteed time.
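The voting step is short enough to sketch in NumPy (the helper name, the synthetic image, and the 100-angle sampling are my own illustrative choices, not from the comment):

```python
import numpy as np

def circle_hough(edges, radius, n_angles=100):
    """Circle Hough voting: every edge pixel +1's the accumulator
    cells lying exactly `radius` away from it; the cell with the
    most votes is the circle's center."""
    h, w = edges.shape
    acc = np.zeros((h, w), dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    ys, xs = np.nonzero(edges)                 # pixels from the binarization
    for y, x in zip(ys, xs):
        cy = np.round(y + radius * np.sin(thetas)).astype(int)
        cx = np.round(x + radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        acc[cy[ok], cx[ok]] += 1               # this pixel's votes
    return acc

# Synthetic binary image: one circle of radius 10 centered at (25, 25)
img = np.zeros((50, 50), dtype=np.uint8)
t = np.linspace(0.0, 2.0 * np.pi, 200)
img[np.round(25 + 10 * np.sin(t)).astype(int),
    np.round(25 + 10 * np.cos(t)).astype(int)] = 1

acc = circle_hough(img, radius=10)
cy, cx = np.unravel_index(acc.argmax(), acc.shape)  # vote peak ~ center
```

Each edge pixel traces a circle of candidate centers in the accumulator; where many of those circles overlap, the votes pile up.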

Extension 1: if you don't know the radius, apply iteratively for a range of values, then again, take the max: if you imagine how it works (or code it as an example then animate the result), it's like doing a "mathematical" focus.
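A sketch of this extension (again with made-up names and a synthetic image): stack one accumulator per candidate radius and take the global maximum over (radius, cy, cx).

```python
import numpy as np

def circle_hough_radii(edges, radii, n_angles=100):
    """Unknown radius: one 2-D accumulator per candidate radius;
    the global max over (radius, cy, cx) picks both the center and
    the radius, like the 'mathematical focus' described above."""
    h, w = edges.shape
    acc = np.zeros((len(radii), h, w), dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    ys, xs = np.nonzero(edges)
    for ri, r in enumerate(radii):
        for y, x in zip(ys, xs):
            cy = np.round(y + r * np.sin(thetas)).astype(int)
            cx = np.round(x + r * np.cos(thetas)).astype(int)
            ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
            acc[ri, cy[ok], cx[ok]] += 1
    return acc

# One circle of (pretend-unknown) radius 8 centered at (20, 20)
img = np.zeros((40, 40), dtype=np.uint8)
t = np.linspace(0.0, 2.0 * np.pi, 150)
img[np.round(20 + 8 * np.sin(t)).astype(int),
    np.round(20 + 8 * np.cos(t)).astype(int)] = 1

radii = list(range(5, 12))
acc = circle_hough_radii(img, radii)
ri, cy, cx = np.unravel_index(acc.argmax(), acc.shape)
best_r = radii[ri]      # the radius whose accumulator "focuses" hardest
```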

Extension 2: if it's too costly to do a dense exploration of the space of values for the radius, and you know there's only one circle, you can hill-climb instead: treat the accumulator's peak value as a function of the radius and follow it in the direction of increase.

Extension 3: if there is more than one circle, other techniques exist. The easiest to picture are based on maximizing the variance of the distribution of values in the matrix that results from the binarization, but you can also use 2d lattices and other fun tricks.


I agree it's very cool, but I have found it to be surprisingly poor in certain scenarios.

A "faint" circle will often score worse than 2 high-contrast parallel lines that happen to be the right distance apart, since the lines manage to trigger pixels along 20% of a circle's arc and their contrast massively inflates their score compared to the faint circle (higher edge pixel density).

It seems like there should be a simple way to weight the results by how dispersed along the circle's arc the triggered pixels are, but I've never dug any further; after hitting this problem I had to move on.


My initial reaction is that adding an angle parameter would help. A circle should have votes from many angles while lines will vote from only a portion of the circle. With some added weight from angled convergence the faint circle could score higher.
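One toy way to make "votes from many angles" concrete (my own illustrative variant, not an established algorithm): score each candidate center by how many distinct vote directions it receives rather than by raw vote count.

```python
import numpy as np

def arc_coverage(edges, radius, n_angles=60):
    """Score each candidate center by how many distinct angle bins
    its votes arrive from. A full circle covers nearly all bins;
    two parallel lines hit only bins near the perpendiculars."""
    h, w = edges.shape
    hit = np.zeros((h, w, n_angles), dtype=bool)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    bins = np.arange(n_angles)
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        cy = np.round(y + radius * np.sin(thetas)).astype(int)
        cx = np.round(x + radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        hit[cy[ok], cx[ok], bins[ok]] = True   # mark direction, not count
    return hit.sum(axis=2)

# A full circle of radius 8 ...
circle = np.zeros((40, 40), dtype=np.uint8)
t = np.linspace(0.0, 2.0 * np.pi, 150)
circle[np.round(20 + 8 * np.sin(t)).astype(int),
       np.round(20 + 8 * np.cos(t)).astype(int)] = 1

# ... versus two parallel lines the "right" distance (2 * radius) apart
lines = np.zeros((40, 40), dtype=np.uint8)
lines[12, :] = 1
lines[28, :] = 1

circle_score = arc_coverage(circle, 8).max()
lines_score = arc_coverage(lines, 8).max()   # far fewer directions covered
```

Raw vote counts can favor the dense parallel lines, but direction coverage ranks the circle higher.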


Hough is sweet and simple, but it's more or less the brute force method of CV: It comes with awful runtime and memory complexity even in straightforward cases. If it works it works, but more often it's neither efficient nor reliable.


The Hough is a good one! Also Invariant Moments: https://en.m.wikipedia.org/wiki/Image_moment


I will take the opportunity to call out one of my favourite libraries, BoofCV (http://boofcv.org)

It comes with a wonderful demonstration tool that allows you to apply the various included algorithms to images and tweak the parameters in real-time – including the Hough transform. A great tool for helping to understand how these kinds of algorithms work!


As someone who does computer vision for a living, you're going to need to explain how this is:

1. The most valuable take from computer vision

2. The simplest take from computer vision

Not to mention this is rarely useful unless you're in a specific context where you're looking for circles in an image.


As someone who has done a bit of CV too, I'd like to know: which 2 or 3 algorithms do you think are really useful?

(Personally, I was impressed by mixture-of-Gaussians background removal.)


> After a binarization to make the edge stand out ...

Edge binarization is dependent upon edge detection algorithm choice, threshold algorithm choice, and both of their respective parameters. It's often very difficult to find a set of parameters that aren't brittle due to occlusions, poor contrast, camera noise, etc.

Hough works great if you can do this part confidently. But in my experience, robust edge binarization for Hough is often not very feasible in the wild.
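To make the brittleness concrete, here's a toy gradient-magnitude binarization in plain NumPy (a crude stand-in for a real detector like Canny; the image and thresholds are invented). The same scene yields different edge maps depending on a single threshold:

```python
import numpy as np

def binarize_edges(img, threshold):
    """Sobel gradient magnitude + hard threshold: the simplest edge
    binarization, and the fragile step. Too high a threshold drops
    faint edges; too low passes noise."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    f = img.astype(float)
    for i in range(1, h - 1):              # naive convolution, for clarity
        for j in range(1, w - 1):
            patch = f[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    mag = np.hypot(gx, gy)
    return (mag > threshold).astype(np.uint8)

# A strong vertical step edge and a faint one in the same image
img = np.zeros((10, 20))
img[:, 5:] += 200      # strong edge at column 5
img[:, 15:] += 20      # faint edge at column 15

strong_only = binarize_edges(img, threshold=400)  # faint edge lost
both = binarize_edges(img, threshold=50)          # both edges kept
```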



Hough transform is awesome and it was patented in 1962!


> After a binarization to make the edge stand out, make all the potential points "vote" for a circle center.

It's even simpler to make artificial neurons vote for a circle center.

You don't need the binarization step, and you can apply the method to other shapes as well.


> It's even simpler to make artificial neurons vote for a circle center.

Is it?

It's not conceptually simpler: people can more easily imagine circles around points converging to a center, so they can also put that idea into code more easily.

> you can apply the method to other shapes as well.

Yes you can. Read about Hough.

I just presented the one that is the most enlightening.

I may be biased against neural network approaches and their likes, because I see them as black boxes with failure modes that are hard to predict or work around: I prefer what I can understand and explain. Unfortunately, that seems at odds with the current demographics of ML (cf https://news.ycombinator.com/item?id=27361812 ), many of whom have no clue about what makes these black boxes tick, sometimes even after getting a PhD in the dark art of tweaking black boxes.


One of the nicer things about the Hough approach is that you can also get a bunch of other information from parameter space, like horizon lines and vanishing points.


The Hough transform generalizes to other shapes as well.
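For example, the classic line version votes in (rho, theta) space; a rough NumPy sketch (the parameter choices and test image are mine):

```python
import numpy as np

def line_hough(edges, n_theta=180):
    """Line Hough: each edge pixel (x, y) votes, for every angle
    theta, for the line rho = x*cos(theta) + y*sin(theta)."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))        # bound on |rho|
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1
    return acc, thetas, diag

# Horizontal line y = 7 in a 20x20 image
img = np.zeros((20, 20), dtype=np.uint8)
img[7, :] = 1

acc, thetas, diag = line_hough(img)
r_idx, t_idx = np.unravel_index(acc.argmax(), acc.shape)
rho, theta = r_idx - diag, thetas[t_idx]   # recovers rho=7, theta ~ pi/2
```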


A really ancient secret, one of the grey beards I learned a lot from early in my career told me about how he got CV running on an Apple II way back in the day on the cheap. He decapped a DRAM, and carefully stuck a lens on it. They're not just susceptible to cosmic rays; without the package regular old visible light rays can cause bit flips too. If you look at CMOS sensors these days they actually have quite a bit in common with DRAM.


There was a Byte Magazine article by Steve Ciarcia, in Sept 1983, entitled “A 64K-bit dynamic RAM chip is the visual sensor in this digital image camera,” describing how to do exactly this hack. https://archive.org/details/byte-magazine-1983-09/mode/2up


I think I'm missing the point. What does any of this have to do with computer vision?


He was able to turn a RAM chip into a camera, allowing the computer to process a video "feed" simply by polling the right bits in RAM. On a device that would normally be considered much too primitive to do any image processing.


Oh my god. That's amazing. I would never have believed that's possible.


Right!? I was skeptical of the story until 'dougabug posted the link to Byte Magazine.


Oh god it's the xkcd but real.


In terms of practical application (e.g. in industry), the biggest bang for your buck is "get the illumination right". Surprised this never appears in the course (at least from glancing over the syllabus and some slides).

Most CV tasks are borderline impossible if your input is acquired under uncontrollable lighting. Whereas the right illumination setup can often let you get away with nothing but a threshold binarization.
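A toy illustration with invented numbers: if the part is backlit so that object and background occupy disjoint intensity ranges, one fixed threshold does the whole job.

```python
import numpy as np

def segment(img, threshold=128):
    """Under controlled illumination (e.g. a backlit part), object
    and background intensities don't overlap, so a single global
    threshold separates them reliably."""
    return (img > threshold).astype(np.uint8)

# Invented backlit scene: bright background (~230), dark silhouette (~20)
img = np.full((8, 8), 230, dtype=np.uint8)
img[2:6, 2:6] = 20          # the object's silhouette
mask = 1 - segment(img)     # 1 where the object is
```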


Seconded. Learned this the hard way when I bit off more than I could chew aka I took on the sun.


What's the best CV course nowadays that comes with videos, assignments, hws, etc.?

It used to be the one taught by Justin Johnson at UMich [0].

But the publicly available videos were last updated in 2019.

[0]: https://web.eecs.umich.edu/~justincj/teaching/eecs498/FA2020...


I like Andreas Geiger’s lectures at U. Tübingen [0]. Quite recent, and I think the topics they cover are good.

[0] https://uni-tuebingen.de/fakultaeten/mathematisch-naturwisse...


Content is solid. I'm afraid I can't ignore the author's resume link... https://pjreddie.com/static/Redmon%20Resume.pdf


Hm. Do you think this is deliberate to filter out people with certain prejudices? Or do they genuinely think it’s a good design?


Countersignalling. By deviating from the “professional” look so ostentatiously, they signal that they are so good that they don’t need to use the usual look.


It’s PJ Reddie after all, the YOLO guy.


For those who need a bit more context: YOLO, the object recognition ML model. He can work anywhere he wants.


If he wanted to work on computer vision, that is, instead of opening a vegan restaurant (or something similar, I forget exactly).


And then there are people like me, who seem to believe that if their resume spacing is off by a nanometer, then the entire world will view them as an unemployable failure.


YOLO was such a shake up of the computer vision space that he could probably get hired just about anywhere with a resume crudely written in crayon.


The charts in this paper are hilarious: https://pjreddie.com/media/files/papers/YOLOv3.pdf

Previous authors didn’t start their axes at 0, so he kept their axes and just put the timing for YOLO outside the original chart area.


I love section 4 — “things we tried that didn’t work”


the space in the filename does it for me. What a savage.


I'd say it is a good example of how not to take yourself too seriously. The contrast is cool too! He's achieved quite a bit and expressed it in such a cool and playful way.


It helped him get an internship at Google, where he worked on computer vision. The person who hired him said it was a great resume and that he was the smartest guy he'd ever worked with.


Solid - certainly memorable. I like it a lot better than those ones with meters that show your levels of various skills


For the jobs he's going to be successful at, he doesn't need a CV.


Well, it is almost unforgettable, isn't it? "Outstanding".


Whoa sounds interesting! I always wondered what happened to him after giving up on YOLO because he felt it was against his morals. I honestly give him props because he probably could of capitalized on his work if he wanted to and play his cards right.


From the lab homepage it seems that he eventually graduated with a PhD (https://raivn.cs.washington.edu/people.html).

A few years ago he said he’d thought about quitting research and opening a vegan cafe or something. Not sure what he’s planning to do now though.


Sorry, grammar pet peeve of mine: "could have" not "could of" :) cheers!


I can attest to this; I keep referring back to this course and have recommended it to a few people starting their CV journey. Joseph is also the creator of YOLO, so go figure!


Hausdorff distance is also a simple probabilistic technique that works quite well:

https://ecommons.cornell.edu/bitstream/handle/1813/6165/92-1...

In a different life I toyed with it a bit: http://pugoob.blogspot.com/2008/01/pugoob-image-search-tool....
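The plain symmetric Hausdorff distance fits in a few lines of NumPy (the point sets below are made up; Huttenlocher's paper actually uses a rank-based "partial" variant for robustness to outliers):

```python
import numpy as np

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two 2-D point sets: the
    worst-case distance from any point in one set to its nearest
    neighbour in the other."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
shifted = square + [0.0, 0.25]       # same shape, nudged slightly
far = np.array([[5.0, 5.0]])

near_d = hausdorff(square, shifted)  # small: shapes nearly coincide
far_d = hausdorff(square, far)       # large: shapes differ
```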


Back in the day, before Vuforia implemented volumetric tracking, I developed my own Pepsi can detector for an augmented reality app using just image processing filters.


Watched these lectures 2 years ago and they're pretty solid!


While the content is definitely great, its outer looks are not so much. I am afraid I value whatever scraps of non-computer, human vision I still have left with me a tad more than learning those cool eldritch secrets... although the reader mode definitely helps.


Alternative opinion: The design of the site is a welcome breath of fresh air. Not every site needs gobs of whitespace, neutral colors and advertisements.


Then you should definitely not go to YouTube and search for "vaporwave". Definitely definitely definitely.


That logo is exceptionally hideous.


In the best way, for me. Gives it personality.



