Today I came across an interesting case where three well-known LLMs (o1, Sonnet 3.7, and DeepSeek R1) all reported a "bug" that didn't actually exist.
Very briefly: in a fused CUDA kernel, thread i first does some work on locations i, i+N, and i+2*N of an array. Later in the same kernel, the same thread operates on i, i+1, and i+2. All three LLMs flagged the second part as a bug. Maybe not the most optimized code, but definitely not a bug.
It wasn't a complicated kernel either (~120 SLOC), and the two code blocks were only about 15 LOC apart.
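
For anyone who wants to picture the access pattern, here is a minimal sketch of that kind of kernel. It is not the actual kernel: the names (`fused`, `x`, `out`), the single-block launch, and the `__syncthreads()` barrier between the two phases are all assumptions made for illustration, and the barrier only matters if the second phase reads values that other threads wrote in the first.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 256;  // threads in the single block (hypothetical size)

// Hypothetical fused kernel illustrating the two access patterns.
// x has 3*N elements, out has N elements.
__global__ void fused(float* x, float* out) {
    int i = threadIdx.x;

    // Phase 1: strided accesses. Thread i touches i, i+N, i+2*N.
    x[i]         += 1.0f;
    x[i + N]     += 1.0f;
    x[i + 2 * N] += 1.0f;

    // Assumption: phase 2 reads elements written by neighbouring threads,
    // so a block-wide barrier is placed between the phases.
    __syncthreads();

    // Phase 2: contiguous accesses. Thread i touches i, i+1, i+2.
    out[i] = x[i] + x[i + 1] + x[i + 2];
}

int main() {
    float *x = nullptr, *out = nullptr;
    cudaMallocManaged(&x, 3 * N * sizeof(float));
    cudaMallocManaged(&out, N * sizeof(float));
    for (int k = 0; k < 3 * N; ++k) x[k] = static_cast<float>(k);

    fused<<<1, N>>>(x, out);  // one block, so __syncthreads() covers all threads
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // After phase 1, x[k] == k + 1, so out[i] should equal 3*i + 6.
    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        if (out[i] != 3.0f * i + 6.0f) ++mismatches;
    }
    std::printf("mismatches: %d\n", mismatches);

    cudaFree(x);
    cudaFree(out);
    return mismatches != 0;
}
```

In this sketch both index patterns stay inside the 3*N-element array, and the same thread using a stride-N layout in one phase and a contiguous window in the next is perfectly legal. Whether a barrier is needed between them depends on what the real kernel reads and writes.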