I may be jumping the gun a little because I was thinking about this in the context of another thread, but a practical problem with machine learning in general is that, for a learned model to generalise well to unseen data, the training dataset (all the data you have available, regardless of how you partition it into training, validation and test sets) must be drawn from the same distribution as the "real world" data.
The actual problem is that this is very difficult, if not impossible, to know before training begins. Most of the time, the best that can be achieved is to train a model on whatever data you have and then painstakingly test it, at length and at some cost, on the real-world inputs the trained model has to operate on.
Basically, it's very hard to know your sampling error.
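To make that concrete, here's a rough sketch (scikit-learn, made-up Gaussian data, an arbitrary shift, purely illustrative): a classifier that looks fine on a held-out split from the training distribution degrades badly once the deployment distribution drifts away from it.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_data(n, shift=0.0):
        # Two Gaussian classes; `shift` stands in for the gap between the
        # data you collected and the data the model meets in deployment.
        x0 = rng.normal(loc=0.0 + shift, scale=1.0, size=(n, 2))
        x1 = rng.normal(loc=2.0 + shift, scale=1.0, size=(n, 2))
        return np.vstack([x0, x1]), np.array([0] * n + [1] * n)

    X_train, y_train = make_data(1000)            # the data you happen to have
    X_holdout, y_holdout = make_data(1000)        # held-out, same distribution
    X_real, y_real = make_data(1000, shift=1.5)   # "real world", shifted

    clf = LogisticRegression().fit(X_train, y_train)
    print("held-out accuracy (same distribution):", clf.score(X_holdout, y_holdout))
    print("deployment accuracy (shifted):        ", clf.score(X_real, y_real))

The held-out score tells you nothing about the second number until you actually pay for the real-world evaluation.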
Regarding interpolation and dense sampling etc., the larger the dimensionality of the problem, the harder it gets to ensure your data is "dense", let alone that it covers an adequate region of the instance space. For example, the pixels in one image are a tiny, tiny subset of the pixels in all possible images, which is what you really want to represent. Come to that, the pixels in many hundreds of thousands of images are still a tiny, tiny subset of the pixels in all possible images. I find Chollet's criticism not cynical, but pragmatic and very useful. It's important to understand the limitations of whatever tool you're using.
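Just to put a back-of-the-envelope number on the image example (assuming, say, tiny 28x28 8-bit greyscale images, which is already a generous simplification):

    import math
    # Even trivially small 28x28 8-bit greyscale images span 256**784
    # possible pixel configurations; a dataset of a few hundred thousand
    # (or a few million) images samples a vanishing fraction of that space.
    log10_possible = (28 * 28) * math.log10(256)
    print(f"possible 28x28 greyscale images: ~10^{log10_possible:.0f}")
    print("a large labelled dataset:         ~10^6 images")

That's roughly 10^1888 possible inputs against 10^6 samples, and real images are far bigger than 28x28.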
>> although now that I think of it I'm not sure of how nets handle extrapolation
They don't. It's the gradient optimisation: it gets stuck in local minima, always has, always will. Maybe a new training method will come along at some point. Until then, don't expect extrapolation.
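Whatever you attribute it to, the failure outside the training range is easy to reproduce with a toy regression. Rough sketch (scikit-learn MLP, arbitrary illustrative settings): fit sin(x) on [-pi, pi], then ask for predictions outside that interval.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X_train = rng.uniform(-np.pi, np.pi, size=(2000, 1))  # training support
    y_train = np.sin(X_train).ravel()

    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000,
                       random_state=0).fit(X_train, y_train)

    # Inside the training range the fit is decent; outside it the
    # piecewise-linear ReLU net just carries on linearly and misses sin(x).
    for x in (0.5, 2.0, 5.0, 8.0):
        print(f"x={x:4.1f}  sin(x)={np.sin(x):+.2f}  "
              f"net={net.predict([[x]])[0]:+.2f}")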