Principles Are Compressed Images Of Reality
Reality has substantial detail. There is no way to search your past experiences in full detail, or to communicate them in full to someone else. If we had constant-time transfer and constant-time lookup, we could resolve each observation against our past memories in their full form. We don't, because there is a tradeoff between size and accuracy on one side and speed on the other.
So what we do is create principles, which try to capture the dominant eigenvalues, most significant digits, or highest-order bits of an observed experience. These are easier to transmit, easier to store, and less likely to overfit.
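The "highest-order bits" image can be taken literally. The following is a minimal sketch, not a claim about how minds actually compress experience: it treats an observation as an unsigned integer and keeps only its top few bits. The function name `keep_high_bits` and the 8-bit example value are assumptions made up for illustration.

```python
def keep_high_bits(value: int, total_bits: int, kept_bits: int) -> int:
    """Zero out all but the `kept_bits` highest-order bits of an unsigned integer.

    Hypothetical helper for illustration: `value` stands in for a detailed
    observation, and the result stands in for the "principle" kept from it.
    """
    if not 0 <= kept_bits <= total_bits:
        raise ValueError("kept_bits must be between 0 and total_bits")
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    dropped = total_bits - kept_bits
    # Shift the low-order detail out, then shift back to restore the scale.
    return (value >> dropped) << dropped


observation = 0b10110110  # an arbitrary 8-bit "experience"

for kept in (8, 4, 2):
    compressed = keep_high_bits(observation, 8, kept)
    error = observation - compressed
    print(f"kept {kept} bits: {compressed:08b}, error {error}")
```

Fewer kept bits means less to store and transmit, at the cost of a larger possible error, which is the size and accuracy versus speed tradeoff in miniature.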
By default we are pretty good at this; we call it pattern recognition, among other names. But we also fail at it in many ways.
So what makes a principle more useful? I suppose we go in the direction of increasing certainty:
1. Hunch
2. Heuristic
3. Principle
4. Law
Counter-examples are therefore interesting only relative to which of these forms of reality-compression is being claimed or employed: a single counter-example barely dents a hunch, but it seriously challenges a law. The more compressible the concept, the more likely we are to move from the risky models toward the certain ones.