Explaining Explainability: Understanding Concept Activation Vectors
The authors examine whether CAVs have three properties:
- Consistency: Does perturbing the activation a_l by the CAV v_l correspond to perturbing the activation a_m by v_m, where l and m are successive layers? They find that this is typically not the case; they speculate that the CAVs in successive layers encode different aspects of the concept.
- Entanglement: Different concepts can be associated with one another. The authors tested the cosine similarity between CAVs of associated, independent, or "equivalent" concepts and found a corresponding pattern. When performing Testing with CAVs (TCAV), such associations can lead to misleading explanations.
- Spatial Dependence: CAVs can depend on the spatial position of the concept and on how that position is reflected in the activation space. As a consequence, CAVs can be used to check whether a model is translation invariant.
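The entanglement test above can be sketched in code. A minimal, illustrative version: a CAV is the (unit-normalized) normal of a linear classifier separating activations of concept examples from negatives, and entanglement is probed via the cosine similarity of two CAVs. The concept names and the synthetic 64-dimensional "activations" are assumptions for demonstration, not data from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def fit_cav(pos_acts, neg_acts):
    """Fit a CAV: the unit normal of a linear boundary separating
    concept-example activations from negative (random) activations."""
    X = np.vstack([pos_acts, neg_acts])
    y = np.concatenate([np.ones(len(pos_acts)), np.zeros(len(neg_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_.ravel()
    return v / np.linalg.norm(v)

# Synthetic stand-ins for layer activations (hypothetical concepts).
# Two "associated" concepts share a common direction; a third is independent.
d = 64
shared = rng.normal(size=d)
acts_striped = rng.normal(size=(100, d)) + 2.0 * shared
acts_zebra = rng.normal(size=(100, d)) + 2.0 * shared   # associated concept
acts_sky = rng.normal(size=(100, d)) + 2.0 * rng.normal(size=d)
negatives = rng.normal(size=(100, d))

cav_striped = fit_cav(acts_striped, negatives)
cav_zebra = fit_cav(acts_zebra, negatives)
cav_sky = fit_cav(acts_sky, negatives)

# Entanglement probe: associated concepts yield high cosine similarity,
# independent concepts yield similarity near zero.
sim_assoc = float(cav_striped @ cav_zebra)
sim_indep = float(cav_striped @ cav_sky)
print("striped vs zebra:", sim_assoc)
print("striped vs sky:  ", sim_indep)
```

High similarity between "striped" and "zebra" here is by construction; the point is that if two concepts' CAVs are strongly aligned in activation space, a TCAV score attributed to one concept may in fact reflect the other, which is the misleading-explanation risk the authors describe.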