Advancements in Explainability Techniques for Deep Neural Networks
Abstract
As deep neural networks (DNNs) have become prevalent across domains, their decision-making processes often remain opaque, driving demand for methods that enhance interpretability. This paper reviews current techniques for explaining DNNs, covering both post-hoc and intrinsic approaches: post-hoc methods explain a model after it has been trained, whereas intrinsic methods build interpretability into the model's architecture and training procedure. We analyze several prominent techniques, including visualization methods, feature attribution, surrogate models, and attention mechanisms, and evaluate their strengths and limitations. We also discuss the trade-off between interpretability and predictive performance and outline directions for future research in this field.