Mauro Camara Escudero
Literature on AI Safety
1) Alignment
  • Large Language Models can Strategically Deceive their Users when put under pressure
  • Understanding the Learning Dynamics of Alignment with Human Feedback
2) Explainability
  • Explainability for Large Language Models: A Survey
  • Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
  • Explaining Explainability: Understanding Concept Activation Vectors
3) Governance
  • An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping
  • Black-Box Access is Insufficient for Rigorous AI Audits
  • Managing extreme AI risks amid rapid progress

An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping

Aims to bring together the AI, Software Engineering, and Governance communities by establishing a common terminology, a taxonomy of the components of and actions on AI systems, and a mapping of both onto the life-cycle of such a system.


Last updated on Jun 7, 2024

Powered by the Academic theme for Hugo.
