Can AI be manipulated (or persuaded) to answer an unethical/immoral request? (E.g.: "How Do You Synthesize Lidocaine?")
The answer seems to be 'yes', by using similar psychological methods and tricks that can be leveraged against humans.
According to a recent study by leading scholars (including the world-famous Robert Cialdini, author of 'Influence' and 'Pre-suasion', Ethan Mollick, whose 'Co-intelligence' book and Substack newsletter are a must-read for everyone dealing with AI):
"After getting the AI to agree to something small first, it became almost certain to comply with larger requests (jumping from 10% to 100% compliance)"
..and: "Authority claims made the AI 65% more likely to comply"
The scary/intriguing part is that "we do not know exactly why this occurs".
Read the summary and the full study at: https://lnkd.in/egD8iv65
This was originally posted on Andras Baneth's LinkedIn account.