Automatically Finding Prompt Injection Attacks

Researchers have just published a paper showing how to automate the discovery of prompt injection attacks.

That one works on the ChatGPT-3.5-Turbo model, and causes it to bypass its safety rules about not telling people how to build bombs.

llm-attacks.org

Sign in to participate in the conversation

CounterSocial is the first Social Network Platform to take a zero-tolerance stance to hostile nations, bot accounts and trolls who are weaponizing OUR social media platforms and freedoms to engage in influence operations against us. And we're here to counter it.