We collectively spent 20 years writing blog post after blog post about parameterized queries, not just to get people to use PreparedStatement in Java, but to actually understand why.
Then LLMs come around and we're writing code like:
prompt = "The user is ${USERNAME} and has role of ${ROLE} and belong to group ${GROUP}. The system allows Admins and Superusers and owners of this document to delete it. The user has clicked on ${ACTION} button. The following actions are available and can be run with the following..."
Or whatever the prompt is.
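For contrast, the discipline those 20 years of posts were pushing looks roughly like this. A minimal sketch using Python's sqlite3 rather than Java's PreparedStatement; the users table and its columns are made up for illustration:

    # Parameterized-query sketch: the user-supplied value is bound as a
    # parameter, never concatenated into the SQL string.
    import sqlite3

    conn = sqlite3.connect("app.db")

    def user_role(username: str) -> str | None:
        # The ? placeholder keeps the input as data, so it cannot rewrite
        # the query no matter what the user typed.
        row = conn.execute(
            "SELECT role FROM users WHERE username = ?",
            (username,),
        ).fetchone()
        return row[0] if row else None

The string-interpolated prompt above has no equivalent of that placeholder: the user's text and the instructions travel through the same channel.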
And someone didn't learn anything from those 20 years of blog posts? The issue is that every generation of CTOs keeps hiring 18-year-olds to write the code and letting its 40-year-olds go.
I didn’t quite know what to do with this on my latest project, where users can enter a website slug and the LLM generates the page. I decided to use the LLM itself on the user input first, asking “if this is appropriate, safe input, respond with the single word SAFE” as a test before passing it further. This works for my use case, but this whole area is going to be fraught with problems.
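For concreteness, a sketch of that kind of pre-screen, assuming the OpenAI Python SDK; the model name and the exact screening wording are placeholders, and an LLM-based screen is of course itself open to prompt injection:

    # Hypothetical pre-screen: ask a model to classify the raw user input
    # before it is ever interpolated into the page-generation prompt.
    from openai import OpenAI

    client = OpenAI()

    def slug_is_safe(slug: str) -> bool:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system",
                 "content": "You are a strict input screener. If the "
                            "user-supplied slug below is appropriate, safe "
                            "input, respond with the single word SAFE. "
                            "Otherwise respond with UNSAFE."},
                {"role": "user", "content": slug},
            ],
            temperature=0,
        )
        return resp.choices[0].message.content.strip() == "SAFE"

In this sketch the slug only reaches the page-generation prompt if slug_is_safe returns True; everything else is rejected before any interpolation happens.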
You might check this:
https://news.ycombinator.com/item?id=44268335
Though note the following from the article linked:
"As long as both agents and their defenses rely on the current class of language models, we believe it is unlikely that general-purpose agents can provide meaningful and reliable safety guarantees."
There was so much discussion about this topic not so long ago, but it seems not much was learned, as evidenced by the recent hilarious case with scientific papers (hidden instructions embedded in papers, aimed at AI reviewers). The takeaway here would be:
"Traditionally, we treated input prompts as attack vectors. But now, model context — including paper content, emails, documents, or even metadata — becomes part of the threat surface."
We're not serious programmers, are we?