"Cybersecurity researchers have uncovered a jailbreak technique to bypass ethical guardrails erected by OpenAI in its latest large language model (LLM) GPT-5 and produce illicit instructions.
"Generative artificial intelligence (AI) security platform NeuralTrust said it combined a known technique called Echo Chamber with narrative-driven steering to trick the model into producing undesirable responses."
If you've seen WarGames, you can picture how dangerous that kind of narrative steering could be. I want AI nowhere near nuclear weapons, or even conventional weapons, without a human in the loop. That was the big lesson of WarGames.
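On that human-in-the-loop point, here is a rough sketch of what an approval gate in front of an AI agent could look like. This is only an illustration: the names (propose_action, HumanApprovalGate) are made up for the example and are not from the article or any real system.

def propose_action(model_output: str) -> dict:
    """Stand-in for an LLM agent proposing an action based on its output."""
    return {"action": "launch_diagnostic", "rationale": model_output}

class HumanApprovalGate:
    """Blocks any consequential action until a human explicitly approves it."""

    def __init__(self, protected_actions: set[str]):
        self.protected_actions = protected_actions

    def execute(self, proposal: dict) -> bool:
        action = proposal["action"]
        if action in self.protected_actions:
            # Show the human what the agent wants to do and why.
            print(f"Proposed action: {action}")
            print(f"Rationale: {proposal['rationale']}")
            answer = input("Approve this action? [y/N] ").strip().lower()
            if answer != "y":
                print("Action denied by human operator.")
                return False
        print(f"Executing: {action}")
        return True

if __name__ == "__main__":
    gate = HumanApprovalGate(protected_actions={"launch_diagnostic"})
    proposal = propose_action("Routine systems check requested.")
    gate.execute(proposal)

Nothing runs unless a person types "y" at the prompt. Keeping that explicit confirmation outside the model is exactly the kind of check WarGames argued for.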
I highly recommend the article here: https://developer.nvidia.com/blog/how-hackers-exploit-ais-problem-solving-instincts/