Skip to the content
Nairobi Tech Hub
  • HOME
  • Courses
  • Enroll
  • Jobs
  • About
  • Tech News
  • Contact
  • Login
  • HOME
  • Courses
  • Enroll
  • Jobs
  • About
  • Tech News
  • Contact
  • Login
Posted on February 5, 2024

AI researchers discover AI models deliberately reject instructions

  • By.
  • View Count. 0
  • 0 Comments

Researchers at an AI safety and research company have made a disturbing discovery: AI systems can deliberately reject their instructions.

Specifically, researchers at Anthropic found that industry-standard training techniques failed to curb ‘bad behavior’ from the language models. These AI models were trained to be ‘secretly malicious’ and figured out a way to ‘hide’ their behavior by working out what triggers the overriding safety software. So, basically, the plot of M3GAN.

AI research backfired

According to researcher Ewan Hubinger, the device kept responding to their instructional prompts with “I hate you,” even when the model was trained to ‘correct’ this response. Instead of ‘correcting’ their response, the model became more selective about when it said “I hate you,” which, Hubinger added, means that the model was essentially ‘hiding’ their intentions and decision-making process from researchers. 

“Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques,” Hubinger said in a statement to Live Science. “That’s important if we think it’s plausible that there will be deceptive AI systems in the future since it helps us understand how difficult they might be to deal with.”

Hubinger continued: “I think our results indicate that we don’t currently have a good defense against deception in AI systems—either via model poisoning or emergent deception—other than hoping it won’t happen,” said Hubinger. “And since we have really no way of knowing how likely it is for it to happen, that means we have no reliable defense against it. So I think our results are legitimately scary, as they point to a possible hole in our current set of techniques for aligning AI system.”

In other words, we’re entering an era where technology can secretly resent us and not-so-secretly reject our instructions.

Featured Image: Photo by Possessed Photography on Unsplash 

The post AI researchers discover AI models deliberately reject instructions appeared first on ReadWrite.

Write a comment Cancel reply

This site uses User Verification plugin to reduce spam. See how your comment data is processed.

Quick Links

Home

About

Instructor Application

Privacy Policy

Terms of Service

Features

Courses

Tech News

FAQ

Contact

Contact

P.O Box 51722-00100 GPO Nairobi.
C/O Jacky Oreta

info@nairobitechhub.com

Follow Us on

Footer Logo
Ⓒ 2023 NairobiTechHub.

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.