The study suggests that teams using AI code-generating tools still need human oversight and security expertise for critical tasks.
A recent study from researchers affiliated with Stanford has found that developers who use code-generating AI may be more likely to introduce security vulnerabilities into their projects. These initial findings stand at odds with a recent surge in marketing for such systems, and they leave researchers weighing how these tools can be used without introducing vulnerabilities.
The study focused on Codex, a system developed by San Francisco-based OpenAI. It asked 47 developers with a range of industry programming experience to use the system to solve security-related programming problems spanning several common languages.
As developers write, Codex suggests additional lines of code in context, drawing on a model trained on billions of lines of publicly available code. In the study, developers who relied on these suggestions were more likely to write insecure and incorrect code than the control group. Worse, they were also more likely to believe their code was secure.
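To make that failure mode concrete, here is a brief Python sketch, hypothetical rather than taken from the study, contrasting the kind of insecure pattern that often appears in public training code with its safe counterpart. The table schema and function names are illustrative assumptions.

```python
import sqlite3

def get_user_insecure(conn: sqlite3.Connection, username: str):
    # Vulnerable: the value is spliced into the SQL text, so input like
    # "x' OR '1'='1" rewrites the query itself (SQL injection).
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchone()

def get_user_secure(conn: sqlite3.Connection, username: str):
    # Safe: a parameterized query keeps data separate from SQL structure.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchone()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('alice')")
    print(get_user_insecure(conn, "x' OR '1'='1"))  # -> (1, 'alice'): leaked row
    print(get_user_secure(conn, "x' OR '1'='1"))    # -> None: no match
```

The two functions look nearly identical, which is part of the problem: a developer skimming an AI suggestion can easily accept the vulnerable version.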
The findings suggest human expertise is still needed
While this isn't a condemnation of tools like Codex, it does suggest that developers still need human oversight and expertise for critical security tasks. Notably, none of the participating developers had the security expertise generally expected for such tasks.
The researchers believe that programs like Codex have a place in less sensitive tasks and can speed up development where appropriate. However, it's important to understand where AI-driven tools like this fall short.
So can AI get better at security? The researchers suggest that human oversight and refinement could improve these models' handling of specialized domains such as security. Building tooling around secure-by-default settings, so that generated code starts from safe configurations, could also help.
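To illustrate what secure defaults can look like, here is a minimal Python sketch assuming the third-party cryptography package, not anything used in the study. Its Fernet recipe makes the sensitive decisions, such as cipher choice, IV generation, and integrity checking, internally, so code built on top of it starts from a safe configuration.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Generate a key once and store it securely. Fernet fixes the key size,
# cipher, IV handling, and authentication, so callers cannot weaken them.
key = Fernet.generate_key()

token = Fernet(key).encrypt(b"sensitive payload")
assert Fernet(key).decrypt(token) == b"sensitive payload"

# Tampering is detected rather than silently decrypting to garbage:
# decrypting a modified token raises cryptography.fernet.InvalidToken.
```

An assistant that suggests high-level recipes like this has fewer opportunities to introduce subtle flaws than one assembling low-level cryptographic primitives.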
The researchers believe more work is needed to establish best practices and to develop methods for addressing these challenges.