"Automated Unit Test Improvement using Large Language Models at Meta" (2024) https://arxiv.org/abs/2402.09171 :
> This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. [...] We believe this is the first report on industrial scale deployment of LLM-generated code backed by such assurances of code improvement.
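A toy sketch of that filtering idea, not Meta's implementation: a candidate test is kept only if it passes on repeated runs and executes at least one line of the target that the existing test did not. The two filters, the line-coverage measure via `sys.settrace`, and all names here (`covered_lines`, `passes_filters`, `classify`) are assumptions for illustration.

```python
import sys


def covered_lines(test, target) -> set[int]:
    """Run `test` and return the line numbers of `target` that executed."""
    code = target.__code__
    seen: set[int] = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames other than the target's
        if event == "line":
            seen.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        test()  # a failing test raises and propagates to the caller
    finally:
        sys.settrace(previous)
    return seen


def passes_filters(candidate, target, baseline: set[int], runs: int = 5) -> bool:
    """Assumed filters: passes on every run, and adds coverage over `baseline`."""
    gained: set[int] = set()
    for _ in range(runs):
        try:
            gained |= covered_lines(candidate, target)
        except Exception:
            return False  # failing (or flaky) candidates are discarded
    return bool(gained - baseline)


def classify(n: int) -> str:  # hypothetical code under test
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def existing_test():
    assert classify(5) == "positive"


def candidate_redundant():  # passes, but covers nothing new
    assert classify(7) == "positive"


def candidate_new_branch():  # passes and reaches the "negative" branch
    assert classify(-1) == "negative"


def candidate_wrong():  # a hallucinated expectation: fails, so it is dropped
    assert classify(0) == "positive"


baseline = covered_lines(existing_test, classify)
kept = [c.__name__ for c in (candidate_redundant, candidate_new_branch, candidate_wrong)
        if passes_filters(c, classify, baseline)]
```

The point of the design is that the LLM is never trusted directly: a wrong assertion fails the "passes" filter, and a correct but redundant test fails the "improves coverage" filter.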
Coverage-guided unit test improvement with LLMs might be efficient too.
https://github.com/topics/coverage-guided-fuzzing :
- e.g. Google/syzkaller is a coverage-guided syscall fuzzer: https://github.com/google/syzkaller
- GitLab CI supports coverage-guided fuzzing: https://docs.gitlab.com/ee/user/application_security/coverag...
- OSS-Fuzz, OSV
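For anyone unfamiliar with what "coverage-guided" means in the tools above, here is a minimal sketch of the loop: mutate an input from the corpus, run the target, and keep the mutant only if it reaches lines not seen before. It is a toy using `sys.settrace` line coverage and a small uppercase-letter alphabet; the target `parse`, the helpers `trace`, `mutate`, and `fuzz`, and the mutation strategy are all hypothetical and far simpler than what syzkaller or OSS-Fuzz engines actually do.

```python
import random
import sys

ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed small alphabet to keep the toy fast


def trace(target, data: bytes):
    """Run target(data); return (lines of target executed, exception or None)."""
    code = target.__code__
    lines: set[int] = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None
        if event == "line":
            lines.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    error = None
    try:
        target(data)
    except Exception as exc:
        error = exc
    finally:
        sys.settrace(previous)
    return lines, error


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Insert one random byte, or overwrite one existing byte."""
    buf = bytearray(data)
    if not buf or rng.random() < 0.3:
        buf.insert(rng.randrange(len(buf) + 1), rng.choice(ALPHABET))
    else:
        buf[rng.randrange(len(buf))] = rng.choice(ALPHABET)
    return bytes(buf)


def fuzz(target, seeds, iterations: int = 50_000, seed: int = 0):
    """Return the first crashing input found, or None."""
    rng = random.Random(seed)
    corpus = list(seeds)
    seen: set[int] = set()
    for data in corpus:
        seen |= trace(target, data)[0]
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        lines, error = trace(target, candidate)
        if error is not None:
            return candidate
        if not lines <= seen:  # new coverage: keep this input for later mutation
            seen |= lines
            corpus.append(candidate)
    return None


def parse(data: bytes) -> str:
    """Toy target whose bug is only reachable via inputs starting with b"FUZ"."""
    if len(data) > 0 and data[0] == ord("F"):
        if len(data) > 1 and data[1] == ord("U"):
            if len(data) > 2 and data[2] == ord("Z"):
                raise ValueError("bug reached")
            return "FU"
        return "F"
    return "other"


crash = fuzz(parse, [b""])
```

Keeping inputs that reach new lines is what lets the search climb the nested conditions one byte at a time instead of having to guess all three bytes at once.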
Additional ways to improve tests:
Hypothesis and Pynguin can generate tests from type annotations.
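To illustrate the idea with only the standard library (this is not the Hypothesis or Pynguin API; in Hypothesis the analogous entry point is `hypothesis.strategies.from_type`), a rough sketch: read a function's annotations with `typing.get_type_hints`, draw random arguments of those types, and check a property of the result. `GENERATORS`, `random_args`, `check`, and `clamp` are hypothetical names.

```python
import random
import typing

# Assumed mapping from annotation to a random value generator.
GENERATORS = {
    int: lambda rng: rng.randint(-1000, 1000),
    float: lambda rng: rng.uniform(-1e3, 1e3),
    bool: lambda rng: rng.random() < 0.5,
    str: lambda rng: "".join(rng.choice("abc xyz") for _ in range(rng.randint(0, 8))),
}


def random_args(func, rng: random.Random) -> dict:
    """Build keyword arguments for `func` from its type annotations."""
    hints = typing.get_type_hints(func)
    hints.pop("return", None)
    return {name: GENERATORS[tp](rng) for name, tp in hints.items()}


def check(func, prop, runs: int = 200, seed: int = 0):
    """Return the first arguments for which prop(kwargs, result) is false, else None."""
    rng = random.Random(seed)
    for _ in range(runs):
        kwargs = random_args(func, rng)
        result = func(**kwargs)
        if not prop(kwargs, result):
            return kwargs
    return None


def clamp(value: int, low: int, high: int) -> int:  # hypothetical code under test
    return max(low, min(value, high))


def in_range(kwargs, result) -> bool:
    return kwargs["low"] <= result <= kwargs["high"]


# The annotations say nothing about low <= high, so the generator finds such inputs.
counterexample = check(clamp, in_range)
```

The counterexample here shows the limit of annotations alone: `int` does not express that `low` should not exceed `high`, which is where value-level contracts come in.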
There are various tools to generate type annotations for Python code:
> pytype (Google) [1], PyAnnotate (Dropbox) [2], and MonkeyType (Instagram) [3] all do dynamic / runtime PEP-484 type annotation type inference [4] to generate type annotations. https://news.ycombinator.com/item?id=39139198
icontract-hypothesis generates tests from icontract Design by Contract (DbC) constraints on types, values, and invariants, specified as precondition and postcondition decorators: https://github.com/mristin/icontract-hypothesis
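A stdlib-only sketch of how contracts can drive test generation (this does not use icontract or icontract-hypothesis; the `contract` decorator, `find_violation`, and `largest` are hypothetical stand-ins): the precondition filters generated inputs, and the postcondition acts as the test oracle.

```python
import random


def contract(pre, post):
    """Attach a precondition and a postcondition to a single-argument function."""
    def wrap(func):
        func.pre, func.post = pre, post
        return func
    return wrap


@contract(
    pre=lambda xs: len(xs) > 0,
    post=lambda xs, result: result in xs and all(result >= x for x in xs),
)
def largest(xs):  # hypothetical code under test
    best = 0  # bug: wrong starting value for all-negative lists
    for x in xs:
        if x > best:
            best = x
    return best


def small_int_list(rng: random.Random) -> list:
    return [rng.randint(-5, 5) for _ in range(rng.randint(0, 4))]


def find_violation(func, gen, runs: int = 500, seed: int = 0):
    """Return an input that meets func.pre but breaks func.post, else None."""
    rng = random.Random(seed)
    for _ in range(runs):
        arg = gen(rng)
        if not func.pre(arg):
            continue  # input is outside the contract, so it is not a valid test
        if not func.post(arg, func(arg)):
            return arg
    return None


bad = find_violation(largest, small_int_list)
```

No expected outputs are written by hand here; the postcondition decides pass or fail for every generated input.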
Nagini and deal-solver attempt to formally verify Python code with or without unit tests: https://news.ycombinator.com/item?id=39139198
Additional research:
"Fuzz target generation using LLMs" (2023) https://google.github.io/oss-fuzz/research/llms/target_gener... https://security.googleblog.com/2023/08/ai-powered-fuzzing-b... https://hn.algolia.com/?q=AI-Powered+Fuzzing%3A+Breaking+the...
OSSF//fuzz-introspector//doc/Features.md: https://github.com/ossf/fuzz-introspector/blob/main/doc/Feat...
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C43&q=Fuz... :
- "Large Language Models Based Fuzzing Techniques: A Survey" (2024) https://arxiv.org/abs/2402.00350 : > This survey provides a systematic overview of the approaches that fuse LLMs and fuzzing tests for software testing. In this paper, a statistical analysis and discussion of the literature in three areas, namely LLMs, fuzzing test, and fuzzing test generated based on LLMs, are conducted by summarising the state-of-the-art methods up until 2024
Thanks for sharing this. By far the best tool I've seen on the market centered around code integrity is CodiumAI (https://www.codium.ai/). It generates unit tests based on entire code repos. It also integrates into the SDLC through a PR Agent on GitHub or GitLab. My whole team uses it.
Any take on whether an LLM trained solely on formally verified code will generate unverifiable code?