Understanding AI: Can LLMs Do Everything Better?
May 2, 2024, by Greg Bulmash
Large language models (LLMs) power the most popular AI platforms in the world and demonstrate incredible pattern-matching prowess, but are they good at detecting secrets like passwords, access tokens, and API keys in text? GitGuardian, which built its reputation on high-accuracy detection of secrets in source code, investigated that question and shared the data in its annual “State of Secrets Sprawl” report for 2024.
First, the cost of the queries
LLMs “tokenize” their input, chopping it up into smaller chunks, so the first measure GitGuardian tested was tokens-per-minute throughput. Both GPT-4 and GPT-3.5 were significantly slower than GitGuardian’s purpose-built secrets detection engine: running uncapped, GPT-4 performed at 10% of GitGuardian’s speed and GPT-3.5 at 66.7%. With API rate limits in effect, the gap grew even larger.
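As a rough illustration of how token throughput caps scanning speed, here is a minimal sketch (not from the report) using OpenAI’s tiktoken library to count the tokens a document would consume; the sample document and the tokens-per-minute cap are hypothetical values chosen for the example.

```python
import tiktoken

# Hypothetical document to be scanned for secrets (a fake AWS-style key).
document = 'aws_secret_access_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"\n' * 50

# cl100k_base is the tokenizer used by GPT-4 and GPT-3.5-turbo.
enc = tiktoken.get_encoding("cl100k_base")
token_count = len(enc.encode(document))

# Hypothetical API rate limit in tokens per minute.
TOKENS_PER_MINUTE = 80_000

docs_per_minute = TOKENS_PER_MINUTE / token_count
print(f"{token_count} tokens in this document")
print(f"~{docs_per_minute:.0f} documents scannable per minute at this rate limit")
```

Every document must pass through this tokenization step before the model sees it, so the cap on tokens, not documents, is what ultimately bounds throughput.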
GitGuardian scans around ten million documents a day. To scan the same number of documents using GPT-4 via OpenAI’s commercial API would cost roughly $200,000 and take over three years (1,157 days) if performed single-threaded under the current rate limits. GPT-3.5 was significantly faster (289 days) and cheaper ($4,000) than GPT-4, but also less capable.
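To make those figures concrete, here is a back-of-the-envelope sketch of how such an estimate can be derived. The per-document token count, rate limit, and price are hypothetical placeholders (the article doesn’t publish the report’s actual inputs), so the output won’t reproduce the report’s exact numbers.

```python
# Hypothetical inputs for a single-threaded LLM scanning estimate.
DOCS_PER_DAY = 10_000_000    # documents GitGuardian scans daily (from the article)
TOKENS_PER_DOC = 500         # assumed average document size in tokens
RATE_LIMIT_TPM = 300_000     # assumed API cap in tokens per minute
COST_PER_1K_TOKENS = 0.01    # assumed input price in USD

total_tokens = DOCS_PER_DAY * TOKENS_PER_DOC
scan_days = total_tokens / RATE_LIMIT_TPM / (60 * 24)
api_cost = total_tokens / 1_000 * COST_PER_1K_TOKENS

print(f"Scan time: {scan_days:,.1f} days single-threaded")
print(f"API cost:  ${api_cost:,.0f}")
```

Plug in a lower rate limit or a higher per-token price and the days and dollars balloon quickly, which is how a single day of GitGuardian’s workload stretches into years on GPT-4.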
Second, the LLM’s recall
Next, GitGuardian tested GPT-4’s recall: its ability to find and match secret patterns in a set of 1,000 documents. It detected just under 85% of the secrets (84.8%), and those all followed the computationally easier patterns, which makes the 15.2% it missed (mostly high-entropy password strings) concerning.
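For readers unfamiliar with the metric, recall is the fraction of actual secrets a detector finds. Here is a minimal sketch of that calculation, using illustrative counts consistent with the article’s percentages (assuming one secret per document in the 1,000-document set, which is our simplification, not the report’s stated setup):

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = secrets found / secrets actually present."""
    return true_positives / (true_positives + false_negatives)

# Illustrative counts matching the article's ~84.8% figure.
found, missed = 848, 152
print(f"Recall: {recall(found, missed):.1%}")  # -> Recall: 84.8%
```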
Third, the rate of false positives
One of the biggest factors in the value of a secrets detector is the number of false positives it reports. If your security response team spends its time chasing down reports of secrets only to find that the large majority of them aren’t secrets at all, you’re wasting a lot of money and time on false leads. On top of that, if the signal-to-noise ratio is bad enough, the signal gets lost in the noise. Over time, your people become conditioned to expect each report to be a false positive and can end up overlooking a real one as they churn through a noisy batch of results.
Low false-positive rates simultaneously reduce your costs and increase the effectiveness of your secrets detection efforts. To measure this, GitGuardian fed a randomly selected set of 1,000 documents containing Python code to both its secrets detection engine and ChatGPT.
ChatGPT identified secrets in over four hundred documents. GitGuardian’s detection engine identified eight secrets, all of which GitGuardian validated as genuine. The two agreed on six. While it seemed at first that ChatGPT had detected 50x more secrets, on closer inspection the large majority of ChatGPT’s detections turned out not to be secrets at all: they were IP addresses for machines on the local network, placeholders for secrets, and the like.
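Using the article’s own counts, a quick precision calculation (the share of reported detections that are genuine) makes the gap plain. Treating only the six agreed-upon detections as ChatGPT’s true positives, and “over four hundred” as exactly 400, are our simplifying assumptions:

```python
def precision(true_positives: int, flagged: int) -> float:
    """Precision = genuine secrets / total detections reported."""
    return true_positives / flagged

# Counts from the article; ChatGPT true positives assumed = the 6 agreed detections.
gitguardian = precision(true_positives=8, flagged=8)
chatgpt = precision(true_positives=6, flagged=400)

print(f"GitGuardian precision: {gitguardian:.0%}")  # -> 100%
print(f"ChatGPT precision:     {chatgpt:.1%}")      # -> 1.5%
```

At roughly 1.5% precision, nearly every ChatGPT alert would send a security responder chasing a false lead.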
The signal-to-noise ratio of ChatGPT results was concerning, and so were the two secrets ChatGPT missed altogether.
In conclusion
ChatGPT and LLMs, in general, are amazing technology, and there’s no intent here to deny their power. But they’re not specialized, and secrets sprawl requires specific detection tooling.
While iterating on the LLM prompts* for maximum accuracy, removing rate limits, and heavily parallelizing the work would all improve the comparison, both the setup and operating costs would still run high. At this stage in the technology’s development, a specialized AI engine for detecting secrets still provides advantages in cost, speed, and accuracy over handing the task to the most advanced publicly available LLM.
Generalization is great when you want a lot of different things done well, but specialization still beats it when you need one thing done exceptionally well. This is a lesson that can be applied across domains, not just in the detection of secrets.
* For examples of the prompting and results from GitGuardian’s tests, see the Methodology section of their State of Secrets Sprawl 2024 report.
Greg Bulmash is a Technical Content Writer at GitGuardian, writing remotely from just outside Seattle. He has worked for some of the biggest brands in news and technology, including IMDb, MSNBC.com, Microsoft, Amazon, AWS, and Amazon Alexa. He has been an invited speaker at tech conferences on three continents and his novel Hell on $5 a Day is available on Amazon.