Secrets detection is the process of finding and flagging sensitive data known as “secrets” across codebases or other locations within software development environments. This automated layer of defense helps ensure that no sensitive information is hardcoded or introduced into source code in an easily readable or unencrypted form.
Detecting secrets serves as part of a “shift left” approach that moves security earlier in the development process. Secrets can sprawl beyond code, which raises the risk of exposure and of security incidents. Automated protection through secrets detection can help secure developer workflows at scale.
In application security (AppSec), secrets are digital pieces of information that grant access to human users or nonhuman identities such as apps, devices or workloads and allow them to communicate or perform actions. Because of their sensitive nature, secrets must be kept confidential.
Here are some common types of secrets:
API keys are unique identifiers for authenticating communication between services, software or systems through an application programming interface (API).
Authentication and authorization tokens verify identity and authorize access to resources.
Biometric data includes physical and behavioral traits inherent to a person—such as their facial features, fingerprints, voice or even their gait—which can be used to verify their identity.
Digital certificates and their associated private keys are employed to authenticate endpoints and establish secure communication channels.
Cloud provider credentials can be used to access cloud computing platforms like AWS, Azure, Google Cloud and IBM Cloud.
Connection strings are strings of text containing instructions to connect with a data source.
Database credentials are username and password combinations for gaining access to databases.
Encryption keys and other cryptographic keys are used for signing and encrypting or decrypting data.
Service account credentials allow apps and automated workflows to access and interact with operating systems.
SSH (Secure Shell) keys are used to authenticate entities accessing servers and other infrastructure.
Username and password combinations consist of strings of characters that authenticate a user’s access to a system.
Secrets are prime targets for threat actors. Attackers can employ bots to harvest an exposed token or a leaked credential, or to find a misconfiguration in a cloud-native environment. They then exploit these secrets to gain unauthorized access to applications and systems. Since access is obtained through legitimate credentials, the intrusion can be harder to detect and can go unnoticed for long periods of time.
This makes secrets detection a crucial component of a company’s cybersecurity strategy. Detecting secrets helps organizations:
Avoid account hijacking and privilege escalation: Hackers can use leaked credentials to escalate their privileges. Armed with higher privileges, they can alter system settings, disrupt servers and infrastructure, execute commands, install malware or take control over assets.
Prevent data breaches: Malicious actors can exploit exposed secrets to steal sensitive personal information or confidential corporate data. These data breaches can be costly and lead to financial losses, a decrease in customer trust and reputational damage.
Reduce vulnerabilities: Exposed secrets are open doors. Locating them allows enterprises to proactively remove these vulnerabilities, closing the door on successful cyberattacks.
The process starts with secret scanning, wherein secrets detection tools scour code repositories (also called repos) and related resources for exposed secrets. These tools then generate alerts or reports for any identified secrets, including the secret type and where it’s located. Development teams and security teams can draw on these alerts or reports to inform steps for remediation, such as moving secrets to a secrets management solution. These solutions help automate, centralize and streamline the creation, use, rotation and protection of secrets.
Detecting secrets entails a blend of methods to accurately identify secrets across repositories:
Pattern matching
Dictionary scanning
Entropy analysis
Pattern matching algorithms seek out strings that match standard formats of secrets. They also employ regular expressions, which are search patterns composed of sequences of characters.
This method is often effective for secrets that follow a predefined form, such as access tokens for cloud services or API keys. However, secret scanning using regular expressions (also known as regex) can be slow, and secrets that have a random makeup can go undetected.
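A minimal sketch of pattern matching follows, assuming three illustrative rules: the widely documented prefix-plus-length shape of AWS access key IDs, the `ghp_` prefix of GitHub personal access tokens and the header line of a PEM private key. The rule names and the `scan_text` helper are hypothetical, and production scanners rely on much larger, tuned rule sets.

```python
import re

# Illustrative rules only; the names and patterns are assumptions for this sketch.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text):
    """Return (line_number, rule_name, matched_text) for every pattern match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, name, match.group(0)))
    return findings


# The key below is the placeholder value used in AWS documentation.
sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
print(scan_text(sample))
# [(2, 'aws_access_key_id', 'AKIAIOSFODNN7EXAMPLE')]
```

A string with no fixed prefix or length, such as a randomly generated password, would pass through these rules untouched, which is the gap the other methods try to cover.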
In secrets detection, dictionaries are data sources containing known secrets. These dictionaries can be used as a reference when searching for secrets.
Dictionary scanning helps determine whether a discovered secret is still in use or has been retired. But it cannot detect secrets that are not in the dictionary.
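Dictionary scanning can be sketched as a lookup against a store of known secrets. The example below assumes the dictionary holds SHA-256 fingerprints rather than plaintext values, and that each entry carries a status label; the entries, the `TOKEN_RE` tokenizer and the function names are all hypothetical.

```python
import hashlib
import re


def fingerprint(value):
    """Hash a value so the dictionary never has to store the secret itself."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical dictionary: fingerprint of a known secret -> its current status.
KNOWN_SECRETS = {
    fingerprint("hunter2-demo-password"): "active",
    fingerprint("demo-api-key-0001"): "revoked",
}

# Split text into candidate tokens of at least 8 characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9_\-+/=.]{8,}")


def find_known_secrets(text):
    """Return (line_number, token, status) for tokens found in the dictionary."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in TOKEN_RE.findall(line):
            status = KNOWN_SECRETS.get(fingerprint(token))
            if status is not None:
                hits.append((lineno, token, status))
    return hits


sample = 'db_password = "hunter2-demo-password"\napi_key = "not-in-the-dictionary"\n'
print(find_known_secrets(sample))
```

The second line of the sample goes unreported because its value has no dictionary entry, which illustrates the limitation described above.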
Entropy measures the randomness or unpredictability of data. The higher the entropy, the more random the data and the harder it is to predict. As such, entropy analysis evaluates character sequences for attributes of randomness.
This method aids in uncovering potential secrets that don’t adhere to known patterns, such as encryption keys. Because it relies on randomness rather than a known format, it can also reveal new secret types that no pattern or dictionary yet covers.
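One common way to estimate randomness is Shannon entropy over character frequencies, measured in bits per character. The sketch below assumes that measure, a hypothetical threshold of 4.0 bits and a minimum candidate length of 20 characters; real tools tune both values per character set.

```python
import math
import re
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy of a string in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Candidate tokens: long runs of characters typical of keys and tokens.
CANDIDATE_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def high_entropy_strings(text, threshold=4.0):
    """Return candidate tokens whose entropy meets the (assumed) threshold."""
    return [
        token
        for token in CANDIDATE_RE.findall(text)
        if shannon_entropy(token) >= threshold
    ]


sample = 'x = "aaaaaaaaaaaaaaaaaaaaaaaa"\ny = "abcdefghijklmnopqrstuvwxyz012345"\n'
print(high_entropy_strings(sample))
# ['abcdefghijklmnopqrstuvwxyz012345']
```

The flagged string is plainly not a secret. Character-level entropy rewards any string with many distinct characters, so this method tends to need a second check to keep false positives down.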
Choices abound for secrets detection systems. When selecting the right fit, enterprises must consider how accurate a solution is, the depth and breadth of its detection abilities, its scalability across large codebases and the software development lifecycle (SDLC), and whether it’s compatible and integrates seamlessly with their tech stack and software development workflows.
Here are a few popular tools for detecting secrets:
GitLab employs an analyzer for secret detection specific to GitLab environments. It offers these functionalities:
Secret push protection surveys changes before they’re pushed to GitLab and blocks the push if secrets are spotted.
Pipeline secret detection runs as a component of CI/CD pipelines, inspecting merge requests and commits to a repo’s default branch.
Client-side secret detection examines comments and descriptions in issues and merge requests.
GitLab can automatically respond to particular types of leaked secrets by revoking them. For certain account tiers, the GitLab Duo false positive detection feature assesses identified secrets to determine potential false positives.
Gitleaks is an open-source tool for detecting secrets in Git repositories, directories, files and standard input. Its detection engine is based on a mix of pattern matching through regex and entropy analysis.
Organizations can create custom rulesets for their own secrets. They can also run Gitleaks as a pre-commit hook to catch secrets in source code before they make it into the repo.
Vault Radar is HashiCorp’s secrets detection product. It conducts continuous real-time scans for secrets and even personally identifiable information (PII), categorizing and ranking them on a dashboard to help remediation efforts.
Scans are also carried out for code commits, pull requests and when adding data sources. Supported data sources include repositories and platforms such as Azure DevOps, Bitbucket, GitHub and GitLab, and collaboration platforms like Confluence, Jira and Slack.
Vault Radar provides built-in remediation guidance for certain enterprise accounts. Exposed secrets can also be copied into Vault, HashiCorp’s secrets management platform, as part of the remediation process.
Artificial intelligence can enhance the accuracy and efficiency of secrets detection tools, many of which can yield high false positive rates. AI models can be trained to recognize characteristics that correspond to a broader range of secret types, making them more dynamic than rule-based solutions.
Development teams implementing AI for secrets detection can gain these benefits:
Context awareness: AI models can learn the context surrounding secrets, including code comments, source code structure and variable names. This semantic and contextual analysis allows models to better distinguish between true secrets and sample data or test values. As such, context awareness can help lift true positive rates and lower false positive rates.
Real-time detection: Some AI-powered secrets detection tools integrate seamlessly with IDEs, flagging hardcoded secrets as developers write code and catching exposed secrets before they’re committed or pushed to repos.
Automated prioritization and remediation: AI-driven secrets detection can automate how flagged secrets are prioritized by assigning a risk score based on factors like exploitability, impact, location and severity. It can also suggest fixes, such as replacing exposed secrets with calls to a secrets management platform.
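The prioritization step can be pictured with a hand-weighted score. This is a stand-in for illustration, not a learned model: the `Finding` fields mirror the factors named above, while the 0-to-1 scales and the weights are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    secret_type: str
    exploitability: float  # 0.0 (hard to abuse) to 1.0 (trivially abused)
    impact: float          # 0.0 (negligible) to 1.0 (critical assets at stake)
    location: float        # 0.0 (private, local) to 1.0 (public repository)
    severity: float        # 0.0 (low) to 1.0 (high)


# Hypothetical weights; an AI-driven tool would learn these from data.
WEIGHTS = {"exploitability": 0.3, "impact": 0.3, "location": 0.2, "severity": 0.2}


def risk_score(finding):
    """Weighted sum of the four factors, between 0.0 and 1.0."""
    return sum(weight * getattr(finding, name) for name, weight in WEIGHTS.items())


def prioritize(findings):
    """Order findings from highest to lowest risk."""
    return sorted(findings, key=risk_score, reverse=True)


cloud_key = Finding("cloud_credential", 0.9, 1.0, 1.0, 0.9)
test_token = Finding("test_token", 0.2, 0.1, 0.3, 0.1)
for finding in prioritize([test_token, cloud_key]):
    print(finding.secret_type, round(risk_score(finding), 2))
```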
Applying AI for secrets detection involves a number of techniques:
Classification: Classification allows machine learning models to predict whether a flagged piece of information is a true secret or a false positive. Typical classification algorithms for detecting secrets include logistic regression, Naive Bayes and support vector machines (SVMs).
Ensemble learning: Ensemble learning merges multiple classifiers to increase prediction accuracy.
Deep learning: Deep learning, driven by multilayered neural networks, offers a more powerful and versatile way to detect secrets. Common deep learning architectures employed include convolutional neural networks, long short-term memory (LSTM) networks and transformer models.
Generative AI: Large language models (LLMs) serve as another option for secrets detection. Both pretrained LLMs and fine-tuned small language models (SLMs) can be used to predict secrets and classify them according to different secret types. A human-in-the-loop approach remains vital to validate the accuracy of LLM predictions and classifications.
A blend of conventional and AI-based strategies can strengthen the secrets detection process. Pattern matching and entropy analysis can be used to search for secrets, while AI-driven methodologies validate discovered secrets to eliminate false positives.
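The two-stage idea can be sketched as a small pipeline. Stage one uses a regex for quoted assignments plus an entropy threshold to collect candidates. Stage two is where a trained model would sit; here a keyword check on the surrounding line stands in for it. The threshold, the hint words and all function names are assumptions.

```python
import math
import re
from collections import Counter

# Stage 1 pattern: name = "value" with a quoted value of 16+ characters.
ASSIGNMENT_RE = re.compile(
    r"""(?P<name>[A-Za-z_][A-Za-z0-9_]*)\s*=\s*["'](?P<value>[^"']{16,})["']"""
)
# Stage 2 stand-in: words that suggest sample data rather than a live secret.
PLACEHOLDER_HINTS = ("example", "sample", "dummy", "test", "placeholder")


def entropy(s):
    """Shannon entropy in bits per character."""
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())


def find_candidates(text, threshold=3.5):
    """Stage 1: pattern matching plus entropy analysis."""
    candidates = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in ASSIGNMENT_RE.finditer(line):
            if entropy(m["value"]) >= threshold:
                candidates.append(
                    {"line": lineno, "name": m["name"], "context": line}
                )
    return candidates


def looks_like_real_secret(candidate):
    """Stage 2: placeholder for a trained classifier that reads the context."""
    context = candidate["context"].lower()
    return not any(hint in context for hint in PLACEHOLDER_HINTS)


def detect(text):
    return [c for c in find_candidates(text) if looks_like_real_secret(c)]


sample = (
    'api_token = "q7Zp2LxV9mKd4RtY8sWb"\n'
    'example_token = "a1B2c3D4e5F6g7H8i9J0"  # sample value for docs\n'
)
print([(c["line"], c["name"]) for c in detect(sample)])
# [(1, 'api_token')]
```

Both lines pass stage one, and only the context check separates them. A real classifier would weigh many more signals than a list of hint words.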
Here are a few examples of secrets detection solutions that use AI:
The GitGuardian code security platform scans Git repositories, CI/CD pipelines, Docker images and collaboration systems such as Confluence, Jira and Slack. Developers can set up pre-commit hooks and integrate scanning into IDEs or use GitGuardian’s command line interface app.
GitGuardian’s secrets detection engine consists of two types of pattern-matching detectors: specific detectors dedicated to finding particular secret types, leading to high recall and precision, and generic detectors to catch what specific detectors might miss. Various machine learning models also add valuable functionalities, such as filtering secrets that are likely false positives and evaluating the context around a generic secret to assign it an appropriate category and provider.
Other machine learning-powered features include similar incident grouping that clusters secrets sharing contextual similarities and risk scoring that uses XGBoost (eXtreme Gradient Boosting), an ensemble of decision trees learning from each other’s errors, to rank secrets based on multiple risk signals.
GitHub Secret Protection is GitHub’s own secret scanning system. It scrutinizes entire Git histories on all branches of a repository, as well as descriptions and comments in issues and pull requests.
The system’s push protection function scans each push in real time, blocking commits that contain secrets. It can perform automatic validity checks to verify if a discovered secret is active and publicly exposed.
GitHub Secret Protection also uses GitHub Copilot to detect unstructured secrets committed into repositories.
Bob is IBM’s AI coding assistant designed to support writing source code, debugging, refactoring, code reviews and documentation. It can pinpoint security vulnerabilities within code and, coupled with its built-in secrets detection capability, supports secure coding.
Bob can be programmed to take on a custom agentic mode that searches for hardcoded secrets, explains the security risks and recommends actions to secure secrets. It can replace secrets with references to a secrets management platform like HashiCorp’s Vault and push hardcoded secrets to Vault using the Model Context Protocol (MCP).
Secrets detection can enhance a company’s security posture. It can be most effective when incorporated into the development, deployment and maintenance phases of the SDLC.
Here are some best practices that development teams can keep in mind when implementing secrets detection:
Define the secrets that matter most
Go beyond code-related sources
Embed secrets detection into CI/CD pipelines
Triage and rectify
Educate development and DevOps teams
Businesses can begin by defining what constitutes a secret for them and sorting these secrets by importance to inform the remediation process. They must also audit their entire software supply chain to map out the extent of secrets exposure.
Secrets can spread beyond code-related sources. This means detection must also consider other possible sources of secrets exposure, such as:
Communication, collaboration and developer productivity tools
Configuration files
Containers and container orchestration platforms like Kubernetes
Databases
Documentation
Logs
DevOps teams can implement pre-commit hooks that make secret scanning a required step before developers commit code or initiate pull requests, and that block changes containing hardcoded secrets. They must also check that tools have the ability to continuously scan artifacts, build logs and environment variables for any exposed secrets before they reach runtime and production environments.
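A pre-commit hook of this kind can be sketched in a few lines. The version below assumes Git is on the path and reads staged changes with `git diff --cached`. The single regex and the function names are hypothetical, and a team would normally call an established scanner from the hook rather than maintain its own patterns.

```python
import re
import subprocess
import sys

# Illustrative pattern: AWS-style access key IDs and PEM private key headers.
SECRET_RE = re.compile(
    r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"
    r"|-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
)


def staged_diff():
    """Return the staged changes as unified diff text."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--unified=0", "--no-color"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def added_lines_with_secrets(diff_text):
    """Return (path, added_line) for each added line that matches SECRET_RE."""
    hits = []
    current_path = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header, for example "+++ b/config.py".
            path = line[4:]
            current_path = path[2:] if path.startswith("b/") else path
        elif line.startswith("+") and SECRET_RE.search(line[1:]):
            hits.append((current_path, line[1:]))
    return hits


def main():
    """Exit status for the hook: 1 blocks the commit, 0 allows it."""
    hits = added_lines_with_secrets(staged_diff())
    for path, _ in hits:
        # Report the file only, so the secret is not echoed into logs.
        print(f"possible secret staged in {path}", file=sys.stderr)
    return 1 if hits else 0
```

To act as a hook, an executable script at `.git/hooks/pre-commit` would call `main()` and pass its return value to `sys.exit`; Git aborts the commit when the hook exits with a nonzero status.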
Enterprises must establish policies for how exposed secrets are prioritized and corrected. Triage policies can include assessments of the security risks posed by different types of secrets and which teams are responsible for triaging.
Remediation policies must clearly outline the actions to take, be it revoking or rotating secrets or replacing them with calls to a secrets management platform. Automating these fixes can save time and result in swift responses, but fixes must be tested to verify that a secret can no longer be detected.
Organizations must include secrets detection as part of a development or DevOps team’s training in secure coding. Teams must understand the dangers of hardcoded or exposed secrets, what they can do to limit those dangers and how to use the necessary tools for secrets detection.