GitLab Public Repository Scan Uncovers Widespread Exposure of Live Secrets

A comprehensive security scan of all 5.6 million public repositories on GitLab Cloud found more than 17,000 verified live secrets exposed across over 2,800 unique domains¹. The research, conducted by security engineer Luke Marshall, identified valid credentials including API keys, cloud access tokens, and database passwords that could give attackers direct access to organizational systems. The findings highlight a persistent problem across code hosting platforms: developers accidentally committing sensitive information to public repositories.

The scale of this exposure is clearer when compared with similar research on other platforms. A complementary study of 2.6 million Bitbucket Cloud repositories found 6,212 verified live secrets and earned over $10,000 in bug bounties through responsible disclosure⁶. GitLab not only had roughly twice as many public repositories as Bitbucket but also showed an approximately 35% higher density of leaked secrets per repository¹, suggesting that platform-specific factors may influence leakage rates. Secret exposure is therefore not limited to any single platform; it is an industry-wide challenge.

Scanning Methodology and Architecture

The research used a scalable, serverless architecture on Amazon Web Services to conduct the scan¹. A Python script first enumerated all public repositories through GitLab’s public API at `gitlab.com/api/v4/projects`. Each repository URL was then sent to an AWS Simple Queue Service (SQS) queue, which served as a durable, fault-tolerant task list. An AWS Lambda function running at high concurrency pulled tasks from the queue, and each invocation ran a custom Docker container image that executed the TruffleHog secret detection tool.
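The enumeration and queuing steps can be sketched in Python. This is not the researcher's actual script. It assumes GitLab's keyset pagination parameters (`pagination=keyset`, `order_by=id`, `sort=asc`, `per_page=100`), follows the `rel="next"` URL in the `Link` response header, reads each project's `http_url_to_repo` field, and pushes URLs to SQS in groups of 10, the maximum a single `send_message_batch` call accepts. The names `START_URL`, `iter_public_repo_urls`, `batched`, and `enqueue` are hypothetical, and the SQS client (for example `boto3.client("sqs")`) is passed in so the logic can be exercised without AWS.

```python
from __future__ import annotations

import json
import re
import urllib.request
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical starting point: public projects ordered by id, keyset-paginated.
START_URL = (
    "https://gitlab.com/api/v4/projects"
    "?visibility=public&pagination=keyset&order_by=id&sort=asc&per_page=100"
)


def next_link(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    # Simple split; assumes the URLs themselves contain no commas.
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None


def http_get(url: str) -> tuple[list[dict], Optional[str]]:
    """Fetch one page of projects; returns (projects, Link header)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp), resp.headers.get("Link")


def iter_public_repo_urls(
    fetch: Callable[[str], tuple[list[dict], Optional[str]]] = http_get,
    start_url: str = START_URL,
) -> Iterator[str]:
    """Yield the clone URL of every public project, page by page."""
    url: Optional[str] = start_url
    while url:
        projects, link = fetch(url)
        for project in projects:
            yield project["http_url_to_repo"]
        url = next_link(link)


def batched(items: Iterable[str], size: int = 10) -> Iterator[list[str]]:
    """Group items into lists of at most `size` (SQS allows 10 per batch call)."""
    batch: list[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def enqueue(repo_urls: Iterable[str], sqs_client, queue_url: str) -> int:
    """Send URLs to SQS in batches of 10; return how many SQS accepted."""
    accepted = 0
    for batch in batched(repo_urls, 10):
        entries = [{"Id": str(i), "MessageBody": url} for i, url in enumerate(batch)]
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        # Entries listed under "Failed" should be retried in a real run.
        accepted += len(response.get("Successful", []))
    return accepted
```

A full run would look like `enqueue(iter_public_repo_urls(), boto3.client("sqs"), queue_url)`. Unauthenticated API requests are rate-limited, so a run over millions of projects would also need retries with backoff.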

The scanning command was tuned for both efficiency and accuracy: `trufflehog git <repository-url> --json --no-update --only-verified`. The `--json` flag emits machine-readable results and `--no-update` skips TruffleHog’s self-update check. The `--only-verified` flag proved critical: TruffleHog attempts to verify each candidate secret against the issuing service, and with this flag it reports only secrets confirmed to be currently valid, which sharply reduces false positives. This architecture scanned millions of repositories in approximately 24 hours at a cost of around $770¹, showing how cloud infrastructure can support large-scale security research at modest cost.
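A minimal sketch of the Lambda worker follows, assuming each SQS message body is a single repository URL and that the `trufflehog` binary is on the container's `PATH`. Lambda delivers SQS messages as `event["Records"]`, each with a `body`; the sketch runs the TruffleHog command described above through `subprocess` and parses its line-delimited JSON output. The field names `DetectorName` and `Verified` reflect TruffleHog v3's JSON output as we understand it and should be checked against the version in use. The handler simply returns its findings, because the article does not say where results were stored.

```python
from __future__ import annotations

import json
import subprocess
from typing import Callable


def trufflehog_cmd(repo_url: str) -> list[str]:
    """Build the scan command described in the article for one repository."""
    return ["trufflehog", "git", repo_url, "--json", "--no-update", "--only-verified"]


def parse_findings(stdout: str) -> list[dict]:
    """Parse TruffleHog's one-JSON-object-per-line output.

    Assumes v3 field names (DetectorName, Verified); check your version.
    """
    findings = []
    for line in stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore any non-JSON log lines
        if isinstance(record, dict) and record.get("Verified"):
            findings.append({"detector": record.get("DetectorName")})
    return findings


def run_scan(repo_url: str) -> str:
    """Run TruffleHog and return stdout; 600 s stays under Lambda's 15-minute cap."""
    result = subprocess.run(
        trufflehog_cmd(repo_url), capture_output=True, text=True, timeout=600, check=False
    )
    return result.stdout


def handler(event: dict, context, scan: Callable[[str], str] = run_scan) -> dict:
    """Lambda entry point for an SQS trigger: one repository URL per message body."""
    results = []
    for record in event.get("Records", []):
        repo_url = record["body"]
        for finding in parse_findings(scan(repo_url)):
            results.append({"repo": repo_url, **finding})
    return {"findings": results}
```

The `scan` parameter exists only so the parsing path can be tested without the binary; Lambda itself calls `handler(event, context)`. The 600-second timeout is an arbitrary choice, and very large repositories might not finish within Lambda's limits at all.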

Key Findings and Analysis

The research uncovered several concerning patterns in how secrets are exposed and managed. One of the most alarming was the longevity of exposed credentials: some valid secrets dated back over a decade. These included a live AWS key committed to Bitbucket in June 2013⁶ and a valid secret in a GitLab repository with a commit timestamp of December 16, 2009, nearly two years before GitLab itself launched, likely from an imported repository¹. These “zombie secrets” show that credentials do not simply expire on their own; they remain a risk until they are actively rotated or revoked.
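Because old credentials stay valid until someone rotates them, a periodic audit of key age is a cheap safeguard. The sketch below flags active AWS IAM access keys older than a threshold using the IAM `ListAccessKeys` API through a boto3-style client. The 90-day threshold and the function name `stale_access_keys` are illustrative assumptions, not recommendations from the research.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_access_keys(
    iam_client, user_name: str, max_age_days: int = 90, now: Optional[datetime] = None
) -> list[str]:
    """Return IDs of the user's *active* access keys created before the cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    # An IAM user can hold at most two access keys, so one call covers them all.
    # boto3 returns CreateDate as a timezone-aware datetime.
    response = iam_client.list_access_keys(UserName=user_name)
    return [
        meta["AccessKeyId"]
        for meta in response["AccessKeyMetadata"]
        if meta["Status"] == "Active" and meta["CreateDate"] < cutoff
    ]
```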

A clear pattern of platform locality emerged from the research: developers are more likely to leak a platform’s credentials on that same platform. The GitLab study found 406 valid GitLab tokens on GitLab, compared with only 16 GitLab tokens found on Bitbucket¹. Similarly, the Bitbucket study found a disproportionately high number of exposed Atlassian product tokens, including Jira, Bitbucket, and Opsgenie credentials⁶. This suggests that developers may be less cautious about exposing credentials for the platforms they actively use for development work.

Google Cloud Platform credentials were the most frequently leaked on both GitLab and Bitbucket. On GitLab, about one in every 1,060 repositories contained a set of valid GCP credentials¹. Other high-impact secrets commonly found included AWS keys, SendGrid API tokens, Slack tokens, and Stripe credentials. The prevalence of cloud service credentials is particularly concerning given these often provide direct access to infrastructure and data.

Case Study: Tracing a Leaked Slack Token

The research provides a concrete example of effective triage and responsible disclosure¹. A Slack token was discovered committed from a `@hotmail.com` email address, which initially obscured the affiliated organization. Using TruffleHog’s `analyze` feature, researchers extracted metadata from the token, which included a `url` field pointing to a specific Slack instance. Navigating to this URL revealed a login screen for the organization’s Okta single sign-on system, confirming the token belonged to a corporate entity.
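For readers triaging a Slack token they own, the workspace URL the researchers relied on can also be obtained from Slack's `auth.test` Web API method, whose response includes `ok`, `team`, and `url` fields. This is a substitute technique, not a description of how `trufflehog analyze` works internally, and the helper names are hypothetical. Only run it against tokens you own or that are explicitly in scope of a bug bounty program.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Optional
from urllib.parse import urlparse


def slack_auth_test(token: str) -> dict:
    """Call Slack's auth.test method with a token you are authorized to test."""
    req = urllib.request.Request(
        "https://slack.com/api/auth.test",
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def workspace_from_auth_test(payload: dict) -> Optional[dict]:
    """Extract the workspace name and URL; None if Slack rejected the token."""
    if not payload.get("ok"):
        return None
    url = payload.get("url", "")
    return {"team": payload.get("team"), "url": url, "host": urlparse(url).hostname}
```

The `host` value is what ties a token to an organization; in the case described here, visiting that workspace URL led to the company's SSO login page.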

This secret was reported and accepted as a critical vulnerability, earning a $2,100 bounty through the organization’s bug bounty program. It is important to note that the `trufflehog analyze` command should only be used on secrets you own or when explicitly authorized by a bug bounty program’s scope¹. This case demonstrates how proper analysis of exposed credentials can lead to successful remediation while respecting ethical boundaries.

Industry Context and Comparative Analysis

The problem of secret exposure extends beyond GitLab to other major code hosting platforms. GitHub, as the largest platform, consistently shows the highest absolute number of leaks. A March 2025 report by GitGuardian found 22.8 million hardcoded secrets in public repositories⁷. A separate analysis by Cybernews reported the same trend: 23.7 million hardcoded secrets exposed in 2024, a 25% increase over 2023⁹. Crucially, 70% of the secrets discovered in 2022 were still active three years later, underscoring how long leaked credentials remain exploitable.

The issue is particularly acute in fast-moving sectors like artificial intelligence. A November 2025 investigation of the Forbes AI 50 list found that 65% of the world’s most prominent AI companies had accidentally leaked verified secrets on GitHub⁸. These companies, with a combined valuation exceeding $400 billion, exposed credentials including API keys and authentication tokens that could grant attackers direct access to core systems and proprietary models. The research highlighted that these leaks were not only in active repositories but also hidden in deleted forks, old code branches, and personal developer accounts.

Best Practices for Prevention and Response

In response to these risks, GitLab has published official best practices to help users secure their repositories⁴. The most effective preventive measure is to limit public visibility by setting the default visibility for new groups and projects to Private. This prevents accidental exposure of secrets intended for internal use only. Organizations should also secure CI/CD secrets by never storing passwords, tokens, or other secrets in plaintext within code or configuration files, instead leveraging dedicated secrets managers like HashiCorp Vault, AWS Secrets Manager, or GCP Secret Manager.
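The pattern can be sketched in Python: the application resolves a secret at runtime, first from an environment variable (such as a masked GitLab CI/CD variable) and otherwise from AWS Secrets Manager via `get_secret_value`, so no secret literal ever appears in the repository. The secret ID `prod/db/password`, the variable name `DB_PASSWORD`, and the function `get_db_password` are hypothetical placeholders.

```python
from __future__ import annotations

import os


def get_db_password(
    secrets_client=None,
    secret_id: str = "prod/db/password",  # hypothetical secret name
    env_var: str = "DB_PASSWORD",  # hypothetical CI/CD variable name
) -> str:
    """Resolve a secret at runtime instead of hardcoding it in the repository."""
    # 1. Prefer a value injected by the runtime, e.g. a masked CI/CD variable.
    value = os.environ.get(env_var)
    if value:
        return value
    # 2. Otherwise fetch it from AWS Secrets Manager (boto3-style client).
    if secrets_client is None:
        raise RuntimeError(f"{env_var} is not set and no secrets client was provided")
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return response["SecretString"]
```

In production the client would be `boto3.client("secretsmanager")`, with access granted through an IAM role rather than another stored credential.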

Platforms like GitLab provide multiple layers of defense that should all be enabled. Secret push protection blocks pushes that contain detected secrets before they reach the repository. Pipeline secret detection scans commits during CI/CD pipeline execution, allowing teams to catch leaks before they are merged into the default branch. Client-side secret detection scans issue and merge request descriptions and comments in real time, before they are saved. Together, these integrated features provide defense in depth against accidental secret exposure.
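GitLab's push protection and pipeline detection run on the server side, but the same idea can be applied locally before a commit is even created. The toy pre-commit check below scans added lines of the staged diff for two illustrative patterns: AWS access key IDs (`AKIA` or `ASIA` followed by 16 uppercase alphanumerics) and GitLab personal access tokens (`glpat-` prefix). These regexes are simplified assumptions; production scanners such as TruffleHog or GitLab's own analyzers use far larger rule sets, and TruffleHog also verifies candidates live.

```python
from __future__ import annotations

import re
import subprocess
import sys

# Illustrative patterns only (assumptions); real scanners use many more rules.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "gitlab_personal_access_token": re.compile(r"\bglpat-[0-9A-Za-z_-]{20,}"),
}


def find_secrets(diff_text: str) -> list[tuple[str, str]]:
    """Return (pattern name, added line) for each suspicious line in a unified diff."""
    hits = []
    for line in diff_text.splitlines():
        # Only inspect added lines; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line[1:].strip()))
    return hits


def main() -> int:
    """Scan staged changes; a non-zero return should abort the commit."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "-U0"], capture_output=True, text=True, check=True
    ).stdout
    hits = find_secrets(diff)
    for name, _line in hits:
        # Report the rule name only, to avoid echoing the secret into logs.
        print(f"blocked: staged change matches {name}", file=sys.stderr)
    return 1 if hits else 0
```

To use it as a hook, save it as an executable `.git/hooks/pre-commit` with a Python shebang and end the file with `sys.exit(main())`; Git aborts the commit when the hook exits non-zero. Local hooks can be skipped with `git commit --no-verify`, which is why server-side push protection remains necessary.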

If a secret is accidentally exposed, organizations must immediately rotate or revoke the exposed credential and review access logs for the affected service to check for evidence of misuse. If a platform token was leaked, it should be revoked and the platform’s logs reviewed for suspicious activity. Establishing clear and accessible security disclosure channels from the beginning enables effective collaboration with security researchers who may discover exposed credentials⁸.
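For a leaked AWS access key, the first two response steps can be sketched with boto3-style IAM and CloudTrail clients: deactivate the key with `UpdateAccessKey`, then pull recent events made with that key through CloudTrail `LookupEvents`. The function name is hypothetical. Deactivation breaks anything still using the key, so ideally a replacement is deployed first, unless the leak is already being exploited. `LookupEvents` only covers roughly the last 90 days of management events in the client's region, so a real investigation would also check other regions and any longer-term trail storage.

```python
from __future__ import annotations


def contain_leaked_aws_key(
    iam_client, cloudtrail_client, user_name: str, access_key_id: str, max_events: int = 50
) -> list[tuple]:
    """Deactivate a leaked key and return (time, event name, source) of its recent use."""
    # Step 1: deactivate rather than delete, so the key ID stays visible for auditing.
    iam_client.update_access_key(
        UserName=user_name, AccessKeyId=access_key_id, Status="Inactive"
    )
    # Step 2: look up recent CloudTrail management events signed with this key
    # (LookupEvents returns at most 50 events per call; pagination omitted here).
    response = cloudtrail_client.lookup_events(
        LookupAttributes=[{"AttributeKey": "AccessKeyId", "AttributeValue": access_key_id}],
        MaxResults=max_events,
    )
    return [
        (event["EventTime"], event["EventName"], event.get("EventSource"))
        for event in response.get("Events", [])
    ]
```

Deleting the key with `delete_access_key` can follow once the investigation confirms nothing legitimate still depends on it.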

The exposure of secrets in public repositories represents a significant and persistent threat to organizations of all sizes. The research on GitLab repositories, combined with similar findings from other platforms, demonstrates that this is an industry-wide challenge requiring systematic solutions. The existence of valid secrets dating back over a decade underscores the long-term risk posed by credentials that are not actively managed and rotated. For security professionals, implementing robust secret management practices and enabling automated detection tools is essential for reducing this attack surface. The work of security researchers in identifying and responsibly disclosing these exposures provides valuable insight into the scale of the problem and the importance of proactive security measures.
