Cloud GenAI workloads inherit pre-existing cloud security challenges, and security teams must proactively develop innovative security measures, including threat detection mechanisms.
Traditional Cloud Threat Detection
Threat detection systems are designed to identify potential security breaches early. The indicators they surface typically point to attackers who have already bypassed preventive security measures. Threat detection is therefore an essential layer of a defense-in-depth security architecture.
One common strategy is to use a threat detection engine, which collects log events for security analysis. These engines apply detection logic to identify specific log entries that indicate suspicious activity. Many of them use Sigma rules, an open and vendor-neutral rule format, to determine which log events should be flagged as suspicious. However, because log formats vary widely across cybersecurity vendors, Sigma rules are usually converted into the proprietary format of each vendor's detection engine.
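To make the rule-matching idea concrete, here is a minimal sketch of how a detection engine might test log events against a rule. It is only loosely modeled on the "selection" part of a Sigma rule: real Sigma rules are written in YAML and support modifiers and conditions that are not handled here. The rule content and the helper names (`get_field`, `matches`, `flag_events`) are assumptions made for illustration.

```python
from typing import Any

# Hypothetical, simplified rule. Every field in "selection" must equal the
# corresponding (possibly nested) field in the log event.
RULE = {
    "title": "Console login without MFA",
    "selection": {
        "eventSource": "signin.amazonaws.com",
        "eventName": "ConsoleLogin",
        "additionalEventData.MFAUsed": "No",
    },
}


def get_field(event: dict[str, Any], dotted: str) -> Any:
    """Resolve a dotted path such as 'a.b' inside a nested dict, or return None."""
    current: Any = event
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(rule: dict[str, Any], event: dict[str, Any]) -> bool:
    """True when every selection field equals the value found in the event."""
    return all(get_field(event, f) == v for f, v in rule["selection"].items())


def flag_events(rule: dict[str, Any], events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return only the events that the rule flags as suspicious."""
    return [e for e in events if matches(rule, e)]
```

Converting a rule into a vendor's proprietary format amounts to translating the same selection into that engine's own query language.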
False positives are a persistent challenge in threat detection; therefore, other strategies, such as event correlation and cyber threat intelligence (CTI), are used to increase the accuracy of detections and reduce alert fatigue. More recently, detection engineering has emerged as a specialized discipline within threat detection, in which detection engineers customize threat detection systems for their environment.
Under the shared responsibility model, organizations using the cloud are responsible for conducting threat detection. This responsibility has been a major challenge for organizations because threat detection in the cloud differs significantly from threat detection in on-premises systems.
One major difference between cloud and on-premises environments is access to event logs: in the cloud, organizations rely on cloud providers to supply logs, whereas on-premises logs are directly accessible. Another major difference is the interconnection of cloud resources via cloud APIs. By design, this enables the core attributes of the cloud: flexibility, scalability, and elasticity. Interconnection is a double-edged sword for threat detection: defenders can leverage it to quickly detect and prevent attacks, while attackers can leverage it to quickly move laterally through the cloud fabric.
Threat Detection for GenAI Cloud Workloads
Threat detection in GenAI cloud workloads should be a major concern for most organizations. Although it is not yet widely discussed, it is a ticking time bomb: the issue may only come to the fore once attacks emerge or compliance regulations impose threat detection requirements for GenAI workloads.
There are several challenges facing advanced threat detection systems in GenAI cloud workloads.
Asset Management: Automated inventory systems are needed to track GenAI workloads in organizations. This is a critical requirement for threat detection because an accurate inventory is the foundation of security visibility. However, it can be difficult to achieve in organizations where security teams are unaware of GenAI adoption. In addition, only a few tools can discover and maintain an inventory of cloud GenAI workloads.
Lack of threat detection logic: Threat detection engines need specific logic to identify malicious or suspicious events in the cloud. This logic must be developed either through open source efforts, such as Sigma rules, or by cybersecurity vendors. Currently, few such detection rules appear to be available for GenAI workloads.
MITRE ATLAS Compatibility: MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is a globally accessible, living knowledge base of adversary tactics and techniques against AI-enabled systems, based on real-world attack observations and realistic demonstrations from AI red teams and security groups.
As with MITRE ATT&CK, security teams use this knowledge base to improve their threat detection systems by mapping detection rules to its tactics and techniques. This reduces alert fatigue and keeps detection focused on realistic threats. However, MITRE ATLAS is currently generic and does not cover cloud-specific GenAI technologies. This may take some time to evolve, as it did for the ATT&CK Cloud IaaS matrix.
Detection Gaps and API Abuse: Most cloud threats are not actual vulnerabilities but rather the misuse of existing features, which makes malicious behavior difficult to detect. This also poses a challenge for rule-based systems because they cannot always determine when an API call or log event is malicious. Event correlation is therefore used to combine individual events into patterns that indicate an attack.
GenAI is already exposed to several forms of abuse, such as prompt injection and training data poisoning. More forms of abuse will emerge as cloud GenAI becomes more widespread, and identifying them can be difficult, so proactive measures are essential to avoid surprises.
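The event correlation idea mentioned above can be sketched as follows. This is an illustrative example, not a production correlation engine: it flags a principal only when several distinct watched actions occur close together in time, on the assumption that each action alone is too common to alert on. The event shape (`principal`, `action`, `time`), the watchlist, the window, and the threshold are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Illustrative watchlist: each API call is routine on its own,
# but the combination in a short window is worth a closer look.
WATCHED = {"ListRoles", "CreateAccessKey", "AttachUserPolicy"}


def correlate(events, watched=WATCHED, window=timedelta(minutes=10), threshold=3):
    """Return principals with at least `threshold` distinct watched actions
    inside any time window of length `window`."""
    by_principal = defaultdict(list)
    for e in events:
        if e["action"] in watched:
            by_principal[e["principal"]].append((e["time"], e["action"]))

    flagged = set()
    for principal, items in by_principal.items():
        items.sort()  # chronological order
        for i, (start, _) in enumerate(items):
            # Distinct actions seen from this event until the window closes.
            actions = {a for t, a in items[i:] if t - start <= window}
            if len(actions) >= threshold:
                flagged.add(principal)
                break
    return flagged
```

A single `ListRoles` call would never raise an alert here, which is the point: correlation trades some sensitivity for fewer false positives.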
Case Study: Amazon Bedrock
Let us illustrate the above points using Amazon Bedrock, one of the leading GenAI services in the cloud, provided by Amazon Web Services (AWS).
Amazon Bedrock provides access to many foundation models (FMs) from leading AI companies, including AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon. Bedrock supports several AI techniques, for example fine-tuning and Retrieval Augmented Generation (RAG), to enable organizations to build innovative GenAI applications without going through rigorous AI development processes. Additionally, Bedrock is serverless, relieving users of the burden of orchestrating and maintaining the infrastructure.
However, a solid understanding of the AWS shared responsibility model, its characteristics, and its application to Bedrock is essential for building a threat detection system. Organizations using Bedrock first need an effective cloud asset management system capable of discovering and maintaining an up-to-date inventory of all Bedrock components. This capability allows for rapid identification of potentially malicious changes.
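One way to picture how an inventory supports change detection is to compare two snapshots taken at different times. The sketch below assumes the snapshots have already been collected by some other means (for example, through the provider's list APIs, not shown here) and are represented as dictionaries keyed by a resource identifier. The identifiers and attributes are hypothetical.

```python
def diff_inventory(previous: dict, current: dict) -> dict:
    """Compare two inventory snapshots and report what changed between them.

    Each snapshot maps a resource identifier to a dict of its attributes.
    """
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        key for key in previous.keys() & current.keys()
        if previous[key] != current[key]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots of GenAI-related resources.
yesterday = {
    "knowledge-base/kb-example": {"data_source": "s3://example-kb-bucket"},
    "guardrail/gr-example": {"version": "1"},
}
today = {
    "knowledge-base/kb-example": {"data_source": "s3://unknown-bucket"},
    "custom-model/cm-example": {"base_model": "example-model"},
}

report = diff_inventory(yesterday, today)
```

In this example the report would show a new custom model, a removed guardrail, and a knowledge base whose data source changed, each of which is a candidate for review.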
Next, organizations need threat detection systems that collect and analyze event logs covering all API calls made against Bedrock. AWS CloudTrail can help; however, corresponding detection logic is needed to scan the collected logs for CloudTrail event names that indicate malicious activity. It is also essential to understand how Bedrock uses Amazon S3 as a data source for its knowledge bases. The knowledge base is a critical Bedrock component that manages data retrieval and processing between the underlying Amazon Bedrock components. The critical role of S3 as a data source is Bedrock's Achilles heel: it introduces multiple attack vectors, including data poisoning, denial of service, data breaches, and S3 ransomware. It is imperative to develop systems that quickly detect these attack vectors.
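As a rough illustration of such detection logic, the sketch below scans exported CloudTrail records for Bedrock management events. The record fields (`eventSource`, `eventName`, `eventTime`, `userIdentity`) follow the CloudTrail record format, and the event names are Bedrock API operations. Which operations belong on the watchlist is an assumption made for this example, not a vetted rule set, and the sample ARN is hypothetical.

```python
import json

# Assumed watchlist: operations that alter logging, guardrails, or models.
# Tune this to the environment; it is not an authoritative list.
WATCHLIST = {
    "DeleteModelInvocationLoggingConfiguration",
    "PutModelInvocationLoggingConfiguration",
    "DeleteGuardrail",
    "CreateModelCustomizationJob",
}


def load_records(log_text: str) -> list:
    """Parse a CloudTrail log file body, which wraps events in a 'Records' list."""
    return json.loads(log_text).get("Records", [])


def suspicious_bedrock_events(records: list) -> list:
    """Return a short summary for each Bedrock event found on the watchlist."""
    findings = []
    for record in records:
        if record.get("eventSource") != "bedrock.amazonaws.com":
            continue
        if record.get("eventName") in WATCHLIST:
            findings.append({
                "event": record["eventName"],
                "time": record.get("eventTime"),
                "principal": record.get("userIdentity", {}).get("arn"),
            })
    return findings


sample = json.dumps({"Records": [
    {"eventSource": "bedrock.amazonaws.com",
     "eventName": "DeleteModelInvocationLoggingConfiguration",
     "eventTime": "2024-01-01T12:00:00Z",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example"}},
    {"eventSource": "bedrock.amazonaws.com", "eventName": "ListFoundationModels"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
]})

findings = suspicious_bedrock_events(load_records(sample))
```

A static watchlist like this is only a starting point, since a flagged operation can also be a legitimate administrative change.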
Cloud Attack Simulation
Cloud attack simulations reproduce the tactics, techniques, and procedures (TTPs) of real-world attacks on controlled cloud infrastructure, allowing organizations to practically and safely assess the impact of these attacks on their infrastructure.
The simulated attacks are heavily informed by the MITRE ATT&CK framework, which makes them meaningful to defenders. MITRE Engenuity has also formulated Threat-Informed Defense, an approach that organizations can use to prioritize realistic attacks over hypothetical attacks that rely on published vulnerabilities. A core pillar of threat-informed defense is adversary emulation, which is used to verify that the combination of security measures and CTI performs as expected. Cloud attack simulation applies the adversary emulation concept to cloud infrastructure by integrating into the cloud fabric through APIs and providing a cloud-native experience.
Cloud attack simulation reduces cloud detection errors and alert fatigue by safely simulating cyberattacks that represent actual attacker behavior. Simulated attacker behavior, typically captured as security events, provides opportunities to uncover attack vectors that may bypass detection strategies.
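A simple way to turn a simulation run into findings is to compare what the simulation did with what the logging and alerting pipeline actually saw. The following sketch assumes three inputs that would come from the simulation tool and the detection stack: the simulated steps, the set of event names that reached the logs, and the set that triggered an alert. The step labels are hypothetical and are not MITRE technique identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulatedStep:
    label: str        # free-form label chosen by the simulation author
    event_name: str   # log event the step is expected to produce


def coverage(steps, logged_events, alerted_events):
    """Classify each simulated step as detected, logged without alert, or not logged."""
    report = {}
    for step in steps:
        if step.event_name not in logged_events:
            report[step.label] = "not logged"
        elif step.event_name not in alerted_events:
            report[step.label] = "logged, no alert"
        else:
            report[step.label] = "detected"
    return report


# Hypothetical simulation run.
steps = [
    SimulatedStep("disable-invocation-logging", "DeleteModelInvocationLoggingConfiguration"),
    SimulatedStep("remove-guardrail", "DeleteGuardrail"),
    SimulatedStep("poison-data-source", "PutObject"),
]
logged = {"DeleteModelInvocationLoggingConfiguration", "DeleteGuardrail"}
alerted = {"DeleteModelInvocationLoggingConfiguration"}

result = coverage(steps, logged, alerted)
```

Steps marked "not logged" point to a visibility gap, while "logged, no alert" points to missing detection logic; the two call for different fixes.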
Cloud attack simulation is key to developing and continuously improving cloud detection, because cloud APIs, features, and resources change unpredictably, and these changes create potential vulnerabilities and attack opportunities.
Cloud security operations teams can benefit from cloud attack simulation in several ways.
Detection engineers can check whether attack patterns are captured in a logging system (e.g., CloudTrail) and develop rules that reduce alert fatigue by identifying potential false positives.
Cloud logs tend to be either decentralized or unavailable. For example, a data poisoning attack against Amazon Bedrock involves S3 object-level events that are not available in the CloudTrail console by default. Identifying these events requires additional configuration, for example, using Security Lake or CloudTrail Lake. Without it, SOC teams may unknowingly miss data poisoning attacks against an S3 data source bucket.
Running cloud attack simulations, however, provides opportunities to identify these blind spots and develop appropriate detection mechanisms. The simulated attacks can be based on MITRE ATT&CK and MITRE ATLAS, providing contextual understanding of threats against GenAI cloud workloads.
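Once object-level logging is available, the data poisoning blind spot described above can be addressed with fairly simple logic. The sketch below assumes S3 data events are being delivered in the CloudTrail record format, where `requestParameters` carries `bucketName` and `key`. The bucket name and the allowlisted principal are hypothetical, and treating every write from a non-allowlisted principal as suspicious is an assumption that would need tuning in practice.

```python
# S3 operations that modify or remove objects in a bucket.
WRITE_EVENTS = {"PutObject", "DeleteObject", "DeleteObjects", "CopyObject"}


def unexpected_writes(records, bucket, allowed_arns):
    """Return writes to `bucket` made by principals outside `allowed_arns`."""
    hits = []
    for record in records:
        params = record.get("requestParameters") or {}
        arn = record.get("userIdentity", {}).get("arn")
        if (
            record.get("eventSource") == "s3.amazonaws.com"
            and record.get("eventName") in WRITE_EVENTS
            and params.get("bucketName") == bucket
            and arn not in allowed_arns
        ):
            hits.append({
                "event": record["eventName"],
                "key": params.get("key"),  # absent for some multi-object calls
                "principal": arn,
            })
    return hits


# Hypothetical values for illustration only.
KB_BUCKET = "example-kb-bucket"
ALLOWED = {"arn:aws:iam::111122223333:role/example-ingest-role"}

records = [
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject",
     "requestParameters": {"bucketName": KB_BUCKET, "key": "docs/policy.txt"},
     "userIdentity": {"arn": "arn:aws:iam::111122223333:role/example-ingest-role"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject",
     "requestParameters": {"bucketName": KB_BUCKET, "key": "docs/policy.txt"},
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/unknown"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "requestParameters": {"bucketName": KB_BUCKET, "key": "docs/policy.txt"},
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/unknown"}},
]

alerts = unexpected_writes(records, KB_BUCKET, ALLOWED)
```

Running a simulated poisoning step against a test bucket is a practical way to confirm that records like these actually arrive before relying on the rule.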
Conclusion
GenAI technology has taken the world by storm, and organizations are rapidly adopting it to enable innovation while gaining business benefits. Most organizations adopt GenAI services offered by public cloud providers to strike a meaningful balance between the required cost and the benefits of innovation.
Leveraging GenAI cloud workloads opens the door to many security challenges that are not yet well discussed, especially how to detect threats effectively. The most difficult aspects of this challenge are interpreting the shared responsibility model for GenAI workloads, adapting existing threat detection strategies to GenAI-specific challenges, and designing appropriate technologies.
While learning from real attacks has proven to be the most powerful driver for enhanced threat detection, cloud attack simulation provides a means of learning at a low cost without the damaging consequences of an actual cyberattack. It is therefore a great way to identify GenAI-specific threat dynamics and develop detection methods that are tailored to them. Furthermore, cloud attack simulation techniques enable threat-informed defense, thereby significantly reducing alert fatigue and false positives for GenAI cloud workloads.