Supercharge your investigation with Sysdig Sage™ for CDR
https://sysdig.com/blog/supercharge-your-investigation-with-sysdig-sage-for-cdr/ | Tue, 06 Aug 2024

Artificial intelligence has taken over almost every aspect of our everyday lives. In cybersecurity, generative AI models with natural language processing are commonly used to predict, detect, and respond to threats. But most AI security assistants, although an upgrade from traditional machine learning, offer only basic queries and summarization, which is not enough to fully understand modern cloud attacks. As part of an ongoing effort to improve the cloud detection and response (CDR) experience, Sysdig has announced Sysdig Sage™, which makes it easier than ever to uncover active breach scenarios in real time.

Sysdig Sage for CDR combines AI with security analysis as part of our ongoing mission to protect our customers in the cloud. Sysdig Sage observes your cloud data and generates responses that enable you to stop attackers. This revolutionary new AI assistant also supports a variety of use cases, including contextual analysis of cloud and workload data, summarized event overviews, and suggested remediations to contain an adversary.

Here are just a few of the key new capabilities Sysdig Sage offers to enrich your CDR workflows:

  • Statistics on security events: Streamline analysis and proactively address breach scenarios by identifying critical events that need immediate attention.
  • Explanation of security events: Bridge skill gaps within security operations with detailed explanations of runtime events.
  • Suggested next steps: Shorten response timelines and improve compliance with recommended next steps based on the behavioral details of relevant events.
  • Contextual awareness: Use the data a user is viewing as context to answer questions more precisely and guide them to the platform views that best visualize the threat.

Let’s take a look at how Sysdig Sage can help you with a few key use cases.

Use case 1: Elevate skill gaps across operations

With Sysdig Sage, cybersecurity becomes easier for everyone. It refines your investigation journey as your team trawls through volumes of mundane tasks and events on a daily basis. It also helps foster collaborative workflows and motivates the team to stay vigilant for threats.

To demonstrate this, let’s narrow our scope and search for high-severity events from a specific cluster. Use the Search bar to enter the query below.

kubernetes.cluster.name=risks-aws-eks-workloads-shield

Sysdig Secure applies this query across volumes of cloud data and filters the events relevant to the chosen time range and cluster name. At a glance, we have over 300 distinct events. Even for mature security operations, this volume of events can be overwhelming. Somewhere, somehow, it’s likely that a critical blind spot will be missed. These missed details negatively impact response strategies and may leave an opening for an adversary to walk right in through the front door.

Figure: Events from the cluster

Sysdig Sage alleviates some major operational pain points by enabling users to ask questions in natural language, quickly derive a summary of the situation, and deploy prescriptive response strategies to throw a wrench in the adversary’s plans.

For example, let’s launch Sysdig Sage and ask it to summarize our filtered results.

Summarize events for this cluster
Figure: Summarized events from Sysdig Sage

Sysdig Sage for CDR categorized the events under two distinct headers, namely Drift Detection and Malicious Binary Detected. Without any previous context of what the issue is, we now understand that the threat actor has managed to launch a malicious binary on several Kubernetes workloads, and we know that the Drift Detection policy (curated and maintained by our Sysdig Threat Research Team) prevented the listed workloads from being compromised. 

This information is enough to alert our security teams so they can deploy their established response strategies and mitigate the risk of a breach scenario. 

With Sysdig Sage, every user becomes a security investigator. 

Use case 2: Leverage AI to power your investigation

Sysdig Sage for CDR can respond to multiple queries in a row by correlating context from previous responses. This helps your teams uncover additional details relevant to an attack.

During a breach, you have very little time to do the necessary due diligence. You need to collect enough context, at speed, so the responsible teams can jump in and prevent the adversary from causing further damage.

As an example, let’s use Sysdig Sage for CDR to perform a detailed analysis of all the events and generate an investigation report.

Generate detailed investigation report
Figure: Analysis and Investigation Report

From this report, we notice there’s an active miner (easyminer) running within the risk-10-aws-bedrock-java namespace. A quick online search reveals that the detected binary is legitimate open source mining software. However, its presence within our environment is suspicious.

The report indicates that the adversary, after compromising the workload, downloaded and launched a cryptominer to serve its objectives.

Let’s ask Sysdig Sage to help us understand the root cause of the detected event.

what was the root cause for the malicious binary detected on risk-10-aws-bedrock-java?
Figure: Root cause for events under specified namespace

Sysdig Sage for CDR understood our natural language query and identified that the root cause of the detection event was a shell script, ./malicious-bin-event-gen.sh.

Within seconds, we have enough useful context about the detected malicious binary. Sysdig Sage for CDR has helped us answer the “what” and “why” of the event and saved valuable investigation time. However, our investigation is far from complete. 

Our next goal should be to understand the adversary methods used to breach the perimeter and access our workloads. Let’s ask Sysdig Sage to list the tactics and techniques used by the threat actor according to the MITRE ATT&CK framework.

what MITRE ATT&CK tactics & techniques were used?
Figure: MITRE ATT&CK tactics and techniques related to the events

The results show that the threat actor used MITRE ATT&CK tactics to execute the malicious binary, maintain persistence, and evade defenses within the cluster. 

At this stage, if you are curious about what’s going on under the hood, you can always use the accessibility options (top right) to pop into the Events Feed. Here, you’ll notice filters are automatically applied, and there’s a timeline of every malicious binary event detected within our defined namespace, risk-10-aws-bedrock-java.

Figure: Filters are automatically applied as you query in Sysdig Sage

Now, to gain further context on each MITRE ATT&CK tactic, let’s ask Sysdig Sage to list the attack path.

list the attack path
Figure: Attack path aligned to MITRE framework

Within seconds, Sysdig Sage for CDR expands the process tree to align each detected event under a specific MITRE ATT&CK category. This helps discover all the possible entry points and the security gaps that were potentially exploited by the threat actor. 

But now the real question is, how severe is this event? Let’s ask Sysdig Sage to provide us with a blast radius, listing all the workloads that may have been impacted by the threat actor.

how many workloads were impacted?
Figure: Impacted workloads listed by Sysdig Sage

The results indicate that quite a lot of workloads were possibly impacted by the threat actor. After this, you should really be looking for the panic button and calling in the cavalry, aka your SecOps and DevSecOps teams.

Use case 3: Achieve the 555 Benchmark for Cloud Detection and Response 

We demonstrated in the previous use cases how you could use Sysdig Sage for CDR as your security assistant and gather the preliminary information crucial for any security investigation. However, if you are the only one holding the fort for your organization, you need to apply temporary fixes before alerting the specialists. 

Let’s ask Sysdig Sage for suggested steps that may help you to preempt any adverse events, like user credential compromise, SSH key exfiltration, process masquerading, and many more.

how do I fix this?
Figure: Suggested remediations by Sysdig Sage

Sysdig Sage for CDR recommends a few best practices to mitigate potential risks and prevent further compromise of your environment. Here, isolating the affected resource seems like a good way to stop the adversary in their tracks. 

But in case we didn’t know what to do in such a situation, let’s ask Sysdig Sage to provide us with detailed guidance.

give me detailed guidance on isolating affected resources
Figure: Guided response actions

Stay ahead of threats with Sysdig Sage

Sysdig Sage for CDR is the handy security assistant that, first, helps you stay calm during an incident and, second, guides you through each step to uncover all the details required for a thorough investigation. It makes a security incident feel like a simple DIY project.

Sysdig Sage empowers security teams to capitalize on the real-time nature of the Sysdig platform and the cutting-edge discoveries of the Sysdig Threat Research team. With Sysdig Sage at your side, you can accelerate your response to threats without leaving the platform.

Join our upcoming seminar: AI-Powered CDR in Action for a technical demonstration of how you can leverage Sysdig Sage to detect, investigate, and respond to attacks in minutes.

5 things I love about Sysdig
https://sysdig.com/blog/5-things-i-love-about-sysdig/ | Fri, 02 Aug 2024

Hello there! I’m Sebastian Zumbado, and I’m currently a DevSecOps Engineer in the Sales Engineer business unit at Sysdig. My journey with Sysdig began some years ago when I was a DevOps engineer at IBM. Back then, one of my key responsibilities was maintaining and developing custom alerts for the IBM cloud using Sysdig Monitor and setting up new custom metrics. I vividly remember how Sysdig’s contributions to the YACE (Yet Another CloudWatch Exporter) open-source project helped me manage the cost consumption of CloudWatch logs at that time. This experience sparked my interest in Sysdig.

What Do I Do?

I know being a DevSecOps Engineer in a Sales Engineer department might sound odd, but I get to have lots of fun. My role revolves around ensuring that our demo environments are always ready for new prospects. Maintaining these environments involves creating custom cybersecurity scenarios to showcase how Sysdig helps secure various infrastructures. A crucial part of this task is ensuring that while we demonstrate insecure configurations, our demos and environments themselves remain secure. This delicate balance requires constant vigilance and innovative thinking.

Lately, I’ve been collaborating on the infrastructure of our Kraken Hunter workshops. These workshops are delivered globally and are meant to give attendees hands-on experience in keeping their environments secure, working through different use cases to achieve that goal.

Automation is key to ensuring our demo environments are always ready for Sales Engineers to present to new prospects or during major technology events like Black Hat or AWS re:Invent.

5 things I love about working at Sysdig

1. Collaborating with Talented Individuals

Something you value and appreciate early in your career is having experienced colleagues who have been leaders in your field. Working at Sysdig has allowed me to collaborate with really talented individuals in the industry. This is a pretty dynamic and competitive industry, so there is always so much to learn from others – as they say: “Never be the smartest person in the room”.

2. Flexibility and Autonomy

One of the greatest perks of working at Sysdig is the flexibility it offers. The freedom to work from home provides a sense of ownership over my schedule, significantly boosting my productivity. At the same time, Sysdig values in-person interactions, and our periodic events with peers and executives are incredibly effective for planning and addressing projects. I’m so excited to visit my peers in the Zaragoza office! This balance between remote work and in-person collaboration ensures that we remain connected and productive, providing the best of both worlds.

3. Meaningful Work

At Sysdig, I feel that my contributions matter and are aligned with the company’s overall goals. I know how crucial it is for companies to maintain cybersecurity best practices, and that is no easy task at all. So when I work on reproducing a scenario for a new attack threat like SSH-Snake or researching the impact of a new CVE, I know that my efforts help alleviate the burden on SOC teams and enhance our customers’ security. This sense of purpose and impact makes me love what I do.

4. Innovative and Dynamic Environment

Sysdig’s dynamic and innovative environment is exhilarating. Because we are pioneers and constantly raising the bar, some processes or standards do not exist yet, so I have the excitement of helping create them and paving the way. This level of autonomy and creativity is invaluable for career growth and provides a unique experience that is hard to find elsewhere.

5. Deepening My Cybersecurity Expertise

Working with Sysdig has allowed me to dig deeper into the cybersecurity space in ways I never imagined. This past March, I had the honor of giving a talk at the first Kubernetes Community Days event held in Costa Rica. My presentation, titled “Securing Your Software Supply Chain: A Practical Guide with Falco,” was a great professional milestone, and helping the education of the open-source community has been an invaluable experience. 

Insights: Things to look for in your career

  • Passion: If you are passionate about what you do, you will eventually become an expert in it, because you will gladly dedicate the necessary time.
  • Talented Team: Ensure the company you work for has a knowledgeable and supportive team to foster your growth. Being surrounded by amazing collaborators will bootstrap your career. You will find mentors, advice, experiences, and lessons from others who have been where you are.
  • Purposeful Work: Choose a company where your work aligns with your principles, so you can feel that what you do is making a real difference.
  • Innovation: Seek out dynamic environments that encourage creativity and the development of new ideas – a place where you can learn something new every day.

Sysdig Sage™ for CDR: Accelerate analysis, investigation and response
https://sysdig.com/blog/sysdig-sage-for-cdr-accelerate-analysis-investigation-and-response/ | Wed, 31 Jul 2024

Last year, Sysdig outlined our vision for an AI-driven cloud security assistant. Today, we are excited to announce Sysdig Sage™ for cloud detection and response (CDR), our new release that embodies our vision. Built upon the core principles we introduced, Sysdig Sage offers actionable insights for cloud environments, with a focus on CDR. Sysdig Sage for CDR is the first milestone on the road to making AI assistance pervasive across our CNAPP platform, enabling customers to secure their cloud environments faster.

The 555 Benchmark for Cloud Detection and Response – 5 seconds to detect, 5 minutes to triage, and 5 minutes to respond – sets the standard for operating securely in the cloud. Achieving 555 means being able to detect and respond to cloud attacks faster than attackers can complete them.

With only 5 minutes to perform cloud investigations and block attacks before they are executed, Sysdig Sage for CDR accelerates analysis and investigation, allowing users to prioritize what matters. With Sysdig Sage, users can focus on attack responses rather than spending time connecting the dots or retrieving key information to understand the attack’s big picture and impact.

What is Sysdig Sage for CDR?

Sysdig Sage is a generative AI cloud security analyst – an expert that empowers users, letting them ask questions about their runtime events in natural language within Sysdig Secure’s Events Feed.

The Events page provides an overview of security events occurring across your infrastructure, allowing you to dive deep into specific details, distinguish false positives, and configure policies – based on open source Falco – to enhance security.
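
For readers who haven’t written Falco rules before, here is a minimal, illustrative sketch of a custom rule in the open source Falco syntax. It is not one of Sysdig’s managed policies; the rule name, output message, and tags are hypothetical, while the fields used (evt.type, proc.name, container.id) are standard Falco fields.

- rule: Shell Spawned in Container (example)
  desc: Illustrative custom rule that flags an interactive shell launched inside a container
  condition: >
    evt.type = execve and evt.dir = < and
    container.id != host and
    proc.name in (bash, sh, zsh)
  output: >
    Shell spawned in a container
    (user=%user.name container=%container.name command=%proc.cmdline)
  priority: WARNING
  tags: [container, shell, example]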

Sysdig Sage elevates these capabilities by infusing AI into security analysis operations, delivering:

  • Statistics of security events: Review top statistics for runtime security events based on various groupings such as policy name, rule (event type), severity, and more. This helps users streamline the analysis and quickly identify and focus on events that are relevant to the investigation.
  • Explanation of security events: Sysdig Sage can provide details about runtime events to users and dig deeper into them – for example, to explain the command lines that generated them. 
  • Suggested next steps: Sysdig Sage for CDR can get behavioral details from sample runtime events to summarize what happened at a broader level and suggest some next steps to fix and remediate the issues. This will help users move faster and immediately take action.
  • Context awareness: Sysdig Sage for CDR provides a fully integrated experience. It understands what users are navigating in the Secure UI and can control it, allowing users to quickly jump to the events and information relevant to their investigation.

See Sysdig Sage in action

As someone working in security operations, you need to easily navigate, filter, and focus on relevant events. When viewing the Sysdig Events Feed, you want to quickly understand which events deserve your attention.

You might filter out low- and medium-severity events but still have tons of events to process. This is when Sysdig Sage can speed up your work. You are one click away from asking, “Can you summarize these events?” Sysdig Sage will understand that you activated these filters in the UI and only focus on high-severity events that occurred in the last 6 hours:

Figure: Sysdig Sage controlling the Sysdig Secure Events Feed

You can then click on “Link to events” to quickly reach the events you want to analyze in the UI and keep the conversation going with a focus on the event you want to look at more closely:

At this point, you might want to better understand why the user was allowed to perform that action and whether it represents a threat.

Now that you have connected the dots, you can start crafting your remediation strategy.

And finally: the big picture. Is the threat you analyzed part of a broader security incident? Let’s ask Sysdig Sage!

In just a few questions, you were able to refine your analysis, get all the needed information without leaving Sysdig Secure, and get guidance on what steps to take.

Unlock the power of AI for cloud security

Cloud attacks happen fast. Sysdig Sage for CDR is the ultimate secret weapon to equip security teams to achieve the 555 Benchmark for Cloud Detection and Response, quickly make informed decisions, rapidly respond to threats, and save time on the most complex tasks.

With Sysdig Sage you can:

  • Supercharge skills: Whether a novice or expert, Sysdig Sage for CDR will help you understand your runtime events.
  • Save time: Focus on outcomes, not the analysis. 
  • Get actionable insights: Know where to start and reduce time to respond – from hours to seconds.
  • Collaborate better: Level set knowledge across teams. 

By reducing analysis time to just seconds and seamlessly connecting the dots, Sysdig Sage for CDR impacts daily security operations, supercharging CNAPP capabilities with the power of AI.

Come talk to us about Sysdig Sage at our Black Hat booth.

Webinar: Outpacing cloud attackers with GenAI

Join Sysdig CTO, Loris Degioanni, to learn more about advanced AI strategies for rapid threat detection and response.

Sysdig Sage™: A groundbreaking AI security analyst
https://sysdig.com/blog/sysdig-sage-a-groundbreaking-ai-security-analyst/ | Wed, 31 Jul 2024

Generative AI (GenAI) is a top priority for organizations looking to increase productivity and solve business problems faster. In cloud security, AI chatbots to aid security practitioners are becoming more common, but to date, most of these solutions offer only basic queries and summarization. Diverse cloud environments and evolving threats require more from an AI security analyst.

To streamline investigation and help teams understand how to respond to fast-moving cloud attacks, AI for cloud security needs specialized, domain-specific programming, contextual awareness, and the ability for teams to have multi-step conversations that transform data into actionable insights.

Navigating cloud complexity

Cloud ecosystems and technology stacks can be incredibly complex. Navigating the intricacies of public and private clouds, containers, and Kubernetes requires domain expertise. Even seasoned professionals can find it challenging to stay ahead of the latest tech as it relates to cloud threats. For this reason, there is a tangible benefit to having an AI analyst that can instantly deliver the collective wisdom of human experts and the continuous learnings of AI models. 

Responding under pressure

Cloud security teams are under tremendous pressure as they race against the clock. When it’s crunch time, insufficient answers from an AI chatbot or delays as you search for information aren’t just stressful; they can give adversaries the upper hand. During an investigation or incident response, a lot of time can be wasted trying to determine what something is and how to respond. The proper response for a given scenario may be less obvious to less experienced team members. Getting fast, accurate assistance can make the difference between data and workloads being impacted – or not.

Accelerating human response with a purpose-built AI cloud security analyst

When you have only minutes to respond, the ability to have a conversation that helps you quickly understand a cybersecurity event and how to address it is extremely powerful. To provide this level of support requires capabilities beyond just collecting and compiling data from external sources. By employing multi-step reasoning, contextual awareness, and specialized domain-specific programming, AI for cloud security can offer a truly autonomous and comprehensive approach to security analysis.

Sysdig Sage: AI-powered cloud security analyst

This is the approach we’ve taken with Sysdig Sage, Sysdig’s AI cloud security analyst. Sysdig Sage interacts with users through human-like conversations, helping to peel back the layers of security events. 

Architecturally, Sysdig Sage uses an autonomous-agent approach, leveraging multiple specialized AI agents that work collaboratively toward a common goal: to simplify and accelerate security and enable a faster, better-informed human response. This unique architecture uses advanced agent-based reasoning not only to collect data, but also to provide meaningful, context-aware recommendations that are directly useful for security decisions.

Key capabilities of Sysdig Sage

Multi-step reasoning: Sysdig Sage helps security teams peel back the layers of sophisticated cloud threats through in-depth conversations. Start with a simple question and ask follow-up questions to dive deeper, gaining a clearer understanding of runtime events. Straightforward answers and suggested queries enable quick comprehension of security implications and risks in complex cloud estates.


Contextual awareness: Sysdig Sage understands the context of what users are currently observing in the Sysdig UI and provides precise answers based on that context. It helps you navigate the platform UI, directing you to visualizations that provide a deeper understanding of a given event. As a result, team members of all skill levels get the help they need to manage more and escalate less.


Guided response: Beyond summarizing and explaining threats, Sysdig Sage suggests proactive response actions, prevention strategies, and process improvements. It empowers you to take full advantage of the real-time nature of the Sysdig platform, along with insights available from the Sysdig Threat Research team. Considering the speed at which attacks progress in the cloud, fast answers on how to stop threats are key.


Using Sysdig Sage, cloud security teams are equipped to handle complex security tasks:

  • Incident investigation: Analyze incidents to determine root cause, including performed activities, cloud context, and responsible identities.
  • Prioritization: Prioritize threats based on multiple factors, including severity and potential impact.
  • Risk mitigation: Get effective strategies for mitigating identified risks and enhancing security posture and practices.

And, since Sysdig Sage is multilingual – with support for over 80 languages – you can take advantage of its insights in the language of your choice.

Comparing Sysdig Sage with traditional AI assistants

Sysdig Sage is a true AI security analyst. Looking at the landscape of AI assistance currently available, here’s how Sysdig Sage stacks up:

Insight generation vs. data aggregation

  • Traditional AI assistants: Focus on collecting and compiling data from various sources.
  • Sysdig Sage: Goes beyond aggregation to generate actionable insights through advanced agent-based reasoning.

Contextual awareness

  • Traditional AI assistants: Use a separate prompt interface with little or no UI interaction.
  • Sysdig Sage: Aware of the data the user is observing as context for queries; links users to directly relevant UI views.

Decision support vs. information presentation

  • Traditional AI assistants: Present summarized information for review.
  • Sysdig Sage: Provides detailed, step-by-step reasoning to support critical security decisions.

Adaptive problem-solving

  • Traditional AI assistants: Focus on specific use cases (e.g., remediation information).
  • Sysdig Sage: Tackles unforeseen challenges by combining autonomous agents’ specialized skills. Adaptability ensures AI remains effective in the face of evolving security threats.

Enhanced collaboration

  • Traditional AI assistants: Support single tasks.
  • Sysdig Sage: Acts as a true AI security analyst, supporting users in a free-flowing, contextual manner. Facilitates collaboration between human analysts and AI assistance.

Conclusion

As cloud security threats rapidly evolve, so too must capabilities for cloud security. AI capabilities built with multi-step reasoning and contextual awareness give defenders a new way to understand events, reduce escalations, and streamline response. If you’re new to cloud security, having an AI companion to offer insights and advice can help quickly build your skills and aid you in making the right call in the face of threats. And, if you’re a security veteran, finding ways to save time is likely at the top of your list – AI can help. 

Sysdig has designed its cloud security analyst, Sysdig Sage, to function like a team of experts by your side – always available to help you stay ahead of adversaries in an increasingly complex cloud landscape. We invite you to read the next blog in our launch series to learn more and see Sysdig Sage in action.

Webinar: Outpacing Cloud Attackers with GenAI

Join Sysdig CTO, Loris Degioanni, to learn more about advanced AI strategies for rapid threat detection and response.

2024 Gartner® CNAPP Market Guide: Runtime insights is a core pillar of cloud-native application protection platforms
https://sysdig.com/blog/gartner-runtime-insights-is-a-core-pillar-of-cnapp/ | Fri, 26 Jul 2024

As organizations continue to look for consolidated platforms to address their security needs, an important shift has happened. Customers have discovered that traditional tools focusing exclusively on static risks (such as misconfigurations, policy/control failures, and network exposure) are not enough to address today’s dynamic cloud threats. What’s needed is a solution with runtime visibility that can help prioritize active risks – like real-time configuration changes and in-use packages with critical vulnerabilities – and tell them what to focus on now.

The 2024 Gartner Market Guide for Cloud-Native Application Protection Platforms (CNAPPs) examines this dynamic as well as other trends in the CNAPP space. It’s clear that CNAPP is no longer a new trend – it’s essential to consolidate security tooling, reduce complexity and costs, and improve agility across the pipeline.

Sysdig has continued to expand real-time capabilities across CNAPP to provide a comprehensive platform for cloud security that prioritizes the risks that matter. With CNAPP adoption continuing to increase, it’s important for customers to consider what is most important when it comes to evaluating a solution.

What is a CNAPP?

According to Gartner, “Cloud-native application protection platforms (CNAPPs) are a unified and tightly integrated set of security and compliance capabilities, designed to protect cloud-native infrastructure and applications.”

CNAPP capabilities help you:

  • Prioritize and remediate cloud risks through a consolidated platform that can communicate across multiple areas of cloud security.
  • Reduce the chance of a misconfiguration, a mistake, or mismanagement of a resource as cloud-native applications are rapidly developed, released into production, and iterated.
  • Converge and reduce the number of tools and vendors involved in the continuous integration/continuous delivery (CI/CD) pipeline.
  • Improve developer acceptance with security-scanning capabilities that seamlessly integrate into their development pipelines and tooling.

Benefits of a CNAPP

Increased end-to-end visibility: A successful CNAPP must process substantial volumes of data from diverse sources. This encompasses data from Linux system calls, Kubernetes audit logs, cloud logs, identity and access tools such as Okta, and more. Further, cloud environments can span many types of workloads running on public, private, on-prem, or hybrid infrastructure. Extensive coverage is crucial due to the many potential entry points for attacks, as well as the potential for attackers to move laterally. Legacy tools may provide visibility of a portion of your environment, but CNAPPs can provide end-to-end visibility across workloads, cloud services, Linux/Windows, identities, and third-party apps.

Rapid prioritization of risk with runtime insights: The key for security teams to prioritize the most impactful issues across cloud environments is runtime insights. Runtime insights provide actionable information on the most critical problems in an environment based on the knowledge of what is running right now. This provides a lens into what’s actually happening in deployments, allowing security and development teams to focus on current, exploitable risks. From vulnerabilities tied to active packages to real-time detections, a CNAPP solution must be able to correlate findings in real time to uncover hidden attack paths and risks. By connecting the dots across cloud domains, security teams can make informed decisions on which risks are the most critical to address. This helps teams eliminate alert fatigue, provides deep visibility, and enables them to identify relevant suspicious activity.

Ability to detect and respond to cloud threats in real time: EDR and XDR tooling are fundamentally unsuited for the cloud, and the security teams that still rely on them find themselves struggling with incomplete and siloed data that lacks cloud context, dramatically slowing detection, investigation, and response. Cloud detection and response (CDR), as part of a CNAPP, automatically correlates posture and runtime insights for true cloud-native context, accelerating workflows and eliminating skill gaps. Prevention alone is no longer enough. As threat actors evolve and develop unique and unknown attacks, defenders must prioritize detecting and stopping unknown attacks in real time. No organization can effectively defend against zero-day exploits without a purpose-built solution for the cloud.

How CNAPP helps security and DevOps teams

A single purpose-built platform can break silos and streamline downstream activities. Security becomes a valuable business partner by delivering relevant, high-context guidance across key stakeholders. Rapid investigation findings enable prescriptive guidance for response actions across incident response, platform, developer, and DevSec teams. These accelerated findings allow response teams to initiate a response within 5 minutes, adhering to the 5-minute response standard outlined in the 555 Benchmark for Cloud Detection and Response.

There’s also value in “closing the loop” between security and platform/dev teams. The ability for security teams to leverage findings from investigations (such as which misconfigurations, permissions, and vulnerabilities were abused to perpetrate the attack) and then share those insights to tune and harden preventive controls is critical for enterprises to stay secure – from prevention and hardening to detection and response. This focus on continual improvement of preventive controls helps ensure incidents do not recur, which reduces organizational cloud risk.

Why CNAPP must have cloud detection and response

The expansive nature and complexity of modern cloud operations warrant new and practical security approaches, with cloud services extending far beyond operating systems and associated processes, the core domains of endpoint detection and response (EDR) tools. Sophisticated threat actors often leverage AI and automated techniques to exploit cloud services in near real-time.

Identity and access management, vulnerability management, and other preventive controls are important for building a robust defense, but defenders must have the ability to stop attacks in motion. To effectively combat cloud threats, security teams need a comprehensive and actionable cloud detection and response solution — one that is purpose-built for the complexities and speed of the cloud. The solution should correlate findings across the entire cloud estate in real time, matching or even outpacing the speed of cloud threats.

At the end of the day, EDR and XDR tools are fundamentally not suited for cloud security. They lack the cloud context to understand the who, what, where, and how of an attack before a breach can occur. Without this context, teams can’t communicate or respond effectively, greatly increasing the potential for missed threats and a material breach. Without a shared platform that prioritizes CDR, security teams will always be left playing catch-up.

Recommendations for evaluating a CNAPP

In their 2024 Market Guide for Cloud-Native Application Protection Platforms, Gartner shares several recommendations for security and risk management leaders. Based on our understanding from the report, we’ve provided several questions to help you navigate the buying process.

Do they address a broad set of security use cases from source to production?

This includes capabilities such as:

  • Cloud detection and response – Enabling real time multi-cloud correlation, context, and visibility into identity, workload, and cloud activity, including the control plane.
  • IaC security – Scanning IaC manifests to identify misconfigurations and security risks before deployment while preventing drift. 
  • Vulnerability management and supply chain security – Identifying, prioritizing, and fixing vulnerabilities across your software supply chain, such as SCM, CI/CD, registry, and runtime environments.  
  • AI workload security – Uncovering active AI risk by flagging any suspicious activity or changes to workloads that contain AI packages that are in-use.
  • Optimize identities and access in the cloud – Because suspicious user activity is often the first indicator of a breach, it’s critical to be able to detect compromise in seconds, contain compromised identities, and prevent future identity abuse.
  • Kubernetes security posture management (KSPM) – Ensuring that governance, compliance, and security controls are included for Kubernetes.
  • Configuration and access management – Hardening posture by managing misconfigurations and excessive permissions across cloud environments, such as cloud resources, users, and even ephemeral services like Lambda.
  • Threat detection and response across cloud workloads, users, and services – Multi-layered detection approach that combines rules and ML-based policies, enhanced with threat intelligence, along with a detailed audit trail for forensics and incident response.
  • Universal compatibility with eBPF – Simplifying deployment and giving organizations greater flexibility regarding where and how they develop cloud-native applications with extensive coverage of Linux hosts, Windows hosts, and Kubernetes nodes.
  • Compliance – Meeting compliance standards for dynamic cloud/container environments against PCI, NIST, and HIPAA.

Can they accurately prioritize what matters?

Prioritizing the most critical vulnerabilities, configuration mistakes, or access mistakes based on in-use risk exposure is key. For example:

  • Understanding which packages are in-use at runtime, helps you prioritize the most critical vulnerabilities to fix. Our research shows that 87% of container images have high or critical vulnerabilities, but only 15% of vulnerabilities are actually tied to loaded packages at runtime.
  • Real-time cloud activity helps immediately spot the anomalous behavior and posture drift that are most risky.
  • Runtime access patterns help to highlight the excessive permissions to fix first. 

Also key is the ability to provide remediation guidance that ultimately helps teams make informed decisions directly where it matters most – at the source.

Can they maximize coverage but also give deep visibility?

Evaluate whether CNAPP vendors provide deep visibility and insights across your entire multi-cloud footprint, including IaaS and PaaS, extending across VM, container, and serverless workloads. This often includes both agentless approaches for visibility and control, and deep runtime visibility based on instrumentation approaches like eBPF.

Are they truly getting a consolidated view of risk?

Some vendors acquire multiple companies to check the box, and this results in a poor, disjointed experience. Look for a CNAPP vendor that tightly integrates the source-to-production use cases, replacing multiple point products with a comprehensive picture of risk across configurations, assets, user permissions, and workloads.

Do they allow customizations?

Every organization is different. The ability to customize policies, filter results and accept risk based on the organization’s unique environment is key to successfully adopting a solution.

Are they tightly integrated with the DevOps and security ecosystem?

The CNAPP must integrate with CI/CD tools to scan for misconfigurations and vulnerabilities pre-deployment, as well as with SIEM and notification tools to trigger alerts and forward events so teams can act immediately. Guidance on how to fix issues is key; the tool needs the ability to map a violation back to the IaC file, provide situational awareness through rich context when investigating an alert, and give suggestions (in the form of a pull request, for example) to fix it where it matters: at the source.

CNAPP requires runtime insights

Prioritizing CNAPP with runtime insights empowers security teams to span the spectrum of prevention and hardening to detection and response, which ultimately allows all teams in the cloud to handle issues with greater efficiency and confidence. As organizations increasingly navigate cloud security complexities, runtime insights provide a decisive advantage by offering comprehensive visibility, enabling rapid risk prioritization, and mitigating alert overload. 

By addressing the challenges of end-to-end visibility and alert fatigue, CNAPPs equipped with runtime insights enable security and development teams to swiftly identify, prioritize, and address critical vulnerabilities, ensuring the organization’s cloud security posture aligns seamlessly with the pace of innovation. 

Gartner® 2024 CNAPP Market Guide

Gartner, Market Guide for Cloud-Native Application Protection Platforms, Dale Koeppen, Charlie Winckless, Neil MacDonald, Esraa ElTahawy, 22 July 2024.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Kubernetes 1.31 – What’s new?
https://sysdig.com/blog/whats-new-kubernetes-1-31/ | Fri, 26 Jul 2024

Kubernetes 1.31 is nearly here, and it’s full of exciting major changes to the project! So, what’s new in this upcoming release?

Kubernetes 1.31 brings a plethora of enhancements, including 37 line items tracked as ‘Graduating’ in this release. From these, 11 enhancements are graduating to stable, including the highly anticipated AppArmor support for Kubernetes, which includes the ability to specify an AppArmor profile for a container or pod in the API, and have that profile applied by the container runtime. 

34 new alpha features are also making their debut, with a lot of eyes on the initial design to support pod-level resource limits. Security teams will be particularly interested in tracking the progress on this one.

Watch out for major changes such as improved ingress connectivity reliability for kube-proxy, which now offers better connection draining on terminating nodes for load balancers that support it.

Further enhancing security, we see Pod-level resource limits moving along from Net New to Alpha, offering a capability similar to Resource Constraints in Kubernetes that harmoniously balances operational efficiency with robust security.

There are also numerous quality-of-life improvements that continue the trend of making Kubernetes more user-friendly and efficient, such as a randomized algorithm for Pod selection when downscaling ReplicaSets.

We are buzzing with excitement for this release! There’s plenty to unpack here, so let’s dive deeper into what Kubernetes 1.31 has to offer.

Editor’s pick:

These are some of the changes that look most exciting to us in this release:

#2395 Removing In-Tree Cloud Provider Code

Probably the most exciting advancement in v1.31 is the removal of all in-tree integrations with cloud providers. Since v1.26, there has been a large push to help Kubernetes truly become a vendor-neutral platform. This externalization process removes all cloud provider-specific code from the k8s.io/kubernetes repository with minimal disruption to end users and developers.

Nigel Douglas, Sr. Open Source Security Researcher

#2644 Always Honor PersistentVolume Reclaim Policy

I like this enhancement a lot, as it finally allows users to honor the PersistentVolume Reclaim Policy through a deletion protection finalizer. HonorPVReclaimPolicy is now enabled by default. Finalizers can be added on a PersistentVolume to ensure that PersistentVolumes with a Delete reclaim policy are deleted only after the backing storage is deleted.


The newly introduced finalizers kubernetes.io/pv-controller and external-provisioner.volume.kubernetes.io/finalizer are only added to dynamically provisioned volumes within your environment.

Pietro Piutti, Sr. Technical Marketing Manager

#4292 Custom profile in kubectl debug


I’m delighted to see that they have finally introduced a new custom profile option for the kubectl debug command. This feature addresses the challenge teams regularly face when debugging applications built on shell-less base images. By allowing the mounting of data volumes and other resources within the debug container, this enhancement provides a significant security benefit for most organizations, encouraging the adoption of more secure, shell-less base images without sacrificing debugging capabilities.

Thomas Labarussias, Sr. Developer Advocate & CNCF Ambassador


Apps in Kubernetes 1.31

#3017 PodHealthyPolicy for PodDisruptionBudget

Stage: Graduating to Stable
Feature group: sig-apps

Kubernetes 1.31 graduates the PodHealthyPolicy for PodDisruptionBudgets (PDBs) to stable. PDBs currently serve two purposes: ensuring a minimum number of pods remain available during disruptions and preventing data loss by blocking pod evictions until data is replicated.

The current implementation has issues. Pods that are Running but not Healthy (Ready) may not be evicted even if their number exceeds the PDB threshold, hindering tools like cluster-autoscaler. Additionally, using PDBs to prevent data loss is considered unsafe and not their intended use.

Despite these issues, many users rely on PDBs for both purposes. Therefore, changing the PDB behavior without supporting both use-cases is not viable, especially since Kubernetes lacks alternative solutions for preventing data loss.
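
As a rough sketch of how this surfaces in the API, the behavior is controlled by the PDB’s unhealthyPodEvictionPolicy field; the name and selector below are placeholders.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                       # placeholder name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web                        # placeholder label
  # AlwaysAllow lets Running-but-not-Ready pods be evicted even when the budget would
  # otherwise block them; the default, IfHealthyBudget, preserves the older behavior.
  unhealthyPodEvictionPolicy: AlwaysAllow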

#3335 Allow StatefulSet to control start replica ordinal numbering

Stage: Graduating to Stable
Feature group: sig-apps

The goal of this feature is to enable migrating a StatefulSet across namespaces or clusters, or in segments, without disrupting the application. Traditional methods like backup and restore cause downtime, while pod-level migration requires manual rescheduling. Migrating a StatefulSet in slices allows for a gradual and less disruptive migration process by moving only a subset of replicas at a time, as sketched below.
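
A minimal manifest sketch of the destination slice, assuming the source cluster keeps ordinals 0-2 and this cluster hosts ordinals 3-4; the names, counts, and image are illustrative only.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db                            # placeholder name
spec:
  replicas: 2                         # this slice owns pods db-3 and db-4
  ordinals:
    start: 3                          # replica numbering begins at ordinal 3 instead of 0
  serviceName: db
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: registry.k8s.io/pause:3.9   # stand-in image for illustration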

#3998 Job Success/completion policy

Stage: Graduating to Beta
Feature group: sig-apps

We are excited about the improvement to the Job API, which now allows setting conditions under which an Indexed Job can be declared successful. This is particularly useful for batch workloads like MPI and PyTorch that need to consider only leader indexes for job success. Previously, an indexed job was marked as completed only if all indexes succeeded. Some third-party frameworks, like Kubeflow Training Operator and Flux Operator, have implemented similar success policies. This improvement will enable users to mark jobs as successful based on a declared policy, terminating lingering pods once the job meets the success criteria.
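
A hedged sketch of a leader-based success policy follows; it assumes an Indexed Job with the JobSuccessPolicy feature enabled (on by default in beta), and the job name and image are placeholders.

apiVersion: batch/v1
kind: Job
metadata:
  name: leader-worker-job             # placeholder name
spec:
  completions: 4
  parallelism: 4
  completionMode: Indexed             # success policies apply to Indexed Jobs
  successPolicy:
    rules:
    - succeededIndexes: "0"           # declare the Job successful once the leader (index 0) succeeds
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.k8s.io/pause:3.9   # stand-in image for illustration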

CLI in Kubernetes 1.31

#4006 Transition from SPDY to WebSockets

Stage: Graduating to Beta
Feature group: sig-cli

This enhancement proposes adding a WebSocketExecutor to the kubectl CLI tool, using a new subprotocol version (v5.channel.k8s.io), and creating a FallbackExecutor to handle client/server version discrepancies. The FallbackExecutor first attempts to connect using the WebSocketExecutor, then falls back to the legacy SPDYExecutor if unsuccessful, potentially requiring two request/response trips. Despite the extra roundtrip, this approach is justified because modifying the low-level SPDY and WebSocket libraries for a single handshake would be overly complex, and the additional IO load is minimal in the context of streaming operations. Additionally, as releases progress, the likelihood of a WebSocket-enabled kubectl interacting with an older, non-WebSocket API Server decreases.

#4706 Deprecate and remove kustomize from kubectl

Stage: Net New to Alpha
Feature group: sig-cli

The update was deferred from the Kubernetes 1.31 release. Kustomize was initially integrated into kubectl to enhance declarative support for Kubernetes objects. However, with the development of various customization and templating tools over the years, kubectl maintainers now believe that promoting one tool over others is not appropriate. Decoupling Kustomize from kubectl will allow each project to evolve at its own pace, avoiding issues with mismatched release cycles that can lead to kubectl users working with outdated versions of Kustomize. Additionally, removing Kustomize will reduce the dependency graph and the size of the kubectl binary, addressing some dependency issues that have affected the core Kubernetes project.

#3104 Separate kubectl user preferences from cluster configs

Stage: Net New to Alpha
Feature group: sig-cli

Kubectl, one of the earliest components of the Kubernetes project, upholds a strong commitment to backward compatibility. We aim to let users opt into new features (like delete confirmation), which might otherwise disrupt existing CI jobs and scripts. Although kubeconfig has an underutilized field for preferences, it isn’t ideal for this purpose. New clusters usually generate a new kubeconfig file with credentials and host details, and while these files can be merged or specified by path, we believe server configuration and user preferences should be distinctly separated.

To address these needs, the Kubernetes maintainers proposed introducing a kuberc file for client preferences. This file will be versioned and structured to easily incorporate new behaviors and settings for users. It will also allow users to define kubectl command aliases and default flags. With this change, we plan to deprecate the kubeconfig Preferences field. This separation ensures users can manage their preferences consistently, regardless of the –kubeconfig flag or $KUBECONFIG environment variable.
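
The exact schema is still being settled in the KEP, so treat the following as a purely illustrative sketch of what a kuberc preferences file could look like; the field names and values are assumptions drawn from the proposal and may change before the feature ships.

apiVersion: kubectl.config.k8s.io/v1alpha1
kind: Preference
aliases:
- name: getn                          # hypothetical alias: "kubectl getn" expands to "kubectl get -o wide"
  command: get
  appendArgs:
  - -o
  - wide
defaults:
- command: delete
  options:
  - name: interactive                 # opt into delete confirmation by default
    default: "true"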

Kubernetes 1.31 instrumentation

#2305 Metric cardinality enforcement

Stage: Graduating to Stable
Feature group: sig-instrumentation

Metrics turning into memory leaks pose significant issues, especially when they require re-releasing the entire Kubernetes binary to fix. Historically, we’ve tackled these issues inconsistently. For instance, coding mistakes sometimes cause unintended IDs to be used as metric label values. 

In other cases, we’ve had to delete metrics entirely due to their incorrect use. More recently, we’ve either removed metric labels or retroactively defined acceptable values for them. Fixing these issues is a manual, labor-intensive, and time-consuming process without a standardized solution.

This stable update should address these problems by enabling metric dimensions to be bound to known sets of values independently of Kubernetes code releases.

Network in Kubernetes 1.31

#3836 Ingress Connectivity Reliability Improvement for Kube-Proxy

Stage: Graduating to Stable
Feature group: sig-network

This enhancement finally introduces a more reliable mechanism for handling ingress connectivity for endpoints on terminating nodes and nodes with unhealthy Kube-proxies, focusing on eTP:Cluster services. Currently, Kube-proxy’s response is based on its healthz state for eTP:Cluster services and the presence of a Ready endpoint for eTP:Local services. This KEP addresses the former.

The proposed changes are:

  1. Connection Draining for Terminating Nodes:
    Kube-proxy will use the ToBeDeletedByClusterAutoscaler taint to identify terminating nodes and fail its healthz check to signal load balancers for connection draining. Other signals like .spec.unschedulable were considered but deemed less direct.
  2. Addition of /livez Path:
    Kube-proxy will add a /livez endpoint to its health check server to reflect the old healthz semantics, indicating whether data-plane programming is stale (see the example checks after this list).
  3. Cloud Provider Health Checks:
    While not aligning cloud provider health checks for eTP:Cluster services, the KEP suggests creating a document on Kubernetes’ official site to guide and share knowledge with cloud providers for better health checking practices.
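
If you want to observe these signals yourself, you can probe kube-proxy’s health check server directly. This is a rough sketch that assumes the default health check port of 10256; adjust the address if --healthz-bind-address is customized in your cluster, and <node-ip> is a placeholder.

curl http://<node-ip>:10256/healthz   # starts failing once the node is marked for deletion, telling load balancers to drain
curl http://<node-ip>:10256/livez     # keeps the old semantics: reports whether kube-proxy data-plane programming is stale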

#4444 Traffic Distribution to Services

Stage: Graduating to Beta
Feature group: sig-network

To enhance traffic routing in Kubernetes, this KEP proposes adding a new field, trafficDistribution, to the Service specification. This field allows users to specify routing preferences, offering more control and flexibility than the earlier topologyKeys mechanism. trafficDistribution will provide a hint for the underlying implementation to consider in routing decisions without offering strict guarantees.

The new field will support values like PreferClose, indicating a preference for routing traffic to topologically proximate endpoints. The absence of a value indicates no specific routing preference, leaving the decision to the implementation. This change aims to provide enhanced user control, standard routing preferences, flexibility, and extensibility for innovative routing strategies.
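
As a minimal sketch, the new field sits directly on the Service spec; the service name, selector, and ports below are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: backend                       # placeholder name
spec:
  selector:
    app: backend                      # placeholder label
  ports:
  - port: 80
    targetPort: 8080
  # Hint (not a guarantee) to prefer topologically closer endpoints, such as the client's zone;
  # omit the field to leave routing entirely to the implementation.
  trafficDistribution: PreferClose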

#1880 Multiple Service CIDRs

Stage: Graduating to Beta
Feature group: sig-network

This proposal introduces a new allocator logic using two new API objects: ServiceCIDR and IPAddress, allowing users to dynamically increase available Service IPs by creating new ServiceCIDRs. The allocator will automatically consume IPs from any available ServiceCIDR, similar to adding more disks to a storage system to increase capacity.

To maintain simplicity, backward compatibility, and avoid conflicts with other APIs like Gateway APIs, several constraints are added:

  • ServiceCIDR is immutable after creation.
  • ServiceCIDR can only be deleted if no Service IPs are associated with it.
  • Overlapping ServiceCIDRs are allowed.
  • The API server ensures a default ServiceCIDR exists to cover service CIDR flags and the “kubernetes.default” Service.
  • All IPAddresses must belong to a defined ServiceCIDR.
  • Every Service with a ClusterIP must have an associated IPAddress object.
  • A ServiceCIDR being deleted cannot allocate new IPs.

This creates a one-to-one relationship between Service and IPAddress, and a one-to-many relationship between ServiceCIDR and IPAddress. Overlapping ServiceCIDRs are merged in memory, with IPAddresses coming from any ServiceCIDR that includes that IP. The new allocator logic can also be used by other APIs, such as the Gateway API, enabling future administrative and cluster-wide operations on Service ranges.
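
A minimal sketch of the new object, assuming the beta API group shipped in 1.31 (networking.k8s.io/v1beta1); the object name and CIDR range are examples only.

apiVersion: networking.k8s.io/v1beta1
kind: ServiceCIDR
metadata:
  name: extra-service-range           # placeholder name
spec:
  cidrs:
  - 10.96.100.0/24                    # example range; new ClusterIPs can be allocated from it once applied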

Kubernetes 1.31 nodes

#2400 Node Memory Swap Support

Stage: Graduating to Stable
Feature group: sig-node

This enhancement integrates swap memory support into Kubernetes, addressing two key user groups: node administrators who want it for performance tuning and application developers whose apps require swap.

The focus was to facilitate controlled swap use on a node level, with the kubelet enabling Kubernetes workloads to utilize swap space under specific configurations. The ultimate goal is to enhance Linux node operation with swap, allowing administrators to determine swap usage for workloads, initially not permitting individual workloads to set their own swap limits.
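
As a brief sketch, assuming a Linux node that already has swap provisioned, a node administrator could opt workloads into limited swap use with a kubelet configuration along these lines:

# Illustrative KubeletConfiguration fragment enabling node-level swap.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false            # let the kubelet start on a node with swap enabled
memorySwap:
  swapBehavior: LimitedSwap  # allow eligible workloads a limited amount of swap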

#4569 Move cgroup v1 support into maintenance mode

Stage: Net New to Stable
Feature group: sig-node

The proposal aims to transition Kubernetes’ cgroup v1 support into maintenance mode while encouraging users to adopt cgroup v2. Although cgroup v1 support won’t be removed immediately, its deprecation and eventual removal will be addressed in a future KEP. The Linux kernel community and major distributions are focusing on cgroup v2 due to its enhanced functionality, consistent interface, and improved scalability. Consequently, Kubernetes must align with this shift to stay compatible and benefit from cgroup v2’s advancements.

To support this transition, the proposal includes several goals. First, cgroup v1 will receive no new features, marking its functionality as complete and stable. End-to-end testing will be maintained to ensure the continued validation of existing features. The Kubernetes community may provide security fixes for critical CVEs related to cgroup v1 as long as the release is supported. Major bugs will be evaluated and fixed if feasible, although some issues may remain unresolved due to dependency constraints.

Migration support will be offered to help users transition from cgroup v1 to v2. Additionally, efforts will be made to enhance cgroup v2 support by addressing all known bugs, ensuring it is reliable and functional enough to encourage users to switch. This proposal reflects the broader ecosystem’s movement towards cgroup v2, highlighting the necessity for Kubernetes to adapt accordingly.

#24 AppArmor Support

Stage: Graduating to Stable
Feature group: sig-node

Adding AppArmor support to Kubernetes marks a significant enhancement in the security posture of containerized workloads. AppArmor is a Linux kernel security module that allows system admins to restrict certain capabilities of a program using profiles attached to specific applications or containers. By integrating AppArmor into Kubernetes, developers can now define security policies directly within the workload configuration.

This feature allows specifying an AppArmor profile within the Kubernetes API for individual containers or entire pods. Once defined, the profile is enforced by the container runtime, ensuring that the container’s actions are restricted according to the rules defined in the profile. This capability is crucial for running secure and confined applications in a multi-tenant environment, where a compromised container could potentially affect other workloads or the underlying host.
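
As a minimal sketch (the pod name and image are placeholders), the stable API lets you attach an AppArmor profile directly in the pod or container securityContext:

# Illustrative Pod using the securityContext.appArmorProfile field.
apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
spec:
  securityContext:
    appArmorProfile:
      type: RuntimeDefault   # or Localhost with a localhostProfile name
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]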

Scheduling in Kubernetes

#3633 Introduce MatchLabelKeys and MismatchLabelKeys to PodAffinity and PodAntiAffinity

Stage: Graduating to Beta
Feature group: sig-scheduling

This enhancement was tracked for code freeze as of July 23rd. It introduces MatchLabelKeys (and MismatchLabelKeys) for PodAffinityTerm to refine PodAffinity and PodAntiAffinity, enabling more precise control over Pod placements during scenarios like rolling upgrades.

By allowing users to specify the scope for evaluating Pod co-existence, it addresses scheduling challenges that arise when new and old Pod versions are present simultaneously, particularly in saturated or idle clusters. This enhancement aims to improve scheduling effectiveness and cluster resource utilization.
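
A common pattern, sketched below with hypothetical labels, is keying anti-affinity on pod-template-hash so that spreading rules are evaluated only against Pods from the same ReplicaSet revision during a rolling upgrade:

# Illustrative pod template fragment using matchLabelKeys (beta in Kubernetes 1.31).
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: my-app
        matchLabelKeys:
          - pod-template-hash   # only Pods of the same revision are considered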

Kubernetes storage

#3762 PersistentVolume last phase transition time

Stage: Graduating to Stable
Feature group: sig-storage

The Kubernetes maintainers plan to update the API server to support a new timestamp field for PersistentVolumes, which will record when a volume transitions to a different phase. This field will be set to the current time for all newly created volumes and those changing phases. While this timestamp is intended solely as a convenience for cluster administrators, it will enable them to list and sort PersistentVolumes based on the transition times, aiding in manual cleanup and management.

This change addresses issues experienced by users with the Delete reclaim policy, which led to data loss, prompting many to revert to the safer Retain policy. With the Retain policy, unclaimed volumes are marked as Released, and over time, these volumes accumulate. The timestamp field will help admins identify when volumes last transitioned to the Released phase, facilitating easier cleanup.

Moreover, the generic recording of timestamps for all phase transitions will provide valuable metrics and insights, such as measuring the time between Pending and Bound phases. The goals are to introduce this timestamp field and update it with every phase transition, without implementing any volume health monitoring or additional actions based on the timestamps.
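
For illustration, the new field surfaces in the PersistentVolume status (the timestamp below is made up), giving administrators a value they can sort and filter on when cleaning up Released volumes:

# Illustrative PersistentVolume status fragment showing the new field.
status:
  phase: Released
  lastPhaseTransitionTime: "2024-07-30T09:41:00Z"   # updated on every phase change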

#3751 Kubernetes VolumeAttributesClass ModifyVolume

Stage: Graduating to Beta
Feature group: sig-storage

The proposal introduces a new Kubernetes API resource, VolumeAttributesClass, along with an admission controller and a volume attributes protection controller. This resource will allow users to manage volume attributes, such as IOPS and throughput, independently from capacity. The current immutability of StorageClass.parameters necessitates this new resource, as it permits updates to volume attributes without directly using cloud provider APIs, simplifying storage resource management.

VolumeAttributesClass will enable specifying and modifying volume attributes both at creation and for existing volumes, ensuring changes are non-disruptive to workloads. Conflicts between StorageClass.parameters and VolumeAttributesClass.parameters will result in errors from the driver. 

The primary goals include providing a cloud-provider-independent specification for volume attributes, enforcing these attributes through the storage provider, and allowing workload developers to modify them non-disruptively. The proposal does not address OS-level IO attributes, inter-pod volume attributes, or scheduling based on node-specific volume attributes limits, though these may be considered for future extensions.
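
As a hedged sketch (the driver name and parameter keys are provider-specific assumptions), a VolumeAttributesClass and the claim that references it could look like this:

# Illustrative VolumeAttributesClass (beta in Kubernetes 1.31); driverName and
# parameters are examples only, since valid keys depend on the CSI driver.
apiVersion: storage.k8s.io/v1beta1
kind: VolumeAttributesClass
metadata:
  name: gold
driverName: ebs.csi.aws.com
parameters:
  iops: "8000"
  throughput: "500"
---
# PersistentVolumeClaim opting into (or later switching to) that class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
  volumeAttributesClassName: gold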

#3314 CSI Differential Snapshot for Block Volumes

Stage: Net New to Alpha
Feature group: sig-storage

This enhancement was removed from the Kubernetes 1.31 milestone. It aims at enhancing the CSI specification by introducing a new optional CSI SnapshotMetadata gRPC service. This service allows Kubernetes to retrieve metadata on allocated blocks of a single snapshot or the changed blocks between snapshots of the same block volume. Implemented by the community-provided external-snapshot-metadata sidecar, this service must be deployed by a CSI driver. Kubernetes backup applications can access snapshot metadata through a secure TLS gRPC connection, which minimizes load on the Kubernetes API server.

The external-snapshot-metadata sidecar communicates with the CSI driver’s SnapshotMetadata service over a private UNIX domain socket. The sidecar handles tasks such as validating the Kubernetes authentication token, authorizing the backup application, validating RPC parameters, and fetching necessary provisioner secrets. The CSI driver advertises the existence of the SnapshotMetadata service to backup applications via a SnapshotMetadataService CR, containing the service’s TCP endpoint, CA certificate, and audience string for token authentication.

Backup applications must obtain an authentication token using the Kubernetes TokenRequest API with the service’s audience string before accessing the SnapshotMetadata service. They should establish trust with the specified CA and use the token in gRPC calls to the service’s TCP endpoint. This setup ensures secure, efficient metadata retrieval without overloading the Kubernetes API server.

The goals of this enhancement are to provide a secure CSI API for identifying allocated and changed blocks in volume snapshots, and to efficiently relay large amounts of snapshot metadata from the storage provider. This API is an optional component of the CSI framework.

Other enhancements in Kubernetes 1.31

#4193 Bound service account token improvements

Stage: Graduating to Beta
Feature group: sig-auth

The proposal aims to enhance Kubernetes security by embedding the bound Node information in tokens and extending token functionalities. The kube-apiserver will be updated to automatically include the name and UID of the Node associated with a Pod in the generated tokens during a TokenRequest. This requires adding a Getter for Node objects to fetch the Node’s UID, similar to existing processes for Pod and Secret objects.

Additionally, the TokenRequest API will be extended to allow tokens to be bound directly to Node objects, ensuring that when a Node is deleted, the associated token is invalidated. The SA authenticator will be modified to verify tokens bound to Node objects by checking the existence of the Node and validating the UID in the token. This maintains the current behavior for Pod-bound tokens while enforcing new validation checks for Node-bound tokens from the start.
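
To sketch the idea (the audience, node name, and expiration below are placeholders, and the request is submitted through the serviceaccounts/<name>/token subresource rather than applied directly), a Node-bound token request could look roughly like this:

# Illustrative TokenRequest body with a Node boundObjectRef; values are placeholders.
apiVersion: authentication.k8s.io/v1
kind: TokenRequest
spec:
  audiences:
    - https://kubernetes.default.svc
  expirationSeconds: 3600
  boundObjectRef:
    apiVersion: v1
    kind: Node
    name: worker-node-1   # the token is invalidated once this Node is deleted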

Furthermore, each issued JWT will include a UUID (JTI) to trace the requests made to the apiserver using that token, recorded in audit logs. This involves generating the UUID during token issuance and extending audit log entries to capture this identifier, enhancing traceability and security auditing.

#3962 Mutating Admission Policies

Stage: Net New to Alpha
Feature group: sig-api-machinery

Continuing the work started in KEP-3488, the project maintainers have proposed adding mutating admission policies using CEL expressions as an alternative to mutating admission webhooks. This builds on the API for validating admission policies established in KEP-3488. The approach leverages CEL’s object instantiation and Server Side Apply’s merge algorithms to perform mutations.

The motivation for this enhancement stems from the simplicity needed for common mutating operations, such as setting labels or adding sidecar containers, which can be efficiently expressed in CEL. This reduces the complexity and operational overhead of managing webhooks. Additionally, CEL-based mutations offer advantages such as allowing the kube-apiserver to introspect mutations and optimize the order of policy applications, minimizing reinvocation needs. In-process mutation is also faster compared to webhooks, making it feasible to re-run mutations to ensure consistency after all operations are applied.

The goals include providing a viable alternative to mutating webhooks for most use cases, enabling policy frameworks without webhooks, offering an out-of-tree implementation for compatibility with older Kubernetes versions, and providing core functionality as a library for use in GitOps, CI/CD pipelines, and auditing scenarios.
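
Purely as a conceptual sketch of the direction (this is an alpha API, so the exact field names and structure may differ; the policy below is hypothetical), a CEL-based mutation that adds a label to new Pods might look roughly like this:

# Conceptual sketch only: alpha API, names and structure may change.
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: MutatingAdmissionPolicy
metadata:
  name: add-environment-label
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        operations: ["CREATE"]
  mutations:
    - patchType: ApplyConfiguration
      applyConfiguration:
        expression: >
          Object{
            metadata: Object.metadata{
              labels: {"environment": "production"}
            }
          }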

#3715 Elastic Indexed Jobs

Stage: Graduating to Stable
Feature group: sig-apps

Also graduating to Stable, this feature allows mutating spec.completions on Indexed Jobs, provided it matches spec.parallelism and the two are updated together. The success and failure semantics remain unchanged for jobs that do not alter spec.completions. For jobs that do, failures always count against the job’s backoffLimit, even if spec.completions is scaled down and the failed pods fall outside the new range. The status.Failed count will not decrease, but status.Succeeded will update to reflect successful indexes within the new range. If a previously successful index is out of range due to scaling down and then brought back into range by scaling up, the index will restart.
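
For example (the Job name and image are placeholders), an Indexed Job created as below can later be scaled by patching spec.parallelism and spec.completions together, say from 3 to 5, with the new indexes scheduled just like the original ones:

# Illustrative Indexed Job whose completions can now be mutated in step with parallelism.
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-workers
spec:
  completionMode: Indexed
  completions: 3   # may later be raised or lowered together with parallelism
  parallelism: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sh", "-c", "echo processing index $JOB_COMPLETION_INDEX"]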

The post Kubernetes 1.31 – What’s new? appeared first on Sysdig.

]]>
Transforming enterprise data from leaky sieve to Fort Knox https://sysdig.com/blog/transforming-enterprise-data-from-leaky-sieve-to-fort-knox/ Thu, 25 Jul 2024 14:15:00 +0000 https://sysdig.com/?p=91917 Enterprises today face significant challenges in managing, governing, and securing corporate data. Data moves and is shared more ubiquitously than...

The post Transforming enterprise data from leaky sieve to Fort Knox appeared first on Sysdig.

]]>
Enterprises today face significant challenges in managing, governing, and securing corporate data. Data moves and is shared more ubiquitously than we likely recognize. Whether it is fed into large language models (LLMs), shared with third-party vendors, or exposed on the dark web, there are blind spots that hinder security and IT teams’ visibility into where data resides and how and by whom it’s accessed. Without this crucial visibility, effectively managing data access becomes nearly impossible. Whether our data is loaded into an LLM or shared with a vendor, it has never been more exposed to risk.

Data governance practices, including classification, mapping, and access controls, are even more challenging with the technologies and applications that modern enterprises rely upon, including data lakes, APIs, and cloud storage. Adding to this operational complexity is the increasing regulation around privacy.

The bottom line is that companies are held to account for how they handle data.

For too many security leaders, data visibility starts with the data breach. Only then do they become aware of the data mismanagement within third-party applications that the breach uncovers. Breach notification is the wrong time to realize that the service agreements with your vendors or partners did not require reasonable security practices over your organization’s data.

When a data breach happens, you often don’t know where it started or the application, vendor, or source involved. Without knowing how and why the data found its way to the Dark Web, there is no way to determine the appropriate response. “Batten down the hatches” is a bad order if you don’t know which hatches need battening. However, tighter data access controls will make it easier to know who had access to the stolen information. Leverage best practice principles like least privilege, need-to-know, and separation of duties and consider using digital watermarks to track and trace the movement of your sensitive data.

Dark web exposure

CISOs are pressured by the business to regain access to corporate data and bring systems back online quickly following a ransomware attack. In many cases, this haste to restore business functionality results in incomplete eradication of the threat actor and investigation of the true root cause of the attack is often overlooked. Frequently, this results in recycled extortion attempts as network access or exfiltrated corporate data are sold and traded in nefarious circles on the Dark Web. Because investigative practices were incomplete, how this data was compromised is never fully understood. 

Clearly, CIOs and CISOs must be engaged earlier in the data governance lifecycle. Specifically, both roles should understand data classifications, data flows and interfaces, and appropriate controls from an entity perspective. Their insights will help mitigate risk to corporate data either through internal data misuse or data compromise by a threat actor.

Data leaks from the inside out

Inadvertent misuse by employees can be just as impactful as data exfiltration by a threat actor. Take, for example, large language models (LLMs). Employees leverage free and low-cost LLMs for research and analysis, feeding corporate data into their questions and queries. These tools themselves are not the issue; it’s how they’re used that causes problems. CIOs and CISOs can write as many memos as they like regarding safe data handling, but expediency trumps data governance and security far too often.

LLMs ingest and potentially share your corporate data with other platform users when providing answers. Not only this, but the companies behind the LLMs – which profit from gathering and selling data – will have access to this information as well. In essence, you may lose intellectual property rights over the content uploaded to these systems. For example, look at Section 6.3 of CoPilot’s Terms of Service:

“Customer grants to CoPilot a perpetual, worldwide, royalty-free, non-exclusive, irrevocable license to use reproduce, process, and display the Customer Data in an aggregated and anonymized format for CoPilot’s internal business purposes, including without limitation to develop and improve the Service, the System, and CoPilot’s other products and services.”

Third-party data mishandling

Then, there is the third-party data loss. Most corporations rely upon third-party services to collect, process, and store their data. Even when your third parties maintain strict security and data governance controls, there is always an exposure risk if your service provider is compromised. These incidents are not isolated and are now increasingly commonplace. Notable recent examples include the Lash Group, Change Healthcare, and American Express breaches. These breaches highlight how significant and impactful third-party incidents can be. 

As discussed in a previous blog, one way in which CISOs and CIOs can address this problem head on is by ensuring their vendors, suppliers, and partners have defensible security programs backed up by contract provisions that protect your company when security incidents occur. Your contracts should codify your security, privacy, and risk management requirements accordingly.

Unite and conquer

Data governance is a team sport, and IT and security teams cannot operate alone; they require collaboration with key business stakeholders across the organization. With different perspectives, these business stakeholders understand the context of third-party relationships, the nature and extent of the data employed by the company, and the potential impacts on the business if this data is compromised. It’s critical that any remnants of the historical rifts between DevOps and security that make effective data governance challenging be swept away. Visibility and risk mitigation in the cloud are underpinned by collaboration. Given the number of systems, technologies, services, and regulatory requirements that organizations confront, collaboration should not be viewed as a nice to have, but an operational imperative. 

CISOs and CIOs are uniquely positioned to drive this collaboration. One powerful option is to establish a data governance committee of key stakeholders from security, legal, compliance, investor relations, procurement, IT, risk management, and finance. Together, draft a committee charter that ensures stakeholders have a duty to report data governance risks. It should also outline roles and responsibilities throughout the data lifecycle of the organization, including who is authorized to make risk decisions related to specific, high-value data sets. In addition, use a risk register to capture identified risk factors and recommended risk mitigations. Companies that focus on data governance will likely be more resilient when confronting risks to company data.

Conclusion

Managing and securing data is a challenge and without visibility, managing data access is nearly impossible. Data governance practices are complicated by modern technologies and are further complicated by privacy regulations. Security incidents highlight visibility blind spots, revealing that our data is more widely distributed and shared than we often realize.

CISOs and CIOs must engage early in the data governance lifecycle to understand data classifications, mapping, and access controls, and bring this knowledge to stakeholders across the organization. Risk mitigation of data leaks comes from proper understanding, handling, and control throughout the data lifecycle of the organization, from employees to third parties.

The post Transforming enterprise data from leaky sieve to Fort Knox appeared first on Sysdig.

]]>
SANS Cloud-Native Application Protection Platforms (CNAPP) Buyers Guide https://sysdig.com/blog/sans-cnapp-buyers-guide/ Thu, 25 Jul 2024 11:00:00 +0000 https://sysdig.com/?p=69509 The SANS Cloud-Native Application Protection Platform (CNAPP) Buyers Guide gives companies a deep dive into what to look for in...

The post SANS Cloud-Native Application Protection Platforms (CNAPP) Buyers Guide appeared first on Sysdig.

]]>
The SANS Cloud-Native Application Protection Platform (CNAPP) Buyers Guide gives companies a deep dive into what to look for in a CNAPP solution. As organizations continue to shift towards integrated platform-based solutions for their cloud security needs, it becomes critical to evaluate whether a CNAPP solution meets all the requirements across use cases like posture management, permissions management, vulnerability management, and threat detection and response. Ideally, teams will be able to unify these capabilities within a single, comprehensive platform to manage risk and defend against attacks.

The SANS CNAPP Buyers Guide provides an in-depth look at what criteria to consider when purchasing a CNAPP solution, as well as a checklist of required and desired capabilities for the security platform. By utilizing this guide as a resource to navigate the buying process, you can ensure your security platform provides a unified cloud and container security experience with no blind spots. Download the full guide here.

Why Purchase a CNAPP?

The explosive growth of cloud and containers has created an expanded and dynamic attack surface that security teams need to defend. As more developers deploy containerized microservices and utilize cloud services and infrastructure, monitoring and protecting them becomes more complex. Security teams now have dynamic workloads with 10–100x more containerized compute instances, large volumes of cloud assets with dynamic activity to track, and messy and overly permissive identity and access management (IAM) permissions to manage. This rapid expansion of the attack surface in cloud-native applications has led to many vulnerabilities, misconfigurations, and security weaknesses to manage, and security teams need a tool that provides full visibility across cloud and containers.

As weaknesses in security posture have increased, security and operations teams have become overwhelmed by the number of alerts and vulnerabilities they face, leaving organizations with long exposure windows to critical vulnerabilities. As the adoption of cloud services and containers/Kubernetes increases the sources of data to analyze, you need a way to process all this data into insights that can be applied to remediating security issues. Without significant additional context on your cloud workloads and infrastructure, it is difficult for teams to prioritize which of these alerts actually present significant risks and which are just noise. An effective CNAPP will use knowledge of which containers and packages are actually running to provide actionable insights that security and DevOps teams can use to prioritize the most critical risks.

The move to the cloud has also led to an evolution in the threat landscape to take advantage of the security gaps in cloud-native applications. Bad actors have adapted their tactics and techniques to quickly compromise cloud environments with valid credentials, find and exploit vulnerabilities, and move laterally across workloads and clouds to extract maximum return from any breach. The changes to the threat landscape call for a complete solution that can detect these modern threats throughout your cloud-native infrastructure.

Traditional Tools Fall Short

Many traditional security tools are not suited to cloud workloads, environments, and the threats that have evolved to take advantage of their weaknesses. Tools like endpoint detection and response (EDR) solutions lack critical visibility into cloud services, workloads, and Kubernetes, and create blind spots that can easily be exploited. Traditional tools also often send many alerts and signals, but lack the context needed to rapidly and effectively respond to threats in cloud-based applications and workloads. The dynamic nature of software development and deployment, as well as the ephemeral nature of containerized environments, only add to the complexity, and security and DevOps teams need a security tool specifically designed to handle cloud-native environments.

Further, point solutions don’t work. Often organizations must choose from among multiple solutions, or even choose vendors that stitch together a workflow from multiple acquisitions. These tools don’t communicate with each other or share context, resulting in a reactive approach of dealing with disparate vulnerability findings, posture violations, and threats as they become a problem. This approach leaves teams without the insights they need to prioritize issues based on their impact.

What to Look for in a CNAPP Solution

Security and DevOps teams need comprehensive visibility into workloads, cloud activity, and user behavior in real time. The number of signals that teams have to make sense of is exploding, and a comprehensive CNAPP solution needs to help users focus on the most critical risks in their cloud-native infrastructure.

This is where having deep knowledge of what’s running right now can help you shrink the list of things that need attention first. Simply put, knowledge of what’s running (or simply what’s in use) is the context security and DevOps teams need to take action on the most critical risks first. Ultimately, this context can be fed back early in the development lifecycle to make “shift-left” better with actionable prioritization. With all the sources of data that a CNAPP has to ingest and analyze, an effective CNAPP solution needs runtime insights to help teams focus on the risks that really matter. For example, by filtering on vulnerabilities in packages active at runtime, you can reduce vulnerability noise by up to 95%.

With the SANS CNAPP Buyers Guide, you can make sure your organization is focused on the most critical risks in your cloud infrastructure. The guide includes a detailed checklist of important capabilities and features to look for in a CNAPP solution. While there are too many to list here in full, the capabilities of an effective CNAPP solution fall into these areas.

User Experience: Many solutions today are not intuitive and may be difficult to work with. Effective CNAPP solutions should offer unified security and risk dashboards, as well as aggregated security findings and remediation suggestions through simple interfaces. They should also be simple to deploy.

Cloud Workload Protection (CWP): A CNAPP solution should protect workloads across the software lifecycle, with capabilities in vulnerability management, configuration management for containers/Kubernetes, and runtime security/incident response. The ability to prioritize the most critical vulnerabilities or configurations based on in-use risk exposure is key. The tool should integrate with CI/CD tools, provide rich context to investigate alerts, and give suggestions to fix at the source.

Cloud Security Posture Management (CSPM): Continuous visibility, detection, and remediation of cloud security misconfigurations is key for a CNAPP solution. The solution should offer capabilities in cloud vulnerability management, configuration management, and permissions/entitlement management (e.g., CIEM).

Cloud Detection and Response (CDR): Detection and response capabilities related to cloud-centric threats are critical. Effective CNAPP solutions should expand beyond just workload runtime security and address the cloud control plane to detect suspicious activities across users and services.

Enterprise-grade Platform: Effective CNAPP solutions often have enhancements and additional features that integrate and align with API use, scripting and automation functionality, auditing and logging, and support for large-scale deployments.

Want to see the full list of capabilities? Download the full SANS Cloud-Native Application Protection Platform (CNAPP) Buyers Guide now for all the details.

The post SANS Cloud-Native Application Protection Platforms (CNAPP) Buyers Guide appeared first on Sysdig.

]]>
Introducing Layered Analysis for enhanced container security https://sysdig.com/blog/layered-analysis-for-enhanced-container-security/ Tue, 23 Jul 2024 14:00:00 +0000 https://sysdig.com/?p=91685 Containerized applications deliver exceptional speed and flexibility, but they also bring complex security challenges, particularly in managing and mitigating vulnerabilities...

The post Introducing Layered Analysis for enhanced container security appeared first on Sysdig.

]]>
Containerized applications deliver exceptional speed and flexibility, but they also bring complex security challenges, particularly in managing and mitigating vulnerabilities within container images. To tackle these issues, we are excited to introduce Layered Analysis — an important enhancement that provides precise and actionable security insights.

What’s new: Layered Analysis capabilities

Layered Analysis enhances our container security toolkit by offering a granular view of container images, breaking them down into their constituent layers. This capability enables more accurate identification of vulnerabilities and optimized remediation workflows by clearly showing whether each vulnerability was introduced by the base image or by the application layers.

Key benefits

  • Enhanced accuracy and reduced time to fix: Identify vulnerabilities at each container image layer, pinpointing the specific package and instruction responsible, thereby reducing fix time.
  • Facilitate attribution and ownership: Discern whether vulnerabilities belong to the base image or the application layers, aiding in proper team assignment and resolution.
  • Actionable insights: Receive practical, contextual recommendations to expedite and prioritize vulnerability resolution.

Detailed insights with Layered Analysis

Container images are constructed in layers, with each change or instruction during the build process creating a new layer. Layered Analysis helps detect and display vulnerabilities and packages associated with each image layer, identifying different remediation actions and ownership depending on the layer introducing the vulnerabilities.

Figure: Enhanced container security with Layered Analysis

For example, vulnerabilities in the base OS layer, such as an end-of-life (EOL) Alpine version, can be remediated by updating the base image version, a task typically performed by the security team. In contrast, vulnerabilities in the application or non-OS layers, such as outdated Go libraries like Gin or Echo, can be addressed by updating the versions of libraries and dependencies, tasks that fall to the development teams.

How to enable and use Layered Analysis

Layered Analysis is now generally available and requires the following components for full functionality:

  • Cluster and Registry Scanners: Automatically supported with platform scanning.
  • CLI Version 1.12.0 or Higher: Ensure you are using the latest CLI version.
  • CLI Enhancements: Utilize the new flags (--separate-by-layer and --separate-by-image) to modify output and view image hierarchy or layer information.
  • JSON Outputs: Updated to include new fields for detailed layer information.

Exploring the image hierarchy

Understanding the image hierarchy is key to Layered Analysis, as shown in the screenshot below.

This view shows the difference between base images and application layers, helping you quickly identify where vulnerabilities come from:

  • All layers: Shows the total number of vulnerabilities in the final image, including both application and OS layers. If a vulnerability is fixed in an intermediate layer, it won’t be included in the total count.
  • Base Images (prefixed with FROM): Display vulnerabilities present in the base image, including those inherited from parent images.
  • Application layers: Only show vulnerabilities introduced in the application layers, excluding those from base images.

Actionable recommendations

Layered Analysis doesn’t just identify vulnerabilities; it also provides recommendations to fix them. You’ll receive suggestions to upgrade base images, address the worst vulnerabilities in application layers, and fix problematic packages. 

These actionable insights help streamline the remediation process, ensuring that vulnerabilities are addressed efficiently and effectively.

Full visibility of image history

Layered Analysis also offers full visibility into the history of your container image. You can see packages that existed in previous layers but were removed in subsequent layers. 

While these packages no longer pose a security issue, having this historical view is invaluable for understanding the evolution of your image and ensuring comprehensive security management. 

This helps teams trace back through changes, making it easier to collaborate and maintain a secure container environment.

Investigate single layers

Another powerful feature of Layered Analysis is the ability to investigate single layers of your container image. You can see exactly what packages exist in each layer and identify any vulnerabilities introduced at that specific stage. 

This granular investigation capability allows teams to pinpoint the source of security issues and understand the impact of each layer’s changes. By isolating and analyzing single layers, you can more effectively manage and remediate vulnerabilities.

Leveraging Layered Analysis for better security

Layered Analysis empowers security and development teams by providing a clear and actionable view of container image vulnerabilities. By enhancing the precision of vulnerability identification and optimizing remediation workflows, teams can effectively reduce risks and improve overall security.

With Layered Analysis, teams can pinpoint exactly where a vulnerability was introduced, identifying the specific layer responsible. This capability is particularly useful in large organizations where multiple teams are involved in the containerized application lifecycle, from building images to deploying and monitoring their health: infrastructure engineers create and curate base images, developers package applications, and everyone works together to keep workloads as secure and vulnerability-free as possible and to apply security patches promptly. By tracing vulnerabilities back to their source, teams can determine responsibility and ensure accountability.

By clearly distinguishing between base image and application layer vulnerabilities, Layered Analysis enables more efficient routing of remediation tasks. Security teams can focus on updating base images to mitigate inherited vulnerabilities, while development teams handle issues within the application layers. This structured approach not only streamlines the remediation process but also enhances the overall security posture of containerized environments.

Want to learn more? Reach out to your Sysdig representative, or book a demo here!

The post Introducing Layered Analysis for enhanced container security appeared first on Sysdig.

]]>
Sysdig Threat Research Team – Black Hat 2024 https://sysdig.com/blog/sysdig-threat-research-team-black-hat-2024/ Mon, 22 Jul 2024 14:00:00 +0000 https://sysdig.com/?p=91686 The Sysdig Threat Research Team (TRT)  is on a mission to help secure innovation at cloud speeds. A group of...

The post Sysdig Threat Research Team – Black Hat 2024 appeared first on Sysdig.

]]>
The Sysdig Threat Research Team (TRT)  is on a mission to help secure innovation at cloud speeds.

A group of some of the industry’s most elite threat researchers, the Sysdig TRT discovers and educates on the latest cloud-native security threats, vulnerabilities, and attack patterns.

We are fiercely passionate about security and committed to the cause. Stay up to date here on the latest insights, trends to monitor, and crucial best practices for securing your cloud-native environments.

Below, we will detail the latest research and how we have improved the security ecosystem.

And if you want to chat with us further, look us up at the Sysdig booth at Black Hat 2024!

LLMJACKING

The Sysdig Threat Research Team (TRT) recently observed a new attack known as LLMjacking. This attack leverages stolen cloud credentials to target ten cloud-hosted large language model (LLM) services.

Once initial access was obtained, the attackers exfiltrated cloud credentials and gained access to the cloud environment, where they attempted to access local LLM models hosted by cloud providers: in this instance, a local Claude (v2/v3) LLM model from Anthropic was targeted. If undiscovered, this type of attack could result in over $46,000 of LLM consumption costs per day for the victim.

Sysdig researchers discovered evidence of a reverse proxy for LLMs being used to provide access to the compromised accounts, suggesting a financial motivation.  However, another possible motivation is to extract LLM training data. 

All major cloud providers, including Azure Machine Learning, GCP’s Vertex AI, and AWS Bedrock, now host large language model (LLM) services. These platforms provide developers with easy access to various popular models used in LLM-based AI. 

The attackers are looking to gain access to a large number of LLM models across different services. No legitimate LLM queries were actually run during the verification phase. Instead, just enough was done to figure out what the credentials were capable of and any quotas. Logging settings are also queried where possible; this is done to avoid detection when using the compromised credentials to run their prompts.

The ability to quickly detect and respond to those threats is crucial for maintaining strong defense systems. Essential tools like Falco, Sysdig Secure, and CloudWatch Alerts help monitor runtime activity and analyze cloud logs to identify suspicious behaviors. Comprehensive logging, including verbose logging, provides deep visibility into the cloud environment’s activities. This detailed information allows organizations to gain a nuanced understanding of critical actions, such as model invocations, within their cloud infrastructure.

SSH-SNAKE

SSH-Snake is a self-modifying worm that leverages SSH credentials discovered on a compromised system to start spreading itself throughout the network. The worm automatically searches through known credential locations and shell history files to determine its next move. SSH-Snake is actively being used by threat actors in offensive operations. 

Sysdig TRT uncovered the command and control (C2) server of threat actors deploying SSH-Snake. This server holds a repository of files containing the output of SSH-Snake for each of the targets they have gained access to. 

Filenames found on the C2 server contain IP addresses of victims, which allowed us to make a high-confidence assessment that these threat actors are actively exploiting known Confluence vulnerabilities in order to gain initial access and deploy SSH-Snake. This does not preclude other exploits from being used, but many of the victims are running Confluence.  

The output of SSH-Snake contains the credentials found, the targets’ IPs, and the victims’ bash history. The victim list is growing, which means that this is an ongoing operation. At the time of writing, the number of victims is approximately 300.

The Rebirth Botnet

In March 2024, the Sysdig Threat Research Team (TRT) began observing attacks against one of our Hadoop honeypot services from the domain “rebirthltd[.]com.” Upon investigation, we discovered that the domain pertains to a mature and increasingly popular DDoS-as-a-Service botnet: the Rebirth Botnet. The service is based on the Mirai malware family, and the operators advertise its services through Telegram and an online store (rebirthltd.mysellix[.]io).

The threat actors operating the botnet are financially motivated and advertise their service primarily to the video gaming community. Although the botnet is marketed for gaming-related purposes, there is no evidence that it isn’t being purchased for other uses, so organizations may still be at risk of being exploited and becoming part of the botnet. We’ve taken a detailed look at how this group operates from a business and technical point of view.

At the core of RebirthLtd’s business is its DDoS botnet, which is rented out to whoever is willing to pay. RebirthLtd offers its services through a variety of packages listed on a web-based storefront that has been registered since August 2022. The cheapest plan, for which a buyer can purchase a subscription and immediately receive access to the botnet’s services, is priced at $15. The basic plan seems to only include access to the botnet’s executables and limited functionality in terms of the available number of infected clients. More expensive plans include API access, C2 server availability, and improved features, such as the number of attacks per second that can be launched.

The botnet’s main services target video game streaming platforms for financial gain, as its Telegram channel claims that RebirthHub (another moniker for the botnet, along with RebirthLtd) is capable of “hitting almost all types of game servers.” The Rebirth admin team is quite active on YouTube and TikTok as well, where they showcase the botnet’s capabilities to potential customers. Through our investigation, we detected more than 100 undetected executables of this malware family.

SCARLETEEL

The attack graph we uncovered for this group is the following:

Compromise AWS accounts by exploiting vulnerable compute services, gain persistence, and attempt to make money using crypto miners. Had we not thwarted their attack, our conservative estimate is that their mining would have cost over $4,000 per day until stopped.

We know that they are not only after crypto mining, but stealing intellectual property as well. In their recent attack, the actor discovered and exploited a customer mistake in an AWS policy, which allowed them to escalate privileges to AdministratorAccess and gain control over the account, enabling them to do with it what they wanted. We also watched them target Kubernetes in order to scale their attack significantly.

AMBERSQUID

Keeping with the cloud threats, Sysdig TRT has uncovered a novel cloud-native cryptojacking operation which they’ve named AMBERSQUID. This operation leverages AWS services not commonly used by attackers, such as AWS Amplify, AWS Fargate, and Amazon SageMaker. The uncommon nature of these services means that they are often overlooked from a security perspective, and the AMBERSQUID operation can cost victims more than $10,000/day.

The AMBERSQUID operation was able to exploit cloud services without triggering the AWS requirement for approval of more resources, as would be the case if they only spammed EC2 instances. Targeting multiple services also poses additional challenges, like incident response, since it requires finding and killing all miners in each exploited service.

We discovered AMBERSQUID by analyzing over 1.7M Linux images to understand what malicious payloads are hiding in the container images on Docker Hub.

This dangerous container image didn’t raise any alarms during static scanning for known indicators or malicious binaries. It was only when the container was run that its cross-service cryptojacking activities became obvious. This is consistent with the findings of our 2023 Cloud Threat Report, in which we noted that 10% of malicious images are missed by static scanning alone.

MESON NETWORK

Sysdig TRT discovered a malicious campaign using the blockchain-based Meson service to reap rewards ahead of the crypto token unlock happening around March 15th 2024. Within minutes, the attacker attempted to create 6,000 Meson Network nodes using a compromised cloud account. The Meson Network is a decentralized content delivery network (CDN) that operates in Web3 by establishing a streamlined bandwidth marketplace through a blockchain protocol.

Within minutes, the attacker was able to spawn almost 6,000 instances inside the compromised account across multiple regions and execute the meson_cdn binary. This comes at a huge cost for the account owner. As a result of the attack, we estimate a cost of more than $2,000 per day for all the Meson network nodes created, even just using micro sizes. This isn’t counting the potential costs for public IP addresses which could run as much as $22,000 a month for 6,000 nodes! Estimating the reward tokens amount and value the attacker could earn is difficult since those Meson tokens haven’t had values set yet in the public market.

As in the case of AMBERSQUID, the image looks legitimate and safe from a static point of view, which involves analyzing its layers and vulnerabilities. However, during runtime execution, we monitored outbound network traffic and spotted gaganode being executed and performing connections to malicious IPs.

Besides actors and new threats: CVEs

Hunting for new malicious actors is not the Sysdig TRT’s only purpose; the team also reacts quickly to new vulnerabilities as they appear and updates the product with new rules to detect them at runtime. The two most recent examples are shown below.

CVE-2024-6387

On July 1st, Qualys’s security team announced CVE-2024-6387, a remotely exploitable vulnerability in the OpenSSH server. This critical vulnerability is nicknamed “regreSSHion” because the root cause is an accidental removal of code that fixed a much earlier vulnerability CVE-2006-5051 back in 2006. The race condition affects the default configuration of sshd (the daemon program for SSH).

OpenSSH versions older than 4.4p1 (unless patched for the earlier CVE-2006-5051 and CVE-2008-4109) and versions from 8.5p1 up to, but not including, 9.8p1 are impacted. The general guidance is to update to a fixed version. Ubuntu users can download the updated versions.

The exploitation of regreSSHion involves multiple attempts (thousands, in fact) executed in a fixed period of time. This complexity is what downgrades the CVE from a “Critical” to a “High” risk classification, based mostly on the exploit complexity.

Using Sysdig, we can detect drift from baseline sshd behaviors. In this case, stateful detections would track the number of failed attempts to authenticate with the sshd server. Falco rules alone detect the potential Indicators of Compromise (IoCs). By pulling this into a global state table, Sysdig can better detect the spike of actual, failed authentication attempts for anonymous users, rather than focus on point-in-time alerting.

CVE-2024-3094

On March 29th, 2024, the Openwall mailing list announced a backdoor in a popular package called XZ Utils. This utility includes a library called liblzma, which is used by SSHD, a critical part of the Internet infrastructure used for remote access. When the backdoored library is loaded, CVE-2024-3094 affects SSHD authentication, potentially allowing intruders access regardless of the authentication method.

  • Affected versions: 5.6.0, 5.6.1
  • Affected Distributions: Fedora 41, Fedora Rawhide

For Sysdig Secure users, this rule is called “Backdoored library loaded into SSHD (CVE-2024-3094)” and can be found in the Sysdig Runtime Threat Detection policy.

- rule: Backdoored library loaded into SSHD (CVE-2024-3094)
  desc: A version of the liblzma library was seen loading which was backdoored by a malicious user in order to bypass SSHD authentication.
  condition: open_read and proc.name=sshd and (fd.name endswith "liblzma.so.5.6.0" or fd.name endswith "liblzma.so.5.6.1")
  output: SSHD Loaded a vulnerable library (| file=%fd.name | proc.pname=%proc.pname gparent=%proc.aname[2] ggparent=%proc.aname[3] gggparent=%proc.aname[4] image=%container.image.repository | proc.cmdline=%proc.cmdline | container.name=%container.name | proc.cwd=%proc.cwd proc.pcmdline=%proc.pcmdline user.name=%user.name user.loginuid=%user.loginuid user.uid=%user.uid user.loginname=%user.loginname image=%container.image.repository | container.id=%container.id | container_name=%container.name|  proc.cwd=%proc.cwd )
  priority: WARNING
  tags: [host, container]

Sysdig Secure Solution

Sysdig Secure enables security and engineering teams to identify and eliminate vulnerabilities, threats, and misconfigurations in real-time. Leveraging runtime insights gives organizations an intuitive way to both visualize and analyze threat data. 

Sysdig Secure is powered by Falco’s unified detection engine. This cutting‑edge engine leverages real‑time behavioral insights and threat intelligence to continuously monitor the multi‑layered infrastructure, identifying potential security incidents. 

Whether it’s anomalous container activities, unauthorized access attempts, supply chain vulnerabilities, identity‑based threats, or simply meeting your compliance requirements, Sysdig ensures that organizations have a unified and proactive defense against these rapidly evolving threats.

MEET SYSDIG TRT AT BLACK HAT 2024

Sysdig Threat Research Team (TRT) members will be onsite at booth #1750 at Black Hat 2024, August 7 – 8 in Las Vegas, to share insights from their findings and analysis of some of the hottest and most important cybersecurity topics this year.

Reserve a time to connect with the Sysdig TRT team at the show!

The post Sysdig Threat Research Team – Black Hat 2024 appeared first on Sysdig.

]]>