

Databricks Cost Optimization Best Practices

What is Databricks?

Databricks is a fully managed cloud-based unified analytics platform built on Apache Spark. It provides a collaborative environment for data engineers, data scientists, and analysts to process big data, conduct data science, and implement machine learning workflows.

Databricks simplifies the use of Spark by offering a managed environment where users can spin up clusters, collaborate using interactive notebooks, and access various built-in libraries for analytics, machine learning, and data engineering tasks.

 This platform not only streamlines the development and deployment of big data applications but also promotes an environment of teamwork and innovation, enabling organizations to extract actionable insights from their data more efficiently.

Related reading: What is Databricks?

Understanding Databricks Pricing:

Databricks pricing comprises two main components:

Instance Cost: This is the cost of the underlying compute instances on which Databricks clusters run. These costs depend on the instance types and the duration for which the instances are running.

Databricks Unit (DBU) Cost: A Databricks Unit (DBU) is a unit of processing capability per hour, billed on a per-second usage basis. The cost depends on the type of cluster and its configuration. Each operation performed on Databricks consumes a certain number of DBUs.
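To make the two components concrete, here is a small illustrative cost model. The rates used are made-up assumptions for the example; actual DBU rates vary by cluster type, cloud, and plan.

```python
# Rough Databricks cost model: total cost = instance cost + DBU cost.
# All rates below are illustrative assumptions, not published prices.

def databricks_hourly_cost(num_nodes, instance_rate_per_hour,
                           dbu_per_node_hour, dbu_rate):
    """Estimate the hourly cost of a cluster.

    instance_rate_per_hour: cloud provider charge per node (assumed).
    dbu_per_node_hour: DBUs each node consumes per hour (assumed).
    dbu_rate: Databricks charge per DBU (assumed).
    """
    instance_cost = num_nodes * instance_rate_per_hour
    dbu_cost = num_nodes * dbu_per_node_hour * dbu_rate
    return instance_cost + dbu_cost

# Example: 4 nodes at $0.50/h each, consuming 1.5 DBU/node-hour at $0.30/DBU.
cost = databricks_hourly_cost(4, 0.50, 1.5, 0.30)
```

Separating the two components this way makes it clear that savings can come from either side: cheaper instance types, or workloads tuned to consume fewer DBUs.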

Monitor and Analyze Performance Metrics:

It is essential to set up custom configurations to gather the metrics.

Enable Custom Metrics: To monitor performance metrics like CPU and memory usage, you need to enable custom metrics on your EC2 instances. This involves using initialization (INIT) scripts to send these metrics to AWS CloudWatch. Custom metrics provide deeper insights into cluster performance and help in making informed decisions.

Create INIT Scripts: Use INIT scripts to create custom namespaces in CloudWatch/Log Analytics for each cluster. This allows us to track performance metrics like CPU and memory usage for individual clusters. For instance, you can create an INIT script to capture metrics and send them to CloudWatch. This step ensures that all necessary performance data is collected systematically.

Attach INIT Scripts to Clusters: Attach the INIT scripts to the Databricks clusters. This ensures that the necessary performance metrics are collected and sent to CloudWatch/Log Analytics whenever the cluster is active. Regular monitoring of these metrics helps in identifying inefficiencies and optimizing resource usage.
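As a sketch of what such an INIT-script helper might do, the function below builds a `put_metric_data` payload for a per-cluster CloudWatch namespace. The namespace and metric names are illustrative assumptions; actually publishing the metric requires boto3 and valid AWS credentials.

```python
# Sketch of an INIT-script helper that reports a cluster metric to CloudWatch
# under a custom per-cluster namespace. Names here are illustrative.

def build_metric_payload(cluster_name, metric_name, value, unit="Percent"):
    """Build a CloudWatch put_metric_data payload for one cluster metric."""
    return {
        "Namespace": f"Databricks/{cluster_name}",   # custom per-cluster namespace
        "MetricData": [{
            "MetricName": metric_name,
            "Value": value,
            "Unit": unit,
        }],
    }

payload = build_metric_payload("etl-cluster", "MemoryUtilization", 72.5)

# To actually publish from inside the INIT script (requires AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_data(**payload)
```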

Challenges in Databricks Cost Optimization:

Lack of Direct Performance Metrics: Historically, Databricks did not expose direct performance metrics. They had to be gathered from the underlying compute instances, and memory metrics required custom configurations to be reported to AWS CloudWatch/Log Analytics, adding another layer of complexity. This lack of direct visibility made it challenging to optimize and manage costs effectively. As of August, Databricks has made these metrics publicly available.

Limited Visibility into Resource Usage: Understanding which workloads or departments are driving up the costs can be challenging, especially in multi-tenant environments. This can make it difficult to allocate costs accurately and find optimization opportunities.

Databricks Cost Optimization Best Practices:

Enable Cluster Termination Option: During cluster configuration, enable the automatic termination option. Specify the period of inactivity after which the cluster should be terminated. Once this period is exceeded without any activity, the cluster will move to a terminated state, thus saving costs associated with running idle clusters.
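A minimal cluster spec with auto-termination enabled might look like the following. The field names follow the Databricks Clusters API; the runtime version and node type strings are examples only.

```python
# Minimal cluster spec with automatic termination enabled.
# Runtime version and node type are example values.

cluster_config = {
    "cluster_name": "nightly-etl",
    "spark_version": "13.3.x-scala2.12",      # example runtime version
    "node_type_id": "Standard_D4ads_v5",      # example Azure node type
    "num_workers": 2,
    "autotermination_minutes": 30,            # terminate after 30 idle minutes
}
```

With `autotermination_minutes` set, an idle cluster stops accruing both instance and DBU charges after the configured window.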

Optimize Cluster Configurations: Choosing the right configuration for the Databricks clusters is essential for cost efficiency. Consider the following:

Select Appropriate Node Types: Match the node types to your workload requirements to avoid over-provisioning resources. By selecting the most suitable instance types, you can ensure that your clusters are cost-effective and performant.

Monitor DBU Consumption: Understanding DBU consumption patterns and optimizing workloads can lead to significant cost savings.

Why CloudCADI for Databricks?

CloudCADI helps you optimize:

1.     Autoscaling inefficiency

Though autoscaling in Databricks brings enormous benefits, it can easily inflate your cloud bills without adding any value. CloudCADI gives multiple actionable recommendations on node-resizing possibilities that can save on your Databricks costs.

Example: Instead of 5 nodes of type Standard_D4ads_v5 at $0.21/h each, you can switch to 2 nodes of type Standard_D8as_v5 and realize 20% savings.
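The arithmetic behind this example can be checked directly, assuming a Standard_D8as_v5 rate of about $0.42/h (an assumed figure; actual prices vary by region):

```python
# Verifying the resizing example: 5 x D4ads_v5 vs 2 x D8as_v5.
# The D8as_v5 rate (~$0.42/h) is an assumption for illustration.

current = 5 * 0.21          # 5 x Standard_D4ads_v5 at $0.21/h
proposed = 2 * 0.42         # 2 x Standard_D8as_v5 at assumed $0.42/h
savings_pct = (current - proposed) / current * 100
```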

2.     Cluster-node resizing inefficiency

CloudCADI analyzes the number of anomalies (inefficient CPU and Memory utilization) with its intelligent engine and gives recommendations on resizing.

Example: “Reduce the worker count from 8 to 5 for optimal usage” 

Conclusion:

Optimizing costs on Databricks involves a combination of strategic configuration, attentive monitoring, and best practices tailored to your specific workloads. By implementing cluster termination policies, monitoring performance metrics, and optimizing cluster configurations, you can ensure that your Databricks environment is both cost-effective and efficient.

Want to explore CloudCADI? Call us today: Book a Demo

Nandhini Kumar - Senior Software Engineer

Author

Nandhini Kumar is our Software Engineer L2 who was part of the Databricks implementation team for CloudCADI.


Azure SQL Database Cost Optimization Tips

Azure SQL Database cost optimization is a much sought-after request we get from our customers. Azure SQL Database is a powerful, fully managed database-as-a-service (DBaaS) that eliminates the need for businesses to handle maintenance, security, and reliability. Despite its versatility, effectively managing costs within Azure SQL Database can be complex, especially when dealing with different purchasing models and configuration options. This is where CloudCADI steps in, providing the insights and automation necessary to optimize your Azure SQL Database investments.

Azure SQL Database Overview

Azure SQL Database offers flexibility with two main purchasing models:

  1. vCore-Based Purchasing Model:
    • Customization: Users can select the number of vCores, memory, and storage speed, providing more control over database performance.
    • Scalability: Ideal for applications with specific resource needs, allowing for precise performance tuning.
  2. DTU-Based Purchasing Model:
    • Simplicity: Users choose the number of DTUs, which represent a mix of CPU, memory, reads, and writes.
    • Predictability: This model offers fixed compute sizes, storage, and backup retention, simplifying budgeting and cost forecasting.

How CloudCADI Enhances Azure SQL Database Management

CloudCADI is a cloud cost optimization platform that helps businesses maximize their cloud investments. By offering granular insights and actionable recommendations on Azure SQL Database, CloudCADI ensures that your database configurations are both efficient and cost-effective.

1. Intelligent Benchmarking for Optimal Configuration Selection

When migrating to Azure SQL Database, choosing the correct compute tier is crucial for balancing performance and cost. CloudCADI analyzes your existing workloads and recommends the most suitable configuration—whether it’s the vCore or DTU model. This helps prevent over-provisioning and under-utilization of resources, ensuring that your databases run efficiently.

2. Dynamic Scaling with Serverless Compute Tier

For databases with unpredictable traffic patterns, Azure SQL Database’s serverless compute tier automatically adjusts CPU resources based on demand. CloudCADI continuously monitors these traffic patterns and provides recommendations on when to leverage the serverless option. This ensures you only pay for the resources you use, leading to significant cost savings.

3. Optimizing Single Database Configurations

If your database has regular traffic and predictable computational needs, selecting the appropriate configuration is key. CloudCADI helps determine when to use the Provisioned vCore or DTU models for such databases. Furthermore, if one of your databases experiences dramatically lower usage than the others, CloudCADI can advise resizing the configuration to optimize both performance and cost.

4. Efficient Resource Management with SQL Elastic Pools

SQL Elastic Pools allow multiple databases to share resources, making it easier to manage fluctuations in usage. CloudCADI’s advanced analytics help you allocate resources across your databases more effectively, ensuring that you’re not overpaying for underutilized capacity. If a particular database is consuming a disproportionate amount of resources within an Elastic Pool, CloudCADI will recommend resizing the configuration, thus balancing cost and performance across your environment.

Cost Optimization Tips with CloudCADI

Effective cost management is crucial for any cloud-based database solution. CloudCADI provides several strategies to help you optimize your Azure SQL Database costs:

  • Right-Sizing Resources: CloudCADI analyzes your application’s needs and recommends the most appropriate database configuration. For example, it may suggest moving a high-usage database from an Elastic Pool to a Single Database model to reduce overall costs.
  • Performance Monitoring and Adjustments: CloudCADI continuously monitors key performance metrics such as CPU usage, DTU percentages, and deadlocks. Based on these insights, it provides actionable recommendations to scale down resources if they’re underutilized or to optimize queries for better performance.
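As an illustration of the kind of rule such a recommendation engine might apply, the sketch below flags a database for scale-down when sustained utilization stays low. The threshold values and the rule itself are illustrative assumptions, not CloudCADI's actual logic.

```python
# Hedged sketch of a rule-based scale-down recommendation:
# flag a database when nearly all CPU samples sit below a threshold.
# Thresholds are illustrative assumptions.

def recommend_scale_down(cpu_samples, threshold=40.0, min_fraction=0.9):
    """Recommend scale-down if >= min_fraction of samples are below threshold."""
    if not cpu_samples:
        return False
    low = sum(1 for s in cpu_samples if s < threshold)
    return low / len(cpu_samples) >= min_fraction

recommend_scale_down([12, 18, 25, 9, 31, 22])   # sustained low CPU -> True
recommend_scale_down([80, 85, 90, 70])          # busy database -> False
```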

Conclusion

CloudCADI is an indispensable tool for businesses using Azure SQL Database, helping them achieve optimal performance while controlling costs. Through intelligent benchmarking, dynamic scaling, and continuous performance monitoring, CloudCADI ensures that your Azure SQL Database configurations are tailored to your specific needs, making your cloud investment as efficient and cost-effective as possible. Whether managing a single database or a complex multi-tenant environment, CloudCADI provides the tools and insights necessary to maximize the value of your cloud infrastructure.

Karthick Perumal - Product Eng. Head

Author

Karthick Perumal leads our Product Engineering team with over 16 years of extensive IT experience. He is a certified cloud engineer. His calm and composed approach consistently drives the team forward, ensuring focus and motivation even in challenging circumstances.


Cloud cost optimization – Steps to make note of – Part 1

Cloud cost optimization is a practice every organization should adopt to ensure they SPEND RIGHT on the cloud. We have discussed the benefits of cloud cost optimization in the past. In this article, let’s look at the common challenges in implementing it and how to overcome them.

Common Cloud Cost Optimization Challenges

While cloud cost optimization offers remarkable benefits, there are a number of challenges organizations might face when trying to achieve optimal savings. To mention a few:

  1. Lack of visibility into cloud costs: Without proper monitoring and analysis, it can be challenging to identify areas where cost optimization is required.
  2. Lack of expertise: Many organizations do not have dedicated cloud FinOps professionals or resources for managing cloud expenses.
  3. Lack of continuous monitoring: Without consistent monitoring, organizations might miss opportunities for savings at the right time.

Now let’s see how to overcome these, one by one.

Step 1: Arm them

Every cloud stakeholder should be armed with documents, tutorials, training, guidance, and tools to effectively handle the cloud environment. FinOps products should be able to provide graphical representations of and reports on cloud usage. Reports should let stakeholders dive deep into granular pod-level, node-level, business-unit-level, and tag-level usage, associated cost details, and more.

For example, our product CloudCADI offers reports and trend charts covering parameters like

  • CPU utilization
  • Memory
  • Disk Read
  • Disk Write
  • Storage Disk Read
  • Storage Disk Write
  • Network Received
  • Network Sent
  • Storage 

These reports should equip the cloud practitioners with the necessary cost information for effective decisions.

Step 2: Herd them

One of the major challenges enterprises face is cross-functional transparency. There may be two app development teams building two different cloud-native applications, neither knowing that they use different monitoring tools that serve the same purpose. Procurement teams go with a vendor based on the options provided by the cloud teams and on negotiations with the vendor; they have little or no visibility into how the tools are actually used by the various teams.

It is crucial to identify these common requirements and consolidate the resources accordingly. 

Step 3: Pivot on center

Cloud management is a tricky process. It involves the operations team, finance team, cloud engineers, cloud architects, the procurement team, LoB managers, C-suite executives, and more, each conveying a different message, and requirements vary from time to time. Organizations should have a centralized cloud cost optimization/FinOps team to reconcile these differences. Any cloud financial decision, such as buying new licenses, renewing them, or going hybrid cloud, should pass through the FinOps team’s scan before reaching the CXO’s office.

After a thorough review of real needs and expectations, costs should be mapped to business value. Once acknowledged, the proposal should reach the decision maker’s table for approval.

Related Reading: FinOps principles

Step 4: Analyze your cloud

Optimizing starts with analyzing. Review your organization’s cloud usage and spending patterns. This helps identify the areas that need restructuring or elimination and develop a targeted cost optimization strategy. You can do this either with a dedicated FinOps team or with an effective cloud FinOps solution like CloudCADI.

Step 5: Retire the unused

There are resources that quietly inflate cloud bills. Cloud practitioners set up auto-scaling to ensure enough capacity to meet traffic demands and improve cost management. Consider Azure GPU machines: for high-end remote visualization, ML, and deep learning, the GPU-category N-series virtual machines are ideal.

They offer low-latency, high-throughput network interfaces for graphics- or video-intensive workloads. When engineers miscalculate the right number of nodes and configure in excess, the organization ends up paying for these zombie nodes.

For example,

An Azure NC12 instance with 1x K80 GPU and 12 vCPUs costs $1.8 per hour. Consider 10 such instances (120 vCPUs) configured, with 5 left unused. At the end of the month, you pay Azure $13,140 instead of $6,570, with no accountable benefit.
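The arithmetic above, spelled out (using a 730-hour billing month):

```python
# Zombie-node cost: 10 NC12 instances at $1.8/h, only 5 actually needed,
# over a 730-hour month.

rate, hours = 1.8, 730
paid = 10 * rate * hours       # what you are billed
needed = 5 * rate * hours      # what the workload actually needs
wasted = paid - needed         # spend with no accountable benefit
```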

It is hard to identify these nodes unless you comb through the line items of lengthy cloud bills. For larger organizations handling several applications, identification and mitigation go beyond manual effort. The options left are to manually plan and closely watch the configuration process, identify the unclaimed assets, and retire them (which is not always feasible), or to go with cloud cost optimization products.

Step 6: Leverage services from your CSPs

Cloud Service Providers (CSPs) offer various cost-saving options to help clients reduce their cloud spend. Savings plans, discounts on bulk scaling, and reserved instances are a few options to make use of to realize significant cost savings.

Organizations tend to lose millions when they miss out on optimizing new workloads along with existing ones. Select a FinOps solution that runs along with your vision, bringing everything under one umbrella every day.

Keep optimizing. CloudCADI is with you!

Find this useful? Read Part 2 of Cloud Cost Optimization Steps here.


Cloud Tagging – Strategic Practices

Enterprises using public cloud assets need well-thought-out tagging of those assets and an inventory model to achieve the highest levels of visibility and utilization, along with robust mechanisms to ensure minimal wastage. This is vital for all organizations at scale.

“The wider the cloud adoption, the more complex is the cloud cost management”

According to NASSCOM, enterprises are expected to increase their cloud budgets by nearly 5-15% CAGR until FY 2025. Selection, allocation, tracking, and monitoring of cloud resources, which seem simple and manageable manually during the initial cloud adoption stages, turn into a headache when cloud assets multiply into the hundreds.

The cloud infrastructure management team can use tagging to bucket resources under labels defining their function in the cloud. Any cloud practitioner can then easily call, filter, and organize resources using their tags.

Even the leading cloud service providers, Microsoft, AWS, and GCP, emphasize tagging as a best practice to effectively sort resources and support FinOps. We covered the benefits of FinOps in our previous article. In this article, let’s look at:

  • What is Cloud Tagging?
  • Benefits of Cloud Tagging
  • Cloud Tagging best practices
  • CloudCADI & Cloud Tagging

What is Cloud Tagging?

Cloud tagging is the practice of assigning custom names to cloud resources. Its nomenclature varies from organization to organization based on their teams’ preferences and ease.

A cloud tag comprises two parts: a key and a value. The key conveys the category (e.g., Environment, Owner) and the value conveys the meta description (e.g., Testing, Priya).

Example: Environment: Testing

Key – Environment

Value- Testing

Keys and values can be case-sensitive or case-insensitive depending on the service provider. For example, in AWS both keys and values are case-sensitive, whereas in Azure keys are case-insensitive and values are case-sensitive.
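A small illustration of how this difference plays out when comparing tag keys across providers; the comparison helper is a hypothetical sketch:

```python
# Comparing tag keys the way AWS (case-sensitive keys) and
# Azure (case-insensitive keys) treat them.

def keys_match(key_a, key_b, provider):
    if provider == "aws":
        return key_a == key_b                   # keys are case-sensitive
    if provider == "azure":
        return key_a.lower() == key_b.lower()   # keys are case-insensitive
    raise ValueError(f"unknown provider: {provider}")

keys_match("Environment", "environment", "aws")    # False
keys_match("Environment", "environment", "azure")  # True
```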

What are the benefits of Cloud Tagging?

Structured resource allocation:

Enterprises opt for multi-cloud environments for improved efficiency. Cloud resources procured from different vendors that contribute to a common project can be tagged under one label. This way, no cloud resources are left unused.

Streamlined governance: 

Cloud tagging introduces organized cloud resource management in the organization. Cloud infrastructure management personnel tend to lose track of assets while handling multiple workloads deployed across multiple projects. Cloud tagging helps LoB managers, CIOs, CTOs, and CFOs easily associate resources with their business value, usage frequency, time of operation, and cost.

Team-specific reports generation:

It is crucial to identify which team uses which resource in order to optimize cloud utilization. Cloud tagging helps drill down cloud usage by business unit. Reports help the engineering team assess and alter workloads, and further help the finance team understand which team consumes more of the cloud budget and frame solutions with the other teams to curb the bills.

Aid automation: 

Automation in the cloud relieves the cloud team of repetitive tasks. In an agile environment, businesses should be able to scale storage bandwidth and CPU up or down, or change configurations quickly, as demand rises.

Tags help automate actions such as notifying cloud engineers about resources that have been idle for a given period, automating storage for a specific environment (e.g., testing), and automatically decommissioning or provisioning bulk resources.

Resource access management:

Security is still a debated factor in the cloud. Tag-based access control can mitigate data breaches and ensure the confidentiality of sensitive data transacted over the cloud. For example, we can allow the development team to access only resources carrying the tag environment: dev (a user-defined tag).

Traceback the roots:

According to the Economic Times, CTOs and CIOs report that finding the source of cloud spending is a daunting task in cloud infrastructure management. Tagging helps find the specific team or resource that is critical and runs for long durations, eating up most of the infrastructure budget. Once identified, restructuring or rebuilding the workload brings down unintended cloud expenditures.

Cloud Tagging Checklist

Even though cloud tagging policies vary from business to business, it’s important to make them standard and globalized across the enterprise so that they make sense to every team handling the cloud. Defining the rules for cloud tagging at deployment time itself mitigates anomalies in the future.

It’s confidential! – Never include sensitive data in tags.

Why do you need it? – Defining your tagging needs brings better resolution in meta-describing the resources. Stakeholders responsible for handling, maintaining, tracking, and improving the tags should collectively define the use cases and name tags accordingly.

Make it speak for itself – Ensure that the naming convention carries all required information, such as business unit, region, unique resource identifier, criticality, etc., enough to help the business team, engineering team, and CXOs locate, track, and cost-optimize the workloads.

Consistency is the key – Once the tag schema is developed, it is important to stick to it. The person handling the tags should be cautious while defining, improving, and updating them in the rules.

Example: Env: nztesting01 is different from Env: testingnz01

Minimum suggested tags – Cloud giants like Azure, GCP, and AWS recommend the following as the minimum tags to include for an effective tagging process:

  • Workload name
  • Data classification
  • Operations commitment
  • Operations team
  • Cost center
  • Cluster
  • Version
  • App id
  • Disaster recovery
  • Service class
  • Start date of a project
  • Owner name
  • Business unit

Tag naming limits – Keep an eye on character limits before naming your resources. Key and value character limits vary based on the CSP you choose. For example, the key character limit is 1-63 with UTF-8 encoding for GCP, whereas it is 1-128 for AWS resources. In GCP, keys and values may contain only lowercase letters, numeric characters, underscores, and dashes. Tag keys and values are case-sensitive in AWS.
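Based on the GCP limits described above, a key validator can be sketched as follows. This is a simplification; check your provider's documentation for the full rules (for instance, GCP additionally requires keys to start with a lowercase letter).

```python
import re

# Validator based on the GCP limits described above: 1-63 characters,
# drawn from lowercase letters, digits, underscores, and dashes.
GCP_KEY_RE = re.compile(r"^[a-z0-9_-]{1,63}$")

def is_valid_gcp_key(key):
    """Return True if the tag key satisfies the simplified GCP rules."""
    return bool(GCP_KEY_RE.match(key))

is_valid_gcp_key("cost-center")   # True
is_valid_gcp_key("CostCenter")    # False: uppercase letters not allowed
```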

Related reading: AWS cloud tagging best practices

CloudCADI & Cloud Tagging

CloudCADI is a one-stop solution for all your FinOps shortfalls. It surfaces the hidden cloud resources that inflate your cloud bills without any productive output. This financial visibility makes employees across the organization feel accountable.

CloudCADI makes use of cloud tags to enhance your cloud experience in one go.

Our Recommendations:

1. Cloud practitioners name resources before deploying them into services, apps, or any other function. We suggest our clients include at least five fields in their naming convention for easy identification and filtering.

Example,

Cloud tagging naming convention – example image

2. Adding multiple tags to one resource enables multiple filters to fetch the exact resource utilization and cost data.

Examples,

Owner: Priya (person responsible for the resource)

Platform id: AZR

Region: AU (Australia)

Zone: E (East)
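Filtering an inventory by several of these tags at once can be sketched as follows. The resource records here are made-up illustrations, not real inventory data.

```python
# Filtering a resource inventory by multiple tag key/value pairs.

def filter_by_tags(resources, **wanted):
    """Return resources whose tags contain every requested key/value pair."""
    return [r for r in resources
            if all(r.get("tags", {}).get(k) == v for k, v in wanted.items())]

inventory = [
    {"name": "vm-01", "tags": {"Owner": "Priya", "Region": "AU", "Zone": "E"}},
    {"name": "vm-02", "tags": {"Owner": "Priya", "Region": "US", "Zone": "W"}},
]
matches = filter_by_tags(inventory, Owner="Priya", Region="AU")
```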

Our flagship product gives you comprehensive reports on unused, under-utilized, and over-provisioned resources using tags.

Cloud CADI – Tag filter screenshot

Tag-based reports are detailed and specific. CloudCADI gives a simple visual representation for a quicker overview and allows you to filter further and pinpoint any resource contributing to cloud waste.

We can implement custom automation scripts to automate scale-up, shutdown, or any repetitive cloud task, saving manual labor.

We don’t stop there. CloudCADI gives “intelligent recommendations” with which you can immediately realize benefits. Our actionable insights present the best alternatives along with a cost-savings report so you can decide immediately.

Start leveraging it now. SPEND RIGHT on the cloud.
