News Feed

Building FinOps With k0rdent Open Source – Cloud Native Now




Managing cloud finances across multi-cluster Kubernetes environments is a challenge that platform teams face daily. Although monitoring tools exist, most teams still struggle to connect infrastructure data to actionable cost-saving decisions. What’s missing is a practical way to quickly build and run specialized FinOps solutions that fit your organizational context without reinventing the wheel each time. 
This article discusses how open-source k0rdent and k0rdent Observability and FinOps (KOF) enable teams to deploy custom FinOps capabilities such as predictive cost forecasting and actionable savings recommendations at scale. 
Think of k0rdent and KOF as your standardized foundation for building custom business solutions. 
Together, they eliminate the “reinvent the wheel” problem for platform engineers. 
The power of k0rdent + KOF lies in its composable architecture. Think of it as the standardized LEGO baseplate, and your custom solutions as specialized bricks that snap perfectly into place.  
The FinOps Agent is one such brick that snaps into the infrastructure and provides foresight and optimization. The beauty of this model lies in its standardization. 
This approach empowers users to continuously enhance their multi-cluster Kubernetes platforms by developing custom components tailored to unique business challenges.  
The FinOps Agent leverages the Time Series Optimized Transformer for Observability (TOTO) model to deliver enterprise-grade forecasting capabilities. Its real business value comes from transforming standard Kubernetes metrics into actionable financial insights across the entire infrastructure. 
It seamlessly plugs into the existing monitoring stack and delivers: 
It follows a simple workflow. 
The FinOps Agent follows a modular, microservices-inspired architecture that separates concerns while maintaining simplicity. 

Accurate forecasting and, by extension, actionable cost-saving recommendations depend entirely on the quality and granularity of the data. 
The FinOps Agent follows a structured collection pattern to ensure coverage at the right aggregation levels (cluster-wide, node-specific) and with configurable temporal resolutions (from 1 minute to 1 hour). This makes forecasts more accurate and recommendations more trustworthy. 
This block defines the PromQL contract that the FinOps Agent expects from KOF. At the cluster scope, we build the target cost series and normalization baselines (CPU %, memory %, node count, total memory) used by the forecaster. At the node scope, we collect per-node cost and utilization to detect idle capacity and quantify removal savings. 
 
# Cluster-level queries
CLUSTER_QUERIES = {
    “cost_usd_per_cluster”“sum(node_total_hourly_cost) by (cluster)”,
    “cpu_pct_per_cluster”“100 * (1 – avg by (cluster)  (rate(node_cpu_seconds_total{mode=”idle”}[5m])))”,
    “mem_pct_per_cluster”“100 * (1 – avg by (cluster) (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))”,
    “node_count_per_cluster”“count by (cluster) (kube_node_status_condition{condition=”Ready”,status=”true”})”,
    “mem_total_gb_per_cluster”“sum by (cluster) (node_memory_MemTotal_bytes) / 1024 / 1024 / 1024”,
}
# Node-level queries
NODE_QUERIES = {
    “cost_usd_per_node”“sum by (cluster, node) (node_total_hourly_cost)”,
    “cpu_pct_per_node”“100 * (1 – avg by (cluster, instance) (rate(node_cpu_seconds_total{mode=”idle”}[5m])))”,
    “mem_pct_per_node”“100 * (1 – avg by (cluster, instance) (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))”,
    “cpu_total_cores_per_node”“sum by (cluster, instance) (machine_cpu_cores)”,
} 
Collecting this much data reliably at scale is non-trivial; long-range Prometheus queries can time out or exhaust memory. The agent solves this by automatically chunking large queries into smaller, manageable pieces, ensuring data-handling at scale without failure. 
def _execute_chunked_query(self, promql: str, start: datetime, end: datetime):
    “””Execute query in chunks for large time range.”””
    chunk_size = timedelta(days=self.chunk_threshold_days)
    all_results = []
    current_start = start
    
    while current_start < end:
        current_end = min(current_start + chunk_size, end)
        chunk_result = self._execute_single_query(promql, current_start, current_end)
        if chunk_result:
            all_results.extend(chunk_result)
        current_start = current_end 
The agent’s intelligence comes from TOTO, a pre-trained transformer model that represents a paradigm shift in time-series forecasting. Unlike traditional models that require extensive training on domain-specific data, TOTO delivers highly accurate zero-shot predictions out of the box. This democratizes AI, allowing teams to get advanced forecasting without needing a data science team. 
TOTO generates probabilistic forecasts, providing not just a single prediction but a range of likely outcomes with confidence intervals (e.g., p10, p50, p90 quantiles). 
    {
      “metric”: {
        “__name__”: “cost_usd_per_cluster_forecast”,
        “clusterName”: “aws-ramesses-regional-0”,
        “node”: “cluster-aggregate”,
        “quantile”: “0.10”,
        “horizon”: “14d”
      },
      “values”: [10.511.29.8],
      “timestamps”: [164099520000016409988000001641002400000]
    }
    {
      “metric”: {
        “__name__”: “cost_usd_per_cluster_forecast”,
        “clusterName”: “aws-ramesses-regional-0”,
        “node”: “cluster-aggregate”,
        “quantile”: “0.50”,
        “horizon”: “14d”
      },
      “values”: [13.513.213.8],
      “timestamps”: [164099520000016409988000001641002400000]
    }
    {
      “metric”: {
        “__name__”: “cost_usd_per_cluster_forecast”,
        “clusterName”: “aws-ramesses-regional-0”,
        “node”: “cluster-aggregate”,
        “quantile”: “0.90”,
        “horizon”: “14d”
      },
      “values”: [17.516.215.8],
      “timestamps”: [164099520000016409988000001641002400000]
    } 
While forecasting provides visibility into future costs, the true value of a FinOps Agent lies in its intelligent recommendation engine which transforms predictions into actionable cost-saving opportunities. It makes data-driven decisions that can save thousands of dollars across your multi-cluster infrastructure. 
The core of the recommendation engine is the IdleCapacityOptimizer. 
class IdleCapacityOptimizer:
    def __init__(self, config: Dict[str, Any] | None = None):
        cfg = config or {}
        self.idle_cpu_th = float(cfg.get(“idle_cpu_threshold”0.30))
        self.idle_mem_th = float(cfg.get(“idle_mem_threshold”0.30))
        self.min_node_savings = int(cfg.get(“min_node_savings”1))
    def optimise(self, forecast_dict: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Analyze each cluster for optimization opportunities
        # Return actionable recommendations with dollar-amount savings 
When the agent spots idle capacity, it outputs a clear, actionable recommendation. Here’s what you’d see. 
{
  “status”: “success”,
  “recommendations”: [
    {
      “cluster”: “aws-ramesses-regional-0”,
      “type”: “idle_capacity”,
      “nodes_to_remove”: “aws-ramesses-regional-0-md-8snzm-9shzt”,
      “forecast_horizon_days”: 7,
      “estimated_savings_usd”: 7.82,
      “message”: “Over the 7-day forecast, removing node ‘aws-ramesses-regional-0-md-8snzm-9shzt’ could save approximately $7.82.”
    }
  ],
  “generated_at”: “2025-01-18T10:21:16.642907”
} 
This can be parsed and visualized in Grafana, sent to finance or integrated into automation platforms. But what does it mean? 
In the above JSON, the agent found that one node stays mostly idle. Shutting it down would save ~$8 over the next seven days. For a small team, this may sound minor, but in reality: 

 
FinOps Agent includes built-in validation that runs every few forecast cycles. 
It validates forecasts regularly: 
Results are exposed via/stats so operators and finance teams can see the quality alongside the forecasts. 
def validate_forecasts(self, raw_prometheus_data):
    # Split historical data into train/test sets
    train_ratio = 0.7  # 70% training, 30% testing
    
    # Generate forecasts on training data
    # Compare with actual values in test set
    
    # Calculate accuracy metrics:
    # – MAPE (Mean Absolute Percentage Error)
    # – MAE (Mean Absolute Error)  
    # – RMSE (Root Mean Square Error) 
Here’s an example of validation stats from a production cluster: 
 
An output JSON generated by FinOps Agent: 
{
  “status”: “success”,
  “timestamp”: “2025-07-18T12:41:48.686914”,
  “validation_results”: {
    “aws-ramesses-regional-0”: {
      “cost_usd_per_cluster_cluster-aggregate”: {
        “mae”: 0.0006218099733814597,
        “mape”: 0.27760169468820095,
        “rmse”: 0.0006950463284738362
      },
      “cpu_pct_per_cluster_cluster-aggregate”: {
        “mae”: 3.592783212661743,
        “mape”: 4.132195591926575,
        “rmse”: 3.7026379108428955
      },
      …..
      “cpu_total_cores_per_node_aws-ramesses-regional-0-md-8snzm-stgb6”: {
        “mae”: 0.01127923745661974,
        “mape”: 0.563961872830987,
        “rmse”: 0.011976901441812515
      }
    }
  },
  “summary”: {
    “cluster_count”: 1,
    “clusters_with_errors”: 0,
    “metrics_validated”: 21,
    “average_mape”: 2.9
  },
  “validation_config”: {
    “train_ratio”: 0.7,
    “metrics”: [
      “mape”,
      “mae”,
      “rmse”
    ],
    “format”: “toto”
  }
} 
The FinOps Agent exposes forecasts through a RESTful HTTP API, making integration with existing tooling straightforward. 
# Get cluster-specific forecasts
GET /metrics/{cluster_name}
# List all available clusters
GET /clusters
# Retrieve optimization recommendations
GET /optimize
# Access validation statistics
GET /stats 
 
For visualization, the agent includes pre-configured Helm charts that deploy custom Grafana dashboards with Infinity Datasource integration. 

The recommended way is to install a FinOps Agent using the MultiClusterService resource in k0rdent. 
Follow the instructions in the README section to get started with the FinOps Agent: https://github.com/Mirantis/finops-agent?tab=readme-ov-file#k0rdent–kof-recommended 
Deploying the FinOps Agent across your clusters is simple thanks to k0rdent’s MultiClusterService resource. Here’s an example YAML: 
kind: MultiClusterService
metadata:
  name: forecasting-agent
spec:
  clusterSelector:
    matchLabels:
      group: production
  serviceSpec:
    services:
    – template: forecasting-agent-0-1-0
      name: forecasting-agent
      namespace: finops 
The FinOps Agent’s integration with KOF provides seamless observability insights, including pre-built Grafana dashboards. 
# Create custom GrafanaDashboard objects
apiVersion: grafana.grafana.com/v1beta1
kind: GrafanaDashboard
metadata:
  name: finops-forecasting
  namespace: monitoring
spec:
  json: |
    {
      “title”: “FinOps Forecasting Dashboard”,
      “panels”: […]
    } 
The FinOps Agent demonstrates how quick and effective it is to address real business challenges when building on the right foundation. With k0rdent and KOF’s standardized platform, you can: 
Ready to get started? Deploy the FinOps Agent as your launchpad, then start reporting metrics, take action and realize measurable cost savings. 
Satyam is a Software Engineer with Mirantis, where he actively advocates for open-source adoption and contribution. He specializes in vendor-neutral, composable infrastructure that spans Kubernetes, cloud-native observability, and edge computing. Satyam regularly contributes to projects like Kubernetes-ClusterAPI and k0s, and enjoys turning real-world lessons into practical advice through his talks and writing.
Satyam Bhardwaj has 1 posts and counting. See all posts by Satyam Bhardwaj
RSS Error: A feed could not be found at `https://devops.com/webinars/feed/`; the status code is `403` and content-type is `text/html; charset=UTF-8`

Listen to all of our podcasts



source
This article was autogenerated from a news feed from CDO TIMES selected high quality news and research sources. There was no editorial review conducted beyond that by CDO TIMES staff. Need help with any of the topics in our articles? Schedule your free CDO TIMES Tech Navigator call today to stay ahead of the curve and gain insider advantages to propel your business!

Leave a Reply