Kubernetes
Insecure default configurations enabling privilege escalation
Deploying containers with insecure settings (root user, 'latest' image tags, disabled security contexts, overly broad RBAC roles) persists because Kubernetes doesn't enforce strict security defaults. This exposes clusters to container escape, privilege escalation, and unauthorized production changes.
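As an illustrative sketch (the image registry, tag, and names are hypothetical), a Pod spec can opt into the hardening Kubernetes does not apply on its own:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app              # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.4.2   # pinned tag, never 'latest'
    securityContext:
      runAsNonRoot: true                    # refuse to start as UID 0
      allowPrivilegeEscalation: false       # block setuid-style escalation
      readOnlyRootFilesystem: true          # immutable container filesystem
      capabilities:
        drop: ["ALL"]                       # shed all Linux capabilities
```

Pairing such specs with an admission policy (e.g. Pod Security Admission at the 'restricted' level) turns these opt-ins into enforced defaults.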
Change management and system modification governance
79% of production incidents originate from recent system changes. Organizations struggle with change management across multi-cluster, multi-environment estates. The complexity of change governance and its impact on stability is a persistent operational challenge.
Network policies not enforced by default
Kubernetes clusters lack default network policies, allowing unrestricted Pod-to-Pod communication. Pods without explicit NetworkPolicy objects have no networking restrictions, significantly increasing attack surface and enabling compromised containers to direct malicious traffic to sensitive workloads.
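A minimal default-deny policy for one namespace looks like the sketch below (the namespace name is illustrative). Note that the object alone does nothing: it takes effect only if the cluster's CNI plugin (e.g. Calico or Cilium) enforces NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production       # illustrative namespace
spec:
  podSelector: {}             # empty selector = every Pod in the namespace
  policyTypes:
  - Ingress                   # no inbound traffic unless another policy allows it
  - Egress                    # no outbound traffic unless another policy allows it
```

With this in place, allowed flows are then whitelisted with additional, narrower NetworkPolicy objects.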
Edge deployment challenges with low-power hardware and intermittent connectivity
Edge computing with Kubernetes faces unique constraints: single-node clusters on low-power hardware, intermittent connectivity that makes remote management difficult, security concerns from hardware tampering, and deployment complexity across hundreds or thousands of sites without local expertise.
Running outdated, unsupported Kubernetes versions
31% of organizations still run unsupported Kubernetes versions, missing vital security and performance patches. Each skipped release compounds technical debt and increases the risk of API breakage when they eventually upgrade.
Complex surrounding infrastructure requiring deep expertise
The real challenge in Kubernetes deployment goes beyond cluster setup to configuring RBAC, secrets management, and infrastructure as code. Teams without prior experience make decisions that require painful redesigns later; some organizations report dedicating 50% of their year to cluster maintenance.
Multi-cluster visibility and context gaps
Production Kubernetes deployments span multiple clusters across clouds, regions, and environments without centralized visibility. When incidents occur, teams lack context on what broke and where, leading to slower incident detection, configuration drift, and higher outage risk.
Enforcing consistent security posture across hybrid multi-cloud
Maintaining a consistent security posture, audit trails, and supply-chain guarantees across cloud and on-premises environments is extremely difficult when multiple vendor distributions and custom images fragment security enforcement.
Persistent storage and stateful application limitations
Docker's native volume management lacks comprehensive enterprise-grade stateful operations. Data integrity guarantees, backups, encryption at rest, and cross-host replication cannot be reliably accomplished using only Docker volume commands. Organizations must adopt complex external orchestration systems like Kubernetes to meet production stateful workload requirements.
Configuration drift from identical dev and prod manifests
Using the same Kubernetes manifests across development, staging, and production without environment-specific customization leads to instability, poor performance, and security gaps. Environment factors like traffic patterns, scaling needs, and access control differ significantly.
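One common mitigation is an overlay tool such as Kustomize, where a shared base is patched per environment instead of being copied. A sketch, assuming a base/ directory holding the shared manifests and a Deployment named app (both names hypothetical):

```yaml
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base                 # shared manifests, identical for every environment
patches:
- patch: |-                  # production-only override, kept out of the base
    - op: replace
      path: /spec/replicas
      value: 6
  target:
    kind: Deployment
    name: app
```

Each environment gets its own small overlay directory, so deliberate differences are explicit and reviewable rather than drifting copies.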
ConfigMap and Secret management scattered across environments
Configuration management starts simple but becomes unmaintainable with dozens of scattered ConfigMaps, duplicated values, no source of truth, and no automated rotation. Manual updates across multiple environments cause inconsistencies, forgotten updates, and lack of audit trails.
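A minimal consolidation pattern is one ConfigMap per environment consumed via envFrom, so values live in a single object rather than being duplicated into each container spec. Names and values below are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: staging
data:
  LOG_LEVEL: "debug"
  FEATURE_FLAGS: "beta-ui"
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: staging
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.4.2
    envFrom:
    - configMapRef:
        name: app-config    # every key becomes an env var; one source of truth
```

Keeping such files in Git then gives the audit trail and review process that ad-hoc `kubectl edit` changes lack.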
Premature adoption of advanced networking solutions
Teams implement service meshes, custom CNI plugins, or multi-cluster communication before mastering Kubernetes' native networking primitives (Pod-to-Pod communication, ClusterIP Services, DNS, ingress). The extra abstractions add failure points and make troubleshooting extremely difficult.
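The native primitives usually suffice. For example, a plain ClusterIP Service (the default type) already provides stable in-cluster load balancing and a DNS name without any mesh; names here are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: default
spec:
  type: ClusterIP        # the default; reachable from inside the cluster only
  selector:
    app: backend         # routes to Pods carrying this label
  ports:
  - port: 80             # Service port clients connect to
    targetPort: 8080     # container port the traffic is forwarded to
```

Other Pods can then reach it via cluster DNS as backend.default.svc.cluster.local, with kube-proxy spreading connections across the matching Pods.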
Persistent volume provisioning failures with cryptic errors
PersistentVolumes fail to provision correctly, leaving stateful applications stuck in a Pending state. Error messages are cryptic and debugging is difficult, blocking deployments.
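Two frequent causes of a Pending claim are a missing or mistyped StorageClass and immediate binding on topology-constrained storage. A sketch (the class name is illustrative, and the provisioner assumes the AWS EBS CSI driver is installed):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                 # illustrative name
provisioner: ebs.csi.aws.com     # assumption: EBS CSI driver present
volumeBindingMode: WaitForFirstConsumer   # bind where the Pod lands, avoiding zone mismatches
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd     # a missing/mistyped class is a common cause of Pending
  resources:
    requests:
      storage: 20Gi
```

`kubectl describe pvc data` then surfaces the provisioning events behind the Pending state, which are far more actionable than the bare status.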
Image bloat and unused dependencies increasing attack surface
In-use vulnerabilities dropped below 6% in 2025, but image bloat has quintupled. Heavier, less-optimized container images increase attack surfaces despite fewer known CVEs, creating a security paradox.
No built-in monitoring and logging observability
Standard Kubernetes lacks native observability features for monitoring cluster utilization, application errors, and performance data. Teams must deploy additional observability stacks like Prometheus to gain visibility into spiking memory, Pod evictions, and container crashes.
Operational toil and fragmented incident response workflows
Manual deployments, inconsistent workflows, and fragmented observability across tools increase on-call load and MTTR. Engineers jump between tools during incidents instead of fixing issues, driving burnout and slower delivery due to constant firefighting.
Application security and third-party integration challenges
33% of respondents cite securing applications and integrating third-party tracing systems as pain points. Security has emerged as the #1 concern for Data on Kubernetes (DoK) workloads, driven by the complexity of securing distributed data workloads and regulatory compliance.
Storage I/O performance bottlenecks in AI/ML workloads
Storage I/O performance is the primary performance concern (24%), followed by model/data loading times (23%). For AI/ML workloads, storage costs have become the dominant concern (50% cite as primary), reflecting enormous data requirements of training datasets and model checkpoints.
Skills shortage in Kubernetes and SRE expertise
Managing Kubernetes add-ons, cluster operations, and platform engineering requires cross-disciplinary talent (SRE, security, developers) that is in short supply. Teams struggle to staff and retain experienced Kubernetes operators and SREs, delaying critical work.
Insufficient liveness and readiness probe configuration
Deploying containers without explicit health checks causes Kubernetes to assume containers are functioning even when unresponsive, initializing, or stuck. The platform considers any non-exited process as 'running' without additional signals.
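Explicit probes give the platform those signals. A sketch, assuming a hypothetical /healthz HTTP endpoint served on port 8080:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probed-app                # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.4.2
    readinessProbe:               # gates Service traffic until the app can serve
      httpGet:
        path: /healthz            # assumed endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:                # restarts the container if it wedges
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 15
      failureThreshold: 3         # three consecutive failures before restart
```

Separating the two matters: readiness failures merely remove the Pod from load balancing, while liveness failures trigger restarts, so an overly aggressive liveness probe can cause restart loops during slow startups.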
Developer productivity blocked by manual cluster provisioning
Developers lack Kubernetes expertise and want to consume infrastructure without delays, but provisioning new clusters is time-consuming and expensive. This creates bottlenecks where developers wait for ops to provision infrastructure rather than focusing on feature development.
Performance optimization across diverse workload types
Performance optimization has emerged as the #1 operational challenge (46%), displacing earlier basic adoption concerns. Organizations struggle to optimize performance across databases, AI/ML, and traditional containerized workloads simultaneously.
Massive cluster resource overprovisioning and wasted spending
99.94% of Kubernetes clusters are over-provisioned, with CPU utilization at ~10% and memory at ~23%, meaning nearly three-quarters of allocated cloud spend sits idle. More than 65% of workloads run under half their requested resources, and 82% are overprovisioned.
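Rightsizing starts with per-container requests and limits derived from observed usage rather than guesses. An illustrative spec (the values are placeholders to be tuned, e.g. from metrics or a VerticalPodAutoscaler recommendation):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rightsized-app            # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.4.2
    resources:
      requests:                   # what the scheduler reserves; drives bin-packing
        cpu: 250m
        memory: 256Mi
      limits:                     # hard ceiling; memory overage gets the container OOM-killed
        memory: 512Mi
```

Requests, not limits, determine how much capacity a workload ties up on a Node, so inflated requests are exactly where the idle spend above accumulates.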
Pod misconfiguration and affinity rule errors
Misconfigured Kubernetes affinity rules cause Pods to schedule on incorrect Nodes or fail to schedule at all. Affinity configurations support complex behavior but are easy to misconfigure with contradictory rules or impossible selectors.
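A typical hard nodeAffinity rule is sketched below; if no Node carries a matching label, the Pod simply stays Pending, so the key and values (illustrative here) must match real Node labels:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload              # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard requirement
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.kubernetes.io/instance-type       # well-known Node label
            operator: In
            values: ["g5.xlarge"]                       # assumed instance type
  containers:
  - name: app
    image: registry.example.com/app:1.4.2
```

Using preferredDuringSchedulingIgnoredDuringExecution instead turns the rule into a soft preference, which avoids the unschedulable-Pod failure mode at the cost of weaker placement guarantees.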
Accumulation of orphaned and unused Kubernetes resources
Unused or outdated resources like Deployments, Services, ConfigMaps, and PersistentVolumeClaims accumulate over time since Kubernetes doesn't automatically remove resources. This consumes cluster resources, increases costs, and creates operational confusion.
Multiple ingress controller management and networking complexity
60% of respondents employ multiple ingress controllers, adding operational complexity and potential inconsistency in application networking configuration and management across clusters.
Kubernetes has worsened cost management, security posture, and architectural refactoring for some teams
More than 25% of developers report Kubernetes has made cost management worse, 13% cite a worsened security posture, and 15% report hindered architectural refactoring. Kubernetes provides scalability and high-availability benefits but creates new problems in these critical domains.