10 PCA questions with full explanations for every option, free to view on this page.
Observability Concepts
Q1. Prometheus uses which model for collecting metrics from targets?
A. Pull over HTTP from `/metrics` endpoints
Correct. Prometheus scrapes metrics from HTTP endpoints on targets on a configured interval; targets are passive during collection.
B. Push over HTTP where targets send metrics to a central Prometheus ingestion service
Incorrect. Push is the model used by StatsD and Graphite; Prometheus supports push only as an exception, via the Pushgateway.
C. Push over UDP via StatsD with client-side aggregation before forwarding samples
Incorrect. UDP/StatsD is a different system; Prometheus uses HTTP-based pulls.
D. Streaming over gRPC with bidirectional flow control and TLS mutual authentication
Incorrect. Prometheus does not use streaming gRPC for scraping.
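The pull model in option A can be sketched as a minimal scrape configuration. This is a hedged illustration: the job name and target address are hypothetical placeholders, not defaults.

```yaml
# Minimal sketch of Prometheus's pull model: the server scrapes each
# target's /metrics endpoint on a fixed interval. Job name and target
# address below are hypothetical.
scrape_configs:
  - job_name: "my-service"
    scrape_interval: 15s        # how often Prometheus pulls from targets
    metrics_path: /metrics      # default HTTP path exposed by targets
    static_configs:
      - targets: ["app.internal:8080"]
```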
Q2. Which statement best captures the fundamental trade-off between metrics and events (logs) as observability signals?
A. Metrics are aggregated numeric samples with bounded cost per series; events preserve per-occurrence detail at higher storage cost
Correct. Metrics aggregate numerically and scale with cardinality rather than request volume, while events preserve fidelity per occurrence at higher per-item storage cost.
B. Metrics preserve full per-request context while events are always sampled to reduce cost
Incorrect. Metrics discard per-occurrence detail by design; they are aggregations, not per-request records.
C. Events are strictly cheaper than metrics at every scale because they compress better
Incorrect. Event storage typically grows with request volume, whereas metric storage grows with cardinality; neither is strictly cheaper at all scales.
D. Metrics and events are functionally interchangeable when Prometheus is used as the backend provided that scrape jitter stays well below the configured evaluation interval window
Incorrect. Prometheus is a metrics database; it does not store arbitrary events, and the two signal types have distinct trade-offs.
Q3. The Four Golden Signals from the Google SRE book are which four metrics?
A. CPU, memory, disk, and network
Incorrect. These are system resources, not the SRE Golden Signals.
B. Availability, latency, throughput, and cost
Incorrect. Although related to reliability, these are not the canonical Golden Signals.
C. Rate, errors, duration, and utilization
Incorrect. This mixes RED and USE; it is not the Golden Signals definition.
D. Latency, traffic, errors, and saturation
Correct. The Four Golden Signals are latency, traffic, errors, and saturation: the baseline signals to monitor for any user-facing service.
Q4. A service's internal metrics all look healthy, yet real users report that the public endpoint is unreachable because DNS at the edge is failing. Which monitoring approach is specifically positioned to catch this class of outage?
A. White-box monitoring via the application's `/metrics` endpoint, because it has the most detailed view of the instrumented process internals
Incorrect. White-box metrics cannot observe DNS, TLS handshakes, or edge reachability outside the process; they see only internal behavior.
B. Black-box probing from outside the network, measuring the user's full path including DNS, TLS, and reachability
Correct. Black-box probing from an external vantage point exercises the full user-facing path, so it catches DNS, routing, or TLS issues that internal instrumentation cannot see.
C. Log-based alerting on application error logs alone
Incorrect. If requests never reach the service, the application generates no logs about the failure.
D. Distributed tracing of internal calls, because traces reveal all external dependencies
Incorrect. Traces originate inside the process; they do not observe DNS resolution or edge-path failures before the request arrives.
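Black-box probing of the full user path is commonly set up with the blackbox_exporter. The sketch below assumes a standard `http_2xx` probe module; the exporter address and probed URL are placeholders.

```yaml
# Sketch of black-box probing via blackbox_exporter: Prometheus asks the
# exporter to probe a public URL, exercising DNS, TLS, and routing from
# outside the service. Addresses below are placeholders.
scrape_configs:
  - job_name: "blackbox-http"
    metrics_path: /probe
    params:
      module: [http_2xx]                     # probe module defined in the exporter config
    static_configs:
      - targets: ["https://example.com"]     # the user-facing URL to probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target         # pass the URL as ?target=
      - source_labels: [__param_target]
        target_label: instance               # keep the URL as the instance label
      - target_label: __address__
        replacement: "blackbox-exporter:9115"  # actually scrape the exporter
```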
Q5. What is an error budget in SRE practice?
A. The monetary cost allocated for fixing bugs each quarter and each planning cycle
Incorrect. Error budgets are reliability accounting, not financial budgets.
B. The number of on-call escalations allowed per engineer per week according to team policy
Incorrect. That describes on-call burden metrics, not error budgets.
C. The maximum number of alerts that may fire in production without formal review board approval
Incorrect. Alert volume is a symptom of noise, not an error budget concept.
D. Permitted unreliability derived from `1 - SLO`
Correct. The error budget equals `1 - SLO` applied to the measured window; it quantifies how much unreliability is acceptable before halting risky changes.
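The `1 - SLO` arithmetic is easy to verify directly. A minimal sketch, with illustrative numbers (a 99.9% SLO over a 30-day window) rather than figures from any real service:

```python
# Error budget arithmetic: permitted unreliability is 1 - SLO.
# The SLO value and window below are illustrative.

def error_budget_minutes(slo: float, window_minutes: int) -> float:
    """Minutes of allowed unreliability within the window."""
    return (1.0 - slo) * window_minutes

# 99.9% SLO over a 30-day window (30 * 24 * 60 = 43200 minutes):
budget = error_budget_minutes(0.999, 30 * 24 * 60)
print(round(budget, 1))  # → 43.2
```

In other words, a 99.9% monthly SLO leaves roughly 43 minutes of budget; once it is spent, change velocity is typically throttled.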
Q6. You are instrumenting a request-serving microservice and a Kubernetes node-level daemon that manages local disks. Which methodology pairing is most appropriate?
A. RED for the microservice (Rate, Errors, Duration) and USE for the daemon (Utilization, Saturation, Errors)
Correct. RED is designed for request-driven services; USE is designed for finite resources like CPU, disk, and memory, which maps to the node daemon's domain.
B. USE for the microservice and RED for the daemon
Incorrect. USE applies to resource-oriented components; a request-driven microservice is better served by RED.
C. RED for both, since both expose HTTP endpoints
Incorrect. Exposing an HTTP endpoint is unrelated to whether RED or USE is appropriate; the signal domain drives the choice.
D. Neither is appropriate; Four Golden Signals must always be used regardless of component type in typical single-replica deployments where the WAL has already been replayed
Incorrect. The Four Golden Signals overlap with RED but are not a substitute; RED and USE are complementary methodologies for different component classes.
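The pairing in option A translates into concrete queries. A hedged sketch in PromQL, assuming conventional metric names (`http_requests_total`, `http_request_duration_seconds_bucket`, node_exporter's `node_cpu_seconds_total` and `node_load1`) that a real service may spell differently:

```promql
# RED for the request-serving microservice (metric names assumed):
sum(rate(http_requests_total[5m]))                                    # Rate
sum(rate(http_requests_total{code=~"5.."}[5m]))                       # Errors
histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))       # Duration (p99)

# USE for the node daemon's host (node_exporter-style metrics):
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) # Utilization
node_load1
  / on (instance)
    count by (instance) (node_cpu_seconds_total{mode="idle"})         # Saturation proxy (load per core)
```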
Q7. How is an SLO (Service Level Objective) defined?
A. A target value or range for an SLI over a specified window
Correct. An SLO sets an internal target for an SLI over a defined window, e.g. 99.9% of requests under 300 ms in 30 days.
B. A hard contractual commitment backed by financial penalties
Incorrect. That describes an SLA (Service Level Agreement), which is externally negotiated; SLOs are internal targets.
C. A runbook describing how to respond to an incident
Incorrect. Runbooks are operational response documents, not reliability objectives.
D. The maximum acceptable downtime in any 24-hour period
Incorrect. SLOs are usually expressed as probabilistic targets over longer windows, not single-day downtime caps.
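The SLI behind such an SLO is often precomputed with a recording rule. A sketch only: the rule group name and the `http_requests_total` counter with a `code` label are assumptions, and the 30d range is illustrative (production setups usually layer shorter-window rules for efficiency).

```yaml
groups:
  - name: slo-sketch                # hypothetical rule group
    rules:
      # SLI: fraction of non-5xx requests over the 30-day SLO window.
      - record: job:request_availability:ratio_rate30d
        expr: |
          sum(rate(http_requests_total{code!~"5.."}[30d]))
          /
          sum(rate(http_requests_total[30d]))
```

The SLO is then the target set against this recorded series, e.g. alerting when it drops below 0.999.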
Q8. A frontend team adds a `user_id` label to their `http_requests_total` counter to make per-user debugging easier. The service has roughly 2 million monthly active users. What is the most likely consequence in Prometheus?
A. A cardinality explosion that bloats the TSDB head block, inflates memory, and degrades queries as series count grows
Correct. Every unique label-value combination creates a new time series; unbounded values like user IDs drive cardinality explosions that exhaust memory and slow queries.
B. No impact, because Prometheus automatically drops high-cardinality labels during ingestion for user-scoped identifiers
Incorrect. Prometheus does not automatically drop labels based on cardinality; operators must configure `metric_relabel_configs` to do so.
C. Slightly slower scrape times but otherwise fine, because each label value is stored as a compact integer reference
Incorrect. Each unique label tuple is a full series with its own chunks, samples, and index entries; there is no implicit interning that makes this cheap.
D. Improved query performance, because Prometheus can now index requests per user for faster lookups
Incorrect. Prometheus is not optimized for per-entity lookups; high-cardinality labels degrade rather than improve query performance.
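The `metric_relabel_configs` mitigation can be sketched as follows (job name and target are placeholders). Note the caveat: dropping `user_id` at scrape time merges the previously distinct series and can produce duplicate-sample conflicts within a scrape, so the durable fix is removing the label from the instrumentation itself.

```yaml
scrape_configs:
  - job_name: "frontend"              # hypothetical job name
    static_configs:
      - targets: ["frontend:8080"]    # placeholder target
    metric_relabel_configs:
      - action: labeldrop
        regex: user_id                # drop the high-cardinality label before ingestion
```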
Q9. The USE method (Brendan Gregg) is used to analyze which type of entity, and which three metrics does it prescribe?
A. Applications — usage, size, and event counts per minute
Incorrect. USE targets resources, not applications, and those are not the method's metrics.
B. Users — usage, sessions, and exits over each interval
Incorrect. USE is not a user-behavior framework.
C. Services — uptime, success rate, and error counts per hour
Incorrect. Those resemble parts of the Four Golden Signals, not USE.
D. Resources — utilization, saturation, and errors
Correct. USE is applied to each resource (CPU, memory, disk, network) and asks for utilization, saturation, and errors, complementing request-level methods like RED.
Q10. A team using the OpenTelemetry Collector wants samples from OTel-instrumented services to end up in Prometheus TSDB without running a Prometheus-specific SDK. Which path is currently supported natively by Prometheus 2.47+?
A. Prometheus exposes an OTLP endpoint (`/api/v1/otlp/v1/metrics`) accepting metrics from the Collector's OTLP HTTP exporter
Correct. Prometheus 2.47+ supports native OTLP ingestion on an HTTP endpoint, allowing the OTel Collector's OTLP exporter to push metrics directly.
B. Only the OpenTelemetry SDK can push to Prometheus directly; the Collector is not involved in this ingestion path at all as long as the TSDB head block has been flushed and compacted into persistent chunks
Incorrect. The OTel Collector is the recommended bridge and supports several export paths into Prometheus, including OTLP ingestion and the Prometheus exporter.
C. The Collector must first write to Pushgateway, which Prometheus then scrapes
Incorrect. Pushgateway is intended for short-lived batch jobs, not a required intermediary for OTel Collector flows.
D. There is no supported path; OTel metrics and Prometheus TSDB are architecturally incompatible
Incorrect. OTel and Prometheus interoperate; OTLP ingestion is the current native path.
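The path in option A can be sketched as a Collector pipeline. Assumptions: the Prometheus address is a placeholder, and Prometheus 2.47+ is started with `--enable-feature=otlp-write-receiver` to expose the OTLP endpoint.

```yaml
# OTel Collector sketch: receive OTLP from instrumented services and
# push the metrics to Prometheus's native OTLP endpoint. Prometheus must
# run with --enable-feature=otlp-write-receiver (2.47+).
receivers:
  otlp:
    protocols:
      http: {}
exporters:
  otlphttp/prometheus:
    # The otlphttp exporter appends /v1/metrics to this base path.
    endpoint: "http://prometheus:9090/api/v1/otlp"
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp/prometheus]
```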
These questions are written against the current PCA curriculum — not scraped exam dumps. The full PCA library here has 120 questions; the broader platform covers the rest of the Golden Kubestronaut path.