10 PCA questions with full explanations for every option, free to view on this page.
Observability Concepts
Q1. Prometheus uses which model for collecting metrics from targets?
A. Pull over HTTP from `/metrics` endpoints
Correct. Prometheus scrapes metrics from HTTP endpoints on targets on a configured interval; targets are passive during collection.
B. Push over HTTP where targets send metrics to a central Prometheus ingestion service
Incorrect. Push is the model used by StatsD and Graphite; Prometheus supports push only as an exception, via the Pushgateway.
C. Push over UDP via StatsD with client-side aggregation before forwarding samples
Incorrect. UDP/StatsD is a different system; Prometheus uses HTTP-based pulls.
D. Streaming over gRPC with bidirectional flow control and TLS mutual authentication
Incorrect. Prometheus does not use streaming gRPC for scraping.
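The pull model in option A can be sketched as a minimal scrape configuration. This is a hedged illustration: the job name and target address are hypothetical placeholders, not defaults.

```yaml
# Minimal sketch of Prometheus's pull model: the server scrapes each
# target's /metrics endpoint on a fixed interval. Job name and target
# address below are hypothetical.
scrape_configs:
  - job_name: "my-service"
    scrape_interval: 15s        # how often Prometheus pulls from targets
    metrics_path: /metrics      # default HTTP path exposed by targets
    static_configs:
      - targets: ["app.internal:8080"]
```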
Q2. Which statement best captures the fundamental trade-off between metrics and events (logs) as observability signals?
A. Metrics are aggregated numeric samples with bounded cost per series; events preserve per-occurrence detail at higher storage cost
Correct. Metrics aggregate numerically and scale with cardinality rather than request volume, while events preserve fidelity per occurrence at higher per-item storage cost.
B. Metrics preserve full per-request context while events are always sampled to reduce cost
Incorrect. Metrics discard per-occurrence detail by design; they are aggregations, not per-request records.
C. Events are strictly cheaper than metrics at every scale because they compress better
Incorrect. Event storage typically grows with request volume, whereas metric storage grows with cardinality; neither is strictly cheaper at all scales.
D. Metrics and events are functionally interchangeable when Prometheus is used as the backend provided that scrape jitter stays well below the configured evaluation interval window
Incorrect. Prometheus is a metrics database; it does not store arbitrary events, and the two signal types have distinct trade-offs.
Q3. The Four Golden Signals from the Google SRE book are which four metrics?
A. CPU, memory, disk, and network
Incorrect. These are system resources, not the SRE Golden Signals.
B. Availability, latency, throughput, and cost
Incorrect. Although related to reliability, these are not the canonical Golden Signals.
C. Rate, errors, duration, and utilization
Incorrect. This mixes RED and USE; it is not the Golden Signals definition.
D. Latency, traffic, errors, and saturation
Correct. The Four Golden Signals are latency, traffic, errors, and saturation: the baseline signals to monitor for any user-facing service.
Q4. A service's internal metrics all look healthy, yet real users report that the public endpoint is unreachable because DNS at the edge is failing. Which monitoring approach is specifically positioned to catch this class of outage?
A. White-box monitoring via the application's `/metrics` endpoint, because it has the most detailed view of the instrumented process internals
Incorrect. White-box metrics cannot observe DNS, TLS handshakes, or edge reachability outside the process; they see only internal behavior.
B. Black-box probing from outside the network, measuring the user's full path including DNS, TLS, and reachability
Correct. Black-box probing from an external vantage point exercises the full user-facing path, so it catches DNS, routing, or TLS issues that internal instrumentation cannot see.
C. Log-based alerting on application error logs alone
Incorrect. If requests never reach the service, the application generates no logs about the failure.
D. Distributed tracing of internal calls, because traces reveal all external dependencies
Incorrect. Traces originate inside the process; they do not observe DNS resolution or edge-path failures before the request arrives.
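Black-box probing of the full user path is commonly set up with the blackbox_exporter. The sketch below assumes a standard `http_2xx` probe module; the exporter address and probed URL are placeholders.

```yaml
# Sketch of black-box probing via blackbox_exporter: Prometheus asks the
# exporter to probe a public URL, exercising DNS, TLS, and routing from
# outside the service. Addresses below are placeholders.
scrape_configs:
  - job_name: "blackbox-http"
    metrics_path: /probe
    params:
      module: [http_2xx]                     # probe module defined in the exporter config
    static_configs:
      - targets: ["https://example.com"]     # the user-facing URL to probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target         # pass the URL as ?target=
      - source_labels: [__param_target]
        target_label: instance               # keep the URL as the instance label
      - target_label: __address__
        replacement: "blackbox-exporter:9115"  # actually scrape the exporter
```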
Q5. What is an error budget in SRE practice?
A. The monetary cost allocated for fixing bugs each quarter and each planning cycle
Incorrect. Error budgets are reliability accounting, not financial budgets.
B. The number of on-call escalations allowed per engineer per week according to team policy
Incorrect. That describes on-call burden metrics, not error budgets.
C. The maximum number of alerts that may fire in production without formal review board approval
Incorrect. Alert volume is a symptom of noise, not an error budget concept.
D. Permitted unreliability derived from `1 - SLO`
Correct. The error budget equals `1 - SLO` applied to the measured window; it quantifies how much unreliability is acceptable before halting risky changes.
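The `1 - SLO` arithmetic is easy to verify directly. A minimal sketch, with illustrative numbers (a 99.9% SLO over a 30-day window) rather than figures from any real service:

```python
# Error budget arithmetic: permitted unreliability is 1 - SLO.
# The SLO value and window below are illustrative.

def error_budget_minutes(slo: float, window_minutes: int) -> float:
    """Minutes of allowed unreliability within the window."""
    return (1.0 - slo) * window_minutes

# 99.9% SLO over a 30-day window (30 * 24 * 60 = 43200 minutes):
budget = error_budget_minutes(0.999, 30 * 24 * 60)
print(round(budget, 1))  # → 43.2
```

In other words, a 99.9% monthly SLO leaves roughly 43 minutes of budget; once it is spent, change velocity is typically throttled.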
Q6. You are instrumenting a request-serving microservice and a Kubernetes node-level daemon that manages local disks. Which methodology pairing is most appropriate?
A. RED for the microservice (Rate, Errors, Duration) and USE for the daemon (Utilization, Saturation, Errors)
Correct. RED is designed for request-driven services; USE is designed for finite resources like CPU, disk, and memory, which maps to the node daemon's domain.
B. USE for the microservice and RED for the daemon
Incorrect. USE applies to resource-oriented components; a request-driven microservice is better served by RED.
C. RED for both, since both expose HTTP endpoints
Incorrect. Exposing an HTTP endpoint is unrelated to whether RED or USE is appropriate; the signal domain drives the choice.
D. Neither is appropriate; Four Golden Signals must always be used regardless of component type in typical single-replica deployments where the WAL has already been replayed
Incorrect. The Four Golden Signals overlap with RED but are not a substitute; RED and USE are complementary methodologies for different component classes.
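The pairing in option A translates into concrete queries. A hedged sketch in PromQL, assuming conventional metric names (`http_requests_total`, `http_request_duration_seconds_bucket`, node_exporter's `node_cpu_seconds_total` and `node_load1`) that a real service may spell differently:

```promql
# RED for the request-serving microservice (metric names assumed):
sum(rate(http_requests_total[5m]))                                    # Rate
sum(rate(http_requests_total{code=~"5.."}[5m]))                       # Errors
histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))       # Duration (p99)

# USE for the node daemon's host (node_exporter-style metrics):
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) # Utilization
node_load1
  / on (instance)
    count by (instance) (node_cpu_seconds_total{mode="idle"})         # Saturation proxy (load per core)
```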
Q7. How is an SLO (Service Level Objective) defined?
A. A target value or range for an SLI over a specified window
Correct. An SLO sets an internal target for an SLI over a defined window, e.g. 99.9% of requests under 300 ms in 30 days.
B. A hard contractual commitment backed by financial penalties
Incorrect. That describes an SLA (Service Level Agreement), which is externally negotiated; SLOs are internal targets.
C. A runbook describing how to respond to an incident
Incorrect. Runbooks are operational response documents, not reliability objectives.
D. The maximum acceptable downtime in any 24-hour period
Incorrect. SLOs are usually expressed as probabilistic targets over longer windows, not single-day downtime caps.
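The SLI behind such an SLO is often precomputed with a recording rule. A sketch only: the rule group name and the `http_requests_total` counter with a `code` label are assumptions, and the 30d range is illustrative (production setups usually layer shorter-window rules for efficiency).

```yaml
groups:
  - name: slo-sketch                # hypothetical rule group
    rules:
      # SLI: fraction of non-5xx requests over the 30-day SLO window.
      - record: job:request_availability:ratio_rate30d
        expr: |
          sum(rate(http_requests_total{code!~"5.."}[30d]))
          /
          sum(rate(http_requests_total[30d]))
```

The SLO is then the target set against this recorded series, e.g. alerting when it drops below 0.999.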
Q8. A frontend team adds a `user_id` label to their `http_requests_total` counter to make per-user debugging easier. The service has roughly 2 million monthly active users. What is the most likely consequence in Prometheus?
A. A cardinality explosion that bloats the TSDB head block, inflates memory, and degrades queries as series count grows
Correct. Every unique label-value combination creates a new time series; unbounded values like user IDs drive cardinality explosions that exhaust memory and slow queries.
B. No impact, because Prometheus automatically drops high-cardinality labels during ingestion for user-scoped identifiers
Incorrect. Prometheus does not automatically drop labels based on cardinality; operators must configure `metric_relabel_configs` to do so.
C. Slightly slower scrape times but otherwise fine, because each label value is stored as a compact integer reference
Incorrect. Each unique label tuple is a full series with its own chunks, samples, and index entries; there is no implicit interning that makes this cheap.
D. Improved query performance, because Prometheus can now index requests per user for faster lookups
Incorrect. Prometheus is not optimized for per-entity lookups; high-cardinality labels degrade rather than improve query performance.
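The `metric_relabel_configs` mitigation can be sketched as follows (job name and target are placeholders). Note the caveat: dropping `user_id` at scrape time merges the previously distinct series and can produce duplicate-sample conflicts within a scrape, so the durable fix is removing the label from the instrumentation itself.

```yaml
scrape_configs:
  - job_name: "frontend"              # hypothetical job name
    static_configs:
      - targets: ["frontend:8080"]    # placeholder target
    metric_relabel_configs:
      - action: labeldrop
        regex: user_id                # drop the high-cardinality label before ingestion
```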
Q9. The USE method (Brendan Gregg) is used to analyze which type of entity, and which three metrics does it prescribe?
A. Applications — usage, size, and event counts per minute
Incorrect. USE targets resources, not applications, and those are not the method's metrics.
B. Users — usage, sessions, and exits over each interval
Incorrect. USE is not a user-behavior framework.
C. Services — uptime, success rate, and error counts per hour
Incorrect. Those resemble parts of the Four Golden Signals, not USE.
D. Resources — utilization, saturation, and errors
Correct. USE is applied to each resource (CPU, memory, disk, network) and asks for utilization, saturation, and errors, complementing request-level methods like RED.
Q10. A team using the OpenTelemetry Collector wants samples from OTel-instrumented services to end up in Prometheus TSDB without running a Prometheus-specific SDK. Which path is currently supported natively by Prometheus 2.47+?
A. Prometheus exposes an OTLP endpoint (`/api/v1/otlp/v1/metrics`) accepting metrics from the Collector's OTLP HTTP exporter
Correct. Prometheus 2.47+ supports native OTLP ingestion on an HTTP endpoint, allowing the OTel Collector's OTLP exporter to push metrics directly.
B. Only the OpenTelemetry SDK can push to Prometheus directly; the Collector is not involved in this ingestion path at all as long as the TSDB head block has been flushed and compacted into persistent chunks
Incorrect. The OTel Collector is the recommended bridge and supports several export paths into Prometheus, including OTLP ingestion and the Prometheus exporter.
C. The Collector must first write to Pushgateway, which Prometheus then scrapes
Incorrect. Pushgateway is intended for short-lived batch jobs, not a required intermediary for OTel Collector flows.
D. There is no supported path; OTel metrics and Prometheus TSDB are architecturally incompatible
Incorrect. OTel and Prometheus interoperate; OTLP ingestion is the current native path.
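The path in option A can be sketched as a Collector pipeline. Assumptions: the Prometheus address is a placeholder, and Prometheus 2.47+ is started with `--enable-feature=otlp-write-receiver` to expose the OTLP endpoint.

```yaml
# OTel Collector sketch: receive OTLP from instrumented services and
# push the metrics to Prometheus's native OTLP endpoint. Prometheus must
# run with --enable-feature=otlp-write-receiver (2.47+).
receivers:
  otlp:
    protocols:
      http: {}
exporters:
  otlphttp/prometheus:
    # The otlphttp exporter appends /v1/metrics to this base path.
    endpoint: "http://prometheus:9090/api/v1/otlp"
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp/prometheus]
```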
These questions are written against the current PCA curriculum — not scraped exam dumps. The full PCA library here has 120 questions; the broader platform covers the rest of the Golden Kubestronaut path.