Q1. Which four metrics make up the Four Golden Signals described in the Google SRE book?
A. CPU, memory, disk, and network
Incorrect. These are system resources, not the SRE Golden Signals.
B. Availability, latency, throughput, and cost
Incorrect. Although related to reliability, these are not the canonical Golden Signals.
C. Rate, errors, duration, and utilization
Incorrect. This is the RED method (rate, errors, duration) plus utilization from the USE method; it is not the Golden Signals definition.
D. Latency, traffic, errors, and saturation
Correct. The Four Golden Signals are latency, traffic, errors, and saturation: the baseline signals to monitor for any user-facing service.
Prometheus Fundamentals
Q2. What is the primary purpose of Prometheus `remote_write`?
A. To allow an operator to write ad-hoc samples via HTTP POST
Incorrect. `remote_write` is not an interactive ingestion API; the Pushgateway is the usual path for metrics from short-lived jobs.
B. To continuously replicate TSDB blocks to a standby Prometheus for block-level disaster recovery
Incorrect. It is not block-level replication; it forwards individual samples as they are ingested.
C. To stream samples to long-term storage such as Thanos or Cortex
Correct. `remote_write` forwards incoming samples over HTTP to a compatible backend, enabling long-term storage, global querying, or downsampling outside the local TSDB.
D. To expose a writable `/metrics` endpoint so other services can push values in bulk over HTTP
Incorrect. `/metrics` is an exposition endpoint that Prometheus scrapes; `remote_write` is an egress mechanism.
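Option B is wrong because `remote_write` forwards individual samples, not blocks. Prometheus tails its WAL, spreads series across shards, and sends samples in batches; the `queue_config` settings `max_shards` and `max_samples_per_send` govern this. The following Python sketch is a simplified model of that sample-level flow, not Prometheus's implementation. CRC32 is a stand-in hash, and send deadlines, retries, and compression are omitted. It shows that every sample of a series goes to one shard, so per-series order is preserved, and that each batch holds at most `max_samples_per_send` samples.

```python
import zlib
from collections import defaultdict

def shard_and_batch(samples, num_shards, max_samples_per_send):
    """Group (series, timestamp_ms, value) samples into per-shard batches."""
    shards = defaultdict(list)
    for series, timestamp_ms, value in samples:
        # Same series -> same shard, so its samples keep their ingest order.
        shard = zlib.crc32(series.encode("utf-8")) % num_shards
        shards[shard].append((series, timestamp_ms, value))
    batches = []
    for shard in sorted(shards):
        queue = shards[shard]
        # Cut each shard's queue into batches no larger than the limit.
        for start in range(0, len(queue), max_samples_per_send):
            batches.append((shard, queue[start:start + max_samples_per_send]))
    return batches

samples = [('up{job="api"}', 1000 * i, 1.0) for i in range(5)]
for shard, batch in shard_and_batch(samples, num_shards=4, max_samples_per_send=2):
    print(shard, [ts for _, ts, _ in batch])
```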
PromQL
Q3. In PromQL, what is the difference between `sum by(service) (rate(http_requests_total[5m]))` and `sum without(service) (rate(http_requests_total[5m]))`?
A. `by(service)` keeps only the `service` label and aggregates over the rest; `without(service)` drops `service` and keeps every other label
Correct. `by(L)` retains the listed labels as the grouping key; `without(L)` removes the listed labels and groups by all remaining labels.
B. Both clauses produce identical results because aggregation grouping is symmetric
Incorrect. The two clauses are inverses, not equivalents, and produce different output label sets.
C. `by(service)` filters series that have the `service` label; `without(service)` filters series that lack it
Incorrect. Neither clause filters by label presence; both define the grouping key for aggregation.
D. `by(service)` sums per service-and-pod combination, while `without(service)` sums across all pods and services
Incorrect. `by(service)` drops the `pod` label and sums across pods. `without(service)` keeps `pod` (and every other label except `service`), so it is the clause that splits results per pod.
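The grouping rule can be mimicked in a few lines of Python. In this sketch, each series is a hypothetical `(labels, value)` pair standing in for the output of `rate(http_requests_total[5m])`, and the `pod` label is assumed for illustration. The only difference between the two functions is whether the listed labels are kept or dropped when the grouping key is built.

```python
from collections import defaultdict

def sum_by(series, labels):
    """Like PromQL `sum by(labels)`: the grouping key is only `labels`."""
    out = defaultdict(float)
    for lbls, value in series:
        key = tuple(sorted((k, v) for k, v in lbls.items() if k in labels))
        out[key] += value
    return dict(out)

def sum_without(series, labels):
    """Like PromQL `sum without(labels)`: drop `labels`, group by the rest."""
    out = defaultdict(float)
    for lbls, value in series:
        key = tuple(sorted((k, v) for k, v in lbls.items() if k not in labels))
        out[key] += value
    return dict(out)

series = [
    ({"service": "api", "pod": "a"}, 1.0),
    ({"service": "api", "pod": "b"}, 2.0),
    ({"service": "web", "pod": "a"}, 4.0),
]
print(sum_by(series, {"service"}))       # {(('service', 'api'),): 3.0, (('service', 'web'),): 4.0}
print(sum_without(series, {"service"}))  # {(('pod', 'a'),): 5.0, (('pod', 'b'),): 2.0}
```

`by(service)` collapses the pods into per-service totals, while `without(service)` collapses the services into per-pod totals.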
Instrumentation and Exporters
Q4. Which of the following labels is the most dangerous from a cardinality perspective and should generally be avoided in instrumentation?
A. `method` with values like `GET`, `POST`, `PUT`, `DELETE`
Incorrect. HTTP methods are low-cardinality (a handful of values), so they are safe to use as a label.
B. `status_code` using HTTP status classes like `2xx`, `4xx`, `5xx`
Incorrect. Status classes are bounded and low-cardinality.
C. `environment`
Incorrect. Environment values are typically a small fixed set.
D. `user_id` per request
Correct. Unbounded identifiers like `user_id`, `request_id`, or full URLs explode the series count, inflate memory, and can destabilize Prometheus. Instrument with bounded labels only.
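The cardinality risk is easier to see as arithmetic. Every distinct combination of label values becomes its own time series, so the worst case for one metric is the product of each label's number of distinct values. This minimal Python sketch assumes the per-label counts shown (4 methods, 3 status classes, 2 environments, 100,000 users) purely for illustration.

```python
from math import prod

def max_series(label_cardinalities):
    """Worst-case series count for one metric name: every combination
    of label values can become its own time series."""
    return prod(label_cardinalities.values())

bounded = {"method": 4, "status_code": 3, "environment": 2}
print(max_series(bounded))                          # 24
print(max_series({**bounded, "user_id": 100_000}))  # 2400000
```

Adding a single unbounded label multiplies the series count by its number of distinct values; here it turns 24 series into 2.4 million.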
Alerting & Dashboarding
Q5. How do multiple Alertmanager replicas coordinate so that each notification is normally sent only once in a highly available deployment?
A. A Raft-elected leader designates one primary node to send notifications
Incorrect. Alertmanager HA is not leader-based; Raft is used by other systems.
B. They share a Redis cluster that stores dispatch state, acquiring distributed locks per alert fingerprint before sending notifications
Incorrect. No external datastore like Redis is required or used.
C. Peers form a gossip mesh and share notification state so that one peer delivers each notification
Correct. Alertmanager peers connect via `--cluster.peer` and share sent-notification state over a gossip protocol; each peer waits a position-based delay and skips sending if a peer already did, so under normal operation only one peer notifies each receiver (best-effort, not strict exactly-once across partitions).
D. HA is not supported; you must run a single replica and fail over manually through a DNS change when the active node goes down
Incorrect. HA via clustered gossip is explicitly supported and recommended for production.
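The position-based delay can be modeled as a small simulation. In Alertmanager, a peer's wait is its position in the sorted member list multiplied by `--cluster.peer-timeout` (15s by default). This Python sketch is a simplified model under stated assumptions, not Alertmanager's code. The peer names and the `"group-1"` log key are hypothetical. Gossip is treated as instantaneous when healthy. A partition is modeled as each peer seeing an empty notification log.

```python
def simulate_dispatch(peers, peer_timeout_s=15.0, gossip_ok=True):
    """Return (wait in seconds, peer) for every peer that actually notifies."""
    shared_log = set()  # notification log replicated between peers via gossip
    sends = []
    # Peers wake in order of position * peer_timeout_s.
    for position, peer in enumerate(sorted(peers)):
        wait_s = position * peer_timeout_s
        # A partitioned peer cannot see entries written by the others.
        visible = shared_log if gossip_ok else set()
        if "group-1" in visible:
            continue  # a lower-positioned peer already sent this notification
        sends.append((wait_s, peer))
        shared_log.add("group-1")
    return sends

peers = ["am-1", "am-0", "am-2"]
print(simulate_dispatch(peers))                   # [(0.0, 'am-0')]
print(simulate_dispatch(peers, gossip_ok=False))  # [(0.0, 'am-0'), (15.0, 'am-1'), (30.0, 'am-2')]
```

The partitioned run shows why delivery is best-effort. When peers cannot see each other's log entries, each one eventually sends, so a receiver may get duplicates rather than miss a notification.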
The PCA exam lasts 90 minutes and uses a multiple-choice format. See the official CNCF page for the current question count.
How difficult is the PCA exam?
It is rated intermediate. Plan on 2–8 weeks of preparation, depending on your background.
How much does the PCA exam cost?
Pricing changes periodically — check the official CNCF PCA page at https://www.cncf.io/training/certification/pca/.
Are these PCA mock exams free?
Sample questions on this page are free and require no account. Full timed PCA mocks require a paid plan.
How is this mock exam different from the real PCA exam?
These are original questions written against the official CNCF curriculum, not scraped exam dumps. The format mirrors the real exam, but the real exam is proctored while these mocks are self-paced.
What is the best way to study for PCA?
Work through the official curriculum in order of domain weight (heaviest first), then run full timed mocks until you hit 85%+ consistently.