
BigQuery, More Than Just SQL?

GCP - This article is part of a series.
Part 2: This Article

BigQuery is easy to use badly because the interface is deceptively simple. You write SQL, click Run, and a few seconds later some number appears on screen. That surface hides a lot.

If you are going to operate BigQuery as a platform, the first thing you need is not another list of best practices. You need the machine model. What infrastructure does BigQuery stand on? What actually happens after a query leaves your editor? Where do slots fit? Why does shuffle become the villain in so many slow-query stories?

What’s Different About It?
#

One important reset before we go further: BigQuery may speak SQL, but you should not picture it as a classic row-based OLTP database.

In a row-oriented system, data is laid out record by record. That is great for transactional access patterns like:

  • fetch one customer by primary key
  • update one order
  • insert one payment row

BigQuery is built for analytical scans, so the useful mental model is columnar storage. Values from the same column are stored together, which means the engine can read only the columns it needs instead of dragging entire rows through the system.

That shift matters immediately:

  • SELECT * is more dangerous in BigQuery than in a row store because you are telling the engine to read every column.
  • column pruning is a first-class performance feature, not a minor optimization.
  • wide analytical tables are more acceptable here because unused columns can often stay unread.
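To make the SELECT * point concrete, here is a minimal sketch; the analytics.events table name and its columns are hypothetical:

```sql
-- Scans every column of the table, even ones the query never uses:
SELECT *
FROM analytics.events
WHERE event_date = '2026-01-01';

-- Scans only the two referenced columns, so far fewer bytes are read and billed:
SELECT user_id, event_name
FROM analytics.events
WHERE event_date = '2026-01-01';
```

Because storage is columnar, the second query's cost scales with the size of two columns, not with the width of the whole table.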

So yes, BigQuery uses SQL. But operationally, you should think “distributed columnar analytics engine,” not “big Postgres.”

This article focuses on that foundation only. We will get to reservations, governance, backups, and all the messy operational details next. First: architecture.

The BigQuery Architecture
#

The easiest way to misunderstand BigQuery is to think of it as “a database with a SQL endpoint.” It is closer to a stack:

  • Google infrastructure provides compute scheduling and network fabric.
  • Dremel provides the distributed query engine model.
  • BigQuery exposes that stack as a managed analytical service.

That distinction matters. You do not tune BigQuery the way you tune Postgres. You are not shepherding one server. You are asking a large distributed system to read columns from disaggregated storage, fan work out across many workers, exchange intermediate data, and converge on a result quickly.

BigQuery architecture showing query workers, mixing nodes, storage, and serving layers from the BigQuery Under the Hood article
Source: Google Cloud blog, BigQuery Under the Hood. Google Cloud content is licensed under CC BY 4.0.

In simple terms:

  • Borg is the orchestrator.
  • Jupiter is the network.
  • Dremel is the engine.
  • Colossus is the storage.

BigQuery Under the Hood
Google Cloud blog post explaining the architectural ideas behind BigQuery.
https://cloud.google.com/blog/products/bigquery/bigquery-under-the-hood

Dremel
#

If you only remember one paper for BigQuery architecture, remember Dremel.

Dremel described an interactive system for read-only analysis over massive datasets using:

  • columnar storage
  • a multi-level execution tree
  • support for nested data
  • wide parallelism across many machines

That execution-tree idea is still the right mental model today. Queries are broken into stages. Stages are broken into smaller units of work. Workers process those units in parallel, then exchange intermediate results upward or sideways through the plan.

BigQuery grew far beyond the original 2010 Dremel paper, but the underlying shape still matters: distributed execution, columnar reads, tree-like aggregation, and dynamic stage planning.
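The execution-tree idea can be sketched outside BigQuery entirely. This toy Python simulation (an illustration, not BigQuery code) shows leaf workers producing partial counts that mixer levels combine on the way to the root:

```python
# Toy model of Dremel-style tree aggregation for a COUNT(*) query.
shards = [[1, 5, 2], [7, 7], [3, 9, 4, 4]]      # rows held by three leaf workers

# Leaf level: each worker counts only its own shard.
leaf_partials = [len(shard) for shard in shards]

# Mixer level: intermediate nodes combine pairs of leaf partials.
mixer_partials = [sum(leaf_partials[i:i + 2])
                  for i in range(0, len(leaf_partials), 2)]

# Root: one final sum produces the query result.
total = sum(mixer_partials)
print(total)  # 9
```

No leaf ever sees another leaf's rows; partial results simply flow upward, which is exactly why the model parallelizes so well.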

Dremel (2010)
The original paper that explains the execution-tree model behind BigQuery's query engine lineage.
https://research.google/pubs/dremel-interactive-analysis-of-web-scale-datasets/
Dremel After a Decade
How the original Dremel ideas evolved into modern BigQuery, including disaggregated compute and storage.
https://research.google/pubs/dremel-a-decade-of-interactive-sql-analysis-at-web-scale/

Borg
#

BigQuery does not run on one heroic machine with a lot of RAM. It runs on Google’s fleet infrastructure, and Borg is the important mental anchor here.

Borg is Google’s cluster management system. You do not talk to Borg directly when you run a query, but BigQuery inherits the benefits of living on top of large-scale scheduled compute:

  • jobs can be spread across many workers
  • capacity can be allocated dynamically
  • failures are normal and expected, not exceptional
  • the platform can keep pushing work even when individual workers disappear

The key point is not “learn Borg internals.” The point is: BigQuery’s compute layer is scheduled fleet compute, not a dedicated database box. That is why queueing, fairness, and parallel work distribution are part of the story from day one.

Borg, Omega, and Kubernetes
Background on Google's cluster scheduling model and why large-scale fleet compute changes how systems behave.
https://research.google/pubs/borg-omega-and-kubernetes/

Jupiter
#

Now the uncomfortable truth: distributed query engines live or die by the network.

BigQuery only works the way it does because intermediate data can move very quickly between workers. That is the network problem, and Jupiter is the mental anchor here.

Jupiter is Google’s datacenter network architecture based on large Clos topologies and centralized control. For BigQuery, the practical implication is simple: if you want to run a massively parallel analytical query, you need enough east-west bandwidth inside the datacenter to move intermediate data during shuffle without the whole thing collapsing into network-induced misery.

This is why shuffle is such a central concept in BigQuery. The system is designed around the assumption that it can move a lot of data across the fabric. That does not mean shuffle is free. It means the architecture is built to make it viable at scale.

Jupiter Rising
Google's datacenter network design and why high bisection bandwidth matters for distributed systems like BigQuery.
https://research.google/pubs/pub46286/

What Happens When You Press Run
#

Let’s walk the path from query editor to execution.

sequenceDiagram
participant U as User / Client
participant A as Auth + IAM
participant C as BigQuery Control Plane
participant O as Optimizer / Planner
participant S as Scheduler
participant W as Workers / Slots
participant X as Shuffle Layer
participant R as Result Store

    U->>A: OAuth / ADC credentials
    U->>C: Submit query job
    A-->>C: Principal + permissions
    C->>C: Create job resource
    C->>O: Parse, analyze, validate, optimize
    O-->>C: Physical plan (stages + steps)
    C->>S: Request execution resources
    S->>W: Assign slots to runnable work
    W->>W: Read columns, filter, join, aggregate
    W->>X: Exchange intermediate data
    X-->>W: Repartitioned inputs for next stages
    W->>R: Write final output
    R-->>U: Results + job stats + execution details
BigQuery query processing flow showing request management, parsing, catalog resolution, planning, execution, scheduling, dynamic planning, and finalizing results.
Source: Google Cloud blog, BigQuery Admin Reference Guide: Query Processing. Google Cloud content is licensed under CC BY 4.0.

1. The request hits the API
#

Whether you use the console, bq, a client library, or direct REST, your SQL enters BigQuery as a query job.

That detail matters. You are not sending text straight into some hidden executor. You are creating a job resource, usually through jobs.insert, and BigQuery’s control plane takes over from there.

Before BigQuery worries about joins or scans, it worries about the decisive stuff:

  • who is calling
  • whether the caller is authenticated
  • whether the caller can create a job
  • whether the caller can read the referenced tables and write the result

That usually means OAuth or Application Default Credentials on the client side, then IAM checks inside BigQuery. bigquery.jobs.create is only the start. The referenced datasets, views, and destination tables still have to pass their own access checks.
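As a sketch of that flow with the official google-cloud-bigquery Python client (assuming Application Default Credentials are already configured; the SQL is a placeholder):

```python
from google.cloud import bigquery

def run_query(sql: str):
    client = bigquery.Client()      # authenticates via ADC, picks a default project
    job = client.query(sql)         # submits the SQL as a query job resource
    return list(job.result())       # polls until the job completes, then fetches rows

# rows = run_query("SELECT 1 AS x")
```

The point to notice is the shape: you get back a job object you can poll and inspect, not a raw result stream.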

jobs.insert
The REST entry point for creating BigQuery jobs.
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/insert
Running Jobs Programmatically
BigQuery's documented job lifecycle, permissions, and polling model.
https://cloud.google.com/bigquery/docs/running-jobs

2. BigQuery creates the job and validates it
#

Once the request is accepted, BigQuery has to figure out what you actually asked for.

This is the part most people compress into “the query got parsed,” but there is more happening here.

First, BigQuery lexes and parses the SQL text. It turns the raw query string into something structured enough to reason about.

But parsing alone is not enough. BigQuery still has to resolve your references against its catalog.

3. BigQuery resolves references through the catalog
#

Catalog resolution is the step where BigQuery turns names in your SQL into actual resources it knows how to work with.

This is where it answers questions like:

  • which project, dataset, and table does each identifier point to?
  • is this a table, view, routine, or external object?
  • what schema does it have?
  • what location is it in?

This step matters because SQL is full of names that are meaningless until the system resolves them. analytics.events is just text until BigQuery confirms what object that name refers to, how to read it, and what shape the data has.

Catalog resolution also explains why parsing and planning are often interleaved. BigQuery sometimes needs metadata from the referenced resources before it can validate the rest of the query properly.

4. BigQuery validates semantics and builds the plan
#

Once the references are resolved, BigQuery can validate semantics:

  • do the referenced columns exist?
  • are the types compatible?
  • is the output shape legal?
  • can cached results be reused?

This is where the “serverless” label misleads people. Serverless does not mean “there is no planning.” It means you do not manage the machines doing the planning.

5. The optimizer turns SQL into a physical plan
#

At some point the query stops being “SQL text” and becomes a distributed execution plan.

That plan is not one long linear script. It is a DAG made of stages, and each stage contains steps.

  • a stage is a larger unit in the execution graph
  • a step is the operation inside that stage: read, filter, join, aggregate, write, and so on

Stages are connected by edges that represent data movement. In BigQuery, that usually means shuffle.

This is the real transition point. Up to now BigQuery has been trying to understand the request. From here onward, it is deciding how to execute it across a distributed system.

6. The scheduler asks for execution resources
#

The plan now needs compute. This is where slots enter the story.

A BigQuery slot is not just “one CPU.” Google documents it as an abstract compute unit representing a mix of resources such as CPU, memory, and I/O capacity. Slots execute the smaller units of work inside a stage in parallel.

Two details matter immediately.

  1. A stage can expose far more parallel work than you currently have slots available.
  2. BigQuery does not wait for perfect capacity before starting. It queues runnable work and drains that queue as slots become available.

This is the right mental model:

  • stages contain many units of work
  • slots chew through those units
  • more slots drain the queue faster

So when you later see parallelInputs = 57, do not read that as “this stage needed 57 workers before it could begin.” Read it as “this stage exposed 57 units of parallelizable work.”
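A toy Python model of that queue-draining behavior (an illustration with a made-up per-unit cost, not a BigQuery API):

```python
import math

def stage_wall_time_ms(parallel_inputs: int, slots: int, unit_ms: int = 100) -> int:
    """Wall-clock time if `slots` workers drain `parallel_inputs` equal work units."""
    waves = math.ceil(parallel_inputs / slots)   # how many rounds of work are needed
    return waves * unit_ms

print(stage_wall_time_ms(57, 1))    # 5700: one slot processes all 57 units serially
print(stage_wall_time_ms(57, 19))   # 300: 19 slots drain the queue in 3 waves
print(stage_wall_time_ms(57, 100))  # 100: slots beyond 57 cannot help this stage
```

The last line is the important one: once every unit has a slot, extra capacity buys nothing for that stage, because the work cannot be subdivided further.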

7. Workers read, compute, shuffle, and sometimes rebalance
#

Once slots are assigned, workers start doing what the plan told them to do:

  • read the needed columns from storage
  • apply filters and expressions
  • join and aggregate data
  • write intermediate results for downstream stages

This is where BigQuery starts looking like the distributed system it really is.

Some stages are straightforward reads. Some are heavy joins. Some mostly exist to exchange data between stages. That exchange is shuffle, and shuffle is where a lot of expensive queries become obviously expensive.

The Google query-processing guide makes one subtle but important point here: BigQuery can modify the plan while the query is already running. If data is not distributed well enough, BigQuery can introduce repartitioning stages to rebalance the work and improve parallelism.

That matters because it means the plan is not just a frozen preflight artifact. It is an execution story that can adapt mid-flight.

8. BigQuery finalizes the result
#

Eventually the last stages write their final output.

That output might be:

  • a temporary result set shown in the console
  • a destination table
  • cached results for an identical future query

At the same time BigQuery finalizes the job statistics:

  • bytes processed
  • slot time
  • stage timings
  • shuffle statistics
  • execution graph details

And that is the bridge to the next section. Once the query is done, or far enough along, BigQuery gives you the execution plan it actually followed.

Understand Slots
What a slot is, how stages request them, and how work queues when capacity is limited.
https://cloud.google.com/bigquery/docs/slots
BigQuery Admin Reference Guide: Query Processing
A stronger walkthrough of request handling, planning, execution, scheduling, and plan interpretation.
https://cloud.google.com/blog/topics/developers-practitioners/bigquery-admin-reference-guide-query-processing

Working with BigQuery: Table Design
#

Once you understand how a query gets into BigQuery, the next question is simpler and more practical: what exactly is BigQuery executing against?

That brings us to tables. And in BigQuery, table design matters earlier than many engineers expect.

Start With the Resource Hierarchy
#

A table does not exist on its own. It always lives inside a dataset, and that dataset lives inside a project.

Organization
  -> Project
      -> Dataset
          -> Table / View / Routine / Model

So the real identity of a table is always:

project.dataset.table

For example:

devblog-prod.analytics.events_raw

flowchart TD
    O[Organization] --> P[Project]
    P --> D[Dataset]
    D --> T[Table]
    D --> V[View]
    D --> R[Routine]
    D --> M[Model]

This is not naming trivia.

  • The project is where billing and many IAM boundaries start.
  • The dataset is the main grouping unit for related tables.
  • The table is where schema, partitioning, clustering, and storage behavior live.

If the project and dataset boundaries are muddy, the tables are usually muddy too.

Datasets First, Then Tables
#

BigQuery beginners often think table-first. In practice, you should think dataset-first.

A dataset is not just a folder. It carries real behavior:

  • location
  • default table expiration
  • default partition expiration
  • access boundaries

That means before you create events_raw, you should know which dataset it belongs in and why.

Typical patterns:

  • raw: landing data with minimal transformation
  • staging: intermediate cleaned or reshaped data
  • curated: trusted analytical tables
  • sandbox: temporary analyst work

If everything lands in one dataset, table design gets harder because access control, lifecycle rules, and naming all become one mess.
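Dataset-level defaults can be declared at creation time with standard BigQuery DDL. A sketch, with hypothetical dataset names:

```sql
-- Raw landing zone: tables age out automatically by default.
CREATE SCHEMA IF NOT EXISTS raw
OPTIONS (
  location = 'US',
  default_table_expiration_days = 30
);

-- Curated zone: no default expiration, longer-lived tables.
CREATE SCHEMA IF NOT EXISTS curated
OPTIONS (location = 'US');
```

Splitting the zones this way lets lifecycle rules and access control follow the dataset boundary instead of being re-decided per table.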

Native Tables vs External Tables
#

When people say “BigQuery table,” they often mean a native BigQuery table. That is only one option.

Native tables
#

This is the default and usually the right choice.

A native table means:

  • data is stored in BigQuery-managed storage
  • BigQuery controls the storage format
  • you get the full optimization surface: partitioning, clustering, metadata, and predictable execution behavior

If you are building a serious analytical model, this is usually what you want.

External tables
#

External tables let BigQuery query data that lives outside native BigQuery storage.

Common sources:

  • Cloud Storage
  • BigLake-backed storage
  • Cloud SQL
  • Spanner

Conceptually, an external table means: “BigQuery will read the data where it already lives instead of ingesting it first.”

That is convenient, but the tradeoff is real:

  • performance is less predictable
  • optimization options are narrower
  • metadata and lifecycle control are weaker
  • the underlying data can change outside BigQuery’s normal ingestion flow

External tables are useful when:

  • you need quick access to files already in Cloud Storage
  • you are bridging lake and warehouse patterns
  • you want to avoid immediate ingestion

They are usually a bad default for:

  • latency-sensitive dashboards
  • heavily reused curated models
  • workloads where predictable performance matters
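For the Cloud Storage case, a minimal external table definition might look like this (the bucket path and table name are hypothetical; for Parquet, BigQuery can infer the schema from the files):

```sql
CREATE EXTERNAL TABLE analytics.events_ext
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-bucket/events/*.parquet']
);
```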

External Tables
How BigQuery queries data stored outside native BigQuery storage and what the tradeoffs are.
https://cloud.google.com/bigquery/docs/external-tables
Federated queries
#

Federated query is the broader query pattern here: BigQuery reaches into an external system or storage layer and queries the data where it already lives instead of requiring a full ingest first. That is useful for quick access and integration work, but usually the wrong default for heavily reused analytical models where predictable performance matters.
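A federated query against Cloud SQL uses EXTERNAL_QUERY with a connection resource; the connection ID and the inner SQL below are placeholders:

```sql
SELECT *
FROM EXTERNAL_QUERY(
  'us.my_cloudsql_connection',
  'SELECT id, name FROM customers;'
);
```

The inner query runs in the external system, and only its result set crosses into BigQuery.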

Data Types
#

BigQuery supports familiar SQL types, but the real design question is not “what types exist?” It is “what types fit a columnar analytical engine?”

Scalar types
#

The usual building blocks are:

  • STRING
  • BOOL
  • INT64
  • FLOAT64
  • NUMERIC
  • BIGNUMERIC
  • DATE
  • TIME
  • DATETIME
  • TIMESTAMP
  • BYTES

The common mistakes here are both bad practice and expensive:

  • using STRING for everything
  • storing timestamps as text
  • using floating-point where exact values matter

Good design here is simple:

  • use TIMESTAMP or DATE for time fields
  • use NUMERIC for money when exactness matters
  • use INT64 for counters and identifiers when appropriate

Nested types: STRUCT and ARRAY
#

This is where BigQuery stops behaving like the row-store SQL many engineers grew up with.

BigQuery is designed to handle nested and repeated data well.

  • STRUCT groups related fields
  • ARRAY stores repeated values

Example:

CREATE TABLE analytics.orders (
  order_id STRING,
  customer STRUCT<
    customer_id STRING,
    country STRING
  >,
  items ARRAY<STRUCT<
    sku STRING,
    quantity INT64,
    price NUMERIC
  >>
)

If your instinct is to normalize everything into separate tables, this can feel strange. But in BigQuery, nesting often reduces joins and makes analytical queries cleaner.
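Querying that orders table shows why nesting reduces joins: UNNEST flattens the repeated line items without touching a second table:

```sql
SELECT
  order_id,
  customer.country,
  item.sku,
  item.quantity * item.price AS line_total
FROM analytics.orders,
UNNEST(items) AS item;
```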

That does not mean “nest everything.” It means use nesting when the child data is naturally part of the parent record and usually queried together.

Good fit:

  • orders with line items
  • events with repeated attributes
  • sessions with repeated hits

Bad fit:

  • dimensions reused independently across many models
  • giant nested blobs nobody actually queries

BigQuery Schemas
Supported scalar and nested data types, plus schema design basics.
https://cloud.google.com/bigquery/docs/schemas

A BigQuery Table Is Designed for Reading
#

When designing a BigQuery table, you are not just deciding where data sits. You are deciding how BigQuery will read it later.

That means the right questions are:

  • What columns will people filter on?
  • What columns are usually selected together?
  • Is this append-only or frequently updated?
  • Will people query this by event time, ingestion time, customer ID, or something else?

Those questions lead directly to partitioning and clustering.

Partitioning: Control the Scan Boundary
#

Partitioning is usually the first serious performance and cost lever.

Partitioning breaks a table into segments based on one partitioning rule, so BigQuery can skip irrelevant partitions when a query filters correctly.

Time-unit column partitioning
#

This is the most common pattern.

You partition on a date or timestamp column such as:

  • event_date
  • created_at
  • order_timestamp

This is usually the right choice when query patterns are driven by business time.

Ingestion-time partitioning
#

This partitions by when BigQuery received the data, not when the event actually happened.

This is useful for:

  • append-only pipelines
  • operational ingestion tracking

But it can be the wrong choice if analysts care about event time, not arrival time.

Integer range partitioning
#

This is useful for some numeric range access patterns, but much less common than time partitioning.

Use it when the main query boundary is genuinely numeric and range-based, not just because an integer column exists.

Do not fake partitioning with table names

Avoid manual date-sharded tables like events_20221202, events_20221203, and so on. In BigQuery, that usually means more metadata overhead, uglier queries, weaker lifecycle control, and less effective pruning than using a properly partitioned table. Native partitioning exists so you do not have to build your own pseudo-partition scheme out of table names.

What partitioning really does
#

Suppose you have a raw events table with 2 TB of data.

If the table is not partitioned, this query can force BigQuery to consider the whole table:

SELECT COUNT(*)
FROM analytics.events_raw
WHERE event_date >= '2026-01-01'
  AND event_date < '2026-02-01'

If the table is partitioned by event_date, BigQuery can prune irrelevant partitions and focus only on the January slices.
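Creating that partitioned layout is a single clause in the DDL; the schema here is a hypothetical sketch:

```sql
CREATE TABLE analytics.events_raw (
  event_date DATE,
  user_id INT64,
  event_name STRING
)
PARTITION BY event_date;
```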

flowchart TD
    A[Whole table] --> B[Partition by event_date]
    B --> C[2026-01-01]
    B --> D[2026-01-02]
    B --> E[2026-01-03]
    B --> F[...]
    Q[Query filters January 1 only] --> C

Choosing the partition column
#

Do not choose the partition column because it looks semantically nice. Choose it because it matches how the table is actually queried.

Good choices:

  • event date for event analytics
  • order timestamp for reporting
  • ingestion time for raw append-only landing zones

Bad choices:

  • a timestamp nobody filters on
  • a date column that exists only because the source system had one
  • a partition strategy chosen before anyone understood the access pattern

Partition filter discipline
#

If a table is partitioned but users forget to filter on the partition column, you lose much of the benefit.

That is why require_partition_filter matters. It is not cosmetic. It is a guardrail against expensive laziness.
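The guardrail is a table option, settable at creation time or afterwards (the table name here echoes the hypothetical raw events table):

```sql
ALTER TABLE analytics.events_raw
SET OPTIONS (require_partition_filter = TRUE);
```

After this, a query without a filter on the partition column is rejected outright instead of silently scanning every partition.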

Partition expiration
#

Partitioning is also a lifecycle control, not just a performance feature.

In BigQuery, you can set expiration at the partition level so old partitions age out automatically. That is useful for tables like raw events, logs, or temporary analytical slices where old data should disappear without manual cleanup.

This matters because retention often follows time boundaries:

  • keep raw clickstream for 30 days
  • keep operational logs for 90 days
  • keep curated business data much longer

If your table is time-partitioned, partition expiration is often cleaner than treating the entire table as one giant retention unit.
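Partition expiration is also just a table option; for example, 30-day retention on a hypothetical raw events table:

```sql
ALTER TABLE analytics.events_raw
SET OPTIONS (partition_expiration_days = 30);
```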

Partitioned Tables
Partitioning types, pruning behavior, and design guidance.
https://cloud.google.com/bigquery/docs/partitioned-tables

Clustering: Organize Data Inside the Partition
#

If partitioning narrows the time window, clustering narrows the data inside that window.

Clustering sorts storage blocks based on selected columns so BigQuery can prune more effectively within partitions, or across an unpartitioned table.

Typical clustering candidates:

  • user_id
  • customer_id
  • country_code
  • event_name

Good clustering columns are usually:

  • frequently filtered
  • reasonably selective
  • stable in query patterns
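Clustering is declared alongside partitioning in the DDL. A sketch matching the mental-model example used in this section (table and columns are hypothetical):

```sql
CREATE TABLE analytics.events_curated (
  event_date DATE,
  user_id INT64,
  event_name STRING
)
PARTITION BY event_date
CLUSTER BY user_id;
```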

A simple mental model
#

Imagine this table:

  • partitioned by event_date
  • clustered by user_id

Then this query:

SELECT *
FROM analytics.events_curated
WHERE event_date = '2026-03-01'
  AND user_id = 4821391

Partitioning first narrows the scan to one date partition. Clustering then helps BigQuery avoid reading the entire partition for that one user.

That is the pairing:

  • partitioning answers “which slice of the table?”
  • clustering answers “which blocks inside that slice?”

flowchart TD
    P[Partition: 2026-03-01] --> B1[Rows for user range A]
    P --> B2[Rows for user range B]
    P --> B3[Rows for user range C]
    Q[Filter by user_id 4821391] --> B2

When clustering is a bad fit
#

Clustering is not magic. It becomes weaker when:

  • query access patterns are inconsistent
  • the chosen columns are rarely filtered
  • people treat it like a replacement for partitioning

If nobody filters on the cluster keys, BigQuery cannot get much out of them.

Clustered Tables
How clustering works and when it helps.
https://cloud.google.com/bigquery/docs/clustered-tables

Table Design Is Also Lifecycle Design
#

A table is not only a schema plus optimization settings. It also has a lifespan.

Questions that belong in design, not cleanup:

  • Should this table expire automatically?
  • Is it raw landing data or a long-lived curated table?
  • Should partitions expire after 30, 90, or 365 days?
  • Is this table safe for sandbox usage, or does it need tighter control?

This is why datasets matter earlier than people expect. Dataset-level defaults often decide whether your tables age gracefully or pile up forever.

BigQuery Execution Plan
#

This section is easier to understand with one simple example instead of a pile of field names.

Take this query against the public Citi Bike dataset:

SELECT COUNT(*)
FROM `bigquery-public-data.new_york.citibike_trips`
WHERE start_station_name LIKE "%Broadway%"

Behind the scenes, BigQuery turns that into a small execution graph:

  1. workers read the table, filter rows, and produce partial counts
  2. those partial counts are written to shuffle
  3. a later stage reads those shuffled records, sums them, and writes the final result

That is the first real mental model to lock in: workers do not sit around chatting directly with one another. They communicate by reading and writing data through storage and shuffle.

BigQuery query execution example showing workers reading, filtering, partially aggregating, and then combining results through shuffle.
Source: Google Cloud blog, BigQuery Admin Reference Guide: Query Processing. Google Cloud content is licensed under CC BY 4.0.

If a query is slow or expensive, the execution plan is where vague complaints become evidence.

Where to find it
#

In the Google Cloud console, open a completed query job and go to the Execution graph tab. Programmatically, the same data is available in the job statistics from jobs.get, and you can also inspect stage data through INFORMATION_SCHEMA.JOBS_BY_PROJECT.

One subtle difference from many other systems: BigQuery does not train you to reach for a classic standalone EXPLAIN flow first. In practice, you inspect the execution plan after the query has started or after it has finished, because the plan is exposed as job diagnostics.

For long-running queries, BigQuery updates that information periodically while the job is still running. The docs note that these updates typically do not happen more often than every 30 seconds.

When the execution plan is missing

Dry runs and cached-result queries can skip execution resources entirely, so you should not expect a meaningful execution graph there. If you are benchmarking query behavior, turn cache off and run the real query.
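With the Python client, both knobs live on QueryJobConfig (assuming credentials are configured; dry_run and use_query_cache are standard options):

```python
from google.cloud import bigquery

client = bigquery.Client()
config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

job = client.query(
    "SELECT COUNT(*) FROM `bigquery-public-data.new_york.citibike_trips`",
    job_config=config,
)
# A dry run never consumes slots; it only reports the estimated scan size.
print(job.total_bytes_processed)
```

Drop dry_run=True but keep use_query_cache=False when you want a real, uncached execution to benchmark.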

BigQuery execution graph layout in the Google Cloud console, showing the execution graph, query text heatmap, and stage details panel.
Source: Google Cloud documentation, BigQuery query plan explanation. Google Cloud content is licensed under CC BY 4.0.

The three parts of the screen
#

The console view has three useful surfaces:

  • the execution graph in the middle
  • the query text heatmap on the left
  • the stage details panel on the right

Read them in that order.

  • The graph tells you which stage actually burned the time.
  • The heatmap tells you which part of the SQL maps to those stages.
  • The details panel tells you what the stage was doing and what the metrics looked like.

The heatmap has one important limitation: its color is based on the entire stage’s slot-time, not the exact slot-time of the individual mapped step. So treat it as a pointer, not proof.

Stages, steps, and why they matter
#

A stage is a larger unit in the execution graph. A step is the actual operation inside that stage.

Common step kinds include:

  • READ
  • FILTER
  • AGGREGATE
  • JOIN
  • WRITE
  • REPARTITION
  • SORT

That sounds obvious, but it changes how you debug the query.

Do not start with “which SQL clause is this?” Start with “what physical operation is this stage performing?”

BigQuery execution graph step details showing READ, AGGREGATE, and WRITE operations.
Source: Google Cloud documentation, BigQuery query plan explanation. Google Cloud content is licensed under CC BY 4.0.

Input and output: parallelInputs and completedParallelInputs
#

The query-processing guide explains this better than most docs.

parallelInputs tells you how finely divided the input is for a stage.

  • for a table read, that can mean distinct columnar file blocks
  • for a shuffle read, that can mean distinct shuffle buckets

In the Citi Bike example from the blog, the table-read stage had 57 parallel inputs. That does not mean BigQuery needed 57 workers before the stage could start. It means the stage exposed a queue of 57 work units.

One worker could process them serially. More workers let BigQuery drain that queue faster. More than 57 slots would not help that stage any further because the work cannot be subdivided more finely.

completedParallelInputs tells you how many of those units were actually completed. A stage can legitimately finish with fewer completed inputs than total inputs if the query does not need them all. LIMIT is the classic example.

Slot utilization: slotMs
#

BigQuery expresses compute consumption through slots, and one of the most useful stage-level metrics is slotMs.

The Google guide gives a practical interpretation: if you divide slotMs by the stage runtime in milliseconds, you get a rough estimate of how many fully saturated slots that stage represented on average.

That matters because it turns “this stage felt heavy” into something measurable. It also helps later when you start thinking about reservation sizing.

SELECT
  job_stages.name,
  job_stages.slot_ms / (job_stages.end_ms - job_stages.start_ms) AS full_slots
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
UNNEST(job_stages) AS job_stages
WHERE job_id = "bquxjob_49c5bc47_17ad3d7778f"
BigQuery stage statistics view showing fields such as parallel inputs, slot milliseconds, and phase timings.
Source: Google Cloud blog, BigQuery Admin Reference Guide: Query Processing. Google Cloud content is licensed under CC BY 4.0.

Time spent in phases
#

The phase timings are some of the most useful fields in the plan because they point to different kinds of problems.

Wait phase
#

If workers spend a lot of time waiting, the engine is either:

  • waiting for workers to become available
  • waiting for an earlier stage to start producing output

That usually points to slot pressure or upstream dependency, not bad SQL syntax.

Read phase
#

If read time is high, the query is spending a lot of effort pulling input data from storage or shuffle.

That often points back to table design and scan scope:

  • too many columns
  • weak partition pruning
  • poor clustering fit
  • too much data being carried into later stages

Compute phase
#

This is where the actual expression work happens.

A healthy query often spends most of its time here. But if compute time is high and painful, look for things like:

  • expensive string operations
  • heavy regex use
  • large joins
  • large aggregations

The blog also makes a useful point here: approximation functions can sometimes be a real optimization lever if exactness is not required.
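For instance, swapping an exact distinct count for its approximate counterpart can shrink a heavy compute stage considerably. A sketch on the hypothetical `analytics.events` table used elsewhere in this article:

```sql
-- Exact: forces a full distinct aggregation across workers.
SELECT COUNT(DISTINCT user_id) AS exact_users
FROM analytics.events;

-- Approximate: uses a HyperLogLog++ sketch instead, typically far
-- cheaper, at the cost of a small estimation error.
SELECT APPROX_COUNT_DISTINCT(user_id) AS approx_users
FROM analytics.events;
```

The tradeoff is explicit: use the approximate form only where a near-correct count is acceptable, such as dashboards and monitoring.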

Write phase
#

This is where stage output is written to shuffle, the next stage, or the final result.

If write time is high, a common cause is that the stage is pushing too much data forward. That usually means the real optimization opportunity is earlier in the query:

  • filter earlier
  • reduce the result set sooner
  • avoid unnecessary wide intermediate outputs

Max vs average: how skew shows up
#

The average time is useful, but the max time is where skew starts to reveal itself.

If the max phase time is much larger than the average, a small number of workers are doing much more work than the rest. That usually means uneven data distribution coming out of an earlier stage.

Filtering earlier is one of the simplest ways to reduce that kind of skew.
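One way to quantify that skew is to compare average and max compute time per stage. A sketch against the same job-metadata view, with the job ID as a placeholder and the `compute_ms_*` field names assumed from the stage statistics schema:

```sql
-- A max/avg ratio well above 1 suggests a few workers carried the stage.
SELECT
  job_stages.name,
  job_stages.compute_ms_avg,
  job_stages.compute_ms_max,
  SAFE_DIVIDE(job_stages.compute_ms_max, job_stages.compute_ms_avg) AS skew_ratio
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
UNNEST(job_stages) AS job_stages
WHERE job_id = "bquxjob_49c5bc47_17ad3d7778f"
ORDER BY skew_ratio DESC
```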

Large shuffles and spill
#

When a lot of data is sent between stages, look at the shuffle statistics.

shuffleOutputBytesSpilled tells you whether BigQuery had to spill shuffle data to disk instead of keeping it in memory. Disk-based writes are slower than in-memory writes, so this is a useful smell.

SELECT
  job_stages.name,
  job_stages.shuffle_output_bytes_spilled
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
UNNEST(job_stages) AS job_stages
WHERE job_id = "bquxjob_49c5bc47_17ad3d7778f"

If you see heavy write time plus large spill values, the most common fix is not heroic SQL cleverness. It is reducing how much data gets pushed into shuffle.

The execution plan is where table design gets audited
#

This is the practical connection back to the table section.

If your table design is helping, the plan usually shows:

  • smaller reads
  • cleaner pruning
  • less shuffled intermediate data

If your table design is hurting, the plan tells on you:

  • wide READ stages
  • large shuffle output
  • expensive downstream joins and aggregates

So when someone says partitioning, clustering, or early filtering “helped,” the execution plan is where you go to verify whether that is actually true.

Query Plan and Timeline
Official BigQuery documentation for the execution graph, stage metrics, step details, and how to interpret them.
https://cloud.google.com/bigquery/docs/query-plan-explanation
jobs.get
BigQuery API reference showing where query plan and timeline statistics are exposed programmatically.
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/get
BigQuery Admin Reference Guide: Query Processing
Google Cloud blog post that explains query processing and interprets plan statistics with concrete examples.
https://cloud.google.com/blog/topics/developers-practitioners/bigquery-admin-reference-guide-query-processing

BigQuery Optimization
#

Once you can read the execution plan, optimization gets less mystical.

You are not “making SQL prettier.” You are trying to change one of a few physical realities:

  • how much data gets read
  • how much data gets shuffled
  • how much work gets pushed into joins and aggregates
  • whether expensive stages happen early or late

The Google optimization guide is useful because it explains BigQuery optimization in system terms, not just style-guide terms.

Start with the real goal: less work in the expensive stages
#

In BigQuery, the expensive parts are usually not the obvious parts.

The pain usually comes from:

  • large READ stages
  • shuffle-heavy joins
  • repartitioning
  • global sorts

That means good optimization work often sounds like this:

  • read fewer columns
  • scan fewer partitions
  • avoid making intermediate results explode
  • do not force the engine to reshuffle more data than necessary

This is also why the execution plan section comes first. Optimization is much easier once you can see where the work is actually landing.

Filter early, project early
#

This is still the cheapest win.

If you can reduce the input early, do it early:

  • select only the columns you need
  • filter on the partition column when possible
  • avoid carrying wide rows through joins if only a few fields matter later

For BigQuery, this is not just “clean SQL.” It directly reduces READ work and often shrinks the amount of data that later stages have to shuffle.

Bad:

SELECT *
FROM analytics.events
WHERE event_date >= '2026-03-01'

Better:

SELECT user_id, event_name, event_timestamp
FROM analytics.events
WHERE event_date >= '2026-03-01'
  AND event_date < '2026-04-01'

The first query tells BigQuery to read every column. The second narrows both the column set and the partition range.

Aggregate late, except when a pre-aggregation kills the join size
#

This is one of the more useful rules from the Google guide because it is slightly counterintuitive.

The general rule is:

  • aggregate as late and as seldom as possible

Why? Aggregation itself is expensive. If you aggregate repeatedly across the pipeline, you just keep paying for it.

But there is an important exception:

  • aggregate before a join if that dramatically reduces the amount of data the join has to process

That is the tradeoff. If pre-aggregation turns a giant fact-like input into something much smaller, it can be worth doing earlier because it shrinks the join and the shuffle bill that comes with it.

Bad:

SELECT
  c.country,
  COUNT(*) AS order_count
FROM analytics.orders o
JOIN analytics.customers c
  ON o.customer_id = c.customer_id
GROUP BY c.country

Potentially better when orders is huge and many rows collapse early:

WITH orders_by_customer AS (
  SELECT customer_id, COUNT(*) AS order_count
  FROM analytics.orders
  GROUP BY customer_id
)
SELECT
  c.country,
  SUM(o.order_count) AS order_count
FROM orders_by_customer o
JOIN analytics.customers c
  ON o.customer_id = c.customer_id
GROUP BY c.country

The second version is not automatically better. It is better only if that early aggregation materially shrinks the join input.

Joins are where a lot of queries become expensive
#

BigQuery can use different join strategies, but the two mental models worth keeping are:

  • broadcast join: one side is small enough to send to many workers
  • hash join: both sides are large enough that BigQuery repartitions them so matching rows land together

In practical terms:

  • big table + small table often becomes a broadcast join
  • big table + big table often becomes a hash-and-shuffle join

Broadcast joins are usually the friendlier outcome because BigQuery can keep the large table where it is and replicate the small side outward.

Hash joins are often the more expensive outcome because both sides have to be reorganized and moved so matching keys end up on the same workers.

The optimization job here is usually not “write a clever join.” It is:

  • make one side smaller before the join
  • filter earlier
  • avoid joining two huge fact-like datasets if a different model would work
  • check whether repeated joins should really be modeled as nested data instead

Bad:

SELECT
  e.user_id,
  e.event_timestamp,
  s.session_duration
FROM analytics.events e
JOIN analytics.sessions s
  ON e.session_id = s.session_id
WHERE e.event_date >= '2026-03-01'

Better:

WITH filtered_events AS (
  SELECT user_id, event_timestamp, session_id
  FROM analytics.events
  WHERE event_date >= '2026-03-01'
    AND event_date < '2026-04-01'
)
SELECT
  e.user_id,
  e.event_timestamp,
  s.session_duration
FROM filtered_events e
JOIN analytics.sessions s
  ON e.session_id = s.session_id

This is not magical. It just reduces how much data reaches the join.

BigQuery hash join illustration showing data being aligned across workers before the join.
Source: Google Cloud blog, BigQuery Admin Reference Guide: Query Optimization. Google Cloud content is licensed under CC BY 4.0.

Repartitioning is BigQuery trying to save you from bad distribution
#

Sometimes a stage is expensive not because the query is logically wrong, but because the work is not distributed evenly enough.

This is where repartitioning appears.

BigQuery can dynamically repartition data so downstream workers get a better-balanced workload. That can improve performance, but it is not free. Repartitioning means more data movement, more coordination, and more time spent fixing the shape of the work before the next stage can continue.

So if you see repartitioning in the plan, do not read it as “cool, BigQuery optimized it.” Read it as:

“The engine had to spend work correcting the distribution of this data.”

That often happens after joins or aggregations that produced uneven buckets. If one key is much hotter than the others, some workers get overloaded while others sit around.

BigQuery repartitioning illustration showing data redistributed across sinks to improve balance.
Source: Google Cloud blog, BigQuery Admin Reference Guide: Query Optimization. Google Cloud content is licensed under CC BY 4.0.

Order by is expensive when you make it global
#

Sorting is one of those operations that looks harmless in SQL and becomes expensive in distributed systems.

An ORDER BY near the end of a query can force BigQuery to coordinate a large global sort. That is fine when you actually need it. It is wasteful when you do not.

Two practical rules:

  • use ORDER BY only in the outermost query unless you really need it earlier
  • pair it with LIMIT when the real need is “top N”, not “sort the universe”

If you sort a massive intermediate result without reducing it, the execution plan usually tells on you very quickly.

Bad:

SELECT *
FROM analytics.events
ORDER BY event_timestamp DESC

Better:

SELECT user_id, event_name, event_timestamp
FROM analytics.events
WHERE event_date >= '2026-03-01'
ORDER BY event_timestamp DESC
LIMIT 100

Nested and repeated data can be an optimization strategy
#

This is the BigQuery-native part many engineers miss.

Sometimes the right optimization is not a query rewrite. It is a data-model rewrite.

If data is naturally hierarchical and frequently queried together, STRUCT and ARRAY can reduce repeated joins and make the execution plan simpler:

  • fewer join stages
  • less shuffle
  • less repeated scanning of the same dimension-style data

That does not mean “denormalize blindly.” It means BigQuery gives you a modeling tool that can remove expensive relational work when the shape of the data actually supports it.

Instead of this:

SELECT
  o.order_id,
  i.sku,
  i.quantity
FROM analytics.orders o
JOIN analytics.order_items i
  ON o.order_id = i.order_id

You may be better off with a nested model like:

SELECT
  order_id,
  item.sku,
  item.quantity
FROM analytics.orders_nested,
UNNEST(items) AS item

That does not remove all cost, but it can remove a repeated relational join from a very common access path.
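Building that nested model from the relational pair is itself a one-time aggregation job. A sketch, assuming the `orders_nested` name from above and a hypothetical `customer_id` column:

```sql
-- Materialize an order-with-items table so the common access path
-- no longer needs the orders-to-items join at query time.
CREATE OR REPLACE TABLE analytics.orders_nested AS
SELECT
  o.order_id,
  o.customer_id,
  ARRAY_AGG(STRUCT(i.sku, i.quantity)) AS items
FROM analytics.orders o
JOIN analytics.order_items i
  ON o.order_id = i.order_id
GROUP BY o.order_id, o.customer_id;
```

You pay the join once at build time instead of on every read, which is exactly the tradeoff the nested model is making.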

What to change first when a query is bad
#

If the execution plan looks ugly, work in this order:

  1. Reduce scan scope.
  2. Reduce join input size.
  3. Reduce shuffle.
  4. Question global sorts.
  5. Question the data model, not just the SQL text.

That ordering matters because most BigQuery pain starts upstream. By the time you are staring at a huge red join stage, the real mistake often happened earlier in the query or earlier in the table design.

BigQuery Admin Reference Guide: Query Optimization
Google Cloud blog post explaining optimization through joins, repartitioning, late aggregation, ordering, and nested data.
https://cloud.google.com/blog/topics/developers-practitioners/bigquery-admin-reference-guide-query-optimization

BigQuery Pricing
#

Once you understand execution and optimization, pricing stops feeling random.

BigQuery has two main compute pricing models:

  • on-demand pricing: you pay for bytes processed
  • capacity pricing: you pay for slot capacity over time

That distinction matters because the optimization mindset changes depending on which one you are using.

| Model | What you pay for | Main pain | Main optimization lever |
| --- | --- | --- | --- |
| On-demand | Bytes processed | Expensive scans | Reduce bytes read |
| Capacity | Slots over time | Contention and queueing | Isolate and size workloads |
flowchart TD
    A[Query runs] --> B{Pricing model}
    B -->|On-demand| C[Cost pressure comes from bytes processed]
    B -->|Capacity| D[Cost pressure comes from reserved slot capacity]
    C --> E[Read fewer columns]
    C --> F[Prune partitions]
    C --> G[Reduce shuffle-heavy work]
    D --> H[Size reservations]
    D --> I[Reduce contention]
    D --> J[Use slots more predictably]

On-demand pricing: you pay for bytes processed
#

This is the default model most people meet first.

Under on-demand pricing, BigQuery charges based on the amount of data processed by the query. Because BigQuery is columnar, that means the selected columns matter. It is not charging you for “rows touched” in the abstract. It is charging you for the data scanned in the columns your query actually reads.

That is why these choices directly affect cost:

  • selecting fewer columns
  • filtering partitions correctly
  • using clustering effectively
  • avoiding unnecessary joins and reshuffles

Two details matter here:

  • LIMIT is not a cost-control strategy on non-clustered tables
  • charges are rounded up, with minimum processed-bytes floors per table and per query

So if someone writes SELECT * FROM huge_table LIMIT 10, that can still be an expensive query.

Bad:

SELECT *
FROM analytics.events_raw
LIMIT 10

Better:

SELECT user_id, event_name, event_timestamp
FROM analytics.events_raw
WHERE event_date = '2026-03-01'
LIMIT 10

Capacity pricing: you pay for slots, not bytes
#

Under capacity pricing, BigQuery charges for compute capacity measured in slots over time. That changes the main bottleneck:

  • on-demand pain is usually “why did this scan so much?”
  • capacity pain is usually “why is this workload contending for slots?”

In other words:

  • on-demand optimization is mostly about reducing bytes processed
  • capacity optimization is mostly about reducing contention and sizing reservations properly

You still care about pruning and efficient query shape under capacity pricing, but the economic model is different.

Do you pay for failed queries?
#

Short answer: usually, no, if the query returns an error.

Google’s pricing page is explicit: under on-demand pricing, you are not charged for queries that return an error, and you are also not charged for queries served from cache.

But there are two important caveats:

  • if you cancel a running query, Google says you might incur charges up to the full cost if the query would otherwise have run to completion
  • if a query fails because maximum_bytes_billed is set too low, the docs say it fails without incurring a charge

So the clean version is:

  • syntax / semantic / execution error: typically not billed
  • cache hit: not billed
  • blocked by maximum_bytes_billed: not billed
  • canceled mid-flight: can still cost you

That last one surprises people.

| Query outcome | Usually billed? | Why |
| --- | --- | --- |
| Syntax or semantic error | No | Query did not complete as billable analysis |
| Runtime error returned by BigQuery | No | Official pricing says queries that return an error are not charged |
| Cache hit | No | Cached results are not billed |
| Blocked by maximum_bytes_billed | No | Query fails before billable execution |
| Canceled while running | Maybe | Google says you might be charged up to full cost |

The cheapest pricing optimizations are still query-shape optimizations
#

Pricing optimization is usually not a separate discipline from query optimization. It is the same work viewed through a billing lens, where the high-leverage moves are:

  • avoid SELECT *
  • partition large tables
  • require partition filters where appropriate
  • cluster on real filter columns
  • reduce join input before the join
  • materialize expensive repeated intermediate results when that reduces repeated scans

If you read the earlier optimization section carefully, most of it is already pricing guidance.
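On the materialization point: if many dashboards repeat the same aggregation, a materialized view can let BigQuery maintain the result incrementally instead of rescanning the base table on every run. A sketch on the hypothetical `analytics.events` table:

```sql
-- Repeated daily rollups read this small maintained result
-- instead of re-aggregating the raw events table each time.
CREATE MATERIALIZED VIEW analytics.daily_event_counts AS
SELECT
  event_date,
  event_name,
  COUNT(*) AS events
FROM analytics.events
GROUP BY event_date, event_name;
```

Materialized views have restrictions on which query shapes they support, so treat this as the pattern rather than a universal drop-in.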

Use preview, dry runs, and validator estimates before you run expensive queries
#

When you are exploring data, do not pay for full scans just to “take a look.” Better options:

  • use the table preview in the console
  • use bq head
  • use tabledata.list
  • use dry runs and validator estimates before running a large query

That is one of the easiest cost wins in BigQuery because it avoids turning curiosity into billable analysis.
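With the `bq` CLI, that workflow might look like this (project and table names are placeholders):

```sh
# Estimate cost without running: --dry_run reports the bytes the
# query would process, at no charge.
bq query --use_legacy_sql=false --dry_run \
  'SELECT user_id, event_name
   FROM `myproj.analytics.events`
   WHERE event_date = "2026-03-01"'

# Peek at rows without a billable scan.
bq head -n 10 myproj:analytics.events
```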

maximum_bytes_billed is a real guardrail
#

If you are using on-demand pricing, set maximum_bytes_billed when you are experimenting or when you want a hard cap.

bq query \
  --use_legacy_sql=false \
  --maximum_bytes_billed=1000000000 \
  'SELECT user_id, event_name
   FROM `myproj.analytics.events`
   WHERE event_date >= "2026-03-01"'

If BigQuery estimates the query will exceed that limit, it fails before execution and does not incur a charge.

One subtle caveat from the docs: on clustered tables, the estimate can be an upper bound. That means a query can fail the maximum_bytes_billed check even though the actual billed bytes would have ended up lower.

flowchart TD
    A[Ad hoc query] --> B[Dry run or validator]
    B --> C{Estimated bytes within limit?}
    C -->|No| D[Rewrite query or raise limit]
    C -->|Yes| E[Run with maximum_bytes_billed]
    E --> F{Actual execution}
    F -->|Cache hit| G[No charge]
    F -->|Error| H[Usually no charge]
    F -->|Canceled| I[May still be charged]
    F -->|Success| J[Billed according to pricing model]

Cost control is different from cost estimation
#

These are related, but they are not the same thing. Estimation tells you what a query might cost; control stops it from costing more than you allow.

As such, you can use both for different purposes:

  • query validator or dry run for estimation
  • maximum_bytes_billed and quotas for control

Storage billing: logical vs physical
#

Compute pricing gets most of the attention, but storage billing choice matters too. BigQuery storage billing can be based on:

  • logical storage
  • physical storage

The difference is:

  • logical storage bills based on the uncompressed logical size of the table
  • physical storage bills based on the actual physical bytes stored, including the effects of compression and retained historical versions

That means the tradeoff is not just pricing style. It changes what kinds of workloads are cheaper.

| Billing model | Usually favors | Watch out for |
| --- | --- | --- |
| Logical storage | Simpler cost model, fewer surprises from historical retained bytes | You do not benefit directly from physical compression savings |
| Physical storage | Compressed storage efficiency, some large stable datasets | Time travel, fail-safe, snapshots, and rewritten blocks can show up more directly in the bill |

This matters for cost optimization because retention behavior interacts with the billing model:

  • with logical storage billing, time travel and fail-safe are included in the base rate
  • with physical storage billing, time travel and fail-safe are billed

So if you are using physical storage billing, storage-heavy habits matter more:

  • large historical rewrites
  • frequent DML on old data
  • reclustering that rewrites blocks
  • lots of retained snapshots

That does not mean physical storage billing is bad. It means you should choose it with your actual workload in mind.

Pricing guidelines
#

  1. Prefer preview tools over exploratory full queries.
  2. Set maximum_bytes_billed for ad hoc work.
  3. Partition large tables and enforce partition filters.
  4. Stop people from using SELECT * on wide raw tables.
  5. Watch repeated expensive queries and materialize results when appropriate.
  6. Choose logical vs physical storage billing based on workload shape.
  7. If slot usage is steady and predictable, evaluate capacity pricing instead of staying on on-demand forever.
BigQuery Pricing
Official pricing page for on-demand analysis, capacity pricing, storage, and billing behavior.
https://cloud.google.com/bigquery/pricing
Estimate and Control Costs
Official guidance for dry runs, maximum bytes billed, preview options, and query cost controls.
https://cloud.google.com/bigquery/docs/best-practices-costs

Data Governance
#

Once a warehouse gets shared, governance stops being a compliance word and becomes an operating constraint.

The main job is simple:

  • decide who can see what
  • decide who can change what
  • decide where different classes of data are allowed to live

If you do not make those boundaries explicit, BigQuery turns into one giant shared room where everybody can accidentally step on everybody else.

Start with dataset boundaries
#

Governance in BigQuery often starts one layer above the table: at the dataset. That is usually where you draw the first meaningful boundaries, for example:

  • raw: landing data with minimal cleanup
  • staging: reshaped or intermediate data
  • curated: trusted, reusable models
  • sandbox: temporary analyst work

That structure matters because governance is not only about sensitive columns. It is also about keeping high-cost, messy, or unstable data away from places where it does not belong.

flowchart TD
    P[Project] --> R[raw dataset]
    P --> S[staging dataset]
    P --> C[curated dataset]
    P --> X[sandbox dataset]
    R --> R1[restricted access]
    C --> C1[shared trusted models]
    X --> X1[short TTL and looser controls]

IAM is the coarse control
#

IAM is what gives you the broad access boundaries, at a high level:

  • project IAM decides who can create jobs and administer resources
  • dataset IAM decides who can read or manage grouped data assets
  • table-level and finer-grained controls narrow things further

This is why “just give them BigQuery access” is a weak governance plan. It does not tell you whether they should read raw data, write into curated models, or only run jobs against approved datasets.
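Some of this is scriptable as SQL, since BigQuery supports DCL statements for dataset and table grants. A sketch with hypothetical project, dataset, and group names:

```sql
-- Read-only access to the curated dataset for the analyst group.
-- Job-running rights remain a project-level IAM decision.
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `myproj.curated`
TO "group:analysts@example.com";
```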

Row-level security is for filtering which records someone can see
#

Row-level access policies let BigQuery filter rows based on a policy instead of requiring you to create a separate filtered table for every audience. That is useful when:

  • one shared table serves multiple tenants
  • one department should only see its own rows
  • one region should only see its own geography

Example:

CREATE ROW ACCESS POLICY japan_only
ON analytics.orders
GRANT TO ("group:jp-analysts@example.com")
FILTER USING (country_code = 'JP');

This does not create a new table. It keeps the same table and changes which rows different principals are allowed to see.

Column-level security is for sensitive fields
#

Column-level security uses policy tags to control access to specific columns. This is the right tool when the table is broadly useful, but some fields are more sensitive than others:

  • email
  • phone number
  • government ID
  • salary
  • payment details

The important mental model is:

  • row-level security answers “which records can you see?”
  • column-level security answers “which fields can you see?”

And unlike row policies, column-level control depends on policy tags and taxonomy management, not just SQL alone.

Policy tags are schema-level governance, not query-level magic

If you overwrite destination tables carelessly, you can strip policy-tag metadata unless you preserve the schema correctly. This is one of the easiest ways to accidentally weaken governance in a pipeline.
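Because the tag lives in the schema, attaching one is a schema update. A sketch with `bq`, assuming a taxonomy and policy tag already exist; every resource name here is a placeholder:

```sh
# schema.json carries the full table schema; the sensitive column
# references its policy tag, e.g.:
# {
#   "name": "email",
#   "type": "STRING",
#   "policyTags": {
#     "names": ["projects/myproj/locations/us/taxonomies/123/policyTags/456"]
#   }
# }
bq update myproj:analytics.customers schema.json
```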

Governance is also lifecycle policy
#

Governance is not only about access controls; it is also about lifecycle decisions like:

  • which datasets get default expiration
  • which tables are allowed to live forever
  • which sandbox areas must auto-delete
  • which schemas are approved for production use

A good governance setup mostly comes down to:

  • clear dataset purposes
  • predictable IAM boundaries
  • policy tags for sensitive columns
  • row policies only where shared-table filtering is actually needed
  • short TTLs for throwaway work

| Control | What it protects | Good fit |
| --- | --- | --- |
| IAM | Projects, datasets, coarse access boundaries | Who can run, read, or administer |
| Row-level security | Rows | Shared multi-tenant or departmental tables |
| Column-level security | Columns | Sensitive fields inside otherwise useful tables |
| Dataset/table expiration | Lifecycle | Sandbox, raw landing zones, temporary models |
Use Row-Level Security
Official BigQuery documentation for row access policies and how they are managed.
https://cloud.google.com/bigquery/docs/managing-row-level-security
Restrict Access with Column-Level Access Control
Official documentation for policy tags and column-level controls in BigQuery.
https://cloud.google.com/bigquery/docs/column-level-security

Data Retention
#

There are a few different retention ideas to keep separate in BigQuery:

  • normal table or partition lifetime
  • time travel
  • fail-safe

Normal retention
#

This is the part you control directly: table expiration and partition expiration. You can set:

  • dataset default table expiration
  • dataset default partition expiration
  • table expiration
  • partition expiration on time-partitioned tables

This is your normal lifecycle policy. It decides when data should age out under routine operation.
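In DDL, those settings might look like this (project, dataset, and table names are placeholders):

```sql
-- Dataset default: new tables in the sandbox expire after 7 days.
ALTER SCHEMA myproj.sandbox
SET OPTIONS (default_table_expiration_days = 7);

-- Per-table setting: partitions age out after 90 days on a
-- time-partitioned table.
ALTER TABLE myproj.analytics.events
SET OPTIONS (partition_expiration_days = 90);
```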

Time travel
#

Time travel is BigQuery’s built-in historical recovery window for changed or deleted data. The default window is 7 days, and it is configurable between 2 and 7 days.

Within that window, you can:

  • query historical table state
  • restore deleted tables
  • restore expired tables
  • recover data before a bad overwrite

Example:

SELECT *
FROM `mydataset.mytable`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)

That asks BigQuery for the table as it existed one hour ago, as long as that timestamp is still inside the configured time-travel window.
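The window itself is a dataset-level option. A sketch, with the dataset name as a placeholder:

```sql
-- Shrink the time-travel window to 2 days (48 hours).
-- Allowed values are 48 to 168 hours, in multiples of 24.
ALTER SCHEMA myproj.analytics
SET OPTIONS (max_time_travel_hours = 48);
```

Shorter windows reduce retained-version storage under physical billing, at the cost of a smaller self-service recovery window.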

Time travel is not a full backup strategy

It is great for recent table-data recovery, but it does not replace snapshots, exports, or a broader disaster-recovery plan.

Fail-safe
#

After the time-travel window ends, BigQuery keeps changed or deleted data for an additional 7-day fail-safe period. Note that this is not the same thing as time travel.

  • time travel is self-service recovery you can query directly
  • fail-safe is emergency recovery beyond that window

You cannot query fail-safe data directly the way you can with FOR SYSTEM_TIME AS OF. The docs say fail-safe recovery requires Cloud Customer Care.

flowchart LR
    A[Active table data] --> B[Time travel 2 to 7 days]
    B --> C[Fail-safe 7 more days]
    C --> D[No longer recoverable]

What these windows actually retain
#

These retention windows cover changed or deleted data.

That includes cases like:

  • deleted tables
  • replaced tables
  • deleted partitions
  • overwritten table contents

And because BigQuery is columnar, the retained unit is not always “just the one changed row.” Small logical changes can still create new versions of larger underlying storage blocks.

Billing nuance
#

If you use physical storage billing:

  • time travel storage is billed
  • fail-safe storage is billed

If you use logical storage billing:

  • time travel storage is included in the base rate
  • fail-safe storage is included in the base rate

So retention cost depends partly on your billing model, not just on how much old data exists.

Data Retention with Time Travel and Fail-Safe
Official BigQuery documentation for time travel windows, fail-safe behavior, billing, and recovery limits.
https://cloud.google.com/bigquery/docs/time-travel
Access Historical Data
Official documentation for querying historical table versions with FOR SYSTEM_TIME AS OF.
https://cloud.google.com/bigquery/docs/access-historical-data

Snapshot Tables
#

Snapshot tables are one of the most useful recovery features in BigQuery because they are simple to reason about.

A snapshot table is a read-only snapshot of another table at a point in time.

That makes it useful for:

  • backup before a risky change
  • preserving a known-good copy before a pipeline migration
  • recovering from accidental overwrite or corruption
  • keeping a stable historical version for a short retention window

Snapshot tables are not the same thing as time travel
#

These features are related, but not interchangeable.

| Feature | What it gives you | Best for |
| --- | --- | --- |
| Time travel | Querying the recent past of the same table | Fast recovery within the time-travel window |
| Snapshot table | A separate read-only table object | Explicit rollback point or short-term backup |
| Table copy | A normal writable table | Restoring into a mutable working copy |

That distinction matters because snapshots create a separate table object, with its own metadata, type, and expiration behavior.

The basic idea
#

You create a snapshot from a base table, and BigQuery stores it as a table of type SNAPSHOT.

Conceptually:

analytics.orders  --->  analytics.orders_snapshot_2026_03_15

The snapshot is read-only. If you need to change data after restore, you usually create a normal table from that snapshot and work from there.
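Both halves of that flow are plain DDL. A sketch using the names from the diagram above:

```sql
-- Take a read-only snapshot, with its own expiration.
CREATE SNAPSHOT TABLE analytics.orders_snapshot_2026_03_15
CLONE analytics.orders
OPTIONS (expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 30 DAY));

-- Restore into a normal, writable table when needed.
CREATE TABLE analytics.orders_restored
CLONE analytics.orders_snapshot_2026_03_15;
```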

What snapshot tables are good at
#

They are especially good when you want an explicit “before” image.

Examples:

  • before replacing a schema or table definition
  • before backfilling a large historical correction
  • before letting a risky pipeline rewrite a production table

That is a different use case from time travel. Time travel is great when you want to query the recent past. Snapshots are better when you want a concrete table object you can point to and keep around for a while.

How snapshot pricing actually works
#

This part is easy to misunderstand, so the short version is:

  • when you create a snapshot, there is initially no storage charge for it
  • BigQuery charges the snapshot only for bytes that are no longer billed as part of the base table

That means the bill shows up later, not at creation time.

If the base table keeps its old data unchanged, the snapshot can keep sharing that underlying storage without creating an extra storage charge for those bytes.

If the base table later changes or deletes data that still exists in the snapshot, then BigQuery starts charging for the snapshot storage of those changed or deleted bytes.

This is why snapshot pricing feels cheap at first and then becomes more real over time.

What if new data arrives after the snapshot?
#

Nothing special happens to the snapshot itself. The snapshot is frozen at the point in time it was created. So if new rows arrive in the base table after that:

  • the new data belongs only to the base table
  • it is not part of the snapshot
  • you do not pay snapshot storage for those new bytes

This is why the snapshot does not “grow with the base table.” If your question is “what happens to that new block?”, the answer is that it stays part of the base table only. The snapshot does not need to own or charge for it, like so:

T0: create snapshot of orders
T1: append new rows to orders

orders snapshot -> still shows T0 data
orders base table -> now shows T0 + T1 data

The expensive part is changed blocks, not new blocks
#

The more interesting storage story is what happens when existing base-table data is rewritten.

BigQuery storage is column-based, so even relatively small logical changes can rewrite larger storage blocks underneath. The docs explicitly warn that:

  • changed or deleted base-table data that still exists in the snapshot can become billable snapshot storage
  • reclustering can rewrite storage blocks and make snapshot costs jump
  • partitioned tables help contain that blast radius because BigQuery can often copy only modified data within a partition instead of forcing the whole table snapshot to diverge

In other words, the dangerous events are:

  • a large backfill that rewrites old partitions
  • a DML job that updates historical data
  • automatic reclustering triggered by clustering

That is when the snapshot starts owning more storage cost.

Important limitations
#

  • snapshot tables are read-only
  • you cannot snapshot a view or materialized view
  • you cannot snapshot an external table
  • snapshots only go back as far as the time-travel window
  • data still in the streaming buffer is not included

That last point is easy to miss. If a base table still has write-optimized storage data in the streaming buffer, the snapshot does not include it.
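Because snapshots ride on the same machinery as time travel, you can also snapshot a point in the recent past rather than "now," as long as that point is still inside the time-travel window. A hedged sketch with placeholder names:

CREATE SNAPSHOT TABLE `my_project.analytics.orders_one_hour_ago`
CLONE `my_project.analytics.orders`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

This is handy when you notice a bad write after the fact and want to freeze the last known-good state before it ages out of the window.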

Snapshot metadata and inventory
#

Because snapshots are their own table objects, you can track them explicitly in table metadata, or in the INFORMATION_SCHEMA.TABLE_SNAPSHOTS view.

For instance:

SELECT *
FROM `my_project`.`analytics`.INFORMATION_SCHEMA.TABLE_SNAPSHOTS
WHERE base_table_name = 'orders';

This is useful once you start automating backup hygiene and want to know what snapshots exist for a critical table.

A practical recovery pattern
#

If you are about to make a destructive change to a critical table, perhaps consider:

  1. create a snapshot
  2. make the change
  3. validate the result
  4. if needed, restore into a normal table from the snapshot

That is not glamorous, but it is dependable.
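The same four steps in SQL, again with placeholder names. The restore works by cloning the read-only snapshot back into a normal, writable table:

-- 1. create a snapshot before the risky change
CREATE SNAPSHOT TABLE `my_project.analytics.orders_backup`
CLONE `my_project.analytics.orders`;

-- 2. make the change, then 3. validate the result

-- 4. if validation fails, restore into a writable table
CREATE TABLE `my_project.analytics.orders_restored`
CLONE `my_project.analytics.orders_backup`;

From there you can verify orders_restored and swap it into place on your own schedule, rather than racing the time-travel clock.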

Snapshot tables are a backup tool, not a full DR strategy
#

They are very useful for operational rollback, but they do not replace broader recovery planning such as exports, region strategy, and retention policy design.

Introduction to Table Snapshots
Official BigQuery documentation for what snapshot tables are, what they can snapshot, and their limitations.
https://cloud.google.com/bigquery/docs/table-snapshots-intro

TABLE_SNAPSHOTS View
Official documentation for querying snapshot metadata through INFORMATION_SCHEMA.
https://docs.cloud.google.com/bigquery/docs/information-schema-snapshots

Further Reading
#

BigQuery Overview
Google Cloud's high-level BigQuery architecture and service model.
https://cloud.google.com/bigquery/docs/introduction

BigQuery Tables
Table types, metadata, and the basic management model.
https://cloud.google.com/bigquery/docs/tables

Dremel Paper
The foundational execution model behind BigQuery's query engine lineage.
https://research.google/pubs/dremel-interactive-analysis-of-web-scale-datasets/

Borg Paper
Background on Google's cluster scheduling model.
https://research.google/pubs/borg-omega-and-kubernetes/

Jupiter Rising
Google's datacenter network design and why distributed systems like BigQuery can shuffle at scale.
https://research.google/pubs/pub46286/

BigQuery Schemas
Supported types, nested fields, and schema design basics.
https://cloud.google.com/bigquery/docs/schemas

Partitioned Tables
Partitioning types, pruning behavior, and design guidance.
https://cloud.google.com/bigquery/docs/partitioned-tables

Clustered Tables
How clustering works and when it helps.
https://cloud.google.com/bigquery/docs/clustered-tables

Query Plan Explanation
How to read the execution graph, stage metrics, and step details.
https://cloud.google.com/bigquery/docs/query-plan-explanation

Query Optimization Guide
Google Cloud's operator-focused explanation of BigQuery optimization tradeoffs.
https://cloud.google.com/blog/topics/developers-practitioners/bigquery-admin-reference-guide-query-optimization

BigQuery Pricing
Official pricing reference for compute and storage.
https://cloud.google.com/bigquery/pricing

Row-Level Security
Official guide for row access policies.
https://cloud.google.com/bigquery/docs/managing-row-level-security

Column-Level Security
Official guide for policy tags and column-level controls.
https://cloud.google.com/bigquery/docs/column-level-security

Time Travel and Fail-Safe
Official guide for retention windows, fail-safe, and recovery limits.
https://cloud.google.com/bigquery/docs/time-travel

Table Snapshots
Official introduction to snapshot tables and their tradeoffs.
https://cloud.google.com/bigquery/docs/table-snapshots-intro

Closing Thoughts
#

BigQuery feels simple at the surface because Google absorbed the ugly parts for you. That does not mean the ugly parts disappeared. They are still there, just one layer down.

For engineers, the real skill is learning how those layers connect. Architecture explains what the system is built to do. Table design shapes how much work it has to do. The execution plan shows what work actually happened. Pricing, governance, and snapshots are what turn that understanding into something you can operate responsibly once the data starts to matter.

Jevin Laudo
Author
Backend engineer passionate about scalable systems, tech, and sharing what I learn.