Fabric capacity SKUs double in cost at every tier. F64 to F128 is a 100% price increase. There is no fine-grained middle ground — if your current SKU cannot keep up, your only option is to pay twice as much.
What many teams do not realise is that poor Delta table maintenance silently inflates capacity consumption over time. As tables accumulate small files, fragmentation builds. Queries that should scan 10 files scan 100. Compaction jobs that should take minutes take hours. Power BI reports that should load from warm cache fall back to cold-state transcoding because deletion vectors have built up and were never cleared. The Spark pool is working harder than it needs to — not because you have added more data or more users, but because the platform is carrying the weight of months of deferred maintenance.
The insidious part is that this degradation is gradual. Creeping capacity usage is easy to attribute to organic growth. By the time you notice, you are staring at a SKU upgrade — when the real fix might have been four lines of Spark configuration and a weekly OPTIMIZE job.
There is a second problem layered on top of this: until recently, there was almost no official Microsoft documentation on Fabric table maintenance. Practitioners were piecing together guidance from community blogs, Databricks documentation, and occasional posts from Microsoft employees on LinkedIn. That has changed — but the documentation is now spread across Microsoft Learn pages, official Fabric team blog posts, personal blogs from Microsoft engineers, and release note archives. Most practitioners have not read all of it.
This guide brings it together. It covers what the documentation now tells us, where to find it, and what your Fabric Spark notebooks should actually look like in 2026.
The Defaults That Catch Practitioners Out
Fabric's Apache Spark runtime ships with defaults that are not widely known — and if you are not aware of them, you may be doing redundant work, paying unnecessary overhead, or missing out on optimisations that are already running. Two in particular catch people out.
V-Order is off by default in Fabric.
spark.sql.parquet.vorder.default = false
This wasn't always the case — historically, V-Order was enabled by default and many existing workspaces may still have this. For all new workspaces, V-Order defaults to false. V-Order applies VertiPaq-compatible encoding to Parquet files at write time, delivering 40–60% cold-cache query improvements for Power BI Direct Lake and approximately 10% improvement for the SQL Analytics Endpoint. The trade-off is 15–33% additional write time. It is off by default for new Fabric workspaces on Runtime 1.3 — you need to explicitly enable it at the layers where you actually have Direct Lake or SQL Endpoint consumers.
One important caveat: Microsoft's own documentation is inconsistent on this point. The Lakehouse and Delta Tables defaults reference page still lists V-Order as true, while the dedicated Delta Optimization and V-Order page explicitly states it is off by default for newly created workspaces and notes that earlier documentation saying otherwise is no longer accurate. I verified this directly on Runtime 1.3 — spark.conf.get("spark.sql.parquet.vorder.default") returns false. Trust the dedicated page and your own environment, not the comparison table.
Optimize Write is on by default in Fabric.
spark.databricks.delta.optimizeWrite.enabled = true
Optimize Write reshuffles in-memory data before writing Parquet files to produce fewer, larger, better-sized output files. For MERGE, UPDATE, and DELETE operations — which touch many files and create fragmentation — this is genuinely useful. For large append-only batch loads that already produce right-sized files, it adds shuffle overhead with no meaningful benefit.
Both of these are documented in Microsoft's Lakehouse and Delta Tables reference. They are easy to miss.
Your Utility Notebook in 2026
Many Fabric practitioners use a shared utility notebook called at the start of each pipeline notebook to standardise Spark session behaviour. Here is what that notebook should look like today, and why.
# Preserve exact table/column name casing — e.g. 'Bronze_TGB_GDT' not 'bronze_tgb_gdt'
spark.conf.set("spark.sql.caseSensitive", "true")
# Enable Auto-compaction — compacts small files automatically after each write
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")
# Enable Adaptive Target File Size — adjusts compaction target based on table size and query patterns
spark.conf.set("spark.microsoft.delta.targetFileSize.adaptive.enabled", "true")
# Enable File Level Compaction Target — prevents recompaction of already-optimised files when target sizes change
spark.conf.set("spark.microsoft.delta.optimize.fileLevelTarget.enabled", "true")
# Enable Fast Optimize — skips OPTIMIZE on file groups that won't meaningfully benefit; reduces write amplification
spark.conf.set("spark.microsoft.delta.optimize.fast.enabled", "true")
# Optimize Write: set explicitly — workspace default varies with age and history; this establishes a known baseline
# Override to false in append-only batch notebooks where shuffle overhead is not justified
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
# V-Order: set explicitly to off — workspace default varies with age and history; this establishes a known baseline
# Override to true at session level in Gold notebooks, or per-table via TBLPROPERTIES at Silver
spark.conf.set("spark.sql.parquet.vorder.default", "false")
Optimize Write and V-Order are both set explicitly here rather than left to workspace defaults. Fabric's documented defaults are optimizeWrite = true and vorder = false for new workspaces — but older workspaces may have different values, and relying on defaults means your notebooks behave differently depending on where they run. Setting them explicitly makes the utility notebook workspace-agnostic and self-documenting. Per-notebook overrides then apply on top of this known baseline.
Per-notebook additions
Bronze ingestion notebooks — disable Optimize Write; V-Order already off from utility
# Optimize Write: append-only batch loads produce right-sized files; shuffle overhead not needed
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "false")
# V-Order: set to false in utility notebook — no further action needed at Bronze
Gold notebooks — enable V-Order at session level
# V-Order: set to false in utility notebook; override here for consumer-facing layer
spark.conf.set("spark.sql.parquet.vorder.default", "true")
Append-only batch notebooks at Silver and Gold — disable Optimize Write
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "false")
Silver tables feeding Direct Lake or SQL Endpoint — enable V-Order via table property
-- Persists across sessions
ALTER TABLE my_catalog.silver.my_table
SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true');
MERGE, UPDATE, or DELETE notebooks at any layer — no overrides needed
Both Fabric defaults are correct for these operations.
Auto-Compaction, Adaptive Target File Size, and Fast Optimize
These three settings, used together, form the foundation of a low-maintenance compaction strategy.
Auto-compaction (spark.databricks.delta.autoCompact.enabled) triggers automatically after each write commit when it detects that a partition has accumulated too many small files. It runs synchronously — in the same Spark session, immediately after the write — eliminating the need for a separate scheduled compaction job for most workloads. It is not on by default in Fabric, which is why it belongs in the utility notebook.
Adaptive Target File Size (spark.microsoft.delta.targetFileSize.adaptive.enabled) replaces static file size configuration. Rather than targeting a fixed 128 MB, it evaluates table size and adjusts the target accordingly: 128 MB for tables under 10 GB, scaling linearly up to 1 GB for tables exceeding 10 TB. Microsoft explicitly recommends enabling this, describing it as eliminating "the complexity related to manually tuning the target file size."
When both are enabled, auto-compaction uses the ATFS-calculated target, meaning inline compaction and any manual OPTIMIZE runs converge on the same optimal file size automatically.
Fast Optimize (spark.microsoft.delta.optimize.fast.enabled) is the third part of this strategy, and the one that makes running OPTIMIZE frequently practical. Before rewriting a bin of files, it evaluates whether doing so would meaningfully improve the current state — checking minimum size thresholds and whether fragmentation is sufficient to justify the operation. If a bin does not meet the bar, it is skipped entirely. The result is that a no-op OPTIMIZE run costs almost nothing: Fast Optimize short-circuits it before any rewriting occurs. Microsoft recommends enabling it, but it is off by default on Runtime 1.3. In a simulated real-world scenario, it reduced compaction time by 80% across 200 ELT cycles with no performance regression.
Two constraints worth noting: Fast Optimize does not apply to Liquid Clustering or Z-Order operations. And Auto-Compaction runs its own independent calculations — Fast Optimize's skip logic does not affect inline compaction triggered after a write.
File Level Compaction Target (spark.microsoft.delta.optimize.fileLevelTarget.enabled) is a newer addition, now officially recommended by Microsoft. It tags each compacted file with the target size at the time of compaction. When target sizes change over time — as they will with ATFS — previously compacted files are not needlessly rewritten if they already meet at least half the new target. This directly reduces write amplification. It is not on by default.
Do You Still Need a Scheduled OPTIMIZE Job?
A reasonable question at this point: if Optimize Write prevents small files being created, and Auto-Compaction cleans up any that slip through, is a scheduled OPTIMIZE job still necessary?
For Bronze-layer, append-only ingestion — mostly no. That combination handles ongoing file management continuously and a separate OPTIMIZE schedule adds little value.
For Silver and Gold layers — yes, and the reasons go beyond file fragmentation. Microsoft's official cross-workload guidance is to run scheduled OPTIMIZE aggressively at these layers. There are three reasons that Auto-Compaction and Optimize Write cannot address:
- File size targets differ by consumer. Auto-compaction defaults to 128 MB. The SQL Analytics Endpoint performs best with files around 400 MB. Power BI Direct Lake prefers 400 MB to 1 GB with row groups of 8 million or more rows. Auto-compaction alone will not get you there without additional configuration.
- Liquid Clustering requires OPTIMIZE to take effect. Regular write operations do not physically cluster data. Clustering is only applied when OPTIMIZE runs. If you have enabled liquid clustering on your Gold tables and are not running OPTIMIZE, your data is not clustered — you are getting none of the file-skipping benefits.
- Deletion vectors accumulate. Deletion vectors (covered below) record soft-deletes without rewriting Parquet files. Direct Lake must load these deletion vectors during cold-state bootstrapping, adding overhead that grows with accumulation. Regular OPTIMIZE physically resolves them.
What does "aggressively" actually mean in practice? The Microsoft documentation uses the word without defining a cadence, which is not particularly helpful. The most practical approach is to not put OPTIMIZE on a separate schedule at all — instead, run it as the last step of your pipeline, after each load completes. That ties it to the natural cadence of your data changes: daily for overnight batch loads, and after each intraday run for more frequent pipelines.
This approach also means you do not need to worry about running it "too often." With Fast Optimize enabled, if OPTIMIZE runs but the files do not genuinely need compaction, it evaluates each candidate bin and skips the ones that would not benefit — so the overhead of a no-op OPTIMIZE run is low.
For Gold tables serving Power BI Direct Lake, align the OPTIMIZE run to complete before your Power BI refresh window. This ensures deletion vectors are cleared and liquid clustering is applied before Direct Lake frames the latest commit version.
Not every table needs OPTIMIZE every run, though. Dimension tables, master data, and lookup tables may rarely change — running OPTIMIZE on them daily is wasted capacity.
The right gate is average file size versus your layer target. Before running OPTIMIZE, read the table's metadata via DESCRIBE DETAIL — it returns the total size in bytes and the current file count, from which you can calculate the average file size in seconds without scanning any data. If the average is already within tolerance of your target, skip it. If it is meaningfully below, run it.
At Silver, target 256 MB. At Gold, target 400 MB. With Fast Optimize enabled, once OPTIMIZE does run it handles bin-level evaluation internally — skipping individual file groups that do not genuinely need compaction. The file size check gates whether to start the command at all; Fast Optimize handles the nuance within the run.
The result is a maintenance pipeline step that runs on every execution, costs almost nothing when tables are healthy, and kicks in automatically when they are not — without any manual scheduling or intervention.
If you would like the reusable PySpark implementation of this pattern, feel free to connect with me on LinkedIn or reach out via Synechron Australia.
V-Order by Layer
Since V-Order is off by default in new Fabric workspaces, the question is not where to disable it — it is where to explicitly enable it.
| Layer | Recommendation | Reason |
|---|---|---|
| Bronze | No action needed | Off by default — raw ingestion; no Direct Lake or SQL Endpoint consumers |
| Silver | Selective | Enable via table property for tables feeding Direct Lake or SQL Endpoint; leave off for Spark-only intermediate tables |
| Gold | Enable at session level | Consumer-facing layer — V-Order's read performance gains are fully realised here |
For Gold notebooks, enable at session level:
spark.conf.set("spark.sql.parquet.vorder.default", "true")
For individual Silver tables feeding Direct Lake or SQL Endpoint:
-- Table properties persist across sessions — use for per-table decisions at Silver
ALTER TABLE my_catalog.silver.my_table
SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true');
Liquid Clustering Over Partitioning
I want to address something directly: most data engineers I know still default to partitioning Delta tables. What many people learned on Databricks or Hadoop feels intuitive — divide your data by date or region, and queries that filter on those columns only scan the relevant folder. In the right circumstances, it works well.
Microsoft Fabric's current documentation makes no mention of partitioning as a recommended data organisation strategy. That is not an oversight. There are real reasons why the guidance has moved on, and they are particularly relevant in Fabric.
The problems with partitioning in Fabric:
Partitioning creates a fixed physical directory structure. Choose a partition column with too many distinct values (say, a date column on a table loaded daily for five years) and you end up with thousands of small partition directories — each containing a handful of Parquet files. That is the small files problem recreated at a structural level, baked permanently into your table layout.
For Power BI Direct Lake, this is especially damaging. Direct Lake performance depends on large row groups and a small total file count. The Direct Lake documentation explicitly warns that "extensive partitioning could result in a substantially increased number of Parquet files and row groups, thereby causing an excessive increase in column segments within the Direct Lake semantic model, negatively affecting query performance." Partitioning on a high-cardinality column — even one that feels natural, like a date — can degrade the exact workload you are optimising for.
And if you later realise your partition column was the wrong choice, you cannot easily change it. Repartitioning a large table means rewriting all the data.
Liquid Clustering solves these problems:
Liquid clustering co-locates rows with similar values in the same files, enabling Delta's file-skipping to eliminate large portions of a scan when queries filter on the cluster key — the same benefit partitioning aims for, without the structural drawbacks. Critically:
- The clustering policy can be changed with
ALTER TABLEwithout rewriting all existing data - It is not tied to a fixed directory structure
- It works with Adaptive Target File Size to produce well-sized files regardless of data volume
- It supports multiple cluster columns without the ordering sensitivity of Z-Order
-- New table
CREATE TABLE my_catalog.gold.my_table (...)
CLUSTER BY (customer_id, event_date);
-- Existing table — change clustering policy without a full rewrite
ALTER TABLE my_catalog.gold.my_table
CLUSTER BY (customer_id, event_date);
For teams with existing partitioned tables: there is no need to immediately rewrite everything. Liquid clustering is the right default for new tables, and migration can happen incrementally.
A practical note on when to apply it: liquid clustering adds value when a table is large enough to span multiple Parquet files. For small static reference tables — country codes, currency codes, status types — all data fits in one or two files and there is nothing to skip. Applying clustering there is harmless but pointless. The Databricks documentation on automatic clustering states it will not select keys when "the table is too small to benefit." Note that Databricks documentation references Unity Catalog — Databricks' catalogue model — which is not the same as Microsoft Fabric Lakehouses. No equivalent Fabric-specific threshold is published, but the underlying principle is the same.
The critical operational point bears repeating: data is only physically clustered when OPTIMIZE runs. Writes alone do not cluster. This is another reason why scheduled OPTIMIZE matters for Silver and Gold — if you enable liquid clustering and never run OPTIMIZE, your data is not clustered, and you are getting none of the file-skipping benefit.
Deletion Vectors
Deletion vectors have been referenced across multiple Microsoft Fabric documentation pages, release notes, and Lakehouse connector documentation — but never in a single dedicated Fabric page. Here is what you need to know.
By default, any DELETE, UPDATE, or MERGE that modifies a row requires rewriting the entire Parquet file containing that row. Deletion vectors avoid this by recording affected rows as soft-deletes in a small sidecar file. The Parquet file is left unchanged; reads resolve the current state by applying the deletion vector. Physical rewriting is deferred until compaction.
When to enable, by layer:
- Bronze: Enable for tables with MERGE patterns
- Silver: Enable for tables with frequent updates
- Gold: Enable; manage accumulation through regular compaction
ALTER TABLE my_catalog.silver.my_table
SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
The Direct Lake impact is the most important operational consideration. During cold-state bootstrapping, Power BI Direct Lake must load all deletion vectors for each table to correctly exclude soft-deleted rows. The more deletion vectors accumulated, the greater this overhead — resulting in slower cold-cache query performance. For Gold tables serving Direct Lake, regular OPTIMIZE to physically resolve deletion vectors is not just housekeeping; it directly affects the query experience your end users see.
Warning: Enabling deletion vectors upgrades the Delta table protocol. The table will not be readable by clients that do not support deletion vectors. Verify client compatibility before enabling on a table.
The Cheat Sheet
| Layer | Auto-Compaction | Optimize Write | V-Order | Liquid Clustering | Deletion Vectors | Scheduled OPTIMIZE |
|---|---|---|---|---|---|---|
| Bronze | Enable | Disable (append) / Default (MERGE) | No action (off by default) | No | Enable (MERGE tables) | After MERGE-heavy loads |
| Silver | Enable | Disable (append) / Default (MERGE) | Enable per-table for Direct Lake / SQL Endpoint tables | Recommended | Enable (frequent updates) | Run aggressively |
| Gold | Enable | Disable (append) / Default (MERGE) | Enable at session level | Required | Enable; minimise via OPTIMIZE | Run aggressively |
Where the Documentation Lives
For those who want to go deeper, here is where the official guidance is now spread across:
- Fabric default configurations: Lakehouse and Delta Tables
- V-Order defaults and controls: Delta Optimization and V-Order
- Auto-compaction, Fast Optimize, File Level Compaction Target: Table Compaction
- ATFS, Optimize Write, File Level Compaction Target: Tune File Size
- Cross-workload strategy, medallion recommendations: Cross-Workload Table Maintenance and Optimization
- Direct Lake and deletion vectors: Understand Direct Lake Query Performance
- Fast Optimize deep-dive: Introducing Optimised Compaction in Fabric Spark — Miles Cole, Microsoft (Oct 2025)
- Cross-workload context: Microsoft Fabric Table Maintenance Optimization: A Cross-Workload Survival Guide — Christopher Finlan, Microsoft (Feb 2026)
- Deletion vector support in connectors: Lakehouse Copy Activity
- Liquid clustering concepts: Delta Lake Liquid Clustering — Delta Lake open source (note: Databricks-specific features such as Unity Catalog thresholds do not apply to Fabric)
What to Watch: March 2026 Preview Features
The March 2026 Fabric release introduced three features directly relevant to the topics in this guide. I have not tested any of them yet — all are in Preview — but they are worth flagging.
Lakehouse Maintenance Activity in Pipelines (Preview)
There is now a native Lakehouse Maintenance activity in Data Factory Pipelines. If it holds up to scrutiny, this could simplify — or eventually replace — the Notebook activity + parameterised maintenance notebook pattern described earlier. The key questions I would want to answer before adopting it: does it support file-size gating so healthy tables are skipped? Does it respect Fast Optimize? How does it handle VACUUM retention and the Direct Lake framing requirement? Worth evaluating when it reaches GA.
SQL Endpoint Refresh Activity in Pipelines (Preview)
A dedicated pipeline activity to trigger a SQL Analytics Endpoint refresh. The current gap in most maintenance pipelines is that the Direct Lake framing step — ensuring the semantic model re-frames to the current Delta commit before VACUUM runs — relies on timing and coordination. A dedicated activity for this could make that step explicit and reliable. Again, untested.
Fabric Runtime 2.0 (Preview) — Spark 4.0, Delta Lake 4.0
Runtime 2.0 brings Apache Spark 4.0, Python 3.12, and Delta Lake 4.0. Everything in this guide — default configurations, V-Order behaviour, compaction settings — was verified against Runtime 1.3. Delta Lake 4.0 in particular may change defaults or introduce new behaviours. Verify your utility notebook config against your own environment before adopting Runtime 2.0 in production.
Final Thoughts
Microsoft Fabric's Delta table maintenance story has matured significantly in a short time. The platform has sensible defaults, powerful automatic features, and — now — a growing body of official documentation to back it up. The work is in knowing which defaults to trust, which to override, and where the gaps between the documentation and your specific workload lie.
The three things I would leave you with:
First, check your utility notebook against Fabric's actual defaults before assuming you need to set everything explicitly. You may be doing redundant work — or worse, overriding defaults that are now correct.
Second, do not skip scheduled OPTIMIZE on your Silver and Gold tables just because auto-compaction is running. Auto-compaction and scheduled OPTIMIZE solve different problems. Auto-compaction prevents fragmentation building up. Scheduled OPTIMIZE hits the file size targets your SQL and Power BI consumers actually need, physically applies liquid clustering, and clears deletion vectors before they accumulate.
Third, and most importantly: treat maintenance as a cost control measure, not just a performance one. Fabric capacity SKUs double with each tier. The difference between a well-maintained platform and a neglected one may well be the difference between your current SKU and the next one up — a 100% cost increase that no amount of additional workload justifies if the root cause is simply deferred maintenance.
Brad Coles is a Senior Consultant and Data Engineering Capability Lead at Synechron Australia, specialising in Microsoft Fabric and modern data platform engineering. Connect on LinkedIn.