May 2026 Data Engineering Microsoft Fabric Architecture

Microsoft Fabric Mirroring: Before You Commit

Connectors, limits, cost model, and the architectural questions that determine whether it fits.

Accuracy notice: This post was written in May 2026 and reflects the state of Microsoft Fabric mirroring at that time. Fabric is a rapidly evolving platform - features in Preview today may reach GA with changed behaviour, documented limitations may be resolved, and new connectors or capabilities may be added. Before making architectural decisions based on this post, verify current behaviour against the official Microsoft documentation. Where this post identifies gaps or limitations, check whether they have been addressed in a recent Fabric release.

Near-real-time replication at near-zero ingestion cost. No ETL pipelines to orchestrate. No ingestion compute to pay for. The pitch for Microsoft Fabric mirroring is genuinely compelling - and for a well-defined class of use cases, it delivers.

The question worth asking before you commit is not whether mirroring works. It is what kind of data layer mirroring produces, and whether that matches what your architecture actually requires.

Mirroring produces a current-state replica - a continuously updated copy of the source's present state, in Delta/Parquet format, in OneLake. It is not a historical record, a replayable event stream, or an immutable raw layer. For some architectures, that distinction is irrelevant. For others - particularly those where replayability, historical fidelity, or audit immutability matter - it is critical. This guide works through both sides.

This is a depth guide for data engineers and architects evaluating mirroring for production use. It is not a getting-started tutorial - the official documentation covers that ground. It covers connector-specific behaviour, architectural trade-offs, operational limits, and edge cases that surface in production. The appendix consolidates capability support across all ten connectors into a single reference matrix - including the gaps where Microsoft has not published documentation - so you can evaluate your specific source without cross-referencing ten separate limitations pages.


The three types of mirroring

The word "mirroring" covers three architecturally distinct mechanisms. Choosing the wrong type is the first mistake.

[Figure: the three types of mirroring. Database Mirroring: source database, change capture, data moves to OneLake as Delta/Parquet. Metadata Mirroring: source catalog, OneLake shortcuts, no data copied. Open Mirroring: any source, custom writer, Parquet to landing zone, Fabric merges into OneLake Delta tables.]
Database mirroring continuously replicates data to OneLake via a native connector — queries hit Fabric, not the source. Metadata mirroring creates OneLake shortcuts that point to the source catalog; no data is copied, and queries return to the source at runtime. Open mirroring accepts Parquet files written to a landing zone by any custom writer — no native connector required.
| Type | Data movement | Requires | Best for |
|---|---|---|---|
| Database mirroring | Replicated to OneLake as Delta/Parquet | Native connector for your source | Low-latency analytics without querying the source directly |
| Metadata mirroring | Stays in source; OneLake shortcuts created | Azure Databricks or Dremio | Unified Fabric experience over data already in a supported catalog |
| Open mirroring | You write files; Fabric merges them into Delta | Any system that can write Parquet or delimited text to OneLake | No native connector exists, or you need full control over change tracking |

1. Database mirroring - data moves

The source database's change log is read continuously through a connector-specific mechanism (CDC, SQL Change Feed, SQL Server's Change Event Stream, logical replication, and so on), and changes land as Delta/Parquet files in OneLake. The data lives in Fabric. Queries hit OneLake, not the source.

Use this when you want low-latency analytics without hitting the production database, and your source is one of the ten supported connectors: Azure SQL Database, Azure SQL Managed Instance, SQL Server, PostgreSQL, MySQL, Oracle, Snowflake, Cosmos DB, Google BigQuery, or SAP. One important exception: SAP is not a direct connection. It routes through SAP Datasphere as an intermediary, which must be separately licensed and configured.

One special case: Fabric SQL database - Microsoft's own managed SQL database within Fabric - is mirrored to OneLake automatically, with no setup, no connector configuration, and no source-side changes required. If your operational workload runs on Fabric SQL database, the analytics layer comes for free.

2. Metadata mirroring - data stays put

The source data is not copied. Instead, OneLake shortcuts are created that point to the source. Querying the mirrored item queries the source system in real time - query performance depends on the source, not OneLake.

Use this when you want the Fabric SQL analytics experience over data that already lives in a supported catalog: Azure Databricks or Dremio. A caveat applies to both: neither vendor documents the integration from their own side - it is covered only in Microsoft's documentation. Dremio carries an additional flag: Microsoft's own page marks it as Preview while the mirroring overview table does not. Treat both connectors as early-stage and validate thoroughly before committing.

3. Open mirroring - you write the data

You deliver files (Parquet or delimited text) to a OneLake landing zone in a prescribed format. Fabric's replication engine picks them up and merges them into Delta tables. No Microsoft connector required - any system that can write files can feed it.

Use this when no native connector exists for your source, or when you need full control over what is replicated and how changes are tracked. It has the steepest setup curve of the three types but the widest applicability. The partners ecosystem lists pre-built integrations for common systems.

Each connector is its own evaluation

The three-type taxonomy understates the variation within database mirroring. Each connector uses a different change capture mechanism:

SQL Change Feed (Azure SQL Database) · Change Event Stream (SQL Server 2025) · CDC (SQL Server 2016–2022) · Logical replication (PostgreSQL) · Binary logs (MySQL) · LogMiner (Oracle) · Snowflake Streams (Snowflake) · Built-in change feed (Cosmos DB)

Each has different data type support, different DDL handling, different source-side prerequisites, and different maturity. Nothing generalises cleanly across connectors unless the documentation explicitly says it does. A confirmed behaviour on Azure SQL Database tells you nothing about BigQuery.

The table below summarises connector maturity and documentation coverage. The gap count is the number of "-" cells for that connector in the appendix, that is, behaviours Microsoft has not documented. Azure SQL Database and SQL Server score 4 each due to undocumented DDL edge cases. BigQuery's 9 gaps include basic operational questions like whether ADD COLUMN is handled at all.

| Connector | Maturity | Documentation gaps |
|---|---|---|
| Azure SQL Database | Mature | 4 |
| Azure SQL MI | Mature | 2 |
| SQL Server | Mature | 4 |
| PostgreSQL | Mature | 1 |
| MySQL | Established | 3 |
| Cosmos DB | Established | 1 |
| Snowflake | Established | 6 |
| Oracle | Sparse | 3 |
| BigQuery | Sparse | 9 |
| SAP | Sparse | 7 |

What mirroring delivers

Where mirroring operates within its intended scope, it delivers well.

Ingestion compute is genuinely free. The replication engine runs off-capacity, meaning it does not draw on your Fabric capacity - documented and confirmed by independent benchmarking. On an F64, practitioners report background CU usage around 3–4% for half a billion rows across 50 tables. Storage is free up to 1 TB per capacity unit. The advantage is not raw CU savings over a carefully built pipeline - it is continuous near-real-time replication at near-zero ingestion cost, a cadence that would be unworkable via scheduled pipelines on any reasonable SKU. The cost model section below covers the full breakdown, including caveats.

The Open Mirroring ingestion engine scales irrespective of capacity SKU. The ingestion engine that processes Open Mirroring landing zone files is an off-capacity service run by Microsoft. It does not matter whether you are on an F2 or an F256 - ingestion throughput is the same. One Microsoft engineer demonstrated 1.2 billion rows per minute ingestion on an F2 in a published benchmark. Two caveats: the write-side compute - generating and uploading files - is not reflected in Fabric capacity usage, and the conditions were artificial. The "free" part is Fabric's ingestion engine; whatever writes files to the landing zone bears its own cost.

Mirroring eliminates analytical query pressure on the source. For organisations with access only to a production source, a mirrored database offloads all reporting traffic to OneLake. The SQL analytics endpoint provides a T-SQL-queryable replica; Direct Lake provides a Power BI-queryable one. Neither touches the source at query time.

This works best with Azure SQL Database (Change Feed) and SQL Server 2025 (Change Event Stream), where source-side replication overhead is minimal. SQL Server 2016–2022 with CDC does add overhead - capture jobs and log space - which partially offsets the protection benefit, though analytical query traffic is still eliminated. One hard constraint: mirroring requires the primary of an Always On availability group. Connecting to a readable secondary is not supported. If your organisation already routes reporting traffic to a secondary, mirroring does not help for that path.

Cross-database queries work out of the box. The SQL analytics endpoint for each mirrored database supports cross-database queries using three-part naming, across multiple mirrored databases in the same workspace. No additional configuration required.

CI/CD is supported (GA), with one manual step — and likely a full initial load. Mirrored databases can be committed to Git and deployed through Fabric deployment pipelines. The database definition is stored as a {name}.MirroredDatabase folder containing mirroring.json. The caveat: mirroring does not start automatically after a deployment pipeline runs. You must manually start replication in each target workspace.

Deployment pipelines deploy the item definition — connection settings and table selection — not the underlying Delta data in OneLake. The target workspace receives a configured but empty mirrored database item with no existing replication state. Starting mirroring after deployment triggers a full initial load from current source state — not an incremental pickup from where the source workspace left off. This is confirmed behaviour: Microsoft's documentation states that restarting mirroring "results in all data being replicated from the start" and that "each time you stop and start, the entire table is fetched again" (confirmed in the Azure SQL MI FAQ and consistent with documented reseed behaviour across other connectors). For large databases, this means the target workspace has no data until the initial seed completes — factor this into promotion planning.

The downstream payoff: Direct Lake integration. Because mirrored data lands in OneLake as Delta tables, Power BI Semantic Models can connect via Direct Lake mode - reading Parquet files directly from OneLake without importing data or querying the SQL analytics endpoint at query time. As new changes land from the source, reports reflect them without a scheduled refresh. For large datasets, this is substantially faster than DirectQuery and avoids Import mode's storage cost.

The clearest downstream win: source-to-semantic with no transform layer. Not every reporting use case requires transforms. Operational dashboards, self-service analytics on source tables that are useful as-is, metrics that don't require heavy cross-domain joins - these can go from mirrored Delta tables directly to a Direct Lake Semantic Model. Source → mirrored database → Direct Lake Semantic Model: near-real-time, near-zero capacity cost, no Spark jobs or pipeline runs.

For use cases that fit this pattern, mirroring delivers on its promise. If your instinct is to add transform layers on top of mirrored data, ask first whether they serve a genuine reporting requirement or carry over a pipeline pattern that mirroring has made redundant.


Latency - what "near real time" actually means

The mirroring overview describes replication as "near real-time". No specific latency SLA is published.

[Figure: latency timeline. Source commit (row committed at source), then Delta table updated (Parquet files written to OneLake; replication layer, 60–90s typical for CDC-based connectors), then SQL endpoint ready (SQL analytics endpoint syncs; endpoint layer, +20–60s). Best-case benchmark (append-only, optimal conditions): Delta 11.3s · SQL endpoint +30–60s.]
Two latency layers separate a source commit from a queryable result. The replication layer (blue) is the time for the mirroring engine to write changes as Parquet files to OneLake. The endpoint layer (amber) is the additional sync time before the SQL analytics endpoint reflects those changes. For minimum latency, read directly from the Delta layer via Spark rather than via the SQL endpoint.

Microsoft describes replication latency as 15–60 seconds. For CDC-based connectors (Azure SQL, SQL Server, PostgreSQL, MySQL, Cosmos DB), practitioners report typical latency in the 60–90 second range under normal load - the lower end of Microsoft's stated range appears to reflect optimal conditions rather than typical deployments. Latency can rise further under heavy source write pressure. BigQuery uses a different change capture mechanism (Storage Write API) and may have different latency characteristics; reliable published figures are not available.

There are two distinct latency layers the documentation does not make explicit:

  1. Delta table latency - time from source change to the Delta table in OneLake reflecting it. This is what the replication engine controls.
  2. SQL analytics endpoint latency - time from Delta table update to queries via the SQL endpoint seeing new data. The SQL endpoint always lags the Delta layer.

For applications requiring the freshest possible data, read directly from the Delta layer via Spark rather than via the SQL analytics endpoint.
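The two layers are additive, which makes a rough staleness budget easy to work out. The sketch below simply sums the ranges quoted in this post (60–90 seconds for CDC-based replication, 20–60 seconds for the endpoint sync). The function and constant names are illustrative, and the inputs are practitioner-reported figures, not a published SLA.

```python
def end_to_end_latency(delta_range, endpoint_range):
    """Add the replication-layer and endpoint-layer ranges (in seconds)."""
    low = delta_range[0] + endpoint_range[0]
    high = delta_range[1] + endpoint_range[1]
    return low, high

# Figures quoted in this post; treat them as illustrative inputs only.
CDC_DELTA_LATENCY_S = (60, 90)
ENDPOINT_SYNC_S = (20, 60)

sql_endpoint_staleness = end_to_end_latency(CDC_DELTA_LATENCY_S, ENDPOINT_SYNC_S)
print(sql_endpoint_staleness)  # (80, 150)
```

On these inputs, a report reading through the SQL analytics endpoint should be planned around data that is roughly one and a half to two and a half minutes old, while a Spark reader on the Delta layer only pays the first term.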

CDC semantics and late-arriving data. For CDC-based mirroring, watermarks are largely irrelevant - and that is an advantage. CDC captures changes in commit order at the source, regardless of business timestamps on the rows. A batch job inserting six-month-old records into the source today will be replicated when those rows commit; no high-water mark is needed to detect them. The edge case to watch: CDC captures changes at commit time, not statement time. A transaction running for two hours before committing appears in the mirror as a burst at commit rather than spread across the transaction duration.
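A toy model makes the commit-time behaviour concrete. This is a minimal sketch, not how any connector is implemented: the `Txn` class and its fields are hypothetical, and the only rule it encodes is the one described above, that changes surface in commit order regardless of when the transaction started.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    name: str
    started_at: int    # seconds since an arbitrary origin
    committed_at: int  # seconds since the same origin
    rows: int

def replication_order(txns):
    """Order transactions the way a commit-ordered change feed would emit them."""
    return sorted(txns, key=lambda t: t.committed_at)

txns = [
    # Starts first, runs for two hours, then commits a million rows at once.
    Txn("long_batch", started_at=0, committed_at=7200, rows=1_000_000),
    # Starts an hour later but commits almost immediately.
    Txn("small_update", started_at=3600, committed_at=3601, rows=1),
]

order = [t.name for t in replication_order(txns)]
print(order)  # ['small_update', 'long_batch']
```

The long batch reaches the mirror last and all at once, which is the burst to plan for when sizing downstream consumers.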

A published benchmark by a Microsoft engineer under optimal append-only conditions (5 × 32-core machines writing aggressively to an F2 capacity) measured Delta table lag at 11.3 seconds and SQL endpoint availability at 30–60 seconds after file upload. These are best-case figures under a specific workload configuration - the author explicitly describes them as "empirical results from hacking around tunables" rather than a reproducible benchmark. Read them as the best the system can achieve (a floor on latency), not as typical deployment expectations. The benchmark covers appends only - no updates or deletes.


Cost model

One disclosure worth making explicit: the Fabric mirroring overview states that compute and storage are free, with no exceptions noted. In practice, Google BigQuery charges for CDC compute, Storage Write API, and BigQuery storage, and SAP charges SAP Datasphere Premium Outbound Integration pricing. These are charged by the source-side provider, not Microsoft, but they are real costs that belong in your model.

| Cost component | Charged? | Notes |
|---|---|---|
| Replication compute | No | Off-capacity; free for all mirroring types |
| OneLake storage for mirrored data | No (up to limit) | 1 TB free per capacity unit (e.g. 64 TB free on F64); excess charged at standard OneLake rates |
| OneLake write transactions (landing zone writes) | Yes | Applies to Open Mirroring; files < 4 MB cost one transaction each; see below |
| Query compute (SQL endpoint, Spark, Power BI) | Yes | Charged at standard Fabric capacity rates |
| Source-side costs (BigQuery, SAP, Snowflake) | Yes (varies) | Charged by the source provider, not Microsoft |

What "free ingestion" means in practice. The replication engine runs off-capacity - confirmed by independent benchmarking and a Microsoft PM. On an F64, practitioners have reported background CU usage around 3–4% for half a billion rows across 50 tables. Storage is free at 1 TB per capacity unit; an F64 gives 64 TB. Once configured, the replication engine continues running even if capacity is paused.

To contextualise against a pipeline-based alternative: a well-optimised incremental pipeline (parameterised loads from multiple SQL sources, landing as Parquet and merging into Delta) can reach a similar absolute CU figure. But that pipeline runs once a day. Pipeline activities and Spark notebooks are background operations in Fabric's CU model, smoothed across a 24-hour window. Running the same pipeline five times a day compresses five times the background CU consumption into the same budget - already saturating an F16 at that cadence. At sub-minute frequency it becomes unworkable on any reasonable SKU.
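The cadence argument is simple arithmetic, sketched below. Two assumptions are mine rather than from any documentation: the daily budget is taken as capacity units multiplied by the seconds in a day, and the per-run cost is a hypothetical figure chosen so that one daily run uses a fifth of an F16's budget.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def daily_budget_cu_seconds(capacity_units):
    """CU-seconds available per day, assuming budget = CUs x seconds in a day."""
    return capacity_units * SECONDS_PER_DAY

def background_utilisation(cu_seconds_per_run, runs_per_day, capacity_units):
    """Fraction of the daily budget consumed by a scheduled background job."""
    return cu_seconds_per_run * runs_per_day / daily_budget_cu_seconds(capacity_units)

# Hypothetical per-run cost: one fifth of an F16's daily budget.
PER_RUN_CU_SECONDS = daily_budget_cu_seconds(16) // 5

for runs in (1, 5):
    share = background_utilisation(PER_RUN_CU_SECONDS, runs, 16)
    print(f"{runs} run(s) per day -> {share:.0%} of an F16")
```

Because background consumption scales linearly with run count while the budget stays fixed, any job that is comfortable once a day hits the ceiling after a small number of additional runs.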

Mirroring sidesteps this entirely: continuous replication at near-zero background CU cost. For teams running daily incremental pipelines, adopting mirroring is not primarily a cost trade-off - it is a step change in data freshness at no additional ingestion cost.

For a typical database mirroring setup on Azure SQL, SQL Server, PostgreSQL, MySQL, or Cosmos DB queried at modest frequency, Fabric cost is close to zero beyond storage. The cost grows with query scale: ingestion compute is eliminated, not the cost of reading or transforming data.

Open Mirroring write transactions. The ingestion compute is free, but writing files to the landing zone consumes OneLake write transactions, which are charged. One transaction per file smaller than 4 MB; one per 4 MB block for larger files. At 100–150 KB per file, writing one million files costs one million transactions - roughly 27 to 41 times more than batching the same data into 4 MB files, because each 4 MB file holds the contents of 27 to 41 small ones. A community report documented consuming over 3.7 million CU-seconds on an F64 (more than half the capacity's daily budget) from a single initial load of approximately 1.5 million small Parquet files; a Microsoft engineer confirmed this as expected behaviour, not a bug. If you control the writer, target file sizes of 4 MB or larger and batch aggressively for initial loads. Very large numbers of small files can also cause OOM conditions in the mirroring backend - a Microsoft engineer acknowledged this as a known issue as of late 2024; verify current status before pushing millions of small files.
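To see how much file size matters, the billing rule above can be turned into a few lines of arithmetic. This is a sketch under stated assumptions: the function names are mine, 4 MB is taken as 4 × 1024 × 1024 bytes, and 128 KB stands in for a file in the 100–150 KB range.

```python
import math

BLOCK_BYTES = 4 * 1024 * 1024  # assumed size of one 4 MB billing block

def write_transactions(file_bytes):
    """One transaction per file under 4 MB; one per 4 MB block otherwise."""
    return max(1, math.ceil(file_bytes / BLOCK_BYTES))

def total_transactions(total_bytes, file_bytes):
    """Transactions needed to land total_bytes as files of file_bytes each."""
    files = math.ceil(total_bytes / file_bytes)
    return files * write_transactions(file_bytes)

small_file = 128 * 1024             # a 128 KB file
payload = 1_000_000 * small_file    # the data held in one million such files

as_small_files = total_transactions(payload, small_file)
as_4mb_files = total_transactions(payload, BLOCK_BYTES)

print(as_small_files)  # 1000000
print(as_4mb_files)    # 31250
```

The same payload costs 32 times fewer transactions when batched, and going beyond 4 MB per file yields no further saving under this rule, since larger files are billed per block.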


Mirroring is a current-state replica

This is the most important architectural property to understand before committing.

Mirroring produces a continuously updated copy of what the source looks like right now. That is a legitimate and valuable thing - operational reporting, query offload, Direct Lake integration all rest on it. But it is not a historical record, and it cannot be used as one.

Stop/restart reseeds from current state. If you stop mirroring and restart it - or delete and recreate the mirrored item - Fabric reseeds from the source's current state. You do not get a replay of historical changes; you get today's data. This behaviour is documented across all SQL-family sources and Cosmos DB. A practitioner on the community forums described stop-and-restart as "generally a bad idea, since the Parquet files that make up the table are no longer kept."

Deletes are applied to the mirror. When a row is deleted at the source, it is deleted from the Delta table in OneLake. It does not persist as a tombstone or soft-delete record. Delta time travel provides a short recovery window (1 day for mirrored databases created after mid-June 2025; 7 days for older ones by default; configurable via the portal or API) - but this is not a rebuild window for most organisations.

PostgreSQL TRUNCATE is not replicated. If the source table is truncated, the mirror retains the pre-truncation data. The mirror silently diverges from the source - it is not even a faithful current-state replica for PostgreSQL workloads that use TRUNCATE.

Cosmos DB TTL-deleted rows are not replicated. Rows deleted via Cosmos DB's TTL mechanism do not appear as deletes in the mirror. Those rows remain in the mirrored table indefinitely.
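Because both of these cases diverge silently, a periodic reconciliation check is worth considering. The sketch below assumes you can pull primary keys (or a sample of them) from both the source and the mirror by whatever means you already have; the function name and the key sets are hypothetical.

```python
def mirror_only_keys(source_keys, mirror_keys):
    """Keys still present in the mirror that no longer exist at the source."""
    return set(mirror_keys) - set(source_keys)

# Hypothetical scenario: the source table was truncated and reloaded with
# two rows, but the mirror still holds the three pre-truncation rows.
source = {101, 102}
mirror = {1, 2, 3, 101, 102}

stale = mirror_only_keys(source, mirror)
print(sorted(stale))  # [1, 2, 3]
```

A non-empty result on a PostgreSQL or Cosmos DB mirror is a signal to investigate TRUNCATE or TTL activity at the source before trusting the replica.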


Mirroring and the raw layer question

Whether mirroring is an appropriate implementation of a raw or bronze ingestion layer is one of the more consequential architectural questions it raises. The answer is not universal - it depends on what your raw layer is actually for.

The term "raw layer" - also called bronze, landing, or ingestion depending on organisational convention - describes different things in different teams. Three distinct definitions are worth separating before forming an opinion:

Definition 1 - Current-state replica. A continuously updated copy of the source's present state, queryable as a table. This is what mirroring provides. It faithfully reflects current source data and is available immediately via Spark, the SQL analytics endpoint, or Power BI Direct Lake.

Definition 2 - Change event stream. A sequence of all source changes (inserts, updates, deletes) in commit order, preserved indefinitely as a queryable or replayable record. CDC platforms - Debezium, Azure Data Factory CDC, event streaming pipelines - produce this. Mirroring uses CDC-equivalent mechanisms under the hood to keep the replica current, but it exposes only the resulting state, not the raw change events. The Delta change data feed — an opt-in paid extended capability, now billed at standard Fabric capacity rates — moves somewhat closer to this: it captures inserts, updates, and deletes incrementally and is available across all mirroring sources including Oracle, Snowflake, Azure SQL, and Open Mirroring. It is still a derived feature rather than the raw underlying change stream, but it is a meaningful step toward incremental change processing without a separate CDC pipeline.

Definition 3 - Immutable append-only store. A write-once, never-delete collection of extracted records, typically with extraction timestamps. Source deletes and updates do not remove historical records; they are appended as new versions alongside originals. This is what "bronze" means in recovery-oriented architectures - a layer from which any downstream state can be reconstructed at any point in time. Mirroring explicitly does not provide this: source deletes are applied to the mirror, and stop/restart reseeds from current state.

| Property | Current-state replica (mirroring provides this) | Change event stream | Immutable store |
|---|---|---|---|
| Data model | Upserted rows reflecting source now | Insert / update / delete events in commit order | Appended extracts; all versions retained |
| Mutability | Mutable: source deletes applied; reseeds replace all | Append-only: source deletes become tombstone event records | Immutable: source changes add new records; none deleted |
| Query model | Snapshot: today; Delta time travel gives ~1-7 day lookback only | Replayable: replay events to rebuild any historical state | Full history: rebuild any downstream state at any point |
Three interpretations of a raw layer. Current-state replica (blue, what mirroring delivers): mutable, snapshot-only. Change event stream (amber): append-only event log, replayable to any past state. Immutable store (green): no rows ever deleted, full history available for reconstruction. Mirroring's mutability is the key constraint that rules it out for recovery-oriented architectures.
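The difference between the first and third definitions is easiest to see when the same change events are applied under both sets of semantics. This is a minimal in-memory sketch; the event shape and function names are hypothetical and do not reflect how Fabric stores anything.

```python
def apply_to_replica(replica, event):
    """Current-state semantics: upsert on insert/update, remove on delete."""
    if event["op"] == "delete":
        replica.pop(event["key"], None)
    else:
        replica[event["key"]] = event["value"]

def append_to_store(store, event):
    """Immutable semantics: every event is kept, including deletes."""
    store.append(dict(event))

events = [
    {"op": "insert", "key": 1, "value": "draft"},
    {"op": "update", "key": 1, "value": "final"},
    {"op": "delete", "key": 1, "value": None},
]

replica, store = {}, []
for event in events:
    apply_to_replica(replica, event)
    append_to_store(store, event)

print(replica)     # {}
print(len(store))  # 3
```

After the delete, the replica has no trace that the row ever existed, while the append-only store can still answer what the row looked like at each step.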

When mirroring is sufficient as an ingestion layer

When mirroring is insufficient

The architectural response

These are not mutually exclusive scenarios. Using mirroring for current-state operational reporting does not prevent a separate, lightweight append-only capture layer from handling historical fidelity in parallel. A pipeline that appends extracted records to a separate OneLake path - without transformation, without schema enforcement, just faithful extraction - provides the raw archive that mirroring cannot. The two run independently: mirroring handles the real-time replica; the capture layer handles the immutable record. For many organisations, building this alongside mirroring is less work than it sounds, because a raw capture layer has no transformation requirements.

[Figure: the source database (operational system of record) feeds two paths that run independently into OneLake. Mirroring via the Fabric connector produces a mirror table (upserted, current state); a lightweight append-only extract pipeline produces a raw archive (append-only, full history).]
Mirroring and a raw capture layer are not mutually exclusive. The source feeds both paths independently - Fabric mirroring continuously replicates current state via the native connector; a lightweight append-only pipeline writes immutable extracted records alongside it. Both land in OneLake as separate tables with different characteristics: mirror table for current-state queries and Direct Lake; raw archive for historical reconstruction and reprocessing.
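As an indication of how little the capture side needs, here is a minimal sketch of an append-only writer. It is an assumption-laden illustration: a local JSON Lines file stands in for a OneLake path, `append_extract` and the `extracted_at` field are names I have made up, and a real implementation would more likely write Parquet.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def append_extract(path, records, extracted_at=None):
    """Append records to a JSON Lines file, stamping each with an extraction time."""
    stamp = (extracted_at or datetime.now(timezone.utc)).isoformat()
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps({"extracted_at": stamp, "record": record}) + "\n")
    return len(records)

# Two extracts of the same row: the second is appended, never overwritten.
path = os.path.join(tempfile.mkdtemp(), "orders.jsonl")
append_extract(path, [{"id": 1, "status": "open"}])
append_extract(path, [{"id": 1, "status": "closed"}])

with open(path, encoding="utf-8") as f:
    lines = f.readlines()
print(len(lines))  # 2
```

The file is only ever opened in append mode, so earlier versions of a row survive later extracts, which is the property the mirror cannot offer.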

The framing to avoid is "mirroring eliminates ingestion infrastructure." What it eliminates is orchestrated incremental pipeline infrastructure for current-state reporting. That is a meaningful and real reduction in complexity. It is not a replacement for a raw archive if your architecture requires one.

A note on disaster recovery

A mirrored database is a derived asset. It can be recreated from the source at any time by deleting and re-mirroring. This makes it a query offload layer, not a backup:

None of this is a flaw in mirroring. It is the correct framing of what the tool is. Architectures that treat the mirror as a backup are making a design error, not an engineering one.


What actually replicates - the data type story

Unsupported data types are the most common unexpected blocker when setting up mirroring. The critical point the documentation does not emphasise up front: unsupported types do not always block the table. Impact depends on where the unsupported type appears.

| Scenario | Impact |
|---|---|
| Unsupported type in a regular column | Column is silently excluded; the rest of the table mirrors |
| Unsupported type in a primary key or clustered index column | Entire table is blocked |
| Unsupported type used as a table feature (Always Encrypted, in-memory, etc.) | Entire table is blocked |
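The first two scenarios in the table can be expressed as a small pre-flight check over column metadata. This is a sketch only: the column dictionary shape and the function name are hypothetical, the set of unsupported types must come from the limitations page for your connector, and table-level features such as Always Encrypted are not modelled.

```python
def mirroring_impact(columns, unsupported_types):
    """Classify a table by where unsupported types appear.

    columns: dicts with 'name', 'type' and 'in_key' (primary key or clustered index).
    Returns ('blocked', key columns at fault) or ('mirrored', excluded columns).
    """
    bad = [c for c in columns if c["type"] in unsupported_types]
    key_blockers = [c["name"] for c in bad if c["in_key"]]
    if key_blockers:
        return "blocked", key_blockers
    return "mirrored", [c["name"] for c in bad]

# Illustrative type list only; check the documented list for your source.
unsupported = {"jsonb", "xml"}

orders = [
    {"name": "id", "type": "integer", "in_key": True},
    {"name": "payload", "type": "jsonb", "in_key": False},
]
documents = [
    {"name": "doc", "type": "xml", "in_key": True},
]

print(mirroring_impact(orders, unsupported))     # ('mirrored', ['payload'])
print(mirroring_impact(documents, unsupported))  # ('blocked', ['doc'])
```

Running a check like this against the source catalog before enabling mirroring turns silent column exclusions into an explicit list you can review with report owners.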

For SQL-family sources, the full blocklist is documented across the individual limitations pages (Azure SQL Database, SQL MI, SQL Server, PostgreSQL, MySQL). Key highlights:

json and vector columns block entire tables for Azure SQL Database. This is a table-level exclusion, not a column skip - the table cannot be mirrored at all. hierarchyid, datetime2(7) as a primary key, and datetimeoffset(7) as a primary key have the same effect.

PostgreSQL silently excludes a long list of column types, including all geometric types, all network address types, all range types, json, jsonb, xml, and interval. None block the table - they disappear from the replica without warning. The PostgreSQL limitations page documents the full list.

Oracle uses an allowlist, not a blocklist. Only fifteen specific types are supported. Everything else - including CLOB, BLOB, XMLTYPE, and SDO_GEOMETRY - is implicitly excluded.

Oracle NUMBER without explicit precision or scale causes hard failures. The common Oracle pattern col NUMBER rather than col NUMBER(10,2) causes the error Invalid Decimal Precision or Scale. Precision: 38, Scale: 127, blocking the entire table. There is no workaround within mirroring - it does not support custom SELECT queries that could add an explicit cast. Changing a production Oracle column type is typically not feasible, meaning affected tables must be handled outside mirroring via pipelines or copy jobs.
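It is worth finding affected columns before setup rather than after the first failure. The sketch below filters rows shaped like Oracle's ALL_TAB_COLUMNS view, where a bare NUMBER column has both DATA_PRECISION and DATA_SCALE null. How you fetch those rows is up to you; the function name and the sample rows are hypothetical.

```python
def unbounded_number_columns(column_rows):
    """Return (table, column) pairs declared as NUMBER with no precision or scale.

    column_rows: dicts shaped like rows from ALL_TAB_COLUMNS, with the keys
    TABLE_NAME, COLUMN_NAME, DATA_TYPE, DATA_PRECISION and DATA_SCALE.
    """
    return [
        (r["TABLE_NAME"], r["COLUMN_NAME"])
        for r in column_rows
        if r["DATA_TYPE"] == "NUMBER"
        and r["DATA_PRECISION"] is None
        and r["DATA_SCALE"] is None
    ]

# Hypothetical catalog rows for illustration.
rows = [
    {"TABLE_NAME": "ORDERS", "COLUMN_NAME": "AMOUNT", "DATA_TYPE": "NUMBER",
     "DATA_PRECISION": None, "DATA_SCALE": None},   # declared as NUMBER
    {"TABLE_NAME": "ORDERS", "COLUMN_NAME": "TAX", "DATA_TYPE": "NUMBER",
     "DATA_PRECISION": 10, "DATA_SCALE": 2},        # declared as NUMBER(10,2)
    {"TABLE_NAME": "ORDERS", "COLUMN_NAME": "NOTE", "DATA_TYPE": "VARCHAR2",
     "DATA_PRECISION": None, "DATA_SCALE": None},
]

at_risk = unbounded_number_columns(rows)
print(at_risk)  # [('ORDERS', 'AMOUNT')]
```

Any table that appears in the result is a candidate for the pipeline or copy job path rather than mirroring.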

Columns with spaces or special characters are supported via Delta column mapping. Previously a replication blocker, these column names are now handled through Delta's column mapping feature and replicate correctly.

Source schema hierarchy is preserved. The source database's schema structure (e.g. dbo, sales, hr) is maintained in the mirrored database and is consistent across the SQL analytics endpoint, Spark, and semantic models.

DDL changes behave differently across sources. Azure SQL picks up ADD COLUMN automatically. PostgreSQL requires a stop-and-restart of replication for any schema change - and stop/restart reseeds from current state (see the raw layer section above). MySQL replication is disrupted by DDL changes. SQL MI does not support ALTER COLUMN or RENAME COLUMN while a table is being mirrored; those operations are blocked outright. Check the limitations page for your specific source before planning any schema migration on a mirrored table.


Operational limits and gotchas

Use a service principal or shared service account - never an individual. A mirrored database is permanently tied to the user who created it. There is no ownership transfer mechanism. If the owner leaves the organisation, the item must be deleted and recreated - which means a reseed from current state. The connection is tied to the owner's Entra ID authentication: if their token expires due to MFA re-prompt, device compliance failure, or account deactivation, replication stops. A Microsoft MVP documented (Jul 2024) that workspace and tenant admins cannot access the connection unless the original owner has shared it first. Create all mirrored database items using a service principal or a shared service account, and share the connection with an admin group immediately after creation.

Note: workspace managed identity cannot create or own mirrored database items per the workspace identity documentation.

Private link support is partial - and most connectors are blocked. When Fabric's "Block Public Internet Access" tenant setting is enabled, most database mirroring connectors are unsupported: active mirrored databases enter a paused state and mirroring cannot be started.

| Status | Connectors |
|---|---|
| Supported | Open Mirroring, Azure Cosmos DB, Azure SQL Managed Instance, SQL Server 2025 |
| Blocked | Azure SQL Database, PostgreSQL, MySQL, Oracle, Snowflake, BigQuery, SAP |

On-premises data gateways (required for Oracle, and for SQL Server in many on-premises deployments) fail to register when private link is enabled; VNet data gateways work as a substitute but require separate provisioning. If private link is enabled or planned, verify connector support before building a mirroring architecture around a connector that will not function in that network configuration.

Gateway support for sources behind firewalls (distinct from the Block Internet scenario). Separately from the private link constraint above, Azure SQL Database and Snowflake now support replication through On-Premises Data Gateway and VNet Data Gateway for sources that are behind a firewall but where the tenant-level "Block Public Internet Access" setting is not enabled. This expands gateway support beyond Oracle and SQL Server, which were previously the only connectors that used a gateway path. Azure SQL Managed Instance gateway support has since shipped — VNet data gateway or on-premises data gateway can be used when the instance is not publicly accessible, per the SQL MI FAQ.

CDC requirements vary significantly by source.

| Source | Mechanism used | Source-side requirement |
|---|---|---|
| Azure SQL Database | SQL Change Feed | None; no CDC required |
| Azure SQL MI | SQL Change Feed (newer) / CDC (SQL 2022 policy) | Depends on update policy |
| SQL Server 2016–2022 | CDC | CDC must be enabled on the source database |
| SQL Server 2025 | Change Event Stream | CDC must not be enabled; incompatible |
| PostgreSQL | Logical replication | wal_level = logical required |
| MySQL | Binary log replication | Binary logging required |
| Oracle | LogMiner | Archive log mode and supplemental logging required |
| Snowflake | Snowflake Streams | Change tracking on tables |
| Cosmos DB | Cosmos DB change feed | Built-in; no configuration required |
| BigQuery | BigQuery CDC (Storage Write API) | CDC must be enabled per table; Google charges for CDC compute and Storage Write API |
| SAP | SAP Datasphere change capture | Handled by SAP Datasphere; no direct source-side config; Datasphere must be separately licensed |

The SQL Server 2016–2022 CDC requirement carries the most operational risk. CDC writes captured changes to dedicated tables in the source database, consumes transaction log space, and requires active management to prevent log fill. On a busy SQL Server, DBAs often resist enabling it. If the transaction log fills and is truncated before mirroring can process it, the mirror reseeds from current state rather than resuming incrementally. Azure SQL Database and SQL Server 2025 use newer architectures (Change Feed and Change Event Stream respectively) that carry meaningfully lower source-side overhead. Where there is a choice of source version, the newer architecture is worth preferring.
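The SQL Server rows are easy to get backwards, because the CDC requirement inverts between versions. The sketch below encodes just that rule as a pre-flight check. The function name and return strings are hypothetical, and it assumes you already know the server's release year and whether CDC is enabled on the database.

```python
def sql_server_cdc_precheck(version_year: int, cdc_enabled: bool) -> str:
    """Check the CDC state of a SQL Server source against the rules in the table above.

    version_year is the SQL Server release year (2016, 2017, 2019, 2022, 2025).
    """
    if 2016 <= version_year <= 2022:
        # Older versions mirror via CDC, so it must be on.
        return "ok" if cdc_enabled else "enable CDC on the source database"
    if version_year == 2025:
        # SQL Server 2025 uses Change Event Stream, which is incompatible with CDC.
        return "disable CDC before mirroring" if cdc_enabled else "ok"
    return "version not covered by this post"


if __name__ == "__main__":
    for year, cdc in [(2019, False), (2022, True), (2025, True), (2025, False)]:
        print(year, cdc, "->", sql_server_cdc_precheck(year, cdc))
```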

Availability group secondaries are not supported for SQL Server. Fabric Mirroring for SQL Server requires connection to the primary of an Always On availability group. Connecting to a secondary - even a designated readable secondary - is not supported. Organisations that deliberately route external connections to the secondary for load isolation will find this a hard blocker.

Deletion vectors break some Python Delta readers. When rows are deleted from a source and replicated to Fabric, the Delta table uses deletion vectors - marking deleted rows without rewriting Parquet files. Spark handles this correctly, as does DuckDB 1.2 and above. Polars (via delta-rs) does not: it raises DeltaProtocolError: The table has set these reader features: {'deletionVectors'} but these are not yet supported. This was confirmed by community reports and acknowledged by Microsoft engineers in May 2025. Reads succeed until the first delete is replicated, then fail. Use Spark or DuckDB for Delta reads against mirrored tables.
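Because reads only start failing after the first replicated delete, it can help to check whether a table has already switched on the feature before pointing a reader at it. The sketch below inspects the JSON commit files in a table's `_delta_log` folder for the most recent `protocol` action and its `readerFeatures` list, which is where the Delta protocol records this. It assumes the log folder is reachable as a local path (the path in the usage example is hypothetical), and it ignores Parquet checkpoints, so on a table whose early JSON commits have been cleaned up it may miss the protocol entry.

```python
import json
from pathlib import Path
from typing import Iterable


def latest_reader_features(commit_lines: Iterable[str]) -> set[str]:
    """Return the readerFeatures of the last protocol action seen in the log lines."""
    features: set[str] = set()
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)  # each log line is one JSON action
        protocol = action.get("protocol")
        if protocol is not None:
            # A later protocol action replaces an earlier one.
            features = set(protocol.get("readerFeatures") or [])
    return features


def table_requires_deletion_vectors(delta_log_dir: Path) -> bool:
    """Scan JSON commits in version order and report whether deletionVectors is required."""
    lines: list[str] = []
    for commit in sorted(delta_log_dir.glob("*.json")):
        lines.extend(commit.read_text(encoding="utf-8").splitlines())
    return "deletionVectors" in latest_reader_features(lines)


if __name__ == "__main__":
    log_dir = Path("Tables/dbo/orders/_delta_log")  # hypothetical local path
    if log_dir.exists():
        print(table_requires_deletion_vectors(log_dir))
```

A `True` result means a reader without deletion vector support should be expected to fail on that table.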

Reseed behaviour and automatic recovery. If Fabric capacity is paused for an extended period and the source database's transaction log is truncated before mirroring can resume, a reseed from current source state is triggered. For Azure SQL Database and Azure SQL Managed Instance, automatic reseed is enabled by default - mirroring reinitialises automatically rather than staying broken. SQL Server 2025 supports it but it is disabled by default. Two reseed triggers exist: table-level (DDL changes, truncate, rename) and database-level (transaction log exceeds a configured threshold). In all cases, reseed reinitialises from current source state - the improvement is operational resilience, not historical fidelity.

Structured replication logs via Workspace Monitoring. Enable Workspace Monitoring and replication events are written to an Eventhouse KQL database automatically. The MirroredDatabaseTableExecution table records ProcessedRows, ProcessedBytes, ReplicatorBatchLatency, OperationStartTime, OperationEndTime, and ErrorMessage per operation - enough to query replication history, measure lag over time, and surface failures. Workspace Monitoring is not enabled by default and is charged at standard Eventhouse rates.
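Once those rows are exported from the Eventhouse (for example as a list of dictionaries), a few lines of Python are enough for a first health summary. This is a sketch under assumptions: the keys match the column names listed above, an empty or missing `ErrorMessage` is taken to mean success, and `ReplicatorBatchLatency` is treated as a plain number whose unit should be confirmed against the documentation.

```python
from typing import Any


def summarise_replication(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarise exported MirroredDatabaseTableExecution rows.

    Assumes each row is a dict keyed by the column names named in this post.
    """
    # Assumption: a non-empty ErrorMessage marks a failed operation.
    failed = [r for r in rows if r.get("ErrorMessage")]
    latencies = [
        r["ReplicatorBatchLatency"]
        for r in rows
        if r.get("ReplicatorBatchLatency") is not None
    ]
    return {
        "operations": len(rows),
        "failed": len(failed),
        "processed_rows": sum(r.get("ProcessedRows") or 0 for r in rows),
        "max_batch_latency": max(latencies, default=None),
    }


if __name__ == "__main__":
    sample = [
        {"ProcessedRows": 120, "ReplicatorBatchLatency": 4, "ErrorMessage": ""},
        {"ProcessedRows": 0, "ReplicatorBatchLatency": 31, "ErrorMessage": "timeout"},
    ]
    print(summarise_replication(sample))
```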

Minor configuration limits. Maximum table count per mirrored database is 1,000 - sources above this threshold must be split across multiple items. Every mirrored database automatically creates a paired SQL analytics endpoint that cannot be deleted or disabled independently. For MySQL, only one database per server instance can be mirrored, and tables cannot be added or removed after initial setup.
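For sources above the table limit, the split across items can be planned up front. The sketch below simply chunks a sorted table list into groups of at most 1,000. The limit value comes from this post; the function name is hypothetical, and a real split would more likely follow schema or subject-area boundaries than alphabetical order.

```python
MAX_TABLES_PER_MIRRORED_DB = 1000  # limit as stated in this post


def split_tables(tables: list[str], limit: int = MAX_TABLES_PER_MIRRORED_DB) -> list[list[str]]:
    """Split table names into groups, one group per mirrored database item."""
    ordered = sorted(tables)  # deterministic assignment across runs
    return [ordered[i:i + limit] for i in range(0, len(ordered), limit)]


if __name__ == "__main__":
    names = [f"dbo.table_{n:04d}" for n in range(2500)]
    groups = split_tables(names)
    print([len(g) for g in groups])
```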


Security and governance

Row-level security (RLS), column-level security (CLS) or object-level security (OLS), and dynamic data masking configured in the source system are not propagated to the mirrored database. This is documented for all SQL-family sources and applies universally. A user restricted from seeing certain rows or columns in the source will have no such restrictions applied when querying the mirrored replica via the SQL analytics endpoint or Spark, unless those controls are manually re-implemented at the Fabric layer.

This is an architectural consequence of the data moving to a different system, not a mirroring bug. It is a significant governance consideration for organisations with row-level security in source databases. Any security model that relies on source-database RLS must be rebuilt entirely in Fabric.

The documentation on RLS behaviour at the SQL analytics endpoint, cross-tenant sharing scenarios, and the interaction between workspace permissions and item-level permissions is thin. Before deploying mirroring for data with regulatory or contractual access controls, test the security model explicitly - do not infer from the source database's configuration.


Choosing your path

The evaluation is two-dimensional. Your source constrains what is technically possible; your requirements determine whether what is possible is actually suitable. Most practitioners inherit their source - they do not choose it. Start there.

[Figure: two-phase decision flowchart. Phase 1 (Source Assessment): Has native connector? then Hard blockers? then Config approved?, with exits routing to Open Mirroring viable?. Phase 2 (Requirements Fit): All requirements met? (steps 5–8: current-state only, no replay, no RLS, no SLA guarantee). Outcomes: "Mirroring is a fit" (native or open mirroring depending on Phase 1 path), "Traditional pipeline or hybrid approach", or "Traditional pipeline only".]
Follow the blue path down Phase 1: if your source has a native connector and clears blockers and config approval, proceed to Phase 2. Any "No" exit routes via the amber bus to the Open Mirroring viability check (purple diamond). If Open Mirroring is also not viable, the outcome is traditional pipeline only. Phase 2 requirements — current-state reporting, no replay, no RLS auto-propagation, no SLA — apply equally to both paths.
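The same flow can be written down as a small function, which is sometimes easier to reason about than the diagram. This is a sketch of the decision logic as the caption describes it; the class, field names, and outcome strings are hypothetical, and the single `requirements_met` flag stands in for the answers to steps 5–8.

```python
from dataclasses import dataclass


@dataclass
class SourceAssessment:
    """Phase 1 answers for one source (hypothetical field names)."""
    has_native_connector: bool
    has_hard_blockers: bool
    config_approved: bool
    open_mirroring_viable: bool


def choose_path(source: SourceAssessment, requirements_met: bool) -> str:
    """Walk Phase 1, then Phase 2, and return the resulting outcome."""
    native_ok = (
        source.has_native_connector
        and not source.has_hard_blockers
        and source.config_approved
    )
    # Any "No" exit in Phase 1 falls through to the Open Mirroring check.
    if not native_ok and not source.open_mirroring_viable:
        return "traditional pipeline only"
    # Phase 2 applies equally to the native and Open Mirroring paths.
    if not requirements_met:
        return "traditional pipeline or hybrid approach"
    return "native mirroring" if native_ok else "open mirroring"


if __name__ == "__main__":
    legacy = SourceAssessment(True, False, False, True)  # config refused, OM viable
    print(choose_path(legacy, requirements_met=True))
```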

Phase 1: Source assessment

1. Does your source have a native connector, and how mature is it?

Connector tier | Sources | Guidance
Mature | Azure SQL Database, Azure SQL MI, SQL Server, PostgreSQL | Well-documented, production-validated; proceed to step 2
Established with known limits | MySQL, Cosmos DB, Snowflake | Documented with specific constraints; review appendix before proceeding
Sparse documentation | Oracle, BigQuery, SAP, Dremio | Significant gaps in published behaviour; treat - cells in appendix as unknowns requiring direct testing; budget validation time before committing
No native connector | Anything else | Jump to Phase 1, step 4

2. Are there hard blockers for your specific source?

Check these before going further. Any one of them may end the native mirroring evaluation:

3. Can you get source-side configuration approved?

Configuration requirements vary significantly by source and typically require DBA or infrastructure team approval:

Source | Requirement | Overhead
Azure SQL Database | None - Change Feed is built-in | Zero
SQL Server 2025 | None - Change Event Stream is built-in | Zero
Cosmos DB | None - change feed is built-in | Zero
SQL Server 2016–2022 | CDC must be enabled on the database | Moderate - log space, active management
PostgreSQL | wal_level = logical required | Low - server restart may be required
MySQL | Binary logging required | Low
Oracle | Archive log mode + supplemental logging + LogMiner access | High - significant DBA engagement typically required
BigQuery | CDC must be enabled per table | Low - but Google charges for CDC compute and Storage Write API
SAP | Handled by SAP Datasphere | High - Datasphere must be separately licensed and configured

If approval is uncertain or refused, native mirroring is blocked regardless of requirements. Go to step 4.

4. If no native connector, or source-side config not approvable: can you use Open Mirroring?


Phase 2: Requirements fit

Only reach this phase if Phase 1 confirmed a viable source. These questions determine whether what is technically available is actually suitable.

5. What is the reporting use case?

6. Do you need a replayable raw layer?

7. Do you need source-level RLS or column security to propagate automatically?

8. Do you need a guaranteed, SLA-backed latency?


Summary comparison

Criterion | Native mirroring | Open Mirroring | Traditional pipeline
Native connector required | Yes | No | No
Source-side config required | Yes (varies by source) | No | Depends on approach
Near-real-time freshness | Strong fit | Strong fit | Harder to achieve
Source-to-semantic, no transforms | Ideal | Viable | Overkill
Transform layers (clean, join, aggregate) | Ingestion only; transforms extra | Ingestion only; transforms extra | Full stack
Replayable raw layer | Not provided | Not provided | Built-in
Immutable append-only history | Not provided | Not provided | Built-in if designed for it
Source RLS / column security propagated | No | No | Depends
Private link (Block Internet enabled) | Most connectors unsupported | Supported | Supported
Guaranteed latency SLA | No | No | Yes (schedulable)
Engineering overhead | Low | Medium | High
Ingestion cost | Free | Free (write transactions charged) | Consumes capacity

Verdict

Mirroring is one of the more genuinely compelling things in the Fabric toolkit - continuous ingestion at near-zero cost is a meaningful offer, and for a large class of operational reporting use cases it works exactly as described. The qualification is precision: it solves a specific problem, and the more clearly that problem is defined before committing, the less likely its limits are to surface in production.

Where it fits well:

Where it does not fit:

On the raw layer question. Whether mirroring is an appropriate "bronze" or raw layer comes down to what that layer needs to guarantee. If it needs to be a current-state replica - a query target for operational reporting - mirroring is an excellent implementation of it. If it needs to be an immutable historical store, a replayable event stream, or a disaster recovery asset, mirroring is not a substitute. These two roles are not in competition; many production architectures will benefit from both running in parallel, with mirroring handling the real-time operational layer and a separate lightweight capture handling the historical record.

Documentation maturity as a risk signal. The - cells in the appendix matrix are not editorial omissions - they are behaviours Microsoft has not published documentation for. If the vendor of a feature cannot tell you whether TRUNCATE is replicated, or how a DDL change is handled, you will find out in production. The connectors with the most - entries - BigQuery, SAP, Oracle, Dremio - are also generally the less mature ones. Factor testing time into your evaluation, and do not assume that absence of a documented limitation means the limitation does not exist.

What to watch. The extended capabilities (Delta change data feed, view mirroring) are now billed at standard Fabric capacity rates; billing was enabled as of March 2026. These are opt-in features: charges apply only when they are enabled, and only for actual processing activity. If you built on these capabilities when they were free, factor the ongoing cost into your capacity budget. The newer change-capture architectures (Change Feed on Azure SQL Database and SQL MI, Change Event Stream on SQL Server 2025) are meaningfully better than CDC-based mirroring for older SQL Server versions - prefer the newer architecture where there is a choice of source version.


Appendix: Capability limitations by source

The tables below cover feature and capability support across all ten mirroring sources. They are compiled from the official Microsoft documentation limitations pages for each source, linked in the "Limitations page" column of the column key.

A significant number of cells are marked -. These are not gaps in my research - in every case I checked the official Microsoft documentation and found no published guidance. For several sources, including BigQuery, SAP, and Oracle, fundamental operational questions go unanswered: does the connector handle DDL changes? Is TRUNCATE replicated? Can you mirror partitioned tables? Microsoft simply has not documented the answers. The volume of - entries is itself a signal: the less a connector is documented, the less mature it should be assumed to be. For any - cell that matters to your use case, treat it as something to test before you rely on it in production - you will not find the answer in the docs.

Column key

Abbrev Source Limitations page
SQL DB Azure SQL Database link
SQL MI Azure SQL Managed Instance link
SQL Svr SQL Server (on-premises) link
PG Azure Database for PostgreSQL link
MySQL Azure Database for MySQL link
Oracle Oracle (via On-Premises Data Gateway) link
SF Snowflake link
CDB Azure Cosmos DB link
BQ Google BigQuery link
SAP SAP (via SAP Datasphere + ADLS Gen2) link

Legend

Symbol Meaning
No (TABLE) Not supported; table is entirely blocked from mirroring
No Not supported; feature does not apply or is explicitly excluded
Supported
N/A Concept does not exist in this source
- Microsoft has not published documentation for this combination; behaviour is unknown without direct testing

Table and feature support

Limitation | SQL DB | SQL MI | SQL Svr | PG | MySQL | Oracle | SF | CDB | BQ | SAP
Primary key required? | No [a] | No [b] | No (2025) / Yes (2016–22) | No | Yes | No [c] | No | N/A | No | -
Max tables per mirrored DB | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | N/A | N/A | N/A | N/A
Views replicated? | No | No | No | No | No | No | No [d] | N/A | - | -
Partitioned tables | - | - | - | No (TABLE) | - | Yes | - | N/A | - | -
Materialized views | No | No | No | No (TABLE) | N/A | - | - | N/A | - | -
External tables | No | No | No | No (TABLE) | N/A | N/A | No | N/A | - | N/A
In-memory OLTP tables | No (TABLE) | No (TABLE) | No (TABLE) | N/A | N/A | N/A | N/A | N/A | N/A | N/A
Always Encrypted tables | No (TABLE) | No (TABLE) | No (TABLE) | N/A | N/A | N/A | N/A | N/A | N/A | N/A
Temporal history tables | No (TABLE) | No (TABLE) | No (TABLE) | N/A | N/A | N/A | N/A | N/A | N/A | N/A
Graph tables | No (TABLE) | No (TABLE) | No (TABLE) | N/A | N/A | N/A | N/A | N/A | N/A | N/A
Clustered columnstore index | No (TABLE) | No (TABLE) | No (TABLE) | N/A | N/A | N/A | N/A | N/A | N/A | N/A

DDL and schema change behaviour

Operation SQL DB SQL MI SQL Svr PG MySQL Oracle SF CDB BQ SAP
Add column Auto Auto Auto (2025) / Error (2016–22) Stop/restart Disrupts replication ✓ (partial) Delayed - -
Rename column - Blocked - Stop/restart Disrupts replication - ✓ (old + new col retained) - -
Change column type - Blocked - Stop/restart Disrupts replication Blocked - Compatible types only - -
Alter / add primary key Blocked Blocked Blocked - - N/A - N/A N/A N/A
TRUNCATE replicated? - - - No - - - N/A - N/A

Security and connectivity

Limitation SQL DB SQL MI SQL Svr PG MySQL Oracle SF CDB BQ SAP
RLS propagated from source No No No No No N/A N/A N/A N/A N/A
Column / OLS propagated No No No No N/A N/A N/A N/A N/A N/A
Dynamic data masking propagated No No No N/A N/A N/A N/A N/A N/A N/A
On-Premises Data Gateway required No (optional) Depends [e] Always No (optional) No (optional) Always No (optional) No [f] Always N/A [g]
Source-side costs during mirroring No No No No No No Yes (compute) No [h] Yes (CDC compute) Yes (Datasphere pricing)
Burstable compute tier supported N/A No No N/A N/A N/A N/A N/A

Item and configuration limits

Limitation SQL DB SQL MI SQL Svr PG MySQL Oracle SF CDB BQ SAP
Multiple databases per connection No [i] N/A
Can mirror to multiple workspaces No No No No N/A - - - -
Ownership change supported No No No No No No No No No No
Can change source after setup No No No No No No No No No No

Notes

[a] No primary key required as of April 2025. Existing keyless tables that were excluded before this date must be manually re-added to the mirror.

[b] No primary key required as of May 2025. Same re-add caveat applies.

[c] Oracle supports tables without a primary key if a unique index exists. Tables with neither a primary key nor a unique index cannot be mirrored.

[d] View mirroring is available for Snowflake only, as a paid extended capability. Billing for extended capabilities was enabled as of March 2026 (see "What to watch" in the Verdict section); the feature must be enabled via API as the UX toggle is temporarily unavailable.

[e] SQL MI on the SQL Server 2022 update policy requires a data gateway. Always-up-to-date and 2025 update policies do not.

[f] Cosmos DB uses Network ACL Bypass, so no gateway is required even for accounts on VNets or private endpoints.

[g] SAP mirroring routes data through SAP Datasphere into ADLS Gen2. Fabric connects to the ADLS Gen2 container, not to SAP directly. The gateway question does not apply in the same sense.

[h] Cosmos DB Data Explorer queries initiated from within the Fabric experience consume RUs from the source Cosmos DB account.

[i] Only one MySQL database per server instance can be mirrored. Tables cannot be added or removed after the initial mirror configuration is set up.


All factual claims in this post are linked to their source. Claims sourced from community reports are identified as such and may not reflect the current state of the product.


Brad Coles is a Senior Consultant and Data Engineering Capability Lead at Synechron Australia, specialising in Microsoft Fabric and modern data platform engineering. Connect on LinkedIn.