Microsoft Integrates Cosmos DB into Fabric for Smarter Data Analysis
Microsoft has expanded its Fabric data platform with support for Azure Cosmos DB. Users can now analyze Cosmos DB data at scale inside Fabric, which matters for enterprise AI projects that depend on large volumes of operational data.
Microsoft announced the support at Build 2025 and released it as a public preview. Cosmos DB now joins Fabric's list of operational data sources. Fabric is changing how we work with data: it handles not just traditional relational data but also structured and unstructured data stored in data lakes, including data from NoSQL databases. If data can be stored in Fabric, it can be analyzed there.
The integration aims to give organizations a single, powerful tool for large-scale data analysis. Fabric's data agents draw on data from many sources, which makes it easier to build AI models, and having more operational data to work with helps improve the accuracy and tuning of those models.
Bringing Cosmos DB into Fabric’s Data Ecosystem
With Cosmos DB added, all of Microsoft's major big data sources are now part of Fabric, so users can build queries that span multiple tables and formats. That helps in complex data projects, particularly ones that use the different APIs Cosmos DB offers: you can manage complex data types and still analyze them with familiar tools such as Python notebooks.
Note that this isn't the same as embedding DocumentDB, the standalone Cosmos DB variant, into Fabric. Instead, Microsoft is offering a new way to use the full Cosmos DB service within Fabric, so you keep Cosmos DB's scalability and high-availability features while working in Fabric's lakehouse environment. The goal is to support analysis of semi-structured data from both existing Cosmos DB applications and future projects.
Enhanced Search with Vector Indexing
Cosmos DB data hosted in Fabric can also use vector indexing, which stores documents alongside their vector representations (embeddings) so similarity searches run faster. You can choose an indexing technique to suit your workload, from simple flat indexes, which compare a query against every stored vector, to advanced options like DiskANN, an approximate nearest-neighbor index optimized for large datasets.
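To make the distinction concrete, here is a minimal sketch of what a flat index does conceptually: an exact nearest-neighbor search that scores the query against every stored vector using cosine similarity. It is plain Python, not Cosmos DB's implementation, and the document IDs and toy 3-dimensional embeddings are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def flat_search(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbor search: score every stored vector, keep the top k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical documents with toy 3-dimensional embeddings.
docs = {
    "review-1": [0.9, 0.1, 0.0],
    "review-2": [0.8, 0.2, 0.1],
    "review-3": [0.0, 0.1, 0.9],
}

top = flat_search([1.0, 0.0, 0.0], docs)
print([doc_id for doc_id, _ in top])  # ['review-1', 'review-2']
```

The cost of this full scan grows linearly with the number of stored vectors, which is why approximate indexes such as DiskANN are the better fit for very large containers.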
With vector indexes, searches can match on similarity rather than only on exact values, much as modern search engines do. This is especially useful for finding related items, such as reviews or images, in large datasets. To use it, you set a vector policy on each Cosmos DB container that defines details such as the vector's data type and dimensions and the distance function used to compare vectors.
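The vector policy and the vector index are declared separately. The sketch below shows both as Python dictionaries in the JSON shape used by Cosmos DB for NoSQL (`vectorEmbeddings` in the container's vector embedding policy, `vectorIndexes` in its indexing policy). The path `/embedding` and the 1,536 dimensions are assumptions; set them to match your embedding model. The `check_policies` helper is hypothetical and only illustrates how the two policies relate. Recent versions of the `azure-cosmos` Python SDK accept dictionaries like these when creating a container, but confirm the parameter names against the current SDK reference and the Fabric documentation before relying on them.

```python
import json

# Hypothetical path and dimension count; match them to your embedding model.
EMBEDDING_PATH = "/embedding"
DIMENSIONS = 1536

# Vector embedding policy: describes the vectors stored at a document path.
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": EMBEDDING_PATH,
            "dataType": "float32",
            "distanceFunction": "cosine",  # alternatives: "dotproduct", "euclidean"
            "dimensions": DIMENSIONS,
        }
    ]
}

# Indexing policy: picks the vector index type for that path.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    # Keep the raw vector out of the regular (non-vector) index.
    "excludedPaths": [{"path": EMBEDDING_PATH + "/*"}],
    "vectorIndexes": [
        {"path": EMBEDDING_PATH, "type": "diskANN"}  # or "flat", "quantizedFlat"
    ],
}


def check_policies(embedding_policy: dict, index_policy: dict) -> None:
    """Hypothetical sanity check: every vector index must target a declared embedding path."""
    declared = {e["path"] for e in embedding_policy["vectorEmbeddings"]}
    for index in index_policy["vectorIndexes"]:
        if index["path"] not in declared:
            raise ValueError(f"no vector embedding declared for {index['path']}")


check_policies(vector_embedding_policy, indexing_policy)
print(json.dumps(vector_embedding_policy, indent=2))
```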
Querying Cosmos DB Data in Fabric
When you query Cosmos DB data through Fabric's OneLake, you're working with a mirrored copy stored in Delta Parquet format. That lets you use a range of query tools, including Power BI, for ad hoc analysis, and run queries across all your operational data, not just Cosmos DB, for a unified view.
This setup also makes it easy to add embeddings and vector indexes to your Cosmos DB data, which AI applications can then use, especially retrieval-augmented generation (RAG) systems. You manage your Cosmos DB databases through the Fabric portal, where you can create new databases or mirror existing ones.
Even if you mainly use Cosmos DB for JSON documents, you can query the data with SQL within Fabric. You can also connect multiple databases and enable cross-database joins, including between SQL and NoSQL sources.
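A short standalone snippet can't call Fabric's SQL endpoint, so the sketch below uses Python's built-in `sqlite3` module as a stand-in to illustrate a cross-source join: JSON documents, like Cosmos DB items, are parsed into rows and joined with a relational table. The table names, fields, and data are hypothetical, and the parsing step is illustrative rather than a description of how Fabric mirrors documents. In Fabric itself you would write the equivalent query in T-SQL against the SQL analytics endpoint.

```python
import json
import sqlite3

# Hypothetical Cosmos DB-style JSON documents (orders).
orders_json = [
    '{"id": "o1", "customerId": 1, "total": 40.0}',
    '{"id": "o2", "customerId": 2, "total": 15.5}',
    '{"id": "o3", "customerId": 1, "total": 9.5}',
]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Relational side: a customers table, as you might find in a SQL database.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")])

# Document side: parse each JSON document into a row.
cur.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id INTEGER, total REAL)")
for raw in orders_json:
    doc = json.loads(raw)
    cur.execute("INSERT INTO orders VALUES (?, ?, ?)", (doc["id"], doc["customerId"], doc["total"]))

# Join the two sources and total the orders per customer.
rows = cur.execute(
    """
    SELECT c.name, SUM(o.total) AS spend
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY spend DESC
    """
).fetchall()

print(rows)  # [('Contoso', 49.5), ('Fabrikam', 15.5)]
conn.close()
```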
Using Cosmos DB in Lakehouses
One of Fabric’s most powerful features is its ability to create lakehouses. These act as a bridge between data lakes and data warehouses. A lakehouse provides a single SQL endpoint, making it easier to analyze diverse data types stored in the lake.
You can connect your Cosmos DB data to a Fabric lakehouse, either a new one or an existing one. The data is stored in Delta format, so you can run SQL queries against it and visualize the results with built-in tools such as notebooks. Because Cosmos DB supports in-database processing, you can also use Fabric's Git integration to link your databases to Azure DevOps or GitHub, which helps with collaboration and deployment.
This setup supports modern development workflows, including CI/CD pipelines, making it easy to test and deploy updates quickly.
Cost and Resource Considerations
Moving to Fabric's capacity model changes how you pay for Cosmos DB. Instead of the traditional request unit (RU) model, you pay in Fabric capacity units (CU) per hour. Microsoft provides a conversion table to help estimate costs; 100 RU/s is roughly equal to 0.067 CU/hr. If you use autoscaling, set a maximum throughput limit to keep costs under control.
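The conversion itself is simple arithmetic. The minimal sketch below estimates capacity units from provisioned RU/s and shows how an autoscale ceiling bounds the estimate. It assumes the single ratio quoted above scales linearly across throughput levels, which you should confirm against Microsoft's conversion table; the function names are hypothetical.

```python
# Ratio quoted in the article: 100 RU/s is roughly 0.067 CU/hr.
# Assumption: the ratio scales linearly; check Microsoft's conversion table.
CU_PER_100_RUS = 0.067


def estimate_cu(rus: float) -> float:
    """Approximate Fabric capacity units for a given throughput in RU/s."""
    return rus / 100 * CU_PER_100_RUS


def capped_autoscale_cu(peak_rus: float, max_rus: float) -> float:
    """CU estimate when autoscale can't scale past a configured maximum RU/s."""
    return estimate_cu(min(peak_rus, max_rus))


print(round(estimate_cu(1_000), 3))  # 0.67
print(round(capped_autoscale_cu(50_000, 10_000), 3))  # 6.7
```

The cap is the point of setting an autoscale limit: however high demand peaks, the estimate (and the bill) can't exceed what the maximum RU/s converts to.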
Managing these resources may require the Fabric SDK, especially if you want finer control over scaling and costs.
Bringing Cosmos DB into Fabric is a significant upgrade that combines the strengths of both platforms. It makes large-scale, complex data analytics easier and more flexible, paving the way for more advanced AI applications on Microsoft's cloud.
The move to unify these tools reflects Microsoft's commitment to a comprehensive, scalable data platform. For organizations working with big data and AI, the integration offers more options and better performance, all within a familiar environment.