Top 10: Data Lake platforms

Updated: July 30, 2023

Data lake platforms are scalable systems that let organizations store and manage large volumes of structured and unstructured data in its raw form. They provide a centralized repository into which data from many sources can be ingested, stored, and processed without upfront data modeling or transformation: structure is applied when the data is read, not when it is written. Because they accept diverse data types, including text, images, video, and sensor readings, data lakes are well suited to big data and IoT workloads. Most platforms also include data governance and access controls that help organizations secure their data and comply with privacy regulations. On top of the stored data, businesses can run big data analytics, machine learning, and AI workloads to extract insights and support data-driven decisions, which is why data lake platforms have become a central component of modern data architecture. Some of the most popular data lake platforms are listed below.
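The idea of landing data in raw form and deferring modeling until it is read (often called schema-on-read) can be illustrated with a small Python sketch. It is not tied to any platform discussed below: a local temporary directory stands in for the lake's object storage, JSON Lines is an assumed file format, and the functions `ingest_raw` and `read_with_schema` are hypothetical names chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def ingest_raw(lake_dir: Path, source: str, records: list[dict]) -> Path:
    """Hypothetical ingest step: write records unchanged (one JSON object per line).

    No schema is enforced, so records from the same source may have different fields.
    """
    target = lake_dir / "raw" / source
    target.mkdir(parents=True, exist_ok=True)
    path = target / "batch-0001.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def read_with_schema(path: Path, schema: dict[str, type]) -> list[dict]:
    """Hypothetical read step: apply a schema only at query time.

    Only the fields named in `schema` are projected and cast; missing or null
    fields become None, and extra fields in the raw data are ignored.
    """
    rows = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            raw = json.loads(line)
            rows.append({
                field: cast(raw[field]) if raw.get(field) is not None else None
                for field, cast in schema.items()
            })
    return rows


with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    # Three sensor records with inconsistent shapes and types, stored as-is.
    path = ingest_raw(lake, "sensors", [
        {"device": "t-1", "temp_c": "21.5"},
        {"device": "t-2", "temp_c": 19, "humidity": 40},
        {"device": "t-3"},
    ])
    # The consumer decides at read time which fields matter and how to type them.
    readings = read_with_schema(path, {"device": str, "temp_c": float})
    print(readings)
```

Because the raw batch is never rewritten, another consumer could read the same file later with a different schema (for example, one that includes `humidity`) without re-ingesting anything.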

See also: Top 10 Big Data platforms

2023. Microsoft launches Fabric, a new end-to-end data and analytics platform

Microsoft has introduced Microsoft Fabric, an end-to-end data and analytics platform (not to be confused with Azure Service Fabric). Fabric is built around OneLake, Microsoft's data lake, and can also access data stored in Amazon S3, with support for Google Cloud Platform announced as coming soon. The platform bundles data integration tools, a Spark-based data engineering environment, a real-time analytics service, and an enhanced Power BI for visualization and AI-driven analytics. A new no-code experience also lets users monitor their data in real time and trigger actions and notifications based on it. These tools are integrated with one another, and Microsoft has built its Copilot AI assistant into Fabric as well.


2022. Salesforce builds a data lake to transform how customer data moves on the platform

The main goal of consolidating customer data into a customer data platform (CDP) is to deliver more relevant customer experiences in real time. Salesforce is moving toward that goal with Genie, a real-time data integration platform. Genie acts as a data lake underpinning the entire Salesforce platform, moving data quickly to wherever it is needed. With Genie in place, Customer 360 applications for sales, service, commerce, and marketing can receive data in real time and at scale. Genie is more than a data integration layer, however: combined with Einstein, Salesforce's AI and machine learning tool, and Salesforce Flow, the company's workflow tool, it opens up new opportunities to automate processes and improve customer experiences.


2022. Cloudera launches its all-in-one SaaS data lakehouse

Cloudera, best known for its Hadoop-based big data platform, is repositioning itself as a unified data fabric for hybrid data platforms. As a step in that direction, the company has launched Cloudera Data Platform (CDP) One, a data lakehouse as a service (LaaS). The managed offering aims to give a broader range of employees self-service access to data and analytics. Databricks, which popularized the lakehouse concept, also offers SaaS-based products, but Cloudera markets its service as the "first all-in-one data lakehouse SaaS offering," emphasizing that it combines compute, storage, machine learning, streaming analytics, and enterprise security in a single service.


2022. Dremio raises $160M for its data lake platform

Data lake platform Dremio has raised $160 million in a Series E funding round. Dremio competes with companies such as Databricks in a rapidly growing market. Data lakes and data warehouses were traditionally associated with a narrow range of use cases, but the emergence of the "lakehouse" concept, popularized by Databricks, has widened that scope. A lakehouse combines the low-cost, flexible storage of a data lake with the data management and query performance typically associated with a data warehouse, allowing enterprises to run a wider variety of analytics directly on their data. The new funding gives Dremio additional resources to develop its platform and compete in this space.