Top 10 Big Data platforms

Last updated: September 10, 2022

Big Data platforms make it possible to manage and analyse data sets so large and complex that they are difficult to process using on-hand data management tools or traditional data processing applications. They use a network of computers to solve problems involving massive amounts of data and computation. Big Data platforms can be deployed in a local data center or used from the cloud (Big Data as a Service).
Snowflake is the only data platform built for the cloud for all your data & all your users. Learn more about our purpose-built SQL cloud data warehouse.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Apache Spark is a fast and general engine for large-scale data processing. Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Write applications quickly in Java, Scala or Python. Combine SQL, streaming, and complex analytics.
The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
Cloudera helps you become information-driven by leveraging the best of the open source community with the enterprise capabilities you need to succeed with Apache Hadoop in your organization. Designed specifically for mission-critical environments, Cloudera Enterprise includes CDH, the world’s most popular open source Hadoop-based platform, as well as advanced system management and data management tools plus dedicated support and community advocacy from our world-class team of Hadoop developers and experts. Cloudera is your partner on the path to big data.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. You can start small for just $0.25 per hour with no commitments or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of most other data warehousing solutions.
Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.
Teradata Aster features Teradata Aster SQL-GR analytic engine which is a native graph processing engine for Graph Analysis across big data sets. Using this next generation analytic engine, organizations can easily solve complex business problems such as social network/influencer analysis, fraud detection, supply chain management, network analysis and threat detection, and money laundering.
Unified Data Analytics Platform - One cloud platform for massive scale data engineering and collaborative data science.
Amazon EMR is a service that uses Apache Spark and Hadoop, open-source frameworks, to quickly & cost-effectively process and analyze vast amounts of data.
Presto is a highly parallel and distributed query engine for big data, that is built from the ground up for efficient, low latency analytics.
BigQuery is a serverless, highly-scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and AI Platform built in.
Apache Impala is a modern, open source, distributed SQL query engine for Apache Hadoop.
HDInsight is a Hadoop distribution powered by the cloud. This means HDInsight was architected to handle any amount of data, scaling from terabytes to petabytes on demand. You can spin up any number of nodes at any time. We charge only for the compute and storage you actually use.
Vertica offers organizations new and faster ways to store, explore and serve more data. Vertica lets organizations store data cost-effectively, explore it quickly and leverage well-known SQL-based tools to get customer insights. By offering blazingly fast speed, accuracy and security, it offers operational advantages to the entire organization.
Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage. Get faster insights without the overhead (data loading, schema creation and maintenance, transformations, etc.). Analyze the multi-structured and nested data in non-relational datastores directly without transforming or restricting the data.
Qubole is a Big Data as a Service (BDaaS) platform running on leading cloud offerings like AWS. Qubole enables you to utilize a variety of cloud databases and sources, including S3, MySQL, Postgres, Oracle, Redshift, MongoDB, Vertica, Omniture, Google Analytics, and your on-premise data.
The MapR Distribution for Apache Hadoop provides organizations with an enterprise-grade distributed data platform to reliably store and process big data. MapR packages a broad set of Apache open source ecosystem projects enabling batch, interactive, or real-time applications. The data platform and the projects are all tied together through an advanced management console to monitor and manage the entire system.
Build, deploy, and run data processing pipelines that scale to solve your key business challenges. Google Cloud Dataflow enables reliable execution for large scale data processing scenarios such as ETL, analytics, real-time computation, and process orchestration.
Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig, and Hive service designed to easily and cost effectively process big datasets. You can quickly create managed clusters of any size and turn them off when you are finished, so you only pay for what you need. Cloud Dataproc is integrated across several Google Cloud Platform products, so you have access to a simple, powerful, and complete data processing platform.
SAP HANA converges database and application platform capabilities in-memory to transform transactions, analytics, text analysis, predictive and spatial processing so businesses can operate in real-time.
IBM Netezza appliances - expert integrated systems with built in expertise, integration by design and a simplified user experience. With simple deployment, out-of-the-box optimization, no tuning and minimal on-going maintenance, the IBM PureData System for Analytics has the industry’s fastest time-to-value and lowest total-cost-of-ownership.
1010data provides a cloud-based platform for big data discovery and data sharing that delivers actionable, data-driven insights quickly and easily. 1010data offers a complete suite of products for big data discovery and data sharing for both business and technical users. Companies look to 1010data to help them become data-driven enterprises.
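
Several of the descriptions above (Hadoop, Hive, Dataproc) refer to the map/reduce programming model and to custom mappers and reducers. As a rough illustration only, here is a minimal single-process sketch of that model in plain Python. It does not use Hadoop or any of the platforms listed, and the function names (`mapper`, `shuffle`, `reducer`, `word_count`) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key.

    On a real cluster the framework does this across machines;
    here it is an in-memory dictionary.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an iterable of lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big platforms"]))
```

The appeal of the model is that the map and reduce functions only ever see one record or one key at a time, which is what lets a framework spread them across many machines.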

Latest news about Big Data platforms

2022. Cloudera launches its all-in-one SaaS data lakehouse

Cloudera, the Hadoop-centric big data company, is now putting its emphasis on becoming the unified data fabric for hybrid data platforms. The company took the next step in this direction with the launch of its Cloudera Data Platform (CDP) One data lakehouse as a service (LaaS?). This managed offering is meant to give enterprises a platform to enable self-service analytics and data access for more of their employees. The company calls it the “first all-in-one data lakehouse SaaS offering”, though Databricks, which popularized the lakehouse concept, also offers SaaS-based solutions. It makes for good marketing copy, though, and Cloudera argues that its service is the first to combine compute, storage, ML, streaming analytics and enterprise security.

2021. Firebolt raises $127M for its new approach to cheaper and more efficient big data analytics

Snowflake changed the conversation for many companies when it comes to the potential of data warehousing. Now one of the startups that’s hoping to disrupt the disruptor, Firebolt, has raised a $127 million Series B. The startup claims that its performance is up to 182 times faster than that of other data warehouses, with a SQL-based system built on academic research that had yet to be applied anywhere, around how to handle data in a lighter way using new techniques in compression and how data is parsed. Data lakes in turn can be connected with a wider data ecosystem, and what it translates to is a much smaller requirement for cloud capacity, and lower costs.

2020. Panoply raises $10M for its cloud data platform

Panoply, a platform that makes it easier for businesses to set up a data warehouse and analyze that data with standard SQL queries, has raised $10M. The company, which launched back in 2015, has mostly stuck to its original vision, which was always about democratizing access to data warehousing and the analytics capabilities that go hand-in-hand with that. Over the last few years, it also built more code-free data integrations into the platform that make it easier for businesses to pull in data from a wide variety of sources, including the likes of Salesforce, HubSpot, NetSuite, Xero, Quickbooks, Freshworks and others. It also integrates with other data warehousing services like Google’s BigQuery and Amazon’s Redshift and all of the major BI and analytics tools.

2020. Altinity grabs $4M to build cloud version of ClickHouse open-source data warehouse

Altinity, the commercial company behind the open-source ClickHouse data warehouse, announced a $4 million seed round along with a new cloud service, Altinity.Cloud. Altinity.Cloud offers immediate access to production-ready ClickHouse clusters with expert enterprise support during every aspect of the application life cycle. It also helps with application design and implementation and production assistance, in essence combining the consulting side of the house with the cloud service.

2020. InfoSum raises $15.1M for its privacy-first, federated approach to big data analytics

InfoSum, a London startup that has built a way for organizations to share their data with each other without passing it on to each other, by way of a federated, decentralized architecture that uses mathematical representations to organise, “read” and query the data, is today announcing that it has raised $15.1 million. InfoSum’s solution today may be aimed at martech, but the problem it addresses affects a number of industries. Indeed, the decision to focus on marketing technology, Halstead said, was partly because that is the industry he worked most closely with at DataSift, although the plan is to expand to other verticals as well.

2020. Collibra nabs $112.5M for its big data management platform

Collibra, which provides tools to manage, warehouse, store and analyse data troves, has raised $112.5M in Series F funding, at a valuation of $2.3 billion. Collibra was originally a spin-out from Vrije Universiteit in Brussels, Belgium and today it works with some 450 enterprises and other large organizations. Customers include Adobe, Verizon, insurer AXA and a number of healthcare providers. Its products cover a range of services focused around company data, including tools to help customers comply with local data protection policies and store data securely, and tools (and plug-ins) to run analytics and more.

2020. BackboneAI scores $4.7M seed to bring order to intercompany data sharing

BackboneAI, an early-stage startup that wants to help companies dealing with lots of data, particularly coming from a variety of external sources, announced a $4.7 million seed investment. BackboneAI is an AI platform specifically built for automating data flows within and between companies. This could involve any number of scenarios, from keeping large, complex data catalogues up to date to coordinating the intricate flow of construction materials between companies or content rights management across the entertainment industry.

2019. Starburst raises $22M to modernize data analytics with Presto

Starburst, the company that’s looking to monetize the open-source Presto distributed query engine for big data (which was originally developed at Facebook), has announced that it has raised a $22 million funding round. The general idea behind Presto is to allow anybody to use the standard SQL query language to run interactive queries against a vast amount of data that can sit in a variety of sources. Starburst plans to monetize Presto by adding a number of enterprise-centric features on top, with the obvious focus being security features like role-based access control, as well as connectors to enterprise systems like Teradata, Snowflake and DB2, and a management console where users can configure the cluster to auto-scale, for example.

2019. HPE acquires big data platform MapR

Hewlett Packard Enterprise has acquired MapR Technologies, the distributor of a Hadoop-based data analytics platform. The deal includes MapR’s technology, intellectual property, and domain expertise in AI, machine learning, and analytics data management. The MapR portfolio will bolster HPE’s existing big data offerings, which include the BlueData software it acquired in November. BlueData’s software delivers a container-based approach for spinning up and managing Hadoop, Spark, and other environments on bare metal, cloud, or hybrid platforms. The MapR platform provides a number of capabilities for running distributed applications. The software exposes an S3 storage API, to go along with APIs for HDFS, POSIX, NFS, and Kafka.

2018. Big Data platforms Cloudera and Hortonworks merge

Over the years, Hadoop, the once high-flying open-source platform, gave rise to many companies and an ecosystem of vendors emerged. The problem with Hadoop was the sheer complexity of it. That’s where companies like Hortonworks and Cloudera came in. They packaged it for IT departments that wanted the advantage of a big data processing platform, but didn’t necessarily want to build Hadoop from scratch. These companies offered different ways of helping to attack that complexity, but over time, with all the cloud-based big data solutions, rolling a Hadoop system seemed futile, even with the help of companies like Cloudera and Hortonworks. Today the two companies announced they are merging in a deal worth $5.2 billion. The combined companies will boast 2,500 customers, $720 million in revenue and $500 million in cash with no debt, according to the companies.

2016. HP to sell its software business to Micro Focus

Hewlett Packard Enterprise agreed to sell its software business to Micro Focus in an $8.8 billion deal. A quarter of the value of the HP Enterprise software business is Autonomy, which HP acquired for $11 billion in 2011. The business also includes Mercury Interactive, which cost HP $4.5 billion in 2006, Vertica, which cost $320 million, and ArcSight, which it bought for $1.5 billion in 2010. HPE Chief Executive Meg Whitman will focus the company on other areas such as networking, storage and technology services, following its separation last year from computer and printer maker HP Inc.

2015. MapR tries to separate from Hadoop

MapR is one of several companies built on the open source Hadoop platform, and as such it has a bit of competition in the space. In an effort to create some separation from its better-heeled rivals, it announced a new product called MapR Streams. This new product handles constant streams of data, such as feeding consumer data to advertisers to create custom offers or distributing health data to medical professionals to tailor medication or treatment options, all in near real time. Streams let customers share data sources with people or machines that need to make use of that information in a subscription-style model. A maintenance program could subscribe to the data coming from the shop floor of a manufacturer and learn about usage, production, bottlenecks and wear and tear, or IT could subscribe to a data stream with log information looking for anomalies that signal maintenance issues or a security breach.

2015. Google launched new managed Big Data service Cloud Dataproc

Google is adding another product to its range of big data services on the Google Cloud Platform: the Cloud Dataproc service, which sits between managing the Spark data processing engine or Hadoop framework directly on virtual machines and a fully managed service like Cloud Dataflow, which lets you orchestrate your data pipelines on Google’s platform. Dataproc users will be able to spin up a Hadoop cluster in under 90 seconds (significantly faster than other services), and Google will only charge 1 cent per virtual CPU/hour in the cluster. That’s on top of the usual cost of running virtual machines and data storage, but you can add Google’s cheaper preemptible instances to your cluster to save a bit on compute costs. Billing is per-minute, with a 10-minute minimum. Because Dataproc can spin up clusters this fast, users will be able to set up ad-hoc clusters when needed, and because it is managed, Google will handle the administration for them.
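
To make the pricing in this announcement concrete, the sketch below estimates the Dataproc surcharge from the figures quoted above (1 cent per virtual CPU per hour, per-minute billing, 10-minute minimum). It is an illustration of the arithmetic only, not Google's billing logic: the function name `dataproc_fee_cents` is hypothetical, rounding partial minutes up is an assumption, and the cost of the underlying virtual machines and storage is excluded.

```python
import math

# Figures quoted in the 2015 announcement above.
CENTS_PER_VCPU_HOUR = 1      # Dataproc surcharge per virtual CPU per hour
MINIMUM_BILLED_MINUTES = 10  # per-minute billing with a 10-minute minimum


def dataproc_fee_cents(vcpus: int, runtime_minutes: float) -> float:
    """Estimate the Dataproc surcharge in cents for one cluster run.

    Excludes the cost of the virtual machines and storage themselves.
    Assumption: a partial minute is rounded up to a whole billed minute.
    """
    billed_minutes = max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_minutes))
    return vcpus * CENTS_PER_VCPU_HOUR * billed_minutes / 60


if __name__ == "__main__":
    # A 60-vCPU cluster used for 5 minutes is billed for the 10-minute minimum.
    print(dataproc_fee_cents(60, 5))
    # A 24-vCPU cluster used for 90 minutes is billed for 1.5 hours.
    print(dataproc_fee_cents(24, 90))
```

Under these assumptions, a short ad-hoc job on a 60-vCPU cluster adds about 10 cents of Dataproc fees, which is why the fast start-up time and the short billing minimum matter together.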

2015. Hortonworks acquired dataflow solutions developer Onyara

Hortonworks, a publicly traded company selling a commercial distribution of the Hadoop open-source big data software, announced today that it has acquired Onyara, an early-stage startup whose employees developed Apache NiFi, a piece of open-source software that was first used inside the National Security Agency (NSA). Apache NiFi lets users deliver sensor data to the right systems and keep track of what happens to the data. Hortonworks, which itself spun out of Yahoo, has previously acquired XA Secure and SequenceIQ. Now Hortonworks will be selling a new subscription based on the Apache NiFi software, under the name Hortonworks DataFlow.

2015. Data transformation service Tamr raised $25.2 million

Tamr, the startup that helps companies understand and unify all of the disparate databases across a company, announced a $25.2 million Series B round today. Tamr wants to have the same impact on the enterprise that Google had on the web. Instead of having an algorithm that goes out and finds web pages, the Tamr algorithm goes out and finds databases. The problem for larger companies today is that they have all of these databases and have no idea what data they have. Not knowing what you have is a dangerous situation because data can walk out the door and the company will have no idea it happened. Tamr can create a central catalogue of all these data sources (and spreadsheets and logs) spread out across the company and give greater visibility into what exactly a company has. This has value on so many levels, but especially on a security level in light of all the recent high-profile breaches.

2015. IBM bets on big data Apache Spark project

IBM has announced that it will devote 3,500 researchers to the open source big data project Apache Spark. It also announced that it was open sourcing its own IBM SystemML machine learning technology in a move designed to help push it to the forefront of big data and machine learning. These two technologies are part of the IBM transformation strategy that includes cloud, big data, analytics and security as its pillars. As part of today’s announcement, IBM has pledged to build Spark into the core of its analytics products and will work with Databricks, the commercial entity created to support the open source Spark project. IBM isn’t just giving all of these resources away out of largesse. It wants to be a part of this community because it sees these tools as the foundation for big data moving forward. If it can show itself to be a committed member of the open source project, it gains clout with companies that are working on big data and machine learning projects using open source tools, and that opens the door to consulting services and other business opportunities for Big Blue.

2015. MapR adds Apache Drill to its Hadoop distribution

MapR announced that its Hadoop distribution now ships with Apache Drill - an open source, low latency SQL query engine for Hadoop and NoSQL. Its promise is that it makes it easier for end users to interact with data from both legacy transactional systems and new data sources, such as Internet of Things (IoT) sensors, web click-streams and other semi-structured data, along with support for popular business intelligence (BI) and data visualization tools. Apache Drill 1.0, which is now included in MapR’s distro, is free for the taking. So should a competitor, like Hortonworks, who has at least one contributor on the project, find it extremely valuable, they can engineer it into their distro as well.

2015. Google launched NoSQL database Cloud Bigtable

Google is launching a new NoSQL database Cloud Bigtable, based on the company’s Bigtable data storage system that powers the likes of Gmail, Google Search and Google Analytics, so this is definitely a battle-tested service. Google promises that Cloud Bigtable will offer single-digit millisecond latency and 2x the performance per dollar when compared to the likes of HBase and Cassandra. Because it supports the HBase API, Cloud Bigtable can be integrated with all the existing applications in the Hadoop ecosystem, but it also supports Google’s Cloud Dataflow. This is not Google’s first cloud-based NoSQL database product. With Cloud Datastore, Google already offers a high-availability NoSQL datastore for developers on its App Engine platform. That service, too, is based on Bigtable. Cory O’Connor, a Google Cloud Platform product manager, tells me Cloud Datastore focuses on read-heavy workload for web apps and mobile apps.

2015. MapR revamps its Hadoop platform with more real-time analytics

The latest release of MapR’s enterprise-grade distributed Hadoop data platform is built for the real-time, data-centric enterprise. It leverages table replication features designed to extend access to “big and fast” data, enabling multiple instances to be updated in different locations, with all the changes synchronized across them. Reacting to business as it happens with the right offer is a must. Wrong offers are not only missed opportunities; put enough of them together and they could threaten a company’s viability. That’s one of the reasons why some enterprises are ditching their RDBMS and going with MapR. It offers both a top-rated NoSQL database and Hadoop in a nicely bundled solution. MapR, unlike its competitors Hortonworks and Cloudera, is a software company whose aim is to make big data plug and play.

2015. Google partners with Cloudera to bring Cloud Dataflow to Apache Spark

Google announced that it has teamed up with the Hadoop specialists at Cloudera to bring its Cloud Dataflow programming model to Apache’s Spark data processing engine. With Google Cloud Dataflow, developers can create and monitor data processing pipelines without having to worry about the underlying data processing cluster. As Google likes to stress, the service evolved out of the company’s internal tools for processing large datasets at Internet scale. Not all data processing tasks are the same, though, and sometimes you may want to run a task in the cloud or on premise or on different processing engines. With Cloud Dataflow, in its ideal state, data analysts will be able to use the same system for creating their pipelines, no matter the underlying architecture they want to run them on.

2015. Teradata acquired app marketing platform Appoxee

Analytics company Teradata acquired (for about $20 million) Appoxee, an Israeli push-messaging startup aimed at publishers and developers that want to increase user engagement in their apps. Appoxee’s business addresses one of the bigger issues in the world of apps today: keeping users coming back to and using your app, in the face of those users downloading yet another new app instead, always moving on to the next big thing. Appoxee gives developers a way to address this using push messages: sending messages to you to remind you to finish playing a game, or to send you info about an app update, or coupons for goods in the app. It also has a platform to help build these push messaging campaigns.

2014. Teradata acquired data-archiving service RainStor

Data warehouse vendor Teradata continues to buy its way into Big Data leadership. It has made its fourth acquisition of the year, announcing on Wednesday it has bought data-archiving specialist RainStor for an undisclosed amount. RainStor builds an archival system that can sit on top of Hadoop and, it claims, compress data volumes by up to 95 percent. Taken as a whole with the company’s other acquisitions, including Hadapt and Think Big Analytics, it’s pretty clear that Teradata wants to play a bigger role in companies’ big data environments than just that of a data warehouse and business intelligence provider.

2014. Business analytics provider Palantir raises $50 Million

Palantir, the big data company, has raised another $50 million. Even at $9 billion, Palantir was already among Silicon Valley’s most valuable private technology companies, some of which have seen massive bumps in valuations recently. Palantir, co-founded by entrepreneur Peter Thiel in 2004, got its start selling its software, which looks for patterns across broad data sets, to government agencies like the CIA and NSA. The company expects to end the year with over $1 billion in revenue and is working to expand its customer base. It is selling data analysis technology to Wall Street firms that want to detect fraud and to pharmaceutical companies looking to expedite the development of new drugs. Hershey has been using Palantir’s tools to find correlations between weather patterns and consumer behavior.

2014. After IPO, Hortonworks is a $1 billion Hadoop company

Shares for Hadoop vendor Hortonworks finished their first day of trading at $26.48, so the company’s total market cap is $1.1 billion at the close of trading Friday. Hortonworks offers only open source software and makes its money on support and services. Hortonworks was founded and launched in 2011, after a group of engineers spun the company out from Yahoo, which had been driving much of the work on the open source Apache Hadoop project. But the stock rallied late in a trading day that was awful for most major stocks. No doubt Cloudera and MapR, Hortonworks’ two largest rivals in the pure-play Hadoop space, will be watching the company’s stock closely over the coming months. MapR also claims a private-market valuation of more than $1 billion, while Cloudera’s valuation is more than $4 billion.

2014. Big Data as a Service company Qubole raises $13 million

Hadoop-as-a-service startup Qubole has raised a $13 million series B round of venture capital. Qubole is hosted on the Amazon Web Services cloud, but can also run on Google Compute Engine, and acts like one might expect a cloud-native Hadoop service to act. It has a graphical user interface, connectors to several common data sources (including cloud object stores), and it takes advantage of cloud capabilities such as autoscaling and spot pricing for compute. What’s interesting about Qubole is that although it originally boasted optimized versions of Hive and other MapReduce-based tools, the company also lets users analyze data using the Facebook-created Presto SQL-on-Hadoop engine, and is working on a service around the increasingly popular and very fast Apache Spark framework.

2014. MapR partners with Teradata to reach enterprise customers

The last independent Hadoop provider MapR and big data analytics provider Teradata announced that they will work together to integrate and co-develop their joint products and to create a unified go to market strategy. Teradata will also be able to resell MapR software, professional services, and provide customer support. In other words, Teradata will be the face of MapR to enterprises who use, or want to use, both technologies. Until recently Teradata partnered most closely with Hortonworks, but now it’s sharing love and its analytic market leadership with all three providers. Similarly, earlier this week, HP announced Vertica for SQL on Hadoop, which allows users to access and explore data residing in any of the three primary Hadoop distros — Hortonworks, MapR, Cloudera.

2014. HP plugs the Vertica analytics platform into Hadoop

HP announced Vertica for SQL on Hadoop. Vertica is an analytics platform that enables customers to access and explore data residing in any of the three primary Hadoop distros — Hortonworks, MapR, Cloudera — or any combination thereof. Large companies are often using all three kinds of Hadoop because they don’t know which will be dominant. HP is one of the first big vendors to say “any flavor of Hadoop will do” by taking action, though it has invested $50 million in Hortonworks which is, at present, the flavor of Hadoop inside HAVEn, its analytics stack. HP’s announcement centers not only around its interoperability, but also its power on data stored in a data lake, enterprise data hub, whatever you want to call it. HP now provides a seamless way to explore and exploit value in data that’s stored on the Hadoop Distributed File System (HDFS). The power, speed, and scalability of HP Vertica with the ease with which Hadoop lassos big data might persuade reticent managers to come out from underneath their desks and take big data on.

2014. Enterprise Hadoop provider Hortonworks filed for an IPO

Hortonworks, the company building commercial Hadoop technology, has filed for its initial public offering. The company claims more than $33 million in revenue for the year thus far and nearly $88 million in operating loss. Hortonworks spun off from Yahoo in 2011. It offers a big data processing platform that includes the ability to process various types of data including SQL and NoSQL sources then search across data, or use various analytics tools to visualize the data. Hortonworks has a reputation for being a pure Hadoop offering without any proprietary extensions.

2014. IBM adds Netezza analytics as a service to its cloud

IBM announced a bunch of new cloud data services to IBM Cloud, including an intelligent data-preparation tool DataWorks, an in-memory analytic database Netezza-powered dashDB and even a local version of a historically cloud-based database Cloudant. It’s an impressive and in some cases even unique set of capabilities that complements the work IBM has been pushing with its Bluemix platform. In particular, with the dashDB, IBM joins Amazon Web Services, Google and Microsoft with a homegrown analytic service built atop columnar database technology.

2014. The Netezza team is back with Big Data startup Cazena

Startup Cazena, which just launched with $8 million in funding, promises to simplify big data for large companies. The company is leaning pretty heavily on its founding team’s experience building data warehouse specialist Netezza (which was acquired by IBM in 2010) earlier this century: Cazena CEO Prat Moghe was a senior vice president at Netezza, while Netezza founder Jit Saxena and longtime Netezza CEO Jim Baum sit on the company’s board. They say that large companies are confused about the technologies they need to deploy. They don’t necessarily know when and where things like Hadoop, NoSQL, Spark and Elasticsearch come into play, and they certainly don’t know how to turn them into a functional “data lake” like some vendors are pitching. Cazena wants to make big data less about infrastructure and more about applications, and it wants to use the cloud to do it.