Top 24 Big Data Analytics platforms
Last updated: December 08, 2019
Big Data Analytics platforms let you manage and analyse data sets so large and complex that they become difficult to process using on-hand data management tools or traditional data processing applications.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.
Teradata Aster features the Teradata Aster SQL-GR analytic engine, a native graph processing engine for graph analysis across big data sets. Using this next-generation analytic engine, organizations can easily solve complex business problems such as social network/influencer analysis, fraud detection, supply chain management, network analysis and threat detection, and money laundering.
Cloudera helps you become information-driven by leveraging the best of the open source community with the enterprise capabilities you need to succeed with Apache Hadoop in your organization. Designed specifically for mission-critical environments, Cloudera Enterprise includes CDH, the world’s most popular open source Hadoop-based platform, as well as advanced system management and data management tools plus dedicated support and community advocacy from our world-class team of Hadoop developers and experts. Cloudera is your partner on the path to big data.
SAP HANA converges database and application platform capabilities in-memory to transform transactions, analytics, text analysis, predictive and spatial processing so businesses can operate in real-time.
Apache Spark is a fast and general engine for large-scale data processing. Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Write applications quickly in Java, Scala or Python. Combine SQL, streaming, and complex analytics.
Unlock the value in your big data with Enterprise Apache Hadoop. Hortonworks builds Apache Hadoop with the enterprise in mind, all tested and certified with real-world rigor in the world’s largest Hadoop clusters. Hortonworks has a world-class enterprise support and services organization with vast experience of the largest Hadoop deployments.
HP Vertica offers organizations new and faster ways to store, explore and serve more data. HP Vertica lets organizations store data cost-effectively, explore it quickly and leverage well-known SQL-based tools to get customer insights. By offering blazingly fast speed, accuracy and security, it offers operational advantages to the entire organization.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. You can start small for just $0.25 per hour with no commitments or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of most other data warehousing solutions.
The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
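The custom mappers mentioned above are typically supplied as small scripts that Hive streams rows through. The following is a minimal sketch of such a script's core logic in Python, assuming rows arrive as tab-separated text lines (Hive's default for streamed scripts). The two-column layout (`user_id`, `url`) and the function names `map_row` and `map_rows` are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlparse


def map_row(line):
    """Turn one 'user_id<TAB>url' row into 'user_id<TAB>host'."""
    # Hypothetical two-column input; Hive passes columns as tab-separated strings.
    user_id, url = line.rstrip("\n").split("\t")
    host = urlparse(url).netloc.lower()
    return f"{user_id}\t{host}"


def map_rows(lines):
    """Apply map_row to every non-empty input line."""
    for line in lines:
        if line.strip():
            yield map_row(line)


# In a real streaming script the rows would come from standard input, e.g.:
#   import sys
#   for out in map_rows(sys.stdin):
#       print(out)
```

A script built around this logic could then be invoked from HiveQL with a query of the form `SELECT TRANSFORM (user_id, url) USING 'python3 extract_host.py' AS (user_id, host) FROM page_views`, where the script name and the table `page_views` are again placeholders.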
IBM Netezza appliances are expert integrated systems with built-in expertise, integration by design and a simplified user experience. With simple deployment, out-of-the-box optimization, no tuning and minimal ongoing maintenance, the IBM PureData System for Analytics has the industry’s fastest time-to-value and lowest total-cost-of-ownership.
Trifacta's Data Transformation Platform increases productivity up to 10x. Trifacta’s Data Profiling features provide immediate visibility into unique elements of the data set like data distributions and outliers to inform the transformation and analysis process. Trifacta’s automated detection of data types, value distribution and missing or inconsistent values provides the user with immediate cues to a dataset’s fit and trustworthiness.
Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage. Get faster insights without the overhead (data loading, schema creation and maintenance, transformations, etc.). Analyze the multi-structured and nested data in non-relational datastores directly without transforming or restricting the data.
The MapR Distribution for Apache Hadoop provides organizations with an enterprise-grade distributed data platform to reliably store and process big data. MapR packages a broad set of Apache open source ecosystem projects enabling batch, interactive, or real-time applications. The data platform and the projects are all tied together through an advanced management console to monitor and manage the entire system.
HDInsight is a Hadoop distribution powered by the cloud. This means HDInsight was architected to handle any amount of data, scaling from terabytes to petabytes on demand. You can spin up any number of nodes at any time. We charge only for the compute and storage you actually use.
IBM InfoSphere BigInsights brings the power of Hadoop to the enterprise. IBM makes it simpler to use Hadoop to get value out of big data and build big data applications. It enhances open source technology to withstand the demands of your enterprise, adding administrative, discovery, development, provisioning, security, and support, along with best-in-class analytical capabilities. The result is a more user-friendly solution for complex, large scale projects.
RainStor's enterprise-class database software offers big data management, archiving and data analysis at a lower total cost. RainStor enables the world’s largest enterprises to manage critical data smarter, reducing the cost, complexity and compliance risk paramount to their success. RainStor’s Active Archive solutions create proven business value by being able to store limitless volumes of data, with high performance query access, in the most efficient way.
Tamr enables enterprises to use 100% of available data for business intelligence and analytics by unifying and enriching their vast reserves of valuable data. Tamr’s data unification platform catalogues, connects and curates hundreds or thousands of internal and external data sources through a combination of machine learning algorithms and human expert guidance – radically reducing the cost, time and effort of preparing data for analysis.
Paxata is a self-service data preparation solution designed to help everyone who deals with data eliminate the pain of combining, cleaning and shaping their data prior to analytics. Business analysts work within an Excel-like application that is easy to use, with visually interactive guidance that makes it easy to bring together data, find and fix dirty or missing data, and share and re-use data projects across teams. IT organizations and curators leverage the Paxata platform as a shared environment for delivering data, monitoring how data is prepared and used, and building dynamic governance that is aligned with the semantics of the business, not data systems.
Palantir builds software that connects data, technologies, humans and environments. Organizations have data. Lots of it. Structured data like log files, spreadsheets, and tables. Unstructured data like emails, documents, images, and videos. This data is typically stored in disconnected systems, where it is rapidly diversifying in type, exponentially increasing in volume, and becoming more difficult to use every day.
1010data provides a cloud-based platform for big data discovery and data sharing that delivers actionable, data-driven insights quickly and easily. 1010data offers a complete suite of products for big data discovery and data sharing for both business and technical users. Companies look to 1010data to help them become data-driven enterprises.
Qubole is a Big Data as a Service (BDaaS) platform running on leading cloud offerings like AWS. Qubole enables you to utilize a variety of cloud databases and sources, including S3, MySQL, Postgres, Oracle, Redshift, MongoDB, Vertica, Omniture, Google Analytics, and your on-premise data.
Build, deploy, and run data processing pipelines that scale to solve your key business challenges. Google Cloud Dataflow enables reliable execution for large scale data processing scenarios such as ETL, analytics, real-time computation, and process orchestration.
Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig, and Hive service designed to easily and cost effectively process big datasets. You can quickly create managed clusters of any size and turn them off when you are finished, so you only pay for what you need. Cloud Dataproc is integrated across several Google Cloud Platform products, so you have access to a simple, powerful, and complete data processing platform.
Latest news about Big Data Analytics platforms
2019. HPE acquires big data platform MapR
Hewlett Packard Enterprise has acquired MapR Technologies, the distributor of a Hadoop-based data analytics platform. The deal includes MapR’s technology, intellectual property, and domain expertise in AI, machine learning, and analytics data management. The MapR portfolio will bolster HPE’s existing big data offerings, which include the BlueData software it acquired in November. BlueData’s software delivers a container-based approach for spinning up and managing Hadoop, Spark, and other environments on bare metal, cloud, or hybrid platforms. The MapR platform provides a number of capabilities for running distributed applications. The software exposes storage APIs for S3, to go along with APIs for HDFS, POSIX, NFS, and Kafka.
2015. MapR tries to separate from Hadoop
MapR is one of several companies built on the open source Hadoop platform, and as such it has a bit of competition in the space. In an effort to create some separation from its better-heeled rivals, it announced a new product called MapR Streams. This new product handles constant streams of data, such as feeding consumer data to advertisers to create custom offers or distributing health data to medical professionals to tailor medication or treatment options, all in near real time. Streams let customers share data sources with people or machines that need to make use of that information in a subscription-style model. A maintenance program could subscribe to the data coming from the shop floor of a manufacturer and learn about usage, production, bottlenecks and wear and tear, or IT could subscribe to a data stream with log information looking for anomalies that signal maintenance issues or a security breach.
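The subscription-style model described above can be illustrated with a tiny in-memory publish/subscribe sketch. This is not the MapR Streams API. The class `StreamBroker`, the topic name and the event fields are all hypothetical, and a real streaming system would add persistence, ordering and delivery across machines.

```python
class StreamBroker:
    """Toy in-memory broker: subscribers register per topic, publishers push events."""

    def __init__(self):
        self._subscribers = {}  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback that receives every event published to the topic."""
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, event):
        """Deliver the event to each subscriber; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


# Hypothetical usage: a maintenance program and a security monitor
# both subscribe to the same shop-floor stream.
broker = StreamBroker()
maintenance_log = []
security_log = []
broker.subscribe("shop-floor", maintenance_log.append)
broker.subscribe("shop-floor", security_log.append)
broker.publish("shop-floor", {"machine": "press-1", "temp_c": 81})
```

Each subscriber sees the same event independently, which is the property that lets several consumers share one data source.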
2015. Google launched new managed Big Data service Cloud Dataproc
Google is adding another product to its range of big data services on the Google Cloud Platform: Cloud Dataproc, a service that sits between managing the Spark data processing engine or Hadoop framework directly on virtual machines and a fully managed service like Cloud Dataflow, which lets you orchestrate your data pipelines on Google’s platform. Dataproc users will be able to spin up a Hadoop cluster in under 90 seconds (significantly faster than other services), and Google will only charge 1 cent per virtual CPU/hour in the cluster. That’s on top of the usual cost of running virtual machines and data storage, but you can add Google’s cheaper preemptible instances to your cluster to save a bit on compute costs. Billing is per-minute, with a 10-minute minimum. Because Dataproc can spin up clusters this fast, users will be able to set up ad-hoc clusters when needed, and because it is managed, Google will handle the administration for them.
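To make the pricing quoted above concrete, here is a small worked calculation in Python. It covers only the Dataproc fee (1 cent per virtual CPU per hour, per-minute billing, 10-minute minimum), not the underlying virtual machine and storage costs. The function name `dataproc_fee` is hypothetical, and rounding partial minutes up is an assumption, since the text does not say how they are handled.

```python
import math

FEE_PER_VCPU_HOUR = 0.01     # 1 cent per virtual CPU per hour, as quoted above
MINIMUM_BILLED_MINUTES = 10  # 10-minute minimum, as quoted above


def dataproc_fee(vcpus, runtime_minutes):
    """Estimate the Dataproc fee in dollars for a cluster run.

    Assumption: partial minutes are rounded up to the next whole minute.
    """
    billed_minutes = max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_minutes))
    return vcpus * FEE_PER_VCPU_HOUR * billed_minutes / 60
```

Under these assumptions, a 32-vCPU cluster that runs for 45 minutes would incur a fee of about $0.24, and an 8-vCPU cluster that runs for only 3 minutes would still be billed for 10 minutes, a little over one cent.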
2015. Hortonworks acquired dataflow solutions developer Onyara
Hortonworks, a publicly traded company selling a commercial distribution of the Hadoop open-source big data software, announced today that it has acquired Onyara, an early-stage startup whose employees developed Apache NiFi, a piece of open-source software that was first used inside the National Security Agency (NSA). Apache NiFi makes it possible to deliver sensor data to the right systems and keep track of what happens to the data. Hortonworks, which itself spun out of Yahoo, has previously acquired XA Secure and SequenceIQ. Now Hortonworks will be selling a new subscription based on the Apache NiFi software, under the name Hortonworks DataFlow.