Top 10 Big Data Analytics platforms
Last updated: August 27, 2017
Big Data Analytics platforms allow to manage and analyse data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications.
See also: Cloud Database Engines
See also: Cloud Database Engines
SAP HANA converges database and application platform capabilities in-memory to transform transactions, analytics, text analysis, predictive and spatial processing so businesses can operate in real-time.
Teradata Aster features Teradata Aster SQL-GR analytic engine which is a native graph processing engine for Graph Analysis across big data sets. Using this next generation analytic engine, organizations can easily solve complex business problems such as social network/influencer analysis, fraud detection, supply chain management, network analysis and threat detection, and money laundering.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Cloudera helps you become information-driven by leveraging the best of the open source community with the enterprise capabilities you need to succeed with Apache Hadoop in your organization. Designed specifically for mission-critical environments, Cloudera Enterprise includes CDH, the world’s most popular open source Hadoop-based platform, as well as advanced system management and data management tools plus dedicated support and community advocacy from our world-class team of Hadoop developers and experts. Cloudera is your partner on the path to big data.
Unlock the value in your big data with Enterprise Apache Hadoop. Hortonworks builds Apache Hadoop with the enterprise in mind, all tested and certified with real-world rigor in the world’s largest Hadoop clusters. Hortonworks has a world-class enterprise support and services organization with vast experience of the largest Hadoop deployments.
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. You can start small for just $0.25 per hour with no commitments or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of most other data warehousing solutions.
HP Vertica offers organizations new and faster ways to store, explore and serve more data. HP Vertica lets organizations store data in a cost-effectively, explore it quickly and leverage well-known SQL-based tools to get customer insights. By offering blazingly-fast speed, accuracy and security, it offers operational advantages to the entire organization.
IBM Netezza appliances - expert integrated systems with built in expertise, integration by design and a simplified user experience. With simple deployment, out-of-the-box optimization, no tuning and minimal on-going maintenance, the IBM PureData System for Analytics has the industry’s fastest time-to-value and lowest total-cost-of-ownership.
HDInsight is a Hadoop distribution powered by the cloud. This means HDInsight was architected to handle any amount of data, scaling from terabytes to petabytes on demand. You can spin up any number of nodes at anytime. We charge only for the compute and storage you actually use.
Trifacta's Data Transformation Platform increases productivity up to 10x. Trifacta’s Data Profiling features provide immediate visibility into unique elements of the data set like data distributions and outliers to inform the transformation and analysis process. Trifacta’s automated detection of data types, value distribution and missing or inconsistent values provides the user with immediate cues to a dataset’s fit and trustworthiness.
The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
The MapR Distribution for Apache Hadoop provides organizations with an enterprise-grade distributed data platform to reliably store and process big data. MapR packages a broad set of Apache open source ecosystem projects enabling batch, interactive, or real-time applications. The data platform and the projects are all tied together through an advanced management console to monitor and manage the entire system.
IBM InfoSphere BigInsights brings the power of Hadoop to the enterprise. IBM makes it simpler to use Hadoop to get value out of big data and build big data applications. It enhances open source technology to withstand the demands of your enterprise, adding administrative, discovery, development, provisioning, security, and support, along with best-in-class analytical capabilities. The result is a more user-friendly solution for complex, large scale projects.
Apache Spark is a fast and general engine for large-scale data processing. Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Write applications quickly in Java, Scala or Python. Combine SQL, streaming, and complex analytics.
Palantir builds software that connects data, technologies, humans and environments. Organizations have data. Lots of it. Structured data like log files, spreadsheets, and tables. Unstructured data like emails, documents, images, and videos. This data is typically stored in disconnected systems, where it is rapidly diversifying in type, exponentially increasing in volume, and becoming more difficult to use every day.
RainStor's enterprise-class database software offers big data management, archiving and data analysis at a lower total cost. RainStor enables the world’s largest enterprises manage critical data smarter, reducing the cost, complexity and compliance risk paramount to their success. RainStor’s Active Archive solutions create proven business value by being able to store limitless volumes of data, with high performance query access, in the most efficient way.
Qubole is a Big Data as a Service (BDaas) Platform Running on Leading Cloud Offerings Like AWS. Qubole enables you to utilize a variety of Cloud Databases and Sources, including S3, MySQL, Postgres, Oracle, RedShift, MongoDB, Vertica, Omniture, Google Analytics, and your on-premise data
Paxata is a self-service data preparation solution designed to help everyone who deals with data eliminate the pain of combining, cleaning and shaping their data prior to analytics. Business analysts work within an Excel-like application that is easy to use, with visually interactive guidance that makes it easy to bring together data, find and fix dirty or missing data, and share and re-use data projects across teams. IT organizations and curators leverage the Paxata platform as a shared environment for delivering data, monitoring how data is prepared and used, and building dynamic governance that is aligned with the semantics of the business, not data systems.
Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage. Get faster insights without the overhead (data loading, schema creation and maintenance, transformations, etc.). Analyze the multi-structured and nested data in non-relational datastores directly without transforming or restricting the data
Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig, and Hive service designed to easily and cost effectively process big datasets. You can quickly create managed clusters of any size and turn them off when you are finished, so you only pay for what you need. Cloud Dataproc is integrated across several Google Cloud Platform products, so you have access to a simple, powerful, and complete data processing platform.
Tamr enables enterprises to use 100% of available data for business intelligence and analytics by unifying and enriching their vast reserves of valuable data. Tamr’s data unification platform catalogues, connects and curates hundreds or thousands of internal and external data sources through a combination of machine learning algorithms and human expert guidance – radically reducing the cost, time and effort of preparing data for analysis.
1010data provides a cloud-based platform for big data discovery and data sharing that delivers actionable, data-driven insights quickly and easily. 1010data offers a complete suite of products for big data discovery and data sharing for both business and technical users. Companies look to 1010data to help them become data-driven enterprises.
Build, deploy, and run data processing pipelines that scale to solve your key business challenges. Google Cloud Dataflow enables reliable execution for large scale data processing scenarios such as ETL, analytics, real-time computation, and process orchestration.