Apache Spark vs MapR
Last updated: August 05, 2019
Apache Spark is a fast and general engine for large-scale data processing. Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Write applications quickly in Java, Scala or Python. Combine SQL, streaming, and complex analytics.
The MapR Distribution for Apache Hadoop provides organizations with an enterprise-grade distributed data platform to reliably store and process big data. MapR packages a broad set of Apache open source ecosystem projects enabling batch, interactive, or real-time applications. The data platform and the projects are all tied together through an advanced management console to monitor and manage the entire system.
Apache Spark vs MapR in our news:
2019 - HPE acquires big data platform MapR
Hewlett Packard Enterprise has acquired MapR Technologies, the distributor of a Hadoop-based data analytics platform. The deal includes MapR’s technology, intellectual property, and domain expertise in AI, machine learning, and analytics data management. The MapR portfolio will bolster HPE’s existing big data offerings, which include the BlueData software it acquired in November. BlueData’s software delivers a container-based approach for spinning up and managing Hadoop, Spark, and other environments on bare metal, cloud, or hybrid platforms. The MapR platform provides a number of capabilities for running distributed applications. The software exposes a storage API for S3, to go along with APIs for HDFS, POSIX, NFS, and Kafka.
2015 - MapR tries to separate from Hadoop - a new advantage over Apache Cassandra
MapR is one of several companies built on the open source Hadoop platform, and as such it has a bit of competition in the space. In an effort to create some separation from its better-heeled rivals, it announced a new product called MapR Streams. This new product handles constant streams of data, such as feeding consumer data to advertisers to create custom offers or distributing health data to medical professionals to tailor medication or treatment options, all in near real time. Streams lets customers share data sources with people or machines that need to make use of that information in a subscription-style model. A maintenance program could subscribe to the data coming from the shop floor of a manufacturer and learn about usage, production, bottlenecks and wear and tear, or IT could subscribe to a data stream with log information, looking for anomalies that signal maintenance issues or a security breach.
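The subscription-style model described above can be illustrated with a minimal in-memory sketch. This is not the MapR Streams API: the TopicBroker class, the topic names, the message fields and the alert thresholds are all hypothetical, chosen only to mirror the shop-floor and log-monitoring examples.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class TopicBroker:
    """Toy in-memory broker (hypothetical, not a MapR API).

    Producers publish messages to a named topic; every handler
    subscribed to that topic receives each message.
    """

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message and return how many subscribers received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


maintenance_alerts: List[dict] = []
security_alerts: List[dict] = []


def maintenance_program(reading: dict) -> None:
    # Assumed rule: flag machines whose vibration exceeds 0.8.
    if reading["vibration"] > 0.8:
        maintenance_alerts.append(reading)


def security_monitor(entry: dict) -> None:
    # Assumed rule: flag hosts with five or more failed logins.
    if entry["failed_logins"] >= 5:
        security_alerts.append(entry)


broker = TopicBroker()
broker.subscribe("shop_floor", maintenance_program)
broker.subscribe("logs", security_monitor)

broker.publish("shop_floor", {"machine": "press-1", "vibration": 0.35})
broker.publish("shop_floor", {"machine": "press-2", "vibration": 0.92})
broker.publish("logs", {"host": "web-1", "failed_logins": 7})
```

Each consumer only sees the topics it subscribed to, so the maintenance program never receives log entries and the security monitor never receives sensor readings. A real streaming platform would add persistence, partitioning and delivery guarantees that this sketch leaves out.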
2015 - IBM bets on big data Apache Spark project
IBM has announced that it would devote 3,500 researchers to the open source big data project Apache Spark. It also announced that it was open sourcing its own IBM SystemML machine learning technology in a move designed to help push it to the forefront of big data and machine learning. These two technologies are part of the IBM transformation strategy that includes cloud, big data, analytics and security as its pillars. As part of today’s announcement, IBM has pledged to build Spark into the core of its analytics products and will work with Databricks, the commercial entity created to support the open source Spark project. IBM isn’t just giving all of these resources away out of largesse. It wants to be a part of this community because it sees these tools as the foundation for big data moving forward. If it can show itself to be a committed member of the open source project, it gains clout with companies that are working on big data and machine learning projects using open source tools, and that opens the door to consulting services and other business opportunities for Big Blue.
2015 - MapR adds Apache Drill to its Hadoop distribution
MapR announced that its Hadoop distribution now ships with Apache Drill - an open source, low-latency SQL query engine for Hadoop and NoSQL. Its promise is that it makes it easier for end users to interact with data from both legacy transactional systems and new data sources, such as Internet of Things (IoT) sensors, web click-streams and other semi-structured data, along with support for popular business intelligence (BI) and data visualization tools. Apache Drill 1.0, which is now included in MapR’s distro, is free for the taking. So should a competitor like Hortonworks, which has at least one contributor on the project, find it extremely valuable, it can engineer it into its own distro as well.
2015 - MapR revamps its Hadoop platform with more real-time analytics. Beware Cloudera
The latest release of MapR’s enterprise-grade distributed Hadoop data platform is built for the real-time, data-centric enterprise. It leverages table replication features designed to extend access to “big and fast” data, enabling multiple instances to be updated in different locations, with all the changes synchronized across them. Reacting to business as it happens with the right offer is a must. Wrong offers are not only missed opportunities; put enough of them together and they could threaten a company’s viability. That’s one of the reasons why some enterprises are ditching their RDBMS and going with MapR. It offers both a top-rated NoSQL database and Hadoop in a nicely bundled solution. MapR, unlike its competitors Hortonworks and Cloudera, is a software company whose aim is to make big data plug and play.
2015 - Google partners with Cloudera to bring Cloud Dataflow to Apache Spark
Google announced that it has teamed up with the Hadoop specialists at Cloudera to bring its Cloud Dataflow programming model to Apache’s Spark data processing engine. With Google Cloud Dataflow, developers can create and monitor data processing pipelines without having to worry about the underlying data processing cluster. As Google likes to stress, the service evolved out of the company’s internal tools for processing large datasets at Internet scale. Not all data processing tasks are the same, though, and sometimes you may want to run a task in the cloud or on premises or on different processing engines. With Cloud Dataflow, in its ideal state, data analysts will be able to use the same system for creating their pipelines, no matter the underlying architecture they want to run them on.
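The idea of defining a pipeline once and running it on different engines can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the Cloud Dataflow SDK or the Spark API: the Pipeline, LocalRunner and ChunkedRunner names are hypothetical, and the "chunked" runner merely simulates splitting work across a cluster.

```python
from typing import Callable, Iterable, List, Tuple

Step = Tuple[str, Callable]


class Pipeline:
    """Hypothetical engine-independent description of element-wise steps."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def map(self, fn: Callable) -> "Pipeline":
        self.steps.append(("map", fn))
        return self

    def filter(self, fn: Callable) -> "Pipeline":
        self.steps.append(("filter", fn))
        return self


def apply_steps(steps: List[Step], records: Iterable) -> list:
    """Apply each map/filter step in order to a batch of records."""
    out = list(records)
    for kind, fn in steps:
        if kind == "map":
            out = [fn(r) for r in out]
        else:
            out = [r for r in out if fn(r)]
    return out


class LocalRunner:
    """Runs the whole pipeline over the full input in one pass."""

    def run(self, pipeline: Pipeline, records: Iterable) -> list:
        return apply_steps(pipeline.steps, records)


class ChunkedRunner:
    """Stand-in for a distributed engine: processes the input chunk by chunk."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size

    def run(self, pipeline: Pipeline, records: Iterable) -> list:
        records = list(records)
        results: list = []
        for start in range(0, len(records), self.chunk_size):
            chunk = records[start:start + self.chunk_size]
            results.extend(apply_steps(pipeline.steps, chunk))
        return results


# The pipeline is written once, with no reference to any runner.
pipeline = (
    Pipeline()
    .map(lambda word: word.strip().lower())
    .filter(lambda word: len(word) > 3)
)

words = [" Spark ", "Hadoop", "SQL", " Dataflow"]
local_result = LocalRunner().run(pipeline, words)
chunked_result = ChunkedRunner(chunk_size=2).run(pipeline, words)
```

Because the steps here are element-wise, both runners produce the same output for the same pipeline. Aggregations that need to see the whole dataset would require a shuffle or combine stage, which this sketch does not attempt.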
2014 - MapR partners with Teradata to reach enterprise customers
The last independent Hadoop provider MapR and big data analytics provider Teradata announced that they will work together to integrate and co-develop their joint products and to create a unified go-to-market strategy. Teradata will also be able to resell MapR software and professional services, and to provide customer support. In other words, Teradata will be the face of MapR to enterprises that use, or want to use, both technologies. Until recently Teradata partnered most closely with Hortonworks, but now it’s sharing love and its analytic market leadership with all three providers. Similarly, earlier this week, HP announced Vertica for SQL on Hadoop, which allows users to access and explore data residing in any of the three primary Hadoop distros: Hortonworks, MapR, Cloudera.