Google Cloud Dataproc is #20 in Top 23 Big Data platforms

Last updated: January 02, 2020
Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig, and Hive service designed to process large datasets easily and cost-effectively. You can quickly create managed clusters of any size and turn them off when you are finished, so you pay only for what you use. Cloud Dataproc is integrated with several Google Cloud Platform products, giving you access to a simple, powerful, and complete data processing platform.

The best alternative to Google Cloud Dataproc is Google Cloud Dataflow

Latest news about Google Cloud Dataproc

2015. Google launched new managed Big Data service Cloud Dataproc

Google is adding another product to its range of big data services on the Google Cloud Platform: Cloud Dataproc. The service sits between running the Spark data processing engine or the Hadoop framework directly on virtual machines and using a fully managed service like Cloud Dataflow, which lets you orchestrate your data pipelines on Google’s platform. Dataproc users will be able to spin up a Hadoop cluster in under 90 seconds (significantly faster than other services), and Google will charge only 1 cent per virtual CPU per hour in the cluster. That charge comes on top of the usual cost of running virtual machines and data storage, but you can add Google’s cheaper preemptible instances to your cluster to save a bit on compute costs. Billing is per minute, with a 10-minute minimum. Because Dataproc can spin up clusters this fast, users will be able to set up ad hoc clusters when needed, and because the service is managed, Google will handle the administration for them.
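To make the pricing arithmetic above concrete, here is a minimal Python sketch that estimates only the Dataproc surcharge (1 cent per virtual CPU per hour, billed per minute with a 10-minute minimum). The function and constant names are hypothetical, the figures are the ones quoted in the 2015 announcement and may no longer be current, and rounding partial minutes up to the next whole minute is an assumption. Virtual machine and storage costs are not included.

```python
import math

# Figures quoted in the 2015 announcement; current pricing may differ.
PREMIUM_PER_VCPU_HOUR_USD = 0.01
MINIMUM_BILLED_MINUTES = 10


def dataproc_premium_usd(total_vcpus: int, runtime_minutes: float) -> float:
    """Estimate the Dataproc surcharge in USD, excluding VM and storage costs.

    total_vcpus is the sum of virtual CPUs across all machines in the cluster.
    """
    if total_vcpus < 0 or runtime_minutes < 0:
        raise ValueError("total_vcpus and runtime_minutes must be non-negative")
    # Assumption: partial minutes are rounded up to the next whole minute.
    billed_minutes = max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_minutes))
    return total_vcpus * PREMIUM_PER_VCPU_HOUR_USD * billed_minutes / 60


# A cluster with 40 vCPUs in total that runs for 45 minutes.
print(f"{dataproc_premium_usd(40, 45):.2f}")  # 0.30

# The same cluster running for 3 minutes is billed for the 10-minute minimum.
print(f"{dataproc_premium_usd(40, 3):.2f}")  # 0.07
```

Under these assumptions, a short ad hoc job on a mid-sized cluster adds only a few cents of Dataproc charges on top of the underlying compute and storage bill.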