Data Pipeline software

Updated: October 12, 2021

2021. Orchest raises $3.5M to provide a simpler way to build data pipelines



Orchest, the company building an open source integrated development environment for data scientists so they can develop, iterate on and deploy data pipelines without having to rely on an infrastructure or engineering team, has raised a $3.5 million seed round. Orchest aims to make this workflow more autonomous, so that data scientists don’t have to solve infrastructure issues themselves and can go from an initial idea to deployment in the same environment.


2021. Cribl raises $200M to help enterprises do more with their data



Cribl, the company developing an “open ecosystem of data” for enterprises that uses unified data pipelines, called “observability pipelines,” to parse and route any type of data that flows through a corporate IT system, has raised a $200 million Series C round. Cribl users can choose their own analytics tools and storage destinations, such as Splunk, Datadog and Exabeam, without becoming dependent on a single vendor. Cribl also lets users choose how they want to store their data, unlike competitors that often lock companies into using only their products. Instead, customers can buy the best products from different categories, and they will all talk to each other through Cribl.


2021. Meroxa raises $15M for its real-time data platform



Meroxa, a startup that makes it easier for businesses to build the data pipelines to power both their analytics and operational workflows, has raised a $15 million Series A. The promise of Meroxa is that businesses can use a single platform for their various data needs and won’t need a team of experts to build their infrastructure and then manage it. At its core, Meroxa provides a single software-as-a-service solution that connects relational databases to data warehouses and then helps businesses operationalize that data.


2021. No-code business intelligence service y42 raises $2.9M



Berlin-based y42, a data warehouse-centric business intelligence service that promises to give businesses access to an enterprise-level data stack that’s as simple to use as a spreadsheet, has raised $2.9 million in seed funding. The service, which was founded in 2020, integrates with more than 100 data sources, covering all the standard B2B SaaS tools, from Airtable to Shopify and Zendesk, as well as database services like Google’s BigQuery. Users can then transform and visualize this data, orchestrate their data pipelines and trigger automated workflows based on it.


2021. Iteratively raises $5.4M to help companies build data pipelines they can trust



As companies gather more data, ensuring that they can trust the quality of that data is becoming increasingly important. An analytics pipeline is only as good as the data it collects, after all, and messy data, or outright bugs, can easily lead to issues further down the line. Iteratively, a startup that wants to help businesses build data pipelines they can trust, has raised $5.4 million in seed funding. Iteratively focuses on event streaming data for product and marketing analytics, the kind of data that typically flows into Mixpanel, Amplitude or Segment. Iteratively itself sits at the origin of the data, say an app, where it validates the data and routes it to whatever third-party solution a company may use.
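The validate-then-route idea described above can be illustrated with a minimal Python sketch. This is a generic illustration, not Iteratively's actual API: the schema format, the event shape, and the names `SCHEMAS`, `validate` and `route` are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical schema registry: event name -> required properties and types.
SCHEMAS = {
    "signup_completed": {"user_id": str, "plan": str},
}


def validate(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event is valid."""
    schema = SCHEMAS.get(event.get("name"))
    if schema is None:
        return [f"unknown event: {event.get('name')!r}"]
    props = event.get("properties", {})
    errors = []
    for key, expected_type in schema.items():
        if key not in props:
            errors.append(f"missing property: {key}")
        elif not isinstance(props[key], expected_type):
            errors.append(f"wrong type for property: {key}")
    return errors


def route(event: dict, destinations: list[Callable[[dict], None]]) -> list[str]:
    """Forward the event to every destination only if it passes validation."""
    errors = validate(event)
    if not errors:
        for send in destinations:
            send(event)
    return errors
```

In this sketch each destination is just a callable that accepts the event, standing in for whichever analytics tool receives it. An invalid event is never forwarded, and the caller gets the list of problems back instead.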


2020. Avo raises $3M for its analytics governance platform



Avo, a startup that helps businesses better manage their data quality across teams, has raised a $3 million seed round. Avo gives developers, data scientists and product managers a shared workspace to develop and optimize their data pipelines. Good product analytics is the product of collaboration between these cross-functional groups of stakeholders, and the goal of Avo is to give these groups a platform for their analytics planning and governance — and to set company-wide standards for how they create their analytics events.


2020. Data dashboard startup Count raises $2.4M



Count, a startup attempting to create an all-in-one data platform that gives early-stage teams tools to build data pipelines more cheaply, has raised $2.4 million. Count gives teams a way to pull all their data together and build reports for the whole team. Its notebooks are a way to share insights in context and let team members query data without having to learn SQL. Count competes with a number of different solutions, including data warehouses such as Snowflake, data cleaning tools like dbt, and analytics platforms like Looker.


2015. Google launched new managed Big Data service Cloud Dataproc



Google is adding another product to its range of big data services on the Google Cloud Platform: Cloud Dataproc. The service sits between managing the Spark data processing engine or Hadoop framework directly on virtual machines and a fully managed service like Cloud Dataflow, which lets you orchestrate your data pipelines on Google’s platform. Dataproc users will be able to spin up a Hadoop cluster in under 90 seconds, significantly faster than other services, and Google will only charge 1 cent per virtual CPU per hour in the cluster. That’s on top of the usual cost of running virtual machines and data storage, but you can add Google’s cheaper preemptible instances to your cluster to save a bit on compute costs. Billing is per-minute, with a 10-minute minimum. Because Dataproc can spin up clusters this fast, users will be able to set up ad-hoc clusters when needed, and because the service is managed, Google will handle the administration for them.
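The pricing described above can be worked through with a short Python sketch. It covers only the Dataproc surcharge of 1 cent per virtual CPU per hour, not the underlying virtual machine or storage costs. The function name is hypothetical, and rounding partial minutes up to the next whole minute is an assumption; the text states only that billing is per-minute with a 10-minute minimum.

```python
import math

PREMIUM_USD_PER_VCPU_HOUR = 0.01  # 1 cent per virtual CPU per hour (from the text)
MINIMUM_BILLED_MINUTES = 10       # 10-minute minimum (from the text)


def dataproc_premium_usd(vcpus: int, runtime_minutes: float) -> float:
    """Estimate the Dataproc surcharge for one cluster run, in US dollars."""
    # Assumption: partial minutes are rounded up to the next whole minute.
    billed_minutes = max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_minutes))
    return vcpus * PREMIUM_USD_PER_VCPU_HOUR * billed_minutes / 60
```

Under these assumptions, a 16-vCPU cluster that runs for 5 minutes is billed for the 10-minute minimum, roughly $0.027, while an 8-vCPU cluster that runs for a full hour incurs a surcharge of about $0.08.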