The Future of Data Integration

by Hassan Syyid

Feb 12th 2021

Cloud computing, big data, machine learning, data lakes, data warehouses: if you’ve been following the tech world, you’ve no doubt heard these buzzwords. These trends and the resulting technologies have changed the world and continue to unearth new opportunities for innovation.

If you looked at the face of data integration 15 years ago, when Talend, now a behemoth in the space, launched Talend Open Studio, the words that came to mind were “drag and drop” interface, SQL-based, on-premise, and Windows-native. Since then, things have changed dramatically.

“We have observed an industry transition to cloud-based technologies and a decrease in on-premise big data application adoption … our future success depends on the growth and expansion of such market and our ability to adapt and respond effectively to [it].” (Talend, 2020)

A new era

The tools that are disrupting legacy integration solutions from the likes of Talend and Informatica are very different. For starters, they’re cloud-based rather than on-premise, and web apps rather than desktop software. They also benefit from robust transformation tools like dbt, and they leverage the capacity of data warehouses and data lakes like Snowflake to consolidate more data than ever before.

These new tools are attractive largely because business data is becoming so spread out. New SaaS platforms that help businesses manage leads, sales, invoicing, billing, advertising, investing, user analytics, and more are appearing at a rapid pace. The modern business analyst is tasked with consolidating this data efficiently and drawing useful insights that influence business decisions, and these tools deliver.

A brief overview

To give you some background, let’s discuss some of the modern players in the data integration space.

StitchData

StitchData created a platform that is focused solely on moving data from these SaaS platforms into data warehouses. The company was eventually acquired by Talend in 2018, as part of Talend’s greater efforts to penetrate the new cloud-based market.

The genius of Stitch was appealing to both analysts and developers. They started an open-source initiative called Singer, which introduced a standard spec for building taps (connectors that extract data from platforms like CRMs, ERPs, and more) and targets (connectors that load that data into a destination) in Python. In recent years, the Python stack has become known for its use in machine learning and data cleansing (for instance, packages like pandas, PySpark, and Dask), so it makes sense that the taps designed to obtain the data use the same stack and appeal to the same developers.
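To make the tap and target idea concrete, here is a minimal sketch of the message flow the Singer spec describes: a tap writes SCHEMA, RECORD, and STATE messages as one JSON object per line, and a target reads them. The `contacts` stream, its fields, the `last_id` bookmark, and the `run_tap` and `run_target` helpers are all hypothetical names chosen for illustration. Real taps would typically use a Singer library and write to stdout rather than an in-memory buffer.

```python
import io
import json


def run_tap(rows, out):
    """Emit Singer-style SCHEMA, RECORD, and STATE messages, one JSON object per line."""
    schema_message = {
        "type": "SCHEMA",
        "stream": "contacts",  # hypothetical stream name
        "key_properties": ["id"],
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "email": {"type": "string"},
            },
        },
    }
    out.write(json.dumps(schema_message) + "\n")

    last_id = None
    for row in rows:
        record_message = {"type": "RECORD", "stream": "contacts", "record": row}
        out.write(json.dumps(record_message) + "\n")
        last_id = row["id"]

    # STATE carries a bookmark so a later run could resume where this one stopped.
    state_message = {"type": "STATE", "value": {"contacts": {"last_id": last_id}}}
    out.write(json.dumps(state_message) + "\n")


def run_target(lines):
    """Read Singer-style messages; collect records per stream and the latest state."""
    tables, state = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message["type"] == "SCHEMA":
            tables.setdefault(message["stream"], [])
        elif message["type"] == "RECORD":
            tables.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return tables, state


if __name__ == "__main__":
    sample_rows = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": "b@example.com"},
    ]
    buffer = io.StringIO()
    run_tap(sample_rows, buffer)
    tables, state = run_target(buffer.getvalue().splitlines())
    print(tables)
    print(state)
```

Because the two sides only share a line-oriented JSON format, any tap can in principle be paired with any target, which is what makes community-maintained connectors reusable.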

The idea was that everyone can benefit from well-maintained taps to all these platforms. After all, what single organization wants to devote the manpower to maintain hundreds of these taps as their APIs and schemas change?

For their enterprise offering, Stitch offers a nice web interface and an API that enables business analysts and developers alike to leverage these taps through Stitch’s infrastructure.

Fivetran and Xplenty

Tools like Fivetran and Xplenty take a slightly different approach and home in on the business analyst market. Both platforms boast hundreds of pre-built, proprietary connectors maintained by their own teams, a robust transformation layer, and highly scalable infrastructure to sync data from these connectors to your data warehouse.

These platforms appeal to the technical business analyst who needs to consolidate data and draw insights. Fivetran emphasizes pre-normalized data with the added benefit of in-warehouse, SQL-based transformations via dbt. Xplenty, by contrast, offers a UI-based transformation system reminiscent of Talend Data Studio, but with the scalability that cloud computing makes possible.
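As a rough illustration of what an in-warehouse, SQL-based transformation means, the sketch below loads raw rows untouched and then derives a summary table with SQL inside the database itself. It is not how Fivetran or dbt are implemented: SQLite stands in for a cloud warehouse, and the `raw_invoices` table, its columns, and the `load_and_transform` helper are hypothetical.

```python
import sqlite3


def load_and_transform(raw_invoices):
    """Load raw rows as-is, then build a summary table with SQL inside the database."""
    conn = sqlite3.connect(":memory:")  # SQLite is a stand-in for a real warehouse
    conn.execute(
        "CREATE TABLE raw_invoices (customer TEXT, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_invoices VALUES (?, ?, ?)", raw_invoices)

    # The transformation runs where the data already lives, not in the pipeline code.
    conn.execute(
        """
        CREATE TABLE revenue_by_customer AS
        SELECT customer, SUM(amount_cents) / 100.0 AS paid_revenue
        FROM raw_invoices
        WHERE status = 'paid'
        GROUP BY customer
        """
    )
    rows = conn.execute(
        "SELECT customer, paid_revenue FROM revenue_by_customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = [
        ("acme", 1000, "paid"),
        ("acme", 2500, "paid"),
        ("globex", 400, "void"),
        ("globex", 1200, "paid"),
    ]
    print(load_and_transform(sample))
```

The design point is that the pipeline only has to move data; the modelling logic stays in SQL that analysts can read and change.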

Meltano

Unfortunately, after Talend’s acquisition of Stitch, the Singer project lost direction and many taps fell out of maintenance. Luckily, GitLab funded the Meltano open-source project, which aims to pick up where Singer left off. Meltano wants to provide analysts with the tools necessary to host, create, and run data integration pipelines on their own. Its team agrees with Singer’s initial mission and is working on SDKs to make creating high-quality Singer taps easier.

Unlike Stitch, they want to fully realize the idea of decentralized, open-source taps that every organization can use and contribute to. They aim to find maintainers outside of the Meltano staff: people who already use these Singer taps and therefore have a motivation to keep them maintained. Several consultancies and developers have already stepped up to keep these taps working for the whole community.

Airbyte

Airbyte is building another fast-growing open source platform to solve the issues in data integration. Much like Meltano, their aim is to commoditize data integration and offer a self-hosted alternative to tools like Fivetran. The company is focused on expanding their open-source platform and community, and aims to become a new standard in the market.

A different perspective: what about the SaaS platforms?

Interestingly, the same problems these business analysts face while trying to consolidate their business data from all the CRMs, ERPs, and billing platforms are faced by the developers behind these SaaS platforms.

In fact, these SaaS platforms have begun to differentiate their own products from competitors by the integrations they support. For example, an accounting reporting platform that supports importing your invoices directly from QuickBooks is more likely to gain customers than one that requires manually uploading invoices as CSV files.

Hold on! We just talked about all these different tools that help analysts create pipelines for their data using all these fancy new tech trends. Can’t those help these developers too?

To an extent, yes. In fact, the APIs behind products like Stitch and Airbyte are offered as solutions to these developers. The problem is that developers often need more control over integrations than analysts do, so they often end up building and maintaining these integration pipelines themselves.

To us, that seemed like a waste of all the new infrastructure and tooling that’s already helping business analysts. Why not leverage the same software to help these developers too? That’s why we started hotglue — a lightweight tool focused on helping developers build integrations for their SaaS platforms that leverage these new trends.

Conclusion

The data integration space is massive. There are many different reasons to consolidate your business data, and still more tools to help you do it. One thing is clear though — the best tools are leveraging the latest data engineering trends and appealing to an increasingly technical audience.

It seems that the future of data integration is a market with many more niche players, and a renewed reliance on the open source community to scale with all the new SaaS platforms and use cases.

Thanks for reading!