We’re super excited to embrace DuckDB at Omni. DuckDB has taken the analytical data world by storm, and Data Twitter is buzzing about the wide range of interesting applications. We are particularly happy to be using DuckDB as a query execution engine for Business Intelligence.
DuckDB is a new open-source OLAP database that runs in-process, similar to SQLite (the most widely deployed OLTP database). Because it runs embedded, it is very easy to package and integrate. It’s blowing away comparable implementations on performance and, in general, pushing the boundaries of modern database technology.
What makes DuckDB so appealing for Business Intelligence and analytics?
⚡ Cost Performance: DuckDB offers best-in-class cost performance for certain analytical workloads. For smaller (and it’s all relative) datasets and workloads, DuckDB can outperform scale-out, cloud-native data warehouses like Snowflake, BigQuery and Databricks. As a simple demonstration, there have been prototypes running DuckDB against Parquet files on S3 in which a single node outperforms a 32-node Spark cluster. Similar benchmarks have shown how fast it can be in advanced analytical contexts such as data science in a Python or R notebook. DuckDB achieves this performance through its state-of-the-art columnar-vectorized query execution engine.
🏹 Integration: DuckDB has a native integration with Apache Arrow, an in-memory columnar data format optimized for analytical libraries, which can drive significant performance gains in Business Intelligence use cases. Zero-copy data exchange means faster analytics with a smaller memory footprint.
In combination with DuckDB’s integration with Apache Arrow, one direct impact of running in-process is that DuckDB can be compiled to WebAssembly (WASM), deployed, and run in modern web browsers (for an example, see DuckDB WASM). DuckDB in combination with Omni allows for performance like that of legacy desktop applications built on data extracts. For example, DuckDB uses ART (Adaptive Radix Tree) indexes, as does Tableau’s proprietary Hyper engine.
By integrating DuckDB into Omni’s platform, we can use the underlying semantic data model to power a requery-able cache. Queries that only require data from the previously cached dataset, e.g. filtering down a result set, re-aggregating metrics at a higher level, or even simply re-sorting, can leverage DuckDB’s incredible performance without sacrificing the accuracy of the data. Omni’s platform allows for the performance and ease of workbook analytics with the scale and power of modern cloud-native data warehouses.
❤️ 🦆 In short, DuckDB represents a powerful new analytical engine that can be leveraged in a variety of contexts but has a direct application to power performant BI. We’re excited to embrace and support DuckDB!
Not to mention, I absolutely love the delightful improvements to the nearly 50-year-old SQL language.
Let us know if you’d like to give it a try! We look forward to your feedback.