Financial services organizations are increasingly adopting Delta Parquet, an open-source, column-oriented file format designed to significantly cut costs and processing time related to large datasets utilized in analytics, artificial intelligence (AI), and machine learning (ML) workflows.
Building on the established Parquet file format, Delta Parquet enhances its columnar storage functionality with features such as transaction logs, schema enforcement, and performance optimizations. This combination imbues the data layer with database-like intelligence, making the management of extensive financial datasets faster and more economical.
Recently, LSEG Data & Analytics explored the technology’s impact on innovation and efficiency. According to their findings, the potential benefits are substantial.
A typical 1TB CSV file can be reduced to approximately 130GB when converted to Delta Parquet, an 87% decrease in size. Query execution time for the same dataset drops from 236 seconds to 6.78 seconds, roughly 35 times faster. The compute cost per query falls from $5.75 to about $0.01, a reported reduction of 99.7%.
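The percentages follow directly from the quoted figures, and a short calculation makes that easy to check. The sketch below is only a sanity check on the published numbers, not part of LSEG's analysis; it assumes 1TB means 1,000GB, and the variable and function names are illustrative.

```python
# Figures as quoted in the article; 1 TB is taken as 1000 GB (an assumption).
csv_size_gb = 1000.0
delta_size_gb = 130.0
csv_query_seconds = 236.0
delta_query_seconds = 6.78
csv_cost_usd = 5.75
delta_cost_usd = 0.01


def percent_reduction(before: float, after: float) -> float:
    """Return the percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


size_reduction = percent_reduction(csv_size_gb, delta_size_gb)
speedup = csv_query_seconds / delta_query_seconds
cost_reduction = percent_reduction(csv_cost_usd, delta_cost_usd)

print(f"Size reduction: {size_reduction:.1f}%")   # 87.0%
print(f"Query speedup:  {speedup:.1f}x")          # 34.8x
print(f"Cost reduction: {cost_reduction:.1f}%")   # 99.8%
```

The rounded dollar figures give a cost reduction of about 99.8%, close to the quoted 99.7%; the small gap presumably reflects rounding in the published per-query costs.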
These efficiencies arise from several integrated technical features. Columnar storage organizes data by column rather than by row, which suits analytical queries that typically read only a subset of the available columns. Because each column holds values of a single type, compression and encoding can be applied column by column, further reducing file sizes.
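A toy example helps show why the columnar layout matters. The sketch below uses plain Python structures rather than the actual Parquet on-disk format, and the trade records, field names, and the `run_length_encode` helper are all hypothetical. It illustrates two points: a query on one column only needs that column's values, and a column of repeated values compresses well under a simple scheme such as run-length encoding.

```python
from itertools import groupby

# Hypothetical trade records in a row-oriented layout (one dict per row).
rows = [
    {"symbol": "ABC", "venue": "XLON", "price": 101.5},
    {"symbol": "ABC", "venue": "XLON", "price": 101.7},
    {"symbol": "ABC", "venue": "XLON", "price": 101.6},
    {"symbol": "XYZ", "venue": "XLON", "price": 55.2},
]

# The same data in a column-oriented layout (one list per column).
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytical query such as "average price" touches only one column.
prices = columns["price"]
average_price = sum(prices) / len(prices)


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Values within a column are often repetitive, so they encode compactly.
encoded_symbols = run_length_encode(columns["symbol"])
encoded_venues = run_length_encode(columns["venue"])
print(encoded_symbols)  # [('ABC', 3), ('XYZ', 1)]
print(encoded_venues)   # [('XLON', 4)]
```

In a row-oriented file such as CSV, the same repeated values are interleaved with prices on every line, so neither the selective read nor the compact encoding is available in this form.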
The format also supports schema evolution, letting users add, remove, or modify columns without rewriting entire datasets, an essential capability in rapidly changing data environments. Row group statistics let query engines skip irrelevant data segments when filters are applied, and the file structure supports parallel processing across distributed systems such as Apache Spark and Hadoop.
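The way row group statistics allow data to be skipped can be sketched in a few lines. This is a simplified model in plain Python, not the Parquet reader itself: the `RowGroup` class, the `scan` function, and the timestamp values are hypothetical, and the only assumption carried over from the format is that each row group records the minimum and maximum value of a column.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """A chunk of rows plus min/max statistics for one column."""
    timestamps: list
    min_ts: int
    max_ts: int


def make_row_group(timestamps):
    """Build a row group and record its min/max statistics."""
    return RowGroup(timestamps, min(timestamps), max(timestamps))


row_groups = [
    make_row_group([100, 105, 110]),
    make_row_group([200, 210, 220]),
    make_row_group([300, 305, 310]),
]


def scan(groups, lower, upper):
    """Return matching values and the number of row groups actually read."""
    matches, groups_read = [], 0
    for group in groups:
        # Skip the group if its value range cannot overlap the filter.
        if group.max_ts < lower or group.min_ts > upper:
            continue
        groups_read += 1
        matches.extend(ts for ts in group.timestamps if lower <= ts <= upper)
    return matches, groups_read


matches, groups_read = scan(row_groups, 205, 215)
print(matches, groups_read)  # [210] 1
```

Only one of the three row groups is read for this filter. On sorted or clustered data such as tick history, the same idea lets an engine ignore most of a large file when a query targets a narrow time window.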
Delta Parquet works across platforms and programming languages, which allows integration with multiple systems without vendor lock-in. The London Stock Exchange Group (LSEG) has made several of its flagship data products available in Delta Parquet format via AWS, including Quantitative Analytics and Tick History data.
Organizations utilizing these datasets gain access to granular point-in-time and tick-level data suitable for backtesting, research, advanced transaction cost analysis, and compliance with regulations, including the Fundamental Review of the Trading Book (FRTB).
