A Financial Distributed Database based on Lakehouse Technology

SequoiaDB - Service Scenarios for the Real-Time Data Lake

With the rapid development of the mobile Internet, application scenarios in the financial and banking industry are constantly diversifying, and explosive data growth has brought increasingly diverse data types. With technical features such as multi-model support and flexible scaling, data lakes have become enterprises’ new choice for building digital infrastructure. However, data lakes cannot handle highly concurrent real-time processing, so providing highly concurrent, real-time online processing over the full data set has become a focus for enterprises.

In response, the concept of the Real-Time Data Lake came into being. Compared with the "schema-on-read" approach of the data lake, the Real-Time Data Lake can directly provide highly concurrent SQL access and, in particular, can build indexes on structured and semi-structured data to support end-user (terminal-facing) services with massive real-time concurrency. Compared with a traditional ODS, the Real-Time Data Lake offers better elastic scaling and concurrency and is better suited to enterprises in digitized industries that need to give customers direct access to their full data.
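The contrast between schema-on-read and index-backed access can be illustrated with a minimal Python sketch. It is not SequoiaDB code: the sample records, field names, and function names are hypothetical, and a plain in-memory dictionary stands in for a database index.

```python
import json
from collections import defaultdict

# Hypothetical raw records stored as untyped JSON lines, as in a plain data lake.
RAW_LINES = [
    '{"account": "A-1", "amount": 120.0}',
    '{"account": "B-7", "amount": 35.5}',
    '{"account": "A-1", "amount": 9.9, "channel": "mobile"}',
]


def schema_on_read_lookup(lines, account):
    """Parse and filter every record at query time (a full scan per query)."""
    return [rec for rec in map(json.loads, lines) if rec.get("account") == account]


def build_index(lines, field):
    """Parse each record once at write time and index it by one field."""
    index = defaultdict(list)
    for line in lines:
        rec = json.loads(line)
        if field in rec:  # semi-structured data: the field may be absent
            index[rec[field]].append(rec)
    return index


def indexed_lookup(index, key):
    """Answer the same query with a single hash lookup instead of a scan."""
    return index.get(key, [])


ACCOUNT_INDEX = build_index(RAW_LINES, "account")
```

Both lookups return the same records for a given account. The difference is that the scan cost of the first grows with the total data volume on every query, while the second pays the parsing cost once, which is the property that matters under many concurrent queries.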

The Real-Time Data Lake built on the SequoiaDB distributed database effectively combines the features of the traditional operational data store (ODS) and the data lake. In addition to data flexibility, it offers better support for concurrency and horizontal scaling. It can serve real-time SQL queries over trillions of records under more than 10,000 concurrent connections, helping enterprises build historical data platforms, full data platforms, real-time data middle platforms, and similar systems, and enabling service scenarios that involve real-time access to massive data.

Technical Features

Engine-Level Multi-Model Support

Supports multi-model processing across structured and semi-structured data, helping enterprises reduce migration risk, lower the learning cost for R&D personnel, and improve migration efficiency.

Real-Time Data Query

The distributed architecture supports streaming the full data set into the data lake, with the latency between the application and data access kept to within seconds, and provides 24/7 integrated real-time data processing for various businesses.

Online Elastic Scaling

Provides hundreds of petabytes of storage capacity and supports online horizontal elastic scaling, making it easier to manage explosive data growth and data application scenarios of different scales and types.

Architecture

The architecture is built on SequoiaDB.

SequoiaDB with a Municipal Commercial Bank

Background

With the development of the Internet and the rise of new business forms such as online banking and mobile banking, the demands for data management and data applications in the banking industry are becoming more diverse and complex. To build a comprehensive management system for its overall data assets, a bank needs to continuously enrich and improve the services of its data middle platform, exploring the application value of data and empowering financial services through data products and services.

At present, the bank has an internal big data platform based on Hadoop, which is mainly used for internal analysis of massive data and for online queries from some business systems. However, with the launch of the new core banking system and the new credit card core system, the data volume of the core systems is expanding dramatically, and the big data platform can no longer fully support their query services.

Therefore, the bank urgently needs a unified high-performance data platform that meets the new business requirements and, at the same time, resolves the pain points of external online queries over the bank's full data.

Construction Logic

The historical data platform will provide massive data storage and highly concurrent data access for the various business systems within the bank. Each source-side business system can migrate its historical data to the platform through an application and configuration process, after which the platform provides efficient data access for that system.

Data synchronization is performed through a data exchange platform, which migrates data from the data warehouse or peripheral systems to the historical data platform. Synchronization is handled in a unified yet flexible manner and supports T+1 synchronization, and each business system can adopt the synchronization method that fits its own needs.
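As a rough illustration of T+1 synchronization with per-system methods, the following Python sketch selects which records a job running on day T+1 would load. Everything in it is an assumption made for illustration: the system names, the "incremental" and "full" modes, the `business_date` field, and the use of calendar days rather than a banking business-day calendar. It does not describe the bank's actual data exchange platform.

```python
from datetime import date, timedelta

# Hypothetical per-system settings; the system names and mode names are
# illustrative assumptions, not SequoiaDB or data exchange platform options.
SYNC_MODES = {"core_banking": "incremental", "credit_card": "full"}


def t_plus_1_business_date(run_date: date) -> date:
    """Return day T for a job that runs on day T+1 (calendar days assumed)."""
    return run_date - timedelta(days=1)


def select_for_sync(records, run_date, mode):
    """Pick the records a T+1 job should load under the given mode."""
    cutoff = t_plus_1_business_date(run_date)
    if mode == "incremental":
        # Load only the records produced on day T.
        return [r for r in records if r["business_date"] == cutoff]
    if mode == "full":
        # Reload everything up to and including day T.
        return [r for r in records if r["business_date"] <= cutoff]
    raise ValueError(f"unknown sync mode: {mode}")
```

A scheduler could call `select_for_sync(records, date.today(), SYNC_MODES["core_banking"])` once per day for each source system. Records dated on the run day itself are excluded in both modes, since they belong to the next day's batch.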

Construction Results