Design and Implementation of a Real-Time Data Platform for the Stock Market Integrating Anomaly Detection and MLOps
Keywords:
Apache Flink, Apache Kafka, Bayesian Online Change Point Detection, ClickHouse, data engineering, real-time data, MLOps, ScyllaDB, stock market
Abstract
This paper presents the design and implementation of a real-time data platform for stock market applications, built on modern data engineering principles and machine learning lifecycle management. Financial market data is generated continuously, changes rapidly, and demands low-latency processing, so traditional batch processing systems struggle to deliver timely information for market monitoring and analysis. The proposed architecture follows an event-driven design: stock price data is collected from external sources, transmitted through Apache Kafka, processed by Apache Flink and dedicated workers, and stored in ClickHouse and ScyllaDB to support both analytical queries and low-latency access. Alongside the stock price data flow, the system integrates financial news collection, sentiment analysis, anomaly detection using Bayesian Online Change Point Detection, and a Whale ML model that forecasts trends in the trading sessions following significant market movements. Model training, experiment tracking, model version management, and inference serving are standardized according to MLOps practices through Airflow, MLflow, and MinIO. Experimental deployment results show that the system can process data in near real time, expose APIs and visual dashboards, detect abnormal signals, and support predictive model operation within a unified architecture.
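The abstract names Bayesian Online Change Point Detection as the anomaly detection method but does not specify the observation model or hazard function. As an illustration only, the sketch below shows the standard run-length recursion under assumptions that are not taken from the paper: Gaussian observations with known variance, a Normal prior on the segment mean, and a constant hazard rate. The function name `bocpd` and all of its parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a Normal(mean, var) distribution evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd(data, hazard=0.01, prior_mean=0.0, prior_var=100.0, obs_var=1.0):
    """Return the most probable run length after each observation.

    Assumed model (illustrative, not from the paper): each segment has an
    unknown mean with a Normal(prior_mean, prior_var) prior, observations
    have known variance obs_var, and a change point occurs at every step
    with constant probability `hazard`.
    """
    run_probs = [1.0]        # P(run length = r | data so far), r = 0, 1, ...
    means = [prior_mean]     # posterior mean of the segment mean, per run length
    variances = [prior_var]  # posterior variance of the segment mean, per run length
    map_run_lengths = []

    for x in data:
        # Predictive probability of x under each run-length hypothesis.
        pred = [gaussian_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]

        # Growth: the current run continues. Change point: the run resets to 0.
        growth = [p * lik * (1.0 - hazard) for p, lik in zip(run_probs, pred)]
        change = sum(p * lik * hazard for p, lik in zip(run_probs, pred))

        new_probs = [change] + growth
        total = sum(new_probs)
        run_probs = [p / total for p in new_probs]

        # Conjugate Normal update for every hypothesis that absorbed x;
        # run length 0 restarts from the prior.
        new_means, new_vars = [prior_mean], [prior_var]
        for m, v in zip(means, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
            new_means.append(post_var * (m / v + x / obs_var))
            new_vars.append(post_var)
        means, variances = new_means, new_vars

        # Index of the largest entry is the most probable run length.
        map_run_lengths.append(max(range(len(run_probs)), key=run_probs.__getitem__))

    return map_run_lengths
```

On a synthetic series with an abrupt level shift, the most probable run length grows by one per observation and drops back to a small value immediately after the shift, which is the signal a downstream worker could flag as an anomaly. This version stores every run-length hypothesis, so memory grows linearly with the stream; a streaming deployment would typically prune low-probability run lengths.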