Jiantao Bao, Zhao Li, Xuerong Zhang, Yingbin Deng, Xiaofang Li, Xiaoyan Peng, Renrong Chen, Yiwen Jia, Tong Li, Yan Deng, Ji Yang, Xiwen Wu
Accepted: 2025-03-27
Rapid societal and economic development, together with industrial, agricultural, and other human activities, produces large amounts of sewage rich in nitrogen, phosphorus, and other nutrients. This causes eutrophication in lakes and reservoirs, leading to frequent algal blooms that pose a serious threat to aquatic ecosystem stability and drinking water safety. Chlorophyll-a (Chl-a) concentration, a core indicator of algal biomass and the degree of eutrophication in water bodies, correlates positively with algal bloom outbreaks. However, Chl-a concentration varies in a highly nonlinear way under the influence of multiple factors. Traditional prediction models struggle to capture the relationships among environmental factors, and generally yield low prediction accuracy and weak applicability. To address this challenge, we proposed a short-term Chl-a concentration prediction method based on multi-scene segmentation, and constructed a prediction model with enhanced adaptability that recognizes the characteristic patterns of different environmental scenarios to improve the accuracy of Chl-a concentration prediction. By analyzing how the various factors exert their influence, we proposed three partitioning strategies: 1) factor interaction scene partitioning, which analyzes the connections between key environmental factors and uses the K-means method to partition the scenes; 2) diurnal difference scene partitioning, which is based on the diurnal cyclicity of algal physiological activities and divides the data into two scenarios (daytime and nighttime); and 3) trophic state scene partitioning, which is based on the Trophic Level Index and divides a water body into three trophic categories: oligotrophic, mesotrophic, and eutrophic. Three machine learning models using multi-scenario classification (Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and eXtreme Gradient Boosting (XGBoost)) and a linear regression model were each adapted to perform short-term prediction of Chl-a concentration.
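As an illustration only, the diurnal and trophic state partitioning strategies described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the study's implementation: the day/night boundary hours, the Trophic Level Index (TLI) cut-offs of 30 and 50, the record field names, and the sample records are all assumptions introduced here. The factor interaction strategy, which relies on K-means clustering, is not reproduced in this sketch.

```python
from collections import defaultdict
from datetime import datetime

# Assumed boundaries; the abstract does not state the values actually used.
DAY_START_HOUR = 6    # hypothetical start of the daytime scenario
DAY_END_HOUR = 18     # hypothetical end of the daytime scenario
TLI_OLIGO_MAX = 30.0  # assumed: TLI below this is oligotrophic
TLI_MESO_MAX = 50.0   # assumed: TLI up to this is mesotrophic


def diurnal_scene(timestamp):
    """Label an observation time as 'day' or 'night'."""
    in_daytime = DAY_START_HOUR <= timestamp.hour < DAY_END_HOUR
    return "day" if in_daytime else "night"


def trophic_scene(tli):
    """Map a Trophic Level Index value to one of three trophic categories."""
    if tli < TLI_OLIGO_MAX:
        return "oligotrophic"
    if tli <= TLI_MESO_MAX:
        return "mesotrophic"
    return "eutrophic"


def partition(records, key_fn):
    """Group records into scenarios according to the label returned by key_fn."""
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record)
    return dict(groups)


# Made-up records for demonstration only (not data from the study).
records = [
    {"time": datetime(2024, 7, 1, 9, 0), "tli": 28.0, "chla": 0.004},
    {"time": datetime(2024, 7, 1, 14, 0), "tli": 45.0, "chla": 0.006},
    {"time": datetime(2024, 7, 1, 22, 0), "tli": 62.0, "chla": 0.009},
    {"time": datetime(2024, 7, 2, 3, 0), "tli": 50.0, "chla": 0.005},
]

by_diurnal = partition(records, lambda r: diurnal_scene(r["time"]))
by_trophic = partition(records, lambda r: trophic_scene(r["tli"]))
```

Each resulting group could then be used to fit a separate prediction model, which is the general idea behind scenario-specific modeling.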
The multi-scenario partitioning strategy proposed in this study substantially improved model prediction performance. Factor interaction scenario partitioning yielded the best prediction results, with an overall average root mean square error (RMSE) of 0.0045, improving prediction accuracy by 4.26% compared with the unpartitioned case. The overall improvement in prediction accuracy from diurnal and nocturnal scenario partitioning was limited: its overall average RMSE was 0.00474, improving prediction accuracy by 0.9%. In the eutrophic scenario, the four prediction models (RF, GBDT, XGBoost, and linear regression) had RMSEs of 0.0034, 0.0036, 0.0035, and 0.0039, respectively, with the RF model giving the highest prediction accuracy. In summary, we propose an innovative short-term prediction model that improves the accuracy of low Chl-a concentration predictions in complex situations, providing a new paradigm for intelligent modeling and precise governance. The results obtained with the multi-scenario delineation system revealed the dynamic coupling mechanisms among the diurnal biological rhythms in a water body, nutrient grading, and the interaction of water quality factors. This approach addresses the problem that traditional prediction models cannot meet current prediction needs. The study provides a systematic analysis for the prediction of cyanobacterial blooms, as well as a technical reference and theoretical support for multi-scenario prediction of Chl-a concentration. It not only clarified the patterns that drive Chl-a concentration change under different scenarios, but also helped simplify the analysis of complex water environment problems, providing a new perspective for research on the mechanisms of complex water environment systems.
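The evaluation metric reported above can be sketched as follows. This is a minimal illustration, not the study's evaluation code: the `rmse` function is the standard root mean square error, while `relative_improvement` assumes that the reported accuracy gains are relative RMSE reductions against the unpartitioned baseline, which the abstract does not state explicitly. The four eutrophic-scenario RMSE values are the ones reported above.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline.

    Assumption: this is one plausible reading of "improving prediction
    accuracy by X%"; the abstract does not give the formula.
    """
    return (baseline_rmse - new_rmse) / baseline_rmse * 100.0


# RMSE values reported for the eutrophic scenario.
eutrophic_rmse = {
    "RF": 0.0034,
    "GBDT": 0.0036,
    "XGBoost": 0.0035,
    "Linear regression": 0.0039,
}

# The model with the lowest RMSE has the highest prediction accuracy.
best_model = min(eutrophic_rmse, key=eutrophic_rmse.get)
```

Selecting the minimum RMSE in this way reproduces the abstract's statement that RF performed best in the eutrophic scenario.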