Article Title
Chlorophyll a Predictions in a Piedmont Lake in Upstate South Carolina Using Machine-Learning Approaches
Abstract
Freshwater systems are often breeding grounds for harmful algal blooms (HABs), which are most prevalent in ponds and lakes because of the conditions that prevail in those water bodies. The monitoring, modeling, and management of HABs therefore require knowledge of the complex interrelationships among the factors that influence HABs and of their detrimental effects on the ecosystem. High concentrations of chlorophyll a are often used as an indicator of algal blooms in water bodies. Generally, water samples are collected in the field, their chlorophyll a concentration is measured in a laboratory, and the result is compared with water quality standards to indicate the potential presence or absence of an algal bloom. Numerical water quality models can help identify some of the critical environmental conditions that affect HABs and support their effective management. However, advances in sensor technologies, together with the numerous inputs such models require, the uncertainty in their predictions, and the complexity of HAB ecosystems, encourage the application of emerging data-driven models. The current study used high-frequency water quality data and investigated two machine-learning algorithms, random forest (RF) and artificial neural network (ANN), to predict chlorophyll a concentrations in Boyd Millpond, a lake in Upstate South Carolina. Model performance was compared using the root mean square error (RMSE), the coefficient of determination (R²), and the correlation coefficient. The input water quality parameters were pH, specific conductivity, dissolved oxygen, saturated dissolved oxygen, temperature, oxidation-reduction potential (ORP), and turbidity, and chlorophyll a was the target variable. The results showed that RF performed better than ANN.
With all parameters as inputs, the RF model achieved an RMSE of 0.00013, an R² of 0.86, and a correlation coefficient of 0.93 during testing, compared with 0.00025, 0.74, and 0.86, respectively, for the ANN model during testing. The Least Absolute Shrinkage and Selection Operator (LASSO) was used for variable selection and identified pH and specific conductivity as the essential parameters. Upon further field validation, the broader outcome of this research will enable the timely detection of HABs, with chlorophyll a serving as a signal to instigate further tests, provide early warning for recreational activities and livestock protection, and initiate countermeasures to safeguard aquatic organisms.
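The three evaluation metrics named in the abstract can be computed as in the following minimal Python sketch, which uses only the standard library. The function names and the sample values are hypothetical illustrations, not the authors' code or Boyd Millpond data. The abstract does not say which definition of R² was used. The sketch implements the common 1 − SS_res/SS_tot form. The reported pairs are also consistent with R² being the square of the correlation coefficient (0.93² ≈ 0.86 and 0.86² ≈ 0.74), and the two definitions generally differ for out-of-sample predictions.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination in the 1 - SS_res / SS_tot form."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (sd_o * sd_p)


if __name__ == "__main__":
    # Hypothetical observed vs. predicted chlorophyll a values (illustrative only).
    observed = [0.0010, 0.0014, 0.0009, 0.0017, 0.0012]
    predicted = [0.0011, 0.0013, 0.0010, 0.0015, 0.0012]
    r = pearson_r(observed, predicted)
    print(f"RMSE = {rmse(observed, predicted):.5f}")
    print(f"R2 (1 - SS_res/SS_tot) = {r_squared(observed, predicted):.3f}")
    print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Printing both `r_squared` and `r * r` side by side makes it easy to check which convention a reported R² follows.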
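LASSO performs variable selection by adding an L1 penalty to a least-squares fit. This penalty drives the coefficients of weakly informative inputs exactly to zero, and the inputs that keep non-zero coefficients are the selected ones. The abstract does not state how LASSO was implemented or how its penalty strength was chosen. The following is therefore a minimal pure-Python coordinate-descent sketch under stated assumptions. The objective is assumed to be (1/2n)‖y − Xβ‖² + α‖β‖₁, which is the scaling used by scikit-learn's `Lasso`. The inputs are standardized first so that their coefficients are comparable. The value of `alpha`, the helper names, and the toy readings are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def soft_threshold(value: float, threshold: float) -> float:
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def standardize(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each feature column to zero mean and unit population variance.

    Assumes every column varies (a constant column would divide by zero).
    """
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        scaled.append([(v - mean) / std for v in col])
    return scaled


def lasso_coordinate_descent(columns, target, alpha, n_iter=1000, tol=1e-10):
    """Minimize (1/2n)||y - X b||^2 + alpha * ||b||_1 by cyclic coordinate descent.

    X is standardized and y is centered, so no intercept term is needed.
    """
    n = len(target)
    x = standardize(columns)
    y_mean = sum(target) / n
    residual = [v - y_mean for v in target]  # r = y - X b, with b = 0 initially
    beta = [0.0] * len(x)
    for _ in range(n_iter):
        max_change = 0.0
        for j, col in enumerate(x):
            # Correlation of feature j with the partial residual (feature j added back);
            # because the column has unit variance this equals (1/n) x_j . r + b_j.
            rho = sum(c * r for c, r in zip(col, residual)) / n + beta[j]
            new_b = soft_threshold(rho, alpha)
            delta = new_b - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_b
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def select_features(names, columns, target, alpha) -> Dict[str, float]:
    """Return the features whose LASSO coefficient is non-zero."""
    beta = lasso_coordinate_descent(columns, target, alpha)
    return {name: b for name, b in zip(names, beta) if b != 0.0}


if __name__ == "__main__":
    # Hypothetical toy readings, NOT data from Boyd Millpond.
    names = ["pH", "specific_conductivity", "turbidity"]
    ph = [7.1, 7.4, 7.9, 8.3, 8.8, 9.0]
    cond = [60.0, 62.0, 61.0, 65.0, 66.0, 68.0]
    turb = [5.0, 4.0, 6.0, 5.0, 4.0, 6.0]
    chl = [0.8 * p + 0.05 * c for p, c in zip(ph, cond)]
    print(select_features(names, [ph, cond, turb], chl, alpha=0.05))
```

In practice α would typically be tuned, for example by cross-validation. The features that keep non-zero coefficients at the chosen α would then be reported as the selected parameters.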