Date of Award

12-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Bioengineering

Committee Chair/Advisor

Reed Gurchiek

Committee Member

Jason Avedesian

Committee Member

John DesJardins

Committee Member

Lea Jenkins

Abstract

Ball tracking systems are becoming ubiquitous in sport, creating an unprecedented opportunity for big data applications to optimize human health and performance. These applications are especially common in baseball, a sport known for analyzing ball flight data to quantify performance. Analysts routinely use ball flight data to identify the attributes of top-performing pitchers, finding that the best pitchers throw with optimal combinations of release speed and spin to precise locations. However, for certain pitchers, the throwing motion required to produce optimal ball flight places exceedingly high biomechanical load on the elbow, and consequently injury rates continue to rise. This dissertation attempts to address this issue by using ball tracking release metrics as features in three machine learning models, one for each of three specific aims. Each aim contributes to research in data science, biomechanics, and injury prevention in baseball pitching.

In the first aim, a multi-output deep neural network (DNN) was developed to predict final pitch location from ball tracking release metrics and contextual ball flight information (i.e., projectile motion predictions), using over two million pitches thrown during National Collegiate Athletic Association Division I games. Predictions from the DNN were compared to previously reported machine learning models, and permutation-based feature importance was used to identify the most important features for predicting pitch location. Euclidean distance errors with the DNN were approximately 15 centimeters, outperforming linear regression models by 33% (6 centimeters). A post-hoc analysis revealed that a DNN trained without projectile motion predictions performed 17% (2.8 centimeters) worse than the optimal model, suggesting that this context helped the model learn the underlying physics principles that govern ball flight. The most important ball tracking metrics for predicting pitch location were lateral release position and spin rate, which are under direct control of the pitcher and associated with injury risk.
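As a rough illustration of the kind of projectile motion context described above, the sketch below computes a drag-free, spin-free estimate of where a pitch crosses the plate plane and the Euclidean distance between a predicted and an observed location. The abstract does not give the exact formulation used in the dissertation, so the coordinate convention, the function names (`projectile_plate_location`, `euclidean_error`), and the gravity-only assumption are all hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def projectile_plate_location(release_pos, release_vel, plate_y=0.0):
    """Gravity-only estimate of where a pitch crosses the plate plane.

    Assumed convention (not from the dissertation):
      release_pos = (x, y, z) in metres, y = distance from the plate plane, z = height
      release_vel = (vx, vy, vz) in m/s, with vy < 0 (travelling toward the plate)
    Returns the (x, z) location at y == plate_y.
    """
    x0, y0, z0 = release_pos
    vx, vy, vz = release_vel
    if vy >= 0:
        raise ValueError("ball must travel toward the plate (vy < 0)")
    t = (plate_y - y0) / vy               # time of flight to the plate plane
    x = x0 + vx * t                       # no lateral acceleration without spin or drag
    z = z0 + vz * t - 0.5 * G * t ** 2    # vertical drop from gravity only
    return x, z


def euclidean_error(predicted, actual):
    """Straight-line distance between two (x, z) plate locations."""
    return math.hypot(predicted[0] - actual[0], predicted[1] - actual[1])


if __name__ == "__main__":
    # Made-up release values for illustration only.
    guess = projectile_plate_location((0.5, 16.0, 1.8), (-1.0, -40.0, 0.0))
    observed = (guess[0] + 0.09, guess[1] + 0.12)
    print(f"error = {100 * euclidean_error(guess, observed):.1f} cm")
```

A physics-only estimate like this ignores the spin-induced movement that the learned model must account for, which is one plausible reason to supply it as an input feature rather than use it as the prediction.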

The results of the first aim motivated the second, which sought to estimate pitching biomechanics directly from ball tracking data. Previous biomechanics research has established peak elbow varus torque as a viable surrogate for injury risk, suggesting an underlying variable between ball flight and injury risk that can be targeted with machine learning. This relationship was modeled by developing a random forest model to predict peak elbow varus torque directly from ball flight metrics. Leave-one-pitcher-out cross-validation on a sample of 150 pitchers and 2,901 pitches indicated that the random forest model could predict peak elbow varus torque to within 4.61 Nm root mean square error (RMSE), far better than linear regression (21.41 Nm RMSE) and within ranges previously shown to separate injured and non-injured pitchers (-8.52 Nm and 8.51 Nm limits of agreement). Additionally, the most important ball flight metrics were release speed, release position (vertical and horizontal), and spin rate, which align with the results of the first aim.
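The evaluation scheme described above can be sketched in a few lines: hold out every pitch from one pitcher at a time, then summarize the held-out errors with RMSE and limits of agreement. This is a minimal sketch, not the dissertation's code. The helper names are hypothetical, and the limits of agreement are assumed to follow the common Bland-Altman form (mean difference plus or minus 1.96 standard deviations), which the abstract does not state.

```python
import math
from statistics import mean, stdev


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out one group at a time.

    Here a group is a pitcher ID, so no pitcher contributes pitches to both
    the training and the test indices of the same fold.
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def limits_of_agreement(y_true, y_pred):
    """Assumed Bland-Altman limits: mean difference +/- 1.96 sample SDs."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    pitcher_ids = ["a", "a", "b", "c", "c"]  # toy data, one entry per pitch
    for pitcher, train_idx, test_idx in leave_one_group_out(pitcher_ids):
        print(pitcher, train_idx, test_idx)
```

Splitting by pitcher rather than by pitch matters because pitches from the same pitcher are correlated; a random pitch-level split would let the model see the held-out pitcher during training and could understate the error.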

In the final study, the most important ball flight release metrics from Aims 1 and 2 were combined with the Aim 2 model predictions in a clinical validation of injury risk sensitivity. Using a cohort of over 1,500 Major League Baseball (MLB) pitchers with pitches and injury status scraped from publicly available data sources, a deep learning architecture based on convolutional neural networks and bidirectional long short-term memory networks was developed to estimate injury risk from a sequence of in-game pitches. Initial comparisons of Aim 2 model outputs revealed statistically but not clinically significant differences in model predictions between the two groups across all pitches, as well as significant differences after 2, 5, and 6 days of rest. Deep learning validation results after only 25 training epochs showed a successful balance of precision and recall on unseen data (0.74 F1-score), and the most important features for model predictions were days since previous game, spin rate, and release position. Similar to Aims 1 and 2, the underlying mechanisms for model predictions were aligned with previously identified injury risk factors.
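For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The sketch below computes it for binary labels. The function name and the label encoding (1 for the positive class) are illustrative assumptions, and the toy labels are not from the dissertation.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = [1, 1, 1, 0, 0, 0]
    preds = [1, 1, 0, 1, 0, 0]
    print(f"F1 = {f1_score(truth, preds):.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a high F1-score requires both precision and recall to be reasonably high, which makes it a useful summary when injured cases are much rarer than non-injured ones.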

In each of the first two aims, the same ball flight metrics were most important for predicting key indicators of pitching performance and injury risk. In the third aim, these features also demonstrated effectiveness at predicting future injury given a sequence of in-game pitches. Altogether, these results suggest the potential to leverage ball tracking technology as a wide-scale, non-invasive tool to monitor pitching biomechanics. Such a solution would allow a much larger number of coaches, trainers, and sports scientists to better evaluate the tradeoff between performance gains and potential injury risk during games and throughout a season.

Author ORCID Identifier

0009-0003-7927-7052
