All Dissertations

NLP-based Quantitative Methods for Measuring Highway Project Similarity to Support Project Clustering and Bundling

Quan Do, Clemson University

Date of Award

12-2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Civil Engineering

Committee Chair/Advisor

Dr. Tuyen (Robert) Le

Committee Member

Dr. Kalyan R. Piratla

Committee Member

Dr. Kapil Chalil Madathil

Committee Member

Dr. Chao Fan

Abstract

State highway agencies (SHAs) often need to classify similar projects into distinct work type clusters so that they can correlate new projects with historical data on cost, schedule, pay item production rates, and project complexity. This classification also facilitates the identification of similar projects for project bundling. However, the conventional method for classifying projects into work types is inefficient, relying heavily on subjective judgment and manual review of project scope. This inefficiency hinders SHAs’ ability to apply knowledge gained from historical projects to future projects. Moreover, manual classification is time-consuming, and its subjective nature can result in inconsistencies and inaccuracies. To address these issues, this dissertation presents four individual studies.

The first study develops a novel method that employs Natural Language Processing (NLP) techniques to extract the semantic information of pay item descriptions and integrate it with cost contribution information for measuring project similarity. Various machine learning algorithms are used to evaluate the effectiveness of the proposed method in enhancing project clustering and bundling. The second study introduces a novel project clustering framework designed to support final design cost estimating during the early stages of project development, when unit prices are unavailable and historical average unit prices are not ideal. This framework develops project representation vectors using a novel quantity-weighted term frequency-inverse document frequency (QW-TF-IDF) method, then applies unsupervised machine learning and case-based reasoning for cost estimation. The third study aims to enhance the project bundling practice of SHAs by considering both scope similarity and spatial proximity. A novel multi-objective clustering method is developed that employs the CW-TF-IDF method and a genetic algorithm. This method groups projects into clusters such that each cluster comprises projects with high scope similarity and close spatial proximity, thereby supporting the project bundling process. In the final study, a novel framework is developed that enables cross-state project clustering by incorporating a systematic pay item description standardization process. The framework first addresses the heterogeneity in pay item descriptions through similarity-based standardization, then employs the vectorization methods above (i.e., CW-TF-IDF and QW-TF-IDF) to convert projects into a unified vector space for clustering. The proposed methods were evaluated through empirical experiments.
The results indicate that the proposed methods are reliable in measuring the similarity between projects and can help agencies assess the candidacy of projects for clustering and bundling. By implementing the developed models, SHAs can expect a significant reduction in the time and effort required for project clustering and a decrease in conflicts arising from manual bid data review and labeling. Implementing the models has the potential to greatly reduce bias in project classification, enabling professionals to quickly identify similar projects for construction management tasks. The adoption of these methods is anticipated to facilitate prompt and accurate decision-making by SHAs, particularly during time-sensitive periods such as busy seasons, when allocating projects into bundled contracts is critical. Moreover, the cost predictions that this approach makes possible in the early stages of project development contribute to improved budget forecasting, cost management, and resource allocation. Furthermore, the automated method allows design teams and SHAs to efficiently compare and evaluate design alternatives by leveraging quantity data extracted from designs and historical bid information, without the need for additional data collection.
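
The abstract does not state the weighting formulas behind QW-TF-IDF or CW-TF-IDF, so the following Python sketch is only an illustration of the general idea, not the dissertation's method. It assumes that each project is a list of pay items, that each pay item carries a numeric weight (a quantity or a cost share), that term frequency is replaced by the weight share of the items containing a term, and that a smoothed IDF is used. The function names, the tokenization by whitespace, and the sample projects are all hypothetical.

```python
import math
from collections import Counter


def weighted_tfidf(projects):
    """Build one sparse vector (dict) per project.

    projects: list of projects, each a list of (description, weight) pairs.
    The weight is an assumed stand-in for quantity or cost contribution.
    """
    n = len(projects)
    weighted_tf = []
    doc_freq = Counter()
    for items in projects:
        total = sum(w for _, w in items) or 1.0
        tf = Counter()
        for desc, w in items:
            # Each distinct token in a pay item receives that item's weight share.
            for token in set(desc.lower().split()):
                tf[token] += w / total
        weighted_tf.append(tf)
        doc_freq.update(tf.keys())  # count projects containing each token

    vectors = []
    for tf in weighted_tf:
        # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1
        vectors.append({
            t: v * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
            for t, v in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical projects: (pay item description, weight)
projects = [
    [("hot mix asphalt surface course", 900.0), ("pavement marking paint", 100.0)],
    [("hot mix asphalt base course", 800.0), ("guardrail steel beam", 200.0)],
    [("reinforced concrete bridge deck", 700.0), ("structural steel beam", 300.0)],
]

vectors = weighted_tfidf(projects)
print("paving vs paving:", round(cosine(vectors[0], vectors[1]), 3))
print("paving vs bridge:", round(cosine(vectors[0], vectors[2]), 3))
```

In this sketch the two paving projects share heavily weighted asphalt terms and so score higher than the paving and bridge pair, which share no terms. The resulting vectors could then be passed to any clustering routine; which algorithm the dissertation uses for each study is not specified in the abstract.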

Recommended Citation

Do, Quan, "NLP-based Quantitative Methods for Measuring Highway Project Similarity to Support Project Clustering and Bundling" (2024). All Dissertations. 3819.
https://open.clemson.edu/all_dissertations/3819

Included in

Construction Engineering and Management Commons
