All Theses

A COMPARATIVE STUDY ON ONTOLOGY GENERATION AND TEXT CLUSTERING USING VSM, LSI, AND DOCUMENT ONTOLOGY MODELS

William Taylor, Clemson UniversityFollow

Date of Award

8-2007

Document Type

Thesis

Degree Name

Master of Science (MS)

Legacy Department

Computer Science

Committee Chair/Advisor

Wang, James

Committee Member

Lou , Feng

Committee Member

Smotherman , Mark

Abstract

Although using ontologies to assist information retrieval and text document processing has recently attracted more and more attention, existing ontology-based approaches have not shown advantages over the traditional keywords-based Latent Semantic Indexing (LSI) method. This paper proposes an algorithm to extract a concept forest (CF) from a document with the assistance of a natural language ontology, the WordNet lexical database. Using concept forests to represent the semantics of text documents, the semantic similarities of these documents are then measured as the commonalities of their concept forests. Performance studies of text document clustering based on different document similarity measurement methods show that the CF-based similarity measurement is an effective alternative to the existing keywords-based methods. Especially, this CF-based approach has obvious advantages over the existing keywords-based methods, including LSI, in dealing with text abstract databases, such as MEDLINE, or in P2P environments where it is impractical to collect the entire document corpus for analysis.

Recommended Citation

Taylor, William, "A COMPARATIVE STUDY ON ONTOLOGY GENERATION AND TEXT CLUSTERING USING VSM, LSI, AND DOCUMENT ONTOLOGY MODELS" (2007). All Theses. 183.
https://open.clemson.edu/all_theses/183

Download

Included in

Computer Sciences Commons

COinS

All Theses

A COMPARATIVE STUDY ON ONTOLOGY GENERATION AND TEXT CLUSTERING USING VSM, LSI, AND DOCUMENT ONTOLOGY MODELS

Date of Award

Document Type

Degree Name

Legacy Department

Committee Chair/Advisor

Committee Member

Committee Member

Abstract

Recommended Citation

Included in

Search

Browse by

Useful Links

All Theses

A COMPARATIVE STUDY ON ONTOLOGY GENERATION AND TEXT CLUSTERING USING VSM, LSI, AND DOCUMENT ONTOLOGY MODELS

Author

Date of Award

Document Type

Degree Name

Legacy Department

Committee Chair/Advisor

Committee Member

Committee Member

Abstract

Recommended Citation

Included in

Share

Search

Browse by

Useful Links