Digital library of construction informatics
and information technology in civil engineering and construction


Paper w78-2010-25:
Unstructured Construction Document Classification Model Through Latent Semantic Analysis (LSA)

Tarek Mahfouz, Amr Kandil

Abstract: The dynamic nature of the construction industry and the increasing sophistication and complexity of construction projects mandate extensive coordination between different parties and produce massive amounts of documents in diverse formats. In an attempt to provide a robust document classification methodology for the construction industry, the current research develops an automated classifier model through Latent Semantic Analysis (LSA). The analyses and models developed in this paper focus on two groups of construction documents. The first consists of documents with high word variation, such as transmittals, correspondence, and meeting minutes. The second consists of documents with low word variation, such as construction claims and legal documents. The adopted research methodology (1) investigated LSA algorithms; (2) developed reduced feature spaces; (3) developed two C++ algorithms that process unstructured construction documents into a format readable by the LSA algorithms; (4) developed automated LSA classification models; and (5) tested and validated the developed models. The models developed in this research attained higher classification accuracy, and better precision and recall, than previous research reported in the literature. Overall accuracies of 89% and 92% were attained for the first and second groups of documents, respectively. The main findings of this paper represent a step in a line of research that targets developing a coherent and integrated methodology for Knowledge Management (KM) and construction decision support through Machine Learning (ML) techniques. It is conjectured that this research stream would help relieve the negative consequences associated with lengthy tasks related to analyzing textual documents in the construction industry.
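
As an illustration of the general LSA workflow the abstract outlines (term-document matrix, reduced feature space, classification), the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the paper's preprocessing is written in C++ and its classifier details are not given here. The toy documents, the labels, the choice of k = 2, and the nearest-neighbour cosine rule are all assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy corpus; the paper's real documents are not available here.
TRAIN_DOCS = [
    "meeting minutes site schedule agenda",
    "transmittal drawings site schedule review",
    "claim delay damages contract breach notice",
    "claim damages contract arbitration award",
]
LABELS = ["correspondence", "correspondence", "claim", "claim"]


def build_term_document_matrix(docs):
    """Return (term -> row index, matrix) with one row per term, one column per document."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[index[word], j] += 1.0  # raw term counts (assumed weighting)
    return index, matrix


def fit_lsa(matrix, k):
    """Truncated SVD: keep the k largest singular triplets as the reduced feature space."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :].T  # rows of the last item are document vectors


def fold_in(text, index, u_k, s_k):
    """Project an unseen document into the latent space; unknown words are ignored."""
    q = np.zeros(u_k.shape[0])
    for word in text.lower().split():
        if word in index:
            q[index[word]] += 1.0
    return (u_k.T @ q) / s_k


def classify(text, index, u_k, s_k, doc_vectors, labels):
    """Assumed rule: return the label of the training document with highest cosine similarity."""
    q = fold_in(text, index, u_k, s_k)
    norms = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q) + 1e-12
    similarities = (doc_vectors @ q) / norms
    return labels[int(np.argmax(similarities))]


INDEX, MATRIX = build_term_document_matrix(TRAIN_DOCS)
U_K, S_K, DOC_VECTORS = fit_lsa(MATRIX, k=2)

print(classify("delay claim damages", INDEX, U_K, S_K, DOC_VECTORS, LABELS))
print(classify("meeting minutes agenda", INDEX, U_K, S_K, DOC_VECTORS, LABELS))
```

The toy corpus is deliberately separable (the two groups share no vocabulary), so it only shows the mechanics; it says nothing about the 89% and 92% accuracies reported in the abstract.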

Keywords: Knowledge Management, Latent Semantic Analysis, Machine Learning, Document Classification


Series: w78:2010

