Paper title: |
Unstructured Construction Document Classification Model Through Latent Semantic Analysis (LSA) |
Authors: |
Tarek Mahfouz, Amr Kandil |
Summary: |
The dynamic nature of the construction industry and the increasing sophistication and complexity of construction projects mandate extensive coordination between different parties and produce massive amounts of documents in diverse formats. Therefore, in an attempt to provide a robust document classification methodology for the construction industry, the current research develops an automated classifier model through Latent Semantic Analysis (LSA). The analyses and models developed in this paper focused on two groups of construction documents. The first consists of documents with high word variation, such as transmittals, correspondence, and meeting minutes. The second consists of documents with low word variation, such as construction claims and legal documents. The adopted research methodology (1) investigated Latent Semantic Analysis (LSA) algorithms; (2) developed reduced feature spaces; (3) developed two C++ algorithms that convert unstructured construction documents into a format readable by the LSA algorithms; (4) developed LSA automated classification models; and (5) tested and validated the developed models. The models developed in the current research attained higher classification accuracy, and better precision and recall, than previous research reported in the literature. Overall accuracies of 89% and 92% were attained for the first and second groups of documents, respectively. The main findings of this paper represent a step in a line of research that targets developing a coherent and integrated methodology for Knowledge Management (KM) and construction decision support through Machine Learning (ML) techniques. It is conjectured that this research stream would help relieve the negative consequences associated with lengthy tasks related to analyzing textual documents in the construction industry. |
Type: |
normal paper |
Year of publication: |
2010 |
Keywords: |
Knowledge Management, Latent Semantic Analysis, Machine Learning, Document Classification |
Series: |
w78:2010 |
ISSN: |
2706-6568 |
Download paper: |
/pdfs/w78-2010-25.pdf |
Citation: |
Tarek Mahfouz, Amr Kandil (2010).
Unstructured Construction Document Classification Model Through Latent Semantic Analysis (LSA). CIB W78 2010 - Applications of IT in the AEC Industry (ISSN: 2706-6568),
http://itc.scix.net/paper/w78-2010-25
|