||Identification and classification of A/E/C web sites and pages
||Chen Y, Amor R
||Current search engines are not well suited to serving the needs of A/E/C professionals. The general ones either do not know the vocabulary of the domain (so that, e.g., 'window' is a meaningless word) or rely on human classification (which severely limits the percentage of sites that are indexed). Domain-specific databases and hot lists tend to be the only other option. While these hold very good information, they reflect a very small proportion of what is on the web.

This paper looks at a system for automated classification of web sites and pages in the A/E/C domain. In particular, we concentrate on web sites and pages in New Zealand, and use the common classification system for the New Zealand construction industry (CBI). For this particular problem it is clear that no single approach to classifying web information gives a perfect answer. We therefore combine several approaches for automated classification, including:

· Identifying web sites that are already classified by other Internet portals and mapping these classifications to the CBI classification system.
· Extracting keywords from web pages and sites and then finding the relationships between the extracted keywords and topics in the CBI classification system.
· Using link analysis to find related web pages on a certain topic in the CBI classification system.

When an A/E/C professional searches with our system, we determine metrics for each approach above and find the best combination of approaches to determine a classification and hence the resultant web sites and pages.

This paper describes the components of the search engine that has been created and provides an analysis of the classification approaches.
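
The idea of combining per-approach metrics into one classification can be illustrated with a minimal sketch. The abstract does not say how the combination is computed, so the weighted sum below is an assumption, not the authors' method. The approach names (`portal_mapping`, `keywords`, `links`), the topic labels, the scores, and the weights are all hypothetical placeholders and are not real CBI codes.

```python
from collections import defaultdict


def combine_scores(approach_scores, weights):
    """Return a weighted sum of per-topic scores across approaches.

    approach_scores: {approach_name: {topic: score}}
    weights:         {approach_name: weight}; missing approaches count as 0.
    """
    combined = defaultdict(float)
    for approach, topic_scores in approach_scores.items():
        weight = weights.get(approach, 0.0)
        for topic, score in topic_scores.items():
            combined[topic] += weight * score
    return dict(combined)


def best_topic(approach_scores, weights):
    """Return the topic with the highest combined score, or None if empty."""
    combined = combine_scores(approach_scores, weights)
    if not combined:
        return None
    return max(combined, key=combined.get)


# Hypothetical scores for one page from the three approaches in the abstract.
scores = {
    "portal_mapping": {"topic_windows": 1.0},
    "keywords": {"topic_windows": 0.5, "topic_roofing": 0.75},
    "links": {"topic_roofing": 0.5},
}
# Hypothetical weights; a real system would have to tune or learn these.
weights = {"portal_mapping": 0.5, "keywords": 0.25, "links": 0.25}

print(best_topic(scores, weights))
```

A weighted sum is only one possible choice; a voting scheme or a per-query selection of the single most reliable approach would fit the abstract's description equally well.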
|Year of publication:
Chen Y, Amor R (2002).
Identification and classification of A/E/C web sites and pages. Agger K (ed.); Distributing knowledge in building; Arhus, June 12 - 14, Denmark (ISSN: 2706-6568),