Digital library of construction informatics
and information technology in civil engineering and construction


Paper w78-2002-34:
Identification and classification of A/E/C web sites and pages

Facilitated by the SciX project

Chen Y, Amor R


Abstract: Current search engines are not well suited to the needs of A/E/C professionals. General-purpose engines either do not know the vocabulary of the domain (so that, for example, 'window' is treated as a meaningless word) or rely on human classification, which severely limits the percentage of sites that are indexed. Domain-specific databases and hot lists tend to be the only other option. While these contain very good information, they reflect a very small proportion of what is on the web. This paper looks at a system for automated classification of web sites and pages in the A/E/C domain. In particular, we concentrate on web sites and pages in New Zealand, and use the common classification system for the New Zealand construction industry (CBI). For this particular problem it is clear that no single approach to classifying web information gives a perfect answer. We therefore combine several approaches for automated classification, including:

- Identifying web sites that are already classified by other Internet portals and mapping these classifications to the CBI classification system.
- Extracting keywords from web pages and sites and then finding the relationships between the extracted keywords and topics in the CBI classification system.
- Using link analysis to find related web pages on a certain topic in the CBI classification system.

When an A/E/C professional searches with our system, we determine metrics for each approach above and find the best combination of approaches to determine a classification and hence the resultant web sites and pages. This paper describes the components of the search engine that has been created and provides an analysis of the classification approaches.
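The idea of combining per-approach metrics into one classification can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the metrics are computed or combined, so the weighted sum below is only one plausible scheme. The function names, the weights, the scores, and the topic labels ("windows", "roofing") are all hypothetical, and the labels are not actual CBI codes.

```python
from collections import defaultdict


def combine_scores(approach_scores, weights):
    """Merge per-approach topic scores into one weighted score per topic.

    approach_scores maps an approach name to {topic: score in [0, 1]}.
    weights maps an approach name to its (assumed) reliability weight.
    """
    combined = defaultdict(float)
    for approach, scores in approach_scores.items():
        weight = weights.get(approach, 0.0)  # unknown approaches contribute nothing
        for topic, score in scores.items():
            combined[topic] += weight * score
    return dict(combined)


def best_topic(approach_scores, weights):
    """Return the topic with the highest combined score, or None if empty."""
    combined = combine_scores(approach_scores, weights)
    if not combined:
        return None
    return max(combined, key=combined.get)


# Hypothetical scores for a single web page, one dict per approach.
page_scores = {
    "portal_mapping": {"windows": 0.9, "roofing": 0.1},
    "keywords": {"windows": 0.6, "roofing": 0.3},
    "link_analysis": {"roofing": 0.5},
}

# Hypothetical weights; in practice these would have to be tuned or learned.
weights = {"portal_mapping": 0.5, "keywords": 0.3, "link_analysis": 0.2}

combined = combine_scores(page_scores, weights)
print(best_topic(page_scores, weights))
```

A fixed weighted sum is the simplest choice; the abstract's phrase "find the best combination of approaches" suggests the actual system may select or weight approaches per query, which this sketch does not attempt.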



Full text: content.pdf (171,531 bytes) (available to registered users only)

Series: w78:2002
Class: class.collaboration (0.067651) class.retrieve (0.043347) class.man-software (0.025151)

Permission to reproduce these documents has been graciously provided by the Aarhus School of Architecture, Denmark. The assistance of the editor, Prof. Kristian Agger, is gratefully appreciated.


hosted by University of Ljubljana



© itc.scix.net 2003