Keyword: HTML Tables : Search

Results: 1 - 2 of 2

A Method for Materials Knowledge Extraction from HTML Tables Based on Sibling Comparison
- Xiaoming Zhang,
- Pengtao Lv,
- Chongchong Zhao, and
- Jianxian Wang
International Journal of Software Engineering and Knowledge Engineering, 01 Aug 2016
Materials websites hold rich data resources, most of which are presented as HTML tables. However, because HTML tables are semi-structured, it is difficult to distinguish attributes from values. Identifying attributes in HTML tables is therefore the key issue for information acquisition. In this paper, we propose a method for materials knowledge extraction from HTML tables based on sibling comparison, which consists of three steps: acquiring sibling tables, identifying the table pattern, and extracting table data. We show how to use the $F$-measure to find appropriate thresholds for matching tables from materials websites when acquiring sibling tables. Further, we propose a strategy named FRFC (i.e. First Row matching and First Column matching) to distinguish attributes from values, so that the table pattern is identified. Moreover, the data from HTML tables is extracted based on the corresponding table patterns and mapped to a predefined schema, which facilitates populating a materials ontology. The proposed approach is applicable to circumstances where an attribute in the table may span multiple cells and where sibling tables have more matched attributes. We achieve the desired accuracy ($> 90\%$) by using FRFC to identify the table pattern. Extraction time may not increase significantly as the number of documents and table cells grows, so our approach can process a large number of documents effectively. A prototype named MTES is developed and demonstrates the effectiveness of the proposed approach.
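The abstract does not say how the thresholds are selected, so the following is only a minimal sketch of one common way to pick a matching threshold by $F$-measure. It assumes a small labeled set of table pairs, each with a similarity score and a flag saying whether the pair really is a sibling pair. The names `f_measure`, `best_threshold`, and the input format are hypothetical and are not taken from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_threshold(scored_pairs, thresholds):
    """Return (threshold, f) maximizing the F-measure.

    scored_pairs: list of (similarity, is_sibling) tuples (hypothetical format).
    A pair is predicted to be a sibling pair when similarity >= threshold.
    """
    best_t, best_f = None, -1.0
    for t in thresholds:
        tp = sum(1 for s, y in scored_pairs if s >= t and y)
        fp = sum(1 for s, y in scored_pairs if s >= t and not y)
        fn = sum(1 for s, y in scored_pairs if s < t and y)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = f_measure(precision, recall)
        if f > best_f:  # keep the first threshold reaching the best score
            best_t, best_f = t, f
    return best_t, best_f
```

Sweeping a handful of candidate thresholds over labeled pairs like this trades off missed sibling tables (low recall) against false matches (low precision); the similarity function itself is left out because the abstract does not specify it.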
A Query Engine for Retrieving Information from Chinese HTML Documents
- LIHUA ZHANG and
- YIU-KAI NG
International Journal of Computer Processing of Languages, 01 Sep 2004
The amount of online information in Chinese and the number of Chinese Internet users have been increasing tremendously during the past decade. Since the Chinese language is significantly different from English, techniques that have been developed for retrieving information from English Web documents cannot be directly applied to retrieve information from Chinese Web documents. In order to provide high-performance access to Chinese information on the Web, we have developed a Chinese Web query engine that (i) extracts (hierarchical) data of interest from Chinese HTML tables using an information extraction tool called semantic hierarchy, (ii) allows the user to submit queries in Chinese using a menu-driven user interface, and (iii) processes the user's queries (as Boolean expressions) to generate the correct results. Our query engine supports various groups of information that are categorized into various subject areas, such as car ads, house rentals, job ads, stocks, university catalogs, etc. We have tested our information extraction tool on two application domains, car ads and house rentals. The average F-measure on extracting Chinese data from these two application domains is above 90%. More importantly, our query engine can easily be configured and internationalized to become a worldwide, multilingual query engine with minor changes in system settings on PCs running Windows operating systems.
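To illustrate step (iii), here is a minimal sketch of evaluating a Boolean query against records that have already been extracted from tables. The abstract does not describe the engine's internal query representation, so the nested-tuple form and the names `matches` and `run_query` below are assumptions made for illustration only.

```python
def matches(record, query):
    """Evaluate a Boolean query (nested tuples, a hypothetical format) on one record.

    Supported forms:
      ("eq", field, value)   field equals value
      ("and", q1, q2, ...)   all sub-queries hold
      ("or", q1, q2, ...)    at least one sub-query holds
      ("not", q)             sub-query does not hold
    """
    op = query[0]
    if op == "eq":
        _, field, value = query
        return record.get(field) == value
    if op == "and":
        return all(matches(record, q) for q in query[1:])
    if op == "or":
        return any(matches(record, q) for q in query[1:])
    if op == "not":
        return not matches(record, query[1])
    raise ValueError(f"unknown operator: {op}")


def run_query(records, query):
    """Return the records that satisfy the Boolean query, in input order."""
    return [r for r in records if matches(r, query)]
```

A menu-driven interface maps naturally onto such a structure: each menu selection contributes one `("eq", field, value)` leaf, and the chosen connectives combine the leaves.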