
Extracting Structured Data from Web Pages

FOR IMMEDIATE RELEASE

 
PRLog (Press Release) - Nov 20, 2008 -
Many web sites contain large sets of pages generated using a common template or layout. For example, Amazon lays out the author, title, comments, etc. in the same way in all its book pages. The values used to generate the pages (e.g., the author, title,...) typically come from a database.
In this paper, we study the problem of automatically extracting the database values from the web pages without any learning examples or other similar human input. We formally define the notion of a template and propose a model that describes how values are encoded into pages using a template. We present an extraction algorithm that constructs the template from sets of words that have similar occurrence patterns across the input pages. The constructed template is then used to extract values from the pages. We show experimentally that the extracted values make semantic sense in most cases.
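
The idea of grouping words by occurrence pattern can be illustrated with a short Python sketch. This is a simplified illustration only, not the algorithm from the paper: the function names (tokenize, occurrence_classes, template_tokens, extract_values) are hypothetical, and the sketch assumes that a token belongs to the template if it occurs the same nonzero number of times in every input page.

```python
import re
from collections import defaultdict


def tokenize(page):
    """Split a page into HTML tags and whitespace-separated words."""
    return re.findall(r"<[^>]+>|[^\s<>]+", page)


def occurrence_classes(pages):
    """Group tokens that share the same occurrence vector (count per page)."""
    counts = defaultdict(lambda: [0] * len(pages))
    for i, page in enumerate(pages):
        for tok in tokenize(page):
            counts[tok][i] += 1
    classes = defaultdict(set)
    for tok, vec in counts.items():
        classes[tuple(vec)].add(tok)
    return classes


def template_tokens(pages):
    """Assumption: template tokens occur equally often (at least once) in every page."""
    template = set()
    for vec, toks in occurrence_classes(pages).items():
        if min(vec) > 0 and len(set(vec)) == 1:
            template |= toks
    return template


def extract_values(page, template):
    """Return each run of non-template tokens as one extracted value."""
    values, current = [], []
    for tok in tokenize(page):
        if tok in template:
            if current:
                values.append(" ".join(current))
                current = []
        else:
            current.append(tok)
    if current:
        values.append(" ".join(current))
    return values


if __name__ == "__main__":
    # Two made-up pages that share one layout.
    pages = [
        "<html><b>Title:</b> Dune <b>Author:</b> Frank Herbert</html>",
        "<html><b>Title:</b> Emma <b>Author:</b> Jane Austen</html>",
    ]
    template = template_tokens(pages)
    for page in pages:
        print(extract_values(page, template))
```

With only a few input pages, a data word that happens to appear in every page would be mistaken for template text, so a sketch like this needs many pages and further checks before its output can be trusted.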

For more information on screen scraping, please visit: http://www.knowlesys.com/
Screen scraping examples: http://www.knowlesys.com/examples.htm

# # #

Founded in 2003, Knowlesys Software Inc. has provided web data extraction services or software to clients more than 500 times. Our focus is web data extraction. We strive to provide the best web data extraction services and software in the world.

At Knowlesys, we continuously improve our development process. We have built four guides to improve the quality and effectiveness of our daily work: the Knowlesys Software Process Guide, the Knowlesys Software Design Guide, the Knowlesys Solution Framework Guide, and the Knowlesys Service Process Guide.

Source: Knowlesys Software Inc.
Last Updated: Nov 20, 2008
Shortcut: http://prlog.org/10143812