Parse HTML Documents in .NET and C#

Elerium Software has released version 1.6 of the Elerium HTML .NET Parser component, featuring an improved HTML parsing algorithm.
 
May 8, 2013 - PRLog -- HyperText Markup Language (HTML) is the main markup language for creating web pages and other information that can be displayed in a web browser. HTML is written in the form of HTML elements consisting of tags enclosed in angle brackets, within the web page content. HTML elements form the building blocks of all websites.

Developers often need to parse HTML documents into their components, either for data analysis or to process HTML pages automatically without user intervention. Regular expressions can be used to analyze a document's structure, but they are not the best solution: HTML is not a regular language, so it is almost impossible to write a regular expression that reliably matches HTML markup with arbitrarily nested elements. That is why developers use dedicated tools to parse HTML files. One such tool is Elerium HTML .NET Parser.
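To illustrate why regular expressions are a poor fit, here is a minimal C# sketch that uses only the standard System.Text.RegularExpressions namespace and is not part of Elerium HTML .NET Parser. The class name RegexHtmlPitfall, the FirstDivContent helper and both patterns are hypothetical, chosen only for illustration. A lazy pattern stops at the first closing tag, so nested elements are cut short, while a greedy pattern runs on to the last closing tag, so sibling elements are merged. Neither pattern can count nesting depth.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexHtmlPitfall
{
    // Hypothetical helper: returns group 1 of the first match, or an empty string if nothing matches.
    // Neither pattern can track how deeply <div> tags are nested.
    public static string FirstDivContent(string html, bool greedy)
    {
        string pattern = greedy ? "<div>(.*)</div>" : "<div>(.*?)</div>";
        Match m = Regex.Match(html, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // Nested elements: the lazy pattern stops at the inner </div>.
        Console.WriteLine(FirstDivContent("<div>outer <div>inner</div> tail</div>", greedy: false));
        // prints "outer <div>inner"

        // Sibling elements: the greedy pattern runs on to the last </div>.
        Console.WriteLine(FirstDivContent("<div>a</div><div>b</div>", greedy: true));
        // prints "a</div><div>b"
    }
}
```

Switching between the lazy and greedy quantifier only trades one failure for the other, which is why a parser that builds a real document tree is the safer choice.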

Version 1.6 of the Elerium HTML .NET Parser component includes several important changes:

- Improved the HTML parsing algorithm.
- Fixed several issues with saving HTML.
- Extended the list of encoded characters supported in HTML documents.

What features does the HTML Parser offer? The component provides a wide set of functions and properties for working with HTML documents. For example, Elerium HTML .NET Parser lets you get or set the attributes of HTML tags. The following sample shows how to find a specified attribute of an HTML element and change its value:

C# Example:

using System;
using Docs.Html;

namespace CreateHtmlDoc
{
    class Program
    {
        // Current indentation used when printing the DOM tree
        static string indent = "";

        static void Main(string[] args)
        {
            string htmltext = "<div><b>Bold text</b> before <span style=\"font-style:italic;\">Italic text</span></div>";
            // Build the DOM of the HTML
            HtmlDoc html = HtmlDoc.ParseHTML(htmltext);
            // Change the text style from italic to underline
            ScanHtmlTree(html.Nodes);
            // Print the tree of the HTML
            ViewHtmlDOM(html.Nodes);
            Console.ReadLine();
        }

        // Recursively visits every element and replaces the value of its "style" attribute, if present
        static void ScanHtmlTree(HtmlNodeCollection nodes)
        {
            foreach (HtmlNode node in nodes)
            {
                if (node.IsElement)
                {
                    HtmlTag tag = node as HtmlTag;
                    if (tag.Attributes.IndexOf("style") != -1)
                        tag.Attributes["style"].Value = "text-decoration:underline;";
                    ScanHtmlTree(tag.Nodes);
                }
            }
        }

        // Recursively prints text nodes and tags, indenting children by two spaces per level
        static void ViewHtmlDOM(HtmlNodeCollection nodes)
        {
            foreach (HtmlNode node in nodes)
            {
                if (node.IsText)
                {
                    HtmlText text = node as HtmlText;
                    Console.WriteLine(indent + text.ToString());
                }
                else if (node.IsElement)
                {
                    HtmlTag tag = node as HtmlTag;
                    Console.WriteLine(indent + tag.ToString());
                    indent += "  ";
                    ViewHtmlDOM(tag.Nodes);
                    indent = indent.Substring(0, indent.Length - 2);
                    Console.WriteLine(indent);
                }
            }
        }
    }
}

About Elerium Software

Elerium Software develops professional components for .NET projects (C#, VB.NET, ASP.NET) that read, write, and convert various office and web document formats. Its components are built on a unique design and fast algorithms, so they do not depend on third-party applications or libraries.

For more information about the component, please visit the product page:
http://www.eleriumsoft.com/HTML_NET/HTMLParser/Default.aspx
Email: ***@eleriumsoft.com
Tags: Html Parser, Xml, Net, Asp Net, c
Industry: Computers, Software
Location: Luzern, Switzerland