Parse HTML Documents in .NET and C#
Elerium Software has introduced release 1.6 of the Elerium HTML .NET Parser component, featuring an improved HTML parsing algorithm.
By: Elerium Software

Developers often need to parse HTML documents into their components, either for data analysis or to process HTML pages automatically without user intervention. Regular expressions can be used to analyze a document's structure, but they are not the best solution: HTML is not a regular language, and it is almost impossible to write a regular expression that faithfully matches arbitrary HTML markup. That is why developers turn to dedicated parsing tools, such as Elerium HTML .NET Parser.

Elerium Software has introduced release 1.6 of the Elerium HTML .NET Parser component. This release includes several important changes:
- Improved the HTML parsing algorithm.
- Fixed several issues with saving HTML.
- Extended the list of encoded characters supported in HTML documents.

What are the features of the HTML Parser?
The component provides a wide set of functions and properties for working with HTML documents. For example, Elerium HTML .NET Parser can get or set the attributes of HTML tags.
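The general technique behind this kind of attribute editing is a recursive walk over the parsed document tree that rewrites an attribute wherever it matches. It can be sketched without the component itself. This is a minimal illustration under stated assumptions: `Node`, `TextNode` and `Element` are hypothetical stand-ins, not the Docs.Html classes; the tree is built by hand instead of being parsed; and the `style` values `font-style:italic` and `text-decoration:underline` are chosen only as an example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal DOM model for illustration only;
// these are NOT the Elerium Docs.Html classes.
abstract class Node { }

sealed class TextNode : Node
{
    public string Text { get; }
    public TextNode(string text) { Text = text; }
}

sealed class Element : Node
{
    public string Name { get; }
    public Dictionary<string, string> Attributes { get; } = new Dictionary<string, string>();
    public List<Node> Children { get; } = new List<Node>();
    public Element(string name) { Name = name; }
}

static class Program
{
    // Visit every element recursively; where the "style" attribute requests
    // italic text, replace that declaration with an underline declaration.
    internal static void ReplaceItalicWithUnderline(IEnumerable<Node> nodes)
    {
        foreach (Node node in nodes)
        {
            if (node is Element el)
            {
                if (el.Attributes.TryGetValue("style", out string style) &&
                    style.Contains("font-style:italic"))
                {
                    el.Attributes["style"] = style.Replace("font-style:italic", "text-decoration:underline");
                }
                ReplaceItalicWithUnderline(el.Children);
            }
        }
    }

    // Write the tree to 'output', indenting each level by two spaces.
    internal static void Dump(IEnumerable<Node> nodes, string indent, StringBuilder output)
    {
        foreach (Node node in nodes)
        {
            if (node is TextNode t)
            {
                output.AppendLine(indent + "\"" + t.Text + "\"");
            }
            else if (node is Element el)
            {
                output.AppendLine(indent + "<" + el.Name + ">");
                Dump(el.Children, indent + "  ", output);
                output.AppendLine(indent + "</" + el.Name + ">");
            }
        }
    }

    static void Main()
    {
        // Hand-built tree for: <div><span style="font-style:italic">Italic text</span></div>
        var span = new Element("span");
        span.Attributes["style"] = "font-style:italic";
        span.Children.Add(new TextNode("Italic text"));
        var div = new Element("div");
        div.Children.Add(span);
        var roots = new List<Node> { div };

        ReplaceItalicWithUnderline(roots);

        var sb = new StringBuilder();
        Dump(roots, "", sb);
        Console.Write(sb.ToString());
        Console.WriteLine("span style: " + span.Attributes["style"]);
    }
}
```

Running it prints the indented tree followed by the rewritten `style` value. Matching on a plain substring keeps the sketch short. A more careful version would split the attribute into individual CSS declarations before comparing them.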
This sample shows how to find a specified attribute of an HTML element and change its value:

C# Example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Docs.Html;

namespace CreateHtmlDoc
{
    class Program
    {
        static string indent = "";

        static void Main(string[] …
        {
            string htmltext = "<div><b>Bold text</b> …
            // Build DOM of the html
            HtmlDoc html = HtmlDoc.ParseHTML( …
            // The function changes text style from italic to underline
            ScanHtmlTree( …
            // Show the tree of the html
            ViewHtmlDOM( …
            Console.ReadLine();
        }

        static void ScanHtmlTree( …
        {
            foreach (HtmlNode node in nodes)
            {
                if (node.IsElement)
                {
                    HtmlTag tag = node as HtmlTag;
                    if (tag.Attributes.IndexOf(" …
                        tag.Attributes[" …
                    ScanHtmlTree( …
                }
            }
        }

        static void ViewHtmlDOM( …
        {
            foreach (HtmlNode node in nodes)
            {
                if (node.IsText)
                {
                    HtmlText text = node as HtmlText;
                    Console.WriteLine( …
                }
                else
                {
                    HtmlTag tag = node as HtmlTag;
                    Console.WriteLine( …
                    indent += " ";
                    ViewHtmlDOM( …
                    indent = indent.Substring( …
                    Console.WriteLine( …
                }
            }
        }
    }
}

About Elerium Software
Elerium Software develops professional solutions for .NET projects (C#, VB.NET, ASP.NET) that read, write, and convert various office and web document formats. Elerium Software components are based on a unique design and fast algorithms that make them independent of third-party applications and libraries. For more information about the component, please visit the product page: http://www.eleriumsoft.com/