Some .NET Regular Expressions for HTML parsing
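Despite the observations in my previous post, my intranet publishing solution combines the ASP.NET approach to obtaining web content with regular expressions coded in C#. Here are some regular expressions that worked well for me for extracting attribute values from specific meta and input tags, regardless of letter case and attribute quoting style (single, double or no quotes). The leading (?insx) turns on the inline options for case-insensitive matching (i), explicit capture (n), single-line mode (s) and ignored pattern whitespace (x), which is why each pattern can be spread over several lines: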
- Content attribute of meta tag with name content-type:
(?insx)
<meta\s([^>]*?\s*)?
content\s*=\s*
( '(?<Result>[^']*)'
| "(?<Result>[^"]*)"
| (?<Result>[^\s>]*)
)
[^>]*>
(?<=\sname\s*=\s*['"]?content-type['"]?[^>]*>)
- Value of input element with name input-name:
(?insx)
<input\s([^>]*?\s*)?
value\s*=\s*
( '(?<Result>[^']*)'
| "(?<Result>[^"]*)"
| (?<Result>[^\s>]*)
)
[^>]*>
(?<=\sname\s*=\s*['"]?input-name['"]?[^>]*>)
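
To show how these patterns might be called from C#, here is a minimal sketch. The class TagAttributeReader and its methods Build and Extract are hypothetical names introduced only for illustration, and the sample HTML strings are made up. The sketch assumes both patterns differ only in the tag name, the attribute being read and the required name value, so it assembles them from those three parts.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class TagAttributeReader
{
    // Assembles the pattern shown above for a given tag, attribute to read,
    // and required value of the name attribute.
    public static Regex Build(string tag, string attribute, string name)
    {
        // Verbatim string: "" is a literal double quote. The x option makes
        // the line breaks and indentation inside the pattern insignificant.
        string pattern =
            @"(?insx)
              <" + Regex.Escape(tag) + @"\s([^>]*?\s*)?
              " + Regex.Escape(attribute) + @"\s*=\s*
              ( '(?<Result>[^']*)'
              | ""(?<Result>[^""]*)""
              | (?<Result>[^\s>]*)
              )
              [^>]*>
              (?<=\sname\s*=\s*['""]?" + Regex.Escape(name) + @"['""]?[^>]*>)";
        return new Regex(pattern);
    }

    // Returns the attribute value of the first matching tag, or null.
    public static string Extract(string html, string tag, string attribute, string name)
    {
        Match m = Build(tag, attribute, name).Match(html);
        // .NET lets several alternatives share the group name Result;
        // only the alternative that actually matched fills it.
        return m.Success ? m.Groups["Result"].Value : null;
    }

    public static void Main()
    {
        // Made-up sample input: upper case with mixed quoting, then no quotes.
        string meta = "<META NAME=\"Content-Type\" CONTENT='text/html; charset=utf-8'>";
        string input = "<input type=hidden name=input-name value=42>";

        // Prints: text/html; charset=utf-8
        Console.WriteLine(Extract(meta, "meta", "content", "content-type"));
        // Prints: 42
        Console.WriteLine(Extract(input, "input", "value", "input-name"));
    }
}
```

The trailing lookbehind runs only after the whole tag has been consumed, so the name attribute may appear before or after the attribute being read. One thing to watch: because only \s* separates the optional prefix from the attribute name, the pattern as written also accepts an attribute whose name merely ends in the target, such as data-value when looking for value.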