Find a regex match in the next table cell in Perl
My content is in this form:
```html
<tr> <td width="50%" align="right" valign="middle">email </td> <td width="50%" align="center" valign="middle"> unique@gmail.com </td> </tr>
<tr> <td width="50%" align="right" valign="middle">code </td> <td width="50%" align="center" valign="middle">twenty</td> </tr>
<tr> <td width="50%" align="right" valign="middle">code12 </td> <td width="50%" align="center" valign="middle">forty</td> </tr>
```
What regex should I use if I want to extract "twenty", i.e. the data associated with "code"?

I tried to extract the whole line, but I get an empty response:
```perl
$c = $m->content();
($a) = $c =~ /code(.*?)tr>/;
print "$a\n";
```
Do not try to parse HTML with a regex; that way lies madness and broken code. Instead, use existing XML tools. For searching in HTML (which can be parsed into the same kind of document tree as XML), use XPath. There are many Perl implementations; I recommend XML::LibXML, which uses the fast and well-maintained libxml2 C library.

Here's an example of how you'd get the content of the cell next to the "code " cell.
```perl
use v5.10;
use strict;
use warnings;
use XML::LibXML;

# Parse the HTML into an XML::LibXML::Document.
my $parsed_html = XML::LibXML->load_html( string => <<'HTML' );
<tr> <td width="50%" align="right" valign="middle">email </td> <td width="50%" align="center" valign="middle"> unique@gmail.com </td> </tr>
<tr> <td width="50%" align="right" valign="middle">code </td> <td width="50%" align="center" valign="middle">twenty</td> </tr>
<tr> <td width="50%" align="right" valign="middle">code12 </td> <td width="50%" align="center" valign="middle">forty</td> </tr>
HTML

# Find the rows whose first cell contains "code", ignoring surrounding whitespace.
my @code_rows = $parsed_html->findnodes(q{//tr[normalize-space(td[1])='code']});

# In each of the code rows, print the value of the second cell.
for my $row (@code_rows) {
    say $row->findvalue(q{td[2]});
}
```
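If you only need one value at a time, the loop can be folded into a single XPath expression. This is a minimal sketch, not the only way to do it: `cell_for_label` is a hypothetical helper name, and the rows are wrapped in a `<table>` element (an addition to the original markup) so the fragment is valid HTML. Wrapping the whole path in `normalize-space()` also trims the padding around values such as " unique@gmail.com ".

```perl
use v5.10;
use strict;
use warnings;
use XML::LibXML;

# Same rows as in the question, wrapped in a <table> (an assumption) so the fragment is valid HTML.
my $doc = XML::LibXML->load_html(string => <<'HTML');
<table>
<tr> <td width="50%" align="right" valign="middle">email </td> <td width="50%" align="center" valign="middle"> unique@gmail.com </td> </tr>
<tr> <td width="50%" align="right" valign="middle">code </td> <td width="50%" align="center" valign="middle">twenty</td> </tr>
<tr> <td width="50%" align="right" valign="middle">code12 </td> <td width="50%" align="center" valign="middle">forty</td> </tr>
</table>
HTML

# Hypothetical helper: return the trimmed text of the second cell in the row
# whose trimmed first cell equals $label (empty string if no row matches).
# $label is pasted into the XPath string, so it must not contain a single quote.
sub cell_for_label {
    my ($doc, $label) = @_;
    return $doc->findvalue(
        qq{normalize-space(//tr[normalize-space(td[1])='${label}']/td[2])}
    );
}

say cell_for_label($doc, 'code');
say cell_for_label($doc, 'email');
```

Because the comparison is an exact match on the trimmed label, the "code12" row is not picked up when you ask for "code", which is one of the special cases the original `/code(.*?)tr>/` regex cannot distinguish.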
XML::LibXML and XPath are large, but they're worth the investment if you're going to be working with HTML and XML. They will save you endless hours debugging special cases your regexes don't handle. Most of what you need is in XML::LibXML::Node.
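As a sketch of the XML::LibXML::Node methods mentioned above, the following reads every label/value pair into a hash using `findnodes` and `textContent`. The hash `%field` and the helper `trimmed` are hypothetical names, and the code assumes each row holds the label in its first cell and the value in its second; the `<table>` wrapper is again an addition to the original markup.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_html(string => <<'HTML');
<table>
<tr> <td width="50%" align="right" valign="middle">email </td> <td width="50%" align="center" valign="middle"> unique@gmail.com </td> </tr>
<tr> <td width="50%" align="right" valign="middle">code </td> <td width="50%" align="center" valign="middle">twenty</td> </tr>
<tr> <td width="50%" align="right" valign="middle">code12 </td> <td width="50%" align="center" valign="middle">forty</td> </tr>
</table>
HTML

# Hypothetical helper: strip leading and trailing whitespace from a cell's text.
sub trimmed {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical %field: first-cell label => second-cell value.
my %field;
for my $row ($doc->findnodes('//tr')) {
    # In list context, findnodes returns this row's <td> children in document order.
    my @cells = $row->findnodes('td');
    next unless @cells >= 2;    # skip rows without a value cell
    $field{ trimmed($cells[0]->textContent) } = trimmed($cells[1]->textContent);
}

for my $label (sort keys %field) {
    print "$label => $field{$label}\n";
}
```

Reading the table once into a hash is convenient when the page has several fields you need; `$field{code}` then holds "twenty" without any further parsing.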