Find regex in next line in Perl


My content is in this form:

    <tr>
        <td width="50%" align="right" valign="middle">email </td>
        <td width="50%" align="center" valign="middle"> unique@gmail.com </td>
    </tr>
    <tr>
        <td width="50%" align="right" valign="middle">code </td>
        <td width="50%" align="center" valign="middle">twenty</td>
    </tr>
    <tr>
        <td width="50%" align="right" valign="middle">code12 </td>
        <td width="50%" align="center" valign="middle">forty</td>
    </tr>

What regex should I use if I want to extract "twenty", i.e. the data associated with "code"?

I tried to extract the whole line, but I get an empty response:

    $c = $m->content();
    ($a) = $c =~ /code(.*?)tr>/;
    print "$a\n";

Do not try to parse HTML with a regex; that way lies madness and broken code. Instead, use existing XML tools. For searching in HTML (which can be treated as XML), use XPath. There are many Perl implementations. I recommend XML::LibXML, which uses the fast and well-maintained libxml2 C library.
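As one illustration of how brittle the regex route is, the attempt in the question may have come back empty simply because the real page has line breaks between the cells, and `.` does not match a newline unless the `/s` modifier is given. That is an assumption about the input, not something the question confirms. The sketch below uses a hypothetical `$content` string with one tag per line and shows that even with `/s` the capture is a blob of markup rather than just "twenty".

```perl
use strict;
use warnings;

# Assumed input: the "code" row from the question, with one tag per line.
my $content = join "\n",
    '<tr>',
    '<td width="50%" align="right" valign="middle">code </td>',
    '<td width="50%" align="center" valign="middle">twenty</td>',
    '</tr>';

# The attempt from the question. "." stops at each newline, and there is
# no "tr>" on the same line as "code", so the match fails.
my ($plain) = $content =~ /code(.*?)tr>/;

# With /s, "." also matches newlines, so the match succeeds, but the
# capture still includes the surrounding tags.
my ($dotall) = $content =~ /code(.*?)tr>/s;

print 'without /s: ', ( defined $plain  ? 'matched' : 'no match' ), "\n";
print 'with /s:    ', ( defined $dotall ? 'matched' : 'no match' ), "\n";
print "captured with /s:\n$dotall\n" if defined $dotall;
```

Getting from that capture to the bare cell value needs yet more pattern tweaking, and it still breaks as soon as the markup changes slightly.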

Here's an example of how you'd get the content of the cell next to the "code " cell.

    use v5.10;
    use strict;
    use warnings;

    use XML::LibXML;

    # Parse the HTML into an XML::LibXML::Document
    my $parsed_html = XML::LibXML->load_html( string => <<'HTML');
    <tr>
        <td width="50%" align="right" valign="middle">email </td>
        <td width="50%" align="center" valign="middle"> unique@gmail.com </td>
    </tr>
    <tr>
        <td width="50%" align="right" valign="middle">code </td>
        <td width="50%" align="center" valign="middle">twenty</td>
    </tr>
    <tr>
        <td width="50%" align="right" valign="middle">code12 </td>
        <td width="50%" align="center" valign="middle">forty</td>
    </tr>
    HTML

    # Find the rows whose first cell contains "code", ignoring whitespace.
    my @code_rows = $parsed_html->findnodes(q{//tr[normalize-space(td[1])='code']});

    # In each of the code rows, print the value of the second cell.
    for my $row (@code_rows) {
        say $row->findvalue(q{td[2]});
    }

XML::LibXML and XPath are large, but they're worth the investment if you're going to be working with HTML and XML. They'll save you endless hours of debugging special cases that regexes don't handle. Most of what you need is in XML::LibXML::Node.
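To give a feel for the node interface, here is a minimal sketch of a few XML::LibXML::Node and Element methods applied to the matched cell. The one-row table and the variable names are made up for illustration; only the method names come from the module.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative input: a one-row table modelled on the question's markup.
my $doc = XML::LibXML->load_html( string => <<'HTML' );
<table>
<tr><td align="right">code </td><td align="center">twenty</td></tr>
</table>
HTML

# Select the second cell of the row whose first cell is "code".
my ($cell) = $doc->findnodes(q{//tr[normalize-space(td[1])='code']/td[2]});

my $name  = $cell->nodeName;                # element name of the node
my $align = $cell->getAttribute('align');   # value of one attribute
my $text  = $cell->textContent;             # text inside the cell

# Step sideways to the label cell, skipping whitespace-only text nodes.
my $label = $cell->previousNonBlankSibling->textContent;

print "$name [$align]: $text (label: '$label')\n";
```

Once you have a node in hand, relative XPath calls such as `findvalue` and `findnodes` work on it too, so you rarely need to walk the tree by hand.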

