How do I preserve HTML from the XML in PHP?


I'm trying to build a news page using PHP, but I've run into a roadblock. I want each post to be separated, only 5 posts to be loaded at a time, and the HTML inside each post to be preserved. I got the first two working, but the third is becoming a problem.

I've tried everything I can think of. I don't fully understand how the functions interact with each other, so I always break something when I try to add something new. I tried using saveHTML() here, but no matter where I place it, it either does nothing or breaks something.

All I want is for the post's content to preserve its HTML: some posts have unordered lists, and some have links.

By the way, here is the code:

<?php
    $rss = new DOMDocument();
    $rss->load('http://screenbones.com/news.xml');
    $feed = array();
    foreach ($rss->getElementsByTagName('item') as $node) {
        $item = array ( 
            'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
            'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
            'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
            'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
            );
        array_push($feed, $item);
    }
    $limit = 5;
    for($x=0;$x<$limit;$x++) {
        $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
        $link = $feed[$x]['link'];
        $description = $feed[$x]['desc'];
        $date = date('l F d, Y', strtotime($feed[$x]['date']));

        echo '<article>';
        echo '<p><strong><a href="'.$link.'" title="'.$title.'">'.$title.'</a></strong><br />';
        echo '<small><em>Posted on '.$date.'</em></small></p>';
        echo $description;
        echo '</article>';
    }
?>

Answer

Solution:

You are reading the nodeValue property of <description>, which is just the text content. Use the DOMDocument::saveHTML() method with the node instead.
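As a rough illustration of the difference, here is a minimal sketch. The inline XML is a made-up stand-in for news.xml, and the variable names are only for illustration. It serializes the children of <description> one by one so that the <description> wrapper itself does not end up in the output.

```php
<?php
// Hypothetical inline feed standing in for news.xml.
$xml = <<<'XML'
<rss><channel><item>
  <title>Update</title>
  <description><p>See the <a href="/changes">changelog</a>.</p></description>
</item></channel></rss>
XML;

$rss = new DOMDocument();
$rss->loadXML($xml);

$description = $rss->getElementsByTagName('description')->item(0);

// nodeValue concatenates the text nodes, so the tags are lost.
$textOnly = $description->nodeValue;

// saveHTML($node) serializes a node together with its markup.
// Looping over the children keeps <description> itself out of the result.
$html = '';
foreach ($description->childNodes as $child) {
    $html .= $rss->saveHTML($child);
}

echo $textOnly, "\n"; // text without any tags
echo $html, "\n";     // markup with the <p> and <a> elements intact
```

In the question's loop, the same idea would replace the 'desc' => ...->nodeValue entry.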

Edit: Credit goes to Musa. My original answer was wrong. The <article> tags made me think the OP was producing XML for some reason.


Answer

Solution:

Typically, RSS feeds put serialized HTML into the description element as a CDATA section or text node. In your case, XHTML elements are used directly, but without defining the namespace.

$document = new DOMDocument();
$document->load('http://screenbones.com/news.xml');
$xpath = new DOMXPath($document);

// Select only the first five <item> elements.
foreach ($xpath->evaluate('//item[position() <= 5]') as $item) {
  $title = $xpath->evaluate('string(title)', $item);
  $link = $xpath->evaluate('string(link)', $item);
  // The feed stores the date in <pubDate>.
  $date = date('l F d, Y', strtotime($xpath->evaluate('string(pubDate)', $item)));

  if ($xpath->evaluate('count(description/*) > 0', $item)) {
    // Inline markup: serialize every child node of <description> as HTML.
    $descriptionFragment = '';
    foreach ($xpath->evaluate('description/node()', $item) as $content) {
      $descriptionFragment .= $document->saveHTML($content);
    }
  } else {
    // Text/CDATA only: the content is already serialized HTML.
    $descriptionFragment = $xpath->evaluate('string(description)', $item);
  }

  // %1$s = link, %2$s = title, %3$s = date, %4$s = description HTML
  printf(
    '<article>
       <p><strong><a href="%1$s" title="%2$s">%2$s</a></strong><br />
       <small><em>Posted on %3$s</em></small></p>
       %4$s
     </article>',
    htmlspecialchars($link),
    htmlspecialchars($title),
    htmlspecialchars($date),
    $descriptionFragment
  );
}

The example uses XPath expressions, which allow you to fetch nodes and values from your DOM.

The first expression, //item[position() <= 5], fetches the first five item elements. For the other expressions, the $item node is used as the context, so they are relative to it.

Expressions like string(title) fetch the element nodes by name and cast the first node found into a string. If no node is found, they return an empty string.
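To make the context node and the string() cast more concrete, here is a small sketch against a made-up two-item feed (the XML and the $rows array are assumptions for illustration only). The second item has no <link>, so string(link) yields an empty string rather than an error.

```php
<?php
// Hypothetical feed: the second item deliberately has no <link>.
$xml = '<rss><channel>'
     . '<item><title>First</title><link>http://example.com/1</link></item>'
     . '<item><title>Second</title></item>'
     . '</channel></rss>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$rows = [];
foreach ($xpath->evaluate('//item[position() <= 5]') as $item) {
    // The second argument is the context node, so "title" and "link"
    // are looked up relative to the current <item>.
    $rows[] = [
        'title' => $xpath->evaluate('string(title)', $item),
        'link'  => $xpath->evaluate('string(link)', $item),
    ];
}

var_dump($rows);
```

This is why the loop in the answer needs no null checks: a missing element simply becomes ''.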

count(description/*) > 0 checks whether there are element nodes inside the description (not only text/CDATA nodes). If so, the code iterates over all child nodes of the description and serializes them to HTML. Otherwise, it reads the single text node as already serialized HTML.
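The two branches can be seen side by side in the following sketch. The feed is invented for the example: the first item carries inline elements, the second carries its HTML in a CDATA section. Both end up as an HTML string in the hypothetical $fragments array.

```php
<?php
// Hypothetical feed with both description styles.
$xml = <<<'XML'
<rss><channel>
  <item><description><ul><li>One</li><li>Two</li></ul></description></item>
  <item><description><![CDATA[<p>Already <b>serialized</b></p>]]></description></item>
</channel></rss>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$fragments = [];
foreach ($xpath->evaluate('//item') as $item) {
    if ($xpath->evaluate('count(description/*) > 0', $item)) {
        // Element children present: serialize each child node as HTML.
        $fragment = '';
        foreach ($xpath->evaluate('description/node()', $item) as $content) {
            $fragment .= $document->saveHTML($content);
        }
    } else {
        // Only text/CDATA: the string value already is the HTML.
        $fragment = $xpath->evaluate('string(description)', $item);
    }
    $fragments[] = $fragment;
}

foreach ($fragments as $fragment) {
    echo $fragment, "\n";
}
```

Note that saveHTML() may insert line breaks between block-level elements, so the first fragment is not guaranteed to match the source byte for byte.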

htmlspecialchars() is used to escape characters like & for HTML output. Be careful with $descriptionFragment: it is HTML taken directly from the external source. You might want to clean it up before using it.
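One possible way to do that cleanup is sketched below. This is not a complete sanitizer: the sanitizeDescription() helper and the ALLOWED_TAGS list are hypothetical names made up for this example, it only covers the inline-markup case (not CDATA), and it simply removes elements outside the allow-list and all attributes except http(s) href values on links. For anything security-critical, a maintained library such as HTML Purifier is probably the safer choice.

```php
<?php
// Hypothetical allow-list; adjust it to what your posts actually use.
const ALLOWED_TAGS = ['p', 'ul', 'ol', 'li', 'a', 'strong', 'em', 'br'];

function sanitizeDescription(DOMElement $description): string
{
    $document = $description->ownerDocument;
    $xpath = new DOMXPath($document);

    // Work on an array copy so that removing nodes does not disturb the loop.
    $elements = iterator_to_array($xpath->evaluate('.//*', $description), false);
    foreach ($elements as $element) {
        if (!in_array(strtolower($element->localName), ALLOWED_TAGS, true)) {
            // Drop the element together with everything inside it.
            if ($element->parentNode !== null) {
                $element->parentNode->removeChild($element);
            }
            continue;
        }
        foreach (iterator_to_array($element->attributes, false) as $attribute) {
            // Keep only href on <a>, and only for http:// or https:// URLs.
            $isSafeLink = $element->localName === 'a'
                && $attribute->name === 'href'
                && preg_match('~^https?://~i', $attribute->value) === 1;
            if (!$isSafeLink) {
                $element->removeAttributeNode($attribute);
            }
        }
    }

    // Serialize what is left, without the <description> wrapper.
    $html = '';
    foreach ($description->childNodes as $child) {
        $html .= $document->saveHTML($child);
    }
    return $html;
}

// Usage with a made-up, deliberately hostile description.
$document = new DOMDocument();
$document->loadXML(
    '<item><description><p onclick="steal()">Hi '
    . '<a href="javascript:alert(1)">x</a>'
    . '<a href="https://example.com/">ok</a>'
    . '<script>alert(1)</script></p></description></item>'
);
$clean = sanitizeDescription($document->getElementsByTagName('description')->item(0));
echo $clean, "\n";
```

In the answer's loop, this would be applied to the description element before its children are serialized, in place of the plain saveHTML() branch.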
