PHP: Get DOM attribute with encoding preserved
I'm currently parsing some tags in an HTML document using PHP DOM. I want to get the value of the content attribute of the "keywords" meta tag UNCHANGED.
For example, the string "keyword1, keyword2,
keyword2, keyword3" is returned as "keyword1, keyword2, keyword2, keyword3", which changes the actual number of keywords in the output XML document.
I have already tried using "htmlentities()", but it didn't do anything.
Answer
Solution:
I know this is late, but when I revisited my code to make some edits, I found a solution using regex.
It takes the raw HTML (preferably parsed), uses a regex to find the meta tag itself, and then extracts the content value from that tag.
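The original code was not posted, so the following is only a minimal sketch of the two-step regex approach described above, not the exact code I used. The function name rawMetaContent and the sample HTML (including the "&#44;" entity) are made up for illustration. It assumes the attribute values are quoted and that the tag contains no literal ">" inside an attribute.

```php
<?php
// Hypothetical helper: returns the content attribute of a named <meta> tag
// exactly as written in the source HTML (entities are NOT decoded).
function rawMetaContent(string $html, string $name): ?string
{
    // Step 1: grab the whole <meta ...> tag whose name attribute matches.
    $tagPattern = '/<meta\b[^>]*\sname\s*=\s*(["\'])'
        . preg_quote($name, '/')
        . '\1[^>]*>/i';
    if (!preg_match($tagPattern, $html, $tag)) {
        return null; // no such meta tag
    }

    // Step 2: extract the quoted content value from that tag, untouched.
    if (!preg_match('/\scontent\s*=\s*(["\'])(.*?)\1/is', $tag[0], $attr)) {
        return null; // tag found, but it has no quoted content attribute
    }

    return $attr[2];
}

// Illustrative input only; the entity is an assumed example.
$html = '<html><head><meta name="keywords" '
      . 'content="keyword1, keyword2&#44; keyword3"></head></html>';

$rawKeywords = rawMetaContent($html, 'keywords');
```

Because the value is taken from the raw string instead of from the DOM, any entities in it stay exactly as they were written.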
However, to append the data successfully to another document (an XML document, in my case), you need to set it through "textContent" specifically. More on that here: PHP: DOMNode - Manual
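As a rough sketch of the "textContent" step, assuming an output document with made-up element names (page, keywords) and a sample raw value, it could look like this. Assigning to textContent stores the string as plain text, so the "&" is escaped when the document is serialized and the entity is not interpreted again.

```php
<?php
// Assumed sample value, as it might come out of the regex step.
$raw = 'keyword1, keyword2&#44; keyword3';

// Hypothetical output document; the element names are for illustration.
$xml  = new DOMDocument('1.0', 'UTF-8');
$root = $xml->appendChild($xml->createElement('page'));

// Create the element empty, then set its text through textContent
// so the raw string is stored literally instead of being parsed.
$keywords = $xml->createElement('keywords');
$keywords->textContent = $raw;
$root->appendChild($keywords);

$output = $xml->saveXML();
```

Reading the element back from the saved XML gives the original string again, entity text included.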