Questions tagged [domdocument]

DOMDocument is the name commonly given to the class or object that encapsulates the Document Object Model in various programming languages and technologies, including PHP, COM, C++, and ActiveX.

Combining cURL with multiple URLs for efficient result parsing

I am developing a PHP web scraper with the following objectives: retrieve content from fewer than 10 URLs using cURL, add the HTML content of each URL to a DOMDocument, search the DOM document for <a> elements that link to PDF files, ...
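A minimal sketch of the two steps, assuming the curl extension is loaded: fetch the pages in parallel with `curl_multi_*`, then scan each page's `<a>` elements for an `href` whose path ends in `.pdf`. `fetchAll()` and `findPdfLinks()` are hypothetical helper names, and the extension check is a simplifying assumption (it ignores links served as PDF without a `.pdf` path).

```php
<?php
// Hypothetical helper: fetch all URLs concurrently, keyed by URL.
function fetchAll(array $urls): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $pages = [];
    foreach ($handles as $url => $ch) {
        $pages[$url] = (string) curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $pages; // keyed by URL, so each page stays paired with its source
}

// Hypothetical helper: return every <a href> whose path has a .pdf extension.
function findPdfLinks(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Ignore query strings and fragments when checking the extension.
        $path = (string) parse_url($href, PHP_URL_PATH);
        if (strtolower(pathinfo($path, PATHINFO_EXTENSION)) === 'pdf') {
            $links[] = $href;
        }
    }
    return $links;
}
```

Usage would be along the lines of `foreach (fetchAll($urls) as $url => $html) { $pdfs[$url] = findPdfLinks($html); }`.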

When using getElementsByTagName on a title element, it returns a DOMNodeList Object

Our script uses DOM to extract all the links (a tags) from a document and then iterates through their child nodes to collect information. The process starts like this: @$dom->loadHTML($str); $documentLinks = $dom->getElementsByTagName("a"); He ...
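`getElementsByTagName()` always returns a `DOMNodeList`, even when only one element matches, so printing it shows `DOMNodeList Object` rather than the title text. A short sketch of the two usual ways to get at the nodes: `item(0)` for a single element and `foreach` for all of them. `pageTitle()` and `linkTexts()` are hypothetical helper names.

```php
<?php
function pageTitle(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $titles = $dom->getElementsByTagName('title'); // DOMNodeList, not a string
    $first = $titles->item(0);                     // DOMElement, or null if none
    return $first === null ? null : trim($first->textContent);
}

function linkTexts(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $out = [];
    // Iterating the DOMNodeList yields one DOMElement per <a>.
    foreach ($dom->getElementsByTagName('a') as $a) {
        $out[] = ['href' => $a->getAttribute('href'), 'text' => $a->textContent];
    }
    return $out;
}
```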

What is the best way to extract the HTML from a DOMDocument without including the HTML wrapper?

The function below has trouble outputting the DOMDocument without wrapping the content in XML, html, body, and p tags. The suggested solution was: $postarray['post_content'] = $d->saveXML($d->getElementsByT ...
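One common approach, sketched here under the assumption that the content lives under `<body>`: serialize each child of `<body>` with `saveHTML($node)`, which outputs only that node, so the implied doctype, `<html>` and `<body>` wrappers never appear. `innerHtml()` is a hypothetical helper name. An alternative is passing `LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD` to `loadHTML()`, which behaves less predictably when the fragment has several top-level elements.

```php
<?php
// Hypothetical helper: concatenate the HTML of a node's children only.
function innerHtml(DOMNode $parent): string
{
    $html = '';
    foreach ($parent->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node.
        $html .= $parent->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML('<p>Hello <b>world</b></p><p>Second</p>');
libxml_clear_errors();

$body = $dom->getElementsByTagName('body')->item(0);
$content = innerHtml($body); // no <!DOCTYPE>, <html> or <body> in the result
```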

Leveraging XSD schema validation to enhance XPath query evaluation

I am using the following code snippet to create a DOMDocument and validate it against an external XSD file: <?php $xmlPath = "/xml/some/file.xml"; $xsdPath = "/xsd/some/schema.xsd"; $doc = new DOMDocument(); $doc->loa ...
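PHP's `DOMXPath` evaluates XPath 1.0 and does not use type information from the schema, so in practice validation works as a gate: only documents that pass `schemaValidateSource()` (or `schemaValidate()` for a file) get queried, and the query can then rely on what the schema guarantees. A sketch assuming a small hypothetical schema where every `book` must carry a decimal `price`; `cheapBooks()` is a hypothetical name, and inline strings replace the question's files so the example is self-contained.

```php
<?php
$xsd = <<<XSD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="books">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="price" type="xs:decimal" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Returns the number of books at or under $maxPrice, or null if the
// document is malformed or does not match the schema.
function cheapBooks(string $xml, string $xsd, float $maxPrice): ?int
{
    libxml_use_internal_errors(true); // collect parse/validation errors quietly
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml) || !$doc->schemaValidateSource($xsd)) {
        libxml_clear_errors();
        return null; // invalid: do not run the query at all
    }
    $xpath = new DOMXPath($doc);
    // The schema guarantees @price is a decimal, so number(@price) is never NaN.
    return $xpath->query(sprintf('/books/book[number(@price) <= %F]', $maxPrice))->length;
}
```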

Using DOMXPath together with its query method

Here is a snippet of HTML code to work with: <ul id="tree"> <li> <a href="">first</a> <ul> <li><a href="">subfirst</a></li> <li><a href=""> ...
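A sketch of two common `DOMXPath::query()` patterns on a nested list: an absolute path that selects only the top-level links, and a relative path evaluated against a context node passed as the second argument. The markup below is a hypothetical sample shaped like the question's tree (the `subsecond` and `second` items are invented), not the asker's actual HTML.

```php
<?php
$html = <<<HTML
<ul id="tree">
  <li><a href="">first</a>
    <ul>
      <li><a href="">subfirst</a></li>
      <li><a href="">subsecond</a></li>
    </ul>
  </li>
  <li><a href="">second</a></li>
</ul>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Only links that are direct children of the top-level <li> items.
$top = [];
foreach ($xpath->query('//ul[@id="tree"]/li/a') as $a) {
    $top[] = $a->textContent;
}

// A relative expression ("./...") is evaluated against the context node.
$firstLi = $xpath->query('//ul[@id="tree"]/li')->item(0);
$nested = [];
foreach ($xpath->query('./ul/li/a', $firstLi) as $a) {
    $nested[] = $a->textContent;
}
```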

Splitting HTML content into nodes using PHP XPath (including empty nodes)

I am currently attempting to separate the HTML string into individual nodes, along with their text content if applicable. Here is the HTML string that I am working with: <p>Paragraph one.</p> <p><strong>Paragraph <em>two</ ...
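One way to do this, sketched with a hypothetical `splitNodes()` helper and sample input: the XPath `//body//node()` matches elements and text nodes alike, in document order, so empty elements such as `<p></p>` are still listed. Whitespace-only text nodes are skipped here as an assumed design choice.

```php
<?php
function splitNodes(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrap the fragment so its nodes land under a known <body>.
    $dom->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $out = [];
    foreach ($xpath->query('//body//node()') as $node) {
        if ($node instanceof DOMText && trim($node->nodeValue) === '') {
            continue; // formatting whitespace, not content
        }
        // nodeName is the tag name for elements and "#text" for text nodes;
        // textContent is "" for an empty element.
        $out[] = [$node->nodeName, $node->textContent];
    }
    return $out;
}
```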

What is the best way to access the attributes of a span element that is being returned as a DOMText object?

I'm facing an issue where my span element is being treated as a DOMText object, which makes it difficult to retrieve the span's attribute. After trying various solutions, I have found that using a plain DOMDocument (instead of XPath) is faster f ...
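A likely cause is that `childNodes` contains the text between tags as `DOMText` nodes alongside the elements, and only `DOMElement` has `getAttribute()`. A sketch that checks the node type before reading the attribute; `spanClasses()` and the `class` attribute are hypothetical choices for illustration.

```php
<?php
function spanClasses(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $classes = [];
    $div = $dom->getElementsByTagName('div')->item(0);
    foreach ($div->childNodes as $child) {
        // Text between tags arrives as DOMText; skip anything not an element.
        if ($child instanceof DOMElement && $child->tagName === 'span') {
            $classes[] = $child->getAttribute('class');
        }
    }
    return $classes;
}
```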

Converting HTML to plain text using the DOMDocument class

Is there a way to extract the text content from the source code of an HTML page, excluding the HTML tags? For example: <meta http-equiv="content-type" content="text/html; charset=utf-8" /> <meta http-equiv="content-language" content="hu"/> <titl ...
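A minimal sketch, assuming only the visible `<body>` text is wanted (so `<head>` content such as `<title>` and `<meta>` is excluded): remove `<script>` and `<style>` elements, read `textContent`, and collapse whitespace. `htmlToText()` is a hypothetical name. Note that `textContent` adds no separators, so adjacent block elements with no whitespace between them run together.

```php
<?php
function htmlToText(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Collect first: removing nodes while iterating a live DOMNodeList skips some.
    $drop = [];
    foreach (['script', 'style'] as $tag) {
        foreach ($dom->getElementsByTagName($tag) as $node) {
            $drop[] = $node;
        }
    }
    foreach ($drop as $node) {
        $node->parentNode->removeChild($node);
    }

    // textContent concatenates all remaining text nodes without any tags.
    $body = $dom->getElementsByTagName('body')->item(0);
    $text = $body === null ? '' : $body->textContent;
    return trim(preg_replace('/\s+/u', ' ', $text));
}
```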

The problem with DOMDocument and XPath array pairing

I'm having some difficulties with file_get_contents, DOMDocument, and XPath. I am attempting to do some web scraping and have created an array of website links to scrape: array(5) { [0]=> string(34) "https://lions-mansion.jp/M ...
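If the difficulty is keeping each result aligned with the URL it came from, one simple approach is to key the output array by URL, so a failed fetch or an empty match cannot shift later results. A sketch with a hypothetical `scrapeTitles()` helper and an assumed `//h1` expression; `file_get_contents()` accepts local paths as well as URLs (remote ones need `allow_url_fopen`).

```php
<?php
function scrapeTitles(array $urls, string $expr = '//h1'): array
{
    $results = [];
    foreach ($urls as $url) {
        $html = @file_get_contents($url);
        if ($html === false || $html === '') {
            $results[$url] = []; // keep the slot so the pairing stays intact
            continue;
        }
        $dom = new DOMDocument();
        libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();

        $xpath = new DOMXPath($dom);
        $texts = [];
        foreach ($xpath->query($expr) as $node) {
            $texts[] = trim($node->textContent);
        }
        $results[$url] = $texts; // results for this URL, under this URL
    }
    return $results;
}
```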

Having trouble accessing website content with PHP 5 DOMDocument

<?php class parsedictionary { public function _process() { $webpage="http://www.oppapers.com/essays/Computerized-World/160871?read_essay"; $doc=new DOMDocument(); $doc->loadHTML($webpage); e ...
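In the excerpt, `$doc->loadHTML($webpage)` is given the URL string itself, and `loadHTML()` expects markup, so it parses the URL as literal text. A sketch of the fix using `loadHTMLFile()`, which reads from a path or URL (roughly equivalent to `loadHTML(file_get_contents($location))`; remote URLs depend on `allow_url_fopen`). `loadPage()` is a hypothetical helper name.

```php
<?php
function loadPage(string $location): DOMDocument
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // scraped pages are rarely valid HTML
    // loadHTMLFile() fetches the document; loadHTML() would parse $location as text.
    $dom->loadHTMLFile($location);
    libxml_clear_errors();
    return $dom;
}
```

With the question's variable this would be `$doc = loadPage($webpage);`.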

Could my HTML security measures be vulnerable to exploitation?

I have developed a function that does the following: it accepts a string as input, which can be either an entire HTML document or an HTML "snippet" (even a broken one). It creates a DOMDocument from the input and iterates through al ...
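For comparison, a sketch of the allowlist style of DOM-based filtering, which fails closed: any tag or attribute not explicitly listed is removed, and `href` survives only with an `http` or `https` scheme (so relative links are dropped too). The `ALLOWED` table and `sanitizeHtml()` are hypothetical, and this is not a reviewed sanitizer; a maintained library such as HTML Purifier is the safer choice for untrusted input.

```php
<?php
// Hypothetical allowlist: tag name => attributes permitted on it.
const ALLOWED = [
    'p' => [], 'b' => [], 'i' => [], 'em' => [], 'strong' => [],
    'a' => ['href'],
];

function sanitizeHtml(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    $body = $dom->getElementsByTagName('body')->item(0);

    // Snapshot the live list before modifying the tree.
    foreach (iterator_to_array($body->getElementsByTagName('*')) as $el) {
        $tag = strtolower($el->tagName);
        if (!array_key_exists($tag, ALLOWED)) {
            // Drop disallowed elements together with their content.
            $el->parentNode->removeChild($el);
            continue;
        }
        // Copy attribute names first; removing while iterating is unsafe.
        $names = [];
        foreach ($el->attributes as $attr) {
            $names[] = $attr->name;
        }
        foreach ($names as $name) {
            $keep = in_array(strtolower($name), ALLOWED[$tag], true);
            if ($keep && strtolower($name) === 'href') {
                $scheme = strtolower((string) parse_url(trim($el->getAttribute($name)), PHP_URL_SCHEME));
                $keep = in_array($scheme, ['http', 'https'], true);
            }
            if (!$keep) {
                $el->removeAttribute($name);
            }
        }
    }

    // Serialize only the body's children, without the wrapper.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```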