php - How to create a sitemap with page relationships
I'm trying to write a script (preferably in PHP) that crawls a site and creates a sitemap. In addition to the standard listing of pages, I'd like the script to keep track of which pages link to which other pages.
Example pages
A
B
C
D
I'd like the output to give me something like the following.
Page Name: A
Pages linking to Page A:
- B
- C
- D
Page Name: B
Pages linking to Page B:
- A
- C
etc...
I've come across multiple standard sitemap scripts, but nothing that really accomplishes what I am looking for.
EDIT: It seems I didn't give enough info; sorry for the lack of clarity. Here is the code I currently have. I'm using simple_html_dom.php to handle parsing and searching the HTML for me.
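<?php
include("simple_html_dom.php");

$url = 'page_url';
$html = new simple_html_dom();
$html->load_file($url);

// Map of link href => array (empty for now, to be filled in later)
$linkmap = array();
foreach($html->find('a') as $link):
    // Only keep links whose href contains the wanted substring
    if(strpos($link->href, "cms/education") !== false):
        // The hrefs are the array keys, so test the key rather than the values
        if(!isset($linkmap[$link->href])):
            $linkmap[$link->href] = array();
        endif;
    endif;
endforeach;
?>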
Note: My little foreach loop just filters links based on a specific substring in the URL.
So, I have the necessary first-level pages. Where I am stuck is creating a loop that keeps track of the pages it has already visited, so that it does not run indefinitely.
Answer
Solution:
Basically, you need two arrays to control the flow here. The first will keep track of the pages you need to look at and the second will track the pages you have already looked at. Then you just run your existing code on each page until there are none left:
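To make the two-array idea concrete, here is a minimal sketch of the control flow on its own. It does no HTTP or HTML parsing: get_links() is a hypothetical stand-in for "load the page and run your filtering loop", and $site is a made-up in-memory site, so treat the names and data as assumptions rather than a finished crawler.

```php
<?php
// Hypothetical stand-in for "fetch a page and extract its links".
// A real crawler would load the URL and run the filtering loop from the
// question; here a fixed array plays the role of the site.
function get_links($page, array $site)
{
    return isset($site[$page]) ? $site[$page] : array();
}

// Crawl from $start and return a map: page => pages that link to it.
function build_linkmap($start, array $site)
{
    $to_visit = array($start);            // pages still to look at
    $visited  = array();                  // pages already looked at
    $linkmap  = array($start => array());

    while (!empty($to_visit)) {
        $page = array_shift($to_visit);
        if (isset($visited[$page])) {
            continue; // already processed, so cycles cannot loop forever
        }
        $visited[$page] = true;

        foreach (get_links($page, $site) as $target) {
            if (!isset($linkmap[$target])) {
                $linkmap[$target] = array();
            }
            // Record $page as a referrer of $target, but only once.
            if (!in_array($page, $linkmap[$target], true)) {
                $linkmap[$target][] = $page;
            }
            // Queue the target only if it has not been processed yet.
            if (!isset($visited[$target])) {
                $to_visit[] = $target;
            }
        }
    }

    // Sort each referrer list so the output order is stable.
    foreach ($linkmap as &$referrers) {
        sort($referrers);
    }
    unset($referrers);

    return $linkmap;
}

// Made-up example site: each page lists the pages it links to.
$site = array(
    'A' => array('B', 'D'),
    'B' => array('A', 'C'),
    'C' => array('A', 'B'),
    'D' => array('A'),
);

$linkmap = build_linkmap('A', $site);
foreach ($linkmap as $page => $referrers) {
    echo "Page Name: $page\n";
    echo "Pages linking to Page $page:\n";
    foreach ($referrers as $referrer) {
        echo "- $referrer\n";
    }
}
```

Because a page is marked as visited before its links are followed, a cycle such as A linking to B and B linking back to A is processed once per page and the loop ends when the queue is empty. Only pages reachable from the start page are discovered, which is probably what you want for a sitemap of a single site section.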