PHP - cURL gets redirected?
I'm writing a PHP script that will eventually scrape images from HTML retrieved with cURL. I've noticed that on some sites the page I request isn't the one that comes back: my script gets redirected to a specific part of that website.
For instance, if I'm trying to retrieve the HTML of this page: Link
I get the HTML returned from this page instead: Link
Here is my cURL code:
function curl($url){
    // Browser-like request headers; each header must be a single line.
    $headers = array();
    $headers[] = "User-Agent:Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
    $headers[] = "Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    $headers[] = "Accept-Language:en-us,en;q=0.5";
    $headers[] = "Accept-Encoding:gzip,deflate";
    $headers[] = "Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $headers[] = "Keep-Alive:115";
    $headers[] = "Connection:keep-alive";
    $headers[] = "Cache-Control:max-age=0";
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_ENCODING, "gzip");
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($curl);      // response body (not returned here)
    $header = curl_getinfo($curl); // transfer info, including the final URL
    curl_close($curl);
    return $header;
}
$data = curl($_GET['url']);
print_r($data); // print_r() prints by itself; echoing its return value appends a stray "1"
Is there any way to make the script look more like a real browser so that it doesn't get redirected?
@mariobgr Here I'm trying to display a quick response wherever there is an image. If I turn CURLOPT_FOLLOWLOCATION off, I don't get anything back:
...
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_ENCODING, "gzip");
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 0); // do not follow redirects
    $data = curl_exec($curl);
    //$header = curl_getinfo( $curl );
    curl_close($curl);
    return $data; // response body of the first (unredirected) response
}
$data = curl($_GET['url']);
$dom = new DOMDocument();
@$dom->loadHTML($data); // @ suppresses warnings about malformed HTML
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo "image here";
}
Answer
Solution:
http://curl.haxx.se/libcurl/c/CURLOPT_FOLLOWLOCATION.html
A parameter set to 1 tells the library to follow any Location: header that the server sends as part of a HTTP header in a 3xx response. This means that libcurl will re-send the same request on the new location and follow new Location: headers all the way until no more such headers are returned. CURLOPT_MAXREDIRS can be used to limit the number of redirects libcurl will follow.
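As a rough sketch of the behaviour quoted above, the following keeps CURLOPT_FOLLOWLOCATION on, caps the redirect chain with CURLOPT_MAXREDIRS, and reports where the request actually ended up. The function names `redirect_options` and `fetch_following_redirects` and the default limit of 5 are assumptions made for illustration; they do not come from the question.

```php
<?php
// Hypothetical helper: cURL options that follow redirects up to a limit.
function redirect_options(int $maxRedirects = 5): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers on 3xx responses
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many redirects
    ];
}

// Hypothetical helper: fetch $url and report where the request ended up.
function fetch_following_redirects(string $url, int $maxRedirects = 5): array
{
    $curl = curl_init($url);
    curl_setopt_array($curl, redirect_options($maxRedirects));
    $body = curl_exec($curl); // false on failure

    return [
        'body'      => $body,
        'final_url' => curl_getinfo($curl, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($curl, CURLINFO_REDIRECT_COUNT),
    ];
}
```

Comparing `final_url` with the URL you passed in shows whether the server redirected the request, and `redirects` shows how many hops it took.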
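You can set it to FALSE/0 to stop cURL from following redirects. Note that you then receive the 3xx response itself, whose body is often empty, so there may be no HTML left to parse.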
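With following turned off, it may help to look at the status code and the redirect target rather than the body. A minimal sketch, assuming the site answers with a 3xx response; `is_redirect_status` and `inspect_response` are hypothetical names:

```php
<?php
// Hypothetical helper: true for any 3xx status code.
function is_redirect_status(int $status): bool
{
    return $status >= 300 && $status < 400;
}

// Hypothetical helper: request $url without following redirects.
function inspect_response(string $url): array
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false); // stay on the first response
    $body   = curl_exec($curl);
    $status = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);

    return [
        'status'      => $status,
        'is_redirect' => is_redirect_status($status),
        // URL cURL would have requested next, taken from the Location: header
        'redirect_to' => curl_getinfo($curl, CURLINFO_REDIRECT_URL),
        'body'        => $body, // often empty for a 3xx response
    ];
}
```

If `is_redirect` is true, `redirect_to` tells you which page the site is sending the script to, which is usually the quickest way to work out why the redirect happens.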