php - cURL Gets redirected?


I'm writing a PHP script that will eventually scrape images from HTML retrieved by cURL. I've noticed that on some sites, the page returned isn't the URL I requested. My script gets redirected to a different part of that website.

For instance, if I'm trying to retrieve the HTML on this page: Link

I get the html returned from this page: Link

Here is my cURL code:

           function curl($url){
                // Each header must be a single line; a line break inside the string corrupts the header.
                $headers[]  = "User-Agent:Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
                $headers[]  = "Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
                $headers[]  = "Accept-Language:en-us,en;q=0.5";
                $headers[]  = "Accept-Encoding:gzip,deflate";
                $headers[]  = "Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.7";
                $headers[]  = "Keep-Alive:115";
                $headers[]  = "Connection:keep-alive";
                $headers[]  = "Cache-Control:max-age=0";

                $curl = curl_init();
                curl_setopt($curl, CURLOPT_URL, $url);
                curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
                curl_setopt($curl, CURLOPT_ENCODING, "gzip");
                curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
                curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);

                $data = curl_exec( $curl );
                $header = curl_getinfo( $curl );

                curl_close($curl);

                return $header; 
            }

            $data = curl($_GET['url']);

            print_r($data);

Is there any way to make the request look more like a real browser so it doesn't get redirected?

@mariobgr Here I'm trying to display a quick response wherever there is an image. If I turn follow location off, I don't get anything back:

                ...

                $curl = curl_init();
                curl_setopt($curl, CURLOPT_URL, $url);
                curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
                curl_setopt($curl, CURLOPT_ENCODING, "gzip");
                curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
                curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 0);

                $data = curl_exec( $curl );
                //$header = curl_getinfo( $curl );

                curl_close($curl);

                return $data;   
            }

            $data = curl($_GET['url']);

            $dom = new DOMDocument();
            @$dom->loadHTML($data);

            $images = $dom->getElementsByTagName('img');

            foreach($images as $image) {

                echo "image here";
            }

Answer

Solution:

http://curl.haxx.se/libcurl/c/CURLOPT_FOLLOWLOCATION.html

A parameter set to 1 tells the library to follow any Location: header that the server sends as part of a HTTP header in a 3xx response. This means that libcurl will re-send the same request on the new location and follow new Location: headers all the way until no more such headers are returned. CURLOPT_MAXREDIRS can be used to limit the number of redirects libcurl will follow.
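As a rough illustration of the behaviour described above, here is a minimal sketch that follows redirects up to a limit and then reports where the request actually ended up. The function names `redirect_options` and `fetch_following_redirects` and the default limit of 5 are made up for this example; only the `curl_*` functions and the `CURLOPT_*` / `CURLINFO_*` constants come from PHP's cURL extension.

```php
<?php
// Hypothetical helpers for illustration; not part of the original question.

/** Build cURL options that follow redirects, but at most $maxRedirects times. */
function redirect_options(string $url, int $maxRedirects = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers in 3xx responses
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many redirects
    ];
}

/** Fetch $url and report the body, the final URL, and how many redirects were followed. */
function fetch_following_redirects(string $url, int $maxRedirects = 5): array
{
    $curl = curl_init();
    curl_setopt_array($curl, redirect_options($url, $maxRedirects));
    $body = curl_exec($curl); // false on failure

    return [
        'body'      => $body,
        'final_url' => curl_getinfo($curl, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($curl, CURLINFO_REDIRECT_COUNT),
    ];
}
```

Comparing `final_url` with the URL you asked for shows whether the server sent you somewhere else, and `redirects` shows how many hops it took.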

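You can set it to FALSE/0 to prevent cURL from following redirects. Note that the request then returns the 3xx response itself, whose body is often empty, which is likely why you get nothing back with follow location turned off.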
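To see what the server is doing when redirects are not followed, something like the following sketch may help. It turns `CURLOPT_FOLLOWLOCATION` off and reads the status code and the redirect target from `curl_getinfo()`. The names `is_redirect_status` and `inspect_without_following` are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helpers for illustration; not part of the original question.

/** True for any 3xx status code. */
function is_redirect_status(int $status): bool
{
    return $status >= 300 && $status < 400;
}

/** Request $url without following redirects and report what came back. */
function inspect_without_following(string $url): array
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false); // do not follow Location: headers

    $body   = curl_exec($curl); // often empty for a 3xx response
    $status = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);

    return [
        'status'       => $status,
        'is_redirect'  => is_redirect_status($status),
        // With following disabled, this is the URL the server wants you to request next.
        'redirect_url' => curl_getinfo($curl, CURLINFO_REDIRECT_URL),
        'body'         => $body,
    ];
}
```

If `is_redirect` is true, the page you want is not at the URL you requested, and `redirect_url` tells you where the server is sending you. Whether changing the request headers avoids the redirect depends on the site.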
