The problem is that I can't get the request headers of all the consecutive requests with PHP cURL.
cURL returns the headers of the last request only.
Please check the PHP code below:
$curl = curl_init('http://yandex.com');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_VERBOSE, true);
curl_setopt($curl, CURLOPT_COOKIEFILE, '');
curl_setopt($curl, CURLOPT_HEADER, true); //include the header in the output
curl_setopt($curl, CURLINFO_HEADER_OUT, true); //track the handle's request string
curl_setopt($curl, CURLOPT_NOBODY, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($curl, CURLOPT_TIMEOUT, 5); //seconds
$requestHeaders = array(
    "Connection: keep-alive", //we want to look like a real browser
    "Cache-Control: max-age=0",
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36",
    "Dnt: 1",
    "Accept-Encoding: gzip, deflate, sdch",
    "Accept-Language: ru,en-US;q=0.8,en;q=0.6,uk;q=0.4,tr;q=0.2"
);
curl_setopt($curl, CURLOPT_HTTPHEADER, $requestHeaders);
$response = curl_exec($curl);
$info = curl_getinfo($curl);
echo 'Redirect count = ' . $info['redirect_count'] . "\n\n<br /><br />";
echo "Request header\n<br />" . str_replace("\n", "\n<br />", $info['request_header']) . "\n\n<br /><br />";
echo "Response headers\n<br />" . str_replace("\n", "\n<br />", substr($response, 0, intval($info['header_size']))) . "\n\n<br /><br />";
This code shows the headers of the last request only:
HEAD / HTTP/1.1
Host: www.yandex.com
Cookie: yandexuid=1858417061425447211
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36
Dnt: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: ru,en-US;q=0.8,en;q=0.6,uk;q=0.4,tr;q=0.2
But it shows all three HTTP responses:
HTTP/1.1 302 Moved Temporarily
Date: Wed, 04 Mar 2015 05:33:30 GMT
Content-Type: text/html
Location: http://www.yandex.com/
HTTP/1.1 302 Found
Date: Wed, 04 Mar 2015 05:33:31 GMT
Cache-Control: no-cache,no-store,max-age=0,must-revalidate
Location: https://www.yandex.com/
Expires: Wed, 04 Mar 2015 05:33:31 GMT
Last-Modified: Wed, 04 Mar 2015 05:33:31 GMT
P3P: policyref="/w3c/p3p.xml", CP="NON DSP ADM DEV PSD IVDo OUR IND STP PHY PRE NAV UNI"
Set-Cookie: yandexuid=1858417061425447211; Expires=Sat, 01-Mar-2025 05:33:31 GMT; Domain=.yandex.com; Path=/
X-XRDS-Location: http://openid.yandex.ru/server_xrds/
HTTP/1.1 200 Ok
Date: Wed, 04 Mar 2015 05:33:31 GMT
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache,no-store,max-age=0,must-revalidate
Expires: Wed, 04 Mar 2015 05:33:32 GMT
Last-Modified: Wed, 04 Mar 2015 05:33:32 GMT
Content-Security-Policy: default-src 'self' 'unsafe-inline' 'unsafe-eval' wss://portal-xiva.yandex.net *.yandex.ru yandex.ru *.yandex.net *.yandex.com yandex.com yandex.st yastatic.net *.yastatic.net wss://portal-xiva.yandex.net wss://push.yandex.ru; img-src data: 'self' *.yandex.ru *.yandex.com *.tns-counter.ru *.gemius.pl yandex.st *.yandex.net yastatic.net *.yastatic.net; report-uri https://www.yandex.ru/log/csp?from=com.com&showid=22877.12428.1425447211.60001&h=m60&yu=1858417061425447211;
P3P: policyref="/w3c/p3p.xml", CP="NON DSP ADM DEV PSD IVDo OUR IND STP PHY PRE NAV UNI"
X-Frame-Options: DENY
X-XRDS-Location: http://openid.yandex.ru/server_xrds/
Content-Encoding: gzip
Tested on
cURL itself cannot do this: with CURLOPT_FOLLOWLOCATION enabled, curl_getinfo() keeps the request headers of the last request only. The only way is to handle the redirects manually.
Something like this:
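$a_options = array(
    CURLOPT_NOBODY         => false,
    CURLOPT_RETURNTRANSFER => true,
    // CURLOPT_BINARYTRANSFER is omitted: it has had no effect since PHP 5.1.3
    CURLOPT_SSL_VERIFYPEER => false, // disables certificate checks; for testing only
    CURLOPT_FOLLOWLOCATION => false, // redirects are followed manually below
    CURLOPT_VERBOSE        => true,
    CURLOPT_HEADER         => true,  // include the response headers in the output
    CURLINFO_HEADER_OUT    => true,  // keep the request headers for curl_getinfo()
    CURLOPT_TIMEOUT        => 5,     // seconds
    CURLOPT_COOKIEFILE     => '',    // enable the in-memory cookie engine
    CURLOPT_HTTPHEADER     => array(
        // ...
    ),
);
$a_headers = array();
$url = 'http://yandex.com';
$url_ref = '';
$max_requests = 10; // guard against redirect loops
$ch = curl_init();
for ($count = 0; $url && $count < $max_requests; $count++) {
    $a_options[CURLOPT_URL] = $url;
    if ($url_ref) $a_options[CURLOPT_REFERER] = $url_ref;
    curl_setopt_array($ch, $a_options);
    $response = curl_exec($ch);
    if ($response === false) {
        echo 'cURL error: ' . curl_error($ch) . "\n";
        break;
    }
    $a_info = curl_getinfo($ch);
    // Split the raw output into the header block and the body
    $header = substr($response, 0, $i = $a_info['header_size']);
    $response = substr($response, $i);
    $a_headers[] = array(
        'request'  => $a_info['request_header'],
        'response' => $header
    );
    $url_ref = $url;
    $url = '';
    // Follow every redirect status, not only 302.
    // A relative Location value is used as-is and is not resolved here.
    if (in_array($a_info['http_code'], array(301, 302, 303, 307, 308)) &&
        preg_match('/\nlocation:\s*(\S+)\s/i', $header, $arr)) {
        $url = $arr[1];
    }
}
// The number of redirects followed is one less than the number of requests made
echo 'Redirect count = ' . max(0, count($a_headers) - 1) . "\n\n";
echo "Headers:\n" . print_r($a_headers, true) . "\n\n";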
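The loop above passes the Location value straight to the next request. That works for the absolute URLs in this example, but a server may also send a relative one. Below is a minimal sketch of a helper that makes the value absolute first. The name `resolve_location` is hypothetical (it is not a cURL function), and the sketch assumes an http(s) base URL and does not normalise `..` segments.

```php
<?php
// Hypothetical helper, not part of cURL: turns a Location value into an absolute URL.
function resolve_location(string $base, string $location): string
{
    // Already absolute: scheme://...
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }
    $p = parse_url($base);
    $scheme = $p['scheme'] ?? 'http';
    // Scheme-relative: //host/path
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location;
    }
    $root = $scheme . '://' . ($p['host'] ?? '')
          . (isset($p['port']) ? ':' . $p['port'] : '');
    // Host-relative: /path
    if (strpos($location, '/') === 0) {
        return $root . $location;
    }
    // Path-relative: replace the last segment of the base path
    $path = $p['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $location;
}
```

Inside the loop it could be used as `$url = resolve_location($url_ref, $arr[1]);`, since `$url_ref` holds the URL that was just requested at that point.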
No other solutions so far …
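The response side is easier: as the question's output shows, with CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION the header part of the result already holds one block per response. A minimal sketch for splitting it into separate responses follows. The helper name `split_header_blocks` and the sample string are illustrative assumptions, not part of the cURL extension.

```php
<?php
// Hypothetical helper: splits the header part of a cURL response
// into one entry per HTTP response.
function split_header_blocks(string $rawHeaders): array
{
    $rawHeaders = trim($rawHeaders);
    if ($rawHeaders === '') {
        return array();
    }
    $blocks = array();
    // Header blocks are separated by an empty line (CRLF CRLF; bare LF tolerated)
    foreach (preg_split('/\r?\n\r?\n/', $rawHeaders) as $block) {
        $lines = preg_split('/\r?\n/', $block);
        $status = array_shift($lines); // e.g. "HTTP/1.1 302 Found"
        $fields = array();
        foreach ($lines as $line) {
            if (strpos($line, ':') === false) {
                continue; // skip malformed lines
            }
            list($name, $value) = explode(':', $line, 2);
            // Lower-case the names; repeated headers are collected in a list
            $fields[strtolower(trim($name))][] = trim($value);
        }
        $blocks[] = array('status' => $status, 'fields' => $fields);
    }
    return $blocks;
}

// Sample input made up for illustration
$raw = "HTTP/1.1 302 Found\r\nLocation: https://www.yandex.com/\r\n\r\n"
     . "HTTP/1.1 200 Ok\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n";
$blocks = split_header_blocks($raw);
echo count($blocks), "\n";
echo $blocks[0]['fields']['location'][0], "\n";
```

With the question's code, the input would be `substr($response, 0, $info['header_size'])`.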