Need help converting my Beautiful Soup (Python) code to PHP

Question


I'm currently converting all of my Beautiful Soup code to PHP in order to get used to PHP. However, I've run into a small problem: my PHP code only works when the wiki page has "External links" after "Original run" in the HTML (for example, the True Detective wiki page). I just found out that this is not always the case, because a page does not always have an "External links" section. Is there a way to convert my Beautiful Soup code to PHP using the same technique that the Beautiful Soup code uses?

import re

import requests
from bs4 import BeautifulSoup


def get_date(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")

    # Look through every infobox table for the "Original run" header cell
    for item in soup.find_all("table", {"class": "infobox"}):
        for item2 in item.find_all("th"):
            if item2.text == "Original run":
                # The value is in the <td> that follows the header cell
                test2 = item2.find_next("td").text
                # Drop any parenthesised notes from the value
                mysub = re.sub(r'\([^)]*\)', '', test2)
                return mysub

And here is my current PHP code:

<?php
// Defining the basic cURL function
function curl($url) {
    $ch = curl_init();  // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url);    // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch);    // Closing cURL
    return $data;   // Returning the data from the function
}
?>

<?php
// Defining the basic scraping function
function scrape_between($data, $start, $end){
    $data = stristr($data, $start); // Stripping all data from before $start
    $data = substr($data, strlen($start));  // Stripping $start
    $stop = stripos($data, $end);   // Getting the position of the $end of the data to scrape
    $data = substr($data, 0, $stop);    // Stripping all data from after and including the $end of the data to scrape
    return $data;   // Returning the scraped data from the function
}
?>

<?php
$scraped_page = curl("http://en.wikipedia.org/wiki/The_Walking_Dead_(TV_series)");    // Downloading the Wikipedia page into $scraped_page
$scraped_data = scrape_between($scraped_page, "<table class=\"infobox vevent\" style=\"width:22em\">", "</table>");   // Scraping the downloaded data in $scraped_page for the content of the infobox table
// Taking the text from "Original run" up to "External links"; this is the step that fails when "External links" is missing
$original_run = mb_substr($scraped_data, strpos($scraped_data, "Original run")-2, strpos($scraped_data, "External links") - strpos($scraped_data, "Original run")-2);
echo $original_run;
?>


beautifulsoup php python


Accepted Answer

Have you considered simply using the Wikipedia API? The markup that the wiki generates is generally incredibly awful, and it can change at any time.
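As a rough sketch of the API route, written in Python to match the question's original code: the MediaWiki `action=parse` endpoint can return a page's raw wikitext, and the infobox fields can then be read from it instead of from the rendered HTML. The helper names `fetch_wikitext` and `infobox_field`, the User-Agent string, and the field names `first_aired` and `last_aired` are assumptions for illustration; check the page's actual infobox source for the parameter names it uses.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_wikitext(title):
    """Fetch the raw wikitext of a page through the MediaWiki parse API."""
    params = urllib.parse.urlencode({
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # with version 2, "wikitext" is a plain string
    })
    request = urllib.request.Request(
        API_URL + "?" + params,
        # Placeholder User-Agent (an assumption); replace with your own
        headers={"User-Agent": "infobox-sketch/0.1 (example script)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["parse"]["wikitext"]


def infobox_field(wikitext, name):
    """Return the raw value of a `| name = value` template line, or None."""
    pattern = r"^[ \t]*\|[ \t]*" + re.escape(name) + r"[ \t]*=[ \t]*(.*)$"
    match = re.search(pattern, wikitext, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    text = fetch_wikitext("The_Walking_Dead_(TV_series)")
    # Field names are assumed; values come back as raw wiki markup
    print(infobox_field(text, "first_aired"))
    print(infobox_field(text, "last_aired"))
```

The values returned this way are still wiki markup (templates such as date helpers), so they may need further cleaning. The same request can be made from PHP with the `curl` function from the question and `json_decode`.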

Also, instead of trying to parse the HTML with regular expressions or something similar in PHP, just install the phpQuery library with Composer; you can then simply look up the selector table.infobox.vevent.
