(Scraping) The BBC News Front Page
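When I want to practice reading English, I usually start with news or current affairs.
It occurred to me to scrape the news on BBC for practice: that way I learn how to scrape news
and practice reading English at the same time.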
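Target site: https://www.bbc.com/news
Inspecting the site structure:
The whole story (headline, summary text, and so on) sits inside this structure
The headline is in an h3 and can be extracted from there
The summary text is in a p
The publication time can be extracted from the time tag
The news link comes from the href of the a tag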
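The observations above can be tried out on a small hand-written fragment before touching the live page. This is only a sketch: the HTML below is a made-up stand-in for one story block (the class name `story` and the markup are assumptions, not BBC's actual markup), and `urljoin` is swapped in for plain string concatenation when building the link. `html.parser` is used so the sketch runs without `lxml`.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical story block, written by hand for illustration only
SAMPLE_HTML = """
<div class="story">
  <a href="/news/world-asia-55313161"><h3>Japan 'Twitter killer' sentenced to death</h3></a>
  <p>Takahiro Shiraishi was convicted for killing nine people he had contacted on the social media platform.</p>
  <time>5 minutes ago</time>
</div>
"""

block = BeautifulSoup(SAMPLE_HTML, 'html.parser').find('div', 'story')

headline = block.find('h3').text   # headline text from the h3
summary = block.find('p').text     # summary text from the p
posted = block.find('time').text   # publication time from the time tag
# The href is site-relative, so join it with the site root
link = urljoin('https://www.bbc.com', block.find('a')['href'])

print(headline)
print(summary)
print(posted)
print(link)
```

Each `find` call returns the first matching tag inside the block, which is why one call per field is enough here.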
With a rough idea of what to extract, I wrote the code to read it in:
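import requests
from bs4 import BeautifulSoup

url = 'https://www.bbc.com/news'
html = requests.get(url)
soup = BeautifulSoup(html.text, 'lxml')

# Container that holds the top stories on the front page
items = soup.find('div', 'gel-layout gel-layout--no-flex nw-c-top-stories--standard nw-c-top-stories--international')

# Walk the container's direct child tags, one story block at a time
for i in items.find_all(True, recursive=False):
    title = i.find('h3')
    time = i.find('time')
    content = i.find('a')
    p = i.find('p')
    if title is None:
        pass  # not a story block, nothing to print
    else:
        print("title: %s" % title.text)
        if p is None:
            pass  # headline-only story, no summary to print
        else:
            print("text: %s" % p.text)  # was a.text, but no variable a exists
            print("time: %s" % time.getText())
            print('https://www.bbc.com' + content['href'])
            print('=' * 60)  # separator line between stories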
Some of the fields I wanted to scrape come back as None (NoneType), so I used if-else to check for that.
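The same check can also be folded into a small helper so that each field is handled in one place. This is a minimal sketch, not the code used above: `text_or_default` is a hypothetical helper name, and the two story blocks are hand-written stand-ins, the second of which has no `p`.

```python
from bs4 import BeautifulSoup

# Hand-written stand-ins for two story blocks; the second has no <p>
SAMPLE_HTML = """
<div><h3>Story with a summary</h3><p>Summary text.</p></div>
<div><h3>Headline-only story</h3></div>
"""

def text_or_default(tag, name, default=None):
    """Return the text of the first `name` tag inside `tag`, or `default` if absent."""
    found = tag.find(name)  # find() returns None when nothing matches
    return found.text if found is not None else default

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
results = [
    (text_or_default(div, 'h3'), text_or_default(div, 'p'))
    for div in soup.find_all('div')
]
print(results)
```

Comparing with `is not None` rather than `== None` is the usual Python idiom for this kind of check.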
The result looks like this:
PS D:\python> & "C:/Program Files (x86)/python3.9.0/python.exe" d:/python/bbcNews.py
title: Biden after win formalised: Time to turn the page
text: Joe Biden condemns Donald Trump's attempts to challenge the election result as his win is formalised.
time: 2h2 hours ago
https://www.bbc.com/news/election-us-2020-55312272
============================================================
title: Japan 'Twitter killer' sentenced to death
text: Takahiro Shiraishi was convicted for killing nine people he had contacted on the social media platform.
time: 4m5 minutes ago
https://www.bbc.com/news/world-asia-55313161
============================================================
title: New Uighur evidence 'game-changer' for fashion brands
title: Australia deluge sparks flood evacuation warnings
text: Some New South Wales residents are told to leave their homes, after storms batter two states.
time: 1han hour ago
https://www.bbc.com/news/world-australia-55311980
============================================================
title: Geminid meteor shower dazzles night skies
text: Some of the best views of the annual meteor shower were above Lijiang city in southwest China.
time: Video 56 seconds0:56
https://www.bbc.com/news/world-55301460
============================================================
title: Craig McLachlan not guilty in indecent assault case
text: The former Neighbours actor has consistently denied allegations raised by four women in Australia.
time: 5h5 hours ago
https://www.bbc.com/news/world-australia-55312358
============================================================
title: Desperation for 'unproven' Covid-19 treatment
text: Plasma therapy is allowed in many countries despite unanswered questions over its efficacy.
time: 7h7 hours ago
https://www.bbc.com/news/world-asia-india-55257669
============================================================
title: The day US began Covid vaccinations
text: A massive vaccination effort has kicked off with healthcare workers first to get a vaccine.
time: Video 1 minute 36 seconds1:36
https://www.bbc.com/news/world-us-canada-55312180
============================================================
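Values such as `2h2 hours ago` in the output look like two pieces of text inside the `time` tag joined with nothing between them. Assuming that is what happens (the markup below is a guess written for illustration, not copied from the BBC page), `get_text` accepts a separator, and `stripped_strings` yields each piece separately:

```python
from bs4 import BeautifulSoup

# Assumed shape of a <time> tag holding a short label and a longer label
SAMPLE_HTML = '<time><span>2h</span><span>2 hours ago</span></time>'

time_tag = BeautifulSoup(SAMPLE_HTML, 'html.parser').find('time')

joined = time_tag.get_text()                           # pieces run together
spaced = time_tag.get_text(separator=' ', strip=True)  # pieces separated by a space
pieces = list(time_tag.stripped_strings)               # each piece on its own

print(joined)
print(spaced)
print(pieces[-1])
```

If the real markup differs, the same two calls still apply; only the choice of which piece to keep would change.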