Web scraping: Douban movies, prices, and book titles
2023-12-26 16:35:07
1. Scraping the Douban Top 250 movies
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with every request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}

# Each page lists 25 movies; start is the zero-based offset of the first one
for i in range(0, 250, 25):
    print(f"--------Movies {i+1} to {i+25}------------")
    response = requests.get(f"https://movie.douban.com/top250?start={i}", headers=headers)
    if response.ok:
        html = response.text
        soup = BeautifulSoup(html, "html.parser")
        all_titles = soup.find_all("span", attrs={"class": "title"})
        j = i
        for title in all_titles:
            title_string = title.string
            # Skip the alternate-title spans, which contain "/"
            if "/" not in title_string:
                j += 1
                print(f"{j}. {title_string}")
    else:
        print("Request failed")
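The loop above pages through the list by stepping the start offset in increments of 25. As a minimal sketch of that pagination on its own, the helper below builds the ten page URLs without sending any request. The names BASE_URL and page_urls are hypothetical and introduced only for illustration.

```python
BASE_URL = "https://movie.douban.com/top250"


def page_urls(total=250, page_size=25):
    """Return one URL per result page, using start as the zero-based offset."""
    # page_urls is a hypothetical helper, not part of requests or the site
    return [f"{BASE_URL}?start={offset}" for offset in range(0, total, page_size)]


urls = page_urls()
print(len(urls))   # number of pages
print(urls[0])     # first page, offset 0
print(urls[-1])    # last page, offset 225
```

Keeping the URL construction separate from the request makes it easy to check the offsets before scraping anything.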
2. Scraping prices
import requests
from bs4 import BeautifulSoup

content = requests.get("http://books.toscrape.com/").text
soup = BeautifulSoup(content, "html.parser")
# Prices sit in <p> tags with class="price_color", so search for p with that class
all_prices = soup.find_all("p", attrs={"class": "price_color"})
print(all_prices)
for price in all_prices:
    # Drop the first two characters (the currency prefix as it appears in the decoded text)
    print(price.string[2:])
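Slicing with [2:] depends on exactly how many characters precede the digits, which can vary with how the response text was decoded. A minimal sketch of a more tolerant alternative is to pull out the numeric part with a regular expression. The function name parse_price is hypothetical, and the sample strings are assumed inputs rather than captured output.

```python
import re


def parse_price(text):
    """Extract the first decimal number from a price string and return it as a float."""
    # parse_price is a hypothetical helper introduced for illustration
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


# Assumed sample inputs: a cleanly decoded price and a mis-decoded one
print(parse_price("£51.77"))
print(parse_price("Â£51.77"))
```

In the loop above, this would be called as parse_price(price.string) in place of the slice.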
3. Scraping book titles
import requests
from bs4 import BeautifulSoup

content = requests.get("http://books.toscrape.com/").text
soup = BeautifulSoup(content, "html.parser")
# Each title is in an <a> nested inside an <h3>, so find the h3 first, then the a
all_titles = soup.find_all("h3")
for title in all_titles:
    all_links = title.find_all("a")
    for link in all_links:
        print(link.string)
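The visible link text may be shortened on the page (for example "A Light in the ..."), while the link's title attribute carries the full name. The sketch below reads that attribute instead, falling back to the link text when it is absent. It runs against an inline sample rather than the live site; SAMPLE_HTML is a simplified assumption modelled on the page's markup, and full_titles is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Simplified sample markup, assumed to resemble the listing page
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="catalogue/a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
</article>
<article class="product_pod">
  <h3><a href="catalogue/tipping-the-velvet_999/index.html" title="Tipping the Velvet">Tipping the Velvet</a></h3>
</article>
"""


def full_titles(html):
    """Return the full title of every book link found inside an h3."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for h3 in soup.find_all("h3"):
        link = h3.find("a")
        if link is None:
            continue
        # Prefer the title attribute; fall back to the visible text
        titles.append(link.get("title") or link.get_text(strip=True))
    return titles


print(full_titles(SAMPLE_HTML))
```

To use it on the real page, pass the downloaded content string to full_titles instead of SAMPLE_HTML.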
Source: https://blog.csdn.net/Ling_Ze/article/details/135224823