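I want to extract the subtitle ssid from a Crunchyroll web page, that is, the number at the end of a specific link. For example, given this link:
http://www.crunchyroll.com/i-cant-understand-what-my-husband-is-saying/episode-1-wriggling-memories-678035?ssid=154757
I want to extract "154757", but my current Python script does not work.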
import feedparser
import re
import urllib2
from urllib2 import urlopen
from bs4 import BeautifulSoup
feed = feedparser.parse('http://www.crunchyroll.com/rss/anime')
url1 = feed['entries'][0]['link']
soup = BeautifulSoup(urlopen(url1), 'html.parser')
- Solution
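The script as posted stops after parsing the page and never searches it for the subtitle links, so nothing is extracted. The urllib2 and re modules are also unnecessary here: feedparser, requests, and BeautifulSoup are enough. The modified code below finds the subtitle links that carry the ssid: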
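import feedparser
import requests
from bs4 import BeautifulSoup

d = feedparser.parse('http://www.crunchyroll.com/rss/anime')
for entry in d.entries:
    # Fetch the episode page that the feed entry links to
    r = requests.get(entry.link)
    # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
    soup = BeautifulSoup(r.text, 'html.parser')
    # Find the elements that contain subtitle information
    subtitles = soup.find_all('span', {'class': 'showmedia-subtitle-text'})
    # Print each subtitle link; its href ends in ?ssid=<number>
    for span in subtitles:
        for a in span.find_all('a'):
            print(a['href'])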
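This code parses the Crunchyroll RSS feed and does the following for each entry:
- Fetches the HTML of the linked page with the requests library.
- Parses the HTML with BeautifulSoup.
- Finds the elements that contain subtitle information.
- Prints the href of each subtitle link, which ends in the ssid.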
The output looks like this:
/i-cant-understand-what-my-husband-is-saying/episode-12-baby-skip-beat-678057?ssid=166035
/i-cant-understand-what-my-husband-is-saying/episode-12-baby-skip-beat-678057?ssid=165817
/i-cant-understand-what-my-husband-is-saying/episode-12-baby-skip-beat-678057?ssid=165819
/i-cant-understand-what-my-husband-is-saying/episode-12-baby-skip-beat-678057?ssid=166783
/i-cant-understand-what-my-husband-is-saying/episode-12-baby-skip-beat-678057?ssid=165839
/i-cant-understand-what-my-husband-is-saying/episode-12-baby-skip-beat-678057?ssid=165989
/i-cant-understand-what-my-husband-is-saying/episode-12-baby-skip-beat-678057?ssid=166051
/urawa-no-usagi-chan/episode-11-if-i-retort-i-lose-678873?ssid=166011
/urawa-no-usagi-chan/episode-11-if-i-retort-i-lose-678873?ssid=165995
/urawa-no-usagi-chan/episode-11-if-i-retort-i-lose-678873?ssid=165997
/urawa-no-usagi-chan/episode-11-if-i-retort-i-lose-678873?ssid=166033
/urawa-no-usagi-chan/episode-11-if-i-retort-i-lose-678873?ssid=165825
/urawa-no-usagi-chan/episode-11-if-i-retort-i-lose-678873?ssid=166013
/urawa-no-usagi-chan/episode-11-if-i-retort-i-lose-678873?ssid=166009
/urawa-no-usagi-chan/episode-11-if-i-retort-i-lose-678873?ssid=166003
/etotama/episode-11-catrat-shuffle-678659?ssid=166007
/etotama/episode-11-catrat-shuffle-678659?ssid=165969
/etotama/episode-11-catrat-shuffle-678659?ssid=166489
/etotama/episode-11-catrat-shuffle-678659?ssid=166023
/etotama/episode-11-catrat-shuffle-678659?ssid=166015
/etotama/episode-11-catrat-shuffle-678659?ssid=166049
/etotama/episode-11-catrat-shuffle-678659?ssid=165993
/etotama/episode-11-catrat-shuffle-678659?ssid=165981
You can process these results further as needed, for example by extracting each ssid and storing it in a list.
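As a minimal sketch of that last step, the ssid can be read from each href with the standard library's URL parser instead of a regular expression. This is written for Python 3 (urllib.parse). The helper name extract_ssid and the sample hrefs list are illustrative assumptions, not part of the original code. In practice you would append a['href'] to a list inside the loop and pass that list in.

```python
from urllib.parse import urlparse, parse_qs


def extract_ssid(href):
    """Return the ssid query parameter of href as a string, or None if absent.

    extract_ssid is a hypothetical helper introduced for this example.
    """
    query = parse_qs(urlparse(href).query)
    values = query.get("ssid")
    return values[0] if values else None


# Sample hrefs in the same shape as the output above; the last one has no
# ssid, to show that such links are skipped.
hrefs = [
    "/i-cant-understand-what-my-husband-is-saying/episode-12-baby-skip-beat-678057?ssid=166035",
    "/etotama/episode-11-catrat-shuffle-678659?ssid=166007",
    "/etotama/episode-11-catrat-shuffle-678659",
]

# Keep only the links that actually carry an ssid.
ssids = [s for s in (extract_ssid(h) for h in hrefs) if s is not None]
print(ssids)
```

Parsing the query string rather than matching digits at the end of the URL keeps the result correct if other parameters are ever appended after ssid. The values come back as strings. Wrap them in int() if you need numbers.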