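I need to scrape all the terms and conditions from a website, but I only get the <br> tags. I tried BeautifulSoup's .findAll() method, but it only prints <br>. I can also get all of the terms, just not in the form I want. Is there a way to scrape only the text between the <br> tags?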
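A quick way to see what is going on: in the parse tree, <br> is an empty (void) element, so the terms are not inside the <br> tags but in the text nodes next to them. The sketch below parses a small hypothetical snippet (the markup is only a stand-in for the real page) and shows that each <br> has no text of its own, while its `.next_sibling` holds the term that follows it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure may differ.
html = '<div class="popBodyText">\u2022 Term A<br>\u2022 Term B<br>\u2022 Term C</div>'
soup = BeautifulSoup(html, "html.parser")

brs = soup.find_all("br")  # find_all() is the current name for findAll()
print(brs)  # only the empty tags themselves

# The <br> tags contain no text ...
br_texts = [br.get_text() for br in brs]
# ... the terms live in the sibling text nodes between them.
after_each_br = [str(br.next_sibling) for br in brs]
print(br_texts)
print(after_each_br)
```

Walking siblings this way misses the text before the first <br>, which is one reason the solutions below read the container's strings instead.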
2. Solution
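A <br> tag is an empty element, so findAll('br') only ever returns the tags themselves; the terms you want are the text nodes between them. One way to get that text is the .get_text() method: it collects every text node and joins them with the separator you pass, so with '\n' and strip=True each <br>-separated piece ends up on its own line with surrounding whitespace trimmed: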
terms_elements = terms_and_conditions_soup.select(".popBodyText")[0]
terms = terms_elements.get_text('\n', strip=True)
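Here is a minimal offline sketch of that call on a hypothetical <br>-separated snippet (the markup is invented for illustration). It also shows why the separator argument matters: with the default empty separator, neighbouring terms run together.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real terms block.
html = ('<div class="popBodyText">\u2022 Offer valid.<br>'
        '\u2022 Limited rooms.<br> Void where prohibited. </div>')
soup = BeautifulSoup(html, "html.parser")
terms_elements = soup.select(".popBodyText")[0]

# Default separator "": text nodes are concatenated with nothing between them.
glued = terms_elements.get_text()

# "\n" separator + strip=True: each text node is trimmed, empty ones are dropped,
# and the rest are joined with newlines, giving one term per line.
terms = terms_elements.get_text("\n", strip=True)

print(repr(glued))
print(terms)
print(terms.splitlines())
```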
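Alternatively, you can iterate over the .strings or .stripped_strings generator (the latter trims whitespace and skips whitespace-only strings) to get a list: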
terms = list(terms_elements.stripped_strings)
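The difference between the two generators is easiest to see on indented markup. In this sketch (hypothetical HTML again), `.strings` yields every text node as-is, including whitespace-only ones, while `.stripped_strings` trims each node and skips the empty results, which is usually what you want for a list of terms.

```python
from bs4 import BeautifulSoup

# Hypothetical, indented markup with <br>-separated terms.
html = "<div>\n  First term<br>\n  Second term<br>\n</div>"
div = BeautifulSoup(html, "html.parser").div

raw = list(div.strings)             # text nodes exactly as parsed, whitespace included
clean = list(div.stripped_strings)  # each node stripped; whitespace-only nodes skipped

print(raw)
print(clean)
```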
If you only want the lines that start with a bullet point (•), filter them like this:
terms = [t.lstrip('\u2022 ') for t in terms_elements.stripped_strings
if t.startswith('\u2022')]
This strips the leading bullet (and any spaces after it) and keeps only the lines that started with a bullet.
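One detail worth knowing about that filter: `str.lstrip()` treats its argument as a set of characters, not as a prefix, so `lstrip('\u2022 ')` removes any run of leading bullets and spaces. The sketch below runs the same comprehension on hypothetical strings (as they might come out of `.stripped_strings`) and contrasts it with `str.removeprefix()` (Python 3.9+), which removes one exact prefix and leaves anything else alone.

```python
BULLET = "\u2022"

# Hypothetical strings, as they might come out of .stripped_strings.
strings = [
    "Terms and Conditions",
    "\u2022 Offer valid at W Guangzhou only.",
    "\u2022Limited number of rooms available.",  # no space after the bullet
]

# lstrip() treats its argument as a set of characters, so it removes
# every leading bullet and space, whether or not a space follows the bullet.
with_lstrip = [t.lstrip(BULLET + " ") for t in strings if t.startswith(BULLET)]

# removeprefix() only removes the exact string "\u2022 ", at most once.
with_removeprefix = [t.removeprefix(BULLET + " ") for t in strings if t.startswith(BULLET)]

print(with_lstrip)
print(with_removeprefix)
```

`lstrip` is the more forgiving choice when the page is inconsistent about the space after the bullet; `removeprefix` is stricter and leaves the second line untouched here.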
Code example
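import requests
from bs4 import BeautifulSoup
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(requests.get('http://deals.whotels.com/W-Guangzhou-3126/tnc/1680/24900/en').content, 'html.parser')
terms_elements = soup.find(class_='popBodyText')
# Three alternatives follow; each reassigns terms, so keep the one you need
# 1. All the text as one string, one <br>-separated piece per line
terms = terms_elements.get_text('\n', strip=True)
# 2. A list built from the .stripped_strings generator
terms = list(terms_elements.stripped_strings)
# 3. Only the lines that start with a bullet, with the bullet removed
terms = [t.lstrip('\u2022 ') for t in terms_elements.stripped_strings
         if t.startswith('\u2022')]
print(terms)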
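The URL above points at a 2014 promotion page that may no longer be reachable, so here is a sketch of the same pipeline run against an inline fixture instead. The `SAMPLE_HTML` markup and the `extract_bullet_terms` helper are hypothetical, written for illustration; the helper also guards against `find()` returning `None` when the class is missing, which would otherwise raise an `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical fixture standing in for the live page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <div class="popBodyText">
    Terms and Conditions<br>
    \u2022 Offer valid at W Guangzhou only.<br>
    \u2022 Limited number of rooms available.<br>
  </div>
</body></html>
"""


def extract_bullet_terms(html, css_class="popBodyText", bullet="\u2022"):
    """Hypothetical helper: bullet lines inside the first element with css_class."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(class_=css_class)
    if container is None:
        # find() returns None when nothing matches; return an empty list instead of crashing.
        return []
    return [t.lstrip(bullet + " ") for t in container.stripped_strings
            if t.startswith(bullet)]


terms = extract_bullet_terms(SAMPLE_HTML)
print(terms)
```

To use it on the live page, pass `requests.get(url).content` in place of `SAMPLE_HTML`.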
Output:
['Offer valid at W Guangzhou only.', 'Offer is valid for stays booked by December 30, 2014 and stays completed from December 30, 2014 to January 1, 2015.', 'Limited number of rooms available.', 'Minimum stay of 2 nights is required & must stay over December 31, 2014.', '15% service charge and tax is not included in the package and subject to change without any notice.', 'Breakfast to be consumed at the Kitchen Table restaurant on departure day. Guest will be eligible for the breakfast based on number of persons booked overnight. Additional persons will be charged at the restaurant according to retail price.', 'NYE dinner buffet to be consumed at The Kitchen Table on December 31, 2014 only. Two guests per room will be eligible for the dinner buffet, and additional guests will be charged at the restaurant according to retail price. Prior reservations for the additional guests are required.', 'Free access to the FEI NYEcountdown party on December 31, 2014 is limited to a maximum of 2 adults only per room. Guests under 18 years old will not be allowed. The tickets will not be sold to general public or any external guests. Please collect the passes at time of check in.', 'Alcoholic beverage service is restricted to those 18 years or older (with valid identification).', 'Massage treatment in the package is limited to 60min AWAY Spa Signature Massage only. Spa treatment cannot be cumulated & valid during stay only. Prior reservation is recommended for the Spa treatment. This is to ensure space availability and the hotel will not be held responsible for any unconsumed portion of the package.', 'All package components are not transferable and must be consumed during stay. If any portion is not consumed, they will not be refundable or exchangeable in cash.', 'Extra services & amenities not part of the package will be charged per consumption & will be on guest’s own expense.', 'All package amenities are per room/per night and will be presented upon arrival unless otherwise noted.', 'This offer is only available if booked via Starwood distribution channels. Offer will not be applicable if booked through third party distribution channels, travel agents or any other external websites.', 'Offer not applicable to groups nor is it combinable with other special/discounted rates.', 'Starwood Hotels & Resorts Worldwide, Inc. reserves the right to cancel this promotion at anytime without notice.', 'Not responsible for omissions or typographical errors. Void where prohibited by law. Not to be combined with offers or promotions.', 'Any unused portion/s of the package is not transferable or exchangeable for cash/credit.', 'Starpoints, SPG, Starwood Preferred Guest, Sheraton, Four Points, W, Aloft, Le Meridien, The Luxury Collection, Element, Westin, St. Regis and their respective logos are the trademarks of Starwood Hotels & Resorts Worldwide, Inc., or its affiliates.']