Requirement:
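At work I need to compute pollution emission data for listed companies. The first step is to preprocess the data for text analysis with the unescape method. The data is then converted via the with parameter and processed, and finally a classification analysis method is used to calculate each item individually and store the results by category for later in-depth data mining.
The HTML fragments to be cleaned look like the ones below and are processed in a loop:
from bs4 import BeautifulSoup

html1 = """<h1>My First &amp; Heading</h1>
<p>My first paragraph.</p>
"""
html2 = """<h1>My Second Heading</h1>
<p>My second paragraph.</p>
"""
html_list = [html1, html2]
for html in html_list:
    # Parse each fragment and print only its text content.
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.get_text()
    print(text)
    print('-----')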
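The classification and categorized-storage step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the keyword table, the category names, the helper names classify and group_by_category, and the sample sentences are all hypothetical and do not come from the actual emission data.

```python
from collections import defaultdict

# Hypothetical keyword table (an assumption for illustration only):
# category name -> keywords that signal that category.
CATEGORY_KEYWORDS = {
    "air": ("SO2", "NOx"),
    "water": ("COD", "ammonia nitrogen"),
}

def classify(text: str) -> str:
    """Return the first category whose keyword occurs in text, else 'other'.

    Matching is a naive case-insensitive substring test, so it can
    over-match on real data; a real pipeline would need stricter rules.
    """
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword.lower() in lowered for keyword in keywords):
            return category
    return "other"

def group_by_category(texts):
    """Group cleaned text records into a dict keyed by category."""
    groups = defaultdict(list)
    for text in texts:
        groups[classify(text)].append(text)
    return dict(groups)

# Made-up sample records, standing in for cleaned text.
samples = ["SO2 emissions rose", "COD discharge fell", "Annual report"]
for category, items in group_by_category(samples).items():
    print(category, items)
```

Keeping the result as a plain dict of lists makes it easy to write each category to its own file or table later.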
Solution:
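from bs4 import BeautifulSoup
from html import unescape

html = """<h1>My First &amp; Heading</h1>
<p>My first paragraph.</p>
"""
# Parse the fragment and keep only its text content.
soup = BeautifulSoup(html, 'html.parser')
text = soup.get_text()
# Decode any entities that are still escaped in the extracted text.
text = unescape(text)
print(text)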
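One detail worth checking when combining the two calls: html.unescape decodes one level of entities per call, and BeautifulSoup's get_text() already returns decoded text for singly escaped input. The extra unescape therefore mainly matters when the source was escaped more than once. The sketch below uses only the standard library. The helper name unescape_fully, its max_rounds parameter, and the sample strings are illustrative assumptions.

```python
from html import unescape

def unescape_fully(text: str, max_rounds: int = 5) -> str:
    """Apply html.unescape repeatedly until the text stops changing.

    Hypothetical helper: max_rounds is an arbitrary safety cap.
    """
    for _ in range(max_rounds):
        decoded = unescape(text)
        if decoded == text:
            break
        text = decoded
    return text

# Singly escaped: one call is enough.
print(unescape("My First &amp; Heading"))            # My First & Heading
# Doubly escaped: one call leaves an entity behind.
print(unescape("My First &amp;amp; Heading"))        # My First &amp; Heading
# Repeating the call removes the remaining level.
print(unescape_fully("My First &amp;amp; Heading"))  # My First & Heading
```

Repeated decoding is a trade-off: it would also decode text that was meant to show a literal entity such as &amp;lt;, so it may be safer to apply it only to sources known to be double-escaped.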
Data source: pollution emission data of listed companies