Extracting multiple tag types at the same level of a web page with a crawler
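import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Collects text from h1, h2, blockquote, and p elements in document order,
// stopping once the description reaches 400 characters.
// Assumes a `logger` field is defined in the enclosing class.
public static String getDesc(Document doc) {
    StringBuilder desc = new StringBuilder();
    for (Element element : doc.select("h1,h2,blockquote,p")) {
        desc.append(element.text()).append(' ');
        if (desc.length() >= 400) {
            logger.info("\tContent reached 400 characters; stopping extraction.");
            logger.info(desc.toString());
            break;
        }
    }
    return desc.toString();
}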
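One detail of the loop above: the length check runs only after a whole element's text has been appended, so the returned string can be considerably longer than 400 characters when the last element is a long paragraph. If a hard cap is wanted, the accumulation step could be sketched as below. `DescLimiter` and `joinUpTo` are hypothetical names introduced here for illustration, not part of the original post, and the sketch uses only the Java standard library. It assumes a non-negative `limit` and counts UTF-16 code units, the same way `String.length()` does.

```java
import java.util.List;

// Hypothetical helper (not from the original post): joins text fragments
// with single spaces and cuts the result off at `limit` characters.
final class DescLimiter {
    static String joinUpTo(List<String> parts, int limit) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(' ');
            if (sb.length() >= limit) {
                // Discard everything past the limit, then stop reading fragments.
                sb.setLength(limit);
                break;
            }
        }
        // Remove the trailing separator (or a space left at the cut point).
        return sb.toString().trim();
    }
}
```

With jsoup, the fragment list could probably come from `doc.select("h1,h2,blockquote,p").eachText()`, which would keep the selector from the original method and replace only the concatenation loop.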