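Quickly Building a Novel Search Site: 2. Content Page Parsing
Third-party libraries
- jsoup
- OkHttp
Elements to parse
- Chapter navigation: previous chapter
- Chapter navigation: next chapter
- Table of contents
- Content
Table design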
/** Content */
private String content;

@Field("content_title")
private String contentTitle;

@Field("chapter_url")
private String chapterUrl;

@Field("next_chapter_url")
private String nextChapterUrl;

@Field("last_chapter_url")
private String lastChapterUrl;
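The parsing code in the next section reads its CSS selectors from a per-site `BookSite` object returned by `getSite(url)`. The post shows neither of these. Here is a minimal sketch of one way such a lookup could work, assuming the rules are keyed by host name. `SiteRegistry`, `SiteSelectors`, the example host, and every selector string are hypothetical placeholders, not the author's code.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;

public class SiteRegistry {

    /** Hypothetical rule set: one CSS selector per element to parse. */
    public static final class SiteSelectors {
        public final String contentTitle;
        public final String chapterUrl;
        public final String nextChapterUrl;
        public final String lastChapterUrl;
        public final String content;

        public SiteSelectors(String contentTitle, String chapterUrl,
                             String nextChapterUrl, String lastChapterUrl,
                             String content) {
            this.contentTitle = contentTitle;
            this.chapterUrl = chapterUrl;
            this.nextChapterUrl = nextChapterUrl;
            this.lastChapterUrl = lastChapterUrl;
            this.content = content;
        }
    }

    // Placeholder rules; in practice these would probably be loaded from storage.
    private static final Map<String, SiteSelectors> RULES = Map.of(
            "www.example.com",
            new SiteSelectors("h1", "a.catalog", "a.next", "a.prev", "#content"));

    /** Looks up the rule set by the host part of the chapter URL. */
    public static Optional<SiteSelectors> getSite(String url) {
        String host = URI.create(url).getHost();
        // Map.of rejects null keys, so guard against URLs without a host.
        return host == null ? Optional.empty() : Optional.ofNullable(RULES.get(host));
    }

    public static void main(String[] args) {
        getSite("https://www.example.com/book/1/2.html")
                .ifPresent(s -> System.out.println("content selector: " + s.content));
        System.out.println(getSite("https://unknown.example.org/x.html").isPresent());
    }
}
```

Keeping the selectors as data rather than code means that supporting a new site, or fixing a broken one, only requires a rule change.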
Parsing code
public BookChapter content(String url) {
    BookChapter bookChapter = new BookChapter();
    // Per-site parsing rules; each getter returns a CSS selector.
    BookSite bookSite = getSite(url);
    try {
        Document document = download(url);

        // Chapter title
        Element titleElement = document.selectFirst(bookSite.getContentTitle());
        if (titleElement != null) {
            bookChapter.setName(titleElement.text());
        }

        // Link back to the table of contents
        Element chapterElement = document.selectFirst(bookSite.getChapterUrl());
        if (chapterElement != null) {
            bookChapter.setChapterUrl(chapterElement.absUrl("href"));
        }

        // Next chapter
        Element nextElement = document.selectFirst(bookSite.getNextChapterUrl());
        if (nextElement != null) {
            bookChapter.setNextChapterUrl(nextElement.absUrl("href"));
        }

        // Previous chapter ("last" here means the preceding one)
        Element lastElement = document.selectFirst(bookSite.getLastChapterUrl());
        if (lastElement != null) {
            bookChapter.setLastChapterUrl(lastElement.absUrl("href"));
        }

        // Chapter body: strip links, scripts and styles before storing the HTML
        Element contentElement = document.selectFirst(bookSite.getContent());
        if (contentElement != null) {
            contentElement.select("a").remove();
            contentElement.select("script").remove();
            contentElement.select("style").remove();
            bookChapter.setContent(contentElement.html());
        }
    } catch (IOException e) {
        // On a download failure the partially filled chapter is still returned.
        log.error(e.getMessage(), e);
    }
    return bookChapter;
}
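The navigation links are read with `absUrl("href")`, which returns an absolute URL even when the page uses a relative `href`. jsoup resolves the attribute against the document's base URI and returns an empty string when it cannot build a URL. If `download` (not shown in the post) parses an OkHttp response body itself, the page URL would therefore need to be passed to jsoup as the base URI. The sketch below illustrates the same kind of resolution using only `java.net.URI` from the JDK. It shows the idea and is not jsoup's implementation; `LinkResolver` and the URLs are made up.

```java
import java.net.URI;

public class LinkResolver {

    /** Resolves an href found on a page against that page's own URL. */
    public static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "https://www.example.com/book/12/345.html";

        // Relative to the current directory: a sibling chapter.
        System.out.println(resolve(page, "346.html"));

        // Relative to the site root: the table of contents.
        System.out.println(resolve(page, "/book/12/"));

        // Already absolute: returned unchanged.
        System.out.println(resolve(page, "https://m.example.com/book/12/347.html"));
    }
}
```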
Final result
Challenges
There is no real technical difficulty here; the hard part is routine maintenance.
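The post does not say what that maintenance involves. Since every parsed element depends on a per-site selector, one plausible chore is noticing when a selector stops matching after a site changes its pages. The sketch below, offered under that assumption, reports which fields of a parsed chapter came back empty so they can be logged or alerted on. `ChapterCheck` and the field map are hypothetical; a real check would read the getters of `BookChapter`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChapterCheck {

    /** Returns the names of the fields whose value is null or blank. */
    public static List<String> missingFields(Map<String, String> parsed) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> entry : parsed.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.trim().isEmpty()) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order and allows null values.
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put("contentTitle", "Chapter 1");
        parsed.put("nextChapterUrl", "");  // absUrl could not build a URL
        parsed.put("content", null);       // selector matched nothing
        System.out.println(missingFields(parsed)); // prints [nextChapterUrl, content]
    }
}
```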