Dangdang book category HTML: a simple demo that scrapes Dangdang book page data with HttpClient and jsoup

Date: 2021-03-14 02:38:13

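import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {
    /**
     * Scrapes selected fields from one page of a Dangdang book category,
     * prints them to the console, and saves them to a file.
     */
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the HTTP client and the output file, even on failure.
        // The backslash in the Windows path must be escaped; "\b" would be a backspace.
        try (CloseableHttpClient httpclient = HttpClients.createDefault();
             BufferedWriter writer = new BufferedWriter(new FileWriter("D:\\book.csv"))) {
            // Request URL: the address of one Dangdang book category page.
            // The host is elided in the source; HttpClient needs an absolute URL,
            // so prepend the scheme and host before running.
            HttpGet httpget = new HttpGet("/cp01.36.04.08.00.00.html");
            System.out.println("Executing request " + httpget.getRequestLine());
            // Custom response handler: return the body for 2xx responses, fail otherwise.
            ResponseHandler<String> responseHandler = response -> {
                int status = response.getStatusLine().getStatusCode();
                if (status >= 200 && status < 300) {
                    HttpEntity entity = response.getEntity();
                    return entity != null ? EntityUtils.toString(entity) : null;
                } else {
                    throw new ClientProtocolException("Unexpected response status: " + status);
                }
            };
            // The response body, i.e. the HTML source of the page.
            String responseBody = httpclient.execute(httpget, responseHandler);
            System.out.println("----------------------------------------");
            // Parse the source with jsoup into a Document that represents the page.
            Document document = Jsoup.parse(responseBody);
            // The class and attribute names below come from inspecting the page source by hand.
            // Each get(0) throws IndexOutOfBoundsException if the element is missing.
            Element pos = document.getElementsByClass("bigimg").get(0);
            Elements list = pos.children();
            for (Element e : list) {
                Element name = e.getElementsByClass("pic").get(0);
                Element detail = e.getElementsByClass("detail").get(0);
                Element author = e.getElementsByAttributeValue("name", "itemlist-author").get(0);
                Element press = e.getElementsByAttributeValue("name", "P_cbs").get(0);
                Element market = e.getElementsByClass("search_pre_price").get(0);
                Element sale = e.getElementsByClass("search_now_price").get(0);
                System.out.println("Title: " + name.attr("title"));
                System.out.println("Description: " + detail.text());
                System.out.println("Author: " + author.text());
                System.out.println("Publisher: " + press.text());
                System.out.println("List price: " + market.text());
                System.out.println("Sale price: " + sale.text());
                System.out.println("--------------------");
                // Write the title, description, author, and publisher as one comma-separated line.
                writer.write(name.attr("title") + "," + detail.text() + "," + author.text() + "," + press.text());
                writer.newLine();
            }
        }
    }
}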
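
One caveat with the comma-joined write in the demo: a title or description that itself contains a comma or a double quote will shift the columns in book.csv. The sketch below shows one possible way to quote fields in the RFC 4180 style before writing them. The class `CsvFields` and its methods `csvField` and `csvLine` are hypothetical names introduced here for illustration; they are not part of the original demo or of any library.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of the original demo.
final class CsvFields {

    // Quote a field when it contains a comma, a double quote, or a line break.
    // Embedded double quotes are doubled, as RFC 4180 describes.
    static String csvField(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuoting = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuoting) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Join already-extracted field values into one CSV line.
    static String csvLine(String... fields) {
        StringJoiner joiner = new StringJoiner(",");
        for (String field : fields) {
            joiner.add(csvField(field));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Prints: "Java, 2nd ed.","A ""simple"" demo",Author,Press
        System.out.println(csvLine("Java, 2nd ed.", "A \"simple\" demo", "Author", "Press"));
    }
}
```

Under these assumptions, the write inside the loop could become `writer.write(CsvFields.csvLine(name.attr("title"), detail.text(), author.text(), press.text()));`, leaving the rest of the demo unchanged.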
