Obtaining IP Addresses and Domain Names
```java
package cn.hanquan.test;

import java.net.InetAddress;
import java.net.UnknownHostException;

/** IP addresses */
public class IPtest {
	public static void main(String[] args) throws UnknownHostException {
		// The local host
		InetAddress addr = InetAddress.getLocalHost();
		System.out.println(addr.getHostAddress()); // prints the IP address
		System.out.println(addr.getHostName()); // prints the computer name
		// Resolve an address from a domain name
		addr = InetAddress.getByName("");
		System.out.println(addr.getHostAddress());
		System.out.println(addr.getHostName());
	}
}
```
Ports
Range: 0–65535 (2 bytes, 16 bits). Ports are used to distinguish applications on the same machine.
Under the same protocol, ports must not conflict. Under different protocols the same port number is possible, but it is still discouraged, since conflicts make the services hard to tell apart.
For example, HTTP uses port 80 by default, 8080 is Tomcat's default, 1521 is Oracle's, and 3306 is MySQL's.
There is no need to worry about running out of ports, since no machine runs that many networked programs at once (TCP and UDP each have their own 0–65535 range, giving 2 × 65535 ports in total).
Run netstat -ano to list the ports currently in use.
Using InetSocketAddress
```java
package cn.hanquan.test;

import java.net.InetSocketAddress;

/** Ports */
public class IPtest {
	public static void main(String[] args) {
		// Constructor style 1
		InetSocketAddress addr1 = new InetSocketAddress("127.0.0.1", 8080);
		System.out.println(addr1.getHostName());
		System.out.println(addr1.getAddress());
		System.out.println(addr1.getPort());
		// Constructor style 2
		InetSocketAddress addr2 = new InetSocketAddress("", 9000);
		System.out.println(addr2.getHostName());
		System.out.println(addr2.getAddress());
		System.out.println(addr2.getPort());
	}
}
```
URL
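The code and output for this section did not survive extraction. As a minimal sketch of what java.net.URL offers, the accessor methods below split a URL into its components (the example URL itself is made up for illustration):

```java
import java.net.URL;

public class URLTest {
	public static void main(String[] args) throws Exception {
		// A URL is parsed once at construction; the getters just
		// return the already-parsed components.
		URL url = new URL("http://www.example.com:8080/docs/index.html?id=1#top");
		System.out.println(url.getProtocol()); // http
		System.out.println(url.getHost());     // www.example.com
		System.out.println(url.getPort());     // 8080
		System.out.println(url.getPath());     // /docs/index.html
		System.out.println(url.getQuery());    // id=1
		System.out.println(url.getRef());      // top
	}
}
```

Note that getPort() returns -1 when the URL does not specify a port explicitly; use getDefaultPort() to get the protocol's default (80 for HTTP).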
Web Crawlers
A Simple Crawler
```java
package cn.hanquan.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

/** How a web crawler works */
public class Spidertest {
	public static void main(String[] args) throws IOException {
		// Build the URL
		URL url = new URL("");
		// Download the resource
		InputStream is = url.openStream();
		BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"));
		String line = null;
		while (null != (line = br.readLine())) {
			System.out.println(line);
		}
		br.close();
		// TODO: parse and process...
	}
}
```
Some sites refuse requests made this way and throw an exception:
```
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL:
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1913)
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1509)
	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:245)
	at java.base/java.net.URL.openStream(URL.java:1117)
	at cn.hanquan.test.Spidertest.main(Spidertest.java:18)
```
To crawl such pages, we can pretend to be a browser. In the browser's developer tools, the request headers show something like User-Agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Mobile Safari/537.36
The following code sends that User-Agent header and can fetch the page normally.
```java
package cn.hanquan.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

/** How a web crawler works, mimicking a browser */
public class Spidertest {
	public static void main(String[] args) throws IOException {
		// Build the URL
		URL url = new URL("");
		// Mimic a browser; this requires working at the HTTP level
		HttpURLConnection conn = (HttpURLConnection) url.openConnection();
		conn.setRequestMethod("GET");
		conn.setRequestProperty("User-Agent",
				"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Mobile Safari/537.36");
		// Download the resource
		BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
		String line = null;
		while (null != (line = br.readLine())) {
			System.out.println(line);
		}
		br.close();
		// TODO: parse and process...
	}
}
```
TCP and UDP Protocols
UDP Programming
DatagramPacket
DatagramSocket
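DatagramSocket is the endpoint that sends and receives, while DatagramPacket wraps the payload bytes plus the destination (or source) address. A minimal loopback sketch, assuming port 8999 is free (the port and message are arbitrary choices, not from the original notes):

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;

public class UDPTest {
	public static void main(String[] args) throws Exception {
		// Receiver bound to a fixed local port
		DatagramSocket receiver = new DatagramSocket(8999);

		// Sender on an ephemeral (OS-assigned) port
		DatagramSocket sender = new DatagramSocket();
		byte[] data = "hello udp".getBytes("UTF-8");
		// The packet carries both the payload and the destination address
		DatagramPacket out = new DatagramPacket(data, data.length,
				new InetSocketAddress("127.0.0.1", 8999));
		sender.send(out);

		// Receive into a buffer; getLength() is the actual payload size,
		// which is usually smaller than the buffer
		byte[] buf = new byte[1024];
		DatagramPacket in = new DatagramPacket(buf, buf.length);
		receiver.receive(in); // blocks until a packet arrives
		System.out.println(new String(in.getData(), 0, in.getLength(), "UTF-8"));

		sender.close();
		receiver.close();
	}
}
```

Unlike TCP, no connection is established: receive() simply blocks until a datagram arrives, and delivery is not guaranteed in general (on the loopback interface it is reliable in practice).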