Game Metric Analysis Based on a Hive Data Warehouse

Posted: 2019-09-25 18:42:43

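I. Analyzing the metric data

II. Processing the base-layer data

1. Load all the data into HDFS

2. Create an external table to bring the data into Hive

3. Split the data in the txt files

4. Extract the required values and create a table

5. Check the data dates

III. Designing the presentation-layer data

IV. Building the intermediate data layer

V. Creating a Maven project that connects to Hive and MySQL over JDBC

1. Start the Hive service

2. Add dependencies to the pom file

3. Write the Java file

VI. Displaying the data in the front end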

I. Analyzing the metric data

1027.txt

0001000100010007|0007|0001|125.71.203.241|gs1001|d655f33d70064bc995b85d7b39f6789f|e1a5ced3528c4eac986dd64a837f7ba9|CharacterLogin|1414381913000|sanguo_01|1|0|1|0

0001000100010007|0007|0001|125.71.203.241|gs1001|540247b8573943429a6b543000cbe94b|98104ad0d9bd42cc8dece246aebf19af|CharacterLogin|1414381950000|sanguo_xy|1|0|1|0

<struct name="CharacterLogin" version="1" desc="(Required) Account login/logout information">
<entry name="AppID" type="String" size="50" desc="(Required) Application ID"/>
<entry name="GameID" type="String" size="32" desc="(Required) Game ID"/>
<entry name="ChildId" type="String" size="32" desc="(Required) Sub-version ID"/>
<entry name="IP" type="String" size="20" desc="(Required) Login IP"/>
<entry name="ServerID" type="String" size="20" desc="(Required) Server ID"/>
<entry name="AccountID" type="String" size="50" desc="(Required) Account ID"/>
<entry name="CharacterID" type="String" size="50" desc="(Required) Character ID"/>
<entry name="LogType" type="String" size="20" default="AccountLogin" desc="(Required) Log type, fixed value"/>
<entry name="LogTime" type="int32" size="13" desc="(Required) Log time, in milliseconds"/>
<entry name="PlatformChannelId" type="String" size="100" desc="(Required) Channel platform ID"/>
<entry name="IsLogin" type="int" size="1" desc="(Required) Login flag, 0: logout, 1: login"/>
<entry name="OnlineTime" type="int" size="10" desc="(Required) Online duration, in milliseconds"/>
<entry name="Level" type="int" size="4" desc="(Required) Current level"/>
<entry name="VIPLevel" type="int" size="2" desc="(Required) VIP level; -1 for non-VIP"/>
</struct>

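Inspecting the 1027.txt data file under CharacterLogin alongside the XML definition shows that two fields are especially useful: LogTime and IsLogin. IsLogin distinguishes login from logout, but either value means the account was active, so every record can be treated as a login. In practice only two fields are needed: the queries in later sections count distinct AccountID values per LogTime date.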

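II. Processing the base-layer data

1. Load all the data into HDFS

Run "hdfs dfs -put /usr/game /" to upload the data to the HDFS root directory.

2. Create an external table to bring the data into Hive

Create an external table; Hive reads the files already in place, so no separate load step is needed.

The location clause refers to a path in HDFS.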

create external table t2(line string) location '/games/CharacterLogin';

3. Split the data in the txt files

For example, to split a single line:

select split(line,"\\|") from t2 limit 1;
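
Hive's split treats its second argument as a Java regular expression, which is why the pipe is escaped. The following is a minimal Java sketch of the same behavior using String.split; the class name PipeSplitDemo and the shortened sample line are illustrative assumptions, not part of the original project.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the original project.
class PipeSplitDemo {
    // Split on a literal pipe: the backslash escapes the regex alternation operator.
    static String[] splitOnPipe(String line) {
        return line.split("\\|");
    }

    public static void main(String[] args) {
        String line = "0007|0001|gs1001"; // shortened sample, for illustration only
        System.out.println(Arrays.toString(splitOnPipe(line)));
        // An unescaped "|" is an empty alternation that matches between characters,
        // so the line is not split into the three intended fields.
        System.out.println(line.split("|").length);
    }
}
```

Without the escape, the pattern matches the empty string, so the result is a list of single characters rather than fields.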

4. Extract the required values and create a table

create table CharLogin as
select split(line,'\\|')[5]  as AccountID,
       split(line,'\\|')[6]  as CharacterID,
       split(line,'\\|')[8]  as LogTime,
       split(line,'\\|')[10] as IsLogin
from t2;
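
The indices 5, 6, 8 and 10 are the zero-based positions of AccountID, CharacterID, LogTime and IsLogin in the struct definition. As a rough sketch of that mapping, the Java below parses the first sample line from 1027.txt; the class CharacterLoginRecord and its field names are assumptions made for illustration.

```java
// Hypothetical helper illustrating the index-to-field mapping; names are assumptions.
class CharacterLoginRecord {
    final String accountId;
    final String characterId;
    final long logTimeMillis;
    final int isLogin;

    CharacterLoginRecord(String accountId, String characterId, long logTimeMillis, int isLogin) {
        this.accountId = accountId;
        this.characterId = characterId;
        this.logTimeMillis = logTimeMillis;
        this.isLogin = isLogin;
    }

    // Split on the literal pipe and pick the same positions the Hive query uses.
    static CharacterLoginRecord parse(String line) {
        String[] f = line.split("\\|");
        return new CharacterLoginRecord(f[5], f[6], Long.parseLong(f[8]), Integer.parseInt(f[10]));
    }

    public static void main(String[] args) {
        // First sample line from 1027.txt.
        String line = "0001000100010007|0007|0001|125.71.203.241|gs1001|"
                + "d655f33d70064bc995b85d7b39f6789f|e1a5ced3528c4eac986dd64a837f7ba9|"
                + "CharacterLogin|1414381913000|sanguo_01|1|0|1|0";
        CharacterLoginRecord r = parse(line);
        System.out.println(r.accountId + " " + r.logTimeMillis + " " + r.isLogin);
    }
}
```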

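5. Check the data dates

For example, drop the last three digits of 1414381913000 (milliseconds to seconds) and cast the result to bigint; it formats as 2014-10-27:

select from_unixtime(cast(substr(logtime,1,10) AS bigint),'yyyy-MM-dd') from charlogin limit 1;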
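
The same conversion can be checked outside Hive. This is a small Java sketch, assuming a hypothetical class LogTimeDemo; from_unixtime formats in the session's time zone, so the zone is passed explicitly here, and both UTC and Asia/Shanghai give the same calendar day for this timestamp.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class; the zone choice is an assumption about the cluster.
class LogTimeDemo {
    // Keep the first 10 digits (seconds) of a 13-digit millisecond timestamp
    // and format the instant as yyyy-MM-dd in the given zone.
    // Note: Java's substring is 0-based, unlike Hive's 1-based substr.
    static String toDate(String logTimeMillis, ZoneId zone) {
        long seconds = Long.parseLong(logTimeMillis.substring(0, 10));
        return DateTimeFormatter.ofPattern("yyyy-MM-dd")
                .withZone(zone)
                .format(Instant.ofEpochSecond(seconds));
    }

    public static void main(String[] args) {
        System.out.println(toDate("1414381913000", ZoneId.of("Asia/Shanghai")));
        System.out.println(toDate("1414381913000", ZoneId.of("UTC")));
    }
}
```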

III. Designing the presentation-layer data

Create the table in MySQL:

CREATE TABLE n_days_stat( value INT(11) NOT NULL, statdate DATE NOT NULL );

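IV. Building the intermediate data layer

Create the intermediate table, which holds the number of distinct accounts that logged in on each day: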

create table temp_charlogin as
select count(distinct accountid) as count,
       from_unixtime(int(logtime/1000),'yyyyMMdd') as logtime
from charlogin
group by from_unixtime(int(logtime/1000),'yyyyMMdd');
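
To make the aggregation concrete, here is an in-memory Java sketch of the same idea: group records by calendar day and count distinct account IDs. The class DailyActiveAccounts, the two-column row layout, and the UTC zone are assumptions for illustration; Hive itself uses the session time zone.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of count(distinct accountid) grouped by day.
class DailyActiveAccounts {
    // Assumption: days are computed in UTC.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneId.of("UTC"));

    // Each row is {accountId, logTimeMillis}.
    static Map<String, Integer> countByDay(List<String[]> rows) {
        Map<String, Set<String>> accountsByDay = new TreeMap<>();
        for (String[] row : rows) {
            String day = DAY.format(Instant.ofEpochMilli(Long.parseLong(row[1])));
            // A set per day removes repeat logins by the same account.
            accountsByDay.computeIfAbsent(day, d -> new HashSet<>()).add(row[0]);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : accountsByDay.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"a", "1414381913000"},
                new String[]{"a", "1414381950000"},
                new String[]{"b", "1414381950000"},
                new String[]{"a", "1414468313000"});
        System.out.println(countByDay(rows));
    }
}
```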

V. Creating a Maven project that connects to Hive and MySQL over JDBC

1. Start the Hive service

nohup hive --service hiveserver2 &

Check that it is running:

ps -aux| grep hiveserver2

2. Add dependencies to the pom file

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>1.1.0</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.25</version>
    </dependency>
</dependencies>

3. Write the Java file

package mysql;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/**
 * @create -11-26 19:12
 */
public class Test01 {
    public static void main(String[] args) throws Exception {
        // Read the per-day counts from the Hive intermediate table.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        List<String> resultList = new ArrayList<>();
        try (Connection hiveConn = DriverManager.getConnection("jdbc:hive2://hadoop101:10000/default", "root", "123456");
             Statement hiveStmt = hiveConn.createStatement();
             ResultSet hiveResultSet = hiveStmt.executeQuery("select count,logtime from temp_charlogin")) {
            while (hiveResultSet.next()) {
                resultList.add(hiveResultSet.getInt("count") + "\t" + hiveResultSet.getString("logtime"));
            }
        }
        // Print how many rows were read from Hive.
        System.out.println(resultList.size());

        // Write each (count, date) pair into the MySQL presentation table.
        Class.forName("com.mysql.jdbc.Driver");
        try (Connection mysqlConn = DriverManager.getConnection("jdbc:mysql://hadoop101:3306/test", "root", "123456");
             PreparedStatement mysqlPst = mysqlConn.prepareStatement("insert into n_days_stat(value,statdate) values(?,?)")) {
            for (String s : resultList) {
                String[] split = s.split("\t");
                mysqlPst.setInt(1, Integer.parseInt(split[0]));
                // logtime is a yyyyMMdd string, which MySQL accepts for a DATE column.
                mysqlPst.setString(2, split[1]);
                mysqlPst.execute();
            }
        }
    }
}
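
The program above passes the yyyyMMdd string straight to MySQL and relies on the server to interpret it as a date. One possible alternative, sketched here under the assumption that explicit typing is preferred, is to parse the string first and bind it with setDate; the class StatDateParser is hypothetical.

```java
import java.sql.Date;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; not part of the original program.
class StatDateParser {
    // Parse a compact yyyyMMdd string (for example "20141027") into a java.sql.Date.
    static Date toSqlDate(String yyyymmdd) {
        LocalDate day = LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
        return Date.valueOf(day);
    }

    public static void main(String[] args) {
        // java.sql.Date prints in yyyy-MM-dd form.
        System.out.println(toSqlDate("20141027"));
    }
}
```

With this helper, the insert loop could call mysqlPst.setDate(2, StatDateParser.toSqlDate(split[1])), and a malformed date would then fail in Java with a DateTimeParseException before reaching the database.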

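Run the Java file (the output shown in the screenshot indicates that the import succeeded).

Check the database (228 rows have been imported).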