Hive Internal Tables and External Tables

Hive has two types of tables: managed tables and external tables.

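A managed table (MANAGED_TABLE), also called an internal table, is stored under /user/hive/warehouse by default; a different directory can be given with the LOCATION clause when the table is created. Dropping a managed table deletes both the table data and the metadata.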

For example, the following statement creates a managed table emp (each line of the underlying HDFS file has tab-separated fields):

create table if not exists emp(
    empno int,
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
) row format delimited
fields terminated by '\t';
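
To confirm how Hive registered the table, its metadata can be inspected. The following is a minimal sketch, assuming the emp table above was created in the default database:

```sql
-- Print the table's detailed metadata.
-- Look for the "Table Type" row (MANAGED_TABLE for an internal table)
-- and the "Location" row (the HDFS directory that holds the data).
describe formatted emp;
```

With the default configuration the Location row would be expected to end in /user/hive/warehouse/emp; on a cluster with a different warehouse directory the path will differ.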

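An external table (EXTERNAL_TABLE) lets you choose the directory yourself when creating the table (with the LOCATION clause). Dropping an external table deletes only the metadata, not the table data. If the HDFS directory already contains files that match the table's structure, there is no need to put any data; the files can be queried directly.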

For example, the following statement creates an external table emp_ext:

create external table if not exists emp_ext(
    empno int,
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
) row format delimited fields terminated by '\t'
LOCATION '/user/hive/warehouse/emp';
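
The same metadata check shows the difference for an external table. This is a sketch that assumes emp_ext was created exactly as above:

```sql
-- For an external table the "Table Type" row reads EXTERNAL_TABLE,
-- and "Location" is the directory named in the LOCATION clause.
describe formatted emp_ext;

-- Any tab-delimited files already in that directory are readable at once,
-- without a separate load step.
select * from emp_ext limit 5;
```

Note that this LOCATION is also the default directory of the managed table emp (assuming the default warehouse path), so both tables would read the same files, and dropping emp would remove the files that emp_ext points to.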

I. Prepare the Data File

Using any editor, such as vi, vim, or nano, create a text file peoples.txt on the local Linux file system to hold some personnel records (fields separated by a tab):

1	张三
2	李四
3	王老五

Next we create an internal table and an external table in turn, load the data into each, and run some simple queries.

II. Using an Internal Table

1. Create a tab-delimited Hive table named sample and load a local Linux data file into it.

Create the table:

create table sample(
    id  int,
    name string
)row format delimited 
fields terminated by '\t';

Load the tab-delimited file into the sample table (from the local file system):

load data local inpath '/home/hduser/data/peoples.txt' into table sample;

-- or, to replace any data already in the table:
-- load data local inpath '/home/hduser/data/peoples.txt' overwrite into table sample;
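
The difference between the two forms is easiest to see by counting rows after each load. A small sketch, assuming the sample table and the file path used above:

```sql
-- INTO appends: loading the same file a second time adds its rows again,
-- so the count grows with every load.
load data local inpath '/home/hduser/data/peoples.txt' into table sample;
select count(*) from sample;

-- OVERWRITE first clears the table's existing data, then loads the file,
-- so the count drops back to the number of rows in the file.
load data local inpath '/home/hduser/data/peoples.txt' overwrite into table sample;
select count(*) from sample;
```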

View the structure of the sample table:

desc sample;

Query the sample table:

select * from sample;
select name,id from sample;

Drop the sample table:

drop table sample;
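
To watch a managed table's data disappear together with its metadata, the table's directory can be listed before and after the drop. This is a sketch that assumes the Hive CLI (whose dfs command runs HDFS shell commands), the default database, and the default warehouse path /user/hive/warehouse:

```sql
-- Before the drop: the table directory exists and contains peoples.txt.
dfs -ls /user/hive/warehouse/sample;

drop table sample;

-- After the drop: the sample directory no longer appears in the warehouse.
dfs -ls /user/hive/warehouse/;
```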

2. Create a tab-delimited Hive table named sample_hdfs and load a data file that already exists on HDFS into it.

Create the table:

-- create the internal table
create table sample_hdfs(
    id int,
    name string
) row format delimited  
fields terminated by '\t';

-- view the table structure
desc sample_hdfs;

Upload the local file to HDFS:

$ hdfs dfs -mkdir /hive_data
$ hdfs dfs -put peoples.txt /hive_data/
$ hdfs dfs -ls /hive_data/

The listing shows that peoples.txt has been uploaded to the /hive_data/ directory on HDFS.

Load the tab-delimited HDFS file into the sample_hdfs table (from HDFS):

load data inpath '/hive_data/peoples.txt' into table sample_hdfs;

Listing /hive_data on HDFS now shows that peoples.txt is gone (it has been moved into Hive's warehouse directory):

$ hdfs dfs -ls /hive_data/
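
To see where the file went, the table's storage directory can be looked up and listed from inside Hive. A sketch, assuming the Hive CLI, the default database, and the default warehouse path:

```sql
-- The "Location" row gives the HDFS directory backing the table.
describe formatted sample_hdfs;

-- List that directory; the moved peoples.txt should appear here.
-- The path below is an assumption based on the default warehouse location.
dfs -ls /user/hive/warehouse/sample_hdfs;
```

Unlike load data local inpath, which copies the local file, loading from an HDFS path moves the file, which is why it disappears from /hive_data.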

Query the sample_hdfs table (NULL values in the result mean the table's row format does not match the format of the data):

select * from sample_hdfs;
select id,name from sample_hdfs;

Drop the sample_hdfs table:

drop table sample_hdfs;
show tables;

III. Using an External Table

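Hive's data model closely resembles that of a relational database: it consists of tables with schemas, columns, rows, and partitions. These objects are logical units defined in the Hive Metastore.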

First, upload the local peoples.txt file to HDFS, into the /hive_data/p_tb directory:

$ hdfs dfs -mkdir -p /hive_data/p_tb
$ hdfs dfs -put peoples.txt /hive_data/p_tb/
$ hdfs dfs -ls /hive_data/p_tb/

Create a tab-delimited external table named sample_external:

create external table sample_external(
    id int,
    name string
) 
row format delimited  
fields terminated by '\t' 
location '/hive_data/p_tb';

show tables;

Query the sample_external table (NULL values in the result mean the table's row format does not match the format of the data):

select * from sample_external;
select id,name from sample_external;
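
The NULL symptom can be reproduced deliberately by declaring the wrong delimiter over the same files. The following sketch uses a hypothetical table name, sample_comma, that is not part of the original walkthrough:

```sql
-- Deliberately wrong: the files are tab-delimited, but the table expects commas.
create external table sample_comma(
    id int,
    name string
)
row format delimited
fields terminated by ','
location '/hive_data/p_tb';

-- Each whole line is treated as the first field, which cannot be parsed
-- as an int, and no second field is found, so both columns would be
-- expected to come back as NULL.
select * from sample_comma;

-- Clean up; because the table is external, the data files are left in place.
drop table sample_comma;
```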

Drop the sample_external table:

drop table sample_external;
show tables;

Listing /hive_data/p_tb/ on HDFS now shows that peoples.txt is still there: dropping a Hive external table does not delete the underlying HDFS data files.

$ hdfs dfs -ls /hive_data/p_tb/
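
Because the files survive, the table can simply be declared again over the same directory and queried without reloading anything. A minimal sketch, reusing the definition from above:

```sql
-- Re-create the external table on top of the files that are still in HDFS.
create external table if not exists sample_external(
    id int,
    name string
)
row format delimited
fields terminated by '\t'
location '/hive_data/p_tb';

-- The rows from peoples.txt are available again immediately.
select * from sample_external;
```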
