Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다.
필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.

hive json 값 다루기

총관리자 2014.04.17 10:26 조회 수 : 1222

1. json형식의 data파일 생성

hadoop@bigdata-host:~/hadoop/working$ vi simple.json
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}

2. data를 담을 table 생성

create table json_table (json string);

3. data파일을 table에 입력

hive> load data local inpath '/home/hadoop/hadoop/working/simple.json' into table json_table;
Copying data from file:/home/hadoop/hadoop/working/simple.json
Copying file: file:/home/hadoop/hadoop/working/simple.json
Loading data to table default.json_table
Table default.json_table stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 77, raw_data_size: 0]
OK

4. table내용 확인

hive> select * from json_table;
OK
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}

5. json을 컬럼 형태로 query하기(get_json_object이용)

select get_json_object(json_table.json, '$.Foo') as foo,

get_json_object(json_table.json, '$.Bar') as bar,

get_json_object(json_table.json, '$.Quux.QuuxId') as qid,

get_json_object(json_table.json, '$.Quux.QuuxName') as qname

from json_table;

-------------------------->

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0003
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201404170922_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:30:42,028 Stage-1 map = 0%, reduce = 0%
2014-04-17 10:30:48,109 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:49,128 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:50,146 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:51,162 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:52,175 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:53,206 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.87 sec
MapReduce Total cumulative CPU time: 1 seconds 870 msec
Ended Job = job_201404170922_0003
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.87 sec HDFS Read: 295 HDFS Write: 28 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 870 msec
OK
ABC 20090101100000 1234 Sam
Time taken: 19.411 seconds, Fetched: 1 row(s)

6. json을 컬럼 형태로 query하기(json_tuple이용)

select v.foo, v.bar, v.quux, v.qid

from json_table jt

lateral view json_tuple(jt.json, 'Foo', 'Bar', 'Quux', 'Quux.QuuxId') v

as foo, bar, quux, qid;

------------>

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0004, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0004
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201404170922_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:36:41,978 Stage-1 map = 0%, reduce = 0%
2014-04-17 10:36:49,125 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:50,156 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:51,170 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:52,193 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:53,219 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.53 sec
MapReduce Total cumulative CPU time: 1 seconds 530 msec
Ended Job = job_201404170922_0004
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.53 sec HDFS Read: 295 HDFS Write: 55 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 530 msec
OK
foo bar quux qid
ABC 20090101100000 {"QuuxId":1234,"QuuxName":"Sam"} NULL
Time taken: 26.494 seconds, Fetched: 1 row(s)

이 게시물을

번호	제목	글쓴이	날짜	조회 수
10	json serde사용법	총관리자	2014.04.17	1140
»	json 값 다루기	총관리자	2014.04.17	1222
8	hive에서 생성된 external table에서 hbase의 table에 값 insert하기	총관리자	2014.04.11	1748
7	external partition table생성및 data확인	총관리자	2014.04.03	1072
6	의사분산모드에서 presto설치하기	총관리자	2014.03.31	3050
5	Hive Query Examples from test code (2 of 2)	총관리자	2014.03.26	5018
4	Hive Query Examples from test code (1 of 2)	총관리자	2014.03.26	1050
3	Hive java connection 설정	구퍼	2013.04.01	2013
2	Hive 사용법 및 쿼리 샘플코드	구퍼	2013.03.07	2991
1	Hive+mysql 설치 및 환경구축하기	구퍼	2013.03.07	2722

쓰기 태그

첫 페이지 2 3 4 5 6 7 끝 페이지

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc.
We are open to the required minutes. Please send inquiries to gooper@gooper.com.

Bigdata, Semantic IoT, Hadoop, NoSQL

Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다.
필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.

hive json 값 다루기

댓글 0

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc.
We are open to the required minutes. Please send inquiries to gooper@gooper.com.

Bigdata, Semantic IoT, Hadoop, NoSQL

Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다. 필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.

hive json 값 다루기

댓글 0

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc. We are open to the required minutes. Please send inquiries to gooper@gooper.com.

LOGIN

Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다.
필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc.
We are open to the required minutes. Please send inquiries to gooper@gooper.com.