Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다.
필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.

hive json 값 다루기

총관리자 2014.04.17 10:26 조회 수 : 1222

1. json형식의 data파일 생성

hadoop@bigdata-host:~/hadoop/working$ vi simple.json
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}

2. data를 담을 table 생성

create table json_table (json string);

3. data파일을 table에 입력

hive> load data local inpath '/home/hadoop/hadoop/working/simple.json' into table json_table;
Copying data from file:/home/hadoop/hadoop/working/simple.json
Copying file: file:/home/hadoop/hadoop/working/simple.json
Loading data to table default.json_table
Table default.json_table stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 77, raw_data_size: 0]
OK

4. table내용 확인

hive> select * from json_table;
OK
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}

5. json을 컬럼 형태로 query하기(get_json_object이용)

select get_json_object(json_table.json, '$.Foo') as foo,

get_json_object(json_table.json, '$.Bar') as bar,

get_json_object(json_table.json, '$.Quux.QuuxId') as qid,

get_json_object(json_table.json, '$.Quux.QuuxName') as qname

from json_table;

-------------------------->

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0003
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201404170922_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:30:42,028 Stage-1 map = 0%, reduce = 0%
2014-04-17 10:30:48,109 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:49,128 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:50,146 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:51,162 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:52,175 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:53,206 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.87 sec
MapReduce Total cumulative CPU time: 1 seconds 870 msec
Ended Job = job_201404170922_0003
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.87 sec HDFS Read: 295 HDFS Write: 28 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 870 msec
OK
ABC 20090101100000 1234 Sam
Time taken: 19.411 seconds, Fetched: 1 row(s)

6. json을 컬럼 형태로 query하기(json_tuple이용)

select v.foo, v.bar, v.quux, v.qid

from json_table jt

lateral view json_tuple(jt.json, 'Foo', 'Bar', 'Quux', 'Quux.QuuxId') v

as foo, bar, quux, qid;

------------>

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0004, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0004
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201404170922_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:36:41,978 Stage-1 map = 0%, reduce = 0%
2014-04-17 10:36:49,125 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:50,156 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:51,170 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:52,193 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:53,219 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.53 sec
MapReduce Total cumulative CPU time: 1 seconds 530 msec
Ended Job = job_201404170922_0004
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.53 sec HDFS Read: 295 HDFS Write: 55 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 530 msec
OK
foo bar quux qid
ABC 20090101100000 {"QuuxId":1234,"QuuxName":"Sam"} NULL
Time taken: 26.494 seconds, Fetched: 1 row(s)

이 게시물을

번호	제목	글쓴이	날짜	조회 수
130	insert hbase by hive ... error occured after 5 hours..HMaster가 뜨지 않는 장애에 대한 복구 방법	총관리자	2014.04.29	7129
129	hive 2.0.1 설치및 mariadb로 metastore 설정	총관리자	2016.06.03	5184
128	Hive Query Examples from test code (2 of 2)	총관리자	2014.03.26	5005
127	Spark에서 Serializable관련 오류및 조치사항	총관리자	2017.04.21	4901
126	의사분산모드에서 presto설치하기	총관리자	2014.03.31	3050
125	Hive 사용법 및 쿼리 샘플코드	구퍼	2013.03.07	2991
124	spark-sql실행시 Caused by: java.lang.NumberFormatException: For input string: "0s" 오류발생시 조치사항	총관리자	2016.06.09	2802
123	Hive+mysql 설치 및 환경구축하기	구퍼	2013.03.07	2722
122	banana pi에 hive 0.13.1+mysql(metastore)설치	총관리자	2014.09.09	2406
121	Hive java connection 설정	구퍼	2013.04.01	2013
120	VisualVM 1.3.9을 이용한 spark-submit JVM 모니터링을 위한 설정및 spark-submit실행 옵션	총관리자	2016.10.28	1891
119	Spark 2.1.1 clustering(5대) 설치(YARN기반)	총관리자	2016.04.22	1882
118	hive에서 생성된 external table에서 hbase의 table에 값 insert하기	총관리자	2014.04.11	1748
117	index생성, 삭제, 활용	총관리자	2014.04.25	1702
116	java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from...오류해결방법	총관리자	2015.06.16	1692
115	impald에서 idle_query_timeout 와 idle_session_timeout 구분	총관리자	2021.05.20	1630
114	FAILED: IllegalStateException Variable substitution depth too large: 40 오류발생시 조치사항	총관리자	2014.08.19	1520
113	hiverserver2기동시 connection refused가 발생하는 경우 조치방법	총관리자	2014.05.22	1471
»	json 값 다루기	총관리자	2014.04.17	1222
111	schema설정없이 hive를 최초에 실행했을때 발생하는 오류메세지및 처리방법	총관리자	2016.09.25	1222

쓰기 태그

첫 페이지 1 2 3 4 5 6 7 끝 페이지

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc.
We are open to the required minutes. Please send inquiries to gooper@gooper.com.

Bigdata, Semantic IoT, Hadoop, NoSQL

Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다.
필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.

hive json 값 다루기

댓글 0

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc.
We are open to the required minutes. Please send inquiries to gooper@gooper.com.

Bigdata, Semantic IoT, Hadoop, NoSQL

Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다. 필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.

hive json 값 다루기

댓글 0

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc. We are open to the required minutes. Please send inquiries to gooper@gooper.com.

LOGIN

Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다.
필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc.
We are open to the required minutes. Please send inquiries to gooper@gooper.com.