메뉴 건너뛰기

Bigdata, Semantic IoT, Hadoop, NoSQL

Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다.
필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.


hive json 값 다루기

총관리자 2014.04.17 10:26 조회 수 : 1222

1. json형식의 data파일 생성

hadoop@bigdata-host:~/hadoop/working$ vi simple.json
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}

2. data를 담을 table 생성

create table json_table (json string);

 

3. data파일을 table에 입력

hive> load data local inpath '/home/hadoop/hadoop/working/simple.json' into table json_table;
Copying data from file:/home/hadoop/hadoop/working/simple.json
Copying file: file:/home/hadoop/hadoop/working/simple.json
Loading data to table default.json_table
Table default.json_table stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 77, raw_data_size: 0]
OK

4. table내용 확인

hive> select * from json_table;
OK
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}

 

5. json을 컬럼 형태로 query하기(get_json_object이용)

select get_json_object(json_table.json, '$.Foo') as foo,

          get_json_object(json_table.json, '$.Bar') as bar,

          get_json_object(json_table.json, '$.Quux.QuuxId') as qid,

          get_json_object(json_table.json, '$.Quux.QuuxName') as qname

from json_table;

-------------------------->

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0003
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201404170922_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:30:42,028 Stage-1 map = 0%,  reduce = 0%
2014-04-17 10:30:48,109 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:49,128 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:50,146 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:51,162 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:52,175 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:53,206 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.87 sec
MapReduce Total cumulative CPU time: 1 seconds 870 msec
Ended Job = job_201404170922_0003
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.87 sec   HDFS Read: 295 HDFS Write: 28 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 870 msec
OK
ABC 20090101100000 1234 Sam
Time taken: 19.411 seconds, Fetched: 1 row(s)

6. json을 컬럼 형태로 query하기(json_tuple이용)

select v.foo, v.bar, v.quux, v.qid

from json_table jt

 lateral view json_tuple(jt.json, 'Foo', 'Bar', 'Quux', 'Quux.QuuxId') v

    as foo, bar, quux, qid;

------------>

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0004, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0004
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201404170922_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:36:41,978 Stage-1 map = 0%,  reduce = 0%
2014-04-17 10:36:49,125 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:50,156 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:51,170 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:52,193 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:53,219 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.53 sec
MapReduce Total cumulative CPU time: 1 seconds 530 msec
Ended Job = job_201404170922_0004
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.53 sec   HDFS Read: 295 HDFS Write: 55 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 530 msec
OK
foo bar quux qid
ABC 20090101100000 {"QuuxId":1234,"QuuxName":"Sam"} NULL
Time taken: 26.494 seconds, Fetched: 1 row(s)

 

번호 제목 글쓴이 날짜 조회 수
40 [Kudu]ERROR: Unable to advance iterator for node with id '2' for Kudu table 'impala::core.pm0_abdasubjct': Network error: recv error from unknown peer: Transport endpoint is not connected (error 107) gooper 2023.03.16 536
39 Hadoop Clsuter에 이미 포함된 host의 hostname변경시 처리 절차 gooper 2023.03.24 14
38 [KUDU] kudu tablet server여러가지 원인에 의해서 corrupted상태가 된 경우 복구방법 gooper 2023.03.28 37
37 [DataNode]org.apache.hadoop.security.KerberosAuthException: failure to login: for principal: hdfs/datanode03@GOOPER.COM from keytab hdfs.keytab오류 gooper 2023.04.18 4502
36 [Ranger]계정에 admin권한(grant, create등)의 권한 부여 방법 gooper 2023.04.18 49
35 [Atlas Server]org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=atlas/node01.gooper.com@GOOPER.COM, scope=default:atlas_janus, params=[table=default:atlas_janus,], action-CREATE)] gooper 2023.05.15 66
34 Impala Admission Control 설정시 쿼리가 사용하는 메모리 사용량 판단 방법 gooper 2023.05.19 93
33 [impala]insert into db명.table명 select a, b from db명.table명 쿼리 수행시 "Memory limit exceeded: Failed to allocate memory for Parquet page index"오류 조치 방법 gooper 2023.05.31 22
32 [CDP7.1.3]Ranger WebUI에서 Error! Connection refused: Please check the KMS provider URL and whether the Ranager KMS is running발생시 조치 방법 gooper 2023.06.07 19
31 [Ranger]RangerAdminRESTClient Error gertting pplicies; Received NULL response!!, secureMode=true, user=rangerkms/node01.gooper.com@ GOOPER.COM (auth:KERBEROS), serviceName=cm_kms gooper 2023.06.27 24
30 [KTS Cluster의 Key Trustee Server]self-signed 인증서 발급및 설정 방법 gooper 2023.06.27 29
29 [Hadoop Encryption] Encryption Zone 생성/설정시 User:hadoop not allowed to do 'DECRYPT_EEK' ON 'testkey' 오류 발생 조치 사항 gooper 2023.06.28 18
28 [HDFS]Encryption Zone에 생성된 테이블 조회시 Failed to open HDFS file hdfs://nameservice1/tmp/zone1/sec_test_file.txt Error(255): Unknown error 255 Root cause: AuthorizationException: User:impala not allowd to do 'DECRYPT_EEK' on 'testkey' gooper 2023.06.29 53
27 [EncryptionZone]User:testuser not allowed to do "DECRYPT_EEK" on 'testkey' gooper 2023.06.29 11
26 [Encryption Zone]Encryption Zone에 생성된 table을 select할때 HDFS /tmp/zone1에 대한 권한이 없는 경우 gooper 2023.06.29 12
25 [CDP7.1.6,HDFS]HDFS파일을 삭제하고 Trash비움이 완료된후에도 HDFS 공간을 차지하고 있는 경우 확인/조치 방법 gooper 2023.07.17 16
24 oozie의 sqoop action수행시 ooize:launcher의 applicationId를 이용하여 oozie:action의 applicationId및 관련 로그를 찾는 방법 gooper 2023.07.26 10
23 [Hue admin]Add/Sync LDAP user, Sync LDAP users/groups 버튼 기능 설명 gooper 2023.08.09 15
22 [Hue metadata]Oracle에 있는 Hue 메타정보 테이블을 이용하여 coordinator와 workflow관계 목록을 추출하는 방법 gooper 2023.08.22 15
21 [Impala jdbc]CDP7.1.7환경에서 java프로그램을 이용하여 kerberized impala cluster에 접근하여 SQL을 수행하는 방법 gooper 2023.08.22 58

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc.
We are open to the required minutes. Please send inquiries to gooper@gooper.com.

위로