메뉴 건너뛰기

Bigdata, Semantic IoT, Hadoop, NoSQL

Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다.
필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.


1. hive table(file을 바라보고 있으며 hbase table(아래의 hbase_mytable)에 값을 넣기 위한 src table) 을  external로  table 생성

CREATE EXTERNAL TABLE IF NOT EXISTS external_file
     (
     FOO STRING,
     BAR STRING
     )
     COMMENT 'TEST TABLE OF EMP_IP_TABLE'
     ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
     STORED AS TEXTFILE LOCATION '/data';

 

* /data에 들어 있는 파일 내용

hadoop@bigdata-host:~/hive/conf$ hadoop fs -cat /data/external_file.txt
a,b
a1,b1
a2,b2
a3,b3

2.hive table( hbase  table을 바라보는 테이블)생성

CREATE EXTERNAL TABLE hbase_mytable(table_id string, foo string, bar string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:foo,cf:bar")
TBLPROPERTIES("hbase.table.name" = "mytable");

 

3. hive기동시 아래와 같이 jar를 포함해준다.

adoop@bigdata-host:~/hive/bin$ hive --auxpath /home/hadoop/hive/lib/hbase-0.94.6.1.jar,/home/hadoop/hive/lib/zookeeper-3.4.3.jar,/home/hadoop/hive/lib/hive-hbase-handler-0.11.0.jar,/home/hadoop/hive/lib/guava-11.0.2.jar,/home/hadoop/hive/lib/hive-contrib-0.11.0.jar -hiveconf hbase.master=localhost:60000

4. hive에 들어가서.. table을 생성한 후 hbase table에 입력 실행결과......

hive> insert into table hbase_mytable select foo, foo, bar from external_file;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404111158_0008, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404111158_0008
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201404111158_0008
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2014-04-11 13:56:49,322 Stage-0 map = 0%,  reduce = 0%
2014-04-11 13:56:55,482 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.74 sec
2014-04-11 13:56:56,500 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.74 sec
2014-04-11 13:56:57,518 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.74 sec
2014-04-11 13:56:58,541 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.74 sec
2014-04-11 13:56:59,561 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.74 sec
2014-04-11 13:57:00,587 Stage-0 map = 100%,  reduce = 100%, Cumulative CPU 1.74 sec
MapReduce Total cumulative CPU time: 1 seconds 740 msec
Ended Job = job_201404111158_0008
4 Rows loaded to hbase_mytable
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.74 sec   HDFS Read: 220 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 740 msec
OK
Time taken: 34.565 seconds

5. hbase_mytable의 값  확인(기존에 있던 값하고 새로 추가된 값이 같이 보인다.)

hive> select * from hbase_mytable;                                           
OK
2.5 1.3 NULL
a a b
a1 a1 b1
a2 a2 b2
a3 a3 b3
second 3 NULL
third NULL 3.14159
Time taken: 1.315 seconds, Fetched: 7 row(s)

6. hbase shell에서 확인

hbase(main):001:0> scan 'mytable'
ROW                        COLUMN+CELL                                                               
 2.5                       column=cf:foo, timestamp=1397112248576, value=1.3                         
 a                         column=cf:bar, timestamp=1397192214568, value=b                           
 a                         column=cf:foo, timestamp=1397192214568, value=a                           
 a1                        column=cf:bar, timestamp=1397192214568, value=b1                          
 a1                        column=cf:foo, timestamp=1397192214568, value=a1                          
 a2                        column=cf:bar, timestamp=1397192214568, value=b2                          
 a2                        column=cf:foo, timestamp=1397192214568, value=a2                          
 a3                        column=cf:bar, timestamp=1397192214568, value=b3                          
 a3                        column=cf:foo, timestamp=1397192214568, value=a3                          
 first                     column=cf:message, timestamp=1397109873612, value=hellp Hbase             
 second                    column=cf:foo, timestamp=1397112803662, value=3                           
 second2                   column=cf:foo2, timestamp=1397112883691, value=3                          
 third                     column=cf:bar, timestamp=1397109940598, value=3.14159                     
9 row(s) in 1.8090 seconds

 

 

번호 제목 글쓴이 날짜 조회 수
130 [Impala] alter table구문수행시 "WARNINGS: Impala does not have READ_WRITE access to path 'hdfs://nameservice1/DATA/Temp/DB/source/table01_ccd'" 발생시 조치 gooper 2024.04.26 0
129 [CDP7.1.7, Hive Replication]Hive Replication진행중 "The following columns have types incompatible with the existing columns in their respective positions " 오류 gooper 2023.12.27 7
128 [CDP7.1.7]Oozie job에서 ERROR: Kudu error(s) reported, first error: Timed out: Failed to write batch of 774 ops to tablet 8003f9a064bf4be5890a178439b2ba91가 발생하면서 쿼리가 실패하는 경우 gooper 2024.01.05 7
127 [CDP7.1.7]impala-shell수행시 간헐적으로 "-k requires a valid kerberos ticket but no valid kerberos ticket found." 오류 gooper 2023.11.16 11
126 임시 테이블에서 데이터를 읽어서 partitioned table에 입력하는 impala SQL문 예시 gooper 2023.11.10 16
125 [impala]insert into db명.table명 select a, b from db명.table명 쿼리 수행시 "Memory limit exceeded: Failed to allocate memory for Parquet page index"오류 조치 방법 gooper 2023.05.31 22
124 not leader of this config: current role FOLLOWER 오류 발생시 확인방법 총관리자 2022.01.17 23
123 kudu table와 impala(hive) table정보가 틀어져서 테이블을 읽지 못하는 경우(Error Loading Metadata) 조치방법 gooper 2023.11.10 25
122 [CDP7.1.7]Impala Query의 Memory Spilled 양은 ScratchFileUsedBytes값을 누적해서 구할 수 있다. gooper 2022.07.29 29
121 [Cloudera 6.3.4, Kudu]]Service Monitor에서 사용하는 metric중에 일부를 blacklist로 설정하여 모니터링 정보 수집 제외하는 방법 gooper 2022.07.08 31
120 Failed to write to server: (no server available): 총관리자 2022.01.17 32
119 AnalysisException: Incomplatible return type 'DECIMAL(38,0)' and 'DECIMAL(38,5)' of exprs가 발생시 조치 총관리자 2021.07.26 34
118 spark에서 hive table을 읽어 출력하는 예제 소스 총관리자 2017.03.09 35
117 [TLS/SSL]Kudu Tablet Server설정 총관리자 2022.05.13 35
116 spark에서 hive table을 읽어 출력하는 예제 소스 총관리자 2017.03.09 37
115 [KUDU] kudu tablet server여러가지 원인에 의해서 corrupted상태가 된 경우 복구방법 gooper 2023.03.28 37
114 [CDP7.1.7]impala-shell을 이용하여 kudu table에 insert/update수행시 발생하는 오류(Transport endpoint is not connected (error 107)) 발생시 확인할 내용 gooper 2023.11.30 45
113 spark 온라인 책자링크 (제목 : mastering-apache-spark) 총관리자 2016.05.25 51
112 [hive] hive.tbls테이블의 owner컬럼값은 hadoop.security.auth_to_local에 의해서 filtering된다. 총관리자 2022.04.14 55
111 [Impala jdbc]CDP7.1.7환경에서 java프로그램을 이용하여 kerberized impala cluster에 접근하여 SQL을 수행하는 방법 gooper 2023.08.22 57

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc.
We are open to the required minutes. Please send inquiries to gooper@gooper.com.

위로