메뉴 건너뛰기

Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.


1. hive table(file을 바라보고 있으며 hbase table(아래의 hbase_mytable)에 값을 넣기 위한 src table) 을  external로  table 생성

CREATE EXTERNAL TABLE IF NOT EXISTS external_file
     (
     FOO STRING,
     BAR STRING
     )
     COMMENT 'TEST TABLE OF EMP_IP_TABLE'
     ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
     STORED AS TEXTFILE LOCATION '/data';

 

* /data에 들어 있는 파일 내용

hadoop@bigdata-host:~/hive/conf$ hadoop fs -cat /data/external_file.txt
a,b
a1,b1
a2,b2
a3,b3

2.hive table( hbase  table을 바라보는 테이블)생성

CREATE EXTERNAL TABLE hbase_mytable(table_id string, foo string, bar string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:foo,cf:bar")
TBLPROPERTIES("hbase.table.name" = "mytable");

 

3. hive기동시 아래와 같이 jar를 포함해준다.

adoop@bigdata-host:~/hive/bin$ hive --auxpath /home/hadoop/hive/lib/hbase-0.94.6.1.jar,/home/hadoop/hive/lib/zookeeper-3.4.3.jar,/home/hadoop/hive/lib/hive-hbase-handler-0.11.0.jar,/home/hadoop/hive/lib/guava-11.0.2.jar,/home/hadoop/hive/lib/hive-contrib-0.11.0.jar -hiveconf hbase.master=localhost:60000

4. hive에 들어가서.. table을 생성한 후 hbase table에 입력 실행결과......

hive> insert into table hbase_mytable select foo, foo, bar from external_file;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404111158_0008, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404111158_0008
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201404111158_0008
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2014-04-11 13:56:49,322 Stage-0 map = 0%,  reduce = 0%
2014-04-11 13:56:55,482 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.74 sec
2014-04-11 13:56:56,500 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.74 sec
2014-04-11 13:56:57,518 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.74 sec
2014-04-11 13:56:58,541 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.74 sec
2014-04-11 13:56:59,561 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.74 sec
2014-04-11 13:57:00,587 Stage-0 map = 100%,  reduce = 100%, Cumulative CPU 1.74 sec
MapReduce Total cumulative CPU time: 1 seconds 740 msec
Ended Job = job_201404111158_0008
4 Rows loaded to hbase_mytable
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.74 sec   HDFS Read: 220 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 740 msec
OK
Time taken: 34.565 seconds

5. hbase_mytable의 값  확인(기존에 있던 값하고 새로 추가된 값이 같이 보인다.)

hive> select * from hbase_mytable;                                           
OK
2.5 1.3 NULL
a a b
a1 a1 b1
a2 a2 b2
a3 a3 b3
second 3 NULL
third NULL 3.14159
Time taken: 1.315 seconds, Fetched: 7 row(s)

6. hbase shell에서 확인

hbase(main):001:0> scan 'mytable'
ROW                        COLUMN+CELL                                                               
 2.5                       column=cf:foo, timestamp=1397112248576, value=1.3                         
 a                         column=cf:bar, timestamp=1397192214568, value=b                           
 a                         column=cf:foo, timestamp=1397192214568, value=a                           
 a1                        column=cf:bar, timestamp=1397192214568, value=b1                          
 a1                        column=cf:foo, timestamp=1397192214568, value=a1                          
 a2                        column=cf:bar, timestamp=1397192214568, value=b2                          
 a2                        column=cf:foo, timestamp=1397192214568, value=a2                          
 a3                        column=cf:bar, timestamp=1397192214568, value=b3                          
 a3                        column=cf:foo, timestamp=1397192214568, value=a3                          
 first                     column=cf:message, timestamp=1397109873612, value=hellp Hbase             
 second                    column=cf:foo, timestamp=1397112803662, value=3                           
 second2                   column=cf:foo2, timestamp=1397112883691, value=3                          
 third                     column=cf:bar, timestamp=1397109940598, value=3.14159                     
9 row(s) in 1.8090 seconds

 

 

번호 제목 날짜 조회 수
690 upsert구현방법(년-월-일 파티션을 기준으로) 및 테스트 script file 2018.07.03 5412
689 org.apache.hadoop.security.AccessControlException: Permission denied: user=hadoop, access=WRITE, inode="":root:supergroup:rwxr-xr-x 오류 처리방법 2014.07.05 5385
688 [Hive canary]Hive에 Metastore canary red alert및 hive log파일에 Duplicate entry '123456' for key 'NOTIFICATION_LOG_EVENT_ID'가 발생시 조치사항 2023.03.10 5353
687 mysql-server 기동시 Do you already have another mysqld server running on port 오류 발생할때 확인및 조치방법 2017.05.14 5346
686 banana pi에(lubuntu)에 hadoop설치하고 테스트하기 - 성공 file 2014.07.05 5344
685 빅데이터 분석을 위한 샘플 빅데이터 파일 다운로드 사이트 2014.04.28 5338
684 [CDP7.1.7]impala-shell수행시 간헐적으로 "-k requires a valid kerberos ticket but no valid kerberos ticket found." 오류 2023.11.16 5312
683 org.apache.hadoop.hbase.PleaseHoldException: Master is initializing 2013.03.15 5306
682 HBase, BigTable, Cassandra Schema Design file 2013.03.15 5298
681 Hive Query Examples from test code (1 of 2) 2014.03.26 5247
680 ping 안될때.. networking restart 날려주면 잘됨.. 2014.05.09 5219
679 [Kudu]ERROR: Unable to advance iterator for node with id '2' for Kudu table 'impala::core.pm0_abdasubjct': Network error: recv error from unknown peer: Transport endpoint is not connected (error 107) 2023.03.16 5212
678 Cloudera의 API를 이용하여 impala의 실행되었던 쿼리 확인하는 예시 2018.05.03 5210
677 HiveServer2인증을 PAM을 이용하도록 설정하는 방법 2018.07.21 5205
676 [Solr in Cloudera]Solr Data Directory변경 방법/절차 2023.04.21 5202
675 Windows7 64bit 환경에서 Apache Hadoop 2.7.1설치하기 2017.07.26 5201
674 kafka broker기동시 brokerId가 달라서 기동에 실패하는 경우 조치방법 2016.05.02 5190
673 의사분산모드에 hadoop설치및 ecosystem 환경 정리 2014.05.29 5170
672 [CDP7.1.3]Ranger WebUI에서 Error! Connection refused: Please check the KMS provider URL and whether the Ranager KMS is running발생시 조치 방법 2023.06.07 5141
671 [백업] 리눅스 시스템 백업하기 (Linux System Backup) - TAR 사용 시스템 전체 백업 2022.01.19 5140
위로