This is where I organize what I have learned while working on Bigdata, Hadoop ecosystem, and Semantic IoT projects.
It is published for anyone who may find it useful. Please send questions by email to gooper@gooper.com.

Hive Query Examples from test code (2 of 2)

총관리자 (site administrator) 2014.03.26 11:06

Hive query for adding jar or script files

set jar=${system:build.ivy.lib.dir}/default/derby-${system:derby.version}.jar;

add file ${hiveconf:jar}; -- add
list file;  -- list
delete file ${hiveconf:jar}; -- delete

Hive example for creating table using RegexSerDe

CREATE TABLE serde_regex(
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;

Apache log file

../data/files/apache.access.log

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Apache log file 2

../data/files/apache.access.2.log

127.0.0.1 - - [26/May/2009:00:00:00 +0000] "GET /someurl/?track=Blabla(Main) HTTP/1.1" 200 5864 - "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.65 Safari/525.19"
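
To see how the input.regex above splits a log line into the nine columns, the pattern can be tried outside Hive. The following is a minimal Python sketch, not part of the Hive test code. It assumes Python's re module handles this particular pattern the same way as the Java regex engine used by RegexSerDe, and it requires the whole line to match. The names LOG_PATTERN, COLUMNS, and parse_line are made up for this illustration.

```python
import re

# The pattern from "input.regex", with the SQL string escaping removed.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") '
    r'(-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|"[^"]*") ([^ "]*|"[^"]*"))?'
)

# Column names in the same order as the CREATE TABLE statement.
COLUMNS = ["host", "identity", "user", "time", "request",
           "status", "size", "referer", "agent"]


def parse_line(line):
    """Return a dict of column -> value, or None if the line does not match."""
    m = LOG_PATTERN.fullmatch(line)
    if m is None:
        return None
    return dict(zip(COLUMNS, m.groups()))


# The two sample lines shown above.
line1 = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
         '"GET /apache_pb.gif HTTP/1.0" 200 2326')
line2 = ('127.0.0.1 - - [26/May/2009:00:00:00 +0000] '
         '"GET /someurl/?track=Blabla(Main) HTTP/1.1" 200 5864 - '
         '"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.19 '
         '(KHTML, like Gecko) Chrome/1.0.154.65 Safari/525.19"')

row1 = parse_line(line1)
row2 = parse_line(line2)
print(row1)
print(row2)
```

The first line has no referer or agent, so the optional group does not take part in the match and the sketch returns None for those two columns. The second line fills all nine.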

Hive query which contains “transform” using specific script.

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
CREATE TABLE dest1(key INT, value STRING);

ADD FILE src/test/scripts/testgrep;

FROM (
  FROM src
  SELECT TRANSFORM(src.key, src.value)
         USING 'testgrep' AS (tkey, tvalue)
  CLUSTER BY tkey
) tmap
INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue;

SELECT dest1.* FROM dest1;

src/test/scripts/testgrep

#!/bin/bash
egrep '10.*'

exit 0;
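
A TRANSFORM script is simply a program that reads rows from standard input, with columns separated by tabs, and writes rows to standard output. As a rough illustration of what testgrep does to those rows, here is a Python sketch. The function name grep_rows and the sample rows are hypothetical and are not taken from the Hive test code.

```python
import re
import sys


def grep_rows(lines, pattern="10.*"):
    """Keep only lines in which the pattern occurs anywhere, like egrep."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Hypothetical rows as the script would see them: key<TAB>value.
sample = ["10\tval_10", "5\tval_5", "100\tval_100", "310\tval_310"]
kept = grep_rows(sample)
for line in kept:
    sys.stdout.write(line + "\n")
```

Because the pattern is matched against the whole line rather than the key column alone, a row such as 310 is kept as well.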

Hive tablesamples example

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling

-- TABLESAMPLES
CREATE TABLE bucketized_src (key INT, value STRING)
CLUSTERED BY (key) SORTED BY (key) INTO 1 BUCKETS;

INSERT OVERWRITE TABLE bucketized_src
SELECT key, value FROM src WHERE key=66;

SELECT key FROM bucketized_src TABLESAMPLE(BUCKET 1 out of 1);
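
With a single bucket, BUCKET 1 OUT OF 1 selects every row of the table. The sketch below illustrates the idea in Python under two loudly stated assumptions that may not hold for every Hive version: the hash of an INT key is the key itself, and a row belongs to bucket ((hash & Integer.MAX_VALUE) % y) + 1. The function names and the sample keys are hypothetical.

```python
INT_MAX = 0x7FFFFFFF  # same value as Java's Integer.MAX_VALUE


def bucket_of(key, num_buckets):
    """1-based bucket for an int key, ASSUMING hash(int) is the int itself."""
    return ((key & INT_MAX) % num_buckets) + 1


def tablesample(keys, x, y):
    """Keys that BUCKET x OUT OF y would keep under the assumption above."""
    return [k for k in keys if bucket_of(k, y) == x]


# Hypothetical key values, not the contents of bucketized_src.
keys = [66, 66, 7, 128]
one_bucket = tablesample(keys, 1, 1)     # every key lands in bucket 1
second_of_two = tablesample(keys, 2, 2)  # only odd keys land in bucket 2
print(one_bucket, second_of_two)
```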

Hive query for creating table using specific delimiters

create table impressions (imp string, msg string)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile;

Setting the default partition name

create table default_partition_name (key int, value string) partitioned by (ds string);

set hive.exec.default.partition.name='some_other_default_partition_name';

alter table default_partition_name add partition(ds='__HIVE_DEFAULT_PARTITION__');

show partitions default_partition_name;

String literal handling

SELECT 'face''book', 'face' 'book', 'face'
                                'book',
   "face""book", "face" "book", "face"
                                "book",
   'face' 'bo' 'ok', 'face'"book",
   "face"'book', 'facebook' FROM src LIMIT 1;

Result

facebook    facebook    facebook    facebook    facebook    facebook    facebook    facebook    facebook    facebook

Hive table lock examples

CREATE TABLE tstsrc (col1 STRING) STORED AS TEXTFILE;

SHOW LOCKS;
SHOW LOCKS tstsrc;
SHOW LOCKS tstsrc extended;

LOCK TABLE tstsrc shared;
UNLOCK TABLE tstsrc;

Hive partition lock examples

LOCK TABLE tstsrcpart PARTITION(ds='2008-04-08', hr='11') EXCLUSIVE;

SHOW LOCKS tstsrcpart PARTITION(ds='2008-04-08', hr='11') extended;

UNLOCK TABLE tstsrcpart PARTITION(ds='2008-04-08', hr='11');

Hive virtual column

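Hive 0.8.0 and later support two virtual columns: INPUT__FILE__NAME and BLOCK__OFFSET__INSIDE__FILE.

INPUT__FILE__NAME is the name of the input file for the mapper task.
BLOCK__OFFSET__INSIDE__FILE is the current global file position.

For a block-compressed file, it is the file offset of the current block, that is, the file offset of the first byte of the current block.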

select INPUT__FILE__NAME, key, BLOCK__OFFSET__INSIDE__FILE from src;

Writing results to a local directory

FROM src INSERT OVERWRITE DIRECTORY '../build/contrib/hive/ql/test/data/warehouse/dest4.out' SELECT src.value WHERE src.key >= 300;

dfs -cat ../build/contrib/hive/ql/test/data/warehouse/dest4.out/*;

Hive example for comparison of timestamp values

select cast('2011-05-06 07:08:09' as timestamp) > 
  cast('2011-05-06 07:08:09' as timestamp) from src limit 1;
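
Both operands in the query above are the same instant, so the strict comparison should come out false. A quick cross-check in Python, assuming the timestamp strings parse with the format shown; this only mirrors the comparison and does not run Hive.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# The same two literals used in the query.
left = datetime.strptime("2011-05-06 07:08:09", FMT)
right = datetime.strptime("2011-05-06 07:08:09", FMT)

greater = left > right  # strict comparison of identical timestamps
equal = left == right
print(greater, equal)
```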

Hive type casting

SELECT IF(false, 1, cast(2 as smallint)) + 3 FROM src LIMIT 1;

Show table properties in Hive

show tblproperties tmpfoo;
show tblproperties tmpfoo("bar");

Display functions in Hive CLI

SHOW FUNCTIONS;

SHOW FUNCTIONS '^c.*';

SHOW FUNCTIONS '.*e$';

SHOW FUNCTIONS 'log.*';

SHOW FUNCTIONS '.*date.*';

SHOW FUNCTIONS '***';
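
The patterns above read like regular expressions. The Python sketch below shows what each one would select, assuming the pattern must match the whole function name and that an invalid pattern such as '***' yields an empty list instead of an error. Both points are assumptions about Hive's behavior, not something this post confirms. The FUNCTIONS list is a small hand-picked subset used only for illustration.

```python
import re

# Hand-picked subset of function names, for illustration only.
FUNCTIONS = ["ceil", "concat", "count", "date_add", "datediff",
             "log", "log10", "log2", "size", "to_date"]


def show_functions(pattern, names=FUNCTIONS):
    """Return names fully matched by pattern; [] if the pattern is invalid."""
    try:
        regex = re.compile(pattern)
    except re.error:
        # ASSUMPTION: an invalid pattern returns nothing rather than failing.
        return []
    return sorted(n for n in names if regex.fullmatch(n))


for p in ["^c.*", ".*e$", "log.*", ".*date.*", "***"]:
    print(p, show_functions(p))
```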

Show columns in Hive

CREATE TABLE shcol_test(KEY STRING, VALUE STRING) PARTITIONED BY(ds STRING) STORED AS TEXTFILE;

SHOW COLUMNS from shcol_test;

Reset hive settings

set hive.skewjoin.key;
set hive.skewjoin.mapjoin.min.split;
set hive.skewjoin.key=300000;
set hive.skewjoin.mapjoin.min.split=256000000;
set hive.skewjoin.key;
set hive.skewjoin.mapjoin.min.split;

reset;

set hive.skewjoin.key;
set hive.skewjoin.mapjoin.min.split;

Print column header in Hive CLI

set hive.cli.print.header=true;

Progress heartbeat interval

set hive.heartbeat.interval=5;

Changing the DDL output format to JSON

set hive.ddl.output.format=json;

desc extended table_name;

set hive.ddl.output.format=text; -- default

