메뉴 건너뛰기

Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.


1. Cassandra 1.2.11 다운로드/설치(cumulusRDF 1.0.1와 호환되며 테스트 된 버젼임)

 http://archive.apache.org/dist/cassandra/


 *참고1 : https://www.gooper.com/ss/index.php?mid=bigdata&category=2803&document_srl=3110 (버젼이 다르지만 언급된 항목에 대한 설정은 같으므로 참고하여 설정해준다)

 *참고2 : CumulusRDF 1.1.0이 Cassandra 1.2.X만 지원하므로 1.2.X를 다운받아 설치해야한다.


2. CumulusRDF 1.0.1 Web Application다운로드및 설치(직접 maven으로 빌드하면.. *.jar, war파일이 만들어지나 잘안됨(?))

  https://github.com/cumulusrdf/cumulusrdf/wiki/Downloads

  에서 March 11th 2014: CumulusRDF v1.0.1 war파일을 다운로드 받아서 WAS에 deploy한다.

  (예, tomcat의 경우 webapps폴더 밑에 두면 파일명을 context명으로 자동설치된다)


* 참고1 : https://github.com/cumulusrdf/cumulusrdf/wiki
* 참고2 : http://xxx.xxx.xxx.43:8080/cumulusrdf-1.0.1/info에 접근하면 web페이지에서 query및 bulkupload를 할 수있다. 

3. CumulusRDF 1.0.1 CLI 툴 다운로드및 설치

  https://github.com/cumulusrdf/cumulusrdf/wiki/Downloads

  에서 March 11th 2014: CumulusRDF v1.0.1 CLI jar를 다운로드 받아서 적절한 위치에 복사한다.

 * 참고1 : https://github.com/cumulusrdf/cumulusrdf/wiki/CLI)

 * 참고2 : dump, load, query, remove를 실행할 수 있는 jar파일임


 

4. 첨부된 1.0.0버젼의 CLI jar파일은 load할때 아래와 같이 사용한다.(이것은 thread개수를 지정할 수 있는데.. CumulusRDF 1.0.1 CLI등은 사용법이르며 일부기능이 지원되지 않음)

java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Load -i ./icbms_2016-04-15_15-10-16.nq -b 10000 -t 8

배치 10000, 쓰레드 8개로 nq파일을 업로드함

: 첨부파일을 이용할것


----------------------------첨부된 jar파일 사용시 가능한 옵션(Load, Dump, Query, Remove별로 다름)----------------------

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Load -help

***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help

usage: parameters:

 -b <arg>   batch size - number of triples (default: 100)

 -f <arg>   format ('nt', 'nq' or 'xml') (default: 'nt')

 -h         print help

 -i <arg>   name of file to read, - for stdin (but then need to specify -x

            option)

 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)

 -n <arg>   Cassandra hosts as comma-separated list

            ('host1:port1,host2:port2,...') (default localhost:9160)

 -r <arg>   replication factor  (default: 1)

 -s <arg>   storage layout to use (triple|quad) (needs to match webapp

            configuration)

 -t <arg>   number of loading threads (defaults to min(1,|hosts|/1.5))

time elapsed 6 ms

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Dump -help

***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help

usage: parameters:

 -h         print help

 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)

 -n <arg>   Cassandra hosts as comma-separated list

            ('host1:port1,host2:port2,...') (default localhost:9160)

 -o <arg>   name of output file

time elapsed 6 ms

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Query -help

***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help

usage: parameters:

 -h         print help

 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)

 -n <arg>   Cassandra hosts as comma-separated list

            ('host1:port1,host2:port2,...') (default localhost:9160)

 -q <arg>   sparql query string

 -s <arg>   storage layout to use (triple|quad) (needs to match webapp

            configuration)

time elapsed 10 ms

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Remove -help

***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help

usage: parameters:

 -h         print help

 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)

 -n <arg>   Cassandra hosts as comma-separated list

            ('host1:port1,host2:port2,...') (default localhost:9160)

 -q <arg>   sparql construct query string. all its bindings will be

            removed.

 -s <arg>   storage layout to use (triple|quad) (needs to match webapp

            configuration)

time elapsed 11 ms

--------------------------------cql.sh--------------

가. keyspace확인하기 
 cqlsh> select * from system.schema_keyspaces;
===>
 keyspace_name   | durable_writes | strategy_class                              | strategy_options
-----------------+----------------+---------------------------------------------+----------------------------
 KeyspaceCumulus |           True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
          system |           True |  org.apache.cassandra.locator.LocalStrategy |                         {}
   system_traces |           True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"2"}
---------------

* keyspace목록 보기 : cqlsh>describe keyspaces;
* columnfamilies목록 보기 : describe columnfamilies;
* keyspace정보 보기 : describe keyspace "KeyspaceCumulus";

나. 사용할 keyspace지정하기
use "KeyspaceCumulus";


다. 테이블 목록 조회

describe tables;


*테이블 내용조회

select * from "DICT_P" limit 10;


라. KeyspaceCumulus가 사용하는 테이블 목록
TRUNCATE "DICT_P";
TRUNCATE "OSPC";
TRUNCATE "PREFIX_TO_NS";
TRUNCATE "SPOC";        
TRUNCATE "DICT_P_REVERSE";
TRUNCATE "POSC";          
TRUNCATE "SCHEMA_CLASSES";
TRUNCATE "SPO_RN_DT";     
TRUNCATE "DICT_SO";  
TRUNCATE "POS_RN_DT";
TRUNCATE "SCHEMA_D_PROPS";
TRUNCATE "SPO_RN_NUM";    
TRUNCATE "DICT_SO_REVERSE";
TRUNCATE "POS_RN_NUM";     
TRUNCATE "SCHEMA_O_PROPS";
TRUNCATE "counter";       

-----------------------------------KeyspaceCumulus정보---------------------
cqlsh> describe keyspace "KeyspaceCumulus";
CREATE KEYSPACE "KeyspaceCumulus" WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': '1'
};
USE "KeyspaceCumulus";
CREATE TABLE "DICT_P" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "DICT_P_REVERSE" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "DICT_SO" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "DICT_SO_REVERSE" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "OSPC" (
  key blob,
  column1 blob,
  column2 blob,
  column3 blob,
  value blob,
  PRIMARY KEY (key, column1, column2, column3)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "POSC" (
  key blob,
  column1 blob,
  column2 blob,
  column3 blob,
  "03" blob,
  PRIMARY KEY (key, column1, column2, column3)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX index_924575 ON "POSC" ("03");
CREATE TABLE "POS_RN_DT" (
  key blob,
  column1 bigint,
  column2 blob,
  value blob,
  PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "POS_RN_NUM" (
  key blob,
  column1 double,
  column2 blob,
  value blob,
  PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "PREFIX_TO_NS" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SCHEMA_CLASSES" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SCHEMA_D_PROPS" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SCHEMA_O_PROPS" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SPOC" (
  key blob,
  column1 blob,
  column2 blob,
  column3 blob,
  value blob,
  PRIMARY KEY (key, column1, column2, column3)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SPO_RN_DT" (
  key blob,
  column1 bigint,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SPO_RN_NUM" (
  key blob,
  column1 double,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=0 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='false' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE counter (
  key text,
  column1 text,
  value counter,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
번호 제목 날짜 조회 수
210 [CDP7.1.7][Replication]Table does not match version in getMetastore(). Table view original text mismatch 2024.01.02 4523
209 sequence한 번호 생성방법 2014.04.25 4526
208 [Atlas Server]org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=atlas/node01.gooper.com@GOOPER.COM, scope=default:atlas_janus, params=[table=default:atlas_janus,], action-CREATE)] 2023.05.15 4526
207 hadoop 2.6.0에 sqoop2 (1.99.5) server및 client설치 == fail 2015.06.11 4527
206 ../depcomp: line 512 exec : g++ : not found 2013.03.08 4528
205 Oracle NLOB type의 데이터를 import하는 경우 No Java type for SQL type 2011 for column rst와 같은 오류 발생시 조치사항 2022.01.14 4528
204 [CDP7.1.7]Hive Replication수행중 Specified catalog.database.table does not exist : hive.db명.table명 오류 발생시 조치방법 2024.04.05 4528
203 Spark 1.6.1 설치후 HA구성 2016.05.24 4533
202 [CDP7.1.7]Hive Replication수행시 Target Cluster에서 Specified catalog.database.table does not exist 오류 2024.05.08 4538
201 [CDP7.1.6,HDFS]HDFS파일을 삭제하고 Trash비움이 완료된후에도 HDFS 공간을 차지하고 있는 경우 확인/조치 방법 2023.07.17 4552
200 avro 사용하기(avsc 스키마 파일 컴파일 방법, consumer, producer샘플소스) 2016.07.08 4557
199 [sbt] sbt-assembly를 이용하여 실행에 필요한 모든 j라이브러리를 포함한 fat jar파일 만들기 2016.07.11 4557
198 oozie가 말하는 start시간은..서버에서 확인되는 시간이 아닙니다. 2014.05.14 4561
197 [DBeaver 4.3.0]import/export시 "Client home is not specified for connection" 오류발생시 조치사항 2017.12.21 4561
196 [Cloudera 6.3.4, Kudu]]Service Monitor에서 사용하는 metric중에 일부를 blacklist로 설정하여 모니터링 정보 수집 제외하는 방법 2022.07.08 4563
195 kudu table와 impala(hive) table정보가 틀어져서 테이블을 읽지 못하는 경우(Error Loading Metadata) 조치방법 2023.11.10 4567
194 missing block및 관련 파일명 찾는 명령어 2021.02.20 4568
193 [2.7.2] distribute-exclude.sh사용할때 ssh 포트변경에 따른 오류발생시 조치사항 2018.01.02 4573
192 column family삭제시 Column family 'delete' does not exist오류 발생하는 경우 2014.04.14 4574
191 mongodb aggregation query를 Java code로 변환한 샘플 2016.12.15 4574
위로