
Bigdata, Semantic IoT, Hadoop, NoSQL

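This is where I organize what I have learned while working on projects involving Bigdata, the Hadoop ecosystem, Semantic IoT, and related topics.
It is public for anyone who needs it. Please send inquiries by email to gooper@gooper.com.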


1. Download and install Cassandra 1.2.11 (this version is compatible with CumulusRDF 1.0.1 and has been tested)

 http://archive.apache.org/dist/cassandra/


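 * Note 1: https://www.gooper.com/ss/index.php?mid=bigdata&category=2803&document_srl=3110 (the version differs, but the settings for the items mentioned there are the same, so use it as a reference when configuring)

 * Note 2: CumulusRDF 1.1.0 supports only Cassandra 1.2.X, so download and install a 1.2.X release.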


2. Download and install the CumulusRDF 1.0.1 web application (building it directly with Maven does produce the *.jar and war files, but they did not seem to work properly)

  https://github.com/cumulusrdf/cumulusrdf/wiki/Downloads

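  From that page, download the war file listed under "March 11th 2014: CumulusRDF v1.0.1" and deploy it to your WAS.

  (For example, with Tomcat, placing the file under the webapps folder installs it automatically, with the file name used as the context name.)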


* Note 1: https://github.com/cumulusrdf/cumulusrdf/wiki
* Note 2: Opening http://xxx.xxx.xxx.43:8080/cumulusrdf-1.0.1/info shows a web page where you can run queries and bulk uploads.
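
The relationship between the war file name and the URL above can be sketched in Python. This is only an illustration: it assumes the downloaded file is named cumulusrdf-1.0.1.war (inferred from the URL, not confirmed), uses a placeholder host, and ignores Tomcat's special handling of `#` and `##` in file names.

```python
def context_path(war_name: str) -> str:
    """Return the context path Tomcat derives from a WAR file name.

    Simplified: strips the .war suffix and maps ROOT to the empty path.
    """
    stem = war_name[:-4] if war_name.endswith(".war") else war_name
    if stem == "ROOT":
        return ""
    return "/" + stem


def info_url(host: str, war_name: str, port: int = 8080) -> str:
    """Build the URL of the CumulusRDF info page for a deployed WAR."""
    return f"http://{host}:{port}{context_path(war_name)}/info"


# "localhost" and the WAR file name are placeholders for illustration.
print(info_url("localhost", "cumulusrdf-1.0.1.war"))
```

If the war file is renamed before it is copied into webapps, the context part of the URL changes accordingly.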

3. Download and install the CumulusRDF 1.0.1 CLI tool

  https://github.com/cumulusrdf/cumulusrdf/wiki/Downloads

  From that page, download the CLI jar listed under "March 11th 2014: CumulusRDF v1.0.1" and copy it to a suitable location.

 * Note 1: https://github.com/cumulusrdf/cumulusrdf/wiki/CLI

 * Note 2: This jar file can run dump, load, query, and remove.


 

4. Use the attached CLI jar (version 1.0.0) for loading as shown below. (It lets you specify the number of threads; the CumulusRDF 1.0.1 CLI and other versions are used differently and do not support some features.)

java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Load -i ./icbms_2016-04-15_15-10-16.nq -b 10000 -t 8

This uploads the nq file with a batch size of 10000 and 8 threads.

Note: use the attached jar file.
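
If the load has to be scripted, the command above can be assembled from Python. The following is a minimal sketch, not part of CumulusRDF: the function name is hypothetical, and only the options used in the command above (-i, -b, -t) are included.

```python
import shlex

# Main class of the CLI jar, taken from the command shown above.
MAIN_CLASS = "edu.kit.aifb.cumulus.cli.Main"


def build_load_command(jar, input_file, batch_size=10000, threads=8):
    """Build the argument list for the Load command (hypothetical helper)."""
    if batch_size < 1 or threads < 1:
        raise ValueError("batch_size and threads must be positive")
    return [
        "java", "-cp", jar, MAIN_CLASS, "Load",
        "-i", input_file,
        "-b", str(batch_size),
        "-t", str(threads),
    ]


cmd = build_load_command(
    "./cumulusrdf-1.0.0-jar-with-dependencies.jar",
    "./icbms_2016-04-15_15-10-16.nq",
)
# Print the command for inspection; to run it, pass cmd to
# subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.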


----------------------------Options available with the attached jar file (they differ for Load, Dump, Query, and Remove)----------------------

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Load -help

***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help

usage: parameters:

 -b <arg>   batch size - number of triples (default: 100)

 -f <arg>   format ('nt', 'nq' or 'xml') (default: 'nt')

 -h         print help

 -i <arg>   name of file to read, - for stdin (but then need to specify -x

            option)

 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)

 -n <arg>   Cassandra hosts as comma-separated list

            ('host1:port1,host2:port2,...') (default localhost:9160)

 -r <arg>   replication factor  (default: 1)

 -s <arg>   storage layout to use (triple|quad) (needs to match webapp

            configuration)

 -t <arg>   number of loading threads (defaults to min(1,|hosts|/1.5))

time elapsed 6 ms

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Dump -help

***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help

usage: parameters:

 -h         print help

 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)

 -n <arg>   Cassandra hosts as comma-separated list

            ('host1:port1,host2:port2,...') (default localhost:9160)

 -o <arg>   name of output file

time elapsed 6 ms

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Query -help

***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help

usage: parameters:

 -h         print help

 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)

 -n <arg>   Cassandra hosts as comma-separated list

            ('host1:port1,host2:port2,...') (default localhost:9160)

 -q <arg>   sparql query string

 -s <arg>   storage layout to use (triple|quad) (needs to match webapp

            configuration)

time elapsed 10 ms

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Remove -help

***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help

usage: parameters:

 -h         print help

 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)

 -n <arg>   Cassandra hosts as comma-separated list

            ('host1:port1,host2:port2,...') (default localhost:9160)

 -q <arg>   sparql construct query string. all its bindings will be

            removed.

 -s <arg>   storage layout to use (triple|quad) (needs to match webapp

            configuration)

time elapsed 11 ms
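
The options shared by the commands above (-n for the host list, -k for the keyspace, -s for the storage layout) can be combined as in the Python sketch below. The helper names and the host names node1 and node2 are hypothetical; only the flags themselves come from the usage output above.

```python
# Main class of the CLI jar, taken from the commands shown above.
MAIN_CLASS = "edu.kit.aifb.cumulus.cli.Main"


def format_hosts(hosts):
    """Format (host, port) pairs as the 'host1:port1,host2:port2' list used by -n."""
    return ",".join(f"{host}:{port}" for host, port in hosts)


def build_query_command(jar, sparql, hosts=None, keyspace=None, layout=None):
    """Build the argument list for the Query command (hypothetical helper)."""
    cmd = ["java", "-cp", jar, MAIN_CLASS, "Query", "-q", sparql]
    if hosts:
        cmd += ["-n", format_hosts(hosts)]
    if keyspace:
        cmd += ["-k", keyspace]
    if layout:
        # The usage text lists triple and quad as the only layouts.
        if layout not in ("triple", "quad"):
            raise ValueError("layout must be 'triple' or 'quad'")
        cmd += ["-s", layout]
    return cmd


# node1 and node2 are placeholder host names.
print(build_query_command(
    "./cumulusrdf-1.0.0-jar-with-dependencies.jar",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
    hosts=[("node1", 9160), ("node2", 9160)],
    keyspace="KeyspaceCumulus",
    layout="quad",
))
```

Options left as None are omitted, so the CLI falls back to the defaults listed in its usage text.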

--------------------------------cqlsh--------------

A. Check the keyspaces
 cqlsh> select * from system.schema_keyspaces;
===>
 keyspace_name   | durable_writes | strategy_class                              | strategy_options
-----------------+----------------+---------------------------------------------+----------------------------
 KeyspaceCumulus |           True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
          system |           True |  org.apache.cassandra.locator.LocalStrategy |                         {}
   system_traces |           True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"2"}
---------------

* List keyspaces: cqlsh> describe keyspaces;
* List column families: describe columnfamilies;
* Show keyspace details: describe keyspace "KeyspaceCumulus";

B. Select the keyspace to use
use "KeyspaceCumulus";


C. List the tables

describe tables;


* View table contents

select * from "DICT_P" limit 10;


D. Tables used by KeyspaceCumulus (note: each TRUNCATE statement below deletes all rows in its table)
TRUNCATE "DICT_P";
TRUNCATE "OSPC";
TRUNCATE "PREFIX_TO_NS";
TRUNCATE "SPOC";        
TRUNCATE "DICT_P_REVERSE";
TRUNCATE "POSC";          
TRUNCATE "SCHEMA_CLASSES";
TRUNCATE "SPO_RN_DT";     
TRUNCATE "DICT_SO";  
TRUNCATE "POS_RN_DT";
TRUNCATE "SCHEMA_D_PROPS";
TRUNCATE "SPO_RN_NUM";    
TRUNCATE "DICT_SO_REVERSE";
TRUNCATE "POS_RN_NUM";     
TRUNCATE "SCHEMA_O_PROPS";
TRUNCATE "counter";       
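
The TRUNCATE list above can also be generated rather than typed by hand. The Python sketch below is an illustration only: it hard-codes the table names from the list above and wraps each one in double quotes, which cqlsh needs to preserve upper-case names.

```python
# Table names copied from the TRUNCATE list above.
TABLES = [
    "DICT_P", "OSPC", "PREFIX_TO_NS", "SPOC",
    "DICT_P_REVERSE", "POSC", "SCHEMA_CLASSES", "SPO_RN_DT",
    "DICT_SO", "POS_RN_DT", "SCHEMA_D_PROPS", "SPO_RN_NUM",
    "DICT_SO_REVERSE", "POS_RN_NUM", "SCHEMA_O_PROPS", "counter",
]


def quote_ident(name):
    """Wrap a CQL identifier in double quotes so its case is preserved."""
    if '"' in name:
        raise ValueError("identifiers containing quotes are not handled here")
    return f'"{name}"'


def truncate_statements(tables):
    """Return one TRUNCATE statement per table name."""
    return [f"TRUNCATE {quote_ident(t)};" for t in tables]


# Print the statements; they can be pasted into cqlsh after
# running use "KeyspaceCumulus";
print("\n".join(truncate_statements(TABLES)))
```

Because TRUNCATE removes all rows, review the generated statements before running them.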

-----------------------------------KeyspaceCumulus details---------------------
cqlsh> describe keyspace "KeyspaceCumulus";
CREATE KEYSPACE "KeyspaceCumulus" WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': '1'
};
USE "KeyspaceCumulus";
CREATE TABLE "DICT_P" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "DICT_P_REVERSE" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "DICT_SO" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "DICT_SO_REVERSE" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "OSPC" (
  key blob,
  column1 blob,
  column2 blob,
  column3 blob,
  value blob,
  PRIMARY KEY (key, column1, column2, column3)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "POSC" (
  key blob,
  column1 blob,
  column2 blob,
  column3 blob,
  "03" blob,
  PRIMARY KEY (key, column1, column2, column3)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX index_924575 ON "POSC" ("03");
CREATE TABLE "POS_RN_DT" (
  key blob,
  column1 bigint,
  column2 blob,
  value blob,
  PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "POS_RN_NUM" (
  key blob,
  column1 double,
  column2 blob,
  value blob,
  PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "PREFIX_TO_NS" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SCHEMA_CLASSES" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SCHEMA_D_PROPS" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SCHEMA_O_PROPS" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SPOC" (
  key blob,
  column1 blob,
  column2 blob,
  column3 blob,
  value blob,
  PRIMARY KEY (key, column1, column2, column3)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SPO_RN_DT" (
  key blob,
  column1 bigint,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SPO_RN_NUM" (
  key blob,
  column1 double,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=0 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='false' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE counter (
  key text,
  column1 text,
  value counter,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};