
Bigdata, Semantic IoT, Hadoop, NoSQL

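This is where I organize what I have learned while working on projects involving Bigdata, the Hadoop ecosystem, Semantic IoT, and similar topics.
It is published for anyone who may find it useful. Please send inquiries to gooper@gooper.com.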


1. Download and install Cassandra 1.2.11 (the version tested and confirmed compatible with CumulusRDF 1.0.1)

 http://archive.apache.org/dist/cassandra/


 * Note 1: https://www.gooper.com/ss/index.php?mid=bigdata&category=2803&document_srl=3110 (the version differs, but the settings it covers are the same, so use it as a reference when configuring)

 * Note 2: CumulusRDF 1.1.0 supports only Cassandra 1.2.X, so download and install a 1.2.X release.
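
The download step above can be sketched as a small shell helper. This is only a sketch: it assumes the Apache archive keeps the usual `<version>/apache-cassandra-<version>-bin.tar.gz` layout under the URL given above, and the install directory `/opt` is a hypothetical choice.

```shell
#!/bin/sh
# Sketch: fetch and unpack a Cassandra 1.2.x binary tarball from the Apache archive.
# ASSUMPTION: the archive uses <version>/apache-cassandra-<version>-bin.tar.gz.
ARCHIVE_BASE="http://archive.apache.org/dist/cassandra"

cassandra_tarball_url() {
  # $1 = version, e.g. 1.2.11
  printf '%s/%s/apache-cassandra-%s-bin.tar.gz\n' "$ARCHIVE_BASE" "$1" "$1"
}

fetch_cassandra() {
  # $1 = version, $2 = directory to unpack into (hypothetical, e.g. /opt)
  url=$(cassandra_tarball_url "$1")
  curl -fLO "$url" && tar -xzf "apache-cassandra-$1-bin.tar.gz" -C "$2"
}

# Uncomment to download and unpack:
# fetch_cassandra 1.2.11 /opt
```

Keeping the URL construction in its own function makes it easy to check the address in a browser before downloading anything.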


2. Download and install the CumulusRDF 1.0.1 web application (building it yourself with Maven does produce the *.jar and *.war files, but they did not seem to work properly)

  https://github.com/cumulusrdf/cumulusrdf/wiki/Downloads

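  From that page, download the "March 11th 2014: CumulusRDF v1.0.1" war file and deploy it to your WAS.

  (For example, with Tomcat, placing the file under the webapps folder installs it automatically, using the file name as the context name.)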


* Note 1: https://github.com/cumulusrdf/cumulusrdf/wiki
* Note 2: Opening http://xxx.xxx.xxx.43:8080/cumulusrdf-1.0.1/info gives a web page from which you can run queries and bulk uploads.
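
As a rough illustration of the Tomcat deployment described above, the following sketch copies the war into `webapps` and prints the context path Tomcat derives from the file name. The Tomcat home `/opt/tomcat` is a hypothetical path; adjust it to your installation.

```shell
#!/bin/sh
# Sketch: copy a WAR into Tomcat's webapps folder and show the resulting context path.

war_context_path() {
  # Tomcat names the context after the WAR file, minus the .war suffix
  printf '/%s\n' "$(basename "$1" .war)"
}

deploy_war() {
  # $1 = path to the WAR, $2 = Tomcat home (hypothetical, e.g. /opt/tomcat)
  cp "$1" "$2/webapps/" && war_context_path "$1"
}

# deploy_war ./cumulusrdf-1.0.1.war /opt/tomcat   # prints /cumulusrdf-1.0.1
```

The printed path is the part of the URL that follows the host and port, which is why the info page in Note 2 lives under `/cumulusrdf-1.0.1/info`.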

3. Download and install the CumulusRDF 1.0.1 CLI tool

  https://github.com/cumulusrdf/cumulusrdf/wiki/Downloads

  From that page, download the "March 11th 2014: CumulusRDF v1.0.1" CLI jar and copy it to a suitable location.

 * Note 1: https://github.com/cumulusrdf/cumulusrdf/wiki/CLI

 * Note 2: This jar file can run dump, load, query, and remove.


 

4. Use the attached version 1.0.0 CLI jar as follows when loading. (This version lets you specify the number of threads; the CumulusRDF 1.0.1 CLI and others are used differently and do not support some features.)

java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Load -i ./icbms_2016-04-15_15-10-16.nq -b 10000 -t 8

This uploads the nq file with a batch size of 10000 and 8 threads.

: Use the attached file.
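
When several nq files need loading, the command above can be wrapped in a loop. The following is a minimal sketch under stated assumptions: the directory name `./data` and the `JAVA`, `BATCH_SIZE`, and `THREADS` variables are introduced here for illustration, and only the `-i`, `-b`, and `-t` flags from the original command are used.

```shell
#!/bin/sh
# Sketch: bulk-load every .nq file in a directory with the attached 1.0.0 CLI jar.
CLI_JAR="${CLI_JAR:-./cumulusrdf-1.0.0-jar-with-dependencies.jar}"
JAVA="${JAVA:-java}"                # set JAVA=echo to print the arguments instead of running
BATCH_SIZE="${BATCH_SIZE:-10000}"   # -b: triples per batch
THREADS="${THREADS:-8}"             # -t: number of loading threads

load_file() {
  # $1 = path to one nq file
  "$JAVA" -cp "$CLI_JAR" edu.kit.aifb.cumulus.cli.Main Load -i "$1" -b "$BATCH_SIZE" -t "$THREADS"
}

load_dir() {
  # $1 = directory holding *.nq files; stops at the first failed load
  for f in "$1"/*.nq; do
    [ -e "$f" ] || continue   # the directory had no .nq files
    load_file "$f" || return 1
  done
}

# load_dir ./data   # hypothetical directory name
```

Running with `JAVA=echo` first is a cheap way to confirm which files and flags would be used before touching the cluster.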


----------------------------Options available with the attached jar (they differ for Load, Dump, Query, and Remove)----------------------

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Load -help
***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help
usage: parameters:
 -b <arg>   batch size - number of triples (default: 100)
 -f <arg>   format ('nt', 'nq' or 'xml') (default: 'nt')
 -h         print help
 -i <arg>   name of file to read, - for stdin (but then need to specify -x
            option)
 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)
 -n <arg>   Cassandra hosts as comma-separated list
            ('host1:port1,host2:port2,...') (default localhost:9160)
 -r <arg>   replication factor  (default: 1)
 -s <arg>   storage layout to use (triple|quad) (needs to match webapp
            configuration)
 -t <arg>   number of loading threads (defaults to min(1,|hosts|/1.5))
time elapsed 6 ms

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Dump -help
***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help
usage: parameters:
 -h         print help
 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)
 -n <arg>   Cassandra hosts as comma-separated list
            ('host1:port1,host2:port2,...') (default localhost:9160)
 -o <arg>   name of output file
time elapsed 6 ms

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Query -help
***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help
usage: parameters:
 -h         print help
 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)
 -n <arg>   Cassandra hosts as comma-separated list
            ('host1:port1,host2:port2,...') (default localhost:9160)
 -q <arg>   sparql query string
 -s <arg>   storage layout to use (triple|quad) (needs to match webapp
            configuration)
time elapsed 10 ms

-bash-4.1# java -cp ./cumulusrdf-1.0.0-jar-with-dependencies.jar edu.kit.aifb.cumulus.cli.Main Remove -help
***ERROR: class org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -help
usage: parameters:
 -h         print help
 -k <arg>   Cassandra keyspace (default KeyspaceCumulus)
 -n <arg>   Cassandra hosts as comma-separated list
            ('host1:port1,host2:port2,...') (default localhost:9160)
 -q <arg>   sparql construct query string. all its bindings will be
            removed.
 -s <arg>   storage layout to use (triple|quad) (needs to match webapp
            configuration)
time elapsed 11 ms
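
The usage text above lists the flags each subcommand accepts; note that it lists `-h`, not `-help`, for printing help. A thin wrapper such as the following sketch saves retyping the classpath and main class. The `cumulus` function name, the `JAVA` dry-run variable, the example SPARQL query, and the output file name `dump.out` are all illustrative assumptions; only flags shown in the usage text are used.

```shell
#!/bin/sh
# Sketch: thin wrapper around the attached CLI jar for Load, Dump, Query and Remove.
CLI_JAR="${CLI_JAR:-./cumulusrdf-1.0.0-jar-with-dependencies.jar}"
MAIN_CLASS="edu.kit.aifb.cumulus.cli.Main"
JAVA="${JAVA:-java}"   # set JAVA=echo for a dry run that only prints the arguments

cumulus() {
  # $1 = subcommand (Load, Dump, Query or Remove); remaining arguments pass through
  sub="$1"
  shift
  "$JAVA" -cp "$CLI_JAR" "$MAIN_CLASS" "$sub" "$@"
}

# Example invocations (hypothetical query and file name), left commented out:
# cumulus Query -q 'SELECT * WHERE { ?s ?p ?o } LIMIT 10' -s quad
# cumulus Dump -o ./dump.out -n localhost:9160 -k KeyspaceCumulus
# cumulus Load -h
```

As the usage text says, the `-s` storage layout has to match the web application's configuration, so pass the same value to every subcommand that accepts it.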

--------------------------------cqlsh--------------

A. Check the keyspaces
 cqlsh> select * from system.schema_keyspaces;
===>
 keyspace_name   | durable_writes | strategy_class                              | strategy_options
-----------------+----------------+---------------------------------------------+----------------------------
 KeyspaceCumulus |           True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
          system |           True |  org.apache.cassandra.locator.LocalStrategy |                         {}
   system_traces |           True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"2"}
---------------

* List keyspaces: cqlsh> describe keyspaces;
* List column families: describe columnfamilies;
* Show keyspace details: describe keyspace "KeyspaceCumulus";

B. Select the keyspace to use
use "KeyspaceCumulus";


C. List the tables

describe tables;


* Query table contents

select * from "DICT_P" limit 10;


D. Tables used by KeyspaceCumulus (listed as TRUNCATE statements; running them deletes all rows in each table)
TRUNCATE "DICT_P";
TRUNCATE "OSPC";
TRUNCATE "PREFIX_TO_NS";
TRUNCATE "SPOC";        
TRUNCATE "DICT_P_REVERSE";
TRUNCATE "POSC";          
TRUNCATE "SCHEMA_CLASSES";
TRUNCATE "SPO_RN_DT";     
TRUNCATE "DICT_SO";  
TRUNCATE "POS_RN_DT";
TRUNCATE "SCHEMA_D_PROPS";
TRUNCATE "SPO_RN_NUM";    
TRUNCATE "DICT_SO_REVERSE";
TRUNCATE "POS_RN_NUM";     
TRUNCATE "SCHEMA_O_PROPS";
TRUNCATE "counter";       
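
If the tables need to be emptied repeatedly (for example, before reloading test data), the statements above can be generated into a script file. This is a sketch only: the file name `truncate_all.cql` is hypothetical, and running it with `cqlsh -f` assumes your cqlsh build supports the `-f` option.

```shell
#!/bin/sh
# Sketch: generate a CQL script that truncates every table listed above.
TABLES='DICT_P DICT_P_REVERSE DICT_SO DICT_SO_REVERSE OSPC POSC POS_RN_DT POS_RN_NUM PREFIX_TO_NS SCHEMA_CLASSES SCHEMA_D_PROPS SCHEMA_O_PROPS SPOC SPO_RN_DT SPO_RN_NUM counter'

truncate_script() {
  # $1 = keyspace name; identifiers are double-quoted to preserve their case
  printf 'USE "%s";\n' "$1"
  for t in $TABLES; do
    printf 'TRUNCATE "%s";\n' "$t"
  done
}

# Write the script, review it, then run it (this deletes all data in the tables):
# truncate_script KeyspaceCumulus > truncate_all.cql
# cqlsh -f truncate_all.cql   # ASSUMPTION: cqlsh accepts -f <file>
```

Generating the file first, rather than piping straight into cqlsh, leaves a chance to inspect exactly what will be deleted.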

-----------------------------------KeyspaceCumulus details---------------------
cqlsh> describe keyspace "KeyspaceCumulus";
CREATE KEYSPACE "KeyspaceCumulus" WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': '1'
};
USE "KeyspaceCumulus";
CREATE TABLE "DICT_P" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "DICT_P_REVERSE" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "DICT_SO" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "DICT_SO_REVERSE" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "OSPC" (
  key blob,
  column1 blob,
  column2 blob,
  column3 blob,
  value blob,
  PRIMARY KEY (key, column1, column2, column3)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "POSC" (
  key blob,
  column1 blob,
  column2 blob,
  column3 blob,
  "03" blob,
  PRIMARY KEY (key, column1, column2, column3)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX index_924575 ON "POSC" ("03");
CREATE TABLE "POS_RN_DT" (
  key blob,
  column1 bigint,
  column2 blob,
  value blob,
  PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "POS_RN_NUM" (
  key blob,
  column1 double,
  column2 blob,
  value blob,
  PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "PREFIX_TO_NS" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SCHEMA_CLASSES" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SCHEMA_D_PROPS" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SCHEMA_O_PROPS" (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SPOC" (
  key blob,
  column1 blob,
  column2 blob,
  column3 blob,
  value blob,
  PRIMARY KEY (key, column1, column2, column3)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SPO_RN_DT" (
  key blob,
  column1 bigint,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE "SPO_RN_NUM" (
  key blob,
  column1 double,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=0 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='false' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};
CREATE TABLE counter (
  key text,
  column1 text,
  value counter,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};