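This is where I organize what I have learned while working on projects involving Bigdata, the Hadoop ecosystem, Semantic IoT, and related topics.
It is published for anyone who may find it useful. Please send inquiries by email to gooper@gooper.com.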

Spark+S2RDF: procedure for working in a dedicated work folder with five triples of data

Posted by the site administrator, 2016.06.16 20:07

1. Create and enter the work folder (copy the jar files needed for execution into /home/hadoop/S2RDF_work, then create a working folder (e.g., test3) in which the triple data is created and processed)

a. mkdir /home/hadoop/S2RDF_work

b. cd /home/hadoop/S2RDF_work

c. mkdir test3

d. cd test3

2. Create the triple data file (test3.nq)

vi test3.nq

===>

<http://www.w3.org/2002/07/owl#Thing> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Resource> .

<http://www.w3.org/2002/07/owl#Thing> <http://www.w3.org/1999/02/22-rdf-syntax-ns#have> <http://www.w3.org/2000/01/rdf-schema#Resource2> .

<http://www.w3.org/2002/07/owl#Thing2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Resource> .

<http://www.w3.org/2002/07/owl#Thing2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#have> <http://www.w3.org/2000/01/rdf-schema#Resource2> .

<http://www.w3.org/2002/07/owl#Thing2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#have> <http://www.w3.org/2000/01/rdf-schema#Resource3> .
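As an alternative to typing the triples into vi, the file can be written with a here-document and checked afterwards. This is only a sketch: WORK_DIR is a name introduced here for illustration, and it defaults to a scratch directory so the snippet can be tried safely. Set WORK_DIR=/home/hadoop/S2RDF_work to match the layout used in this post.

```shell
# WORK_DIR is an assumption for this sketch, not something S2RDF requires.
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"
mkdir -p "$WORK_DIR/test3"

# Write the five triples; the quoted 'EOF' stops the shell from expanding anything.
cat > "$WORK_DIR/test3/test3.nq" <<'EOF'
<http://www.w3.org/2002/07/owl#Thing> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Resource> .
<http://www.w3.org/2002/07/owl#Thing> <http://www.w3.org/1999/02/22-rdf-syntax-ns#have> <http://www.w3.org/2000/01/rdf-schema#Resource2> .
<http://www.w3.org/2002/07/owl#Thing2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Resource> .
<http://www.w3.org/2002/07/owl#Thing2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#have> <http://www.w3.org/2000/01/rdf-schema#Resource2> .
<http://www.w3.org/2002/07/owl#Thing2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#have> <http://www.w3.org/2000/01/rdf-schema#Resource3> .
EOF

# Count the non-empty lines; each one is a single triple.
TRIPLE_COUNT=$(grep -c . "$WORK_DIR/test3/test3.nq")
echo "wrote $TRIPLE_COUNT triples to $WORK_DIR/test3/test3.nq"
```

Checking the count before uploading makes it easier to compare against the totals that appear later in the statistics files.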

3. Upload to HDFS

a. hadoop fs -mkdir test3

b. hadoop fs -put test3.nq test3

4. Run DataSetCreator (database name: test3; run from /home/hadoop/S2RDF_work; test3.nq is under the test3 folder in HDFS)

a. Generate Vertical Partitioning

$HOME/spark/bin/spark-submit --driver-memory 1g --class runDriver --master yarn --executor-memory 1g --deploy-mode cluster ./datasetcreator_2.10-1.1.jar test3/ test3.nq VP 0.2

==> /tmp/stat_vp.txt is created on the server where the job ran

==> Contents of stat_vp.txt (cat stat_vp.txt; fields are tab-separated)

VP Statistic

---------------------------------------------------------

<<http://www.w3.org/1999/02/22-rdf-syntax-ns#have>> 3 5 0.60

<<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>> 2 5 0.40

---------------------------------------------------------

Saved tabels ->2

Unsaved non-empty tables ->0

Empty tables ->0
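The numeric columns in stat_vp.txt can be sanity-checked with awk. The sketch below assumes, based only on the two rows shown above, that the columns are the number of triples in the table, the total number of triples, and their ratio. The sample rows are rebuilt here with tab separators and with the predicate in single angle brackets, which is also an assumption about the raw file.

```shell
# Sample rows based on the stat_vp.txt output above (tab-separated).
# printf repeats its format for each group of four arguments.
STAT_SAMPLE=$(printf '%s\t%s\t%s\t%s\n' \
  '<http://www.w3.org/1999/02/22-rdf-syntax-ns#have>' 3 5 0.60 \
  '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>' 2 5 0.40)

# Recompute column 2 / column 3 for every four-field row.
RATIOS=$(printf '%s\n' "$STAT_SAMPLE" |
  awk -F '\t' 'NF == 4 { printf "%s %.2f\n", $1, $2 / $3 }')
echo "$RATIOS"
```

If the recomputed ratios match the last column of the real file, the column interpretation assumed here is at least consistent with the data.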

b. Generate Extended Vertical Partitioning subset SO

$HOME/spark/bin/spark-submit --driver-memory 1g --class runDriver --master yarn --executor-memory 1g --deploy-mode cluster ./datasetcreator_2.10-1.1.jar test3/ test3.nq SO 0.2

==> /tmp/stat_so.txt is created on the server where the job ran

==> Contents of stat_so.txt (cat stat_so.txt; fields are tab-separated)

SO Statistic

---------------------------------------------------------

Saved tabels ->0

Unsaved non-empty tables ->0

Empty tables ->4

c. Generate Extended Vertical Partitioning subset OS

$HOME/spark/bin/spark-submit --driver-memory 1g --class runDriver --master yarn --executor-memory 1g --deploy-mode cluster ./datasetcreator_2.10-1.1.jar test3/ test3.nq OS 0.2

==> /tmp/stat_os.txt is created on the server where the job ran

==> Contents of stat_os.txt (cat stat_os.txt; fields are tab-separated)

OS Statistic

---------------------------------------------------------

Saved tabels ->0

Unsaved non-empty tables ->0

Empty tables ->4

d. Generate Extended Vertical Partitioning subset SS

$HOME/spark/bin/spark-submit --driver-memory 1g --class runDriver --master yarn --executor-memory 1g --deploy-mode cluster ./datasetcreator_2.10-1.1.jar test3/ test3.nq SS 0.2

==> /tmp/stat_ss.txt is created on the server where the job ran

==> Contents of stat_ss.txt (cat stat_ss.txt; fields are tab-separated)

SS Statistic

---------------------------------------------------------

<<http://www.w3.org/1999/02/22-rdf-syntax-ns#have>><<http://www.w3.org/1999/02/22-rdf-syntax-ns#have>> 3 3 1.00 0.60

<<http://www.w3.org/1999/02/22-rdf-syntax-ns#have>><<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>> 3 3 1.00 0.60

<<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>><<http://www.w3.org/1999/02/22-rdf-syntax-ns#have>> 2 2 1.00 0.40

<<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>><<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>> 2 2 1.00 0.40

---------------------------------------------------------

Saved tabels ->0

Unsaved non-empty tables ->2

Empty tables ->2
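The four DataSetCreator runs above differ only in the mode argument (VP, SO, OS, SS), so the commands can be generated from a loop. The sketch below only prints them, in the same order as above; build_dataset_cmd is a helper name made up for this example, and every option is copied from the commands in this post.

```shell
# Hypothetical helper: print the spark-submit command for one partitioning mode.
# \$HOME is escaped so that it is printed literally rather than expanded here.
build_dataset_cmd() {
  local mode="$1"
  printf '%s\n' "\$HOME/spark/bin/spark-submit --driver-memory 1g --class runDriver --master yarn --executor-memory 1g --deploy-mode cluster ./datasetcreator_2.10-1.1.jar test3/ test3.nq $mode 0.2"
}

# Print the commands in the order used in this post: VP first, then SO, OS, SS.
for mode in VP SO OS SS; do
  build_dataset_cmd "$mode"
done
```

Printing first gives a chance to review the lines; once they look right, they can be run one at a time from /home/hadoop/S2RDF_work.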

5. Collect the statistics files in one folder

Copy the files generated above into the /home/hadoop/S2RDF_work/test3/statistics folder.

-rw-rw-r--. 1 hadoop hadoop 201 2016-06-16 17:37 stat_os.txt

-rw-rw-r--. 1 hadoop hadoop 201 2016-06-16 17:37 stat_so.txt

-rw-rw-r--. 1 hadoop hadoop 732 2016-06-16 17:38 stat_ss.txt

-rw-rw-r--. 1 hadoop hadoop 354 2016-06-16 17:36 stat_vp.txt
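The copy step can be wrapped in a small function that stops as soon as one of the four files is missing. This is a sketch: collect_stats is a hypothetical helper, not part of S2RDF. Since the files are written on the server where the job ran, they may first have to be fetched from another node (for example with scp) if that server is not the one you are working on.

```shell
# Hypothetical helper: copy stat_vp/so/os/ss.txt from a source directory
# into a destination directory, failing if any of the four files is missing.
collect_stats() {
  local src="$1" dst="$2" mode file
  mkdir -p "$dst" || return 1
  for mode in vp so os ss; do
    file="$src/stat_$mode.txt"
    if [ ! -f "$file" ]; then
      echo "missing statistics file: $file" >&2
      return 1
    fi
    cp "$file" "$dst/" || return 1
  done
}

# Usage with the paths from this post:
#   collect_stats /tmp /home/hadoop/S2RDF_work/test3/statistics
```

Failing early here avoids running QueryTranslator against an incomplete statistics folder.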

6. Create a file containing the SPARQL query to run.

vi /home/hadoop/S2RDF_work/test3/test3.sparql

Contents: select ?s ?o where {?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o}

7. Run QueryTranslator (run from /home/hadoop/S2RDF_work;

queryTranslator-1.1.0.jar is not the queryTranslator-1.1.jar shipped with the original distribution; it was built by modifying part of the source, recompiling, and repackaging it as a jar)

java -jar ./queryTranslator-1.1.0.jar -i ./test3/test3.sparql -o ./test3/test3.sparql -sd ./test3/statistics/ -sUB 0.2

==> Running it prints the log below; the log file and the SQL file are created in the same location as test3.sparql (e.g., /home/hadoop/S2RDF_work/test3/test3.sparql.sql)

inputFile- =================>./test3/test3.sparql

18:34:25 DEBUG Main :: inputFile-- =================>./test3/test3.sparql

18:34:25 DEBUG JenaIOEnvironment :: Failed to find configuration: location-mapping.ttl;location-mapping.rdf;location-mapping.n3;etc/location-mapping.rdf;etc/location-mapping.n3;etc/location-mapping.ttl

VP STAT Size = 2

OS STAT Size = 0

SO STAT Size = 0

SS STAT Size = 4

THE NUMBER OF ALL SAVED (< ScaleUB) TRIPLES IS -> 5

THE NUMBER OF ALL SAVED (< ScaleUB) TABLES IS -> 2

TABLE-><http__//www.w3.org/1999/02/22-rdf-syntax-ns#type>

8. Run the query using the SQL generated in step 7.

a. Edit the /home/hadoop/S2RDF_work/test3/test3.sparql.sql file.

(In >>>>>>TEST3--SO-OS-SS_VP__test3, the tokens --, SO, and __ must all be present. This check should eventually be removed from the source.)

>>>>>>TEST3--SO-OS-SS_VP__test3

SELECT sub AS s , obj AS o

FROM `_L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B_$$1$$`

++++++Tables Statistic

_L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B_$$1$$ 0 VP _L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B_/

VP <http__//www.w3.org/1999/02/22-rdf-syntax-ns#type> 2

------
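Because the query name line must contain --, SO, and __, it can be worth checking the edited file before submitting it. The sketch below tests only for those three tokens in the first line that starts with >>>>>>; check_query_header is a name invented for this example, and the actual parsing rules inside QueryExecutor may be stricter than this.

```shell
# Hypothetical helper: succeed only if the first ">>>>>>" line of the file
# contains all three tokens that this post says are required.
check_query_header() {
  local header token
  header=$(grep -m 1 '^>>>>>>' "$1") || return 1
  for token in '--' 'SO' '__'; do
    case "$header" in
      *"$token"*) ;;
      *) echo "query name is missing '$token': $header" >&2; return 1 ;;
    esac
  done
}

# Usage with the path from this post:
#   check_query_header /home/hadoop/S2RDF_work/test3/test3.sparql.sql
```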

b. Run QueryExecutor

$HOME/spark/bin/spark-submit --driver-memory 2g --class runDriver --master yarn --executor-memory 1g --deploy-mode cluster --files ./test3/test3.sparql.sql ./queryexecutor_2.10-1.1.jar test3 test3.sparql.sql

--------------------- Printing the logs of the YARN application to check the data gives the following ------------------

Log Type: stdout

Log Upload Time: Thu Jun 16 20:09:59 +0900 2016

Log Length: 2443

queryName ==>TEST3--SO-OS-SS_VP__test3
sqlQuery==>SELECT sub AS s , obj AS o 
	 FROM `_L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B___1__`
	
	

qStat ==>_L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B___1__	0	VP	_L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B_/
	VP	<http__//www.w3.org/1999/02/22-rdf-syntax-ns#type>	2
------

tables==>Map(_L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B___1__ -> queryExecutor.Table@2224c8cc)
queryNames======>TEST3--SO-OS-SS_VP__test3
pr-TEST3pf-SO-OS-SS_VP__test3atTEST3
Test TEST3--SO-OS-SS_VP__test3:
tPath=======>_L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B_/
	Load Table _L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B___1__ from test3/VP/_L_http__/www.w3.org/1999/02/22-rdf-syntax-ns#type_B_.parquet-> 
==_sqlContext.sql result =====================>[sub: string, obj: string]
		Cached 2 Elements in 754ms
tPath=======>_L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B_/
query.query=================>SELECT sub AS s , obj AS o 
	 FROM `_L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B___1__`
	
	

HaLLO
Project [sub#6 AS s#36,obj#7 AS o#37]
 InMemoryColumnarTableScan [sub#6,obj#7], [], (InMemoryRelation [sub#6,obj#7], true, 20000, StorageLevel(true, true, false, true, 1), (PhysicalRDD [sub#6,obj#7], MapPartitionsRDD[6] at repartition at DataFrame.scala:907), Some(_L_http__//www.w3.org/1999/02/22-rdf-syntax-ns#type_B___1__))

HaLL1

	 Run query -> 
t==>[<http://www.w3.org/2002/07/owl#Thing>,<http://www.w3.org/2000/01/rdf-schema#Resource> .]
t==>[<http://www.w3.org/2002/07/owl#Thing2>,<http://www.w3.org/2000/01/rdf-schema#Resource> .]
colname[0] name ===>s,value===>[s: string]
colname[1] name ===>o,value===>[o: string]
temp.toJSON.toString ============>MapPartitionsRDD[23] at mapPartitions at DataFrame.scala:862

	 Run query -> 
t==>[<http://www.w3.org/2002/07/owl#Thing>,<http://www.w3.org/2000/01/rdf-schema#Resource> .]
t==>[<http://www.w3.org/2002/07/owl#Thing2>,<http://www.w3.org/2000/01/rdf-schema#Resource> .]
colname[0] name ===>s,value===>[s: string]
colname[1] name ===>o,value===>[o: string]
temp.toJSON.toString ============>MapPartitionsRDD[34] at mapPartitions at DataFrame.scala:862
MapPartitionsRDD[38] at mapPartitions at DataFrame.scala:862
results============================>Map()
fileName==>/tmp/./results.txt
line ==>Thu Jun 16 20:10:08 KST 2016
fileName==>/tmp/./resultTimes.txt
line ==>Thu Jun 16 20:10:08 KST 2016
