메뉴 건너뛰기

Bigdata, Semantic IoT, Hadoop, NoSQL

Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다.
필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.


Spark에서 KafkaUtils.createStream을 이용하여 Kafka의 data를 가져올때 StorageLevel을 StorageLevel.MEMORY_ONLY()로 하는 경우 "Could not compute split, block input-0-1517397051800 not found"형태의 오류가 발생하는데 이는 Spark가 메모리 부족 상황이 되면 해당 데이타를 버리기 때문에 문제가 발생한다.

이때는 StorageLevel.MEMORY_ONLY()을 StorageLevel.MEMORY_AND_DISK_SER()로 변경해준다.



-------------소스코드 일부분-----

JavaPairReceiverInputDStream<byte[], byte[]> kafkaStream = KafkaUtils.createStream(jssc,byte[].class, byte[].class, kafka.serializer.DefaultDecoder.class, kafka.serializer.DefaultDecoder.class,
        conf, topic, StorageLevel.MEMORY_AND_DISK_SER());
JavaDStream<byte[]> lines = kafkaStream.map(tuple2 -> tuple2._2());


-----------------------------------오류 메세지------------------

[2018-01-31 20:17:26,404] [internal.Logging$class] [logError(#70)] [ERROR] Task 0 in stage 1020.0 failed 1 times; aborting job
[2018-01-31 20:17:26,404] [internal.Logging$class] [logError(#91)] [ERROR] Error running job streaming job 1517397060000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1020.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1020.0 (TID 1020, localhost, executor driver): java.lang.Exception: Could not compute split, block input-0-1517397051800 not found
        at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:50)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$print$2$$anonfun$foreachFunc$3$1.apply(DStream.scala:734)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$print$2$$anonfun$foreachFunc$3$1.apply(DStream.scala:733)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:256)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:255)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: Could not compute split, block input-0-1517397051800 not found
        at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:50)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        ... 3 more
[2018-01-31 20:17:26,415] [onem2m.AvroOneM2MDataSparkSubscribe$ConsumerT] [go(#142)] [DEBUG] count data from kafka broker stream in AvroOneM2MDataSparkSubscribe: 39981
[2018-01-31 20:17:29,039] [sf.QueryServiceFactory] [create(#28)] [DEBUG] query gubun : FUSEKISPARQL
[2018-01-31 20:17:29,040] [sf.QueryCommon] [makeFinal(#44)] [DEBUG] Count : 0 , Vals : [] 
[2018-01-31 20:17:29,040] [sf.SparqlFusekiQueryImpl] [runModifySparql(#162)] [DEBUG] runModifySparql() on DatWarehouse server start.................................. 
[2018-01-31 20:17:29,040] [sf.SparqlFusekiQueryImpl] [runModifySparql(#165)] [DEBUG] try (first).................................. 
[2018-01-31 20:17:29,042] [sf.SparqlFusekiQueryImpl] [runModifySparql(#207)] [DEBUG] runModifySparql() on DataWarehouse server end.................................. 
[2018-01-31 20:17:29,042] [sf.SparqlFusekiQueryImpl] [runModifySparql(#212)] [DEBUG] runModifySparql() on DataMart server start.................................. 
[2018-01-31 20:17:29,043] [sf.SparqlFusekiQueryImpl] [runModifySparql(#224)] [DEBUG] runModifySparql() on DataMart server end.................................. 
[2018-01-31 20:17:29,044] [sf.QueryCommon] [makeFinal(#44)] [DEBUG] Count : 0 , Vals : [] 
[2018-01-31 20:17:29,044] [sf.SparqlFusekiQueryImpl] [runModifySparql(#162)] [DEBUG] runModifySparql() on DatWarehouse server start.................................. 
[2018-01-31 20:17:29,044] [sf.SparqlFusekiQueryImpl] [runModifySparql(#165)] [DEBUG] try (first).................................. 
[2018-01-31 20:17:29,045] [sf.SparqlFusekiQueryImpl] [runModifySparql(#207)] [DEBUG] runModifySparql() on DataWarehouse server end.................................. 
[2018-01-31 20:17:29,045] [sf.SparqlFusekiQueryImpl] [runModifySparql(#212)] [DEBUG] runModifySparql() on DataMart server start.................................. 
[2018-01-31 20:17:29,046] [sf.SparqlFusekiQueryImpl] [runModifySparql(#224)] [DEBUG] runModifySparql() on DataMart server end.................................. 
[2018-01-31 20:17:29,047] [sf.QueryCommon] [makeFinal(#44)] [DEBUG] Count : 0 , Vals : [] 
[2018-01-31 20:17:29,047] [sf.SparqlFusekiQueryImpl] [runModifySparql(#212)] [DEBUG] runModifySparql() on DataMart server start.................................. 
[2018-01-31 20:17:29,049] [sf.SparqlFusekiQueryImpl] [runModifySparql(#224)] [DEBUG] runModifySparql() on DataMart server end.................................. 
[2018-01-31 20:17:29,049] [sf.TripleService] [makeTripleFile(#333)] [INFO] makeTripleFile start==========================>
[2018-01-31 20:17:29,049] [sf.TripleService] [makeTripleFile(#334)] [DEBUG] makeTripleFile ========triple_path_file=================>/svc/apps/sda/triples/20180131/AvroOneM2MDataSparkSubscribe_TT20180131T201100S0000000991_WRK20180131T201729.nt
[2018-01-31 20:17:29,170] [sf.TripleService] [makeTripleFile(#346)] [INFO] makeTripleFile end==========================>
[2018-01-31 20:17:29,170] [onem2m.AvroOneM2MDataSparkSubscribe] [sendTriples(#288)] [INFO] Sending triples in com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe to DW start.......................
[2018-01-31 20:17:29,170] [sf.TripleService] [sendTripleFileToDW(#382)] [INFO] sendTripleFile to DW start==========================>
[2018-01-31 20:17:29,170] [sf.TripleService] [sendTripleFileToDW(#383)] [DEBUG] sendTripleFile ==============triple_path_file============>/svc/apps/sda/triples/20180131/AvroOneM2MDataSparkSubscribe_TT20180131T201100S0000000991_WRK20180131T201729.nt
[2018-01-31 20:17:29,171] [sf.TripleService] [sendTripleFileToDW(#396)] [DEBUG] sendTripleFile ==============args============>/svc/apps/sda/bin/fuseki/bin/s-post http://166.104.112.43:23030/icbms default /svc/apps/sda/triples/20180131/AvroOneM2MDataSparkSubscribe_TT20180131T201100S0000000991_WRK20180131T201729.nt 
[2018-01-31 20:17:29,171] [sf.TripleService] [sendTripleFileToDW(#399)] [DEBUG] try (first).......................
[2018-01-31 20:17:36,950] [util.Utils] [runShell(#737)] [DEBUG] Thread stdMsgT Status : TERMINATED
[2018-01-31 20:17:36,951] [util.Utils] [runShell(#738)] [DEBUG] Thread errMsgT Status : TERMINATED
[2018-01-31 20:17:36,951] [util.Utils] [runShell(#743)] [DEBUG] notTimeOver ==========================>true
[2018-01-31 20:17:36,951] [sf.TripleService] [sendTripleFileToDW(#402)] [DEBUG] resultStr in TripleService.sendTripleFileToDW() == > [, ]
[2018-01-31 20:17:36,951] [sf.TripleService] [sendTripleFileToDW(#433)] [INFO] sendTripleFile to DW  end==========================>
[2018-01-31 20:17:36,951] [onem2m.AvroOneM2MDataSparkSubscribe] [sendTriples(#290)] [INFO] Sending triples in com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe to DW end.......................
[2018-01-31 20:17:36,951] [onem2m.AvroOneM2MDataSparkSubscribe] [sendTriples(#293)] [INFO] Sending triples in com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe to Halyard start.......................
[2018-01-31 20:17:36,952] [sf.TripleService] [sendTripleFileToHalyard(#486)] [INFO] sendTripleFile to Halyard  start==========================>
[2018-01-31 20:17:37,294] [sf.QueryServiceFactory] [create(#31)] [DEBUG] query gubun : HALYARDSPARQL
[2018-01-31 20:17:37,317] [sf.SparqlHalyardQueryImpl] [insertByPost(#189)] [DEBUG] ------------------------insertByPost-----start-----------------------
[2018-01-31 20:17:37,317] [sf.SparqlHalyardQueryImpl] [insertByPost(#198)] [DEBUG] ------------------------insertByPost-----end-----------------------

번호 제목 글쓴이 날짜 조회 수
560 os가 windows7인 host pc에서 ubuntu가 os인 guest pc에 접근하기 위한 네트워크설정 총관리자 2014.04.20 725
559 hortonworks에서 제공하는 메모리 설정값 계산기 사용법 file 총관리자 2015.06.14 719
558 secureCRT에서 backspace키가 작동하지 않는 경우 해결방법 총관리자 2015.05.11 719
557 update를 많이 하면 heap memory가 많이 소진되고 최종적으로 OOM가 발생하는데 이에 대한 설명 총관리자 2017.04.10 717
556 SASL configuration failed: javax.security.auth.login.LoginException: java.lang.NullPointerException 오류 해결방법 총관리자 2015.04.02 701
555 oozie가 말하는 start시간은..서버에서 확인되는 시간이 아닙니다. 총관리자 2014.05.14 699
554 [Impala 3.2버젼]compute incremental stats db명.테이블명 수행시 ERROR: AnalysisException: Incremental stats size estimate exceeds 2000.00MB. 오류 발생원인및 조치방안 gooper 2022.11.30 697
553 mysql 5.5.34-0ubuntu0.13.04용 설치/진행 화면 총관리자 2014.09.10 697
552 sendmail전송시 421 4.3.0 collect: Cannot write ./dfv5BA2EBS010579 (bfcommit, uid=0, gid=114): No such file or directory 발생시 조치사항 총관리자 2017.06.11 694
551 lateral view 예제 총관리자 2014.09.18 691
550 source의 type을 spooldir로 하는 경우 해당 경로에 파일이 들어오면 파일단위로 전송함 총관리자 2014.05.20 687
549 [springframework]Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Could not read resultset: unexpected end of stream, read 0 bytes from 4 오류 발생시 조치사항 총관리자 2017.01.23 680
548 sqoop으로 mariadb에 접근해서 hive 테이블로 자동으로 생성하기 총관리자 2018.08.03 673
547 znode /hbase recursive하게 지우기 총관리자 2015.05.06 673
546 hadoop cluster에 포함된 노드중에서 문제있는 decommission하는 방법및 절차 file 총관리자 2017.12.28 662
545 서버 5대에 solr 5.5.0 설치하고 index data를 HDFS에 저장/search하도록 설치/설정하는 방법(SolrCloud) 총관리자 2016.04.08 656
544 "File /user/hadoop/share/lib does not exist" 오류 해결방법 총관리자 2015.06.07 655
543 solr 인스턴스 기동후 shard에 서버가 정상적으로 할당되지 않는 경우 해결책 총관리자 2016.04.29 654
542 AIX 7.1에 Python 2.7.11설치하기 총관리자 2016.10.06 651
541 springframework를 이용한 war를 생성하는 build.gradle파일(참고용) 총관리자 2016.08.19 650

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc.
We are open to the required minutes. Please send inquiries to gooper@gooper.com.

위로