
Bigdata, Semantic IoT, Hadoop, NoSQL

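This is where I organize what I have learned while working on projects involving Bigdata, the Hadoop ecosystem, Semantic IoT, and related areas.
It is published for anyone who may find it useful. Please send inquiries to gooper@gooper.com.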


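When the contents of a StringBuffer are converted to a string with toString(), "java.lang.OutOfMemoryError: Java heap space" can occur. StringBuffer.toString() copies the buffer's contents into a new String, and the error is thrown when there is not enough heap memory left for that copy.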
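To make the cost of that copy concrete, here is a rough sketch (not taken from the original application) that estimates how many bytes the toString() copy needs and compares that with the heap still available. It assumes Java 8 style storage of 2 bytes per char, which the JDK line numbers in the stack trace suggest but do not prove. The class name ToStringHeapEstimate and its methods are hypothetical.

```java
public class ToStringHeapEstimate {
    // Assumption: Java 8 style char[] storage, 2 bytes per char.
    // Newer JVMs with compact strings may need less for Latin-1 text.
    public static long estimateCopyBytes(CharSequence buffer) {
        return 2L * buffer.length();
    }

    // Heap the JVM could still hand out: max heap minus what is in use now.
    public static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    // Rough check only: ignores fragmentation and other threads allocating.
    public static boolean copyLikelyFits(CharSequence buffer) {
        return estimateCopyBytes(buffer) < availableHeapBytes();
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 1000; i++) {
            sb.append("<s> <p> <o> .\n"); // stand-in for triple data
        }
        System.out.println("copy needs ~" + estimateCopyBytes(sb) + " bytes");
        System.out.println("available heap ~" + availableHeapBytes() + " bytes");
        System.out.println("likely fits: " + copyLikelyFits(sb));
    }
}
```

Because the buffer and its copy are both live during toString(), a buffer that already fills about half of the free heap is enough to trigger the error.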

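In this case, give spark-submit a much larger driver memory setting, for example --driver-memory 5g, so that the driver JVM's -Xmx value is raised. In the log below the failing task ran on "localhost, executor driver", so the task shares the driver JVM and it is the driver heap that needs to grow.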
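After changing --driver-memory, it can help to confirm that the larger heap actually reached the JVM. The following is a minimal sketch using only the standard Runtime API. The class name HeapCheck is hypothetical, and the idea of logging this at driver startup is a suggestion rather than something the original application does.

```java
public class HeapCheck {
    // Converts a byte count to whole mebibytes.
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Maximum heap this JVM will try to use, in MiB (driven by -Xmx).
    public static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        // With --driver-memory 5g the value should be close to 5120,
        // though the JVM may report somewhat less than requested.
        System.out.println("max heap (MiB): " + maxHeapMiB());
    }
}
```

Calling HeapCheck.maxHeapMiB() once near the start of the driver's main method and writing the result to the log gives a quick way to spot a setting that was not picked up.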


------------------Error details------------------------

[2018-02-01 10:12:40,253] [internal.Logging$class] [logError(#70)] [ERROR] Task 0 in stage 20.0 failed 1 times; aborting job
[2018-02-01 10:12:40,267] [internal.Logging$class] [logError(#91)] [ERROR] Error running job streaming job 1517447030000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 1 times, most recent failure: Lost task 0.0 in stage 20.0 (TID 20, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.StringBuffer.toString(StringBuffer.java:671)
        at com.pineone.icbms.sda.sf.TripleService.sendTripleFileToHalyard(TripleService.java:500)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe.sendTriples(AvroOneM2MDataSparkSubscribe.java:296)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe.access$100(AvroOneM2MDataSparkSubscribe.java:34)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$ConsumerT.go(AvroOneM2MDataSparkSubscribe.java:202)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$1.call(AvroOneM2MDataSparkSubscribe.java:101)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$1.call(AvroOneM2MDataSparkSubscribe.java:93)
        at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:393)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
        at scala.collection.AbstractIterator.to(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$print$2$$anonfun$foreachFunc$3$1.apply(DStream.scala:734)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$print$2$$anonfun$foreachFunc$3$1.apply(DStream.scala:733)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:256)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:255)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.StringBuffer.toString(StringBuffer.java:671)
        at com.pineone.icbms.sda.sf.TripleService.sendTripleFileToHalyard(TripleService.java:500)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe.sendTriples(AvroOneM2MDataSparkSubscribe.java:296)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe.access$100(AvroOneM2MDataSparkSubscribe.java:34)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$ConsumerT.go(AvroOneM2MDataSparkSubscribe.java:202)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$1.call(AvroOneM2MDataSparkSubscribe.java:101)
        at com.pineone.icbms.sda.kafka.onem2m.AvroOneM2MDataSparkSubscribe$1.call(AvroOneM2MDataSparkSubscribe.java:93)
        at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:393)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
        at scala.collection.AbstractIterator.to(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        ... 3 more

» Post: What to do when "java.lang.OutOfMemoryError: Java heap space" occurs while running spark-submit (posted by 총관리자, 2018.02.01)
