메뉴 건너뛰기

Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.


Hadoop의 각 데몬을 기동하여 정상 작동중이다가 갑자기 DataNode가 아래와 같은 오류를 내면서 죽는 경우가 있다.

원인은 Heap메모리가 부족하여 발생하는 문제이다. 이때는 아래 내용을 참조하여 HEAP사이즈를 변경하여 각서버에 반영하고 Hadoop전체를 다시

재 기동시켜서 반영해준다.

(분제가 발생하는 노드는 전체 클러스트와 메모리는 같은데 HardDisk용량이 2.5배 정도 되는데 다른 노드에 비해서 데이타 유입량이 더 많아서

동한 다른 노드와 같은 설정을 하면 이용하면서 HEAP메모리 부족현상이 발생되는것으로 보임)


1. hadoop-env.sh에서
export HADOOP_HEAPSIZE을
export HADOOP_HEAPSIZE=3000 으로 설정한다.

export HADOOP_NAMENODE_INIT_HEAPSIZE=""을
export HADOOP_NAMENODE_INIT_HEAPSIZE="2000" 으로 설정한다.


2. mapred-env.sh에서
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000을
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=2000 으로 설정한다.


3. yarn-env.sh에서

JAVA_HEAP_MAX=-Xmx1000m 를
JAVA_HEAP_MAX=-Xmx2000m 으로 설정한다.

# YARN_HEAPSIZE=1000을
YARN_HEAPSIZE=2000 으로 설정한다.




-----------------------------------오류내용--------------------------------
2017-07-18 20:20:38,668 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1265ms
GC pool 'PS MarkSweep' had collection(s): count=2 time=1764ms
2017-07-18 20:20:32,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-605282214-XXX.XXX.XXX.XXX-1498555165989:blk_1076520983_2780234 received exception java.io.IOException: Premature EOF from inputStream
2017-07-18 20:20:30,934 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-605282214-XXX.XXX.XXX.XXX-1498555165989:blk_1076520963_2780213, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
2017-07-18 20:20:47,191 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-605282214-XXX.XXX.XXX.XXX-1498555165989:blk_1076520963_2780213 received exception java.io.IOException: Premature EOF from inputStream
2017-07-18 20:20:47,191 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-605282214-XXX.XXX.XXX.XXX-1498555165989 (Datanode Uuid d4f1b1f7-0636-483d-91e8-4780b73fb392) service to sda1/XXX.XXX.XXX.XXX:9000 beginning handsh
ake with NN
2017-07-18 20:20:48,073 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: sda2:50010:DataXceiver error processing WRITE_BLOCK operation  src: /XXX.XXX.XXX.XXX:43840 dst: /XXX.XXX.XXX.XXX:50010
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:895)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:801)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
        at java.lang.Thread.run(Thread.java:745)
2017-07-18 20:20:48,073 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: sda2:50010:DataXceiver error processing WRITE_BLOCK operation  src: /166.104.112.69:45343 dst: /XXX.XXX.XXX.XXX:50010
java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:895)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:801)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:253)
        at java.lang.Thread.run(Thread.java:745)
2017-07-18 20:20:57,083 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-605282214-XXX.XXX.XXX.XXX-1498555165989 (Datanode Uuid d4f1b1f7-0636-483d-91e8-4780b73fb392) service to sda1/XXX.XXX.XXX.XXX:9000 succe
ssfully registered with NN
2017-07-18 20:21:00,044 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1178ms
GC pool 'PS MarkSweep' had collection(s): count=2 time=1677ms
2017-07-18 20:21:05,452 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 2133ms
GC pool 'PS MarkSweep' had collection(s): count=3 time=2632ms
2017-07-18 20:21:12,342 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1229ms
GC pool 'PS MarkSweep' had collection(s): count=2 time=1729ms
2017-07-18 20:21:14,056 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1214ms
GC pool 'PS MarkSweep' had collection(s): count=2 time=1713ms
2017-07-18 20:21:28,386 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected exception in block pool Block pool BP-605282214-XXX.XXX.XXX.XXX-1498555165989 (Datanode Uuid d4f1b1f7-0636-483d-91e8-4780b73fb392) service to sda1/1
66.104.112.43:9000
java.lang.OutOfMemoryError: Java heap space
2017-07-18 20:21:28,386 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-605282214-XXX.XXX.XXX.XXX-1498555165989 (Datanode Uuid d4f1b1f7-0636-483d-91e8-4780b73fb392) service to sda1/166.1
04.112.43:9000
2017-07-18 20:21:29,958 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1205ms
GC pool 'PS MarkSweep' had collection(s): count=2 time=1704ms
2017-07-18 20:21:40,231 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1431ms
GC pool 'PS MarkSweep' had collection(s): count=2 time=1931ms
2017-07-18 20:21:45,597 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 3310ms
GC pool 'PS MarkSweep' had collection(s): count=4 time=3808ms
2017-07-18 20:21:55,800 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-605282214-XXX.XXX.XXX.XXX-1498555165989 (Datanode Uuid d4f1b1f7-0636-483d-91e8-4780b73fb392)
2017-07-18 20:21:58,707 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 5945ms
GC pool 'PS MarkSweep' had collection(s): count=12 time=13105ms
2017-07-18 20:22:00,356 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-605282214-XXX.XXX.XXX.XXX-1498555165989
2017-07-18 20:22:03,000 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2017-07-18 20:22:03,001 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2017-07-18 20:22:03,002 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1085ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=1233ms
2017-07-18 20:22:03,003 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at sda2/XXX.XXX.XXX.XXX
************************************************************/

번호 제목 날짜 조회 수
510 beeline으로 접근시 "User: gooper is not allowed to impersonate anonymous (state=08S01,code=0)"가 발생하면서 "No current connection"이 발생하는 경우 조치 2018.04.15 2490
509 Cloudera Manager 5.x설치시 embedded postgresql를 사용하는 경우의 관리정보 2018.04.13 2020
508 jupyter, zeppelin, rstudio를 이용하여 spark cluster에 job를 실행시키기 위한 정보 2018.04.13 5104
507 Cloudera Manager web UI의 언어를 한글에서 영문으로 변경하기 2018.04.03 2417
506 [우분투] suppoie 채굴 프로세스 발생시 자동으로 삭제하는 shell프로그램 2018.04.01 2674
505 Impala daemon기동시 "Could not create temporary timezone file"오류 발생시 조치사항 2018.03.29 2368
504 각 서버에 설치되는 cloudera서비스 프로그램 목록(CDH 5.14.0의 경우) 2018.03.29 1825
503 Cloudera설치중 실패로 여러번 설치하는 과정에 "Running in non-interactive mode, and data appears to exist in Storage Directory /dfs/nn. Not formatting." 오류가 발생시 조치하는 방법 2018.03.29 2570
502 Cloudera설치중에 "Error, CM server guid updated"오류 발생시 조치방법 2018.03.29 1577
501 Cloudera가 사용하는 서비스별 포트 2018.03.29 2365
500 Cloudera가 사용하는 서비스별 디렉토리 2018.03.29 1911
499 cloudera-scm-agent 설정파일 위치및 재시작 명령문 2018.03.29 2449
498 [CentOS] 네트워크 설정 2018.03.26 1866
497 Components of the Impala Server 2018.03.21 1944
496 HDFS Balancer설정및 수행 2018.03.21 1700
495 hadoop 클러스터 실행 스크립트 정리 2018.03.20 3325
494 HA(Namenode, ResourceManager, Kerberos) 및 보안(Zookeeper, Hadoop) 2018.03.16 1248
493 자주쓰는 유용한 프로그램 2018.03.16 2795
492 에러 추적(Error Tracking) 및 로그 취합(logging aggregation) 시스템인 Sentry 설치 2018.03.14 1464
491 update 샘플 2018.03.12 2993
위로