메뉴 건너뛰기

Bigdata, Semantic IoT, Hadoop, NoSQL

Bigdata, Hadoop ecosystem, Semantic IoT등의 프로젝트를 진행중에 습득한 내용을 정리하는 곳입니다.
필요한 분을 위해서 공개하고 있습니다. 문의사항은 gooper@gooper.com로 메일을 보내주세요.


1. 상황

 가. zookeeper의 zookeeper_trace.log파일이 너무커서 강제로 cat /dev/null > zookeeper_trace.log로 0로 만들고 zkServer.sh로 재기동 했더니 hadoop cluster전체가 이상하게 행동하고 journalnode데몬이 다운되고 이어서 namenode가 다운되는 등의 문제가 발생함.

 나. 많은 WARN발생후에 journalnode데몬이 기동되더라고 namenode가 비 정상적으로 작동하다가 다운되는 문제가 발생함.

 다. 이에 따른 영향으로 hadoop cluster나 zookeeper를 사용하는 각 데몬들이 문제가 발생하면서 다운되는 문제가 발생함.


2. 원인

journal node가 사용하는 editLog가 노드가 sync되지 않고 다른 경우에 발생하는 문제임


3. 조치 

아래와 같은 오류가 발생하면 문제없이 journalnode데몬이 수행되는 곳의 edit log(edits_로 시작되는 파일만)들을 journal node로 사용하는 모든 서버에 복사하고 다시 기동시켜준다.(특히 edits_inprogress_0000000000005681277.empty파일이 포함되어야 한다.)

(예, scp -P 10022 /data/hadoop/journal/data/mycluster/current/edits_* root@gsda2:/data/hadoop/journal/data/mycluster/current)



-------------nodename기동시 오류(INFO)내용---------------

2017-09-14 15:10:33,450 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 73044 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:34,452 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 74046 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:35,453 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 75047 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:36,454 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 76048 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:37,455 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 77049 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:38,456 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 78050 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:39,457 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 79051 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:40,458 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 80052 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:41,459 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 81053 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:42,460 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 82054 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:43,462 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 83056 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:44,463 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 84057 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:45,464 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 85058 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:46,465 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 86059 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:47,466 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 87060 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:48,467 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 88061 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:49,468 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 89062 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:50,469 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 90063 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:51,470 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 91064 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:52,471 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 92065 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:53,472 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 93066 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:54,473 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 94067 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:55,474 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 95068 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]

2017-09-14 15:10:56,475 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 96070 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [XXX.XXX.XXX.XXX:8485]



--------------journalnode 데몬 기동시 WARNING내용(editLog파일 각각 유사한 WARN이 발생함)---------

2017-09-14 17:02:32,882 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: After resync, position is 1036288

2017-09-14 17:02:32,882 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: Caught exception after scanning through 0 ops from /data/hadoop/journal/data/mycluster/current/edits_inprogress_0000000000004738264 while determining its valid length. Position was 1036288

java.io.IOException: Can't scan a pre-transactional edit log.

        at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4924)

        at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)

        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.scanEditLog(FSEditLogLoader.java:1147)

        at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:329)

        at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:548)

        at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:195)

        at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:155)

        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:93)

        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:102)

        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:190)

        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:224)

        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25431)

        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)

        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)

        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)

        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)

        at java.security.AccessController.doPrivileged(Native Method)

        at javax.security.auth.Subject.doAs(Subject.java:422)

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)

        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455)

2017-09-14 17:02:32,882 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: After resync, position is 1036288

2017-09-14 17:02:32,882 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: Caught exception after scanning through 0 ops from /data/hadoop/journal/data/mycluster/current/edits_inprogress_0000000000004738264 while determining its valid length. Position was 1036288

java.io.IOException: Can't scan a pre-transactional edit log.

        at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4924)

        at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)

        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.scanEditLog(FSEditLogLoader.java:1147)

        at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanEditLog(EditLogFileInputStream.java:329)

        at org.apache.hadoop.hdfs.server.namenode.FileJournalManager$EditLogFile.scanLog(FileJournalManager.java:548)

        at org.apache.hadoop.hdfs.qjournal.server.Journal.scanStorageForLatestEdits(Journal.java:195)

        at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:155)

        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:93)

        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:102)

        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:190)

        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:224)

        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25431)

        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)

        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)

        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)

        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)

        at java.security.AccessController.doPrivileged(Native Method)

        at javax.security.auth.Subject.doAs(Subject.java:422)

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)

        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455)

2017-09-14 17:02:32,882 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: After resync, position is 1036288

2017-09-14 17:02:32,882 WARN org.apache.hadoop.hdfs.server.namenode.FSImage: Caught exception after scanning through 0 ops from /data/hadoop/journal/data/mycluster/current/edits_inprogress_0000000000004738264 while determining its valid length. Position was 1036288

java.io.IOException: Can't scan a pre-transactional edit log.

        at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LegacyReader.scanOp(FSEditLogOp.java:4924)

        at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.scanNextOp(EditLogFileInputStream.java:245)

번호 제목 글쓴이 날짜 조회 수
580 oozie 4.1 설치 - maven을 이용한 source compile on hadoop 2.5.2 with postgresql 9.3 총관리자 2015.04.30 862
579 ubuntu에 maven 3.6.1설치 및 환경변수 설정 총관리자 2019.06.02 856
578 Impala의 Queries탭에서 여러조건으로 쿼리 찾기 총관리자 2018.05.09 856
577 [SBT] SBT 사용법 정리(링크) 총관리자 2016.08.04 839
576 Hive JDBC Connection과 유형별 에러및 필요한 jar파일 총관리자 2021.05.24 828
575 dual table만들기 총관리자 2014.05.16 824
574 [shellscript]엑셀파일에서 여러줄에 존재하는 단어를 한줄의 문자열로 합치는 방법(comma로 구분) 총관리자 2019.07.15 811
573 update 샘플 총관리자 2018.03.12 810
572 hive query에서 mapreduce돌리지 않고 select하는 방법 총관리자 2014.05.23 810
571 [Mybatis]Spring과 연동하지 않고 Java+Mybatis 형태의 프로그램 샘플소스 총관리자 2016.09.01 808
570 oozie job 구동시 JA009: User: hadoop is not allowed to impersonate hadoop 오류나는 경우 총관리자 2014.06.02 807
569 sentry설정 방법및 활성화시 설정이 필요한 파일및 설정값, 계정생성 방법 총관리자 2018.08.16 778
568 mongodb aggregation query를 Java code로 변환한 샘플 총관리자 2016.12.15 777
567 python test.py실행시 "ImportError: No module named pyspark" 혹은 "ImportError: No module named py4j.protocol"등의 오류 발생시 조치사항 총관리자 2017.07.04 765
566 oracle to hive data type정리표 총관리자 2018.08.22 764
565 [Kerberos]Kerberos상태의 클러스터에 JDBC로 접근할때 케이스별 오류내용 총관리자 2020.02.14 761
564 beeline실행시 User: root is not allowed to impersonate오류 발생시 조치사항 총관리자 2016.06.03 759
563 [DBeaver 4.3.0]import/export시 "Client home is not specified for connection" 오류발생시 조치사항 총관리자 2017.12.21 753
562 Cloudera Manager web UI의 언어를 한글에서 영문으로 변경하기 총관리자 2018.04.03 743
561 hive metastore ERD file 총관리자 2018.09.20 729

A personal place to organize information learned during the development of such Hadoop, Hive, Hbase, Semantic IoT, etc.
We are open to the required minutes. Please send inquiries to gooper@gooper.com.

위로