Bigdata, Semantic IoT, Hadoop, NoSQL

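This is where I organize what I have learned while working on projects involving Bigdata, the Hadoop ecosystem, Semantic IoT, and so on.
It is published for anyone who needs it. Please send inquiries by email to gooper@gooper.com.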


1. Recheck the settings in hdfs-site.xml and yarn-site.xml.

 a. hdfs-site.xml

   <property>
     <name>dfs.hosts.exclude</name>
     <value>$HOME/hadoop/etc/hadoop/nodes.exclude</value>
   </property>
   <property>
     <name>dfs.hosts</name>
     <value>$HOME/hadoop/etc/hadoop/nodes.include</value>
   </property>


 b. yarn-site.xml

  <property>
   <name>yarn.resourcemanager.nodes.include-path</name>
   <value>$HOME/hadoop/etc/hadoop/nodes.include</value>
  </property>
  <property>
   <name>yarn.resourcemanager.nodes.exclude-path</name>
   <value>$HOME/hadoop/etc/hadoop/nodes.exclude</value>
  </property>


2. Run hdfs fsck / -storagepolicies or hdfs fsck / -blocks to check the state of the blocks.

 See the sample output at the bottom.
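
The status check in step 2 could be scripted roughly as follows. This is only a sketch: the function name `fsck_status` is made up for illustration, and it assumes the summary line has the form seen in the sample output at the bottom (for example `.....Status: CORRUPT`, with progress dots in front of the keyword).

```shell
#!/usr/bin/env bash
# fsck_status (hypothetical helper): read `hdfs fsck` output on stdin and
# print the word that follows "Status:" (for example HEALTHY or CORRUPT).
# Assumes the summary line looks like ".....Status: CORRUPT", i.e. progress
# dots may precede the keyword on the same line.
fsck_status() {
  sed -n 's/^.*Status: \([A-Z]*\).*$/\1/p' | head -n 1
}

# Usage (needs a running cluster):
#   hdfs fsck / -storagepolicies | fsck_status
```

Reading from standard input keeps the helper independent of the cluster, so it can also be run against a saved fsck log.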


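3. If the result of step 2 is Status: CORRUPT, take the appropriate action.

 hdfs fsck / -delete (deletes the corrupted files) or hdfs fsck / -move (moves the corrupted files to /lost+found)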
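
Because both options change the affected files, it may be worth listing them first. The sketch below pulls the paths of files with missing blocks out of fsck output. The function name `missing_block_files` is hypothetical, and the pattern assumes lines of the form seen in the sample output at the bottom (`<path>: MISSING 1 blocks of total size ... B`).

```shell
#!/usr/bin/env bash
# missing_block_files (hypothetical helper): read `hdfs fsck` output on stdin
# and print the path of every file reported with MISSING blocks.
# Assumes lines such as:
#   /some/path: MISSING 1 blocks of total size 134217728 B......
missing_block_files() {
  sed -n 's|^\(/.*\): MISSING [0-9][0-9]* blocks of total size .*$|\1|p'
}

# Usage (needs a running cluster):
#   hdfs fsck / | missing_block_files
```

The fsck command also has a -list-corruptfileblocks option that reports corrupt files directly; the helper above is only meant for going through output that has already been captured.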


4. Run step 2 again and confirm that the result is Status: HEALTHY.

  See the sample output at the bottom.


5. If necessary, run the decommission process again.

hdfs dfsadmin -refreshNodes

yarn rmadmin -refreshNodes
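
After the refresh, progress can be followed through `hdfs dfsadmin -report`. The following sketch counts DataNodes per decommission state. The function name `decommission_summary` is made up for this example, and it assumes the report contains lines of the form `Decommission Status : <state>` (for example `Normal` or `Decommission in progress`); the exact wording may differ between Hadoop versions.

```shell
#!/usr/bin/env bash
# decommission_summary (hypothetical helper): read `hdfs dfsadmin -report`
# output on stdin and print one "<state>: <count>" line per decommission state.
# Assumes per-node lines of the form "Decommission Status : <state>".
decommission_summary() {
  awk -F' : ' '
    /^Decommission Status/ { count[$2]++ }
    END { for (s in count) print s ": " count[s] }
  ' | LC_ALL=C sort
}

# Usage (needs a running cluster):
#   hdfs dfsadmin -report | decommission_summary
```

Running it periodically shows whether nodes are moving out of the in-progress state.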


* Decommissioning can take days or even weeks. To speed it up, add the following properties to hdfs-site.xml and apply the change.

   (Reference: https://community.hortonworks.com/questions/102621/node-decommissioning-progressing-too-slowly.html)

   <property>
     <name>dfs.namenode.replication.max-streams</name>
     <value>50</value>
   </property>
  
   <property>
     <name>dfs.namenode.replication.max-streams-hard-limit</name>
     <value>100</value>
   </property>
  
   <property>
     <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
     <value>200</value>
   </property>



-----------Output of hdfs fsck -storagepolicies (Status: CORRUPT)----------

.....

(omitted)

....

/user/hadoop/spark/local-1510393605261: MISSING 1 blocks of total size 134217728 B......
/user/hadoop/spark/local-1511150952538:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1078958243_5217532. Target Replicas is 3 but found 1 replica(s).

/user/hadoop/spark/local-1511150952538:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1078982811_5242100. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1511756383245: MISSING 1 blocks of total size 8357126 B.....
/user/hadoop/spark/local-1511848071791:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079042189_5301478. Target Replicas is 3 but found 2 replica(s).
..
/user/hadoop/spark/local-1511858124646: MISSING 1 blocks of total size 40291 B..
/user/hadoop/spark/local-1511858518707:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079043514_5302803. Target Replicas is 3 but found 1 replica(s).
....
/user/hadoop/spark/local-1511861829455:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079044160_5303449. Target Replicas is 3 but found 1 replica(s).
..
/user/hadoop/spark/local-1511921506635:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079057453_5316742. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1511931435456: MISSING 1 blocks of total size 702011 B..
/user/hadoop/spark/local-1511932067927:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079058984_5318273. Target Replicas is 3 but found 2 replica(s).
.......
/user/hadoop/spark/local-1511939175974:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079060057_5319346. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1511942070784:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079060488_5319777. Target Replicas is 3 but found 2 replica(s).
.
.
/user/hadoop/spark/local-1511945803722:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079061302_5320591. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1511946633083:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079061444_5320733. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1512003403329:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079074161_5333450. Target Replicas is 3 but found 1 replica(s).
.
/user/hadoop/spark/local-1512008787877:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079074799_5334088. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1512018010728:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079076017_5335306. Target Replicas is 3 but found 2 replica(s).
............
/user/hadoop/spark/local-1512121416466:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079096405_5355695. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512361519396:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079147739_5407029. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512373036884:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079149109_5408399. Target Replicas is 3 but found 2 replica(s).
..
/user/hadoop/spark/local-1512373950155:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079149191_5408481. Target Replicas is 3 but found 1 replica(s).
..........................
/user/hadoop/spark/local-1512641606927:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079196301_5455600. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1512694548543:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079208186_5467485. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512712721899:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079272068_5531367. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512978676213:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079331781_5591080. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1513040318768:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079345530_5604829. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1513154800735:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079372104_5631403. Target Replicas is 3 but found 1 replica(s).
..
/user/hadoop/spark/local-1513239737761:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079391263_5650563. Target Replicas is 3 but found 2 replica(s).

/user/hadoop/spark/local-1513239737761:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079416745_5676128. Target Replicas is 3 but found 1 replica(s).

/user/hadoop/spark/local-1513239737761:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079454497_5713964. Target Replicas is 3 but found 1 replica(s).
....
/user/hadoop/spark/local-1513933165296:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079584889_5844405. Target Replicas is 3 but found 1 replica(s).
...
/user/pineone/gooper-test/icbms_2017-07-21_13-28-17.nq.gz:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1076598010_2857261. Target Replicas is 3 but found 2 replica(s).
.....
/user/pineone/in/tomcat-juli.jar:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073741825_1001. Target Replicas is 3 but found 2 replica(s).
.......
/user/pineone/out3/part-r-00000:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073744812_3988. Target Replicas is 3 but found 2 replica(s).
.......
.....Status: CORRUPT
 Total size:    1136215827409 B (Total open files size: 939529123 B)
 Total dirs:    1415
 Total files:   1864205
 Total symlinks:                0 (Files currently being written: 12)
 Total blocks (validated):      1864295 (avg. block size 609461 B) (Total open file blocks (not validated): 18)
  ********************************
  UNDER MIN REPL'D BLOCKS:      11 (5.900354E-4 %)
  dfs.namenode.replication.min: 1
  CORRUPT FILES:        11
  MISSING BLOCKS:       11
  MISSING SIZE:         321480184 B
  ********************************
 Minimally replicated blocks:   1864284 (99.99941 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       545406 (29.255348 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.661861
 Corrupt blocks:                0
 Missing replicas:              630358 (11.270713 %)
 Number of data-nodes:          8
 Number of racks:               1
FSCK ended at Tue Jan 02 15:42:32 KST 2018 in 196868 milliseconds
FSCK ended at Tue Jan 02 15:42:32 KST 2018 in 196868 milliseconds
fsck encountered internal errors!


Fsck on path '/' FAILED



-----------Output of hdfs fsck -storagepolicies (Status: HEALTHY)----------

.....

(omitted)

....

/user/hadoop/spark/local-1513239737761:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079416745_5676128. Target Replicas is 3 but found 1 replica(s).

/user/hadoop/spark/local-1513239737761:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079454497_5713964. Target Replicas is 3 but found 1 replica(s).
....
/user/hadoop/spark/local-1513933165296:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079584889_5844405. Target Replicas is 3 but found 1 replica(s).
...
/user/hadoop/spark/local-1514451281961:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079748353_6021470. Target Replicas is 3 but found 1 replica(s).
.
/user/pineone/gooper-test/icbms_2017-07-21_13-28-17.nq.gz:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1076598010_2857261. Target Replicas is 3 but found 2 replica(s).
.....
/user/pineone/in/tomcat-juli.jar:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073741825_1001. Target Replicas is 3 but found 2 replica(s).
.......
/user/pineone/out3/part-r-00000:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073744812_3988. Target Replicas is 3 but found 2 replica(s).
........
....Status: HEALTHY
 Total size:    1136067627332 B (Total open files size: 1312 B)
 Total dirs:    1446
 Total files:   1864304
 Total symlinks:                0 (Files currently being written: 4)
 Total blocks (validated):      1864376 (avg. block size 609355 B) (Total open file blocks (not validated): 3)
 Minimally replicated blocks:   1864376 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       526644 (28.247736 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.682025
 Corrupt blocks:                0
 Missing replicas:              592825 (10.599168 %)
 Number of data-nodes:          8
 Number of racks:               1

Blocks satisfying the specified storage policy:
Storage Policy                  # of blocks       % of blocks
DISK:5(HOT)                   796652              42.7302%
DISK:3(HOT)                   595925              31.9638%
DISK:4(HOT)                   471514              25.2907%
DISK:6(HOT)                      274               0.0147%
DISK:1(HOT)                       10               0.0005%
DISK:2(HOT)                        1               0.0001%

All blocks satisfy specified storage policy.
FSCK ended at Tue Jan 02 17:08:00 KST 2018 in 184737 milliseconds


The filesystem under path '/' is HEALTHY
