
Bigdata, Semantic IoT, Hadoop, NoSQL

A place to organize what I learn while working on projects involving Bigdata, the Hadoop ecosystem, Semantic IoT, and similar topics.
It is open to anyone who may find it useful. Please send inquiries to gooper@gooper.com.


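Loading 123,534,991 rows of data into HBase through Hive.

After almost 5 hours, the error messages below appeared.

Shortly before that the disk had filled up, so I cleaned up some files and let the job continue. Did that cause the problem?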

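---->

While the disk was full, the session with ZooKeeper timed out and ZooKeeper deleted the corresponding znode.

HMaster then sent requests to ZooKeeper using the old session, but the znode no longer existed, so HMaster went down.

HRegionServer went down as well, and this chain reaction caused the job to fail.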
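
The master log further down in this post contains Sleeper warnings ("We slept 1791593ms instead of 10000ms") next to sessionTimeout=180000, which fits the explanation above: the process stalled for far longer than the ZooKeeper session timeout. Below is a minimal Python sketch for spotting such stalls in a log. It assumes the log lines are available as strings; the function name long_pauses and the default of 180000 ms (taken from the log in this post) are illustrative choices, not part of HBase.

```python
import re

# Matches HBase Sleeper warnings such as
# "... Sleeper: We slept 1791593ms instead of 10000ms, ..."
SLEEPER_RE = re.compile(r"We slept (\d+)ms instead of (\d+)ms")


def long_pauses(log_lines, session_timeout_ms=180000):
    """Return (slept_ms, expected_ms) pairs whose pause exceeds the session timeout.

    session_timeout_ms defaults to the sessionTimeout value seen in this
    post's master log; adjust it to the value configured on your cluster.
    """
    pauses = []
    for line in log_lines:
        match = SLEEPER_RE.search(line)
        if match is None:
            continue
        slept, expected = int(match.group(1)), int(match.group(2))
        if slept > session_timeout_ms:
            pauses.append((slept, expected))
    return pauses
```

Any pause returned here is long enough for ZooKeeper to have expired the session, so the daemons should be expected to abort afterwards.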

Running start-hbase.sh brings HMaster up briefly before it goes down again. This happens because HMaster cannot connect to ZooKeeper and therefore cannot create its znode.

(After starting hbase zkcli, ls /hbase/table returns nothing, although the table names should be listed there.)


--> The end result is that HBase appears to be broken. In this case, running the command below restores it.

(hbase hbck -fixMeta -fixAssignments or hbase hbck -repair)

-- Sometimes the command fails with errors and does not complete properly. When that happens, stop the regionserver and start it again.

(hbase-daemon.sh stop regionserver, then hbase-daemon.sh start regionserver)
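
One way to tell whether an hbck run actually completed is to look at its final status line. Below is a minimal Python sketch. It assumes the hbck output has already been captured as text and that the result is reported on a line beginning with "Status:" (this post relies on "Status: OK" near the end of the output). The function names hbck_status and needs_rerun are hypothetical helpers, not part of HBase.

```python
def hbck_status(hbck_output):
    """Return the text after the last 'Status:' line in hbck output, or None."""
    for line in reversed(hbck_output.splitlines()):
        line = line.strip()
        if line.startswith("Status:"):
            return line[len("Status:"):].strip()
    return None  # no status line found, e.g. the run was interrupted


def needs_rerun(hbck_output):
    """Return True unless the last status line reported by hbck is 'OK'."""
    return hbck_status(hbck_output) != "OK"
```

If needs_rerun returns True, either the run did not reach its summary or it reported a status other than OK, and the restart-and-retry procedure described above applies.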

 

To summarize:

 

1. In /etc/hosts, comment out the 127.0.1.1 entry and enter the real IP address.

127.0.0.1       localhost
#127.0.1.1      bigdata-host
#127.0.0.1      bigdata-host
192.168.8.5     bigdata-host

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters


2. Record bigdata-host in the hbase/conf/regionservers file.

3. Restart the regionserver.

  a. hbase-daemon.sh stop regionserver

  b. hbase-daemon.sh start regionserver

4. Run hbase hbck -fixMeta -fixAssignments.

(If the command does not return to the prompt, or the end of the output does not show Status: OK, repeat step 4.)

(Both HMaster and HRegionServer must appear in the jps output.)
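
For step 1, here is a small Python sketch that reports whether a hostname is still mapped to a loopback address in hosts-file text. The function name loopback_mappings is hypothetical. It assumes the intent of step 1 is for bigdata-host to resolve to the real IP (192.168.8.5 in this post) rather than to 127.0.1.1.

```python
import ipaddress


def loopback_mappings(hosts_text, hostname):
    """Return the loopback addresses that hostname is mapped to in hosts-file text."""
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        if hostname not in names:
            continue
        try:
            if ipaddress.ip_address(addr).is_loopback:
                hits.append(addr)
        except ValueError:
            continue  # first field is not a plain IP literal; skip it
    return hits
```

Calling loopback_mappings(open("/etc/hosts").read(), "bigdata-host") on the machine gives an empty list when no active line maps the hostname to loopback.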
 

-------------------------------------jps------------------------------------

hadoop@bigdata-host:~/hbase/logs$ jps
12180 Jps
565 DataNode
1849 HQuorumPeer
907 JobTracker
819 SecondaryNameNode
11834 RunJar
1276 TaskTracker
29547 Main
300 NameNode
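
The listing above contains neither HMaster nor HRegionServer. Below is a minimal Python sketch of that check. It assumes the jps output has been captured as text; missing_daemons is a hypothetical helper name, and the default daemon names are the two this post says must be present.

```python
def missing_daemons(jps_output, required=("HMaster", "HRegionServer")):
    """Return the required daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # jps prints "<pid> <main class name>"; skip prompts and other lines
        if len(parts) >= 2 and parts[0].isdigit():
            running.add(parts[1])
    return [name for name in required if name not in running]
```

An empty result means both daemons are running; anything else names the process that still has to be started.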

------------------------------------------------------Log from the Hive console------------------------------------------------------------------------

.......

2014-04-29 13:03:40,699 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:41,795 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:42,826 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:43,852 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:44,867 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:45,876 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:46,949 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:48,033 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:49,046 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:33:38,482 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:39,781 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:41,036 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:42,160 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:43,166 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:44,188 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:45,229 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:46,320 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:47,372 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:48,422 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
.........

2014-04-29 13:40:05,777 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 911.95 sec
2014-04-29 13:40:06,785 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:40:07,803 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:40:08,809 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:40:09,822 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:40:10,841 Stage-0 map = 100%,  reduce = 100%, Cumulative CPU 906.53 sec
MapReduce Total cumulative CPU time: 15 minutes 6 seconds 530 msec
Ended Job = job_201404290915_0002 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201404290915_0002
Examining task ID: task_201404290915_0002_m_000045 (and more) from job job_201404290915_0002
Examining task ID: task_201404290915_0002_m_000006 (and more) from job job_201404290915_0002

Task with the most failures(4):
-----
Task ID:
  task_201404290915_0002_m_000006

URL:
  http://localhost:50030/taskdetails.jsp?jobid=job_201404290915_0002&tipid=task_201404290915_0002_m_000006
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"year":"1992","month":"10","dayofmonth":"3","dayofweek":"6","deptime":"1859","crsdeptime":"1859","arrtime":"2027","crsarrtime":"2034","uniquecarrier":"AA","flightnum":"701","tailnum":"NA","actualelapsedtime":"148","crselapsedtime":"155","airtime":"NA","arrdelay":"-7","depdelay":"0","origin":"CMH","dest":"DFW","distance":"927","taxiin":"NA","taxiout":"NA","cancelled":"0","cancellationcode":"NA","diverted":"0","carrierdelay":"NA","weatherdelay":"NA","nasdelay":"NA","securitydelay":"NA","lateaircraftdelay":"NA"}
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"year":"1992","month":"10","dayofmonth":"3","dayofweek":"6","deptime":"1859","crsdeptime":"1859","arrtime":"2027","crsarrtime":"2034","uniquecarrier":"AA","flightnum":"701","tailnum":"NA","actualelapsedtime":"148","crselapsedtime":"155","airtime":"NA","arrdelay":"-7","depdelay":"0","origin":"CMH","dest":"DFW","distance":"927","taxiin":"NA","taxiout":"NA","cancelled":"0","cancellationcode":"NA","diverted":"0","carrierdelay":"NA","weatherdelay":"NA","nasdelay":"NA","securitydelay":"NA","lateaircraftdelay":"NA"}
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for h_airline,,99999999999999 after 10 tries.
 at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:240)
 at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:515)
 at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:571)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
 at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
 at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:652)
 ... 9 more
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for h_airline,,99999999999999 after 10 tries.
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:980)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:885)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:174)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:133)
 at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat.getHiveRecordWriter(HiveHBaseTableOutputFormat.java:82)
 at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:250)
 at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237)
 ... 20 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 44   Cumulative CPU: 906.53 sec   HDFS Read: 1509838444 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 15 minutes 6 seconds 530 msec

 

-------------------------------------------hbase-hadoop-master-bigdata-host.log--------------------------------------------------------------

2014-04-29 12:56:57,247 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 10 catalog row(s) and gc'd 0 unreferenced parent region(s)
2014-04-29 13:01:57,216 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62931a92
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=2 average=2.0 mostloaded=2 leastloaded=2
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=4 average=4.0 mostloaded=4 leastloaded=4
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2014-04-29 13:01:57,284 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 10 catalog row(s) and gc'd 0 unreferenced parent region(s)
2014-04-29 13:33:36,780 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1791593ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,789 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1795938ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,781 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1787840ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,819 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1899402ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,781 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1899462ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,781 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1796872ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,781 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1787532ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,919 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62931a92
2014-04-29 13:33:36,986 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 1827950ms for sessionid 0x145aad6efba0002, closing socket connection and attempting reconnect
.........

2014-04-29 13:33:39,220 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2014-04-29 13:33:40,024 ERROR org.apache.hadoop.hbase.master.HMaster: Region server ^@^@bigdata-host,60020,1398730589059 reported a fatal error:
ABORTING region server bigdata-host,60020,1398730589059: regionserver:60020-0x145aad6efba0001 regionserver:60020-0x145aad6efba0001 received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

2014-04-29 13:33:40,406 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x145aad6efba0002 has expired, closing socket connection
2014-04-29 13:33:40,412 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.
2014-04-29 13:33:40,589 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZK session expired. This disconnect could have been caused by a network partition or a long-running GC pause, either way it's recommended that you verify your environment.
2014-04-29 13:33:40,505 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x145aad6efba0000 has expired, closing socket connection
2014-04-29 13:33:40,642 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2014-04-29 13:33:40,658 INFO org.apache.hadoop.hbase.master.HMaster: Primary Master trying to recover from ZooKeeper session expiry.
2014-04-29 13:33:40,669 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Closing dead ZooKeeper connection, session was: 0x145aad6efba0000
2014-04-29 13:33:40,683 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-04-29 13:33:40,707 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=master:60000-0x145aad6efba0000
2014-04-29 13:33:41,119 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Recreated a ZooKeeper, session is: 0x0
2014-04-29 13:33:41,126 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-04-29 13:33:41,136 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2014-04-29 13:33:41,186 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x145aad6efba0013, negotiated timeout = 180000
2014-04-29 13:33:41,386 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/bigdata-host,60000,1398730591004 from backup master directory
2014-04-29 13:33:41,418 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/backup-masters/bigdata-host,60000,1398730591004 already deleted, and this is not a retry
2014-04-29 13:33:41,419 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Master=bigdata-host,60000,1398730591004
2014-04-29 13:33:41,428 INFO org.apache.hadoop.hbase.master.SplitLogManager: timeout = 300000
2014-04-29 13:33:41,433 INFO org.apache.hadoop.hbase.master.SplitLogManager: unassigned timeout = 180000
2014-04-29 13:33:41,433 INFO org.apache.hadoop.hbase.master.SplitLogManager: resubmit threshold = 3
2014-04-29 13:33:41,444 INFO org.apache.hadoop.hbase.master.SplitLogManager: found 0 orphan tasks and 0 rescan nodes
2014-04-29 13:33:42,258 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@3242e74f
2014-04-29 13:33:42,402 INFO org.apache.hadoop.hbase.master.HMaster: Server active/primary master; bigdata-host,60000,1398730591004, sessionid=0x145aad6efba0013, cluster-up flag was=true
2014-04-29 13:33:42,432 INFO org.apache.hadoop.hbase.master.snapshot.SnapshotManager: Snapshot feature is not enabled, missing log and hfile cleaners.
......................

2014-04-29 13:38:40,922 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=-ROOT-, metaLocation={region=-ROOT-,,0.70236052, hostname=bigdata-host, port=60020}, attempt=3 of 140 failed; retrying after sleep of 2008 because: Connection refused
2014-04-29 13:38:40,925 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62931a92; serverName=bigdata-host,60020,1398730589059
2014-04-29 13:38:41,293 WARN org.apache.hadoop.hbase.master.SplitLogManager: Expected at least4 tasks in ZK, but actually there are 0
2014-04-29 13:38:41,294 WARN org.apache.hadoop.hbase.master.SplitLogManager: No more task remaining (ZK or task map), splitting should have completed. Remaining tasks in ZK 0, active tasks in map 4
2014-04-29 13:38:41,294 WARN org.apache.hadoop.hbase.master.SplitLogManager: Interrupted while waiting for log splits to be completed
2014-04-29 13:38:41,294 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://localhost:9000/hbase/.logs/bigdata-host,60020,1398730589059-splitting] installed = 4 but only 0 done
2014-04-29 13:38:41,319 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x145aad6efba0000 master:60000-0x145aad6efba0000 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2014-04-29 13:38:41,346 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2014-04-29 13:38:41,346 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
2014-04-29 13:38:41,347 INFO org.apache.hadoop.hbase.master.HMaster$2: bigdata-host,60000,1398730591004-BalancerChore exiting
2014-04-29 13:38:41,347 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: exiting
2014-04-29 13:38:41,347 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
java.io.IOException: Giving up after tries=1
        at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:210)
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:188)
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:82)
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:67)
        at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:126)
        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:137)
        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:93)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:207)
        ... 8 more
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
................................

2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: exiting
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: exiting
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: exiting
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: exiting
2014-04-29 13:38:41,352 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1 on 60000: exiting
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: exiting
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: exiting
2014-04-29 13:38:41,354 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2 on 60000: exiting
2014-04-29 13:38:41,354 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60000: exiting
2014-04-29 13:38:41,355 INFO org.apache.hadoop.hbase.master.CatalogJanitor: bigdata-host,60000,1398730591004-CatalogJanitor exiting
2014-04-29 13:38:41,356 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-04-29 13:38:41,358 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2014-04-29 13:38:41,358 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2014-04-29 13:38:41,379 INFO org.apache.hadoop.hbase.master.cleaner.LogCleaner: master-bigdata-host,60000,1398730591004.oldLogCleaner exiting
2014-04-29 13:38:41,379 INFO org.apache.hadoop.hbase.master.cleaner.HFileCleaner: master-bigdata-host,60000,1398730591004.archivedHFileCleaner exiting
2014-04-29 13:38:41,379 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
2014-04-29 13:38:41,419 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2014-04-29 13:38:41,487 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@3242e74f
2014-04-29 13:38:41,495 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimerUpdater: bigdata-host,60000,1398730591004.timerUpdater exiting
2014-04-29 13:38:41,500 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-04-29 13:38:41,500 INFO org.apache.zookeeper.ZooKeeper: Session: 0x145aad6efba0013 closed
2014-04-29 13:38:41,500 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting
2014-04-29 13:38:41,504 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: HMaster Aborted
java.lang.RuntimeException: HMaster Aborted
        at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2129)

 

----------------------------------------------------------------If HMaster and HRegionServer are both alive but the message below keeps repeating, run the recovery command.-----------------------------------------------

 

2014-04-30 14:06:58,098 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 139472 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
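
The repeated message can also be detected mechanically. Below is a minimal Python sketch that pulls the counters out of a ServerManager line like the one above and reports whether the master is still waiting. The regular expression is written against the wording of that line only, and waiting_for_regionservers is a hypothetical helper name.

```python
import re

# Written against the ServerManager message quoted in this post.
SETTLE_RE = re.compile(
    r"currently checked in (\d+), slept for (\d+) ms, expecting minimum of (\d+)"
)


def waiting_for_regionservers(line):
    """Return (checked_in, slept_ms, minimum) while too few region servers
    have checked in, otherwise None."""
    match = SETTLE_RE.search(line)
    if match is None:
        return None
    checked_in, slept_ms, minimum = (int(g) for g in match.groups())
    if checked_in < minimum:
        return (checked_in, slept_ms, minimum)
    return None
```

A non-None result whose slept_ms keeps growing across successive log lines corresponds to the situation described above.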

insert hbase by hive ... error occurred after 5 hours: how to recover when HMaster does not come up (총관리자, 2014.04.29)
