메뉴 건너뛰기

Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.


1. flume설치파일 다운로드

apache-flume-1.5.2-bin.tar.gz


2. 압축풀기

tar xvfz apache-flume-1.5.2-bin.tar.gz


3. 링크생성

ln -s apache-flume-1.5.2-bin flume


4. 환경변수 수정(vi /home/hadoop/.bashrc)

export FLUME_HOME=/hadoop/flume

export PATH=$PATH:$FLUME_HOME/bin

* 변경사항 반영 : source /home/hadoop/.bashrc


5. Flume Conf

  cd $FLUME_HOME/conf

  cp flume-conf.properties.template flume.conf

  vi flume.conf


agent.sources = seqGenSrc

agent.channels = memoryChannel

agent.sinks = hdfsSink


# For each one of the sources, the type is defined

agent.sources.seqGenSrc.type = exec

agent.sources.seqGenSrc.command = tail -F /home/bigdata/hadoop-1.2.1/logs/hadoop-hadoop-namenode-localhost.localdomain.log

#가상분산환경에서 테스트용으로 잡은것.


# The channel can be defined as follows.

agent.sources.seqGenSrc.channels = memoryChannel


# Each sink's type must be defined

agent.sinks.hdfsSink.type = hdfs

agent.sinks.hdfsSink.hdfs.path = hdfs://mycluster/flume/data     #테스트용

agent.sinks.hdfsSink.rollInterval = 30

agent.sinks.hdfsSink.sink.batchSize = 100


#Specify the channel the sink should use

agent.sinks.hdfsSink.channel = memoryChannel


# Each channel's type is defined.

agent.channels.memoryChannel.type = memory


# Other config values specific to each type of channel(sink or source)

# can be defined as well

# In this case, it specifies the capacity of the memory channel

agent.channels.memoryChannel.capacity = 100000

agent.channels.memoryChannel.transactionCapacity = 10000


6. agent기동

[hadoop@master]$ flume-ng agent -conf-file ./flume.conf --name agent

Info: Including Hadoop libraries found via (/usr/local/flume/bin/hadoop) for HDFS access

Info: Excluding /usr/local/flume/share/usr/local/common/lib/slf4j-api-1.7.5.jar from classpath

Info: Excluding /usr/local/flume/share/usr/local/common/lib/slf4j-log4j12-1.7.5.jar from classpath

Info: Including HBASE libraries found via (/usr/local/hbase/bin/hbase) for HBASE access

Info: Excluding /usr/local/hbase/lib/slf4j-api-1.6.4.jar from classpath

Info: Excluding /usr/local/hbase/lib/slf4j-log4j12-1.6.4.jar from classpath

Info: Excluding /usr/local/flume/share/usr/local/common/lib/slf4j-api-1.7.5.jar from classpath

Info: Excluding /usr/local/flume/share/usr/local/common/lib/slf4j-log4j12-1.7.5.jar from classpath

.....

15/05/21 17:38:57 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting

15/05/21 17:38:57 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:./flume.conf

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Processing:hdfsSink

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Processing:hdfsSink

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Added sinks: hdfsSink Agent: agent

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Processing:hdfsSink

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Processing:hdfsSink

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Processing:hdfsSink

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent]

15/05/21 17:38:57 INFO node.AbstractConfigurationProvider: Creating channels

15/05/21 17:38:57 INFO channel.DefaultChannelFactory: Creating instance of channel memoryChannel type memory

15/05/21 17:38:57 INFO node.AbstractConfigurationProvider: Created channel memoryChannel

15/05/21 17:38:57 INFO source.DefaultSourceFactory: Creating instance of source seqGenSrc, type exec

15/05/21 17:38:57 INFO sink.DefaultSinkFactory: Creating instance of sink: hdfsSink, type: hdfs

15/05/21 17:38:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

15/05/21 17:38:58 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false

15/05/21 17:38:58 INFO node.AbstractConfigurationProvider: Channel memoryChannel connected to [seqGenSrc, hdfsSink]

15/05/21 17:38:58 INFO node.Application: Starting new configuration:{ sourceRunners:{seqGenSrc=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:seqGenSrc,state:IDLE} }} sinkRunners:{hdfsSink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3b201837 counterGroup:{ name:null counters:{} } }} channels:{memoryChannel=org.apache.flume.channel.MemoryChannel{name: memoryChannel}} }

15/05/21 17:38:58 INFO node.Application: Starting Channel memoryChannel

15/05/21 17:38:58 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: memoryChannel: Successfully registered new MBean.

15/05/21 17:38:58 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memoryChannel started

15/05/21 17:38:58 INFO node.Application: Starting Sink hdfsSink

15/05/21 17:38:58 INFO node.Application: Starting Source seqGenSrc


7. hdfs확인

cat aaa >> /home/hadoop/test.log로 데이타를 넣고 아래 명령으로 확인해본다.


[hadoop@master]$ hadoop fs -lsr /flume

drwxr-xr-x   - hadoop supergroup          0 2015-05-21 17:39 /flume/data

-rw-r--r--   3 hadoop supergroup        208 2015-05-21 17:39 /flume/data/FlumeData.1432197542415

번호 제목 날짜 조회 수
630 spark submit용 jar파일을 만드는 sbt 용 build.sbt설정 파일(참고용) 2016.08.19 4794
629 AIX 7.1에 Python 2.7.11설치하기 2016.10.06 4780
628 [Hive canary]Hive에 Metastore canary red alert및 hive log파일에 Duplicate entry '123456' for key 'NOTIFICATION_LOG_EVENT_ID'가 발생시 조치사항 2023.03.10 4777
627 Hadoop 설치 및 시작하기 file 2013.03.06 4776
626 hbase CustomFilter만들기 (0.98.X이상) 2015.05.08 4774
625 beeline실행시 User: root is not allowed to impersonate오류 발생시 조치사항 2016.06.03 4749
624 [shellscript]엑셀파일에서 여러줄에 존재하는 단어를 한줄의 문자열로 합치는 방법(comma로 구분) 2019.07.15 4735
623 9대가 hbase cluster로 구성된 서버에서 테스트 data를 halyard에 적재하고 테스트 하는 방법및 절차 2017.07.21 4733
622 physical memory used되면서 mapper가 kill되는 경우 오류 발생시 조치 2018.09.20 4730
621 [EncryptionZone]User:hdfs not allowed to do 'DECRYPT_EEK on 'enc_key'오류 2023.11.02 4718
620 solr설치및 적용관련 file 2014.09.27 4718
619 zookeeper 3.4.6 설치(3대) 2015.04.28 4710
618 hive metadata(hive, impala, kudu 정보가 있음) 테이블에서 db, table, owner, location를 조회하는 쿼리 2020.02.07 4707
617 기준일자 이전의 hdfs 데이타를 지우는 shellscript 샘플 2019.06.14 4705
616 서버 5대에 solr 5.5.0 설치하고 index data를 HDFS에 저장/search하도록 설치/설정하는 방법 2016.04.08 4704
615 hive테이블의 물리적인 위치인 HDFS에 여러개의 데이터 파일이 존재할때 한개의 파일로 merge하여 동일한 테이블에 입력하는 방법 2019.05.23 4701
614 Cloudera Manager설치및 Uninstall 방법(순서) 2018.05.28 4701
613 Ubuntu 16.04 LTS에 MariaDB 10.1설치 및 포트변경 및 원격접속 허용 2017.05.01 4698
612 [CDP7.1.3]Ranger WebUI에서 Error! Connection refused: Please check the KMS provider URL and whether the Ranager KMS is running발생시 조치 방법 2023.06.07 4687
611 hadoop cluster에 포함된 노드중에서 문제있는 decommission하는 방법및 절차 file 2017.12.28 4677
위로