메뉴 건너뛰기

Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.


1. flume설치파일 다운로드

apache-flume-1.5.2-bin.tar.gz


2. 압축풀기

tar xvfz apache-flume-1.5.2-bin.tar.gz


3. 링크생성

ln -s apache-flume-1.5.2-bin flume


4. 환경변수 수정(vi /home/hadoop/.bashrc)

export FLUME_HOME=/hadoop/flume

export PATH=$PATH:$FLUME_HOME/bin

* 변경사항 반영 : source /home/hadoop/.bashrc


5. Flume Conf

  cd $FLUME_HOME/conf

  cp flume-conf.properties.template flume.conf

  vi flume.conf


agent.sources = seqGenSrc

agent.channels = memoryChannel

agent.sinks = hdfsSink


# For each one of the sources, the type is defined

agent.sources.seqGenSrc.type = exec

agent.sources.seqGenSrc.command = tail -F /home/bigdata/hadoop-1.2.1/logs/hadoop-hadoop-namenode-localhost.localdomain.log

#가상분산환경에서 테스트용으로 잡은것.


# The channel can be defined as follows.

agent.sources.seqGenSrc.channels = memoryChannel


# Each sink's type must be defined

agent.sinks.hdfsSink.type = hdfs

agent.sinks.hdfsSink.hdfs.path = hdfs://mycluster/flume/data     #테스트용

agent.sinks.hdfsSink.rollInterval = 30

agent.sinks.hdfsSink.sink.batchSize = 100


#Specify the channel the sink should use

agent.sinks.hdfsSink.channel = memoryChannel


# Each channel's type is defined.

agent.channels.memoryChannel.type = memory


# Other config values specific to each type of channel(sink or source)

# can be defined as well

# In this case, it specifies the capacity of the memory channel

agent.channels.memoryChannel.capacity = 100000

agent.channels.memoryChannel.transactionCapacity = 10000


6. agent기동

[hadoop@master]$ flume-ng agent -conf-file ./flume.conf --name agent

Info: Including Hadoop libraries found via (/usr/local/flume/bin/hadoop) for HDFS access

Info: Excluding /usr/local/flume/share/usr/local/common/lib/slf4j-api-1.7.5.jar from classpath

Info: Excluding /usr/local/flume/share/usr/local/common/lib/slf4j-log4j12-1.7.5.jar from classpath

Info: Including HBASE libraries found via (/usr/local/hbase/bin/hbase) for HBASE access

Info: Excluding /usr/local/hbase/lib/slf4j-api-1.6.4.jar from classpath

Info: Excluding /usr/local/hbase/lib/slf4j-log4j12-1.6.4.jar from classpath

Info: Excluding /usr/local/flume/share/usr/local/common/lib/slf4j-api-1.7.5.jar from classpath

Info: Excluding /usr/local/flume/share/usr/local/common/lib/slf4j-log4j12-1.7.5.jar from classpath

.....

15/05/21 17:38:57 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting

15/05/21 17:38:57 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:./flume.conf

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Processing:hdfsSink

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Processing:hdfsSink

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Added sinks: hdfsSink Agent: agent

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Processing:hdfsSink

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Processing:hdfsSink

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Processing:hdfsSink

15/05/21 17:38:57 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent]

15/05/21 17:38:57 INFO node.AbstractConfigurationProvider: Creating channels

15/05/21 17:38:57 INFO channel.DefaultChannelFactory: Creating instance of channel memoryChannel type memory

15/05/21 17:38:57 INFO node.AbstractConfigurationProvider: Created channel memoryChannel

15/05/21 17:38:57 INFO source.DefaultSourceFactory: Creating instance of source seqGenSrc, type exec

15/05/21 17:38:57 INFO sink.DefaultSinkFactory: Creating instance of sink: hdfsSink, type: hdfs

15/05/21 17:38:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

15/05/21 17:38:58 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false

15/05/21 17:38:58 INFO node.AbstractConfigurationProvider: Channel memoryChannel connected to [seqGenSrc, hdfsSink]

15/05/21 17:38:58 INFO node.Application: Starting new configuration:{ sourceRunners:{seqGenSrc=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:seqGenSrc,state:IDLE} }} sinkRunners:{hdfsSink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3b201837 counterGroup:{ name:null counters:{} } }} channels:{memoryChannel=org.apache.flume.channel.MemoryChannel{name: memoryChannel}} }

15/05/21 17:38:58 INFO node.Application: Starting Channel memoryChannel

15/05/21 17:38:58 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: memoryChannel: Successfully registered new MBean.

15/05/21 17:38:58 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memoryChannel started

15/05/21 17:38:58 INFO node.Application: Starting Sink hdfsSink

15/05/21 17:38:58 INFO node.Application: Starting Source seqGenSrc


7. hdfs확인

cat aaa >> /home/hadoop/test.log로 데이타를 넣고 아래 명령으로 확인해본다.


[hadoop@master]$ hadoop fs -lsr /flume

drwxr-xr-x   - hadoop supergroup          0 2015-05-21 17:39 /flume/data

-rw-r--r--   3 hadoop supergroup        208 2015-05-21 17:39 /flume/data/FlumeData.1432197542415

번호 제목 날짜 조회 수
150 Ubuntu 16.04 LTS에 MariaDB 10.1설치 및 포트변경 및 원격접속 허용 2017.05.01 4700
149 Cloudera Manager설치및 Uninstall 방법(순서) 2018.05.28 4701
148 hive테이블의 물리적인 위치인 HDFS에 여러개의 데이터 파일이 존재할때 한개의 파일로 merge하여 동일한 테이블에 입력하는 방법 2019.05.23 4702
147 기준일자 이전의 hdfs 데이타를 지우는 shellscript 샘플 2019.06.14 4707
146 hive metadata(hive, impala, kudu 정보가 있음) 테이블에서 db, table, owner, location를 조회하는 쿼리 2020.02.07 4709
145 zookeeper 3.4.6 설치(3대) 2015.04.28 4716
144 solr설치및 적용관련 file 2014.09.27 4718
143 [EncryptionZone]User:hdfs not allowed to do 'DECRYPT_EEK on 'enc_key'오류 2023.11.02 4719
142 로그 파일에 대해 Elasticsearch 사용하기 2014.09.25 4720
141 Nodes of the cluster (unhealthy)중 1/1 log-dirs are bad: 오류 해결방법 2015.05.17 4722
140 [Hue admin]Add/Sync LDAP user, Sync LDAP users/groups 버튼 기능 설명 2023.08.09 4725
139 console명령과 API비교 2015.12.21 4731
138 physical memory used되면서 mapper가 kill되는 경우 오류 발생시 조치 2018.09.20 4732
137 9대가 hbase cluster로 구성된 서버에서 테스트 data를 halyard에 적재하고 테스트 하는 방법및 절차 2017.07.21 4735
136 [shellscript]엑셀파일에서 여러줄에 존재하는 단어를 한줄의 문자열로 합치는 방법(comma로 구분) 2019.07.15 4736
135 core 'gc_shard3_replica2' is already locked라는 오류가 발생할때 조치사항 2017.09.14 4739
134 beeline실행시 User: root is not allowed to impersonate오류 발생시 조치사항 2016.06.03 4749
133 Hadoop 설치 및 시작하기 file 2013.03.06 4776
132 AIX 7.1에 Python 2.7.11설치하기 2016.10.06 4781
131 hbase CustomFilter만들기 (0.98.X이상) 2015.05.08 4794
위로