메뉴 건너뛰기

Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.


LA가 전송한 로그 파일을 구별하여 각각의 파일로 저장하는 경우의 설정방법

 

 가. 수집되는 파일 :  /home/hadoop/log_data/log1.log, /home/hadoop/log_data/log1.log

 나. 저장되는 파일 : /home/hadoop/save_data/fs01, /home/hadoop/save_data/fs01

 다. 파일의 구분키 : state

 라. 구분값 : SMS, VOICE

 

1. 로그를 수집하는 LC의 flume설정파일(hadoop@bigdata-host:~/flume/conf$ cat flume-conf-multi.properties)

lc01.sources = avroGenSrc_src01
lc01.channels = fileChannel_fc01 fileChannel_fc02
lc01.sources.avroGenSrc_src01.selector.type=multiplexing
lc01.sources.avroGenSrc_src01.selector.header = state
lc01.sources.avroGenSrc_src01.selector.mapping.SMS=fileChannel_fc01
lc01.sources.avroGenSrc_src01.selector.mapping.VOICE=fileChannel_fc02

lc01.sinks = fileSink_fs01 fileSink_fs02

# For each one of the sources, the type is defined
lc01.sources.avroGenSrc_src01.type = avro
lc01.sources.avroGenSrc_src01.bind = localhost
lc01.sources.avroGenSrc_src01.port = 5555

# The channel can be defined as follows.
lc01.sources.avroGenSrc_src01.channels = fileChannel_fc01 fileChannel_fc02

# Each sink's type must be defined
lc01.sinks.fileSink_fs01.type = file_roll
lc01.sinks.fileSink_fs01.sink.directory=/home/hadoop/save_data/fs01
lc01.sinks.fileSink_fs01.sink.rollInterval = 10
lc01.sinks.fileSink_fs01.sink.batchSize = 10

#Specify the channel the sink should use
lc01.sinks.fileSink_fs01.channel = fileChannel_fc01

# Each channel's type is defined.
lc01.channels.fileChannel_fc01.type = file
lc01.channels.fileChannel_fc01.maxFileSize = 214643507
lc01.channels.fileChannel_fc01.checkpointDir = /home/hadoop/flume/fc01/checkpoint
lc01.channels.fileChannel_fc01.dataDirs = /home/hadoop/flume/fc01/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
lc01.channels.fileChannel_fc01.capacity = 100
lc01.channels.fileChannel_fc01.transactionCapacity = 10


# another setting
#lc02.sources = avroGenSrc_src02
#lc02.channels = fileChannel_fc02
#lc02.sinks = fileSink_fs02

# For each one of the sources, the type is defined
#lc02.sources.avroGenSrc_src02.type = avro
#lc02.sources.avroGenSrc_src02.bind = localhost
#lc02.sources.avroGenSrc_src02.port = 4444

# The channel can be defined as follows.
#lc01.sources.avroGenSrc_src01.channels = fileChannel_fc02
# Each sink's type must be defined
lc01.sinks.fileSink_fs02.type = file_roll
lc01.sinks.fileSink_fs02.sink.directory=/home/hadoop/save_data/fs02
lc01.sinks.fileSink_fs02.sink.rollInterval = 10
lc01.sinks.fileSink_fs02.sink.batchSize = 10

#Specify the channel the sink should use
lc01.sinks.fileSink_fs02.channel = fileChannel_fc02

# Each channel's type is defined.
lc01.channels.fileChannel_fc02.type = file
lc01.channels.fileChannel_fc02.maxFileSize = 214643507
lc01.channels.fileChannel_fc02.checkpointDir = /home/hadoop/flume/fc02/checkpoint
lc01.channels.fileChannel_fc02.dataDirs = /home/hadoop/flume/fc02/data
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
lc01.channels.fileChannel_fc02.capacity = 100
lc01.channels.fileChannel_fc02.transactionCapacity =10

--------------------------------

2. 로그를 전송하는 LA의 설정정보(hadoop@bigdata-host:~/flume/conf$ cat flume-conf-multi-agent.properties)

: la01과 la02의 두개 설정정보가 같이 들어있고 각각을 기동하여 2개의 LA가 파일을 읽어 들어는것으로 가정함


la01.sources = execGenSrc_la01
la01.channels = fileChannel_la01
la01.sinks = avroSink_la01

# For each one of the sources, the type is defined
la01.sources.execGenSrc_la01.type = exec
la01.sources.execGenSrc_la01.command = tail -f /home/hadoop/log_data/log1.log
la01.sources.execGenSrc_la01.batchSize = 10

la01.sources.execGenSrc_la01.interceptors = i1
la01.sources.execGenSrc_la01.interceptors.i1.type=static
la01.sources.execGenSrc_la01.interceptors.i1.key=state
la01.sources.execGenSrc_la01.interceptors.i1.value = SMS

# The channel can be defined as follows.
la01.sources.execGenSrc_la01.channels = fileChannel_la01

# Each sink's type must be defined
la01.sinks.avroSink_la01.type = avro
la01.sinks.avroSink_la01.hostname=localhost
la01.sinks.avroSink_la01.port=5555
la01.sinks.avroSink_la01.batch-size = 10

#Specify the channel the sink should use
la01.sinks.avroSink_la01.channel = fileChannel_la01

# Each channel's type is defined.
la01.channels.fileChannel_la01.type = file
la01.channels.fileChannel_la01.maxFileSize = 214643507
la01.channels.fileChannel_la01.checkpointDir = /home/hadoop/flume/la01/checkpoint
la01.channels.fileChannel_la01.dataDirs = /home/hadoop/flume/la01/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
la01.channels.fileChannel_la01.capacity = 10000
la01.channels.fileChannel_la01.transctionCapacity = 10000


#another logagent conf....
la02.sources = execGenSrc_la02
la02.channels = fileChannel_la02
la02.sinks = avroSink_la02

# For each one of the sources, the type is defined
la02.sources.execGenSrc_la02.type = exec
la02.sources.execGenSrc_la02.command = tail -f /home/hadoop/log_data/log2.log
la02.sources.execGenSrc_la02.batchSize = 10
la02.sources.execGenSrc_la02.interceptors = i2
la02.sources.execGenSrc_la02.interceptors.i2.type=static
la02.sources.execGenSrc_la02.interceptors.i2.key= state
la02.sources.execGenSrc_la02.interceptors.i2.value = VOICE


# The channel can be defined as follows.
la02.sources.execGenSrc_la02.channels = fileChannel_la02

# Each sink's type must be defined
la02.sinks.avroSink_la02.type = avro
la02.sinks.avroSink_la02.hostname=localhost
la02.sinks.avroSink_la02.port=5555
la02.sinks.avroSink_la02.batch-size = 10

#Specify the channel the sink should use
la02.sinks.avroSink_la02.channel = fileChannel_la02

# Each channel's type is defined.
la02.channels.fileChannel_la02.type = file
la02.channels.fileChannel_la02.maxFileSize = 214643507
la02.channels.fileChannel_la02.checkpointDir = /home/hadoop/flume/la02/checkpoint
la02.channels.fileChannel_la02.dataDirs = /home/hadoop/flume/la02/data

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the file channel
la02.channels.fileChannel_la02.capacity = 10000
la02.channels.fileChannel_la02.transctionCapacity = 10000

번호 제목 날짜 조회 수
427 import 혹은 export할때 hive파일의 default 구분자는 --input-fields-terminated-by "x01"와 같이 지정해야함 2014.05.20 6701
426 Last transaction was partial에 따른 Unable to load database on disk오류 발생시 조치사항 2018.08.03 6652
425 hadoop 2.6.0 기동(에코시스템 포함)및 wordcount 어플리케이션을 이용한 테스트 2015.05.05 6533
» 다수의 로그 에이전트로 부터 로그를 받아 각각의 파일로 저장하는 방법(interceptor및 multiplexing) 2014.04.04 6439
423 hadoop및 ecosystem에서 사용되는 명령문 정리 2014.05.28 6435
422 [impala]insert into db명.table명 select a, b from db명.table명 쿼리 수행시 "Memory limit exceeded: Failed to allocate memory for Parquet page index"오류 조치 방법 2023.05.31 6326
421 특정파일이 생성되어야 action이 실행되는 oozie job만들기(coordinator.xml) 2014.05.20 6260
420 Hadoop Cluster 설치 (Hadoop+Zookeeper+Hbase) file 2013.03.07 6097
419 HBase 설치하기 – Fully-distributed 2013.03.12 6013
418 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from...오류해결방법 2015.06.16 5943
417 Hbase Shell 명령 정리 2013.04.01 5924
416 [CDP7.1.7]impala-shell을 이용하여 kudu table에 insert/update수행시 발생하는 오류(Transport endpoint is not connected (error 107)) 발생시 확인할 내용 2023.11.30 5875
415 HBASE Client API : 기본 기능 정리 file 2013.04.01 5852
414 "java.net.NoRouteToHostException: 호스트로 갈 루트가 없음" 오류시 확인및 조치할 사항 2016.04.01 5809
413 hbase shell 필드 검색 방법 2015.05.24 5702
412 JobHistory 서버 기동시 HDFS상에 특정 폴더를 생성할 수 없어서 기동하지 못하는 경우 조치 2018.05.29 5651
411 이클립스에서 생성한 jar 파일 hadoop 으로 실행하기 file 2013.03.06 5537
410 의사분산모드에서 presto설치하기 2014.03.31 5515
409 AIX 7.1에 MariaDB 10.2 소스 설치 2016.09.24 5451
408 sqoop 1.4.4 설치및 테스트 2014.04.21 5445
위로