Hadoop Logging
General
http://stackoverflow.com/questions/22918720/custom-log4j-appender-in-hadoop-2
By default, the HDFS/YARN daemons (NameNode, DataNode, ResourceManager, NodeManager) all log through log4j.appender.RFA (RollingFileAppender) defined in log4j.properties, with this output format:
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
In sbin/hadoop-daemon.sh:
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh
export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-"INFO,RFA"}
In libexec/hadoop-config.sh:
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger=${HADOOP_ROOT_LOGGER:-INFO,console}"
To change the log level, there are two options:
- In hadoop-env.sh, set
HADOOP_ROOT_LOGGER=WARN,RFA
- Change hadoop.root.logger in log4j.properties
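A minimal sketch of the first option (the RFA appender name matches the one shipped in log4j.properties; the daemons must be restarted for the change to take effect):

```shell
# hadoop-env.sh: raise the root log level of every Hadoop daemon started on
# this node from INFO to WARN, still writing through the RollingFileAppender.
export HADOOP_ROOT_LOGGER=WARN,RFA
```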
Get/Set the log level for each daemon
hadoop daemonlog -getlevel <host:port> <name>
This command internally connects to http://host:port/logLevel?log=name, so it always goes through the daemon's HTTP port.
hadoop daemonlog -getlevel 127.0.0.1:50070 org.apache.hadoop.hdfs.server.namenode
hadoop daemonlog -getlevel 127.0.0.1:50075 org.apache.hadoop.hdfs.server.namenode
hadoop daemonlog -getlevel 127.0.0.1:8088 org.apache.hadoop.hdfs.server.namenode
hadoop daemonlog -getlevel 127.0.0.1:8042 org.apache.hadoop.hdfs.server.namenode
Because the daemons' default HTTP ports are:
dfs.namenode.http-address = 50070
dfs.datanode.http.address = 50075
yarn.resourcemanager.webapp.address = 8088
yarn.nodemanager.webapp.address = 8042
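So a daemonlog call is just an HTTP GET. The snippet below only assembles the URL (host and port are placeholders for a local pseudo-distributed setup) and leaves the actual fetch to curl:

```shell
# hadoop daemonlog -setlevel <host:port> <logger> <level> is equivalent to
# fetching http://<host:port>/logLevel?log=<logger>&level=<level>.
NN_HTTP=127.0.0.1:50070                        # dfs.namenode.http-address
LOGGER=org.apache.hadoop.hdfs.server.namenode
URL="http://${NN_HTTP}/logLevel?log=${LOGGER}&level=DEBUG"
echo "$URL"
# Against a running NameNode: curl -s "$URL"
```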
HDFS
Daemon logs
VM options:
- hadoop.log.maxfilesize
- hadoop.log.maxbackupindex
-Dhadoop.log.maxfilesize=1GB -Dhadoop.log.maxbackupindex=120
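These are plain JVM system properties, so they can be scoped to a single daemon through its *_OPTS variable in hadoop-env.sh (the sizes below are examples, not recommendations):

```shell
# hadoop-env.sh: keep up to 120 rolled NameNode log files of 1GB each.
# Both properties feed the RFA appender defined in log4j.properties.
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -Dhadoop.log.maxfilesize=1GB -Dhadoop.log.maxbackupindex=120"
```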
Audit log
The HDFS audit log only applies to the NameNode; the DataNode has no audit log.
hdfs-site.xml
- dfs.namenode.audit.loggers
- default value: default
- List of classes implementing audit loggers that will receive audit events. These should be implementations of org.apache.hadoop.hdfs.server.namenode.AuditLogger. The special value "default" can be used to reference the default audit logger, which uses the configured log system. Installing custom audit loggers may affect the performance and stability of the NameNode. Refer to the custom logger's documentation for more details.
hadoop-env.sh:
export HDFS_AUDIT_LOGGER=INFO,RFAAUDIT   # add this line to enable the HDFS audit log
export HADOOP_NAMENODE_OPTS="..."
log4j.properties ships with this configuration by default:
#
# hdfs audit logging
#
hdfs.audit.logger=INFO,NullAppender
hdfs.audit.log.maxfilesize=256MB
hdfs.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}
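With RFAAUDIT active, every NameNode operation becomes one key=value record in hdfs-audit.log. A quick field extraction (the sample line is hand-written for illustration; real records separate fields with tabs):

```shell
# Pull the cmd= field out of an hdfs-audit.log record.
line='2024-01-01 00:00:00,000 INFO FSNamesystem.audit: allowed=true ugi=alice (auth:SIMPLE) ip=/10.0.0.1 cmd=delete src=/tmp/x dst=null perm=null'
cmd=$(printf '%s\n' "$line" | grep -o 'cmd=[^[:space:]]*' | cut -d= -f2)
echo "$cmd"   # delete
```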
YARN
Daemon logs
Call chain: (1) yarn-daemon.sh -> (2) yarn-config.sh -> (3) hadoop-config.sh -> (4) yarn-env.sh -> (5) bin/yarn
sbin/yarn-daemon.sh:
export YARN_ROOT_LOGGER=${YARN_ROOT_LOGGER:-INFO,RFA}
yarn-config.sh:
. ${HADOOP_LIBEXEC_DIR}/hadoop-config.sh
hadoop-config.sh:
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger=${HADOOP_ROOT_LOGGER:-INFO,console}"
yarn-env.sh:
YARN_OPTS="$YARN_OPTS -Dhadoop.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
YARN_OPTS="$YARN_OPTS -Dyarn.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
bin/yarn:
elif [ "$COMMAND" = "resourcemanager" ] ; then
CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/rm-config/log4j.properties
CLASS='org.apache.hadoop.yarn.server.resourcemanager.ResourceManager'
YARN_OPTS="$YARN_OPTS $YARN_RESOURCEMANAGER_OPTS"
...
YARN_OPTS="$YARN_OPTS -Dhadoop.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
YARN_OPTS="$YARN_OPTS -Dyarn.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
...
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $YARN_OPTS -classpath "$CLASSPATH" $CLASS "$@"
Conclusions:
- yarn.root.logger is not referenced by any of the shipped log4j.properties files.
- To change the log output of both the RM and the NM at once, modifying YARN_ROOT_LOGGER is the most suitable approach.
- To change only the ResourceManager's log output, modify YARN_RESOURCEMANAGER_OPTS instead.
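A sketch of both conclusions in yarn-env.sh (the rolling sizes are arbitrary example values):

```shell
# Both RM and NM: switch the root level to WARN (picked up by yarn-daemon.sh
# as YARN_ROOT_LOGGER, which feeds -Dhadoop.root.logger/-Dyarn.root.logger).
export YARN_ROOT_LOGGER=WARN,RFA

# ResourceManager only: larger rolled logs; the NodeManager keeps its defaults.
export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Dhadoop.log.maxfilesize=1GB -Dhadoop.log.maxbackupindex=50"
```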
yarn.nodemanager.log-dirs
yarn.nodemanager.log-dirs = ${yarn.log.dir}/userlogs
Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_{$contid}. Each container directory will contain the files stderr, stdin, and syslog generated by that container.
YarnConfiguration.NM_LOG_DIRS
container log
In the Hadoop source tree (here hadoop-2.5.0-cdh5.3.2): hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/resources/container-log4j.properties
# Define some default values that can be overridden by system properties
hadoop.root.logger=DEBUG,CLA
# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hadoop.root.logger}, EventCounter
# Logging Threshold
log4j.threshold=ALL
#
# ContainerLog Appender
#
#Default values
yarn.app.container.log.dir=null
yarn.app.container.log.filesize=100
log4j.appender.CLA=org.apache.hadoop.yarn.ContainerLogAppender
log4j.appender.CLA.containerLogDir=${yarn.app.container.log.dir}
log4j.appender.CLA.totalLogFileSize=${yarn.app.container.log.filesize}
log4j.appender.CLA.layout=org.apache.log4j.PatternLayout
log4j.appender.CLA.layout.ConversionPattern=%d{ISO8601} %p [%t] %c: %m%n
log4j.appender.CRLA=org.apache.hadoop.yarn.ContainerRollingLogAppender
log4j.appender.CRLA.containerLogDir=${yarn.app.container.log.dir}
log4j.appender.CRLA.maximumFileSize=${yarn.app.container.log.filesize}
log4j.appender.CRLA.maxBackupIndex=${yarn.app.container.log.backups}
log4j.appender.CRLA.layout=org.apache.log4j.PatternLayout
log4j.appender.CRLA.layout.ConversionPattern=%d{ISO8601} %p [%t] %c: %m%n
#
# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.
#
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
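The ${yarn.app.container.log.dir} and ${yarn.app.container.log.filesize} placeholders are not edited in this file; the NodeManager passes them as JVM system properties when launching each container, and log4j's variable substitution fills them in. A hand-written, simplified illustration of such a launch command (paths and IDs are invented):

```shell
# Illustrative only: real container command lines are generated by the NM.
java -Dlog4j.configuration=container-log4j.properties \
     -Dyarn.app.container.log.dir=/data/yarn/userlogs/application_1429766544640_0001/container_1429766544640_0001_01_000002 \
     -Dyarn.app.container.log.filesize=0 \
     -Dhadoop.root.logger=INFO,CLA \
     org.apache.hadoop.mapred.YarnChild ...
```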
Audit log
The shipped log4j.properties does not wire up the YARN/MapReduce audit loggers by default (there is no log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger=${mapred.audit.logger} line); add the following manually:
mapred.audit.logger=INFO,MRAUDIT
mapred.audit.log.maxfilesize=256MB
mapred.audit.log.maxbackupindex=20
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger=${mapred.audit.logger}
log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}
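RMAuditLogger records use the same key=value style (USER=, OPERATION=, TARGET=, RESULT=). Extracting a field from an illustrative record with awk (the sample line is hand-written):

```shell
# Pull the USER= field out of a mapred-audit.log record.
line='2024-01-01 00:00:00,000 INFO resourcemanager.RMAuditLogger: USER=alice OPERATION=Kill Application Request TARGET=ClientRMService RESULT=SUCCESS'
user=$(printf '%s\n' "$line" |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^USER=/) { sub(/^USER=/, "", $i); print $i } }')
echo "$user"   # alice
```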
log aggregation
http://zh.hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
Log4j revisit
Variable Substitution
All option values admit variable substitution. The syntax of variable substitution is similar to that of Unix shells. The string between an opening "${" and closing "}" is interpreted as a key. The value of the substituted variable can be defined as a system property or in the configuration file itself. The value of the key is first searched in the system properties, and if not found there, it is then searched in the configuration file being parsed. The corresponding value replaces the ${variableName} sequence. For example, if java.home system property is set to /home/xyz, then every occurrence of the sequence ${java.home} will be interpreted as /home/xyz.
System properties take precedence over values defined in log4j.properties.
Reference:
- https://www.ibm.com/developerworks/community/blogs/Dougclectica/entry/controlling_log4j_log_level_at_runtime?lang=zh
- http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PropertyConfigurator.html
- variable substitution, the log4j complete manual, p53. https://books.google.com.hk/books?id=hZBimlxiyAcC&pg=PA53