Hadoop Benchmark

Overview

http://epaulson.github.io/HadoopInternals/benchmarks.html

NNThroughputBenchmark

org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark

Main class for a series of name-node benchmarks. Each benchmark measures the throughput and average execution time of a specific name-node operation, e.g. file creation or block reports. The benchmark does not involve any Hadoop components other than the name-node: each operation is executed by directly calling the respective name-node method. The name-node is real; all other components are simulated. Command-line arguments for the benchmark include:

  • total number of operations to be performed,
  • number of threads to run these operations,
  • followed by operation specific input parameters.
  • -logLevel L specifies the logging level when the benchmark runs. The default logging level is Level.ERROR.
  • -UGCacheRefreshCount G will cause the benchmark to call NameNodeRpcServer.refreshUserToGroupsMappings after every G operations, which purges the name-node's user group cache. By default the refresh is never called.
  • -keepResults tells the benchmark not to clean up the name-space after execution.
  • -useExisting tells the benchmark not to recreate the name-space and to use the existing data instead.

The benchmark first generates inputs for each thread so that the input generation overhead does not affect the resulting statistics. The operations are divided almost evenly among the threads: the number of operations performed by any two threads differs by at most 1. The benchmark then executes the specified number of operations using the specified number of threads and outputs the resulting statistics.
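The even split described above can be illustrated with a short sketch. This is not the code used by NNThroughputBenchmark (which is written in Java); the function name split_operations is hypothetical, and the choice to give the leftover operations to the first threads is an assumption made only for illustration.

```python
def split_operations(total_ops: int, num_threads: int) -> list[int]:
    """Divide total_ops among num_threads so that any two shares differ by at most 1.

    Hypothetical helper for illustration; not part of Hadoop.
    """
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    if total_ops < 0:
        raise ValueError("total_ops must be non-negative")
    # Every thread gets the base share; the first `extra` threads get one more.
    base, extra = divmod(total_ops, num_threads)
    return [base + 1 if i < extra else base for i in range(num_threads)]


if __name__ == "__main__":
    # 10 operations over 3 threads.
    print(split_operations(10, 3))
```

With 10 operations and 10 threads, as in the command below, every thread would receive exactly one operation under this scheme.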

hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -op create -threads 10 -files 10
