The Offline Image Viewer is a tool to dump the contents of HDFS fsimage files to a human-readable format and to provide a read-only WebHDFS API, allowing offline analysis and examination of a Hadoop cluster's namespace. The tool is able to process very large image files relatively quickly. It handles the layout formats included with Hadoop 2.4 and later; to handle older layout formats, use the Offline Image Viewer from Hadoop 2.3. If the tool is not able to process an image file, it will exit cleanly. The Offline Image Viewer does not require a Hadoop cluster to be running; it is entirely offline in its operation.
The Offline Image Viewer provides several output processors: Web (the default), XML and FileDistribution.
The Web processor launches an HTTP server that exposes a read-only WebHDFS API. Users can specify the address to listen on with the -addr option (localhost:5978 by default).
bash$ bin/hdfs oiv -i fsimage
14/04/07 13:25:14 INFO offlineImageViewer.WebImageViewer: WebImageViewer started. Listening on /127.0.0.1:5978. Press Ctrl+C to stop the viewer.
Users can access the viewer and get information about the fsimage with the following shell command:
bash$ bin/hdfs dfs -ls webhdfs://127.0.0.1:5978/
Found 2 items
drwxrwx--- - root supergroup 0 2014-03-26 20:16 webhdfs://127.0.0.1:5978/tmp
drwxr-xr-x - root supergroup 0 2014-03-31 14:08 webhdfs://127.0.0.1:5978/user
To get information about all files and directories recursively, use the following command:
bash$ bin/hdfs dfs -ls -R webhdfs://127.0.0.1:5978/
Users can also get JSON-formatted FileStatuses via the HTTP REST API.
bash$ curl -i http://127.0.0.1:5978/webhdfs/v1/?op=liststatus
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 252
{"FileStatuses":{"FileStatus":[
{"fileId":16386,"accessTime":0,"replication":0,"owner":"theuser","length":0,"permission":"755","blockSize":0,"modificationTime":1392772497282,"type":"DIRECTORY","group":"supergroup","childrenNum":1,"pathSuffix":"user"}
]}}
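Because the viewer speaks plain HTTP, a small client can reproduce a recursive listing without the hdfs command. The Python sketch below assumes the viewer is listening on the default 127.0.0.1:5978 and that responses have the FileStatuses/FileStatus shape shown above. The helpers fetch_liststatus, parse_statuses, symbolic_mode and walk are hypothetical names for illustration, not part of Hadoop.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterator, List, Tuple

# Assumed default: the viewer's documented -addr default (localhost:5978).
DEFAULT_BASE = "http://127.0.0.1:5978/webhdfs/v1"


def fetch_liststatus(path: str, base: str = DEFAULT_BASE) -> str:
    """GET <base><path>?op=liststatus and return the raw JSON body."""
    with urllib.request.urlopen(f"{base}{path}?op=liststatus") as resp:
        return resp.read().decode("utf-8")


def parse_statuses(body: str) -> List[Dict]:
    """Extract the FileStatus entries from a LISTSTATUS response body."""
    return json.loads(body)["FileStatuses"]["FileStatus"]


def symbolic_mode(entry: Dict) -> str:
    """Render type + octal permission (e.g. "755") in rwx style; sticky bit ignored."""
    perm = int(entry["permission"], 8)
    mode = "d" if entry["type"] == "DIRECTORY" else "-"
    for shift in (6, 3, 0):  # owner, group, other
        bits = (perm >> shift) & 0o7
        mode += ("r" if bits & 4 else "-") + ("w" if bits & 2 else "-") + ("x" if bits & 1 else "-")
    return mode


def walk(path: str, fetch: Callable[[str], str] = fetch_liststatus) -> Iterator[Tuple[str, Dict]]:
    """Recursively yield (full_path, entry) pairs, similar in spirit to dfs -ls -R."""
    for entry in parse_statuses(fetch(path)):
        child = path.rstrip("/") + "/" + entry["pathSuffix"]
        yield child, entry
        if entry["type"] == "DIRECTORY":
            yield from walk(child, fetch)
```

With a viewer running, iterating over walk("/") and printing symbolic_mode(entry), entry["owner"] and the path gives output roughly comparable to the recursive listing. The fetch function is a parameter so the traversal can also be exercised against canned JSON responses instead of a live server.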
The Web processor now supports the following operations:
The XML processor is used to dump all the contents of the fsimage. Users can specify the input and output files via the -i and -o command-line options.
bash$ bin/hdfs oiv -p XML -i fsimage -o fsimage.xml
This will create a file named fsimage.xml that contains all the information in the fsimage. For very large image files, this process may take several minutes.
Applying the Offline Image Viewer with the XML processor produces output like the following:
<?xml version="1.0"?>
<fsimage>
  <NameSection>
    <genstampV1>1000</genstampV1>
    <genstampV2>1002</genstampV2>
    <genstampV1Limit>0</genstampV1Limit>
    <lastAllocatedBlockId>1073741826</lastAllocatedBlockId>
    <txid>37</txid>
  </NameSection>
  <INodeSection>
    <lastInodeId>16400</lastInodeId>
    <inode>
      <id>16385</id>
      <type>DIRECTORY</type>
      <name></name>
      <mtime>1392772497282</mtime>
      <permission>theuser:supergroup:rwxr-xr-x</permission>
      <nsquota>9223372036854775807</nsquota>
      <dsquota>-1</dsquota>
    </inode>
...remaining output omitted...
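For very large images the XML dump can be too big to load into memory at once. One way to analyze it is to stream it with Python's xml.etree.ElementTree.iterparse, as in the sketch below. It assumes the element names shown in the sample (NameSection/txid, INodeSection/lastInodeId, and inode elements with type and an owner:group:mode permission); summarize_fsimage_xml is a hypothetical helper, and other sections of the dump may be laid out differently between Hadoop versions. Only inode elements inside INodeSection are counted, in case the same tag name is reused elsewhere in the dump.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_fsimage_xml(source) -> dict:
    """Stream an `oiv -p XML` dump and return a small summary.

    `source` may be a filename or a binary file object.
    """
    summary = {"txid": None, "lastInodeId": None,
               "types": Counter(), "owners": Counter()}
    in_inodes = False  # True while inside <INodeSection>
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "INodeSection":
                in_inodes = True
            continue
        # Only "end" events reach this point, so elem is fully parsed.
        if elem.tag == "INodeSection":
            in_inodes = False
        elif elem.tag == "NameSection":
            txid = elem.findtext("txid")
            summary["txid"] = int(txid) if txid else None
        elif in_inodes and elem.tag == "lastInodeId":
            summary["lastInodeId"] = int(elem.text)
        elif in_inodes and elem.tag == "inode":
            summary["types"][elem.findtext("type")] += 1
            # Assumed format, as in the sample: "owner:group:mode".
            perm = elem.findtext("permission")
            if perm:
                summary["owners"][perm.split(":", 2)[0]] += 1
            elem.clear()  # drop this inode's child elements once it has been counted
    return summary
```

Called as summarize_fsimage_xml("fsimage.xml") after running the XML processor, it returns the transaction ID, the last inode ID, and per-type and per-owner inode counts.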
| Flag | Description |
|:---- |:---- |
| -i, --inputFile input file | Specify the input fsimage file to process. Required. |
| -o, --outputFile output file | Specify the output filename, if the specified output processor generates one. If the specified file already exists, it is silently overwritten. (Output goes to stdout by default.) |
| -p, --processor processor | Specify the image processor to apply against the image file. Currently valid options are Web (default), XML and FileDistribution. |
| -addr address | Specify the address (host:port) to listen on (localhost:5978 by default). This option is used with the Web processor. |
| -maxSize size | Specify the range [0, maxSize] of file sizes to be analyzed, in bytes (128GB by default). This option is used with the FileDistribution processor. |
| -step size | Specify the granularity of the distribution, in bytes (2MB by default). This option is used with the FileDistribution processor. |
| -h, --help | Display the tool usage and help information and exit. |
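The table does not spell out how -maxSize and -step combine in the FileDistribution processor. The Python sketch below shows one plausible reading, assuming each file size is assigned to bucket ceil(size / step) and anything above maxSize is lumped into the last bucket. It illustrates the idea of a fixed-granularity histogram over [0, maxSize]; it is not the processor's actual code, and bucket_index and distribution are hypothetical names.

```python
# Defaults mirror the documented -maxSize (128GB) and -step (2MB) values.
MAX_SIZE_DEFAULT = 128 * 1024 ** 3
STEP_DEFAULT = 2 * 1024 ** 2


def bucket_index(size: int, max_size: int = MAX_SIZE_DEFAULT, step: int = STEP_DEFAULT) -> int:
    """Assumed bucketing: bucket i holds sizes in ((i - 1) * step, i * step]."""
    last = max_size // step  # index of the final bucket
    if size > max_size:
        return last  # everything above maxSize is lumped into the last bucket
    return min((size + step - 1) // step, last)  # integer ceil(size / step)


def distribution(sizes, max_size: int = MAX_SIZE_DEFAULT, step: int = STEP_DEFAULT) -> list:
    """Count how many sizes fall into each bucket."""
    counts = [0] * (max_size // step + 1)
    for size in sizes:
        counts[bucket_index(size, max_size, step)] += 1
    return counts
```

For example, with a maxSize of 10 and a step of 2 this sketch has six buckets, and sizes 1 and 2 both land in bucket 1. The exact boundary handling is an assumption to check against the tool's real output.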