libhdfs is a JNI based C API for Hadoop’s Distributed File System (HDFS). It provides C APIs to a subset of the HDFS APIs to manipulate HDFS files and the filesystem. libhdfs is part of the Hadoop distribution and comes pre-compiled in $HADOOP_HDFS_HOME/lib/native/libhdfs.so
. libhdfs is compatible with Windows and can be built on Windows by running mvn compile
within the hadoop-hdfs-project/hadoop-hdfs
directory of the source tree.
The libhdfs APIs are a subset of the Hadoop FileSystem APIs.
The header file for libhdfs describes each API in detail and is available in $HADOOP_HDFS_HOME/include/hdfs.h
.
#include "hdfs.h" int main(int argc, char **argv) { hdfsFS fs = hdfsConnect("default", 0); const char* writePath = "/tmp/testfile.txt"; hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY |O_CREAT, 0, 0, 0); if(!writeFile) { fprintf(stderr, "Failed to open %s for writing!\n", writePath); exit(-1); } char* buffer = "Hello, World!"; tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1); if (hdfsFlush(fs, writeFile)) { fprintf(stderr, "Failed to 'flush' %s\n", writePath); exit(-1); } hdfsCloseFile(fs, writeFile); }
See the CMake file for test_libhdfs_ops.c
in the libhdfs source directory (hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
) or something like: gcc above_sample.c -I$HADOOP_HDFS_HOME/include -L$HADOOP_HDFS_HOME/lib/native -lhdfs -o above_sample
The most common problem is the CLASSPATH
is not set properly when calling a program that uses libhdfs. Make sure you set it to all the Hadoop jars needed to run Hadoop itself as well as the right configuration directory containing hdfs-site.xml
. Wildcard entries in the CLASSPATH
are now supported by libhdfs.
libdhfs is thread safe.
Concurrency and Hadoop FS “handles”
The Hadoop FS implementation includes an FS handle cache which caches based on the URI of the namenode along with the user connecting. So, all calls to hdfsConnect
will return the same handle but calls to hdfsConnectAsUser
with different users will return different handles. But, since HDFS client handles are completely thread safe, this has no bearing on concurrency.
Concurrency and libhdfs/JNI
The libhdfs calls to JNI should always be creating thread local storage, so (in theory), libhdfs should be as thread safe as the underlying calls to the Hadoop FS.