# interface `BulkDelete`

The `BulkDelete` interface provides an API to perform bulk delete of
files/objects in an object store or filesystem.
The API is designed to match the semantics of the AWS S3 Bulk Delete REST API call, but it is not exclusively restricted to this store. This is why the “provides no guarantees” restrictions do not state what the outcome will be when executed on other stores.
## Interface `org.apache.hadoop.fs.BulkDeleteSource`

The interface `BulkDeleteSource` is offered by a `FileSystem`/`FileContext` class
if it supports the API. A default implementation is provided in the base
`FileSystem` class; it returns an instance of
`org.apache.hadoop.fs.impl.DefaultBulkDeleteOperation`. The default
implementation is described in the sections below.
```java
@InterfaceAudience.Public
@InterfaceStability.Unstable
public interface BulkDeleteSource {

  BulkDelete createBulkDelete(Path path)
      throws UnsupportedOperationException, IllegalArgumentException, IOException;

}
```
## Interface `org.apache.hadoop.fs.BulkDelete`

This is the bulk delete implementation returned by a `createBulkDelete()` call.
```java
@InterfaceAudience.Public
@InterfaceStability.Unstable
public interface BulkDelete extends IOStatisticsSource, Closeable {

  int pageSize();

  Path basePath();

  List<Map.Entry<Path, String>> bulkDelete(List<Path> paths)
      throws IOException, IllegalArgumentException;

}
```
### `bulkDelete(paths)`

#### Preconditions

```python
if length(paths) > pageSize: throw IllegalArgumentException
```

#### Postconditions
All paths which refer to files are removed from the set of files.
```python
FS'Files = FS.Files - [paths]
```
No other restrictions are placed upon the outcome.
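To illustrate, here is a minimal usage sketch which deletes a list of files in
pages no larger than the store's page size. The helper name `deleteInPages`
and the `fs`, `base`, and `toDelete` parameters are assumptions supplied for
the example; it relies on the base `FileSystem` class implementing
`BulkDeleteSource`, as described above.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.BulkDelete;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Delete all files in {@code toDelete}, which must be under {@code base},
 * splitting the list into pages no larger than the store's page size.
 */
public static void deleteInPages(FileSystem fs, Path base, List<Path> toDelete)
    throws IOException {
  try (BulkDelete bulkDelete = fs.createBulkDelete(base)) {
    final int pageSize = bulkDelete.pageSize();
    for (int start = 0; start < toDelete.size(); start += pageSize) {
      List<Path> page = toDelete.subList(start,
          Math.min(start + pageSize, toDelete.size()));
      // Each returned entry is a (path, error string) pair for a
      // path which could not be deleted.
      for (Map.Entry<Path, String> failure : bulkDelete.bulkDelete(page)) {
        System.err.println("Failed to delete " + failure.getKey()
            + ": " + failure.getValue());
      }
    }
  }
}
```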
The `BulkDeleteSource` interface is exported by `FileSystem` and `FileContext`
storage clients; it is available for all filesystems through
`org.apache.hadoop.fs.impl.DefaultBulkDeleteSource`. For integration in
applications such as Apache Iceberg to work seamlessly, all implementations
of this interface MUST NOT reject the request, but instead return a
`BulkDelete` instance whose page size is >= 1.
Use the `PathCapabilities` probe `fs.capability.bulk.delete`.
```java
store.hasPathCapability(path, "fs.capability.bulk.delete")
```
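A short sketch of guarding use of the API behind this probe; `fs` and `path`
are assumed to be in scope in the calling code.

```java
// With the default implementation in the base FileSystem class,
// this probe is expected to hold for all filesystems.
if (fs.hasPathCapability(path, "fs.capability.bulk.delete")) {
  try (BulkDelete bulkDelete = fs.createBulkDelete(path)) {
    // issue bulkDelete() calls of at most pageSize() paths each
  }
}
```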
The need for many libraries to compile against very old versions of Hadoop
means that most of the cloud-first filesystem API calls cannot be used except
through reflection, and the more complicated the API and its data types are,
the harder that reflection is to implement.
To assist this, the class `org.apache.hadoop.io.wrappedio.WrappedIO` has a few
methods which are intended to provide simple access to the API, especially
through reflection.
```java
public static int bulkDeletePageSize(FileSystem fs, Path path) throws IOException;

public static List<Map.Entry<Path, String>> bulkDelete(FileSystem fs, Path base, Collection<Path> paths);
```
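As an illustration, the following is a sketch of invoking the `bulkDelete`
wrapper reflectively from a library compiled against an older Hadoop release.
The helper name `bulkDeleteViaReflection` is hypothetical, and the reflected
class and method names are taken from the declarations above; verify them
against the Hadoop version you target.

```java
import java.lang.reflect.Method;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical helper: invoke WrappedIO.bulkDelete() through reflection. */
public static List<Map.Entry<Path, String>> bulkDeleteViaReflection(
    FileSystem fs, Path base, Collection<Path> paths) throws Exception {
  // Look up the wrapper class and method by name, so that this code
  // still compiles and loads against Hadoop releases predating the API.
  Class<?> wrappedIO = Class.forName("org.apache.hadoop.io.wrappedio.WrappedIO");
  Method bulkDelete = wrappedIO.getMethod("bulkDelete",
      FileSystem.class, Path.class, Collection.class);
  // WrappedIO.bulkDelete is static, so the receiver argument is null.
  @SuppressWarnings("unchecked")
  List<Map.Entry<Path, String>> failures = (List<Map.Entry<Path, String>>)
      bulkDelete.invoke(null, fs, base, paths);
  return failures;
}
```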
The default implementation of the `BulkDelete` interface, used by all
implementations of `FileSystem`, is
`org.apache.hadoop.fs.impl.DefaultBulkDeleteOperation`, which fixes the page
size to be 1 and calls `FileSystem.delete(path, false)` on the single path in
the list.
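A simplified sketch of that behaviour, based purely on the description above
rather than on the actual source of `DefaultBulkDeleteOperation`; `fs` is
assumed to be the owning `FileSystem`, and whether a failed delete surfaces as
a thrown exception or a returned (path, error) entry is implementation-specific.

```java
// Sketch only: the page size is fixed at 1, so the list holds at most one path.
public List<Map.Entry<Path, String>> bulkDelete(List<Path> paths)
    throws IOException {
  if (paths.size() > 1) {
    throw new IllegalArgumentException("Number of paths exceeds page size: 1");
  }
  List<Map.Entry<Path, String>> failures = new ArrayList<>();
  for (Path path : paths) {
    try {
      // Non-recursive delete of the single file.
      fs.delete(path, false);
    } catch (IOException e) {
      // Report the failure as a (path, error) entry instead of rethrowing.
      failures.add(new AbstractMap.SimpleImmutableEntry<>(path, e.toString()));
    }
  }
  return failures;
}
```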
The S3A implementation is `org.apache.hadoop.fs.s3a.impl.BulkDeleteOperation`,
which implements the multi-object delete semantics of the AWS S3 Bulk Delete
API. For more details, please refer to the S3A performance documentation.