OutputFormat (Apache Hadoop Main 3.5.0 API)

All Known Implementing Classes: DBOutputFormat, FileOutputFormat, FilterOutputFormat, LazyOutputFormat, MapFileOutputFormat, MultipleOutputFormat, MultipleSequenceFileOutputFormat, MultipleTextOutputFormat, NullOutputFormat, SequenceFileAsBinaryOutputFormat, SequenceFileOutputFormat, TextOutputFormat

@Public @Stable public interface OutputFormat<K,V>

OutputFormat describes the output-specification for a Map-Reduce job.

The Map-Reduce framework relies on the OutputFormat of the job to:

- Validate the output-specification of the job. For example, check that the output directory doesn't already exist.
- Provide the RecordWriter implementation to be used to write out the output files of the job. Output files are stored in a FileSystem.
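
The second responsibility, handing out a writer for one named part of the output, can be illustrated with a small standalone sketch. It does not use the Hadoop classes: `SketchRecordWriter`, `PartWriterSketch`, the local `java.nio.file` directory, and the tab-separated line format are all assumptions made for illustration, standing in for RecordWriter, an OutputFormat implementation, and a FileSystem.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the role RecordWriter<K,V> plays; not the Hadoop type.
interface SketchRecordWriter<K, V> extends AutoCloseable {
    void write(K key, V value) throws IOException;

    @Override
    void close() throws IOException;
}

class PartWriterSketch {
    // Returns a writer for one uniquely named part of the output (for example "part-00000").
    // Assumption: records are written as "key<TAB>value" lines to a local file.
    static <K, V> SketchRecordWriter<K, V> getRecordWriter(Path outputDir, String name)
            throws IOException {
        Files.createDirectories(outputDir);
        BufferedWriter out =
                Files.newBufferedWriter(outputDir.resolve(name), StandardCharsets.UTF_8);
        return new SketchRecordWriter<K, V>() {
            @Override
            public void write(K key, V value) throws IOException {
                out.write(key + "\t" + value + "\n");
            }

            @Override
            public void close() throws IOException {
                out.close();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sketch-out");
        // Each task would ask for its own part name, so parts never collide.
        try (SketchRecordWriter<String, Integer> writer = getRecordWriter(dir, "part-00000")) {
            writer.write("apple", 3);
            writer.write("pear", 5);
        }
        System.out.println(Files.readAllLines(dir.resolve("part-00000")));
    }
}
```

Passing the part name in from the caller, rather than generating it inside the writer, mirrors the `name` parameter of getRecordWriter: the caller is the one that knows which part of the output is being written.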

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | checkOutputSpecs(FileSystem ignored, JobConf job) | Check for validity of the output-specification for the job. |
| RecordWriter<K,V> | getRecordWriter(FileSystem ignored, JobConf job, String name, Progressable progress) | Get the RecordWriter for the given job. |

Method Details
- getRecordWriter
  
  RecordWriter<K,V> getRecordWriter(FileSystem ignored, JobConf job, String name, Progressable progress) throws IOException
  
  Get the RecordWriter for the given job.
  
  Parameters:
  
  ignored -
  
  job - configuration for the job whose output is being written.
  
  name - the unique name for this part of the output.
  
  progress - mechanism for reporting progress while writing to file.
  
  Returns:
  
  a RecordWriter to write the output for the job.
  
  Throws:
  
  IOException
- checkOutputSpecs
  
  void checkOutputSpecs(FileSystem ignored, JobConf job) throws IOException
  
  Check for validity of the output-specification for the job.
  This validates the output specification when the job is submitted. Typically it checks that the output directory does not already exist and throws an exception if it does, so that existing output is not overwritten.
  Implementations that write to filesystems supporting delegation tokens usually collect the tokens for the destination path(s) and attach them to the job configuration.
  
  Parameters:
  
  ignored -
  
  job - job configuration.
  
  Throws:
  
  IOException - when output should not be attempted
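
The submission-time check described for checkOutputSpecs can be sketched without Hadoop on the classpath. This is a minimal illustration, not the library's implementation: `OutputSpecSketch`, the use of `java.nio.file` in place of a FileSystem, and the exception messages are all assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class OutputSpecSketch {
    // Hypothetical analogue of checkOutputSpecs: fail before any task runs
    // rather than risk overwriting the output of an earlier job.
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (outputDir == null) {
            throw new IOException("Output directory not set");
        }
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("sketch-job");
        Path out = parent.resolve("out");

        checkOutputSpecs(out);        // passes: nothing exists at this path yet
        Files.createDirectory(out);   // simulate output left by a previous run

        try {
            checkOutputSpecs(out);    // now fails, so the job would not be submitted
        } catch (IOException e) {
            System.out.println("submission rejected: " + e.getMessage());
        }
    }
}
```

Throwing IOException here matches the documented contract that the exception signals output should not be attempted; a real implementation would additionally handle concerns such as delegation tokens, which this sketch leaves out.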