Here are some lower-level details and hints on troubleshooting and tuning the S3A client.
The AWS SDK and the Apache HTTP components can be configured to log in more detail, as can S3A itself.
log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
log4j.logger.com.amazonaws.request=DEBUG
log4j.logger.org.apache.http=DEBUG
log4j.logger.org.apache.http.wire=ERROR
Be aware that logging HTTP headers may leak sensitive AWS account information, so such logs should not be shared.
An example of this is covered in HADOOP-13871.
For public data, use curl:
curl -O https://landsat-pds.s3.amazonaws.com/scene_list.gz
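To see where the time goes rather than only whether the download succeeds, curl can print a timing breakdown through its `--write-out` option. The following is a minimal sketch; the function name `time_download` is hypothetical, and the body is discarded instead of saved.

```shell
# time_download URL
# Fetch URL, discard the body, and print connection and transfer timings.
# The %{...} names are standard curl --write-out variables.
time_download() {
  curl --silent --show-error --fail --output /dev/null \
    --write-out 'connect=%{time_connect}s first_byte=%{time_starttransfer}s total=%{time_total}s bytes=%{size_download} speed=%{speed_download}B/s\n' \
    "$1"
}
```

For example, `time_download https://landsat-pds.s3.amazonaws.com/scene_list.gz` reports how long the connection setup took compared with the whole transfer. A large gap between `connect` and `first_byte` points at the server side or the request path rather than at raw bandwidth.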
Consider reducing the S3A connection timeout.
<property>
  <name>fs.s3a.connection.timeout</name>
  <value>15000</value>
</property>
This may cause the client to react faster to network pauses.
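When experimenting, it can be convenient to try a timeout value for a single command before changing the site configuration. The sketch below uses the generic `-D` option of the `hadoop fs` command for this. The wrapper name `s3a_ls_with_timeout` and the bucket in the usage example are hypothetical, and the value is assumed to use the same units as the `fs.s3a.connection.timeout` setting in the configuration file.

```shell
# s3a_ls_with_timeout TIMEOUT PATH
# List PATH with fs.s3a.connection.timeout overridden for this one command.
# Generic options such as -D must come before the -ls subcommand.
s3a_ls_with_timeout() {
  hadoop fs -D "fs.s3a.connection.timeout=$1" -ls "$2"
}
```

For example, `s3a_ls_with_timeout 15000 s3a://example-bucket/data/` runs the listing with the lower timeout without affecting other jobs.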