java.net.UnknownHostException: ec2.us-east-1.amazonaws.com #1503

Closed
caldwecr opened this issue Mar 12, 2018 · 16 comments
Labels
guidance Question that needs advice or information. response-requested Waiting on additional info or feedback. Will move to "closing-soon" in 5 days.

Comments

@caldwecr

caldwecr commented Mar 12, 2018

Recently encountered an issue that seems related to this particular GitHub issue. (#813)

This happened on a redundant (2N+1) network during a true partition event, around 2018-03-02 (see the relevant AWS status incidents). Some users within our organization lost access to EC2. After the partition was resolved, user connectivity to EC2 recovered, but our existing Scala process (running the Java SDK) was unable to recover for at least 5 hours. Stopping the process and starting it again immediately restored connectivity.

The relevant part of the stack traces:

Caused by: java.net.UnknownHostException: ec2.us-east-1.amazonaws.com
	at java.net.InetAddress.getAllByName0(InetAddress.java:1280) ~[na:1.8.0_131]
	at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[na:1.8.0_131]
	at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[na:1.8.0_131]
	at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27) ~[com.amazonaws.aws-java-sdk-core-1.11.129.jar:na]
	at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38) ~[com.amazonaws.aws-java-sdk-core-1.11.129.jar:na]
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:111) ~[org.apache.httpcomponents.httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353) ~[org.apache.httpcomponents.httpclient-4.5.2.jar:4.5.2]
	at sun.reflect.GeneratedMethodAccessor137.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_131]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_131]
	at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76) ~[com.amazonaws.aws-java-sdk-core-1.11.129.jar:na]
	at com.amazonaws.http.conn.$Proxy25.connect(Unknown Source) ~[na:na]
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380) ~[org.apache.httpcomponents.httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) ~[org.apache.httpcomponents.httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[org.apache.httpcomponents.httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[org.apache.httpcomponents.httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[org.apache.httpcomponents.httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[org.apache.httpcomponents.httpclient-4.5.2.jar:4.5.2]
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72) ~[com.amazonaws.aws-java-sdk-core-1.11.129.jar:na]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1190) ~[com.amazonaws.aws-java-sdk-core-1.11.129.jar:na]
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030) ~[com.amazonaws.aws-java-sdk-core-1.11.129.jar:na]
	... 47 common frames omitted
2018-03-03 20:46:52,328 level=ERROR *OBFUSCATED*
com.amazonaws.SdkClientException: Unable to execute HTTP request: ec2.us-east-1.amazonaws.com
@millems
Contributor

millems commented Mar 13, 2018

The SDK resolves hosts via InetAddress.getAllByName(), so it sounds like a DNS caching issue in the JVM or the OS.

I'd suggest checking your specific JVM implementation for its caching behavior to make sure it's not caching the unknown-host indefinitely. If it isn't, it may be a behavioral issue with the OS.

You can implement your own DnsResolver and set it in the ClientConfiguration if your JVM or OS makes it impossible to work around their behavior.
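A minimal sketch of the custom-resolver idea, assuming a transient lookup failure that clears after a few retries. The class name `RetryingDnsResolver`, the nested `Lookup` interface, and the retry parameters are hypothetical. The `resolve` method mirrors the signature of the SDK's `com.amazonaws.DnsResolver`; in a real project the class would presumably declare `implements DnsResolver` and be passed to `ClientConfiguration.withDnsResolver(...)`. SDK imports are left out here so the block compiles on its own.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical resolver that retries a failed lookup a few times before giving up. */
final class RetryingDnsResolver {

    /** Same shape as the SDK's DnsResolver.resolve; injectable so it can be faked in tests. */
    interface Lookup {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    private final Lookup delegate;
    private final int maxAttempts;
    private final long delayMillis;

    RetryingDnsResolver(Lookup delegate, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    InetAddress[] resolve(String host) throws UnknownHostException {
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return delegate.resolve(host);
            } catch (UnknownHostException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt and stop retrying.
                        Thread.currentThread().interrupt();
                        throw last;
                    }
                }
            }
        }
        throw last;
    }
}
```

In production the delegate would typically be `InetAddress::getAllByName`. That call is subject to the JVM's own negative cache (`networkaddress.cache.negative.ttl`), so a retry delay shorter than that TTL may simply return the cached failure.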

@caldwecr
Author

I'll check into the JVM specific implementation details. The OS seems a bit unlikely since stopping and starting the process immediately resolved the issue.

We thought there was a possibility that the SDK was perhaps doing something clever (perhaps some caching, request sharding, or other optimization) where it integrates with the core Java libraries.

In particular these two lines caught our attention:

com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76) ~[com.amazonaws.aws-java-sdk-core-1.11.129.jar:na]
com.amazonaws.http.conn.$Proxy25.connect(Unknown Source) ~[na:na]

I'll reply with the JVM specifics.

@caldwecr
Author

Closing for now as I don't think we're going to get further in this thread. Thank you.

@millems
Contributor

millems commented Mar 14, 2018

Cool, please let us know if you find a solution to this issue. We'd love to have an answer to share with other customers who encounter similar issues, or ways in which we might be able to mitigate the issue within the SDK.

@tj13

tj13 commented Jul 2, 2018

@caldwecr any update for this issue?

@WoozyG

WoozyG commented Jul 25, 2018

I see this same error, same stack, same source line #s, with a different service. It started this morning, without any code or configuration changes. It is for a process that starts an EC2 instance, runs a job, and shuts the instance back down once a night.

Now, I consistently get this similar stack trace every time I run the job. The only possible changes were SDK version bumps (currently at the latest, 1.11.372) and Amazon Linux system package changes (yum update is run every day).

The server in question can be resolved just fine from the command line immediately prior to running the Java code that then fails to resolve the same address:

email.us-east-1.amazonaws.com

System is an EC2 t2.small instance in a VPC, running Amazon Linux, uname:

4.9.77-41.59.amzn2.x86_64 #1 SMP Thu Feb 1 19:26:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

java -version:
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-b10)
OpenJDK 64-Bit Server VM (build 25.171-b10, mixed mode)

IAM policies are unchanged since the last successful run.

We build the SES client using the default builder, and call with a basic request with a plain text body.

AmazonSimpleEmailServiceClientBuilder.defaultClient()

stack trace:

com.amazonaws.SdkClientException: Unable to execute HTTP request: email.us-east-1.amazonaws.com
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1114)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1064)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
	at com.amazonaws.services.simpleemail.AmazonSimpleEmailServiceClient.doInvoke(AmazonSimpleEmailServiceClient.java:5123)
	at com.amazonaws.services.simpleemail.AmazonSimpleEmailServiceClient.invoke(AmazonSimpleEmailServiceClient.java:5099)
	at com.amazonaws.services.simpleemail.AmazonSimpleEmailServiceClient.executeSendEmail(AmazonSimpleEmailServiceClient.java:3469)
	at com.amazonaws.services.simpleemail.AmazonSimpleEmailServiceClient.sendEmail(AmazonSimpleEmailServiceClient.java:3446)
[...our calling frames...]
Caused by: java.net.UnknownHostException: email.us-east-1.amazonaws.com
	at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
	at java.net.InetAddress.getAllByName(InetAddress.java:1192)
	at java.net.InetAddress.getAllByName(InetAddress.java:1126)
	at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
	at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:111)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
	at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
	at com.amazonaws.http.conn.$Proxy18.connect(Unknown Source)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1236)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
	... 13 more

I can't think of any more relevant info atm, let me know if there are additional puzzle pieces I can fill in.

@WoozyG

WoozyG commented Jul 25, 2018

I then immediately upon searching found this thread which seems to indicate Java (or in this case at least OpenJDK) is issuing the wrong kind of DNS query. It could be that AWS recently made a change to internal DNS resolving that fails to serve results for these requests?

@zoewangg
Contributor

@WoozyG Hi, can you try setting the JVM TTL for DNS Name Lookups less than 60 seconds?

https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html
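For reference, a sketch of setting the TTL programmatically rather than in the `java.security` file. The helper class `DnsCacheSettings` is hypothetical, and the values passed in are illustrative. `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` are standard JDK security properties; the JVM typically reads them once, so this is assumed to run at startup before any host name is resolved.

```java
import java.security.Security;

/** Hypothetical startup helper for the JVM's DNS cache lifetimes. */
final class DnsCacheSettings {

    private DnsCacheSettings() {
    }

    /**
     * Sets how long successful and failed lookups are cached, in seconds.
     * Assumed to be called before the first lookup in the process.
     */
    static void apply(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }
}
```

For example, `DnsCacheSettings.apply(30, 5)` as the first statement in `main` would cache successful lookups for 30 seconds and failed ones for 5.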

@WoozyG

WoozyG commented Jul 25, 2018

Java is restarted for every batch run, so I doubt that is a factor, but easy enough to test.

I shut down the instance, then started again, as is done nightly, changed the Java global setting in the java.security file per that link, and set it to 30. I then ran the SDK based app with the updated setting.

As before, after the timeout duration, it failed to resolve the host, and exited with the same stack trace.

@zoewangg
Contributor

Thanks for the information. To confirm: nslookup works from the command line, but the lookup fails when running Java, and it fails for every request?

@zoewangg
Contributor

What Java SDK version were you using previously?

@WoozyG

WoozyG commented Jul 25, 2018

(solved)
It only fails for SES, interestingly. The Java process I'm running also accesses S3, RDS (aurora), DynamoDB, and SNS. Those all work.

I created a pcap file with tcpdump out of curiosity, and noticed something screwy with the internal AWS DNS for SES vs other services.

That led me to dig into the code for the process, which turns out to be creating a VPN connection out of AWS to a client data center, reading some data, then closing the VPN.

Further, the code is initializing clients for S3, SNS, and DynamoDB BEFORE initiating the VPN.

The SES call was the first thing done AFTER closing the VPN. That made me suspicious, since the VPN connection is handled outside Java by an OS package.

I modified the code to add a delay before initializing the SES client, and suddenly it worked!

So in my case, the generic stack trace was truly because the name was not resolvable when the code ran, starting today. It looks like a timing issue. I see a huge quantity of system packages updated last night, so something in there changed the timing of the network interface realignment after the VPN process exits. I think my solutions going forward are to keep a delay after closing, or initialize the SES client early, with the other service clients. That may be the safest course for future-proofing.
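As an alternative to a fixed delay, the wait could be expressed as a bounded poll that returns as soon as the endpoint resolves. This is only a sketch of that idea, not what the poster ran: `HostReadiness`, `awaitResolvable`, and the timeout values are hypothetical names and numbers.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical pre-flight check: wait until a host name resolves, up to a deadline. */
final class HostReadiness {

    /** Injectable lookup; InetAddress::getAllByName fits this shape. */
    interface Lookup {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    private HostReadiness() {
    }

    /** Returns true once the host resolves, or false if the timeout elapses first. */
    static boolean awaitResolvable(String host, Lookup lookup,
                                   long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try {
                if (lookup.resolve(host).length > 0) {
                    return true;
                }
            } catch (UnknownHostException e) {
                // Not resolvable yet; fall through and poll again.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

A call such as `HostReadiness.awaitResolvable("email.us-east-1.amazonaws.com", InetAddress::getAllByName, 60_000, 2_000)` after the VPN closes would gate the SES client construction. Because the JVM caches failed lookups for a short time, polling much faster than the negative TTL is unlikely to help.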

Since automatic package updates are part of our security model, we realize there will be some periodic hiccups like this, so I guess this isn't an SDK issue from my perspective, unless there is more documentation that could be helpful somewhere. Probably this thread is the most helpful if others see this stack, since there are any number of reasons DNS lookups could fail.

Thank you for your questions, they got me thinking in the right direction.

@WoozyG

WoozyG commented Jul 26, 2018

It started failing while using SDK version 1.11.252, and updating to 1.11.372 made no difference (see previous comment as to how it was a process flow timing issue)

@zoewangg
Contributor

Glad it got solved! Thank you for the detailed explanation and it can definitely help other customers who might face the same issues.

As for best practices for handling automatic package updates, the AWS Forum is a good place for this question: https://forums.aws.amazon.com/forum.jspa?forumID=30.

@ejoebstl

I wanted to add my two cents, since I faced a similar issue in ECS:

In my case, I was running out of file descriptors (due to a bug), which made the DNS resolution fail. The hint was an additional "System Error" right before the UnknownHostExceptions started.
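One way to spot this condition from inside the JVM is to compare open and maximum descriptor counts. The sketch below assumes a Unix-like HotSpot/OpenJDK runtime where `com.sun.management.UnixOperatingSystemMXBean` is available; the class name `FdUsage` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

/** Hypothetical helper reporting how close the process is to its descriptor limit. */
final class FdUsage {

    private FdUsage() {
    }

    /** Returns open/max file descriptors as a fraction, or -1.0 if the JVM cannot report it. */
    static double usedFraction() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            long max = unix.getMaxFileDescriptorCount();
            if (max > 0) {
                return (double) unix.getOpenFileDescriptorCount() / max;
            }
        }
        return -1.0;
    }
}
```

Logging this value alongside an `UnknownHostException` could help tell descriptor exhaustion apart from a genuine DNS failure.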

@srchase srchase added guidance Question that needs advice or information. needs-response and removed Question labels Jan 4, 2019
@debora-ito debora-ito added response-requested Waiting on additional info or feedback. Will move to "closing-soon" in 5 days. and removed needs-response labels Feb 25, 2020
@hanzalaAmway

Hi Team
We have set the property networkaddress.cache.ttl=60, but we are still seeing intermittent issues. Can you please let us know if we need to use a particular JVM version?

java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)


9 participants