Gluster FS possibly causing high latency on high workload #127

Open
hosungsmsft opened this issue May 24, 2018 · 7 comments
Labels: bug (Something isn't working); Priority 0 (will address) (Pull requests welcome, failing that we will get to this ASAP.)

Comments

@hosungsmsft

We've been experiencing high latency with Gluster FS in the time-gated exam scenario. We don't yet know for certain whether Gluster is the cause, or why it would be, but replacing Gluster with NFS makes the high latency go away, so Gluster is the natural suspect.

We'll need to dig deeper into how Gluster works and why it might cause a perf bottleneck like the one we've been experiencing. In the meantime, we might need to provide an alternative, such as HA NFS as described in places like the following:

These are pretty old, yet they are still the top results from my related web searches, which makes me think this is not a widely used solution. Still, we should evaluate the option. Any PR with an ARM template that deploys a 2-VM HA NFS cluster would be highly appreciated.

@enovationIT
Contributor

Just a suggestion, but you could enable accelerated networking on your already-deployed environment and retest. That could give us good insight into how much network latency contributes to this.

@SorraTheOrc
Contributor

Thanks for the suggestion. I believe @hosungs is running some tests on this as I type.

@SorraTheOrc added the bug and Priority 0 (will address) labels on May 24, 2018
@hosungsmsft
Author

Yes. I already tried accelerated networking (AN) on all VMs, and the latency numbers weren't much better (still far from acceptable).

I even suspected that the separate subnet for the Gluster VMs could be a factor, so I ran another load test with the Gluster VMs moved to the web subnet, and the latency was still far from acceptable. At this point, I've exhausted all the workarounds I could think of to improve latency with Gluster FS under high workload...

@enovationIT
Contributor

Could you please share your testing methodology and exact results? I would like to run a similar test on our deployments for comparison.

@hosungsmsft
Author

Sure. The methodology has always been available at https://github.com/Azure/Moodle/tree/master/loadtest . In that README.md, there's a link to a shared Excel spreadsheet that shows many test results, though my most recent ones for this specific engagement haven't been entered there. The JMeter results zip files are rather big and not appropriate to check in here, so I'd prefer to send them to you by email. Please feel free to email me; my email address is available on my GitHub profile page at https://github.com/hosungsmsft .

I should mention that the specific test plan for this Gluster perf issue has just been checked in (merged to master) as https://github.com/Azure/Moodle/blob/master/loadtest/time-gated-exam-test.jmx . You'll need to change the host name and other params as needed for your case. If you are not familiar with JMeter, I'd be happy to walk you through it some other way. Thanks.
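For anyone comparing runs (for example Gluster vs. NFS), here is a minimal sketch of how per-request latency could be summarized from a JMeter results file. It is not part of the repo's loadtest tooling. It assumes the results were saved in JMeter's CSV format with a header row that includes the `label` and `elapsed` (milliseconds) columns; the `summarize` function name and the sample rows are hypothetical.

```python
import csv
import io
import math
import statistics
from collections import defaultdict


def summarize(jtl_file):
    """Summarize elapsed times per sampler label from a JMeter CSV stream.

    Assumes a header row containing at least 'label' and 'elapsed' (ms).
    Returns {label: {"count": int, "median_ms": number, "p95_ms": int}}.
    """
    samples = defaultdict(list)
    for row in csv.DictReader(jtl_file):
        samples[row["label"]].append(int(row["elapsed"]))

    summary = {}
    for label, values in samples.items():
        values.sort()
        # Nearest-rank 95th percentile: the smallest value with at least
        # 95% of the samples at or below it.
        rank = max(1, math.ceil(0.95 * len(values)))
        summary[label] = {
            "count": len(values),
            "median_ms": statistics.median(values),
            "p95_ms": values[rank - 1],
        }
    return summary


# Hypothetical sample data, only to show the expected input shape.
SAMPLE = """timeStamp,elapsed,label,responseCode,success
1527120000000,100,quiz attempt,200,true
1527120000100,200,quiz attempt,200,true
1527120000200,300,quiz attempt,200,true
1527120000300,400,quiz attempt,200,true
1527120000400,50,login,200,true
"""

for label, stats in summarize(io.StringIO(SAMPLE)).items():
    print(label, stats)
```

To use it on a real results file, open the file with `open(path, newline="")` and pass the file object to `summarize`. Running it once per storage backend gives median and 95th-percentile figures that can be compared side by side.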

@SorraTheOrc
Contributor

On a similar topic: if you have specific scenarios that need testing, we encourage you to submit a PR with a test plan. As a community we can't guarantee that we'll run every test plan, but the more plans we have access to, the more we will collectively run as we validate improvements to the templates.

@UmakanthOS
Contributor

We've moved from GlusterFS to Azure Files Premium as the default file share. While the testing is being wrapped up, we'll keep this issue open for any input and will close it in the coming days/weeks.

4 participants