Java 8 does not properly flush buffers to disk on OSX #153
Does anyone have this same performance problem with Java 8 on Linux? I believe I have never (or at least not for a long time) been able to run a full test on Linux without interrupting it, because of |
Is this an issue you are seeing for all Java versions or just Java 8?
That was always my assumption; the change between Java 8 and Java 9 only seems to be an issue on OSX.
On Java 11 too, and it also takes Travis a while:
On my machine it takes a lot longer; I don't know whether Travis performs the test on an in-memory filesystem or an SSD:
7 minutes and counting ...
Hmm, what type of hard disk is your Linux system using? Also, what distribution and kernel version is it?
It's a standard magnetic hard drive. The kernel and system programs are on an SSD. Kernel version is 5.4.0, distribution Ubuntu 20.04. Still running:
I tried debugging it once; it was not stuck, just advancing really slowly.
Hmm, what does |
Not sure which part of the output you are interested in; I don't have any other problems with this setup.
Looking for the output of that (the model might be enough) to rule out it being a bad drive.
I really don't think it's the hard drive; I could not be using this machine for development and as a daily driver if that were the case, but since you insist:
I'm starting to think the problem is some kind of deadlock with the ext4 filesystem: building on the SSD disk, the same thing happens. It has been happening for at least the last year, so there have been several kernel and OS upgrades. I will have to create a partition to test with another FS; I would love to hear from someone running Linux with an ext4 FS who can run this test without problems.
My bad, in fact it's using the same tmp dir in both cases 🤦, but it's just writing very slowly to the tmp file.
Most daily development doesn't involve forced flushing of dirty pages to disk, so the page cache will compensate for poor drive performance if you have enough RAM.
Yeah, this is a DM-SMR drive (according to this at least), which means it will have terrible random-write performance, to the point where, say, rebuilding a ZFS array would likely fail because the drive is so slow that the filesystem driver would think it had died. From my understanding, this test forces a flush of dirty pages to the disk on Linux, while most writes in general just end up in the page cache and are written to disk periodically. A number of storage vendors recently got caught swapping out CMR drives for DM-SMR drives without telling people.
Weird. I suspect the fsync operation is hitting the DM-SMR drive and slowing down the test; maybe the way the call is being made forces a full fsync for all disks, including those the file isn't being written to? We should probably check the codepaths called by FileChannel.force on Linux, or try seeing what's going on in strace.
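As a starting point for that check, here is a minimal, self-contained sketch (the temp file and tiny workload are made up); the comments record the assumption about which syscalls `force` maps to on Linux, which is exactly what strace could confirm or refute:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ForceSketch {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("force-sketch", ".bin"); // hypothetical file
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}));
            // Assumption to verify: on Linux, force(false) maps to fdatasync(2) and
            // force(true) to fsync(2), affecting only this file's descriptor rather
            // than all disks. Running the JVM under
            // `strace -f -e trace=fsync,fdatasync,sync` would show which syscalls
            // are actually issued.
            channel.force(true);
        }
    }
}
```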
Yes, I didn't know this was related to random access; clearly a magnetic drive is not good for that. In that case there is no point running this test on developers' machines, since not everyone has tmp mounted on an SSD. I ran it on my Raspberry Pi, which has tmp on the SD card, and at least it finished (in 22m). I would suggest disabling this test in the normal run and only running it on purpose. But more importantly, we need to decide whether this is the expected behavior of the module on a magnetic drive or whether we need to fine-tune it, and keep the test out until we do.
Well, it's more than just that: DM-SMR drives are far worse than even a typical low-end magnetic drive from 10 years ago. It's kind of crazy that vendors are putting them in normal laptops/desktops when they are really only suitable as non-RAID, infrequent-access file-archive drives.
Well, I'm assuming the flushing-to-disk behavior is intentional, to ensure data integrity.
I refactored binlog to use the new NIO file APIs in #160. I'm assuming it probably won't make much difference for this particular issue, but it might be worth trying out, as it should in theory have some performance advantages.
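For anyone who wants to see roughly what the NIO-style code path looks like, here is a generic sketch (not the actual #160 diff; the file name and record contents are hypothetical):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioWriteSketch {
    public static void main(String[] args) throws IOException {
        // java.nio.file.Path + FileChannel.open replace the older
        // java.io.File / RandomAccessFile style of opening files.
        Path logFile = Files.createTempFile("binlog-sketch", ".bin");
        try (FileChannel channel = FileChannel.open(logFile, StandardOpenOption.WRITE)) {
            ByteBuffer record = ByteBuffer.wrap("entry".getBytes());
            while (record.hasRemaining()) {
                channel.write(record); // write may be partial, hence the loop
            }
            channel.force(true); // flush data and metadata for durability
        }
    }
}
```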
Updated #160 again with some more changes to use the AsynchronousFileChannel API instead of the regular FileChannel API.
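Again as a rough illustration only (not the #160 code itself), an AsynchronousFileChannel write looks something like the following; note that completion of the Future does not by itself imply the data is on the platter, which is why the explicit force is still there:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class AsyncWriteSketch {
    public static void main(String[] args)
            throws IOException, ExecutionException, InterruptedException {
        Path logFile = Files.createTempFile("binlog-async-sketch", ".bin");
        try (AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(logFile, StandardOpenOption.WRITE)) {
            ByteBuffer record = ByteBuffer.wrap("entry".getBytes());
            // The write is submitted asynchronously at file position 0; the Future
            // completes when the bytes are handed to the channel, not when they
            // are durable on disk.
            Future<Integer> pending = channel.write(record, 0L);
            int bytesWritten = pending.get();
            System.out.println("wrote " + bytesWritten + " bytes");
            channel.force(true); // durability still requires an explicit force
        }
    }
}
```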
While investigating a performance regression in Java 9 and newer, it was discovered that Java 8 does not properly flush the buffers to disk on OSX. So we should probably see if we can fix this on Java 8 and find a workaround for the performance issue on Java 9 and newer on OSX. See #152 and this for details.
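For context, my understanding (an assumption, not confirmed here) is that newer JDKs changed FileChannel.force on macOS to request a stricter flush (presumably something along the lines of fcntl F_FULLFSYNC), which would explain both the Java 8 "not really flushed" behavior and the Java 9+ slowdown. If that is right, a tiny timing loop like the one below, run once under Java 8 and once under Java 9+ on the same Mac, should show a large difference; the iteration count and buffer size are arbitrary:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ForceTimingSketch {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("force-timing", ".bin");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(4096); // 4 KiB of zeros per write
            long start = System.nanoTime();
            for (int i = 0; i < 100; i++) {
                buf.clear();
                channel.write(buf);
                // Suspected to return quickly on Java 8 / macOS (weak flush) and to
                // take much longer on Java 9+ (full flush to stable storage).
                channel.force(true);
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("100 write+force(true) iterations took " + elapsedMs + " ms");
        }
    }
}
```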