{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":791996747,"defaultBranch":"master","name":"aws-ofi-nccl","ownerLogin":"a-szegel","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2024-04-25T19:18:15.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/97712042?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1725574485.0","currentOid":""},"activityList":{"items":[{"before":null,"after":"aef78db767c0be02995ebfe52a96910c4563959d","ref":"refs/heads/stop-running-aws-ofi-nccl-functional-tests","pushedAt":"2024-09-05T22:14:45.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Stop running ofi nccl functional tests\n\nWe have an internal tracking ticket NCCLOFITICKET-642 which is tracking\nthe re-enablement of the functional tests. The functional tests are\ncausing kernel panics on p5 on multiple OS's. Remove this testing in\norder to stabalize the PR CI until it is fixed.\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Stop running ofi nccl functional tests"}},{"before":"2ea72448d45a3d4161ca99a916396ddd3a9f350c","after":"77d8cbd8d32d3d2d52947a80862401d23f095bd7","ref":"refs/heads/pin-p4-p5-amis","pushedAt":"2024-09-05T16:59:03.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Pin p4/p5 ami's to AMI's from 8/7/24\n\nIn order to attempt to stabalize the aws-ofi-nccl plugin GH PR CI, the\nplan is to pin the AMI's to 8/7 before we started running into a bunch\nof CUDA version related issues. 
When these are fixed, we will unpin the\nAMI's.\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Pin p4/p5 ami's to AMI's from 8/7/24"}},{"before":null,"after":"2ea72448d45a3d4161ca99a916396ddd3a9f350c","ref":"refs/heads/pin-p4-p5-amis","pushedAt":"2024-09-05T16:44:43.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Pin p4/p5 ami's to AMI's from 8/7/24\n\nIn order to attempt to stabalize the aws-ofi-nccl plugin GH PR CI, the\nplan is to pin the AMI's to 8/7 before we started running into a bunch\nof CUDA version related issues. When these are fixed, we will unpin the\nAMI's.\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Pin p4/p5 ami's to AMI's from 8/7/24"}},{"before":"daf037be8ca7056e7d451829f7ad35ec5caee318","after":"85667a51dacb4729f10577dd06e5f545e93c4b59","ref":"refs/heads/unpin-p3dn-ami","pushedAt":"2024-09-05T06:11:39.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Unpin al2 p3dn ami\n\nThe al2 p3dn AMI was pinned to a manually created AMI by running\nPortaFiducia's setup_instances.py on a p3dn.24xlarge. In order to keep\nup to date with OS security updates, EC2 Image Builder is used to create\nnew updated AMI's daily. B/c p3dn is in short supply, EC2 Image Builder\nuses a g3 instance to generate the p3dn AMI. The AMI building script\nwas working successfully when run on p3dn. The same script when run on\ng3dn would run without issues, but would not generate an AMI with CUDA\ninstalled correctly (nvidia-smi wouldn't work). 
Recently, we were able\nto fix the bug in how we were installing CUDA in our AMI Builder, which\nallows us to unpin the p3dn AMI, and use the newest AMI with the latest\ngreatest security fixes.\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Unpin al2 p3dn ami"}},{"before":null,"after":"daf037be8ca7056e7d451829f7ad35ec5caee318","ref":"refs/heads/unpin-p3dn-ami","pushedAt":"2024-09-03T18:14:17.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Unpin al2 p3dn ami\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Unpin al2 p3dn ami"}},{"before":null,"after":"88833e86d88ee38d9b950b319ea14f3f4a93a50c","ref":"refs/heads/lower-test-iters-to-5","pushedAt":"2024-08-30T19:45:53.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Decrease NCCL_TEST iterations to 5\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Decrease NCCL_TEST iterations to 5"}},{"before":"098d616bcd1a880c0bb3bd5710911435647a1204","after":"1fa8722dae3a866eb29c05c109526ff5c87c8989","ref":"refs/heads/ci-make-failure-happen-in-correct-box","pushedAt":"2024-08-22T19:48:10.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Make failures happen in correct stage\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Make failures happen in correct 
stage"}},{"before":"0cb22e4d8b8105d051674aac747c52f5d586a902","after":"098d616bcd1a880c0bb3bd5710911435647a1204","ref":"refs/heads/ci-make-failure-happen-in-correct-box","pushedAt":"2024-08-20T16:42:58.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Make failures happen in correct stage\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Make failures happen in correct stage"}},{"before":"e1351a8a8ae6c07184b73ac23ac01cf44b939f6f","after":"0cb22e4d8b8105d051674aac747c52f5d586a902","ref":"refs/heads/ci-make-failure-happen-in-correct-box","pushedAt":"2024-08-19T23:48:32.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":"TEST COMMIT: FORCE FAILURE DO NOT MERGE\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":"TEST COMMIT: FORCE FAILURE DO NOT MERGE"}},{"before":"0592acb2a7d49f2278b073fa953cf7076e03492e","after":"e1351a8a8ae6c07184b73ac23ac01cf44b939f6f","ref":"refs/heads/ci-make-failure-happen-in-correct-box","pushedAt":"2024-08-19T18:09:20.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":"TEST COMMIT: FORCE FAILURE DO NOT MERGE\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":"TEST COMMIT: FORCE FAILURE DO NOT MERGE"}},{"before":"ff0ecc5101558f5eeccb6a0e386a2fac18631698","after":"428a93ccb67250ee9ab2002a3309cfd2d8142b28","ref":"refs/heads/add-g4dn-to-ci","pushedAt":"2024-08-19T18:04:00.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth 
Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Add g4dn testing to PR CI\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Add g4dn testing to PR CI"}},{"before":"3e454704b4b7e79293d49cd37279af03a5f283e7","after":"429eff22b96b267527c74990110010f17fe08adc","ref":"refs/heads/v1.9.x-aws-removeYamlFromCI","pushedAt":"2024-08-19T17:54:48.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":"[v1.9.x-aws] .ci/aws: Merge config file with Jenkinsfile\n\nSigned-off-by: Seth Zegelstein \n(cherry picked from commit 20150d532867cd8d4a4b93456207d659c9521e80)","shortMessageHtmlLink":"[v1.9.x-aws] .ci/aws: Merge config file with Jenkinsfile"}},{"before":null,"after":"2783b41aed7ac6d7e359619346d0b7775184323b","ref":"refs/heads/revert-trainium-tests","pushedAt":"2024-08-18T06:07:58.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":"Revert \".ci/aws: Add trainium tests to CI\"\n\nThe trainium tests are unstable, and causing CI to fail frequently,\nslowing down plugin development.\n\nThis reverts commit 59bfaa40636eeb7cc84fed064cb8cfa04c05f2a1.","shortMessageHtmlLink":"Revert \".ci/aws: Add trainium tests to CI\""}},{"before":"4e9c3014f19ca33f5eb1704e360a9afb0959b8a4","after":"0592acb2a7d49f2278b073fa953cf7076e03492e","ref":"refs/heads/ci-make-failure-happen-in-correct-box","pushedAt":"2024-08-15T17:57:42.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Make 
failures happen in correct stage\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Make failures happen in correct stage"}},{"before":null,"after":"4e9c3014f19ca33f5eb1704e360a9afb0959b8a4","ref":"refs/heads/ci-make-failure-happen-in-correct-box","pushedAt":"2024-08-15T17:55:24.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Make failures happen in correct stage\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Make failures happen in correct stage"}},{"before":null,"after":"ff0ecc5101558f5eeccb6a0e386a2fac18631698","ref":"refs/heads/add-g4dn-to-ci","pushedAt":"2024-08-12T16:18:59.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Add g4dn testing to PR CI\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Add g4dn testing to PR CI"}},{"before":"75bc8a010ac48aa3181c33b13b131d307b40ee99","after":"7c445337b474459585c1cf7ed3ce4494c2c97a33","ref":"refs/heads/add-trn1-tests","pushedAt":"2024-08-12T04:54:15.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Add trainium tests to CI\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Add trainium tests to CI"}},{"before":null,"after":"58c929d1ea8df708f89d7725739fca18205ba75d","ref":"refs/heads/update-nccl-version","pushedAt":"2024-08-07T00:44:51.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth 
Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Bump NCCL Version to v2.22.3-1\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Bump NCCL Version to v2.22.3-1"}},{"before":"15f3b420c0a4f68ce0e36d1873423cb8e89f2dd3","after":"75bc8a010ac48aa3181c33b13b131d307b40ee99","ref":"refs/heads/add-trn1-tests","pushedAt":"2024-08-07T00:42:06.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Add trainium tests to CI\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Add trainium tests to CI"}},{"before":"8d1ab2926253b0976649bc4840c336a0c9000335","after":"15f3b420c0a4f68ce0e36d1873423cb8e89f2dd3","ref":"refs/heads/add-trn1-tests","pushedAt":"2024-08-07T00:04:06.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Add trainium tests to CI\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Add trainium tests to CI"}},{"before":"c8703168a501123b6b4ec60ddfad1151d8a899c6","after":"3e454704b4b7e79293d49cd37279af03a5f283e7","ref":"refs/heads/v1.9.x-aws-removeYamlFromCI","pushedAt":"2024-08-07T00:02:27.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":"[v1.9.x-aws] .ci/aws: Merge config file with Jenkinsfile\n\nSigned-off-by: Seth Zegelstein \n(cherry picked from commit 20150d532867cd8d4a4b93456207d659c9521e80)","shortMessageHtmlLink":"[v1.9.x-aws] .ci/aws: Merge config file with 
Jenkinsfile"}},{"before":null,"after":"c8703168a501123b6b4ec60ddfad1151d8a899c6","ref":"refs/heads/v1.9.x-aws-removeYamlFromCI","pushedAt":"2024-08-07T00:00:16.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Merge config file with Jenkinsfile\n\nSigned-off-by: Seth Zegelstein \n(cherry picked from commit 20150d532867cd8d4a4b93456207d659c9521e80)","shortMessageHtmlLink":".ci/aws: Merge config file with Jenkinsfile"}},{"before":null,"after":"37451b2f6249cc501c34142fa553164edd6f064b","ref":"refs/heads/v1.10.x-aws-removeYamlFromCI","pushedAt":"2024-08-06T23:52:50.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":"[v1.10.x-aws] .ci/aws: Merge config file with Jenkinsfile\n\nSigned-off-by: Seth Zegelstein \n(cherry picked from commit 20150d532867cd8d4a4b93456207d659c9521e80)","shortMessageHtmlLink":"[v1.10.x-aws] .ci/aws: Merge config file with Jenkinsfile"}},{"before":"2dcef505ba19f3420e1b033ef4cdb66bfb2129f0","after":"c62fe106d14410f944651324ba807aad7f02625b","ref":"refs/heads/remove-yaml-fromCI","pushedAt":"2024-08-06T16:29:49.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Merge config file with Jenkinsfile\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Merge config file with 
Jenkinsfile"}},{"before":"c677ff36246656449bac39bc06d51100c9a8c23a","after":"2dcef505ba19f3420e1b033ef4cdb66bfb2129f0","ref":"refs/heads/remove-yaml-fromCI","pushedAt":"2024-08-06T04:22:15.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Merge config file with Jenkinsfile\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Merge config file with Jenkinsfile"}},{"before":null,"after":"c677ff36246656449bac39bc06d51100c9a8c23a","ref":"refs/heads/remove-yaml-fromCI","pushedAt":"2024-08-06T00:27:02.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Merge config file with Jenkinsfile\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Merge config file with Jenkinsfile"}},{"before":"0df41318a789ea8ead6500e36c1cf6f2c5e04be6","after":"8d1ab2926253b0976649bc4840c336a0c9000335","ref":"refs/heads/add-trn1-tests","pushedAt":"2024-08-05T19:32:14.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".ci/aws: Add trainium tests to CI\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".ci/aws: Add trainium tests to CI"}},{"before":"f54c70ebd0a0b99334635c6456b973e7b1e66685","after":"0df41318a789ea8ead6500e36c1cf6f2c5e04be6","ref":"refs/heads/add-trn1-tests","pushedAt":"2024-08-01T04:33:10.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".github/workflows: 
Remove EPEL from AL2 b/c it doesn't exist anymore\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".github/workflows: Remove EPEL from AL2 b/c it doesn't exist anymore"}},{"before":null,"after":"56390cefcf23fa12e9e95c4d37102f7fb30596e5","ref":"refs/heads/delete-al2-epel","pushedAt":"2024-08-01T03:54:33.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":".github/workflows: Remove EPEL from AL2 b/c it doesn't exist anymore\n\nSigned-off-by: Seth Zegelstein ","shortMessageHtmlLink":".github/workflows: Remove EPEL from AL2 b/c it doesn't exist anymore"}},{"before":"2c191c970ea518d7d95b95cc6bff7662211b52d8","after":"4763d60bbe71ded606f3e88feb8df911e0c72b05","ref":"refs/heads/master","pushedAt":"2024-08-01T03:53:31.000Z","pushType":"push","commitsCount":109,"pusher":{"login":"a-szegel","name":"Seth Zegelstein","path":"/a-szegel","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97712042?s=80&v=4"},"commit":{"message":"rdma: wait for connect response msg in connect()\n\nThe RDMA protocol connection establishment code currently doesn't wait\nfor the connect response msg (from `accept()` side) before completing.\n\nThis can lead to a situation where `connect()` completes, and the\ncorresponding `accept()` hangs because the connect side does not have\nenough recv buffers available to receive the connect response msg from\nthe accept side.\n\nWe particularly see a deadlock in zero-copy mode, where plugin is\nresponsible for posting all receive buffers, likely because plugin posts\nfewer receive buffers than Libfabric does.\n\nThis commit makes `connect` wait to return a valid `s_comm` until the\nconnect response message is received. 
It also removes code in `send` and\n`send_close` to handle cases where the `s_comm` connection is not yet\ncomplete; these cases aren't necessary if `connect` doesn't return\nbefore connection completion.\n\nSigned-off-by: Eric Raut ","shortMessageHtmlLink":"rdma: wait for connect response msg in connect()"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAErlew_gA","startCursor":null,"endCursor":null}},"title":"Activity · a-szegel/aws-ofi-nccl"}