This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Sarkars/batchnorm update #318

Open · sayantan-nervana wants to merge 4 commits into master
Conversation

sayantan-nervana (Contributor):
Opening a PR for future ngraph BN API change: https://github.com/NervanaSystems/ngraph/pull/2046/files

@sayantan-nervana (Contributor, Author) left a comment:

Future-proofing for this change: NervanaSystems/ngraph#2046

@@ -912,6 +912,81 @@ TEST(NNOps, Conv2DBackpropInputNHWCWithDilation) {
}
} // end of op Conv2DBackpropInputNHWCWithDilation

// FusedBatchNorm : Forward pass, training = true
// TODO fix this test
TEST(NNOps, DISABLED_FusedBatchNormNHWCTrainTrue) {
sayantan-nervana (Contributor, Author) commented:
This test does not pass. Sample output:

[ RUN      ] NNOps.DISABLE_FusedBatchNormNHWCTrainTrue
2018-11-19 01:03:39.177831: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
/localdisk/sarkars/workspace1/tf_ngtf_7_mkl_1_12/ngraph-tf/test/test_utilities.h:126: Failure  
Value of: rt
  Actual: false
Expected: true
 TF output 20.955995559692383
 NG output 20.606725692749023
/localdisk/sarkars/workspace1/tf_ngtf_7_mkl_1_12/ngraph-tf/test/test_utilities.h:126: Failure  
Value of: rt
  Actual: false
Expected: true
 TF output 21.971120834350586
 NG output 21.604936599731445
[  FAILED  ] NNOps.DISABLE_FusedBatchNormNHWCTrainTrue (125 ms)
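
For reference, the boolean rt that fails above comes from a comparison helper in test/test_utilities.h, whose implementation is not shown in this PR. A minimal sketch of the kind of element-wise check it appears to perform, assuming a relative tolerance (the name CompareOutputs and the default rtol are illustrative, not the repo's actual API):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical sketch of an element-wise comparison like the one failing
// above: it returns false and prints each mismatching pair, matching the
// " TF output ... / NG output ..." lines in the log.
bool CompareOutputs(const std::vector<float>& tf_out,
                    const std::vector<float>& ng_out, float rtol = 1e-5f) {
  if (tf_out.size() != ng_out.size()) return false;
  bool all_close = true;
  for (std::size_t i = 0; i < tf_out.size(); ++i) {
    float denom = std::max(std::abs(tf_out[i]), std::abs(ng_out[i]));
    if (denom > 0.0f && std::abs(tf_out[i] - ng_out[i]) / denom > rtol) {
      std::cout << " TF output " << tf_out[i] << "\n NG output " << ng_out[i] << "\n";
      all_close = false;
    }
  }
  return all_close;
}

Whatever the exact tolerance used, the mismatches above (e.g. 20.956 vs 20.607) differ by roughly 1.7%, far beyond float rounding noise, which points at a genuine computation difference between the TF and ngraph paths rather than an overly tight threshold.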

ng_y = make_shared<ng::op::GetOutputElement>(ng_batch_norm, 0);
ng_mean = make_shared<ng::op::GetOutputElement>(ng_batch_norm, 1);
ng_variance = make_shared<ng::op::GetOutputElement>(ng_batch_norm, 2);
shared_ptr<ngraph::Node> ng_y_out, ng_mean_out, ng_variance_out;
Reviewer (Contributor):
I must be misunderstanding the training op output order. In ngraph, shouldn't the output order be {gamma, beta, input}? Could you please explain this a little bit?

sayantan-nervana (Contributor, Author) replied:
So this PR was opened to sync with the ngraph PR that reorders batch norm: NervanaSystems/ngraph#2046 (comment)

But apparently that PR has been closed and the change will come later, so I suppose we don't have to do anything for now.
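
For context on the question, a recap of the output indexing in the diff above, as a sketch: training-mode batch norm in ngraph is a multi-output node, and GetOutputElement selects one result by index. As I read NervanaSystems/ngraph#2046, the {gamma, beta, input} ordering in the question is the constructor argument order of the old API (which is what that PR reorders); the output order is {y, mean, variance}, which is what the indices below refer to.

// ng_batch_norm is the training-mode batch norm node built earlier in the
// function; its construction is outside this diff, and its old-API argument
// order {gamma, beta, input} is the part ngraph#2046 proposed to change.
auto ng_y        = make_shared<ng::op::GetOutputElement>(ng_batch_norm, 0);  // normalized output y
auto ng_mean     = make_shared<ng::op::GetOutputElement>(ng_batch_norm, 1);  // batch mean
auto ng_variance = make_shared<ng::op::GetOutputElement>(ng_batch_norm, 2);  // batch variance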

@sayantan-nervana (Contributor, Author):
TESTNOW

1 similar comment

@sayantan-nervana (Contributor, Author):

Failure message:

self = <test_tanhgrad.TestTanhGradOp object at 0x7f72180e2b50>

    def test_tanhgrad_2d(self):
        y = constant_op.constant(
            self.generate_random_numbers(30, 1.0, 10.0), shape=[10, 3])
        y_delta = constant_op.constant(
            self.generate_random_numbers(30, 0.0, 10.0), shape=[10, 3])
    
        out = tanh_grad(y, y_delta)
    
        def run_test(sess):
            return sess.run(out)
    
>       assert np.allclose(
            self.with_ngraph(run_test), self.without_ngraph(run_test))
E       assert False
E        +  where False = <function allclose at 0x7f736018d488>(array([[-5.6790155e+02, -1.0593641e+02, -5.1666357e+02],\n       [-9.5448242e+0...e+02],\n       [-4.1972357e+02, -3.4175458e+02, -1.0048141e+02]], dtype=float32), array([[-5.6790155e+02, -1.0593641e+02, -5.1666357e+02],\n       [-9.5448242e+0...e+02],\n       [-4.1972357e+02, -3.4175458e+02, -1.0048141e+02]], dtype=float32))
E        +    where <function allclose at 0x7f736018d488> = np.allclose
E        +    and   array([[-5.6790155e+02, -1.0593641e+02, -5.1666357e+02],\n       [-9.5448242e+0...e+02],\n       [-4.1972357e+02, -3.4175458e+02, -1.0048141e+02]], dtype=float32) = <bound method TestTanhGradOp.with_ngraph of <test_tanhgrad.TestTanhGradOp object at 0x7f72180e2b50>>(<function run_test at 0x7f72300a0a28>)
E        +      where <bound method TestTanhGradOp.with_ngraph of <test_tanhgrad.TestTanhGradOp object at 0x7f72180e2b50>> = <test_tanhgrad.TestTanhGradOp object at 0x7f72180e2b50>.with_ngraph
E        +    and   array([[-5.6790155e+02, -1.0593641e+02, -5.1666357e+02],\n       [-9.5448242e+0...e+02],\n       [-4.1972357e+02, -3.4175458e+02, -1.0048141e+02]], dtype=float32) = <bound method TestTanhGradOp.without_ngraph of <test_tanhgrad.TestTanhGradOp object at 0x7f72180e2b50>>(<function run_test at 0x7f72300a0a28>)
E        +      where <bound method TestTanhGradOp.without_ngraph of <test_tanhgrad.TestTanhGradOp object at 0x7f72180e2b50>> = <test_tanhgrad.TestTanhGradOp object at 0x7f72180e2b50>.without_ngraph

test_tanhgrad.py:45: AssertionError
=============== 1 failed, 79 passed, 51 skipped in 4.62 seconds ================
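
(Note: at the precision shown, the truncated reprs of the with-ngraph and without-ngraph arrays are identical, so the elements that trip np.allclose presumably differ only beyond the printed digits or within the rows elided by the "...". That suggests a small numeric mismatch rather than a structural one.)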

@jianyinglang (Contributor):
One more comment: ngraph core requires the input to be at least 2-dimensional. I think it would be good to add this as a confirmation constraint; a sketch of such a check follows.
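
A minimal sketch of such a constraint, assuming the bridge can refuse to claim the op when the input rank is known to be too small; the helper name CheckBatchNormInputRank and the way the rank would be obtained are illustrative, not an existing function in this repo:

#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/lib/core/status.h"

// Hypothetical confirmation check: ngraph core requires the batch-norm
// input to have rank >= 2, so the bridge could decline to place
// FusedBatchNorm in an ngraph cluster when the known rank is smaller.
tensorflow::Status CheckBatchNormInputRank(int input_rank) {
  if (input_rank < 2) {
    return tensorflow::errors::InvalidArgument(
        "FusedBatchNorm input must have rank >= 2 for ngraph, got rank ",
        input_rank);
  }
  return tensorflow::Status::OK();
}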
