This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Mingshan/Adding resnet50 validation script #478

Open
wants to merge 17 commits into
base: master
from

Conversation

mingshan-wang
Contributor

This PR adds a validation script for ResNet-50 training with both synthetic data and real data.

The TF reference results under the tfGPU/ folder were collected by running the same command used in the script on TF GPU.

Also included are a patch that makes the data loader for real data deterministic and a patch that removes the average_loss encapsulation from the training graph.

Contributor

shresthamalik left a comment

LGTM! 👍

avijit-nervana deleted the mingshan/validate_resnet50 branch April 12, 2019 16:33
avijit-nervana restored the mingshan/validate_resnet50 branch April 12, 2019 17:14
test/validate_resnet50/validation.py (3 resolved review threads)
def check_validation_results(norm_dict, metric):
    test_pass = True
    for norm in norm_dict:
        if norm_dict[norm] > 0.1:
Contributor

So if the ref accuracy is 75 and the ng accuracy is 75.3, is that a failure?

Contributor

yes

Contributor

This script does not compare accuracy. It compares the training loss value at every iteration.

Contributor

Let me check.

Contributor

Same thing for loss. If ref loss is 1, and we get 0.8, is the test passing?
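To make the question in this thread concrete, here is a minimal sketch of a symmetric, norm-based check. It assumes that norm_dict holds norms of the element-wise difference between the reference and ng per-iteration loss values; the thread does not show how norm_dict is built, so compute_norms and check_norms are hypothetical names. Only the 0.1 cutoff comes from the snippet under review.

```python
import math

THRESHOLD = 0.1  # same cutoff as the `> 0.1` comparison in the reviewed snippet


def compute_norms(reference, observed):
    """Hypothetical helper: norms of the element-wise difference.

    Assumes both sequences are non-empty and have the same length.
    """
    diffs = [abs(r - o) for r, o in zip(reference, observed)]
    return {
        "l1": sum(diffs),
        "l2": math.sqrt(sum(d * d for d in diffs)),
        "inf": max(diffs),
    }


def check_norms(norm_dict, threshold=THRESHOLD):
    """Return True only if every norm is within the threshold."""
    return all(value <= threshold for value in norm_dict.values())


# The case raised in the review: reference loss 1, observed loss 0.8.
ref_loss = [1.0]
ng_loss = [0.8]
result = check_norms(compute_norms(ref_loss, ng_loss))  # False
```

Under this assumption the check is direction-agnostic: because absolute differences are used, a loss that is lower than the reference by 0.2 fails just as a loss that is higher by 0.2 would. If a better-than-reference result should pass, the comparison would need a signed difference instead.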

    return total_loss, top1_acc, top5_acc


def parse_reference_file(filename):
Contributor

parse_reference_file and parse_training_output can be a single function... I think they are separate because one parses a file and the other parses a string. Maybe we keep the string-parsing function, read the file into a string, and reuse it.
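The suggestion above could look roughly like the following sketch, in which the file parser becomes a thin wrapper around the string parser. The log line format and the LINE_RE pattern are assumptions made for illustration, since the actual training output format is not shown in this thread; only the two function names and the returned triple (total_loss, top1_acc, top5_acc) come from the PR.

```python
import os
import re
import tempfile

# Hypothetical log format: the real output format is not shown in the PR.
NUM = r"(\d+(?:\.\d+)?)"
LINE_RE = re.compile(
    r"total_loss:\s*" + NUM
    + r"\s+top_1_accuracy:\s*" + NUM
    + r"\s+top_5_accuracy:\s*" + NUM
)


def parse_training_output(output):
    """Parse per-iteration metrics out of a string of training output."""
    total_loss, top1_acc, top5_acc = [], [], []
    for line in output.splitlines():
        match = LINE_RE.search(line)
        if match:  # lines that do not match the assumed format are skipped
            total_loss.append(float(match.group(1)))
            top1_acc.append(float(match.group(2)))
            top5_acc.append(float(match.group(3)))
    return total_loss, top1_acc, top5_acc


def parse_reference_file(filename):
    """Read the reference file into a string and reuse the string parser."""
    with open(filename) as f:
        return parse_training_output(f.read())


# Usage: the same text parsed directly and via a file gives the same result.
sample = (
    "step 1 total_loss: 7.5 top_1_accuracy: 0.01 top_5_accuracy: 0.05\n"
    "unrelated line\n"
    "step 2 total_loss: 6.9 top_1_accuracy: 0.02 top_5_accuracy: 0.07\n"
)
from_string = parse_training_output(sample)

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name
try:
    from_file = parse_reference_file(tmp_path)
finally:
    os.remove(tmp_path)
```

Keeping a single parsing routine means any change to the expected line format only has to be made in one place.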

avijit-nervana added the Release Candidate (PRs needed for the next release) label Apr 18, 2019
Labels
Release Candidate PRs needed for the next release
5 participants