
cifar10 transfer learning example #2029

Closed
wants to merge 1,518 commits

Conversation

@Deanplayerljx (Author):

No description provided.

tfx-copybara and others added 30 commits April 23, 2020 00:41
PiperOrigin-RevId: 307995072
- Uses absolute import rather than relative import
- Uses native keras model with the generic trainer.
- Uses hyphen (-) instead of underscore (_)

PiperOrigin-RevId: 308160455
PiperOrigin-RevId: 308187069
PiperOrigin-RevId: 308194360
…hon function component.

PiperOrigin-RevId: 308194919
- All pushers now always copy the model into a ModelPush artifact if the push succeeded.
- Introduced `Versioning` semantics to be used across multiple Pushers. There are two methods in Versioning: UNIX_TIMESTAMP and MODEL_ARTIFACT_ID.
- Unified MLMD custom properties:
  - `pushed` is a boolean flag for whether the push was successful. (not changed)
  - `pushed_model` points to the URI of the foreign serving system. (CAIP pusher and default pusher changed)
  - `pushed_version` stores the version value generated according to the Versioning semantics. It may be omitted if the foreign serving system lacks a version concept (e.g. the BQML pusher).

Closes #1553

PiperOrigin-RevId: 308236911
PiperOrigin-RevId: 308296241
PiperOrigin-RevId: 308337128
PiperOrigin-RevId: 308688404
PiperOrigin-RevId: 308713076
Please approve this CL. It will be submitted automatically, and its GitHub pull request will be marked as merged.

Imported from GitHub PR #1667

Also included:
- Fix some tests which fail when executed externally, mostly due to
usage of testdata before the CWD change, or required environment variables.
- Refresh test dependencies and remove unnecessary ones.

Copybara import of the project:

  - fff7078 Refreshes the contributing.md. by Zhitao Li <[email protected]>
  - e0b304f Merge fff7078 into dd6a3... by Zhitao <[email protected]>

COPYBARA_INTEGRATE_REVIEW=#1667 from zhitaoli:check_test fff7078
PiperOrigin-RevId: 308741011
Usage:
```
$ pylint <path_to_file>
```
(A pylintrc in the working directory should be picked up by default.)

The new pylintrc is based on the TF pylintrc, but adds some more exceptions to accommodate the existing code base. There are still 345 warnings in the tfx codebase as of today.

The new github action will check all incoming PRs with pylint and pytest. These checks will run against modified / added files only.

PiperOrigin-RevId: 308741511
PiperOrigin-RevId: 308768208
PiperOrigin-RevId: 308834214
PiperOrigin-RevId: 308870509
Because the dataset is quite small, accuracy oftentimes drops under 0.9 (I've seen 0.68 in my test). Lowering the accuracy threshold to 0.6 to make tests stable.

PiperOrigin-RevId: 308927472
PiperOrigin-RevId: 308943211
Needed because of upstream https://issues.apache.org/jira/browse/BEAM-4032, as the portability stager is now used for Dataflow jobs as well.

PiperOrigin-RevId: 308968052
PiperOrigin-RevId: 309064363
PiperOrigin-RevId: 309135057
…ders.

The executor can be used with all container launchers.

PiperOrigin-RevId: 309135987
PiperOrigin-RevId: 309241542
PiperOrigin-RevId: 309346614
`packaging` is added as a dependency of `pytest`

PiperOrigin-RevId: 309421408
PiperOrigin-RevId: 309439164
@1025KB (Collaborator) left a comment:

Can we use MNIST for the image example? We just removed CIFAR-10 from the examples.

We want to keep a reasonable number of examples; otherwise it would be hard to maintain.

@davidzats-eng (Contributor):

@1025KB I understand the concern about too many example types. However, this example demonstrates how TFX can perform high-quality image classification on real-world datasets. As such, I don't think the MNIST dataset is appropriate.

@zhitaoli (Contributor):

@davidzats-eng Can you make sure you set yourself as owner of this example once it is pulled in? Can we also think about a secondary endorser (someone familiar with the modeling technique)?

@davidzats-eng (Contributor):

@zhitaoli Ack will do.

```
class_name='SparseCategoricalAccuracy',
threshold=tfma.config.MetricThreshold(
    value_threshold=tfma.GenericValueThreshold(
        lower_bound={'value': 0.8})))
```
```
  return dataset


def _build_keras_model() -> tf.keras.Model:
  """Creates a MobileNet model pretrained on ImageNet for classifying
```
Contributor:

Nit: first row of comment should be standalone summary. If more content needed, then skip a line and add in more details.


```
  # Freeze all layers in the base model except last conv block
  for layer in base_model.layers:
    if '13' not in layer.name:
```
Contributor:

Is there a way to get this programmatically instead of hard-coding? Also some use-cases may decide to freeze more or less of the model. Can we make this easily changeable?
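The example later adds a `_freeze_model_by_percentage` helper along these lines. A minimal sketch of the idea, deriving the freeze boundary from the layer count instead of a hard-coded layer name; the helper below is illustrative, not the PR's exact code:

```python
import tensorflow as tf


def freeze_model_by_percentage(model: tf.keras.Model, percentage: float) -> None:
  """Freezes the first `percentage` of layers, leaving the rest trainable.

  Illustrative helper: instead of matching a hard-coded layer name like
  '13', the freeze boundary is computed from the layer count, so callers
  can easily freeze more or less of the model.
  """
  if not 0.0 <= percentage <= 1.0:
    raise ValueError('percentage must be in [0, 1]')
  num_frozen = int(len(model.layers) * percentage)
  for layer in model.layers[:num_frozen]:
    layer.trainable = False
  for layer in model.layers[num_frozen:]:
    layer.trainable = True
```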

```
  # We resize CIFAR10 images to match that size
  image_features = tf.image.resize(image_features, [224, 224])

  image_features = tf.ensure_shape(image_features, (None, 224, 224, 3))
```
Contributor:

Is this just a check or did the computation not work without it?

@Deanplayerljx (Author), Jun 25, 2020:

Hmm, it's actually redundant, since resize provides the tensor shape information. It would be necessary if we didn't have the resize call, because then tfx would complain that "all the dimensions except the batch dimension should be known".
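The redundancy described here can be checked directly: `tf.image.resize` with a constant target size already produces a statically known spatial shape, so `tf.ensure_shape` only adds a (harmless) assertion on top. A small sketch:

```python
import tensorflow as tf

# After resizing to a constant size, the spatial dimensions are statically
# known, so the ensure_shape call below is a redundant (but harmless) check.
batch = tf.zeros([2, 32, 32, 3])
resized = tf.image.resize(batch, [224, 224])
checked = tf.ensure_shape(resized, (None, 224, 224, 3))
```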

```
class_name='SparseCategoricalAccuracy',
threshold=tfma.config.MetricThreshold(
    value_threshold=tfma.GenericValueThreshold(
        lower_bound={'value': 0.8}),
```
Contributor:

What value for accuracy are we getting here? Can / should this be adjusted?

Author:

It was about 0.9; a lower_bound of 0.8 should be fine.

Contributor:

Was it consistently greater than 0.8 with the small dataset? If it wasn't, we will have flaky tests, so let's adjust it down. If it was, then it's ok to leave as-is / resolve.

```
        lower_bound={'value': 0.8}),
    change_threshold=tfma.GenericChangeThreshold(
        direction=tfma.MetricDirection.HIGHER_IS_BETTER,
        absolute={'value': -1e-10})))
```
Contributor:

This seems extremely tight; how about -1e-3 or so? By the way, where did this number come from? It seems too tight for most use-cases.
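A sketch of the loosened threshold the reviewer suggests, mirroring the config quoted above with -1e-3 substituted for -1e-10 (allowing up to 1e-3 absolute accuracy regression before a candidate model is rejected):

```python
import tensorflow_model_analysis as tfma

# Same threshold as in the example, with the change threshold loosened from
# -1e-10 to the suggested -1e-3.
threshold = tfma.config.MetricThreshold(
    value_threshold=tfma.GenericValueThreshold(lower_bound={'value': 0.8}),
    change_threshold=tfma.GenericChangeThreshold(
        direction=tfma.MetricDirection.HIGHER_IS_BETTER,
        absolute={'value': -1e-3}))
```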

Author:

Sure, we can do that. I copied it from the iris example, I remember.

Contributor:

@1025KB Should we change our examples? This seems too tight for most use-cases.

```
  model = _build_keras_model()

  steps_per_epoch = _TRAIN_DATA_SIZE / _TRAIN_BATCH_SIZE
  epochs = int(fn_args.train_steps / steps_per_epoch)
```
Contributor:

@1025KB is this the right thing to do here? IIRC there were issues with multi-epoch training.
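For reference, the quoted computation works out like this; the 50,000 train records come from the example's own comment, while the batch size and train-step count below are illustrative assumptions, not the example's actual constants:

```python
# Worked numbers for the epochs computation quoted above.
TRAIN_DATA_SIZE = 50000    # from the example's comment
TRAIN_BATCH_SIZE = 64      # assumption for this sketch
train_steps = 4688         # assumption: roughly 6 epochs worth of steps

steps_per_epoch = TRAIN_DATA_SIZE / TRAIN_BATCH_SIZE   # 781.25
epochs = int(train_steps / steps_per_epoch)            # truncates toward zero
```

Note that `int()` truncates, so any partial final epoch is silently dropped.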


```
from tfx.components.trainer.executor import TrainerFnArgs

# cifar10 dataset has 50000 train records, and 10000 val records
```
Contributor:

s/val/eval?

Author:

Hmmm, I think the usual terms are train/validation/test sets?

Contributor:

val was unclear to me. so then s/val/validation :)

```
      input_shape=(224, 224, 3), include_top=False, weights='imagenet',
      pooling='avg')

  _freeze_model_by_percentage(base_model, 0.9)
```
Contributor:

My understanding is that the best practice is to train for a bit using the new final layer before unfreezing any part of the model to ensure that the new weights do not pollute the existing model. Can we please do that here?

Author:

I have tried that setting, i.e. train new layers for 6 epochs and unfreeze all layers for another 6 epochs. But it didn't show improvement... Should I still add it?

Contributor:

It's good to highlight best practices, so yes. And maybe write a comment that this part is "optional".
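A minimal sketch of that two-phase recipe, assuming a Keras model with a separable pretrained base; the function name, learning rates, and epoch counts below are illustrative, not the example's actual code:

```python
import tensorflow as tf


def two_phase_finetune(model, base_model, train_ds,
                       head_epochs=6, finetune_epochs=6):
  """Two-phase transfer learning sketch (illustrative names and values).

  Phase 1 freezes the pretrained base so gradients from the randomly
  initialized head cannot disturb the pretrained weights; phase 2 unfreezes
  the base and continues at a much lower learning rate.
  """
  # Phase 1: train only the new head.
  base_model.trainable = False
  model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  model.fit(train_ds, epochs=head_epochs, verbose=0)

  # Phase 2: unfreeze and fine-tune. Recompile so the trainable change
  # takes effect, and use a smaller learning rate.
  base_model.trainable = True
  model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  model.fit(train_ds, epochs=finetune_epochs, verbose=0)
  return model
```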

```
  image_features = tf.map_fn(tf.keras.applications.mobilenet.preprocess_input,
                             image_features, dtype=tf.float32)

  outputs[transformed_name(IMAGE_KEY)] = (image_features)
```
Contributor:

Why the parens around (image_features)?

```
  # We resize CIFAR10 images to match that size
  image_features = tf.image.resize(image_features, [224, 224])

  image_features = tf.map_fn(tf.keras.applications.mobilenet.preprocess_input,
```
Contributor:

Nit: Can we combine the two map functions into one? It might be more readable?
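A sketch of a combined version; this assumes MobileNet's `preprocess_input` (elementwise scaling) can be applied to the whole resized batch at once, which would make the per-element `tf.map_fn` unnecessary:

```python
import tensorflow as tf


def preprocess_images(image_features):
  """Resize and MobileNet-preprocess a batch in one pass (a sketch).

  Both ops are batch-aware: resize works on a 4-D batch, and MobileNet's
  preprocess_input is elementwise scaling, so no per-element map is needed.
  """
  resized = tf.image.resize(image_features, [224, 224])
  return tf.keras.applications.mobilenet.preprocess_input(resized)
```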


```
  # The MobileNet we use was trained on ImageNet, which has image size 224 x 224.
  # We resize CIFAR10 images to match that size
  image_features = tf.image.resize(image_features, [224, 224])
```
Contributor:

There are multiple ways of resizing. Why do we believe this provides the best quality for our dataset?

@Deanplayerljx (Author), Jul 6, 2020:

Hmm, I think the default option (bilinear) is the most common practice. It worked well, as the model got ~90% accuracy on the validation set.

Contributor:

Let's push to see how high we can go. Just because we get ~90% accuracy on an easy dataset doesn't mean that we can't push for more.


```
                     serving_model_dir: Text,
                     metadata_path: Text,
                     direct_num_workers: int) -> pipeline.Pipeline:
  """Implements the cifar10 image classification pipeline using TFX."""
```


nit: standardize capitalization in documentation (i.e., "CIFAR10" vs. "cifar10")


ditto in other files

```
Finally, run the `meta_data_writer.py` script to write the metadata into the model
```
python ~/cifar10/meta_data_writer.py -model_file PATH_TO_MODEL -label_file data/labels.txt -export_directory exported
Contributor:

It is great that you created this script. As a next step, let's see whether we can integrate it as part of the tfx pipeline. Let's leave a todo and follow up with another pull request.



```
  return dataset


def _freeze_model_by_percentage(model: tf.keras.Model,
```
Contributor:

There is also a model.trainable parameter; let's make sure that it is not overriding the settings here. Can you please verify?

@Deanplayerljx (Author), Jul 13, 2020:

I will verify that

Author:

model.trainable is like a meta switch that turns all parameters trainable or not. If we freeze some layers in the model and then call model.trainable=True, all layers will be unfrozen; if we call model.trainable=False after freezing some of the layers, all layers will be frozen.

Contributor:

Makes sense. The question is what is the default and does it get in the way.

Author:

The default for model.trainable is True. As long as we don't modify it it will not get in the way.
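The behavior described in this thread can be demonstrated in a few lines (a standalone sketch, not code from the PR): per-layer freezing survives until the model-level switch is touched, at which point the setting propagates recursively to all layers.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4),
    tf.keras.layers.Dense(1),
])

assert model.trainable                      # default is True, as noted above

model.layers[0].trainable = False           # freeze just one layer
assert not model.layers[0].trainable
assert model.layers[1].trainable

model.trainable = False                     # the "meta switch": freezes everything
assert not any(layer.trainable for layer in model.layers)
```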

@googlebot:

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.


@Deanplayerljx (Author):

@googlebot I fixed it.

@googlebot:

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).


@Deanplayerljx (Author):

It seems I messed up this PR with a bunch of other people's commits. Will open a new clean PR for this.
