Find a way to generate schema cache: `db/schema_cache.yml` #862

schneems · 2019-03-22T18:38:50Z

There's a cool feature called the schema cache: https://kirshatrov.com/2016/12/13/schema-cache/

Here are some problems with using this feature on Heroku.

Problem 1

The officially sanctioned method to run database migrations is via release phase:

release: rake db:migrate

That is because this fires on slug promotion from staging to production and other times when no build is triggered. In short it fires every time your app is released.

The big limitation is that modifications to disk are not preserved in the release phase. So if you run migrations and then generate a schema cache, welp, that schema cache isn't going to do anything because it's saved as a disk modification, which is discarded when the release phase ends.

Problem 2 (see update at the bottom)

Original:

We could generate the schema dump at build time instead of at release time. However, what happens if you are deploying with a new migration? What will happen is this:

app deploys
- schema cache is generated
release phase fires
- new schema is updated
app boots using the data from the schema cache that is outdated.

Update: It turns out this is not correct. The schema cache falls back to the old behavior if it detects that the versions are not up to date https://github.com/rails/rails/blob/1d2f553d16d8e3ee1dd6622b96ad98a72ea98d2d/activerecord/lib/active_record/railtie.rb#L136-L141.

So after deploys that migrate the database, you won't get the benefit of the schema cache. In the worst case, where every deploy includes a migration, you would be no worse off than without it.
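
To make the fallback concrete, here is a minimal sketch of that kind of version check. It is not the Rails implementation: `usable_schema_cache` is a hypothetical helper, and the cache is modeled as a plain YAML hash with a `version` key, which is an assumption made purely for illustration.

```ruby
require "yaml"

# Hypothetical helper (not a Rails API): returns the parsed cache only when
# its version matches the database's current migration version, else nil.
def usable_schema_cache(cache_yaml, current_version)
  cache = YAML.safe_load(cache_yaml)
  return nil unless cache.is_a?(Hash)
  return cache if cache["version"].to_s == current_version.to_s

  warn "Ignoring schema cache: cache is at #{cache["version"]}, " \
       "database is at #{current_version}"
  nil
end

# Toy cache generated at build time, before the release phase migrated.
build_time_cache = {
  "version" => "20190301000000",
  "columns" => { "users" => %w[id email] }
}.to_yaml

fresh = usable_schema_cache(build_time_cache, "20190301000000")
stale = usable_schema_cache(build_time_cache, "20190322000000")

puts fresh ? "cache used" : "cache ignored"
puts stale ? "cache used" : "cache ignored"
```

When the versions differ, the caller simply gets nothing back and would load schema information from the database as usual, which is why an outdated cache costs nothing beyond the missed optimization.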


schneems · 2019-03-22T18:45:02Z

If you want to do this manually instead of automatically, you can hook into rake assets:precompile, for example:

Rake::Task["assets:precompile"].enhance do
  Rake::Task["db:migrate"].invoke
  Rake::Task["db:schema:cache:dump"].invoke
end

Then you also need to have the db:migrate task in your release phase if you're using pipelines.

The reason this works is it guarantees your schema cache is always updated right after the current migrations have been run. Then even if your slug is promoted from staging to production and your production database is migrated without generating a new schema cache, it should still hold all the values from the last time it was generated since that came from the same migration.
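
For anyone unsure what `enhance` does with a block, here is a small standalone sketch. The three tasks are stand-ins defined only for this demo (in a real app they come from Rails and the asset pipeline), and `run_order` is a hypothetical variable used to record what ran.

```ruby
require "rake"

# Stand-in tasks that only record their names when executed.
run_order = []
%w[assets:precompile db:migrate db:schema:cache:dump].each do |name|
  Rake::Task.define_task(name) { run_order << name }
end

# Same hook as above: the block is appended to the task's actions, so it
# runs after the original assets:precompile work has finished.
Rake::Task["assets:precompile"].enhance do
  Rake::Task["db:migrate"].invoke
  Rake::Task["db:schema:cache:dump"].invoke
end

Rake::Task["assets:precompile"].invoke
puts run_order.join(" -> ")
```

The point of the ordering is that the schema cache dump always comes after the migration within the same build, so the dumped file matches the schema that build produced.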

yoniamir · 2019-03-22T22:34:55Z

@schneems Thank you for looking into this.
About the suggestion above:

Rake::Task["assets:precompile"].enhance do
  Rake::Task["db:migrate"].invoke
  Rake::Task["db:schema:cache:dump"].invoke
end

Got some questions to make sure I understand:

If db:migrate is executed during precompile instead of release phase, then a bad migration can't prevent the release from becoming live. This guarantees the atomicity of those actions happening at once.
You mentioned this solution is for doing this manually. Do you mean this would only work if you run rake assets:precompile on the developer's local machine every time before deploy, rather than on Heroku? If that's the intention, then I'm not sure how that requires db:migrate in the release phase (which we have anyway right now because we're using pipelines).

yoniamir · 2019-03-22T22:38:39Z

@schneems about the first suggestion for generating the schema cache during the build phase: if you could add that to the buildpack it would be a win for every rails-on-heroku developer out there.
(not perfect but certainly better than nothing)

schneems · 2019-03-25T16:54:27Z

If db:migrate is executed during precompile instead of release phase, then a bad migration can't prevent the release from becoming live. This guarantees the atomicity of those actions happening at once.

If you're deploying with Rails, then we force the build to stop if the assets:precompile fails. I believe that if you manually invoke the task and it fails then it will cause the assets:precompile to fail. I would still keep the db:migrate in your release phase though. If it's a duplicate migration then it will run almost instantaneously. If you switch to using pipelines and ever need to depend on release phase firing, then you'll want the db:migrate in your Procfile. Yes, the downside is that the command will technically fire twice on normal builds, however it shouldn't add too much overhead and it will ensure that all environments are consistent.

You mentioned this solution is for doing this manually. Do you mean this would only work if you run rake assets:precompile on the developer's local machine every time before deploy, rather than on Heroku? If that's the intention, then I'm not sure how that requires db:migrate in the release phase (which we have anyway right now because we're using pipelines).

By "manually" I meant hooking into assets:precompile at build time. When I say manual, I'm really just indicating that you'll have to add that logic to your project, rather than it being something that comes from Heroku automagically.

if you could add that to the buildpack it would be a win for every rails-on-heroku developer out there.
(not perfect but certainly better than nothing)

I am certainly considering it. I wouldn't expect it to come any time soon though. We need a solution that will work for 100% of the cases and until I can figure out the best way to make that happen, explicitly adding this in right after assets:precompile is my officially recommended solution.

yoniamir · 2019-03-26T00:45:34Z

Thank you @schneems for the elaborate response.
I think this makes sense to me. One more question if you don't mind:
One of the reasons we moved to using release phase for our deployments was to have the web and worker dynos restart as close in time as possible to the database structure change.
In other words, when the database schema changes in a migration (e.g. a changed column name), old dynos that are still running may keep accepting requests, and when they try to communicate with the database, the structure does not match the objects in memory, causing errors.
When the release phase runs the migration and immediately starts up the new dynos, that time frame is minimized.
I was wondering: if we now change the migration to run in the assets:precompile phase, it can result in a wider time window where those two are out of sync (we currently do not even use "preboot").
Any thoughts on this?
Thanks again.

schneems · 2019-03-27T17:33:35Z

Related issue: rails/rails#35770. You might need to remove the

  ActiveRecord::Base.establish_connection

call from config/puma.rb to get this to work.

jeffblake · 2020-02-19T17:30:24Z

Is it possible to have a deploy-specific ENV variable? I'd like to have a simple script like this that disables preboot and invokes db:migrate in the assets:precompile step for unsafe migrations, such as removing a column.

Rakefile

Rake::Task["assets:precompile"].enhance do
  Rake::Task["db:migrate"].invoke if ENV['UNSAFE_MIGRATION']
  Rake::Task["db:schema:cache:dump"].invoke
end

deploy script

yarn version --patch
heroku features:disable preboot -r production
UNSAFE_MIGRATION=1 git push production master
heroku features:enable preboot -r production

schneems · 2020-09-16T19:59:03Z

Related ticket https://heroku.support/918801

geoffharcourt · 2021-03-12T11:40:59Z

@jeffblake removing columns is going to result in exceptions even without preboot if you have any request volume because AR will raise exceptions if the shape of a table has changed when you read or write (even if those columns aren't directly referenced). One way to get around this is by deploying a commit where you add the column to ignored_columns and then deploying your unsafe migration in the following deploy. Then the AR schema cache doesn't consider the column, and dropping it doesn't raise any exceptions.

dalezak · 2021-04-09T15:17:22Z

This article suggests you can run multiple rake commands in the release phase like this:

release: rake db:migrate other:thing whatever
web: bundle exec puma -t 5:5 -p ${PORT:-3000} -e ${RACK_ENV:-development}

So I edited my Procfile to do both db:migrate and db:schema:cache:dump.

web: bin/start-pgbouncer-stunnel bundle exec puma -C config/puma.rb
worker: bundle exec sidekiq -C config/sidekiq.yml
release: bundle exec rails db:migrate db:schema:cache:dump

Will this work?

geoffharcourt · 2021-04-09T18:39:10Z

@dalezak by the time the release phase runs, it's too late. The slug has already been compiled. The workaround above only works because it does the schema cache dumping before the slug is packaged.

dalezak · 2021-04-23T15:39:20Z

@geoffharcourt where do you put the block that enhances assets:precompile so that it's registered? Inside the config/initializers folder? Or should it be a rake task?

Rake::Task["assets:precompile"].enhance do
  Rake::Task["db:migrate"].invoke
  Rake::Task["db:schema:cache:dump"].invoke
end

Will this get called during deployment to Heroku? Or does it manually need to be called locally?

geoffharcourt · 2021-04-23T17:18:16Z

@dalezak we do this in Rakefile, and it happens in relevant deploys:

Rake::Task["assets:precompile"].enhance do
  unless Rails.env.development? ||
      Rails.env.test? ||
      ENV["HEROKU_BRANCH"] ||
      ENV["DISABLE_SCHEMA_CACHE_DUMP"]

    Rake::Task["db:migrate"].invoke
    Rake::Task["db:schema:cache:dump"].invoke
  end
end
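
If the guard grows further, one option is to pull it into a small predicate that can be tested without running a deploy. This is only a sketch: `dump_schema_cache?` is a hypothetical helper, not part of Rails or the buildpack, and it takes the environment as arguments so the conditions from the block above can be checked in isolation.

```ruby
# Hypothetical helper mirroring the guard above: skip in development and
# test, and skip when either opt-out variable is set.
def dump_schema_cache?(rails_env, env = ENV)
  return false if %w[development test].include?(rails_env.to_s)
  return false if env["HEROKU_BRANCH"] || env["DISABLE_SCHEMA_CACHE_DUMP"]

  true
end

puts dump_schema_cache?("production", {})
puts dump_schema_cache?("production", { "DISABLE_SCHEMA_CACHE_DUMP" => "1" })
puts dump_schema_cache?("test", {})
```

Inside the Rakefile the hook body would then reduce to a single `if dump_schema_cache?(Rails.env)` around the two `invoke` calls.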

dalezak · 2021-04-23T17:26:48Z

Thanks @geoffharcourt, where does this live? Inside a rake file like /lib/task/asset_precompile_enhance.rake?

geoffharcourt · 2021-04-23T17:28:08Z

It's in Rakefile at our repository root.

justin808 · 2021-05-19T02:14:03Z

Hi everybody! I must be missing something.

What's wrong with including db/schema_cache.yml in the git repo, just like we include db/schema.rb?

And having CI also generate the file, failing CI if the generated file differs from the one on GitHub?

Note, this strategy required us to monkey patch the SchemaCache class so that Hashes and Arrays are always sorted so that YAML output shows as ordered.
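
The sorting idea can be illustrated without touching SchemaCache at all. The sketch below is not the monkey patch itself: `deep_sort` is a hypothetical helper operating on plain hashes and arrays, and it assumes array order carries no meaning, which may not hold for every structure in a real cache.

```ruby
require "yaml"

# Hypothetical helper: recursively sorts hash keys and array elements so
# that the same data always serializes to the same YAML.
def deep_sort(value)
  case value
  when Hash
    value.sort_by { |key, _| key.to_s }
         .map { |key, nested| [key, deep_sort(nested)] }
         .to_h
  when Array
    value.map { |item| deep_sort(item) }.sort_by(&:to_s)
  else
    value
  end
end

# Same data, different insertion order, as two separate dumps might produce.
first_run  = { "users" => %w[id email], "accounts" => %w[name id] }
second_run = { "accounts" => %w[id name], "users" => %w[email id] }

puts deep_sort(first_run).to_yaml == deep_sort(second_run).to_yaml
```

With deterministic output, a CI step can regenerate the file and compare it byte for byte against the committed copy.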

Any disadvantage to that?

yoniamir · 2021-05-20T02:56:13Z

@justin808 Nothing wrong with it, it simply means that as a developer you would need to run the command to generate the schema cache every time you run rake db:migrate.

dalezak · 2021-05-20T03:41:46Z

@justin808 I was doing this in the past, but a few times forgot to manually recreate the db/schema_cache.yml, which resulted in some performance issues. So I'm now using the technique suggested above.

Rake::Task["assets:precompile"].enhance do
  Rake::Task["db:migrate"].invoke
  Rake::Task["db:schema:cache:dump"].invoke
end

justin808 · 2021-05-22T02:03:46Z

Related ticket https://heroku.support/918801

@schneems Should that URL still work?

schneems · 2021-08-16T16:54:25Z

@schneems Should that URL still work?

It's for internal use only. It's helpful to link both ways between support tickets so later engineers can see real-world use cases to get a fuller context on the issue.

agrobbin · 2022-06-23T23:12:39Z

@schneems can you expand on why enhancing assets:precompile would work? I thought DATABASE_URL (and all other config variables) wouldn't be available in the build stage, which would result in db:schema:cache:dump failing to connect to the database.

berniechiu · 2023-02-20T06:56:21Z

Wondering how we could resolve this in 2022 😆, the issue seems to persist ?

flood4life · 2024-01-16T11:34:45Z

We recently had to come up with a novel way to implement this, and I'd like to share it.

Add a new rake task that generates the schema file and stores it in Rails Cache (Redis in our case), and run it on release after migrations. Then, modify our Procfile to run another new rake task that reads the schema file from Rails Cache and stores it in the local filesystem, before running the actual Rails process.

The main caveat we're aware of is a possible race condition: if two releases are running at roughly the same time, and the second one has a DB migration, then it's possible for the first one to finish later and overwrite the cache value with an older schema version. However, this would be resolved with the next release anyway, so we chose not to address this at the moment.
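
For readers who do want to narrow that window, one possible direction is to store the schema version next to the dump and refuse to overwrite a newer one. This is a sketch under assumptions, not something we run: `MemoryStore` is an in-memory stand-in for a cache store with `read`/`write`, `write_schema_cache_if_newer` and the `active_record_schema_cache_version` key are hypothetical, and versions are assumed to be numeric migration timestamps.

```ruby
# In-memory stand-in for a cache store that responds to read/write.
class MemoryStore
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end
end

# Hypothetical guard: only overwrite the stored dump when the incoming
# schema version is at least as new as the one already stored.
def write_schema_cache_if_newer(store, version, dump_data)
  stored_version = store.read("active_record_schema_cache_version")
  return false if stored_version && stored_version.to_i > version.to_i

  store.write("active_record_schema_cache_version", version)
  store.write("active_record_schema_cache_data", dump_data)
  true
end

store = MemoryStore.new
write_schema_cache_if_newer(store, "20240116000000", "newer dump")
write_schema_cache_if_newer(store, "20240101000000", "older dump") # skipped
puts store.read("active_record_schema_cache_data")
```

Note that the read followed by the write is not atomic against a real shared store, so this only shrinks the race rather than eliminating it.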

Following this approach allows us to keep build and release tasks separate, while benefitting from a fresh ActiveRecord cache most of the time.

@schneems I'm not sure whether doing something similar at the buildpack level would be feasible (perhaps not, because we're modifying the filesystem at process runtime), but as we haven't seen this approach anywhere yet, we thought it'd be useful to share.

Sample code:

Rake tasks

namespace :active_record_schema_cache do
  desc 'Dumps AR schema cache, then writes it to the Rails cache store (Redis)'
  task dump: :environment do
    exit unless ENV.fetch('AR_SCHEMA_CACHING_ENABLED', 'false') == 'true'

    Rake::Task['db:schema:cache:dump'].invoke
    dump_data = File.read('db/schema_cache.yml')
    Rails.cache.write('active_record_schema_cache_data', dump_data)
  rescue => e
    # RedisCacheStore swallows any Redis errors: https://github.com/rails/rails/blob/7-1-stable/activesupport/lib/active_support/cache/redis_cache_store.rb#L478-L483
    # this is here as an additional failsafe to prevent blocking releases and process startups
    Rails.logger.error(message: e)
  end

  desc 'Gets the AR schema cache from the Rails cache store (Redis) and saves it to filesystem'
  task load: :environment do
    exit unless ENV.fetch('AR_SCHEMA_CACHING_ENABLED', 'false') == 'true'

    dump_data = Rails.cache.read('active_record_schema_cache_data')
    exit unless dump_data

    File.write('db/schema_cache.yml', dump_data)
  rescue => e
    # RedisCacheStore swallows any Redis errors: https://github.com/rails/rails/blob/7-1-stable/activesupport/lib/active_support/cache/redis_cache_store.rb#L478-L483
    # this is here as an additional failsafe to prevent blocking releases and process startups
    Rails.logger.error(message: e)
  end
end

Procfile

web: rails active_record_schema_cache:load && bundle exec puma -C config/puma.rb
release: rails db:migrate && rails active_record_schema_cache:dump

schneems changed the title ~~Find a way to generate schema.yml~~ Find a way to generate schema cache Mar 22, 2019

schneems mentioned this issue Mar 22, 2019

ActiveRecord schema info is not preloaded in production on first request rails/rails#24133

Closed

schneems changed the title ~~Find a way to generate schema cache~~ Find a way to generate schema cache: db/schema_cache.yml Mar 26, 2019

schneems added a commit to codetriage/CodeTriage that referenced this issue Mar 26, 2019

Use a schema cache heroku/heroku-buildpack-ruby#862

9de6915
