Find a way to generate schema cache: db/schema_cache.yml #862

Open
schneems opened this issue Mar 22, 2019 · 23 comments

@schneems
Contributor

schneems commented Mar 22, 2019

There's a cool feature called the schema cache: https://kirshatrov.com/2016/12/13/schema-cache/

Here are some problems with using this feature on Heroku.

Problem 1

The officially sanctioned method to run database migrations is via release phase:

release: rake db:migrate

That is because this fires on slug promotion from staging to production and other times when no build is triggered. In short it fires every time your app is released.

The big limitation is that modifications to disk are not preserved in the release phase. So if you run migrations and then generate a schema cache, welp, that schema cache isn't going to do anything because it's saved as a disk modification which is discarded via release phase.

Problem 2 See update at the bottom

Original:

We could generate the schema dump at build time instead of at release time. However, what happens if you are deploying with a new migration? This:

  • app deploys
    • schema cache is generated
  • release phase fires
    • new schema is updated
  • app boots using the data from the schema cache that is outdated.

Update: It turns out this is not correct. The schema cache falls back to the old behavior if it detects that the versions are not up to date https://github.com/rails/rails/blob/1d2f553d16d8e3ee1dd6622b96ad98a72ea98d2d/activerecord/lib/active_record/railtie.rb#L136-L141.

So after a deploy that migrates the database, the app simply won't take advantage of the schema cache until the next deploy regenerates it. In the worst case, if every deploy includes a migration, you would be no worse off than without a schema cache.
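
A minimal sketch of the fallback described in the update, in plain Ruby so it runs without Rails. This is not the railtie's actual code: `SchemaCacheDump`, `usable_schema_cache`, and the version numbers are hypothetical names and values chosen for illustration. The idea is only that a dump is used when its version matches the database's current migration version and is ignored otherwise.

```ruby
# Hypothetical stand-in for the object loaded from db/schema_cache.yml.
SchemaCacheDump = Struct.new(:version, :columns)

# Returns the dump only when it was generated at the database's current
# migration version. Returns nil otherwise, meaning the caller falls back
# to asking the database for schema information.
def usable_schema_cache(dump, current_version)
  return nil if dump.nil? || current_version.nil?
  return dump if dump.version == current_version

  warn "Ignoring schema cache: database is at #{current_version}, " \
       "cache is at #{dump.version}"
  nil
end

# Illustrative (made-up) migration versions.
dump  = SchemaCacheDump.new(20190301000000, { "users" => %w[id name] })
fresh = usable_schema_cache(dump, 20190301000000) # versions match
stale = usable_schema_cache(dump, 20190322000000) # database has moved on
```

Here `fresh` is the dump itself and `stale` is `nil`, which mirrors the "no worse off" outcome: a stale cache is skipped rather than trusted.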

@schneems schneems changed the title Find a way to generate schema.yml Find a way to generate schema cache Mar 22, 2019
@schneems
Contributor Author

If you want to do this manually instead of automatically, you can hook into rake assets:precompile, for example:

Rake::Task["assets:precompile"].enhance do
  Rake::Task["db:migrate"].invoke
  Rake::Task["db:schema:cache:dump"].invoke
end

Then you also need to have the db:migrate task in your release phase if you're using pipelines.

The reason this works is it guarantees your schema cache is always updated right after the current migrations have been run. Then even if your slug is promoted from staging to production and your production database is migrated without generating a new schema cache, it should still hold all the values from the last time it was generated since that came from the same migration.
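
To see the ordering this relies on, here is a small standalone sketch using Rake directly, outside of Rails. The task names `precompile`, `migrate`, and `dump` are stand-ins for the real `assets:precompile`, `db:migrate`, and `db:schema:cache:dump` tasks, and the `log` array exists only to record the order in which things run.

```ruby
require "rake"

log = []

# Stand-ins for the real Rails tasks; each one only records that it ran.
Rake::Task.define_task(:precompile) { log << "precompile" }
Rake::Task.define_task(:migrate)    { log << "migrate" }
Rake::Task.define_task(:dump)       { log << "dump" }

# enhance with a block appends an action that runs after the task's
# original actions.
Rake::Task[:precompile].enhance do
  Rake::Task[:migrate].invoke
  Rake::Task[:dump].invoke
end

Rake::Task[:precompile].invoke
```

After the final `invoke`, `log` holds the three names in the order precompile, migrate, dump, so the cache dump always follows the migration within the same build.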

@yoniamir

@schneems Thank you for looking into this.
About the suggestion above:

Rake::Task["assets:precompile"].enhance do
  Rake::Task["db:migrate"].invoke
  Rake::Task["db:schema:cache:dump"].invoke
end

Got some questions to make sure I understand:

  1. If db:migrate is executed during precompile instead of release phase, then a bad migration can't prevent the release from becoming live. This guarantees the atomicity of those actions happening at once.
  2. you mentioned this solution is for doing this manually, do you mean this would only work if you run rake assets:precompile on the developer's local machine every time before deploy rather than on heroku? if that's the intention then i'm not sure how that requires db:migrate in the release phase. (which we have anyway right now because we're using pipelines)

@yoniamir

@schneems about the first suggestion for running the schema cache during the build phase: if you could add that to the buildpack it would be a win for every rails-on-heroku developer out there.
(not perfect but certainly better than nothing)

@schneems
Contributor Author

If db:migrate is executed during precompile instead of release phase, then a bad migration can't prevent the release from becoming live. This guarantees the atomicity of those actions happening at once.

If you're deploying with Rails, then we force the build to stop if the assets:precompile fails. I believe that if you manually invoke the task and it fails then it will cause the assets:precompile to fail. I would still keep the db:migrate in your release phase though. If it's a duplicate migration then it will run almost instantaneously. If you switch to using pipelines and ever need to depend on release phase firing, then you'll want the db:migrate in your Procfile. Yes, the downside is that the command will technically fire twice on normal builds, however it shouldn't add too much overhead and it will ensure that all environments are consistent.

you mentioned this solution is for doing this manually, do you mean this would only work if you run rake assets:precompile on the developer's local machine every time before deploy rather than on heroku? if that's the intention then i'm not sure how that requires db:migrate in the release phase. (which we have anyway right now because we're using pipelines)

By "manually" I meant hooking into assets:precompile at build time. When I say manual, I'm really just indicating that you'll have to add that logic to your project, rather than it being something that comes from Heroku automagically.

if you could add that to the buildpack it would be a win for every rails-on-heroku developer out there.
(not perfect but certainly better than nothing)

I am certainly considering it. I wouldn't expect it to come any time soon though. We need a solution that will work for 100% of the cases and until I can figure out the best way to make that happen, explicitly adding this in right after assets:precompile is my officially recommended solution.

@yoniamir

Thank you @schneems for the elaborate response.
I think this makes sense to me. One more question if you don't mind:
One of the reasons we moved to using release phases for our deployments was to have the web and worker dynos start as close in time as possible to the change of the database structure.
In other words, when the database schema changes in a migration (e.g. a changed column name), it can happen that old dynos that are still running are accepting requests, and when they try to communicate with the database, the structure does not match the objects in memory, causing errors.
When the release phase runs the migration and immediately starts up the new dynos, that time frame is minimized.
I was wondering, if we now change the migration to run in the assets:precompile phase, it can result in a wider time window where those two are out of sync. (we currently do not even use "preboot").
Any thoughts on this?
Thanks again.

@schneems schneems changed the title Find a way to generate schema cache Find a way to generate schema cache: db/schema_cache.yml Mar 26, 2019
schneems added a commit to codetriage/CodeTriage that referenced this issue Mar 26, 2019
@schneems
Contributor Author

Related issue rails/rails#35770, you might need to remove the

  ActiveRecord::Base.establish_connection

call from config/puma.rb to get this to work.

@jeffblake

Is it possible to have a deploy-specific ENV variable? I'd like to have a simple script like this that disables preboot and invokes db:migrate in the assets:precompile step for unsafe migrations, such as removing a column.

Rakefile

Rake::Task["assets:precompile"].enhance do
  Rake::Task["db:migrate"].invoke if ENV['UNSAFE_MIGRATION']
  Rake::Task["db:schema:cache:dump"].invoke
end

deploy script

yarn version --patch
heroku features:disable preboot -r production
UNSAFE_MIGRATION=1 git push production master
heroku features:enable preboot -r production

@schneems
Contributor Author

Related ticket https://heroku.support/918801

@geoffharcourt

@jeffblake removing columns is going to result in exceptions even without preboot if you have any request volume because AR will raise exceptions if the shape of a table has changed when you read or write (even if those columns aren't directly referenced). One way to get around this is by deploying a commit where you add the column to ignored_columns and then deploying your unsafe migration in the following deploy. Then the AR schema cache doesn't consider the column, and dropping it doesn't raise any exceptions.

@dalezak

dalezak commented Apr 9, 2021

This article suggests you can run multiple rake commands in the release phase like this:

release: rake db:migrate other:thing whatever
web: bundle exec puma -t 5:5 -p ${PORT:-3000} -e ${RACK_ENV:-development}

So I edited my Procfile to do both db:migrate and db:schema:cache:dump.

web: bin/start-pgbouncer-stunnel bundle exec puma -C config/puma.rb
worker: bundle exec sidekiq -C config/sidekiq.yml
release: bundle exec rails db:migrate db:schema:cache:dump

Will this work?

@geoffharcourt

@dalezak by the time the release phase runs, it's too late. The slug has already been compiled. The workaround above only works because it does the schema cache dumping before the slug is packaged.

@dalezak

dalezak commented Apr 23, 2021

@geoffharcourt where do you extend the assets:precompile block so that it's registered? Inside the /config/initializers folder? Or should it be a rake task?

Rake::Task["assets:precompile"].enhance do
  Rake::Task["db:migrate"].invoke
  Rake::Task["db:schema:cache:dump"].invoke
end

Will this get called during deployment to Heroku? Or does it manually need to be called locally?

@geoffharcourt

geoffharcourt commented Apr 23, 2021

@dalezak we do this in Rakefile, and it happens in relevant deploys:

Rake::Task["assets:precompile"].enhance do
  unless Rails.env.development? ||
      Rails.env.test? ||
      ENV["HEROKU_BRANCH"] ||
      ENV["DISABLE_SCHEMA_CACHE_DUMP"]

    Rake::Task["db:migrate"].invoke
    Rake::Task["db:schema:cache:dump"].invoke
  end
end

@dalezak

dalezak commented Apr 23, 2021

Thanks @geoffharcourt, where does this live? Inside a rake file like /lib/task/asset_precompile_enhance.rake?

@geoffharcourt

It's in Rakefile at our repository root.

@justin808

Hi everybody! I must be missing something.

What's wrong with including db/schema_cache.yml in the git repo, just like we include db/schema.rb, and having CI also generate the file and fail if the generated file is different from the one on GitHub?

Note, this strategy required us to monkey patch the SchemaCache class so that Hashes and Arrays are always sorted so that YAML output shows as ordered.

Any disadvantage to that?
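
For illustration only, the sorting idea mentioned above can be sketched in plain Ruby. This is not the actual monkey patch to `SchemaCache`, which the thread does not show; `deep_sort` is a hypothetical helper and the table and column names are made up. It demonstrates why sorting before serialization makes the YAML stable enough to diff in CI.

```ruby
require "yaml"

# Hypothetical helper: recursively order hash keys, and sort arrays that
# contain only strings, so serialization no longer depends on insertion order.
def deep_sort(value)
  case value
  when Hash
    value.sort_by { |key, _| key.to_s }.to_h { |key, v| [key, deep_sort(v)] }
  when Array
    items = value.map { |v| deep_sort(v) }
    items.all? { |v| v.is_a?(String) } ? items.sort : items
  else
    value
  end
end

# Same data, built in a different order (made-up tables and columns).
a = { "users" => ["name", "id"], "posts" => ["title", "id"] }
b = { "posts" => ["id", "title"], "users" => ["id", "name"] }

stable_a = YAML.dump(deep_sort(a))
stable_b = YAML.dump(deep_sort(b))
```

Without the sort, `YAML.dump(a)` and `YAML.dump(b)` differ even though they describe the same schema; with it, `stable_a` and `stable_b` are identical strings, so a CI diff only fails on real schema changes.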

@yoniamir

@justin808 Nothing wrong with it, it simply means that as a developer you would need to run the command to generate the schema cache every time you run rake db:migrate.

@dalezak

dalezak commented May 20, 2021

@justin808 I was doing this in the past, but a few times I forgot to manually recreate db/schema_cache.yml, which resulted in some performance issues. So I'm now using the technique suggested above.

Rake::Task["assets:precompile"].enhance do
  Rake::Task["db:migrate"].invoke
  Rake::Task["db:schema:cache:dump"].invoke
end

@justin808

Related ticket https://heroku.support/918801

@schneems Should that URL still work?

@schneems
Contributor Author

@schneems Should that URL still work?

It's for internal use only. It's helpful to link both ways between support tickets so later engineers can see real-world use cases to get a fuller context on the issue.

@agrobbin

@schneems can you expand on why enhancing assets:precompile would work? I thought DATABASE_URL (and all other config variables) wouldn't be available in the build stage, which would result in db:schema:cache:dump failing to connect to the database.

@berniechiu

Wondering how we could resolve this in 2022 😆, the issue seems to persist?

@flood4life

We recently had to come up with a novel way to implement this, and I'd like to share it.

Add a new rake task that generates the schema file and stores it in Rails Cache (Redis in our case), and run it on release after migrations. Then, modify our Procfile to run another new rake task that reads the schema file from Rails Cache and stores it in the local filesystem, before running the actual Rails process.

The main caveat we're aware of is a possible race condition: if two releases are running at roughly the same time, and the second one has a DB migration, then it's possible for the first one to finish later and overwrite the cache value with an older schema version. However, this would be resolved with the next release anyway, so we chose not to address this at the moment.

Following this approach allows us to keep build and release tasks separate, while benefitting from a fresh ActiveRecord cache most of the time.

@schneems I'm not sure if doing something similar on the buildpack level might be feasible (perhaps not, because we're modifying the filesystem at process runtime), but as we haven't seen this approach anywhere yet, we thought it'd be useful to share.

Sample code:

Rake tasks

namespace :active_record_schema_cache do
  desc 'Dumps AR schema cache, then writes it to the Rails cache store (Redis)'
  task dump: :environment do
    exit unless ENV.fetch('AR_SCHEMA_CACHING_ENABLED', 'false') == 'true'

    Rake::Task['db:schema:cache:dump'].invoke
    dump_data = File.read('db/schema_cache.yml')
    Rails.cache.write('active_record_schema_cache_data', dump_data)
  rescue => e
    # RedisCacheStore swallows any Redis errors: https://github.com/rails/rails/blob/7-1-stable/activesupport/lib/active_support/cache/redis_cache_store.rb#L478-L483
    # this is here as an additional failsafe to prevent blocking releases and process startups
    Rails.logger.error(message: e)
  end

  desc 'Gets the AR schema cache from the Rails cache store (Redis) and saves it to filesystem'
  task load: :environment do
    exit unless ENV.fetch('AR_SCHEMA_CACHING_ENABLED', 'false') == 'true'

    dump_data = Rails.cache.read('active_record_schema_cache_data')
    exit unless dump_data

    File.write('db/schema_cache.yml', dump_data)
  rescue => e
    # RedisCacheStore swallows any Redis errors: https://github.com/rails/rails/blob/7-1-stable/activesupport/lib/active_support/cache/redis_cache_store.rb#L478-L483
    # this is here as an additional failsafe to prevent blocking releases and process startups
    Rails.logger.error(message: e)
  end
end

Procfile

web: rails active_record_schema_cache:load && bundle exec puma -C config/puma.rb
release: rails db:migrate && rails active_record_schema_cache:dump
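
The race condition noted earlier is deliberately left unaddressed in the tasks above. One possible way to narrow it, sketched here under assumptions, is to store the migration version next to the dump and refuse to replace a newer entry with an older one. `MemoryStore` is a stand-in for `Rails.cache` so the sketch runs without Rails, and `write_schema_cache` and the version numbers are hypothetical.

```ruby
# Stand-in for Rails.cache so the sketch runs without Rails.
class MemoryStore
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
    true
  end
end

SCHEMA_CACHE_KEY = "active_record_schema_cache_data"

# Hypothetical helper: keeps the newest dump by comparing migration versions.
# Returns false when the write was skipped.
def write_schema_cache(store, version:, dump:)
  existing = store.read(SCHEMA_CACHE_KEY)
  return false if existing && existing[:version].to_i > version.to_i

  store.write(SCHEMA_CACHE_KEY, { version: version, dump: dump })
end

store = MemoryStore.new
# The later release (with the migration) happens to finish first...
second_release = write_schema_cache(store, version: 20240202000000, dump: "newer schema")
# ...and the earlier release finishes afterwards with an older version.
first_release = write_schema_cache(store, version: 20240101000000, dump: "older schema")
```

In this run the second call is skipped, so the store keeps the newer dump. Note that the read followed by the write is not atomic, so this only shrinks the window rather than closing it; a fully safe version would need an atomic compare-and-set in the cache backend.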
