Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add way to generate DataFrame from active_record with aggregated fields #80

Open
janpeterka opened this issue Mar 29, 2021 · 0 comments

Comments

@janpeterka
Copy link

I stumbled on problem with speed when using daru, and found that a way to speed things up was to do more work in database - namely group and aggregate in ActiveRecord/database instead of on dataframe.

Here is what I wanted to use:

active_record = Provider.left_join(zip: :district).group(:id)

However I found out that I cannot give field like ANY_VALUE(district.id), because it gets converted to symbol infrom_activerecord, and subsequently pluck tries to convert it to table.column.
(At least thats how I understand it works).

So, we found out way to bypass this and I was thinking about adding this to daru, in something like this:

      # Load dataframe from AR::Relation
      #
      # @param relation [ActiveRecord::Relation] A relation to be used to load the contents of dataframe
      # @param with_sql_methods [Boolean] Enables giving fields with SQL methods
      #
      # @return A dataframe containing the data in the given relation
      def from_activerecord(relation, *fields, with_sql_methods: false)
        fields = relation.klass.column_names if fields.empty?

        fields = if with_sql_methods
                   fields.map(&:to_s)
                 else
                   fields.map(&:to_sym)

        result = relation.pluck(*fields).transpose
        Daru::DataFrame.new(result, order: fields).tap(&:update)
      end

Now I can create new DataFrame as

data_frame = Daru::DataFrame.from_activerecord(active_record,
                                              ["ANY_VALUE(district.id)"],
                                              with_sql_methods: true)

What do you think about that?


Originally posted here - SciRuby/daru#523

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant