
added remove indexes before bulk insert and add indexes after #12

Open
wants to merge 17 commits into main

Conversation

@aishek (Contributor) commented Sep 29, 2014

I've improved bulk insert performance: remove all indexes before the first insert and recreate them after all inserts, to avoid index rebuilds on each insert operation.
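For readers skimming the thread, here is a minimal sketch of the idea (not the gem's actual implementation; the helper name with_disabled_indexes shows up later in this discussion, everything else in the snippet is assumed):

require 'active_record'

# Sketch only: capture the table's index definitions, drop them, run the bulk
# inserts in the block, then recreate the indexes once at the end.
def with_disabled_indexes(table)
  connection = ActiveRecord::Base.connection
  indexes = connection.indexes(table)
  indexes.each { |index| connection.remove_index(table, :name => index.name) }
  yield
ensure
  (indexes || []).each do |index|
    connection.add_index(table, index.columns, :name => index.name, :unique => index.unique)
  end
end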

@toy (Owner) commented Dec 5, 2014

Doesn't work for Rails < 4.0 (see the Travis build).

@aishek force-pushed the ignore-indexes-on-restore branch from eb114e3 to ce0fd35 on December 7, 2014 16:57
@aishek (Contributor, Author) commented Dec 7, 2014

Sorry, didn't notice that – fixed now.

@toy (Owner) commented Dec 8, 2014

It would be better to use ActiveRecord::ConnectionAdapters::IndexDefinition.members instead of INDEX_FIELDS (for example, because a new Rails version can add or change fields).
Why do you use all_indexes.select {|index| index.table == table} if ActiveRecord::Base.connection.indexes(table) should already return the indexes for one table?
There are no tests.

More generally — I assume that removing indexes during insertion should make the whole process faster, but my attempts with and without removing indexes showed no difference using SQLite and PostgreSQL, and the opposite effect using MySQL. Could you please create an example (a schema and a script to generate data) that will be faster to populate with the indexes removed?
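To illustrate the first point, a sketch under the assumption that IndexDefinition is still the Struct subclass it was in Rails 3/4:

require 'active_record'

# The Struct behind IndexDefinition knows its own fields, so they don't have to
# be hard-coded in the gem:
ActiveRecord::ConnectionAdapters::IndexDefinition.members
# => something like [:table, :name, :unique, :columns, :lengths, :orders, ...]
#    (the exact list depends on the Rails version, which is exactly the point)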

@toy (Owner) commented Dec 24, 2014

@aishek Did you get a notification about my comment?

@aishek (Contributor, Author) commented Dec 24, 2014

@toy yeah, I've got it. A lot of work these days, sorry :)

@toy (Owner) commented Dec 24, 2014

@aishek No worries :)

@aishek (Contributor, Author) commented Jan 10, 2015

Let's discuss the speed impact of removing indexes. Here is an example for sqlite3:

macbook-pro-2:temp aishek$ time sqlite3 test.sqlite3 < create_index_first.sql
real   0m1.150s
user   0m0.351s
sys    0m0.179s

macbook-pro-2:temp aishek$ time sqlite3 test.sqlite3 < create_index_after.sql
real   0m0.611s
user   0m0.299s
sys    0m0.123s

create_index_first.sql
create_index_after.sql

I suppose the high selectivity of the age column in my example is what makes the difference.

What do you think: should this feature be present in dump? If so, I'll improve my pull request based on your comments.

@toy (Owner) commented Jan 10, 2015

I am convinced.

Reworked example for multiple databases:

#!/usr/bin/env ruby

require 'benchmark'

table_cmd = 'echo "create table people (age int);"'
index_cmd = 'echo "create index people_age on people(age);"'
drop_cmd = 'echo "drop table people;"'

# Create data.sql inserting `n` rows
n = 100_000
per_insert = 500
values = Array(20..30)
File.open('data.sql', 'w') do |f|
  (n / per_insert).times do
    values_sql = Array.new(per_insert){ "('#{values.sample}')" }.join(',')
    f.puts("insert into people (age) VALUES #{values_sql};")
  end
end
data_cmd = 'cat data.sql'

# For each database, time bulk-loading with the index created after the data vs. before it
Benchmark.bmbm do |x|
  {
    :sqlite => 'sqlite3 test.sqlite',
    :mysql => 'mysql5 -D test',
    :postgres => 'psql -q test',
  }.each do |db_name, db_cmd|
    x.report("#{db_name}: data then index") do
      system "(#{[table_cmd, data_cmd, index_cmd, drop_cmd] * ';'}) | #{db_cmd}"
    end

    x.report("#{db_name}: index then data") do
      system "(#{[table_cmd, index_cmd, data_cmd, drop_cmd] * ';'}) | #{db_cmd}"
    end
  end
end

Results:

Rehearsal -------------------------------------------------------------
sqlite: data then index     0.000000   0.000000   0.780000 (  0.878953)
sqlite: index then data     0.000000   0.000000   1.060000 (  1.185402)
mysql: data then index      0.000000   0.000000   0.030000 (  0.232668)
mysql: index then data      0.000000   0.000000   0.030000 (  0.717944)
postgres: data then index   0.000000   0.000000   0.040000 (  0.368946)
postgres: index then data   0.000000   0.000000   0.040000 (  0.575253)
---------------------------------------------------- total: 1.980000sec

                                user     system      total        real
sqlite: data then index     0.000000   0.000000   0.780000 (  0.933262)
sqlite: index then data     0.000000   0.000000   1.090000 (  1.398851)
mysql: data then index      0.000000   0.000000   0.030000 (  0.261416)
mysql: index then data      0.000000   0.000000   0.030000 (  0.731750)
postgres: data then index   0.000000   0.000000   0.040000 (  0.376046)
postgres: index then data   0.000000   0.000000   0.040000 (  0.584572)

@aishek force-pushed the ignore-indexes-on-restore branch from 1255e19 to 5df6df8 on January 11, 2015 18:53
@aishek force-pushed the ignore-indexes-on-restore branch from cd89333 to b6e666f on January 11, 2015 19:13
@aishek force-pushed the ignore-indexes-on-restore branch from 78318c7 to 6adfbcf on January 11, 2015 19:25
@aishek (Contributor, Author) commented Jan 11, 2015

I am using INDEX_FIELDS because of this approach. I've slightly improved the naming in the latest commits: the array is now called VALID_INDEX_OPTIONS.

I also removed the additional select.

I also added tests, rebased the code, and fixed the pg and i18n gem versions for Ruby 1.8.7.

@toy (Owner) commented Jan 16, 2015

After looking at the Rails DB adapter code, it seems to me that not all index options supported by add_index (like internal or algorithm) will be present in the instances of IndexDefinition returned by ActiveRecord::Base.connection.indexes(table). So there should be an environment variable to disable removing/adding indexes, and an explanation in the readme.
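A minimal sketch of such a guard (the variable name REBUILD_INDEXES is taken from the later discussion in this thread; the gem's actual Env-based handling will differ, and bulk_insert_rows is a placeholder):

# Sketch only: skip the drop/recreate behaviour unless it is explicitly enabled.
def rebuild_indexes?
  %w[1 true yes].include?(ENV['REBUILD_INDEXES'].to_s.downcase)
end

if rebuild_indexes?
  with_disabled_indexes(table) { bulk_insert_rows } # helper sketched earlier in the thread
else
  bulk_insert_rows
end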

Your current approach to adding indexes ignores at least lengths and orders. To check, try adding an index with a length to a table in a MySQL DB and compare the indexes returned by ActiveRecord::Base.connection.indexes(table) before and after restoring the table.
So maybe it is better to grab all options provided by IndexDefinition and convert those which are arrays to hashes, using the column names as keys.

The test you've added only checks that with_disabled_indexes is called (it also doesn't call the block) and doesn't check what happens to the indexes.

The changes to reader.rb in b6e666f create long lines, so they are in fact introducing code-style issues. They are not reported because some style checks are still disabled by .rubocop_todo.yml.

I've already cherry-picked your commits about pg and i18n into the version with the updated progress gem dependency.

@aishek (Contributor, Author) commented Jan 16, 2015

I'll fix these issues later. Thanks for the detailed remarks!

@aishek (Contributor, Author) commented Jan 21, 2015

OK, I've fixed everything:

  • added a disabled-by-default environment variable, with an explanation in the README
  • added MySQL lengths support
  • added with_disabled_indexes specs
  • fixed code-style issues

There are two things to discuss.

First, I've extracted the assets-related logic from Reader into a separate class to fix the RuboCop warning about the long class definition. Now there is some inconsistency in the code: the Summary and Assets class specs should be extracted to separate files, and the DB-related code should be extracted from Reader into a separate class.

Second, 7966b03#diff-89f6960ada41447910198cd9a526b3dcR73 – the ugly send.

What do you think?

@toy (Owner) commented Feb 1, 2015

Sorry for the long wait.

About extracting the assets-related logic: better not to do it as part of this pull request — just increase Metrics/ClassLength in .rubocop_todo.yml for now. The code of this gem needs cleanup, but not as part of adding features.

For the readme entry — there is script/update_readme which adds the environment variable descriptions from the Env class.
You should also add :rebuild_indexes to :restore_options in variable_names_for_command.

index_options is problematic:
Try adding an index like add_index :chickens, [:string_col, :text_col], :length => 10 to spec/db/schema.rb and running REBUILD_INDEXES=1 rspec.
It would be better to add such an index to the schema and add a spec to cycle_spec comparing the indexes before and after restore, to test on a real database.
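A rough sketch of what that spec could look like (dump_and_restore is a placeholder; the gem's actual cycle_spec helpers and its RSpec syntax may differ):

# in spec/db/schema.rb: a MySQL index with a length, as suggested above
add_index :chickens, [:string_col, :text_col], :length => 10

# in cycle_spec: compare index definitions around a full dump/restore cycle
it 'restores indexes together with their options' do
  before_indexes = ActiveRecord::Base.connection.indexes('chickens')
  dump_and_restore                # placeholder for whatever helper cycle_spec uses
  after_indexes = ActiveRecord::Base.connection.indexes('chickens')
  after_indexes.should == before_indexes
end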

Also, you handle lengths but not orders, and the PostgreSQL adapter returns orders. So I still think that instead of using VALID_INDEX_OPTIONS it would be better to go through index.members: if a value is a hash, assign it to the "singularised" option name, and if it's an array, create a hash by zipping it with the column names and also assign it to the "singularised" option name. Do you see any problems with such an approach?
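For what it's worth, a sketch of that members-driven conversion (assuming IndexDefinition is a Struct as in Rails 3/4; the method name index_options comes from the PR, the body here is only an illustration):

# Sketch only: build add_index options from whatever members the current Rails
# version exposes, instead of a hard-coded VALID_INDEX_OPTIONS list.
def index_options(index)
  options = {}
  index.each_pair do |member, value|
    next if [:table, :name, :columns].include?(member)
    next if value.nil? || (value.respond_to?(:empty?) && value.empty?)
    key = member.to_s.sub(/s\z/, '').to_sym   # :lengths => :length, :orders => :order
    options[key] = value.is_a?(Array) ? Hash[index.columns.zip(value)] : value
  end
  options
end

# usage: add_index(index.table, index.columns, index_options(index).merge(:name => index.name))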
