There's enough for everyone

गते गते पारगते पारसंगते बोधि स्वाहा गते गते पारगते पारसंगते बोधि स्वाहा

Cross DB transfer

This code will transfer data between two different databases. mysql to postgresql is my personal favourite. It’s very loosely based on the sequel command code for copying databases, whence the ordering of: transfer schema; then table data; then recreate indexes and constraints.


And some features that I learned the hard way: * tables are transferred in chunks of 10000 records, each chunk as a transaction. This keeps memory usage pretty much constant across the lifetime of the transfer. * use the Sequel::Dataset#import method, which will use bulk update statements if the underlying DBMS supports that. * use yield for progress indicator

It’s not very fast – will transfer a 5Gb database in a couple of hours. If you have bigger data you probably also have a bigger budget for serious data transfer software ;–)

transfer.rb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
require 'sequel'
require 'ostruct'

Sequel.extension :migration, :schema_dumper, :pagination

@src_db = Sequel.connect 'mysql2://user:pass@localhost/your_src_dba
@dst_db = Sequel.connect 'postgres://user:pass@localhost/your_dst_db'

def transfer( table_name, options = { max_rows: 10000, dry_run: false } )
  total_records = @src_db[table_name].count
  yield "transferring #{total_records}"
  options = OpenStruct.new( {max_rows: 10000, dry_run: false}.merge( options ) )
  column_names = @src_db.schema(table_name.to_sym).map( &:first )

  @src_db[table_name].each_page(options.max_rows) do |page|
    yield "#{page.sql} of #{total_records}"
    unless options.dry_run
      @dst_db.transaction do
        rows_ary = []
        page.each do |row_hash|
          rows_ary << row_hash.values
        end
        @dst_db[table_name.to_sym].import column_names, rows_ary
      end
    end
  end
end

def same_db?
  @dst_db.database_type == @src_db.database_type
end

def transfer
  yield "dumping schema"
  schema_migration = eval @src_db.dump_schema_migration(:indexes=>false, :same_db=>same_db?)

  yield "dumping indexes"
  index_migration = eval @src_db.dump_indexes_migration(:same_db=>same_db?)

  yield "dumping foreign keys"
  fk_migration = eval @src_db.dump_foreign_key_migration(:same_db=>same_db?)

  yield "creating schema"
  schema_migration.apply @dst_db, :up

  @src_db.tables.each do |table_name|
    transfer( table_name ){|st| yield st}
  end

  yield "creating indexes"
  index_migration.apply @dst_db, :up

  yield "creating foreign keys"
  fk_migration.apply @dst_db, :up

  if @dst_db.database_type == :postgres
    yield "reset primary key sequences"
    @dst_db.tables.each{|t| @dst_db.reset_primary_key_sequence(t)}
    yield "Primary key sequences reset successfully"
  end
end
1
2
3
4
5
6
7
8
9
$ gem install pry pg mysql2 sequel
$ pry
[1] pry(main)> load 'transfer.rb'
=> true
[2] pry(main)> transfer{|s| puts s}
dumping schema
dumping indexes
dumping foreign keys
creating schema

etc

Comments