Produce expression2load.csv ordered by PK #30

pcm32 · 2020-03-03T13:10:59Z

Currently, for many datasets a large part of the load time is spent on primary key and index creation and on clustering. Maurizio suggests that loading the data already sorted by the primary key would greatly reduce the time spent on those steps. Output from loading E-HCAD-6:

CREATE TABLE
Creating CSV for loading
real    4m21.571s
user    4m12.718s
sys     0m8.317s
Copy CSV to DB
real    2m10.332s
user    0m8.894s
sys     0m2.451s
SET
ALTER TABLE
RESET
SET
CLUSTER
ANALYZE
RESET
ALTER TABLE
ALTER TABLE
Post-processing done
INFO:  partition constraint for table "scxa_analytics_e_hcad_6" is implied by existing constraints
ALTER TABLE
Partition table loaded for experiment E-HCAD-6 succesfully.
real    66m44.613s
user    4m21.633s
sys     0m11.332s

So the overall time is about 66 min, of which creating the CSV takes about 4 min and the COPY operation about 2 min. For this example, most of the time (roughly 60 min) therefore seems to go on the indexing and clustering operations.
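The breakdown can be checked from the wall-clock ("real") values in the log above. This small Python sketch attributes everything that is neither CSV creation nor COPY to post-processing (PK, indexes, CLUSTER, ANALYZE, partition attach). That assumes the script does nothing else significant between those steps.

```python
import re


def to_seconds(t: str) -> float:
    """Convert a bash `time` value such as '4m21.571s' into seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unexpected time format: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Wall-clock ("real") times copied from the E-HCAD-6 log.
total = to_seconds("66m44.613s")
create_csv = to_seconds("4m21.571s")
copy_csv = to_seconds("2m10.332s")

# Remainder: PK, indexes, CLUSTER, ANALYZE and the partition ALTER TABLEs.
post_processing = total - create_csv - copy_csv
print(f"post-processing: {post_processing / 60:.1f} min "
      f"({post_processing / total:.0%} of the total)")
# prints: post-processing: 60.2 min (90% of the total)
```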

@alfonsomunozpomer, is this something you could help us with? Would you expect the sorting to be faster in JavaScript, or should we just apply good old Unix sort once the file is generated?
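A minimal in-memory sketch of the "sort before COPY" idea, written in Python rather than the project's JavaScript. The helper name and the key column indices are hypothetical. The real PK columns of the scxa_analytics tables are not stated in this issue, so `key_columns=[0, 1]` simply assumes the key fields are the first two CSV columns. Values are compared as Python strings (code point order). That matches the index order only if the key columns use the "C" collation, or if the IDs happen to sort identically under the database collation, which is worth checking first.

```python
import csv
import io


def sort_csv_by_key(src, dst, key_columns, has_header=False):
    """Copy CSV rows from `src` to `dst`, sorted by the given column indices.

    Hypothetical helper: loads every row into memory and compares key values
    as Python strings (code point order, which for UTF-8 data matches
    PostgreSQL's "C" collation).
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    if has_header:
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
    rows = list(reader)
    rows.sort(key=lambda row: tuple(row[i] for i in key_columns))
    writer.writerows(rows)


if __name__ == "__main__":
    # Assumption for illustration only: the first two columns form the PK.
    data = "g2,c1,0.5\ng1,c2,1.0\ng1,c1,2.0\n"
    out = io.StringIO()
    sort_csv_by_key(io.StringIO(data), out, key_columns=[0, 1])
    print(out.getvalue(), end="")
```

The Unix route would be something like `LC_ALL=C sort -t, -k1,1 -k2,2 expression2load.csv`, which gives the same byte-wise ordering as long as no field contains a quoted comma (sort knows nothing about CSV quoting) and the file has no header line.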


pcm32 · 2020-04-18T11:27:52Z

To add to this, psql's \copy can read from stdin as well as from files, which could be useful, provided the machine has enough memory, I guess.
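If memory is the concern, an external merge sort keeps it bounded. Sort fixed-size chunks, spill each to a temporary file, then lazily merge the files with `heapq.merge`. The helper below is hypothetical and its default chunk size is arbitrary. It yields sorted lines that could be written straight to psql's standard input instead of to a sorted file.

```python
import heapq
import itertools
import tempfile


def external_sort_lines(lines, chunk_size=1_000_000, key=None):
    """Yield `lines` in sorted order using bounded memory (external merge sort).

    At most `chunk_size` lines are held in memory while sorted chunks are
    spilled to temporary files; the merge then keeps about one line per chunk.
    """
    chunk_files = []
    it = iter(lines)
    try:
        while True:
            # Normalise line endings *before* sorting so each chunk's order
            # matches the lines read back from disk during the merge.
            chunk = [
                line if line.endswith("\n") else line + "\n"
                for line in itertools.islice(it, chunk_size)
            ]
            if not chunk:
                break
            chunk.sort(key=key)
            spill = tempfile.TemporaryFile(mode="w+", newline="\n")
            spill.writelines(chunk)
            spill.seek(0)
            chunk_files.append(spill)
        # heapq.merge lazily merges the already-sorted chunk files.
        yield from heapq.merge(*chunk_files, key=key)
    finally:
        for spill in chunk_files:
            spill.close()
```

The yielded lines could, for example, be written to the stdin of `psql -c "\copy <table> FROM pstdin WITH (FORMAT csv)"` (with `<table>` as a placeholder); `pstdin` makes \copy read psql's own standard input. Passing `key=lambda l: l.split(",")[:2]` orders by the first two fields instead of the whole line, again assuming no quoted commas. GNU sort already does this kind of chunked merge internally, so piping sort into psql is the lower-effort option.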

Some PK creation times:

Experiment        PK time
E-GEOD-139324     77 min
E-HCAD-10         36 min
E-HCAD-6          25 min

The quickest thing to try would be to add the sort after the JavaScript code runs. If Jon sorts the matrix beforehand, we could instead sort only the .mtx file before the JavaScript step.
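If the matrix is sorted before the JavaScript step, the entries of a Matrix Market coordinate file can be reordered while the header, comment lines and size line stay in place. This sketch makes two assumptions. First, the .mtx file is in coordinate format. Second, the row and column index files (genes and cell barcodes) are already in the same order as the PK values, since sorting by integer index gives PK order only when index order and ID order agree. Whether row-major or column-major order is needed depends on the PK column order, so this hypothetical helper takes it as a parameter.

```python
def sort_mtx_coordinate(lines, column_major=False):
    """Sort the entries of a Matrix Market coordinate file given as lines.

    Comment lines (starting with '%') and the "rows cols nnz" size line are
    kept at the top; entries are sorted numerically by (row, column), or by
    (column, row) when column_major is True.
    """
    out = []
    it = iter(lines)
    for line in it:
        out.append(line)
        if not line.startswith("%"):
            break  # this was the size line; entries follow
    # Skip blank lines and make sure every entry ends with a newline.
    entries = [line.rstrip("\n") + "\n" for line in it if line.strip()]

    def key(line):
        row, col = (int(x) for x in line.split()[:2])
        return (col, row) if column_major else (row, col)

    entries.sort(key=key)
    return out + entries
```

For large matrices, the same ordering could be obtained with `sort -k1,1n -k2,2n` applied to the entry lines only, after splitting off the header and size line.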
