Currently, for many datasets a large part of the load time goes into primary key creation, index generation and clustering. Maurizio suggests that if we load the data already sorted by the primary key, it will greatly reduce the time spent on those steps:
```
CREATE TABLE
Creating CSV for loading
real 4m21.571s
user 4m12.718s
sys 0m8.317s
Copy CSV to DB
real 2m10.332s
user 0m8.894s
sys 0m2.451s
SET
ALTER TABLE
RESET
SET
CLUSTER
ANALYZE
RESET
ALTER TABLE
ALTER TABLE
Post-processing done
INFO: partition constraint for table "scxa_analytics_e_hcad_6" is implied by existing constraints
ALTER TABLE
Partition table loaded for experiment E-HCAD-6 succesfully.
real 66m44.613s
user 4m21.633s
sys 0m11.332s
```
So the overall time is about 66 min, of which creating the CSV takes 4 min and the COPY operation 2 min, leaving roughly 60 min for the post-processing. For this example, it seems that the indexing and clustering operations are taking the longest.
@alfonsomunozpomer, is this something you could help us with? Would you expect the sorting to be faster in JavaScript, or should we just apply good old Unix `sort` once the file is generated?
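For the "sort once the file is generated" option, here is a minimal sketch of the external merge sort strategy that GNU `sort` roughly follows for large inputs: sort bounded chunks in memory, spill each to a temporary file, then k-way merge the sorted runs. It assumes the generated CSV has no header row and that the primary-key columns sit at hypothetical 0-based positions passed as `key_cols`; `external_sort_csv` is an illustrative name, not part of the existing loader.

```python
import csv
import heapq
import os
import tempfile
from typing import Callable, List, Sequence

Row = List[str]


def _make_key(key_cols: Sequence[int], numeric: bool) -> Callable[[Row], tuple]:
    """Build a sort key from 0-based column positions (hypothetical PK columns)."""
    if numeric:
        return lambda row: tuple(int(row[i]) for i in key_cols)
    return lambda row: tuple(row[i] for i in key_cols)


def _spill(rows: List[Row], tmp_dir: str) -> str:
    """Write one already-sorted chunk ("run") to a temporary CSV file."""
    fd, path = tempfile.mkstemp(suffix=".csv", dir=tmp_dir)
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)
    return path


def external_sort_csv(src: str, dst: str, key_cols: Sequence[int],
                      numeric: bool = False, chunk_rows: int = 1_000_000) -> None:
    """Sort a header-less CSV by key_cols while holding at most chunk_rows rows in memory."""
    key = _make_key(key_cols, numeric)
    runs: List[str] = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Phase 1: sort chunks of at most chunk_rows rows and spill each to disk.
        with open(src, newline="") as f:
            chunk: List[Row] = []
            for row in csv.reader(f):
                if not row:
                    continue  # skip blank lines
                chunk.append(row)
                if len(chunk) >= chunk_rows:
                    chunk.sort(key=key)
                    runs.append(_spill(chunk, tmp_dir))
                    chunk = []
            if chunk:
                chunk.sort(key=key)
                runs.append(_spill(chunk, tmp_dir))
        # Phase 2: k-way merge of the sorted runs into the output file.
        handles = [open(p, newline="") for p in runs]
        try:
            merged = heapq.merge(*(csv.reader(h) for h in handles), key=key)
            with open(dst, "w", newline="") as out:
                csv.writer(out, lineterminator="\n").writerows(merged)
        finally:
            for h in handles:
                h.close()
```

With `numeric=False` the keys compare as plain strings (code-point order, like `LC_ALL=C sort`), which may not match the collation of text primary-key columns in the database. For simple CSV without quoted commas and with unique keys, `LC_ALL=C sort -t, -k1,1 -k2,2 in.csv > out.csv` should produce the same order.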
To add to this, psql's `\copy` accepts stdin as well as files, which could be useful, provided the machine has enough memory, I guess.
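Piping through stdin avoids writing a second, sorted file to disk. A minimal sketch, assuming the whole dataset fits in memory (the trade-off mentioned above) and hypothetical key positions; `write_sorted_csv` is an illustrative name:

```python
import csv
import sys
from typing import Iterable, List, Sequence, TextIO


def write_sorted_csv(rows: Iterable[List[str]], key_cols: Sequence[int],
                     out: TextIO = sys.stdout) -> int:
    """Sort rows in memory by key_cols and stream them as CSV to `out`.

    All rows are held in memory at once, so this only suits machines with
    enough RAM for the whole dataset. Returns the number of rows written.
    """
    ordered = sorted(rows, key=lambda r: tuple(r[i] for i in key_cols))
    csv.writer(out, lineterminator="\n").writerows(ordered)
    return len(ordered)
```

A small script calling `write_sorted_csv(csv.reader(sys.stdin), [0, 1])` could then sit in a pipeline such as `python sort_rows.py < unsorted.csv | psql -c "\copy my_table FROM pstdin WITH (FORMAT csv)"`, where `sort_rows.py` and `my_table` are placeholders. In `\copy`, `pstdin` explicitly refers to psql's own standard input.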
Some primary key (PK) times:
- E-GEOD-139324: 77 min
- E-HCAD-10: 36 min
- E-HCAD-6: 25 min
The quickest thing to try would be adding the sort after the JavaScript code. If Jon sorts the matrix beforehand, then we could just sort the .mtx file before the JavaScript step.
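Sorting the .mtx file before the JavaScript step could look like the sketch below. It relies on the Matrix Market coordinate layout (a `%%MatrixMarket` banner and `%` comments, then a `rows cols nnz` size line, then 1-based `row col [value]` entries). It further assumes, without confirmation from the loader, that the JavaScript code emits CSV rows in the same order as the .mtx entries and that the matrix row or column index maps onto the leading primary-key column; `sort_mtx` and `column_major` are hypothetical names.

```python
from typing import List, Tuple


def sort_mtx(src: str, dst: str, column_major: bool = False) -> int:
    """Sort Matrix Market coordinate entries numerically by (row, col),
    or by (col, row) if column_major is True. The banner, comments and size
    line are copied unchanged. Returns the number of entries written."""
    header: List[str] = []
    entries: List[Tuple[int, int, Tuple[str, ...]]] = []
    with open(src) as f:
        # Banner and comment lines start with '%'; the first other line is the size line.
        for line in f:
            header.append(line)
            if not line.startswith("%"):
                break  # this was the size line: "rows cols nnz"
        # Remaining non-blank lines are entries: "row col [value]".
        for line in f:
            parts = line.split()
            if not parts:
                continue
            entries.append((int(parts[0]), int(parts[1]), tuple(parts[2:])))
    # Integer keys so that, e.g., row 10 sorts after row 2.
    if column_major:
        entries.sort(key=lambda e: (e[1], e[0]))
    else:
        entries.sort(key=lambda e: (e[0], e[1]))
    with open(dst, "w") as out:
        out.writelines(header)
        for i, j, rest in entries:
            out.write(" ".join((str(i), str(j)) + rest) + "\n")
    return len(entries)
```

Sorting on integer indices avoids the `10` before `2` trap that a default lexical `sort` would fall into on this file (`sort -n` or `-k1,1n` would be needed there).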