-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed first/last_indexed, improved import perf, support for mariadb connector #4108
base: dev
Are you sure you want to change the base?
Conversation
Thanks, @damien-git! Since I'm out of office this week, I haven't had time to give this an especially close look yet, but I did run the full integration test suite on this branch in my local test environment using MySQL and everything passed. That's hardly an exhaustive test, but at least there's no catastrophic issue there. I would like to do the same thing for PostgreSQL, but upgrading to Ubuntu 24 in my VM seems to have broken my PostgreSQL install; I'll need to figure that out when time permits so I can do further testing. I'll try to find time next week when I'm back in office, though with things piling up, it's possible it will take me a little longer! |
Just a quick note to say that I'm making some progress on getting my PostgreSQL test environment working. There have been quite a few distractions this week, but I hope to get back to this PR next week, once #4149 is merged. |
@demiankatz Thanks. I will be on vacation from Dec 13 to Jan 3 (included), and then I will be working on FOLIO development again for 6 months. I we can't finish it before I go on vacation, maybe someone else on my team can pick this up in January, or maybe I will find some time to fix issues in January. |
Enjoy your vacation, @damien-git! If I can make more progress on this early next week, I will try! I've just opened another PR (#4151) that will also be necessary to move this forward with PostgreSQL, but hopefully that won't take long to get approved. |
@maccabeelevine and @mtrojan-ub, since we've had conversations related to Java optimization and database logic, I thought you might have interest in this PR. No obligation to look if you're busy or not interested, but this seems like something that would benefit from more eyes! (I'm still waiting to get some PostgreSQL bug fix PRs merged so that I can move forward with testing more scenarios on my end). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again, @damien-git! With all of my PostgreSQL fixes now in place, I was able to run the full test suite here using PostgreSQL, and it seems to be passing, just as MySQL did previously.
It would probably be wise to test this with something a bit more robust than the standard test suite, though I'm not sure exactly what scale would be necessary to constitute an adequate test. Open to suggestions!
In any case, I'm going to approve this but leave it open for a few more days in case others have comments or report different results.
This sounds like a great improvement, thank you very much for implementing it! I can't really judge whether the implementation is entirely correct or not - but i can confirm that first/last indexed fields are usually the fields with the worst performance during the import process, so there is a lot of potential in optimizing this - also i cannot see any obvious bugs when having a first look at the implementation. |
Thanks for taking a look, @mtrojan-ub. Do you want to try this out in your local environment before we merge it, or is that not feasible? If nothing else, I can try doing a larger import in my test environment to see if any problems emerge before merging this. |
VuFind 10 includes a PR that modifies
UpdateDateTracker.java
to avoid a deprecated method. After we started using that in prod, our galera cluster crashed. Disablingfirst
/last_indexed
fixed it. We are guessing galera could not keep up with the frequent statement creations during import.This PR reverts the
UpdateDateTracker
PR. Thefinalize()
code could have been simply removed, becauseDatabaseManager
closes the database connection and hence the statements on shutdown.It also improves performance by sending db updates by batch. This required changing
DatabaseManager
to call the shutdown inUpdateDateTracker
before the connection is closed, otherwise the statements could not have been used reliably to finish sending db data by batch. When thechange_tracker
table is empty and only database inserts are used, import performance was improved by 30% in a test.Sending batches is done a bit differently in mysql and mariadb. We are using mariadb, so I had to add specific support for mariadb, and include the mariadb JDBC connector. In PHP code the mysql driver can be used, but support for the
mariadb
connection string is needed.NOTE 1: this change will require mariadb users to change their config to use
mariadb
as the database driver (or in the connection string). If they don't do it,firt
/last_indexed
import might fail with a syntax error.NOTE 2: this code has only been tested with mariadb so far, but it also changes execution for mysql and postgres.
TODO