
lightning: try to balance Lightning throughput across all TiKVs (consider all peers) #54976

Open
wants to merge 3 commits into
base: release-6.5-20231120-v6.5.4


@guoshouyan (Contributor) commented Jul 26, 2024

Previous PR: #54615
The previous PR considered only the store ID of each region's leader when balancing throughput across all TiKV nodes. This PR considers the stores of all peers instead.

  1. Add a struct, RangeAndRegion, that stores both a range and its region information.
  2. Add a map, regionProcessingMap := sync.Map{}, that keeps a count for every store ID currently being imported to. When a range is assigned to a worker, the counts for its stores are increased; when the range finishes, they are decreased.
  3. In pickRanges, find the best range to import: iterate through all ranges and calculate a score for each, sum(number of regions currently importing) over the stores of the range's peers, and always schedule the range with the lowest score.

Performance testing: importing about 10 TB of data with 7 Lightning instances into a cluster of 30 TiKV nodes with a store limit of 30MB:

  1. Without pre-split (baseline): took 130 minutes, with throughput of about 666 MB/s per Lightning instance.
    Screenshots: 2024-07-26 at 1:36:55 PM, 1:46:21 PM

  2. With pre-split: took more than 3 hours and was killed; throughput stayed below 300 MB/s.
    Screenshots: 2024-07-26 at 1:44:10 PM, 1:49:18 PM

  3. With pre-split and balancing across leader store IDs: took 160 minutes, with throughput of 512 MB/s.
    Screenshots: 2024-07-26 at 1:38:10 PM, 1:47:35 PM

  4. With pre-split and balancing across all peer store IDs: took 100 minutes, with throughput of around 800 to 1000 MB/s.
    Screenshots: 2024-07-26 at 1:40:17 PM, 1:45:18 PM

What problem does this PR solve?

Issue Number: ref #56113

Problem Summary:

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot added the release-note-none label (Denotes a PR that doesn't merit a release note.) Jul 26, 2024
@sre-bot (Contributor) commented Jul 26, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ guoshouyan
❌ shouyan.guo


shouyan.guo seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@ti-chi-bot (bot) commented Jul 26, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bb7133 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot added the needs-ok-to-test (Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing.) and size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.) labels Jul 26, 2024
@ti-chi-bot (bot) commented Jul 26, 2024

Hi @guoshouyan. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tiprow (bot) commented Jul 26, 2024

Hi @guoshouyan. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@lance6716 (Contributor) commented:

/ok-to-test

@ti-chi-bot added the ok-to-test label (Indicates a PR is ready to be tested.) and removed the needs-ok-to-test label (Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing.) Jul 27, 2024
Outdated, resolved review threads (5) on br/pkg/lightning/backend/local/local.go
```go
rr := pickRanges(rangeAndRegions, &regionProcessingMap, finished)
startKey := rr.r.start
endKey := rr.r.end
processingRegions := rr.regions
```
A contributor commented on this code:

Could we pass processingRegions to writeAndIngestByRange (line 1722), so that it can skip the inner scan-region API call (line 1470)?

@ti-chi-bot (bot) commented Oct 21, 2024

@guoshouyan: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| idc-jenkins-ci-tidb/check_dev | b862c36 | link | true | /test check-dev |
| idc-jenkins-ci-tidb/check_dev_2 | b862c36 | link | true | /test check-dev2 |
| idc-jenkins-ci-tidb/mysql-test | b862c36 | link | true | /test mysql-test |
| idc-jenkins-ci-tidb/unit-test | b862c36 | link | true | /test unit-test |
| idc-jenkins-ci-tidb/build | b862c36 | link | true | /test build |



Labels: ok-to-test, release-note-none, size/L

3 participants