Add monitoring for our Solr Cloud cluster #469

Open
4 tasks
ke4 opened this issue Sep 19, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request improvement Improve/refactor existing code

Comments

@ke4
Contributor

ke4 commented Sep 19, 2024

Currently we have to check our Solr Cloud cluster manually to see whether it has issues or is down.
In those cases we have to restart it manually using a Jenkins job.

There is a better way: use a monitoring application that checks critical parameters, detects when a server is down, and restarts it.
We can investigate whether Monit can cover all of the above.

  • Download, set up, and configure Monit on the test environment's Solr server
  • Download, set up, and configure Monit on the staging environment's Solr server
  • Download, set up, and configure Monit on the fallback environment's Solr server
  • Download, set up, and configure Monit on the public environment's Solr server
@ke4 ke4 self-assigned this Sep 19, 2024
@ke4 ke4 added enhancement New feature or request improvement Improve/refactor existing code labels Sep 19, 2024
@ke4
Contributor Author

ke4 commented Sep 23, 2024

From https://www.webfoobar.com/node/61, but this only works for a single Solr node, not for Solr Cloud.

## Solr monitoring.

## Test the solr service.
check process solr with pidfile /var/solr/solr-8983.pid
  group solr
  start program = "/usr/bin/systemctl start solr"
  stop  program = "/usr/bin/systemctl stop solr"
  restart program  = "/usr/bin/systemctl restart solr"
  if failed port 8983 then restart
  if 3 restarts within 5 cycles then timeout
  depends on solr_bin   
  depends on solr_init
  alert root@localhost only on {timeout}

## Test the process binary.
check file solr_bin with path /opt/solr/bin/solr
  group solr
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid solr then unmonitor
  if failed gid solr then unmonitor
  alert root@localhost

## Test the init scripts.
check file solr_init with path /etc/init.d/solr
  group solr
  if failed checksum then unmonitor
  if failed permission 744 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
  alert root@localhost
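
Since the config above only watches one local process, a cluster-level check could complement it. The following is a minimal sketch, not a tested setup: it assumes the Solr Collections API `CLUSTERSTATUS` action is reachable without authentication, and it treats any replica that is not `active`, or whose node is missing from `live_nodes`, as unhealthy. The function names and the exit-code convention are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: any replica state other than "active" counts as unhealthy.
HEALTHY_STATE = "active"


def unhealthy_replicas(cluster_status: dict) -> list[tuple[str, str, str, str]]:
    """Return (collection, shard, node_name, state) for each replica that is
    not active or whose node is not listed in live_nodes."""
    cluster = cluster_status.get("cluster", {})
    live_nodes = set(cluster.get("live_nodes", []))
    problems = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica in shard.get("replicas", {}).values():
                node = replica.get("node_name", "")
                state = replica.get("state", "unknown")
                if state != HEALTHY_STATE or node not in live_nodes:
                    problems.append((coll_name, shard_name, node, state))
    return problems


def fetch_cluster_status(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch CLUSTERSTATUS from one node, e.g. base_url='http://localhost:8983/solr'."""
    url = f"{base_url}/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def main(base_url: str) -> int:
    """Print unhealthy replicas; return 1 if any were found, else 0."""
    problems = unhealthy_replicas(fetch_cluster_status(base_url))
    for coll, shard, node, state in problems:
        print(f"{coll}/{shard} on {node}: {state}")
    return 1 if problems else 0
```

If this were saved as a script that ends with `sys.exit(main(...))`, Monit's `check program` test could probably run it and react to a non-zero exit status, but that wiring is untested here.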

@ke4
Contributor Author

ke4 commented Sep 23, 2024

Adding this here in case we need to trigger a Jenkins job remotely: https://serverfault.com/questions/888176/how-to-trigger-jenkins-job-via-curl-command-remotely
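
As a rough sketch of what such a remote trigger could look like outside of curl: the snippet below builds a POST request against Jenkins' `/job/<name>/build` (or `/buildWithParameters`) endpoint using HTTP basic auth with a user API token. The Jenkins URL, job name, user, and parameter names are placeholders, not our real values, and jobs inside folders would need a different path.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_trigger_request(jenkins_url, job_name, user, api_token, params=None):
    """Build (but do not send) the POST request that queues a Jenkins build."""
    # Parameterized jobs use a different endpoint than plain jobs.
    endpoint = "buildWithParameters" if params else "build"
    url = f"{jenkins_url.rstrip('/')}/job/{quote(job_name, safe='')}/{endpoint}"
    if params:
        url += "?" + urlencode(params)
    # Basic auth with user name and API token (assumed to be enabled for the user).
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return Request(url, method="POST",
                   headers={"Authorization": f"Basic {credentials}"})


def trigger_job(jenkins_url, job_name, user, api_token, params=None, timeout=10.0):
    """Send the trigger request and return the HTTP status code."""
    req = build_trigger_request(jenkins_url, job_name, user, api_token, params)
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, `trigger_job("https://jenkins.example.org", "restart-solr", "bot", "<token>", {"ENV": "test"})` would request a build of a hypothetical `restart-solr` job; the token should come from a secret store rather than the Monit config.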


1 participant