mcut hangs forever in thread_safe_queue #51

Open
aldoshkind opened this issue Oct 31, 2024 · 0 comments · May be fixed by #52

aldoshkind commented Oct 31, 2024

Hi!

mcut hangs forever in thread_safe_queue.

mcut version: master, b5b0ec6.

Behaviour: in rare cases mcut hangs forever, with a log like this:

[MCUT] Create context 0xdecaf
[MCUT] Create event (type=MC_COMMAND_DISPATCH, handle=0xdecb0)
[MCUT] Launch API thread 140511951611456 (0)

Cause: thread_safe_queue::push notifies the data_cond condition variable without holding head_mutex. If the notification arrives after the until predicate in wait_for_data has already checked m_done and the list (while holding head_mutex), but before the thread has actually started waiting, the notification is lost and the waiter never wakes up. Whether this happens depends on a race between the main thread and api_thread_main: if api_thread_main reaches wait_for_data first, the main thread blocks on head_mutex in the m_queues[index].empty() check, and everything works correctly.
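To make the window easier to see, here is a minimal sketch of what the predicate overload of std::condition_variable::wait does: the standard specifies it as a loop around the plain wait. The names wait_expanded and demo_no_block_when_ready are hypothetical and exist only for illustration; they are not part of mcut.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical helper: behaves like cv.wait(lock, until), written out as the
// loop that the standard defines the predicate overload to be.
template <typename Predicate>
void wait_expanded(std::condition_variable& cv,
                   std::unique_lock<std::mutex>& lock,
                   Predicate until)
{
    while (!until()) {
        // Window: `until` has just returned false and `lock` is still held,
        // but this thread is not waiting yet. A notify_one() issued by a
        // thread that does not hold the same mutex can land here and wake
        // nobody; the cv.wait() below then blocks with no further wake-up.
        cv.wait(lock);
    }
}

// Hypothetical demo: with a predicate that is already true, the helper
// returns immediately and the caller still owns the lock.
bool demo_no_block_when_ready()
{
    std::mutex m;
    std::condition_variable cv;
    std::unique_lock<std::mutex> lock(m);
    wait_expanded(cv, lock, [] { return true; });
    return lock.owns_lock();
}
```

In the queue, the predicate reads the tail under tail_mutex while the wait itself is guarded by head_mutex, so push can change the tail and notify inside that window.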

How to reproduce reliably: insert sleeps into thread_safe_queue::push and thread_safe_queue::wait_for_data like this:

    std::unique_lock<std::mutex> wait_for_data()
    {
        std::unique_lock<std::mutex> head_lock(head_mutex);
        auto until = [&]()
        {
            bool done = m_done->load();
            bool have_items = head.get() != get_tail();
            printf("wait_for_data sleep 3\n");
            sleep(3);
            return done || have_items;
        };
        data_cond.wait(head_lock, until);
        return head_lock;
    }
...
    void push(T new_value)
    {
        std::shared_ptr<T> new_data(std::make_shared<T>(std::move(new_value)));
        printf("push sleep before push\n");
        sleep(1);
        std::unique_ptr<node> p(new node);
        {
            std::lock_guard<std::mutex> tail_lock(tail_mutex);
            tail->data = new_data;
            node* const new_tail = p.get();
            tail->next = std::move(p);
            tail = new_tail;
        }
        printf("push sleep 1\n");
        sleep(1);
        printf("push notify one\n");
        data_cond.notify_one();
    }

and then run HelloWorld. You will see:

# ./HelloWorld 
push sleep before push
wait_for_data sleep 3
push sleep 1
push notify one
<hangs here>

Fix: lock head_mutex before notifying data_cond.
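A minimal sketch of how that fix could look, assuming a simplified stand-in for the queue. Only head_mutex, tail_mutex, data_cond, get_tail, wait_for_data and push are names from this issue; queue_sketch, wait_and_pop and run_demo are hypothetical, the m_done check is left out for brevity, and this is not necessarily what the linked pull request does.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical, simplified stand-in for mcut's thread_safe_queue.
template <typename T>
class queue_sketch {
    struct node {
        std::shared_ptr<T> data;
        std::unique_ptr<node> next;
    };

    std::mutex head_mutex;
    std::unique_ptr<node> head;
    std::mutex tail_mutex;
    node* tail;
    std::condition_variable data_cond;

    node* get_tail()
    {
        std::lock_guard<std::mutex> tail_lock(tail_mutex);
        return tail;
    }

    std::unique_lock<std::mutex> wait_for_data()
    {
        std::unique_lock<std::mutex> head_lock(head_mutex);
        data_cond.wait(head_lock, [&] { return head.get() != get_tail(); });
        return head_lock;
    }

public:
    queue_sketch() : head(new node), tail(head.get()) {}

    void push(T new_value)
    {
        std::shared_ptr<T> new_data(std::make_shared<T>(std::move(new_value)));
        std::unique_ptr<node> p(new node);
        {
            std::lock_guard<std::mutex> tail_lock(tail_mutex);
            tail->data = new_data;
            node* const new_tail = p.get();
            tail->next = std::move(p);
            tail = new_tail;
        }
        // Proposed fix: take head_mutex before notifying. A waiter holds
        // head_mutex from its predicate check until it is blocked inside
        // wait(), so the notification can no longer fall between the two.
        // tail_mutex is already released here, so push never holds both.
        std::lock_guard<std::mutex> head_lock(head_mutex);
        data_cond.notify_one();
    }

    std::shared_ptr<T> wait_and_pop()
    {
        std::unique_lock<std::mutex> head_lock(wait_for_data());
        assert(head->next); // the predicate guarantees at least one item
        std::unique_ptr<node> old_head = std::move(head);
        head = std::move(old_head->next);
        return old_head->data;
    }
};

// Hypothetical demo: one consumer waits, the main thread pushes one value.
int run_demo()
{
    queue_sketch<int> q;
    int received = 0;
    std::thread consumer([&] { received = *q.wait_and_pop(); });
    q.push(42);
    consumer.join();
    return received;
}
```

Holding head_mutex only around the notification keeps the lock order simple: the consumer nests tail_mutex inside head_mutex, while push takes the two mutexes one after the other, so no lock-order deadlock is introduced.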

aldoshkind linked a pull request Oct 31, 2024 that will close this issue