Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACCESS VIOLATION - Writing to nullptr #32

Closed
ratzrattillo opened this issue Jan 25, 2023 · 8 comments
Closed

ACCESS VIOLATION - Writing to nullptr #32

ratzrattillo opened this issue Jan 25, 2023 · 8 comments

Comments

@ratzrattillo
Copy link

ratzrattillo commented Jan 25, 2023

While working with libzmq, which currently embeds wepoll 1.5.8, i had several crashes due to access violations.
These seem to stem from a special behaviour caused by GetQueuedCompletionStatusEx().

This function sometimes returns success (TRUE), containing a positive completion_count (e.g. 1), but still contain an iocp_events[0].lpOverlapped structure with a NULL value. The function is called in line 1223:
https://github.com/piscisaureus/wepoll/blob/dist/wepoll.c#L1223

Later on, wepoll tries to write to this memory location in line 1842:
https://github.com/piscisaureus/wepoll/blob/dist/wepoll.c#L1842

Attached you can find some screenshots of my debug session:

screenshot1

screenshot2

A fix would be highly appreciated :)

@notgull
Copy link

notgull commented Jan 25, 2023

Is this behavior documented in GetCompletionPacketEx? Unless external packets are sent via the IOCP handle this shouldn't be an issue, if my memory of the Windows I/O API serves me right.

Otherwise, #20 may fix this.

@ratzrattillo
Copy link
Author

The following Remark i found in: https://learn.microsoft.com/en-us/windows/win32/fileio/getqueuedcompletionstatusex-func#remarks

This function returns TRUE when at least one pending I/O is completed, but it is possible that one or more I/O operations failed. Note that it is up to the user of this function to check the list of returned entries in the lpCompletionPortEntries parameter to determine which of them correspond to any possible failed I/O operations by looking at the status contained in the lpOverlapped member in each OVERLAPPED_ENTRY.

It is not clearly stating that lpOverlapped can be NULL, when more than 0 completions have been dequed, however it is the behavior i experienced.

Is there a plan to merge #20 ?

@piscisaureus
Copy link
Owner

It's hard to believe a NULL lpOverlapped would just randomly show up on the completion port.

Are you using the async_io crate by any chance? Because they use a forked version of wepoll that deliberately people posts NULL packets to the completion port.

@notgull
Copy link

notgull commented Feb 12, 2023

libzmq is written in C++ and doesn't use async-io. In addition, there are no crates that depend on async-io that have reported this bug as far as I know.

Are there any places in libzmq where an IOCP packet is manually posted to the wepoll port?

@piscisaureus
Copy link
Owner

piscisaureus commented Feb 12, 2023

libzmq is written in C++ and doesn't use async-io.

The OP is reporting they are using Rust, so they could be using async-io.

In addition, there are no crates that depend on async-io that have reported this bug as far as I know.

Crates that depend on async-io depend on https://github.com/aclysma/wepoll-ffi which is a patched wepoll that supports this, so I wouldn't expect them to run into this issue in isolation. It only becomes a problem when they use async-io and then link with a "stock" wepoll.

Other than that I can't think of any other way this could ever happen. The latest version of wepoll has been out for more than 2.5 years, and it is used enough in production that if this were a wepoll bug, it would have come up a lot earlier. So there must be something particular to the OP's setup that causes this.

@piscisaureus
Copy link
Owner

In fact, looking at the stack trace, I'm pretty convinced this is the issue now:
image

@notgull
Copy link

notgull commented Feb 12, 2023

Thanks for the catch @piscisaureus! I've opened smol-rs/polling#85 and aclysma/wepoll-ffi#1 to address this

@notgull
Copy link

notgull commented Mar 8, 2023

As of version v2.6.0, polling now no longer uses wepoll. This issue can probably be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants