Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use grid-stride looping for kernels with variable-length loops #3130

Merged
merged 6 commits into from
May 29, 2024

Conversation

ManasviGoyal
Copy link
Collaborator

No description provided.

@ManasviGoyal ManasviGoyal added the gpu Concerns the GPU implementation (backend = "cuda') label May 27, 2024
@ManasviGoyal
Copy link
Collaborator Author

@jpivarski Since awkward_ListArray_getitem_next_range kernel is needed for use, I think this can be merged if the everything works fine. I can make a new PR to work on other kernels.

@ManasviGoyal ManasviGoyal marked this pull request as ready for review May 29, 2024 18:07
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, I haven't been able to test it because of thousands of

E   cupy._util.PerformanceWarning: Jitify is performing a one-time only warm-up to populate the persistent cache, this may take a few seconds and will be improved in a future release...

warnings. #3113 was supposed to fix this, but it apparently isn't. I'll have to follow up again later.

@ManasviGoyal
Copy link
Collaborator Author

@jpivarski This seems to be an issue with the more recent versions of cupy. Maybe testing on a older version 12.x.x can be done so that we don't have to wait for fix.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think CuPy 13 added a persistent cache to reduce the time spent compiling after the first compilation. In fact, the message is only presented the first time it encounters a kernel, so that must be what's going on. The messages might be helpful to end-users, but our test framework is set to consider warnings as errors. I added a filter to ignore these warnings when running the tests.

I think this PR is ready to be merged! I made an edit, but not to your code, so I don't think you'll have any counter-edits. I'll merge it as soon as the (CPU) tests pass.

Of course, I ran all of the GPU tests, and they all pass.

@jpivarski jpivarski enabled auto-merge (squash) May 29, 2024 20:19
@jpivarski jpivarski merged commit 0b9f6f4 into main May 29, 2024
41 checks passed
@jpivarski jpivarski deleted the ManasviGoyal/improve-variable-length-loop-kernels branch May 29, 2024 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gpu Concerns the GPU implementation (backend = "cuda')
2 participants