Cache-optimized way to process collisions between all pairs of objects in two arrays

I am working on the hitbox collision system for a game. I have two arrays of convex objects that I need to check for collisions between objects in opposite lists (M hitboxes vs N hurtboxes). I am ...
Pimpl idiom without pointer indirection?

I'm developing a C++ API and I want to hide private implementation details from the public interface. Currently, I'm employing the Pimpl idiom for this purpose. However, I'm also mindful of minimizing ...
Do different ways of initializing a Vec<T> create different memory layouts?

fn main() { let vec0 = vec![0; 10]; let mut vec1 = vec![]; for _ in 0..10 { vec1.push(0); } assert_eq!(vec0.len(), vec1.len()); } In this example, vec0 and vec1 are ...
Can a custom allocator improve cache locality for lists?

This is a rather hypothetical question. I have only limited knowledge of how the CPU cache works. I know a CPU loads subsequent bytes into the cache. Since a list uses pointers/indirection into ...
Cache Locality - weight of TLB, Cache Lines, and ...?

From my understanding, the constructs that give rise to the high-level concept of "cache locality" are the following: Translation Lookaside Buffer (TLB) for virtual memory translation. ...
Why is inserting sorted keys into std::set so much faster than inserting shuffled keys?

I was surprised to find that inserting sorted keys into std::set is much faster than inserting shuffled keys. This is somewhat counterintuitive since a red-black tree (I verified ...
Cache misses when accessing an array in nested loop

So I have this question from my professor, and I cannot figure out why vector2 is faster and has fewer cache misses than vector1. Assume that the code below is valid, compilable C code. Vector2: void ...
Determining optimal block size for blocked matrix multiplication

I am trying to implement blocked (tiled) matrix multiplication on a single processor. I have read the literature on why blocking improves memory performance, but I just wanted to ask how to determine ...
Improving cache locality of binary search by doing local linear search?

Binary search of a sorted array may have poor cache locality, due to random access of memory, but linear search is slow for a large array. Is it possible to design a hybrid algorithm? For example, you ...
Cache locality considerations

I have been trying to get a better understanding of cache locality. I wrote two code snippets to compare their cache locality characteristics. vector<int> v1(1000, some ...
Detect whether a cache line is reused due to spatial or temporal locality

Is there a practical tool to detect whether a cache line is reused (a cache miss is avoided) due to either spatial or temporal locality? I could not find a related discussion in cachegrind. I was able ...
In Apache Spark, how to make a task always execute on the same machine?

In its simplest form, an RDD is merely a placeholder for chained computations that can be arbitrarily scheduled to execute on any machine: val src = sc.parallelize(0 to 1000) val rdd = src....
Importance of padding in Dynamic Memory Allocation

I am trying to implement a heap (implicit free list with header/footer) and deciding whether I should add padding to it. What are the tangible benefits of adding padding? I read that it somehow ...
Understanding data cache locality in MIPS code

I have been browsing Stack Overflow but could not find an example of this. I understand the concepts of temporal and spatial locality for the data cache: Temporal locality: address ...
How to benchmark random access with JMH?

I was trying to observe the effects of CPU cache spatial locality by benchmarking sequential/random reads to an array with JMH. Interestingly, the results are almost the same. So I wonder, is this ...
