27 questions
0 votes, 0 answers, 39 views
Cache-optimized way to process collisions between all pairs of objects in two arrays
I am working on the hitbox collision system for a game. I have two arrays of convex objects that I need to check for collisions between objects in opposite lists (M hitboxes vs N hurtboxes). I am ...
0 votes, 0 answers, 131 views
Pimpl idiom without pointer indirection?
I'm developing a C++ API and I want to hide private implementation details from the public interface. Currently, I'm employing the Pimpl idiom for this purpose. However, I'm also mindful of minimizing ...
0 votes, 2 answers, 89 views
Do different ways of initializing a Vec<T> create different memory layout?
fn main() {
    let vec0 = vec![0; 10];
    let mut vec1 = vec![];
    for _ in 0..10 {
        vec1.push(0);
    }
    assert_eq!(vec0.len(), vec1.len());
}
In this example, vec0 and vec1 are ...
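The excerpt is cut off, so the exact question is not known. As a minimal sketch, assuming the question is about how the two vectors sit in memory, the helpers below (the names `built_with_macro`, `built_with_push`, and `is_contiguous` are hypothetical) build a vector both ways and check that the elements occupy one contiguous buffer in each case. The observable difference is typically the capacity, since repeated `push` calls grow the buffer by an unspecified strategy.

```rust
/// Builds a vector of `n` zeros in one step with the `vec!` macro.
fn built_with_macro(n: usize) -> Vec<i32> {
    vec![0; n]
}

/// Builds a vector of `n` zeros by pushing one element at a time.
/// The buffer may be reallocated several times along the way.
fn built_with_push(n: usize) -> Vec<i32> {
    let mut v = Vec::new();
    for _ in 0..n {
        v.push(0);
    }
    v
}

/// Checks that element `i` lives exactly `i` slots after the first element,
/// i.e. that the elements form one contiguous run in memory.
fn is_contiguous(v: &[i32]) -> bool {
    let base = v.as_ptr();
    v.iter()
        .enumerate()
        .all(|(i, x)| std::ptr::eq(x, base.wrapping_add(i)))
}
```

Calling `capacity()` on both results shows any difference in reserved space; the contents and the contiguous layout of the live elements are the same.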
0 votes, 2 answers, 179 views
Can a custom allocator improve cache locality for lists?
This is a rather hypothetical question.
I have only limited knowledge of how the CPU cache works.
I know that a CPU loads subsequent bytes into the cache.
Since a list uses pointers/indirection into ...
-1 votes, 2 answers, 805 views
Cache Locality - weight of TLB, Cache Lines, and ...?
From my understanding, the constructs that give rise to the high-level concept of "cache locality" are the following:
Translation Lookaside Buffer (TLB) for virtual memory translation. ...
3 votes, 1 answer, 84 views
Why is inserting sorted keys into std::set so much faster than inserting shuffled keys?
I was surprised to find that inserting sorted keys into std::set is much faster than inserting shuffled keys. This is somewhat counterintuitive since a red-black tree (I verified ...
2 votes, 2 answers, 1k views
Cache misses when accessing an array in nested loop
I have this question from my professor, and I cannot figure out why vector2 is faster and has fewer cache misses than vector1.
Assume that the code below is valid, compilable C code.
Vector2:
void ...
0 votes, 1 answer, 1k views
Determining optimal block size for blocked matrix multiplication
I am trying to implement blocked (tiled) matrix multiplication on a single processor. I have read the literature on why blocking improves memory performance, but I just wanted to ask how to determine ...
1 vote, 0 answers, 350 views
Improving cache locality of binary search by doing local linear search?
Binary search of a sorted array may have poor cache locality, due to random access of memory, but linear search is slow for a large array. Is it possible to design a hybrid algorithm? For example, you ...
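The asker's own proposal is cut off, so the following is just one possible hybrid, not necessarily the one they had in mind: narrow the range with binary search until it is at most `linear_threshold` elements, then finish with a linear scan over that small, contiguous window. The function name and the threshold parameter are hypothetical.

```rust
/// Searches a sorted slice for `target`.
/// Binary search halves the range until it holds at most `linear_threshold`
/// elements; a linear scan then walks that final window.
/// With duplicates, the index of some matching element is returned.
fn hybrid_search(data: &[i32], target: i32, linear_threshold: usize) -> Option<usize> {
    let mut lo = 0usize;
    let mut hi = data.len();
    // Treat a threshold of 0 as 1 so the loop always terminates.
    let threshold = linear_threshold.max(1);

    // Invariant: if `target` occurs in `data`, an occurrence lies in [lo, hi).
    while hi - lo > threshold {
        let mid = lo + (hi - lo) / 2;
        if data[mid] <= target {
            lo = mid;
        } else {
            hi = mid;
        }
    }

    // Final pass over a short contiguous window.
    data[lo..hi]
        .iter()
        .position(|&x| x == target)
        .map(|offset| lo + offset)
}
```

Whether this beats a plain binary search depends on the element size, the cache line size, and the threshold, so it is worth benchmarking rather than assuming.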
0 votes, 0 answers, 75 views
Cache locality considerations
I have been trying to get a better awareness of cache locality. I wrote two code snippets to better understand the cache locality characteristics of each.
vector<int> v1(1000, some ...
1 vote, 0 answers, 38 views
Detect whether a cache line is reused due to spatial or temporal locality
Is there a practical tool to detect whether a cache line is reused (a cache miss is avoided) due to either spatial or temporal locality?
I could not find a related discussion in cachegrind.
I was able ...
2 votes, 0 answers, 67 views
In Apache Spark, how to make a task always execute on the same machine?
In its simplest form, an RDD is merely a placeholder for chained computations that can be arbitrarily scheduled to execute on any machine:
val src = sc.parallelize(0 to 1000)
val rdd = src....
2 votes, 2 answers, 1k views
Importance of padding in Dynamic Memory Allocation
I am trying to implement a heap (implicit free list with header/footer) and deciding whether I should add padding to it. What are the tangible benefits of adding padding? I read that it somehow ...
0 votes, 1 answer, 541 views
Understanding data cache locality in MIPS code
I have been browsing Stack Overflow but could not really find an example of this. I understand the concept of temporal and spatial locality for the data cache:
Temporal locality: address ...
2 votes, 0 answers, 224 views
How to benchmark random access with JMH?
I was trying to observe the effects of CPU cache spatial locality by benchmarking sequential/random reads to an array with JMH. Interestingly, the results are almost the same.
So I wonder, is this ...