One way to do it is to represent each user and each event as a bitmap. Essentially, each half hour corresponds to a bit, the entire day corresponding to 6 bytes. 0 means the person is available, and 1—that the person is not available. Same for the events.
Now, all operations become operations on bitmaps (XOR, etc.), which are usually very fast and still relatively space efficient—you can use SIMD, or do the computation in GPU—essentially rely to any techniques that are used when working with graphics. Chances are, you can just load everything in memory at once, do the computation you need, then save the result.
Now, if you decide to go this way, there are two important points:
Do it as an alternative to the naive, most basic implementation. This way, you'll be able to (1) measure the performance of the original implementation to decide if it is acceptable, and if not, (2) compare the performance of both implementations to see how much you saved.
Decide at which point you need to go from the actual structured data to bitmaps. Very likely, you want the data to be stored in the database in a human-readable format: as highlighted by Flater in the comment below, bitmaps are not well fit for things like recurring schedules (and obviously all but human-readable: looking at, say, 0xC4399F2A03, it is not necessarily obvious that the person is available tomorrow from 3 p.m. to 3:30 p.m., but then busy until 5 p.m. This means that you'll need to convert the data back and forth. This process would also take time, that is important to measure as well.
I was curious to see if this approach is viable, so I decided to make a quick test. As I was poking with code, I made a few discoveries that could be useful to share.
But let's start with the actual code. I wanted to start with something small, just to see if it works correctly: fifteen users, five events, and only one day span instead of a week. And I'm doing it in Python—which appeared to be handy for one particular aspect.
import random
import time
half_hours = 1 * 24 * 2 # 48 bits.
assert half_hours % 8 == 0
bitmap_size = int(half_hours / 8) # 6 bytes.
total_users = 15
total_events = 5
To create sample data, there is no need to be particularly realistic: users' schedules are completely random (say, the person is available from 3 a.m. to 3:30 a.m., then busy for half an hour, then available for two hours, then busy... and so on), while meetings are always consecutive blocs that span 30 minutes to 3 hours and start at a random time.
def create_users():
for _ in range(total_users):
yield int.from_bytes(random.randbytes(bitmap_size))
def create_events():
for _ in range(total_events):
duration = random.randint(1, 6) # 30 minutes to 3 hours.
pattern = {
1: 0b1,
2: 0b11,
3: 0b111,
4: 0b1111,
5: 0b11111,
6: 0b111111,
}[duration]
start_time = random.randint(0, half_hours - duration)
yield pattern << start_time
Here's, finally, the thing that takes those users and events, and determines, for each event, how many participants could join.
def b(value):
return bin(value)[2:].rjust(half_hours, '0')
def run():
users_slots = list(create_users())
events_slots = list(create_events())
print("Users:")
for u in users_slots:
print(b(u))
users_per_event = []
for e in events_slots:
count = 0
for u in users_slots:
if u & e == e:
count += 1
users_per_event.append((e, count))
most_popular_counter = 0
for _, count in users_per_event:
if count > most_popular_counter:
most_popular_counter = count
print(f'Luckiest event has {most_popular_counter} potential participant(s).')
print("Events:")
for e, count in users_per_event:
print(f'{b(e)}: {count}')
if __name__ == '__main__':
run()
Here's an example of an output. Naturally, the data being random, each time the script runs, the binary patterns are different, and so would be the counters.
Users:
010101110001001101000011010110110110100001101000
001011111110111111101001010011011001110011111000
111010010101101000110001010011010010010111111101
111001101001100010000000000011001001100101001000
011011110100001011100100101100011110111001000010
010001110010111011001100110000111100101101011010
001111110111100011001110101100101100111110000010
111100111111100101111111101110010001010111101110
000100101000011101100010110101110010100111101111
010100111000001110110010110011111111110000100100
000000100111001011111101001100001001110101010101
111001011010000011010000001111110010100000100101
100110111000110100111011111111000101110000100011
010000111110000001001001101100101011010011010111
010100100101000100110101111101110001111000111011
Luckiest event has 3 potential participant(s).
Events:
000000000000000000000011000000000000000000000000: 3
000000000000000000000000000000000011111000000000: 0
000000000000000011100000000000000000000000000000: 3
000000000000000000000001111100000000000000000000: 2
000000000000000011111000000000000000000000000000: 1
The number at the right of each event indicates how much users can attend this event. It is relatively easy to see how it works visually. Taking, for instance, the third event, one can spot that the second user in the list has all ones matching the ones of the event. So does the fifth user, and another one towards the middle of the list. The algorithm seems to be working.
Now, let's modify the script a bit, by:
- Increasing the span from one day to a week, as well as increasing the number of users and events.
- Removing the binary output.
- Adding the thing that measures how long the code spends doing stuff.
half_hours = 7 * 24 * 2 # 336 bits.
assert half_hours % 8 == 0
bitmap_size = int(half_hours / 8) # 42 bytes.
total_users = 15000
total_events = 5000
[...]
def run():
users_slots = list(create_users())
events_slots = list(create_events())
start = time.time()
users_per_event = []
for e in events_slots:
count = 0
for u in users_slots:
if u & e == e:
count += 1
users_per_event.append((e, count))
end = time.time()
print(f'Spent {end - start:.3f} seconds.')
most_popular_counter = 0
for _, count in users_per_event:
if count > most_popular_counter:
most_popular_counter = count
print(f'Luckiest event has {most_popular_counter} potential participant(s).')
Running it on an Intel i5-10600 3.30 GHz CPU, using a single core, the output is:
Spent 3.466 seconds.
Luckiest event has 7680 potential participant(s).
Observations:
Three and a half seconds (YMMV) is not too bad for a naive approach that don't use SIMD or even parallel computing.
I tried a parallel approach with concurrent.futures.ProcessPoolExecutor, but it led to the time increasing up to seven seconds, because the large list of users is copied. One needs to use shared memory or find another approach, but I didn't have the patience to poke with it long enough.
I'm surprised how easy it is to work with data, actually. Originally, I didn't agree with Hans-Martin Mosner's comment, as I was believing that bitmaps should be very constrained—limited to the actual computations, but should be converted to human readable format as soon as possible. After playing with the Python script, I would rather think that binary approach is quite clear. If enough quality tooling is developed around it, it is definitively a viable approach even considering bitmaps being stored in the database.
The algorithm, too, is at its simplest. I mean, it's literally u & e == e!
I'm cheating a bit, since Python has no maximum limit for integers. This being said, manipulation of bytes in other languages should work similarly.