Greg has set us several tasks this week… I think I can see through Steve’s plan here. By getting Greg to set all the tasks, our group ire will be aimed elsewhere and Steve can come out of Studio 3 smelling like a rose 😉
This task was aimed at getting us to think (dread the thought) and at exposing us to some handy little thread optimisation techniques.
The programme, as handed to us by Greg, created the output image in 73.163 seconds (on this laptop). As of this moment, the output image is being created in 7.5 seconds (again, on this laptop).
Unfortunately, the notes I was making as I went along have been surrendered to the void, so I will have to wing it. I can’t remember how much time I saved at each step, but I can tell you about the main time saver, which cut a huge amount of time, and another that brought me from the 12 second mark down to the 7.5 second mark.
A series of spheres are generated: 53 larger spheres in a spiral in the centre of the screen and 900 lying in an ordered manner on the ground. The spiral spheres are reflective and show all or most of the other spheres on the ground, depending on their relative positions. There are also two lights in the scene: a red light off to the left and a white light off to the right (SPOILERS – an important fact for a later optimisation).
The core of the program sends a ray through each screen pixel and determines what it lands on and what information needs to be returned. This creates an n^2 operation, as it scans through every object in the scene and then scans through every object again to determine the shadow values of the lights.
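To make the cost concrete, here is a hedged sketch (not the project’s actual code – the function name and counts are mine) of how the intersection tests multiply: every pixel scans all renderables once for the primary hit, then scans them all again for each light’s shadow ray.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cost model of the naive loop structure described above:
// pixels * (objects + lights * objects) intersection tests in total.
std::size_t intersectionTests(std::size_t pixels, std::size_t objects,
                              std::size_t lights)
{
    std::size_t tests = 0;
    for (std::size_t p = 0; p < pixels; ++p)
    {
        tests += objects;           // primary ray tested against every object
        tests += lights * objects;  // one full shadow scan per light
    }
    return tests;
}
```

With the post’s 953 spheres and 2 lights, a single pixel already costs 953 + 2 × 953 = 2859 tests, which is why trimming the shadow scans pays off so well.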
Step one was to go through the code and see where coding optimisations could be made. The first was where the ray would work out which part of the skymap it hit and then check whether it actually hit a sphere. I changed this to an if/else branch so that it would only return the skymap colour if there were no hits on a sphere.
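The shape of that change looks something like this sketch – `traceSpheres()` and `sampleSkyMap()` are stand-ins for the real functions, which I don’t have in front of me:

```cpp
#include <cassert>

enum Colour { Sky, Sphere };

// Stand-ins: the real code intersects the ray with the scene and samples
// the skymap texture. Here we just model "did we hit a sphere or not".
bool traceSpheres(bool sceneHasHit) { return sceneHasHit; }
Colour sampleSkyMap() { return Sky; }

Colour shade(bool sceneHasHit)
{
    if (traceSpheres(sceneHasHit))
        return Sphere;          // a sphere was hit: no skymap lookup needed
    else
        return sampleSkyMap();  // only fall back to the skymap on a miss
}
```

The point of the branch is simply that the (comparatively pricey) skymap lookup is skipped entirely whenever a sphere is hit.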
The next step was to create a new shadow check function. The programme was originally using the same loop as the “trace” function, so even after it hit something, it would keep searching for a closer sphere. With the shadows, I knew that once there was any object between the ray hitpoint and the light, the point was in shadow and the function could return to the parent function early. This code enabled me to do that:
for (int i = 0; i < m_renderables.size(); ++i)
{
    // Check the sphere for an intersection (test elided).
    // Return on the first hit – any occluder puts the point in shadow.
}
I then tried to redesign the way the spheres were created. The larger spheres were created from the ground up, which meant that for a majority of the scan, a ray would have to go through 50 or so spheres before it hit something. I changed the creation order so that it started at the top of the stack and wound down to the ground. I felt this created a better optimised array to search through.
From memory, these three changes dropped the running time to about 45–55 seconds.
The next change was a big deal. I enabled OpenMP in my version of Visual Studio and added #include <omp.h> to several of my classes. I then used this directive: “#pragma omp parallel for”. Finding the right place for it was easy enough: the heaviest part of the main function is where the computer goes through two “for” loops and sends a ray through each pixel. Every other place I had used the directive was then removed, because the work was already parallelised through the main function.
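A minimal sketch of where the directive goes – `WIDTH`, `HEIGHT` and `tracePixel()` are my stand-ins, not the project’s names. Each (x, y) writes to its own pixel, so the rows are independent and safe to split across threads; if OpenMP isn’t enabled, the pragma is simply ignored and the loop runs serially.

```cpp
#include <vector>

const int WIDTH = 640, HEIGHT = 480;

// Stand-in for the actual per-pixel ray trace.
int tracePixel(int x, int y) { return x + y; }

std::vector<int> render()
{
    std::vector<int> image(WIDTH * HEIGHT);
    // Parallelise only the outer pixel loop: iterations are independent
    // because each one writes to a distinct element of 'image'.
    #pragma omp parallel for
    for (int y = 0; y < HEIGHT; ++y)
        for (int x = 0; x < WIDTH; ++x)
            image[y * WIDTH + x] = tracePixel(x, y);
    return image;
}
```

Putting the pragma only on the outermost loop is the key design point: nesting more pragmas inside (as I had at first) just adds thread-management overhead to work that is already spread across the cores.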
This brought my processing time down to about 12.5 seconds.
There were then a couple of failed experiments. One changed the order in which the spheres were created (adding 2 seconds to my process time). Another involved leaving every 4th pixel blank and then lerping between the left and right pixels to fill it in. This showed up horribly when I placed the proper output image in Photoshop, added the new output image as a layer, and set the layer mode to “Difference” (part of the brief is to have no differences between the correct output image and our own).
I had to find a way to pull the shadow calculations further out of the n^2 pattern. That is when I hit on the idea that the first “trace” already determines the hitpoint and which renderable object it hits. I created a new int value in the Hitpoint class and called it m_renderableNumber. Then, as the function was looping through the renderable objects trying to find the nearest object to the ray, I could record which object was the nearest and use that as a basis for minimising the number of objects the shadow check needs to test against.
Here is the code I used to get the m_renderableNumber:
for (int i = 0; i < m_renderables.size(); ++i)
{
    // Find the nearest intersect point (test elided), then record its
    // index so the shadow pass can start its search from it.
    if (hp.m_renderable == m_renderables[i])
        hp.m_renderableNumber = i;
}
It wasn’t quite as easy as I just described. I was originally using if (hp.m_hit = m_renderables[i]) – a single “=” assigns rather than compares, so the condition always passed and I always ended up with the last value, 953. I tried various other methods of getting the renderable number and as often as not ended up with a number like -8355809. I figured it was an error, because it was always the same number, but still couldn’t work out where the rubbish value was coming from. Google is your friend? Nah.
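For anyone who hasn’t been bitten by this one yet, here is a minimal illustration (not the original code) of the “=” versus “==” trap: with assignment in the condition, the test evaluates the assigned value, so every nonzero target “matches” and the last index wins.

```cpp
#include <vector>

// BUG version: 'v[i] = target' assigns, then the if tests the assigned
// value, which is true whenever target != 0 – so index ends at the back.
int lastIndexBuggy(std::vector<int> v, int target)
{
    int index = -1;
    for (int i = 0; i < static_cast<int>(v.size()); ++i)
        if (v[i] = target)   // assignment, not comparison!
            index = i;
    return index;
}

// Correct version: '==' actually compares the values.
int indexCorrect(const std::vector<int>& v, int target)
{
    int index = -1;
    for (int i = 0; i < static_cast<int>(v.size()); ++i)
        if (v[i] == target)
            index = i;
    return index;
}
```

Most compilers will at least warn about the buggy form (MSVC’s C4706, GCC/Clang’s -Wparentheses), which is a good reason to build with warnings turned up.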
So now I had the index into the renderables array from which I could start my search.
After several false starts trying to work out which light was being checked, I came up with this line of code: “if (ray.end().x < hit.m_position.x)”.
If true, we are searching toward the red light on the left; otherwise, it must be the white light on the right.
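In isolation, that test looks something like the sketch below. The `Vec3` struct and function names are my stand-ins; only the comparison itself comes from the post.

```cpp
struct Vec3 { float x, y, z; };

enum Light { RedLeft, WhiteRight };

// With the red light off to the left and the white light off to the
// right, comparing the shadow ray's end x against the hit position x
// tells us which light this particular shadow ray is heading towards.
Light lightBeingChecked(const Vec3& rayEnd, const Vec3& hitPosition)
{
    if (rayEnd.x < hitPosition.x)
        return RedLeft;    // ray heads left of the hit point: red light
    else
        return WhiteRight; // otherwise it must be the white light
}
```

Note this only works because the scene has exactly two lights on opposite sides – it is a scene-specific shortcut, not a general solution.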
Next came the coding nightmare that I hated writing, but it meant that in the best case it was only checking 2 renderables instead of 953, and in the worst case about 100 spheres.
I set up if and if/else statements covering different combinations of where the spheres were placed and where the light was placed, for a total of 577 lines of code.
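The general idea behind all those hand-written branches can be sketched generically: start the shadow scan from the index recorded in the first trace (m_renderableNumber) and only test a window of neighbouring renderables that could plausibly sit between the hit point and the light. The window size and the occluder test here are my assumptions, not the actual 577-line rules from the post.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical windowed shadow check: 'occludes[i]' stands in for
// "renderable i blocks this shadow ray". Instead of scanning all 953
// renderables, only indices within 'window' of the recorded hit index
// are tested.
bool inShadowWindowed(const std::vector<bool>& occludes, int hitIndex,
                      int window)
{
    int count = static_cast<int>(occludes.size());
    int first = std::max(0, hitIndex - window);
    int last  = std::min(count - 1, hitIndex + window);
    for (int i = first; i <= last; ++i) // a handful of tests, not hundreds
        if (occludes[i])                // any occluder: the point is in shadow
            return true;
    return false;
}
```

A data-driven window like this would collapse the hand-rolled if/else chains into a few lines, at the cost of tuning the window so no real occluder ever falls outside it (otherwise the Photoshop difference test fails).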
I will try to clean this code up a bit in the meantime, but I really need to understand C++ better so that I can use different classes and functions the way I do in C#. C++ terrifies me because I just don’t understand how it works.
The above video shows my optimised raytracer running against the control version that we were given at the start of the project. It should be noted that the times are off due to having to use Open Broadcaster Software (OBS) to capture the footage.
Anyway, this brought my processing time down from 12.5 seconds to 7.5 seconds, almost half the time, and the output is still an exact match in Photoshop.
Quite chuffed at the result, but not so much with the amount of code needed.