StOneDOes (Author) Posted March 14

9 minutes ago, Dreikblack said: "Maybe some new function can be added to load an asset not as an entity in the world but as a pure asset, in the background on a dedicated thread?"

This is all I'm asking for. I'm not asking Josh to do it all for me; I just want the function that loads the data into LE format into memory and I'll take it from there.
Josh Posted March 19

On 3/13/2025 at 7:46 PM, StOneDOes said: "This is all I'm asking for. I'm not asking Josh to do it all for me; I just want the function that loads the data into LE format into memory and I'll take it from there."

If that's what you really wanted, you could just read the file contents into memory on another thread, use that thread to create a static buffer, create a buffer stream using that, and load the content from the stream. For DDS files this would be pretty much instantaneous, but I don't think you would see much benefit from it because I don't think the speed of the DDS loader is your bottleneck.

If your bottleneck is the number of textures loaded, DDS is already in a game-ready format. All the loader does is read the header, which takes virtually no time, and then read the mipmap data straight into memory, which then gets sent to the GPU as-is. If you are using DDS already, texture loading is as fast as it possibly can be, with or without multiple threads.

On 3/13/2025 at 7:18 PM, StOneDOes said: "I downloaded a couple of high quality weapon models, each with 60 texture maps."

Without knowing their resolution, this seems really excessive for a single model. You would get a huge stall in performance the moment these got sent to the GPU because it would have to send a (probably?) large amount of data over the PCI bridge. Multithreading would do nothing to prevent this. If you can make these models available for me to test I will try them out and see what the bottleneck is. I have some ideas but I'm not sure.

On 3/13/2025 at 7:18 PM, StOneDOes said: "Multithreading in itself is not complicated;"

Multithreaded code is one of the most complicated concepts in computer science. Threaded solutions should always be your very last resort when nothing else will work.

On 3/13/2025 at 7:18 PM, StOneDOes said: "I can't say too much in that regard, but you're saying that we don't know it will help? Of course we know it will help. If you have a model that takes seconds, not milliseconds to load, and you start queuing these up, it's going to take a long time, vs loading each one (into memory)"

Without knowing what the bottleneck is, multithreading could even make the process slower. I would guess that the read speed for loading ten files on different threads is probably slower than loading them in sequence. Definitely with an HDD; with an SSD it would probably be about the same as sequential loading at best. If you have a specific file you would like me to test I will take a look at it.

My job is to make tools you love, with the features you want, and performance you can't live without.
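For readers following along, the first half of the approach described in this post (reading the file contents into memory on another thread) might look roughly like the sketch below. It uses only the C++ standard library. `ReadFileBytes`, `ReadFileBytesAsync` and `IsReady` are hypothetical helper names, not engine functions, and the step of wrapping the bytes in an engine buffer stream is deliberately left out because its exact API is not shown in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <future>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not an engine function): reads a whole file into memory.
// Returns an empty vector if the file cannot be opened.
std::vector<char> ReadFileBytes(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file) return {};
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}

// Starts the read on a worker thread and returns immediately.
std::future<std::vector<char>> ReadFileBytesAsync(const std::string& path)
{
    return std::async(std::launch::async, ReadFileBytes, path);
}

// Non-blocking check that a main loop could call once per frame.
bool IsReady(const std::future<std::vector<char>>& pending)
{
    return pending.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

The main loop would keep the returned future, poll `IsReady()` each frame, and call `get()` once it reports true. This only moves the disk read off the main thread; any processing and the upload to the GPU still happen afterwards.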
Josh Posted March 19

For reference, I just loaded a 175,000 poly model in 392 milliseconds, so it is difficult to even find content to test.
Josh Posted March 20

There are three steps to loading any asset.

1. Loading from the disk into memory. This could be done with C commands like fread and malloc, if you really wanted, but when you start using ReadFile() there are complications. Libzip cannot be called from different threads randomly, so loading from zip packages will no longer be possible. The FreeImage library cannot be called from different threads at the same time. So let's say we implement an ASyncLoadTexture() command that only supports DDS, and only loads from the OS file system. Does that actually solve anything? There are still other things to consider...

2. Processing. PNG files involve a lot of processing. The file data has to be decompressed, and then the image has to be downsampled many times to create the mipmaps. DDS files load a lot faster because no processing is required. Model files require more processing because they involve insertion of new entities into the system, but it's impossible to know whether that is an issue without testing a real file...

3. Sending. Once the data is in memory, textures and mesh data still have to be sent to the GPU. If a lot of data hits the GPU all at once it will cause the renderer to pause. So there is no point in trying to optimize other things if this still causes an issue. How much is too much? It depends on the files.

Step 1 can be totally eliminated from the main loop. Step 2 can be mitigated, in some circumstances, by storing ready-to-use data when possible. Nothing can be done about step 3 except maybe to feed data in a little bit at a time over more frames.

It's really impossible to tell what bottlenecks will form without testing real files in an expected use case scenario.
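The idea of feeding data in a little at a time over more frames, mentioned for step 3 above, can be sketched as a queue with a per-frame byte budget. This is only an illustration under stated assumptions: `UploadJob` and `UploadQueue` are hypothetical names, and the `upload` callback stands in for whatever call would actually copy data to the GPU. Nothing here is taken from the engine.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// One pending upload: its size in bytes and a callback that performs it.
struct UploadJob {
    std::size_t bytes;
    std::function<void()> upload; // hypothetical: would issue the actual GPU copy
};

class UploadQueue {
public:
    explicit UploadQueue(std::size_t bytesPerFrame) : budget_(bytesPerFrame)
    {
        assert(bytesPerFrame > 0);
    }

    void Push(UploadJob job) { jobs_.push_back(std::move(job)); }

    // Call once per frame. Runs jobs in order until the byte budget is used up.
    // At least one job always runs, so an oversized job cannot block the queue.
    // Returns the number of jobs executed this frame.
    std::size_t ProcessFrame()
    {
        std::size_t spent = 0;
        std::size_t count = 0;
        while (!jobs_.empty()) {
            const std::size_t cost = jobs_.front().bytes;
            if (count > 0 && spent + cost > budget_) break;
            jobs_.front().upload();
            spent += cost;
            jobs_.pop_front();
            ++count;
        }
        return count;
    }

    bool Empty() const { return jobs_.empty(); }

private:
    std::size_t budget_;
    std::deque<UploadJob> jobs_;
};
```

A sensible budget would have to be found by measuring real content, which is the point made in the post: the right number depends on the files.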
Josh Posted March 20

My guess is in something like Battlefield 4, they probably have one or more textures allocated for the weapon, and when weapons are switched, instead of creating a new texture, they send one mipmap at a time of the new texture data to fill up the old texture's pixels, in order to replace it. Or maybe they have a set of several textures where the memory is pre-allocated and they switch between them as needed. Same thing with mesh data.

When you get into this kind of thing it's no longer a nice neat simple LoadModel() command; it becomes a more complicated system that has to be designed for exactly what you are doing. Can you guarantee all your weapon textures will be the same size and pixel format? What is the maximum number of vertices all your weapons will use? These are the kinds of restrictions you have to be able to live with to do something like that. Code has to be written for the exact specifications of the content you plan to use.
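To make the restriction in this post concrete, here is a rough sketch of a pre-allocated texture slot that is overwritten one mipmap at a time and refuses data with a different size or pixel format. Everything in it is an assumption for illustration: `TextureDesc` and `ReusableTextureSlot` are hypothetical types, the texture is treated as uncompressed with a fixed number of bytes per pixel, and a `std::vector` stands in for GPU memory. It is not how any particular game or engine does it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of an uncompressed texture (not an engine type).
struct TextureDesc {
    int width;
    int height;
    int bytesPerPixel;
};

inline bool SameLayout(const TextureDesc& a, const TextureDesc& b)
{
    return a.width == b.width && a.height == b.height &&
           a.bytesPerPixel == b.bytesPerPixel;
}

// Byte size of every mip level, from full resolution down to 1x1.
std::vector<std::size_t> MipSizes(const TextureDesc& d)
{
    assert(d.width > 0 && d.height > 0 && d.bytesPerPixel > 0);
    std::vector<std::size_t> sizes;
    int w = d.width, h = d.height;
    for (;;) {
        sizes.push_back(static_cast<std::size_t>(w) * h * d.bytesPerPixel);
        if (w == 1 && h == 1) break;
        w = std::max(1, w / 2);
        h = std::max(1, h / 2);
    }
    return sizes;
}

// A slot whose storage is allocated once and then overwritten in place.
// The std::vector stands in for GPU memory in this sketch.
class ReusableTextureSlot {
public:
    explicit ReusableTextureSlot(const TextureDesc& desc)
        : desc_(desc), mips_(MipSizes(desc))
    {
        std::size_t total = 0;
        for (std::size_t s : mips_) total += s;
        storage_.resize(total);
        nextMip_ = mips_.size(); // nothing pending yet
    }

    // Starts replacing the contents. Fails if the new data has a different layout.
    bool BeginReplace(const TextureDesc& desc, const std::vector<char>& data)
    {
        if (!SameLayout(desc, desc_) || data.size() != storage_.size()) return false;
        pending_ = data;
        nextMip_ = 0;
        offset_ = 0;
        return true;
    }

    // Copies one mip level per call (for example, one per frame).
    // Returns true while more levels remain to be copied.
    bool CopyNextMip()
    {
        if (nextMip_ >= mips_.size()) return false;
        std::memcpy(storage_.data() + offset_, pending_.data() + offset_, mips_[nextMip_]);
        offset_ += mips_[nextMip_];
        ++nextMip_;
        return nextMip_ < mips_.size();
    }

    const std::vector<char>& Storage() const { return storage_; }

private:
    TextureDesc desc_;
    std::vector<std::size_t> mips_;
    std::vector<char> storage_;
    std::vector<char> pending_;
    std::size_t nextMip_ = 0;
    std::size_t offset_ = 0;
};
```

The rejected `BeginReplace()` call is the code form of the question asked above: the scheme only works if every texture that goes through the slot shares one size and pixel format.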
StOneDOes (Author) Posted March 21

Quote "If that's what you really wanted you could just read the file contents into memory on another thread, use that thread to create a static buffer, create a buffer stream using that, and load the content from the stream."

Not sure if you have misunderstood me - I'm suggesting that there be a Leadwerks function to do this in a thread safe manner. It shouldn't have to do more than just load model geometry and materials into memory, and then I can manage its lifetime and copy it into the world when I want to. I think that's a pretty good place to meet halfway.

Quote "If your bottleneck is the number of textures loaded, DDS is already in a game-ready format."

I am not speaking of any specific task or process being a bottleneck; I'm simply pointing out that having to load every single thing on the main thread is inherently flawed for the following reasons:

- Your algorithm for loading your models could be the most efficient algorithm ever written, but it's not going to matter if we have a large number of them. (Not necessarily saying that I do or will, but somebody who uses your engine will!)
- From the user's perspective, the game has essentially crashed or "frozen" during loading time. If they click the window, the OS will report that the application is not responding. Showing some graphical change on screen whilst loading would be nice also. You mentioned this could be done, but it's not a solution to this particular part of the problem. And this is my biggest concern on this topic.

Quote "Without knowing their resolution, this seems really excessive for a single model. You would get a huge stall in performance"

Sure, you're probably right - but I'm not really focused on the performance of gameplay right now, nor am I going to use those models. They were just a good test subject and an eye opener for me.

Quote "Multithreaded code is one of the most complicated concepts in computer science."

I'm not sure I agree on this. At least not when all you're doing is offloading a task that has a very simple synchronisation condition.

Quote "Without knowing what the bottleneck is multithreading could even make the process slower."

As mentioned previously, there is no actual bottleneck per se, rather just the sheer volume of items to be loaded.

Quote "I would guess that the read speed for loading ten files on different threads is probably slower than loading them in sequence. Definitely with an HDD, with an SSD it would probably be about the same as sequential loading at best."

Well, your engine requirements state that an SSD is needed. SSDs have the capability for multi-threaded I/O. While having additional threads does not guarantee an exact linear improvement (e.g. 2 threads != double speed), it does provide a significant improvement. It certainly doesn't make it slower.

I've written and attached a basic test to demonstrate how much of a significant improvement you can obtain by using multiple threads when reading from disk (SSD). It is not specifically related to the engine; it just reads file contents and stores them. You can change the number of threads in ThreadPool.cpp, and see that as the thread count increases, the time taken to complete shortens. Or you can turn off multithreading altogether in Main.cpp.

Quote "For reference, I just loaded a 175,000 poly model in 392 milliseconds, so it is difficult to even find content to test."

What if you have 10 of these (unique) models? Are we really going to wait 13 seconds for them?

Quote "Libzip cannot be called from different threads randomly, so loading from zip packages will no longer be possible."

So I guess it really will be impossible to make any improvement then, sadly? (Without keeping game assets protected?) Or even on just a single background thread?

EDIT: Link in case of attachment deleted: https://filebin.net/rwamm41me0h4c0ic/tests.zip

tests.zip
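The attached project is not reproduced in the thread, so the following is not its code. It is a minimal standard-library sketch of the same kind of test: read a list of files using a configurable number of threads and time the result. `ReadAllFiles` and `MillisecondsTaken` are hypothetical names chosen for this example.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

// Reads every file in `paths` using `threadCount` worker threads.
// Returns the file contents in the same order as `paths`.
// A file that cannot be opened yields an empty entry.
std::vector<std::vector<char>> ReadAllFiles(const std::vector<std::string>& paths,
                                            unsigned threadCount)
{
    assert(threadCount > 0);
    std::vector<std::vector<char>> results(paths.size());
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        // Each worker claims the next unread index, so no two workers
        // ever write to the same slot of `results`.
        for (std::size_t i = next.fetch_add(1); i < paths.size(); i = next.fetch_add(1)) {
            std::ifstream file(paths[i], std::ios::binary);
            results[i].assign(std::istreambuf_iterator<char>(file),
                              std::istreambuf_iterator<char>());
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    return results;
}

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double MillisecondsTaken(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Comparing `MillisecondsTaken([&] { ReadAllFiles(paths, 1); })` against the same call with 8 threads gives the kind of figures discussed here. Results will vary with hardware and with operating system file caching: files that were read recently may be served from memory rather than from the disk, which can make both runs look nearly instantaneous.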
Josh Posted March 21

If a map takes a long time to load, the easiest solution is to use a callback to periodically call PeekEvent() / WaitEvent(). That solves the timeout problem.
Josh Posted March 21

There is a zip file attached in the post above but it looks like the attachment was deleted?
Josh Posted March 21

As for asynchronous model loading, it may not work the way you are expecting. If a lot of data has to be sent to the GPU in one frame, there will be a noticeable delay. It depends on how much data you are trying to load in a single frame.
StOneDOes (Author) Posted March 21

I'll try again with the file and provide a link: https://filebin.net/rwamm41me0h4c0ic/tests.zip

tests.zip
Josh Posted March 21

What is this for? It's a VS project with some threading code?
StOneDOes (Author) Posted March 21

As I mentioned in my previous post, it is an example that demonstrates how loading files in a multithreaded manner is significantly faster than loading them sequentially, not slower.

Quote "I've written and attached a basic test to demonstrate how much of a significant improvement you can obtain by using multiple threads when reading from disk (SSD). Not specifically related to the engine; just reads file contents and stores them."
Josh Posted March 21

I don't know, I just tried it with ten 150 MB files, and it runs at 0-1 milliseconds either way. It doesn't really matter because loading a stream into memory asynchronously is the least of your worries.

With a fair amount of work we can eliminate the time it takes to load a file from the disk into memory, but there are other bottlenecks you will hit. I am extremely reluctant to spend time implementing something like this when it is just going to hit the next bottleneck, and then everyone will complain because they didn't expect that to happen.

When you call LoadTexture(), for instance, you aren't getting back a finished, ready-to-use texture. You're getting an object in system memory that contains a list of instructions and some data buffers that get put into a queue to be sent to the GPU. It's not really a loaded texture until that data makes it into VRAM.

Creating an object on the GPU usually forces a synchronization with the GPU. The rendering thread will stall and wait for the GPU to completely finish what it's doing, then it returns the handle to the newly created resource. If you are trying to dynamically upload content to the GPU, you usually need to do this by allocating it ahead of time and then feeding it in a little bit at a time each frame.

If you have a specific model you plan on using I am willing to run some tests, and at least I can tell you how much of a delay it incurs. You can test this out yourself pretty easily, just by calling CreateTexture() in the main loop, or loading a texture from a buffer stream, which is a stream already loaded into memory.
StOneDOes (Author) Posted March 21

Quote "I don't know, I just tried it with ten 150 MB files, and it runs at 0-1 milliseconds either way. It doesn't really matter because loading a stream into memory asynchronously is the least of your worries."

Let's not try and obfuscate the truth here. I've run this on 4.5 GB of data, and with 8 threads vs 1 thread I get 500 ms vs 3300 ms. And then when it's actually loading model geometry, where you are doing all sorts of other operations to load it into the LE format, it's going to be an even greater number. And now that we have established that it is indeed significantly faster, it's no longer relevant? I went out of my way to write this up for you because you claimed that it would be slower. I'm not sure if you are getting this, but I'm trying to help.

If you don't want to do anything about this that's your call, and it's fine. I can't make you. But so much of what I have tried to convey to you in this thread seems to have been ignored along with the entire premise, and it is quite frustrating. So I'm probably going to just leave it there, and leave the ball in your court.
Josh Posted March 21

Just now, StOneDOes said: "Let's not try and obfuscate the truth here. I've run this on 4.5gb and 8 threads vs 1 thread I have 500ms vs 3300ms."

Let's say for the sake of discussion that this is always true and carries over to practical usage.

1 minute ago, StOneDOes said: "And then when its actually loading model geometry where you are doing all sorts of other operations to load it into the LE format its going to be an even greater number. And now that we have established that it is indeed significantly faster, its no longer relevant? I went out of my way to write this up for you because you claimed that it would be slower. I'm not sure if you are getting this, but I'm trying to help."

Entity creation can't realistically be multithreaded because it would require so many mutex locks and would create a long-lasting, continuous source of very hard to track down bugs. That itself might not be a problem; it just depends on how many limbs the loaded models contain. You can create entities right now in the main thread with no observable penalty, so it just depends on the file you are loading.

I think texture and mesh upload speed to the GPU is going to be the main problem here. How big are the models you plan on loading? How many textures do they use, and what resolution and format are the textures?
Josh Posted March 21

I am not just making this up. I have a map I'm working with right now that takes 1.2 seconds to load, but it doesn't show up onscreen for about nine seconds. It's heavy on big textures but pretty light on geometry, so I think the bottleneck there is either texture allocation or upload speed. In that situation, I could entirely eliminate the load time, but it still would cause the GPU to stall for about 7 seconds.

It's not very hard to run some tests and find out what we can expect to see with such a system, but I need to know the specifics of the models you hope to load this way.
Josh Posted March 22

I was curious what would happen in that specific situation, so I disabled the sending of all texture data, but left the texture allocation in. The map now appears onscreen in less than three seconds. So I can pretty well confirm about 4-5 seconds is spent just uploading new texture data to the GPU, in this situation.