For want of distraction, I've been hanging around some of the OpenCL forums of late. Boy do they get some ninny questions.
From people trying to drop plain C (with full stdio and all) into the compiler and expecting it to work, to someone asking if you can 'write functions' in the host language ... (err, I mean seriously. Maybe the guy is a MATLAB guru, but surely it isn't that hard to find out about C or whatever host language he's using).
But those are just the most extreme in the last couple of days. What is more worrying is just how many people don't seem to understand computer architecture at all - much less a system containing a 'remote' processing device like a GPU.
Really basic things like cache, registers, stack, memory latency, memory banks & contention, I/O bus latency, call invocation overheads, and so on. Not to mention the less-obvious but not really more complex ideas that GPUs bring to the table such as memory coalescing, thread divergence/masking (i.e. SIMT branches), local memory, and so on.
Also, a rough idea of just how fucking fast modern hardware is.
I presume most of the queries are from students but they just seem to have absolutely NFI what is going on 'under the bonnet' on their shiny new bit of hardware. e.g. the reason your code is slower on a GPU should be bleedingly obvious before you go anywhere near a compiler.
Before trying to work with such a complex system, you really need to know some basics of computer system architecture - otherwise none of the API will make any sense, nor will any of the results.
The manuals are good: just read them.
Experiment on your own; it's easy, it's fast, it's cheap (it only takes your time, and a student's time isn't worth anything). You don't learn anywhere near as much if you just copy someone else, or forever ask inane questions.
5 comments:
One of the reasons students aren't aware of "under the hood" things is that most of them program in Python, JavaScript or C# these days (you know that people say Java is dead ;-) )
Learning to implement an app in JavaScript and host it on Google App Engine will bring in money from users, whereas learning whether traversing matrices column-wise instead of row-wise is good for the cache will not. So people tend to learn new things rather than under-the-hood things.
I am not trying to argue that what they do is right. I am just giving what I feel are the reasons for the current state of affairs.
In addition to complaining, if you could have given a few pointers for some good text/blogs on computer architecture, memory coalescing etc. it would have helped some students who follow your blog. My 2 cents :-)
You never know how old they are. I remember quite well how happy I was when someone (who was actually a physics prof) cared to answer my "stupid" question about matrix rotation in 3D space. At that time I didn't even know what sin or cos were. Every time I get a somewhat "stupid" question I try to remind myself of that.
mbein - it's not so much the really stupid questions - they're a bit entertaining and I can just leave those.
It's the ones where someone asks a question showing they don't really know what they're doing, yet refuses your answer since they know better than anyone else ... or keeps asking more questions, demonstrating they aren't really interested in the answers, just in the solution.
sankar: I do plenty of work dumping some pretty good code and lots of hints and tips for hacking into this blog. I'm sure I can rant once in a while.
But your first paragraph was my point entirely - learning has become so abstracted from the hardware, people have no idea what it's doing. They don't even know how fast it is.
This isn't normally a big deal, but when you're complaining that 'my code is slower than plain c': you should really do a bit of your own research to find out why that might be. Particularly with something as complicated as a separate processor with a novel architecture.
With OpenCL, the abstraction is only weak: although you are not explicitly exposed to the underlying architecture, it is up to you to make the problem fit efficiently. It doesn't do this for you, as some other parallel frameworks do. This is spelt out quite clearly in the documentation.
There are scads of really excellent documentation which I point to all the time (e.g. the AMD stuff, and the specification itself is well written and readable). There are also plenty of forum posts, blog posts, stuff on Stack Overflow, vendor examples and code as well: it shows that some people just aren't even trying.
Yes, you do point to a lot of nice links :-) No arguments there.