Personal Nvidia Supercomputer $800 1.2TFlops


OK, I finally got the time to build my personal supercomputer. I have ordered the parts and they are on their way. This computer costs about $800 and runs at about 1.2 TeraFlops. For those of you who don't know what a flop is, FLOPS stands for Floating point Operations Per Second. As a comparison, a Core 2 Quad runs at about 60 GigaFlops, so you would need 20 of them to match this one graphics card setup. How does a graphics card get so much performance, you might ask? A graphics card has many parallel processing cores, and most of the graphics processor is devoted to arithmetic logic rather than control logic. I am about to describe the setup for my personal supercomputer. I know there are better options out there; you could go for more power or for less cost, that is up to you. This setup has 4 graphics cards. I have not bought the graphics cards yet, so if you don't think this setup will work, please let me know.
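For anyone who wants to check the comparison, here is a minimal Python sketch of the arithmetic. The two figures are the rough peak numbers quoted in the post, not measurements, and the function name is made up for illustration.

```python
# Figures quoted in the post; both are rough peak numbers, not benchmarks.
GPU_RIG_GFLOPS = 1200.0  # "about 1.2 TeraFlops" for the four-card setup
CPU_GFLOPS = 60.0        # "about 60 GigaFlops" for one quad-core Core 2

def cpus_to_match(rig_gflops, cpu_gflops):
    """Return how many CPUs it takes to equal the rig's peak throughput."""
    return rig_gflops / cpu_gflops

print(cpus_to_match(GPU_RIG_GFLOPS, CPU_GFLOPS))  # 20.0
```

Peak numbers like these only say what the hardware could do with perfectly parallel work, so real programs will land somewhere below the ratio.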

My personal Supercomputer has:

Qty  Name/description                                               Cost
1    MSI K9A2 Platinum AM2+/AM2 AMD 790FX ATX AMD Motherboard       $103.68
1    AMD Phenom 9600 Agena 2.3GHz 4 x 512KB L2 Cache 2MB L3 Cache   $119.99
1    G.SKILL 4GB (2 x 2GB) 240-Pin DDR2 SDRAM DDR2 1066 (PC2 8500)  $49.99
4    MSI NX8800GTS-T2D320E-HD-OC GeForce 8800 GTS 320MB             $99.99 each
1    SAMSUNG SpinPoint T Series HD320KJ 320GB 7200 RPM 8MB          $49.99
1    Sunbeam PSU-HUSH680-US 680W ATX                                $39.99
1    LITE-ON Black 20X CD-DVD burner                                $25.99
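As a quick sanity check on the "about $800" figure, the parts list can be totalled with a short Python sketch. The quantities and prices are copied from the list; the shortened descriptions and the function name are just for illustration, and anything not priced in the list (the case, for example) is left out.

```python
from decimal import Decimal

# (quantity, short description, unit price in USD) taken from the parts list.
PARTS = [
    (1, "MSI K9A2 Platinum motherboard", Decimal("103.68")),
    (1, "AMD Phenom 9600", Decimal("119.99")),
    (1, "G.SKILL 4GB DDR2 1066", Decimal("49.99")),
    (4, "MSI GeForce 8800 GTS 320MB", Decimal("99.99")),
    (1, "Samsung SpinPoint 320GB", Decimal("49.99")),
    (1, "Sunbeam 680W PSU", Decimal("39.99")),
    (1, "LITE-ON DVD burner", Decimal("25.99")),
]

def total_cost(parts):
    """Sum quantity times unit price over every listed part."""
    return sum(qty * price for qty, _, price in parts)

print(total_cost(PARTS))  # 789.59
```

Decimal is used instead of float so the cents add up exactly.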

Again, this setup is not the best. I have been writing a program in CUDA for stock analytics; hopefully it will make money some day. I have also considered renting the machine out, who knows. All of these parts came from Newegg, and I encourage you to build your own. If you have any comments or changes I should know about, feel free to post them.


Case : post-8531-1228881579.jpg

Dvd-cd : post-8531-1228881607.jpg

Graphics card : post-8531-1228881612.jpg

Hard drive : post-8531-1228881616.jpg

Memory : post-8531-1228881622.jpg

Motherboard : post-8531-1228881628.jpg

Processor : post-8531-1228881638.jpg

Power Supply : post-8531-1228881644.jpg


I'm probably going to be building one of these rigs next summer, and I have a pretty good idea what for.

As for the setup, I'd drop the memory to PC6400. The PC8500 is useless if you're not overclocking, and since this is going to be used as a compute server, overclocking is not something I would want to do; I'd prefer to have some reliability in my results. Second, I'd double the amount of memory. Memory is so cheap now that it's silly not to. You'll have to run a 64-bit operating system anyway, or you're going to lose more than 1/4 of your system memory to address space with that many GFX cards installed.
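To put a rough number on the 64-bit point, here is a back-of-the-envelope Python sketch. It assumes a 32-bit OS limited to a 4GB address space and, as a simplification, that each card reserves address space equal to its full 320MB of video memory; real cards often map a smaller window, and other devices take some too, so treat the result as an illustration only. The function name is hypothetical.

```python
ADDRESS_SPACE_MB = 4096  # 2**32 bytes, the limit assumed for a 32-bit OS

def usable_ram_mb(installed_mb, cards, mapped_mb_per_card):
    """Upper bound on addressable RAM once the cards' windows are reserved."""
    reserved = cards * mapped_mb_per_card
    return min(installed_mb, ADDRESS_SPACE_MB - reserved)

usable = usable_ram_mb(installed_mb=4096, cards=4, mapped_mb_per_card=320)
lost_fraction = (4096 - usable) / 4096
print(usable, lost_fraction)  # 2816 0.3125
```

Under those assumptions about 1.25GB of the 4GB is out of reach, which is in line with the "more than 1/4" estimate.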

I'd probably try to stretch to 9800 GTXs instead of 8800 GTSs, even if it meant going for three rather than four initially.

Also, that PSU seems very cheap, and this system is going to need a lot of clean, stable power. I'd uprate that personally.

Spending $1000 probably gives you a more viable option.

Yeah, I'll probably invest about $1000 to $1200 in this computer. I'll take your advice on the PSU and the graphics cards. I will also need the 4 gigs of memory, because the problems I will be solving with this setup require lots of memory. Thanks.

Which is why I said double the amount of RAM: go for 8GB :D

It's not worth paying for the extra speed rating though; PC6400 at CAS 4 will serve you much better.

Right now I am planning to use it for a stock market analytics program that I made with some help from some of the professors at the university I go to. The program uses a complex algorithm to tell me which stocks I should buy and sell; hopefully it will make me some money one day. You can also use it for password cracking or dictionary attacks. It is crazy fast at dictionary attacks with this setup. All you need to learn is CUDA, which is very easy. You have the option to give it better graphics cards and get this computer up to 8 TFlops (I'm not sure that would work). You could also cluster many of these together on a network and maybe crack GSM, so that you could intercept, transmit, and receive free cell phone calls, but that is illegal. With this setup the possibilities are basically endless, maybe even an AI system for the EvilServer, or for your own personal robot.

Also remember that SLI has a huge bandwidth overhead.

On paper, SLI offers a major improvement, but in real life it yields very little.

If you check 3-way SLI setups, the performance gain is even smaller. The more cards you stack, the larger the overhead becomes.

SLI is just too inefficient.

If they could put 2 GPU dies on 1 GPU package, the way they do with dual and quad core CPUs, there would probably be a doubling of the speed, since the cores in dual and quad core CPUs can talk to each other a lot more efficiently than they can talk to another card.

That's from Tom's Hardware (their new charts UI sucks, as it no longer lets you compare cards in a graphical way and is very cluttered; for newer cards, check their VGA charts).

PS: most programs are unable to make use of the processing power of the cards.

By the time CUDA is used practically, your four 8800s will be obsolete.

Don't rush into spending money on future support.

It happened in the early 90s and is still going on: people get sucked into paying for future support, and the hardware winds up being replaced before any of that support is made use of, because it is simply too slow.

Stick with 1 card, then use the money for the other 3 to buy a new high-end card each year for 3 more years.


CUDA doesn't use SLI, so your post is pretty irrelevant.

With CUDA you control the cores, so if you want to use multiple GFX cards, you code that into your application. There's no way in CUDA to treat the 128 cores on each of two GFX cards as one single 256-core card; it's up to the developer.

I wouldn't go for 8800s personally, firstly because they are hard to get hold of in the UK now, and the 9600 GSO seems to perform slightly better even with fewer cores. The 9800s would probably be the best bang for the buck, possibly getting only 3 instead of 4 to begin with and then upgrading.

Not that much data goes through the SLI bridge. The problem is sharing processing tasks between 2 video cards; they need to be able to talk to each other.

Look for benchmarks of dual-socket motherboards. Having 2 physical CPUs on a motherboard is slower than having 2 CPU cores on one CPU.

If you start with a single-threaded benchmark and then up the benchmarking app to 2 threads on the same CPU, the speed shows pretty much a 100% performance boost. But if you add the 2 cores on the second physical CPU, the boost becomes more like 80-85%.

And SLI isn't meant to make both cards act as 1 card. It is designed to sync the cards up so that they don't both render the same thing, as that would defeat the purpose of SLI. You want each card rendering a different frame so there's less work for each card.

Believe it or not, it is hard to get a motherboard to do this efficiently.

Look at SLI benchmarks on different motherboards: you will see that some very high-end boards show speed boosts of around 40-50%, while some of the more affordable boards manage 15-30%.


Nothing's going over the SLI bridge. Using multiple cards is the responsibility of the application in CPU land, which invokes the CUDA application on each GFX card. That means all the data transfer is over PCIe; the application running the CUDA apps needs to divide the work and then, when the results are received back, match them up. I can't see a need for the cores on the GFX cards to communicate with each other that much, if at all.
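The divide-and-match-up pattern described here can be sketched in plain Python. This is only a simulation under stated assumptions: `fake_kernel` is a hypothetical stand-in for copying a chunk to one card over PCIe and running a kernel on it, and threads stand in for the per-card host code. None of it is the real CUDA API.

```python
from concurrent.futures import ThreadPoolExecutor

def split_evenly(data, parts):
    """Split data into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks

def fake_kernel(device_id, chunk):
    """Hypothetical stand-in for running one card's share of the work."""
    return [x * x for x in chunk]

def run_on_all_devices(data, num_devices):
    """Host side: divide the work, run each chunk, then stitch results together."""
    chunks = split_evenly(data, num_devices)
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        # map() returns results in submission order, so merging is a concatenation.
        results = list(pool.map(fake_kernel, range(num_devices), chunks))
    merged = []
    for part in results:
        merged.extend(part)
    return merged

print(run_on_all_devices(list(range(10)), 4))
```

Note that the chunks never talk to each other; all coordination happens on the host, which is the point being made about not needing the SLI bridge.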

Dual-core and dual-CPU systems are different paradigms, and both have advantages and disadvantages. You will never get a straight 100% boost from doubling the number of cores or processors, but it scales extremely well, which is why distributed systems, multi-processor systems and multi-core systems are becoming the norm.
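One common way to see why doubling the processors gives less than double the speed is Amdahl's law. It is brought in here purely as an illustration; it models the part of a program that cannot run in parallel and is not a model of the inter-socket communication cost discussed in this thread. The function name and the 90% figure are assumptions.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# A program that is 90% parallel, run on 1, 2, 4 and 8 processors.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

Only a perfectly parallel job (fraction 1.0) doubles in speed on two processors; anything less falls short, and the gap widens as more processors are added.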

As for SLI, when I last looked, the CPU told the GPUs what to render; the GPUs did not talk amongst themselves to work out who was doing what. The SLI bridge was for transferring data from one GPU to the other for output on the monitor, which is connected to only one of the cards.

I think you also have to remember that CUDA applications are designed to be heavily distributed and games (the primary source of benchmarks) aren't, so looking at how games scale isn't going to give you a good representation.

OK, I did make some changes to my order. I upgraded to 8 gigs of RAM (4 x 2 gig) and a 1500 watt power supply. I also bought some extra fans for my PSC (personal supercomputer), because on first start-up it was starting to run really hot inside. Other than that, I installed XP and ClusterKnoppix. I have future plans to cluster more of these together.

Right now I'm working on a basic algorithm that analyzes two images or frames from a video stream, calculates disparity from the images, and then refines the disparity maps down to a depth map. Once I have a stream of depth maps, I can compile it down to a 3D scene, which can then be refined further with more algorithms into a simpler and smaller file with no loss. This will make a good grad thesis/project, because I graduate in May and I plan to continue my education.
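For readers who have not seen disparity maps before, here is a minimal Python sketch of the general idea on a single scanline. It is not the algorithm described in this post; it swaps in plain sum-of-absolute-differences block matching and the standard pinhole relation Z = f * B / d. The function names, window size, focal length and baseline are all assumptions for illustration.

```python
def disparity_row(left, right, window=1, max_disp=4):
    """Per-pixel disparity for one scanline using sum of absolute differences."""
    n = len(left)
    out = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            cost = 0
            for k in range(-window, window + 1):
                xl = min(max(x + k, 0), n - 1)      # clamp at the image border
                xr = min(max(x + k - d, 0), n - 1)  # same point, shifted left by d
                cost += abs(left[xl] - right[xr])
            if cost < best_cost:
                best_d, best_cost = d, cost
        out.append(best_d)
    return out

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Z = f * B / d; zero disparity is treated as infinitely far away."""
    return [focal_px * baseline_m / d if d > 0 else float("inf") for d in disparity]

left = [3, 7, 1, 9, 4, 8, 2, 6, 5, 0, 7, 3]
right = left[2:] + [left[-1]] * 2  # the same scene shifted by two pixels
disp = disparity_row(left, right)
print(disp)
print(depth_from_disparity(disp, focal_px=800.0, baseline_m=0.5))
```

Every pixel is matched independently of the others, which is what makes this kind of problem a natural fit for the many cores on a graphics card.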

I plan to post images and other data as soon as I can get a camera. I've been trying to wait until I can get my web site up and running before I post anything new. If anyone has any suggestions, feel free to post them.

I don't know whether you have ordered your stuff yet, but for the price you were getting the GFX cards for, it's not worth it. You could spend your money on one of Nvidia's Quadro cards; these cards are fully integrated with CUDA and are meant for this kind of work. Of course, if you are going to game it is not really worth it, but it would be a much wiser investment on your part if you plan on using this computer for large-scale computational work.

I have found a better solution. The Nvidia GTX 280 gets 933 GFlops per card, so instead of getting four 8800 GTSs you can get two GTX 280s for just $200 more ($324 each; the whole setup will cost about $1,400) and still get about 1.8 TFlops. It is better to have two cards rather than four, mainly because you get better communication with the motherboard. And when you are not using this setup as a supercomputer, you can use it as a monster gaming rig.

And yes, I know that this isn't the perfect setup and that Quadro has better compatibility with CUDA, but the PNY NVIDIA Quadro FX 5600 costs around $2,500 (source: TigerDirect), and I really don't think its performance for the cost is that much better than GeForce graphics cards. If I'm wrong, feel free to correct me, because I really would like to know. Thanks.
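The two GeForce options can be compared on peak GFLOPS per dollar with a short Python sketch. It uses only the card prices and peak figures quoted in this thread and counts the graphics cards alone, not the rest of the system. The Quadro is left out because no GFLOPS figure was given for it. The dictionary layout and function name are made up for illustration.

```python
# Card counts, prices and GFLOPS totals as quoted in this thread (peak, not measured).
SETUPS = {
    "4 x 8800 GTS 320MB": {"cards": 4, "price_each": 99.99, "total_gflops": 1200.0},
    "2 x GTX 280": {"cards": 2, "price_each": 324.00, "total_gflops": 2 * 933.0},
}

def gflops_per_dollar(setup):
    """Peak GFLOPS bought per dollar spent on the graphics cards alone."""
    return setup["total_gflops"] / (setup["cards"] * setup["price_each"])

for name, setup in SETUPS.items():
    card_cost = setup["cards"] * setup["price_each"]
    print(f"{name}: ${card_cost:.2f} on cards, "
          f"{gflops_per_dollar(setup):.2f} GFLOPS per dollar")
```

On these numbers the two options come out close per dollar, so the argument for the GTX 280s rests mostly on having fewer cards to feed and more total throughput.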

  • 2 weeks later...

In the new episode of Hak5, you are shown how you can use your GPU to crack MD5 hashes. If you want to test how fast your computer can brute-force MD5s, go to www.geeks3d.com/?p=2333. With my current dual GTX 280 setup I can test about 992 million MD5s per second, with a peak of 1.003 billion per second. Each GTX 280 runs at about 933.12 GFlops.
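To give a feel for what a rate like 992 million MD5s per second means, here is a small CPU-only Python sketch. It brute-forces a toy three-letter lowercase string with the standard `hashlib` module and estimates how long a larger keyspace would take at the quoted rate. It is not the GPU tool from the link; the function names and the choice of alphabet and lengths are assumptions.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def brute_force_md5(target_hex, alphabet=ascii_lowercase, max_len=3):
    """Try every string up to max_len and return the one whose MD5 matches."""
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            candidate = "".join(combo)
            if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None

def seconds_to_exhaust(alphabet_size, length, hashes_per_second):
    """Time to try every string of exactly `length` characters."""
    return alphabet_size ** length / hashes_per_second

target = hashlib.md5(b"gpu").hexdigest()
print(brute_force_md5(target))
# All 8-letter lowercase strings at the rate quoted for the dual GTX 280 setup.
print(round(seconds_to_exhaust(26, 8, 992e6), 1), "seconds")
```

Each extra character multiplies the keyspace by the alphabet size, so the time grows very quickly even at GPU speeds.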

The upcoming GTX 295 runs at about 1788.48 GFlops, which is about the power of 30 Intel Core 2 quad-core processors. With a dual GTX 295 setup you can get up to 3.6 TFlops, and if you have an i7 system you can probably run a tri GTX 295 setup for about 5.4 TFlops. If you run the MD5 crack test, you should get about 1.8 billion MD5 cracks per second for the dual setup and about 2.7 billion for the tri setup. The only problem I see is that there isn't enough motherboard bandwidth to convey all of those results to system memory, and with a 3 Gbps SATA HD you will only be able to write maybe 12 million results per second to disk. I do hope that in June, when I get my next major paycheck, I can buy a new system (hopefully an i7 system) with dual GTX 295s. I won't be using the system for cracking passwords; I will most likely use it for other things.
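The GFlops figures quoted for these cards can be reproduced with the usual peak formula: stream processors times shader clock times floating point operations per clock. The sketch below assumes 240 stream processors at 1296 MHz for the GTX 280, 480 at 1242 MHz for the GTX 295, and 3 operations per clock (a multiply-add plus a multiply). Those specs are not from this thread, so treat them as assumptions that happen to match the posted numbers.

```python
def peak_gflops(stream_processors, shader_clock_mhz, flops_per_clock=3):
    """Theoretical peak: cores x clock (MHz) x flops per clock, scaled to GFLOPS."""
    return stream_processors * shader_clock_mhz * flops_per_clock / 1000.0

gtx_280 = peak_gflops(240, 1296)  # assumed specs, see note above
gtx_295 = peak_gflops(480, 1242)  # assumed specs for the dual-GPU card

print(gtx_280, gtx_295)
print("dual GTX 295:", 2 * gtx_295 / 1000.0, "TFLOPS")
print("tri GTX 295:", 3 * gtx_295 / 1000.0, "TFLOPS")
```

These are theoretical ceilings; whether a program gets near them depends on how well it keeps every core busy and how much data has to cross the PCIe bus.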

But there are already products out that let you use the power of your graphics card, like Adobe CS4 for example. For media encoding, have a look at the BadaBoom media converter. It uses NVIDIA CUDA to transcode H.264 HD video in real time (~30 fps) or faster (some sources say 60-70 fps). Also, if you're interested in robotics or 3D modeling, I suggest you have a look at stereo imaging and disparity maps with CUDA; this will allow you to create depth maps of a scene and gauge distances to objects in real time. There are many more projects out there that harness the power of your GPU. Google is your friend in this situation.

**Sorry if this message is a little incoherent; I had a rough day yesterday and didn't sleep that night.**
