NVIDIA GF100 Review

The NVIDIA GeForce G100 is essential for watching Blu-ray movies on a PC, accelerating the Microsoft Windows Vista experience, and powering Microsoft DirectX 9 and DirectX 10 games. Code named GF100, the GeForce GTX 400 GPU delivers unparalleled 3D realism, brilliant Microsoft DirectX 11 graphics, and immersive NVIDIA 3D Vision gaming; all with incredible performance. And yes, it’s ok to drool. Nvidia's Fermi, known by the code name GF100, is finally showing what it is capable of: Legit Reviews was able to watch a video demonstration of the technology running in triple SLI, named "Rocket Sled Demo" and narrated by Nvidia. Some software adjustments still seem to be needed, as you will see at the end of the video, but according to Nvidia the GF100 has gone into mass production and is scheduled for March 2010. This confirms a Fermi release in Q1 2010.

With the GF100 barely out, the GF104, a midrange GPU based on the Fermi architecture, is already looming on the horizon at Nvidia. Scheduled for mid-year, this GPU will be the main competitor to the Cypress LE and Juniper XT/PRO, aka the Radeon HD 5830/5770/5750. The GF104 would natively integrate 256 streaming processors with 48 texture filtering units. Its GDDR5 memory would be attached to a memory bus with a maximum width of 256 bits, or 192 bits for the low end. Nvidia is neither launching nor disclosing its future graphics cards today. The manufacturer did give and show a few results, but their relevance is rather limited. The aim, as you might expect, is to unveil the details of the architecture related to 3D rendering, to keep communication about the GF100 flowing until its launch, and also to reassure by passing on a few very specific messages.

Some have doubted that graphics remained the priority, because the manufacturer unveiled the "compute" side of its architecture first and promoted it heavily to attract a new market, because the competition saw an opportunity to attack, and because part of the press used the controversy to fuel debates without much reflection. Obviously, Nvidia is not neglecting its core market, whether consumer or professional, to bet on the explosion of a new market. That would have been suicidal, all the more so since the design of the GF100 started while CUDA was still in incubation. This does not mean that the final architecture cannot be more successful in a field other than the one that was the priority, but we must keep in mind that the priority has always been graphics.

Feature Highlights :

  • Bring your PC to life : Offload video processing tasks from your CPU to your GeForce graphics card and enjoy stutter-free, vibrant high-definition video playback with NVIDIA PureVideo HD engine. Easily manage and display crisp and vivid photos, fluidly zoom around 3D-enabled cities in Google Earth, and swiftly manipulate complex 2D Acrobat documents.
  • Realistic gaming for less : Play the latest PC games featuring Microsoft DirectX 9 and 10 without costing you a bundle. Enjoy popular gaming titles the way they were meant to be played.
  • NVIDIA unified architecture : Fully unified shader core dynamically allocates processing power to geometry, vertex, physics, or pixel shading operations, delivering up to 2x the gaming performance of prior generation GPUs.
  • CUDA Technology : NVIDIA CUDA technology unlocks the power of the GPU’s processing cores to accelerate the most demanding system tasks – such as video encoding – delivering better performance over traditional CPUs. Requires application support for CUDA technology.
  • NVIDIA PureVideo HD Technology : The combination of high-definition video decode acceleration and post-processing that delivers unprecedented picture clarity, smooth video, accurate color, and precise image scaling for movies and video. Feature requires supported video software. Features may vary by product.
  • GeForce Boost : Turbocharges the performance of NVIDIA discrete GPUs when combined with NVIDIA motherboard GPUs.
  • Full DirectX 10 Support : World’s first Microsoft DirectX 10 GPU with full Shader Model 4.0 support delivers unparalleled levels of graphics realism and film-quality effects.
  • GigaThread Technology : Massively multi-threaded architecture supports thousands of independent, simultaneous threads, providing extreme processing efficiency in advanced, next generation shader programs.

Next, Nvidia wanted to stress the difference between a new architecture, for the GF100, and a simple evolution, for the Radeon HD 5000. This is a stock talking point, of course, since the notion of new architecture versus evolution is purely subjective and only reflects what a particular manufacturer needs at a given time. Its previous architecture having stagnated for a long time, Nvidia was behind on some elements and therefore needed a bigger architectural jump than AMD, which had upgraded its architecture more regularly. Moreover, all this is cyclical, with each manufacturer in turn seen as having the more advanced architecture. It is also a dangerous argument, because a new architecture has often been synonymous with low efficiency, as we saw with the R600 and the NV30. Nvidia did prove the contrary with the G80, but overall it takes time to make the most of a new architecture, both in software and in its physical implementation.

Nvidia is also trying here to pre-empt negative associations with the NV30 (GeForce FX 5800) by rehabilitating it as a vehicle of significant change for 3D. While in some respects the NV30 did appeal to the professional world, we are not fooled: this is an attempt to avoid unflattering parallels with the current situation. The NV30 was a bad implementation of DirectX 9, which forced Nvidia to lie miserably in trying to conceal its faults for as long as possible. Rather than an important step forward, the NV30 and its derivatives were a brake on progress.

Finally, Nvidia wanted to return to its relationship with developers and the bad image it can sometimes have. Nothing new at this level. Nvidia's work with developers is, of course, important and beneficial. However, we are not naive here either: although Nvidia denies it emphatically, there are regrettable excesses in some cases, especially when the competition offers better products and there is a significant partnership with a game publisher, and the lack of MSAA support in Batman on Radeon cards is one of them in our view. On the other hand, we understand that it can be frustrating for Nvidia's development teams to be presented in a bad light when, in the vast majority of cases, they only bring beneficial and necessary assistance to developers so that PC gaming can continue to evolve.

If you have followed the small world of GPUs for years, you can probably work out, reading between the lines, what the continuous hype (a photo here, a little remark there) and the talking points Nvidia is trying to establish really mean: the GF100 is overdue, and it is not through pure performance in current games that it will start a revolution. The parallel with the launch of the GeForce FX is evident here, but that does not mean the result will be the same. The press is much more careful now than it was then, and it would be suicidal to try to bluff us. In addition, and again unlike back then, we have direct access to the architects of the chip to answer many technical questions, so we can form a relatively accurate picture of the architecture, its strengths and also its weaknesses.

The GF100 :

Remember that the G8x, G9x and GT2xx were based on structures called Texture Processing Clusters, or TPCs, each of which included 2 or 3 SMs (streaming multiprocessors) and a group of 8 texturing units (with limitations for the G80). The GT200, for example, has 10 TPCs of 3 SMs, each sharing 8 texturing units. The TPCs are fed by a single set of units specialized in task preparation, triangle setup, rasterization and so on. The GF100, in turn, is composed of 4 large blocks, the GPCs, for Graphics Processing Clusters. This time, all the specialized units are found at the GPC and SM level. The GF100 is the first GPU able to process more than one triangle per cycle! We will return to this point. Each GPC includes 4 SMs, for a total of 16 in the GPU. Another important change concerns the texturing units, which are no longer located at the level of the main structure but at the SM level. It is for these reasons that Nvidia had to abandon the name TPC in favor of GPC. In the GF100, each SM has 4 texturing units dedicated to it. Groups of SMs no longer need to share texturing units, which simplifies the design and makes it more efficient.

Opting for decoupled texturing units (AMD's R520 through RV670) or semi-decoupled ones (Nvidia's G80 and GT200) was, on paper, an elegant idea: it let the architecture evolve easily toward a higher ratio of computing power to texturing power, isolated a fixed function from the programmable core, and maximized performance by allowing all units to be used wherever the GPU needed them. In practice, the efficiency gain turned out to be smaller than hoped and did not compensate for the loss of efficiency due to design complexity. AMD went back on this choice with the Radeon HD 4000 and Nvidia is doing the same today, demonstrating in passing that an architectural evolution can be counterproductive.

Each SM is built around a dual scheduler that can, in each cycle, send an instruction to 2 of these 5 execution blocks:

  • 16-way SIMD0 unit (the "cores"): 16 FMA FP32, 16 ADD INT32, 16 MUL INT32
  • 16-way SIMD1 unit (the "cores"): 16 FMA FP32, 16 ADD INT32
  • Quad SFU unit: 4 FP32 special functions or 16 interpolations
  • 16-way 32-bit Load / Store unit
  • Texturing units

The latency and throughput of each instruction differ, but everything is decoupled, which means, for example, that a special function taking several cycles will not prevent the scheduler from sending an instruction to another execution block. At any given moment they can all be at work. Note that we leave aside here the FP64 FMA, which uses both the SIMD0 and SIMD1 units and is not used in graphics rendering. Note also that the concept of a "core" becomes even more complicated with the GPCs. Which structure should receive this name? The GPC? The SM? Each lane of a SIMD unit? Nvidia obviously prefers the latter option and talks about 512 "CUDA cores". For our part, we lean more toward the SM: a GF100 would then consist of 16 cores.
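
To make the two ways of counting "cores" concrete, here is a minimal Python sketch that rebuilds the totals from the per-block figures quoted above (4 GPCs, 4 SMs per GPC, two 16-way SIMD units and 4 texturing units per SM). The constant and function names are ours and purely illustrative; they do not come from Nvidia.

```python
# Unit counts quoted in the text; all names here are illustrative only.
GPCS = 4                 # Graphics Processing Clusters in the GF100
SMS_PER_GPC = 4          # streaming multiprocessors per GPC
SIMD_UNITS_PER_SM = 2    # SIMD0 and SIMD1
LANES_PER_SIMD = 16      # each SIMD unit is 16-way
TEX_UNITS_PER_SM = 4     # texturing units dedicated to each SM


def gf100_counts():
    """Return the chip-wide totals implied by the per-block figures."""
    sms = GPCS * SMS_PER_GPC
    simd_lanes = sms * SIMD_UNITS_PER_SM * LANES_PER_SIMD
    texturing_units = sms * TEX_UNITS_PER_SM
    return {
        "sms": sms,                          # "cores" if an SM is a core
        "simd_lanes": simd_lanes,            # "CUDA cores" in Nvidia's count
        "texturing_units": texturing_units,
    }


for name, value in gf100_counts().items():
    print(f"{name}: {value}")
```

Counting SIMD lanes gives Nvidia's 512 figure, while counting SMs gives the 16 "cores" we prefer.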

Frequencies :

Another important difference introduced with the GF100 concerns the frequency domains. Since the G80, there have been 3 main domains: the GPU frequency, the frequency of the SMs / schedulers, and the frequency of the computing units. The last two are linked, because the computing units operate roughly like the double-pumped units of the Pentium 4, and therefore at double the frequency of the scheduler, registers, etc. From the G80 to the GT200, the only frequencies that mattered were those of the GPU and the computing units. The texturing units, the ROPs and the setup / rasterizer were all in the GPU domain. With the GF100, this changes: everything in the GPCs operates at the frequency of the SMs / schedulers. In other words, as far as the processing units go, only the ROPs remain in the GPU frequency domain. Nvidia did not want to talk about the GF100's frequencies, but judging by the current architecture, this would bring a gain to many units. It also makes it easier to synchronize all these units, which should improve performance and / or simplify the design.
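
The change of frequency domains can be sketched as a small lookup table. The Python below is only an illustration of the description above: the domain assignments follow the text, the dictionary and function names are hypothetical, and the example clocks (725 MHz GPU, 1400 MHz computing units) are the assumed values used in the Specs section, not figures announced by Nvidia.

```python
# Which frequency domain each unit belongs to, per the description above.
# "gpu" = GPU clock, "scheduler" = SM / scheduler clock,
# "compute" = computing units (double the scheduler clock).
DOMAINS = {
    "G80-GT200": {
        "texturing": "gpu",
        "setup/rasterizer": "gpu",
        "rops": "gpu",
        "compute units": "compute",
    },
    "GF100": {
        "texturing": "scheduler",
        "setup/rasterizer": "scheduler",
        "rops": "gpu",
        "compute units": "compute",
    },
}


def unit_clock_mhz(arch, unit, gpu_mhz, compute_mhz):
    """Resolve the clock of a unit from the GPU and computing-unit clocks."""
    domain = DOMAINS[arch][unit]
    if domain == "gpu":
        return gpu_mhz
    if domain == "compute":
        return compute_mhz
    # The scheduler domain runs at half the computing-unit frequency.
    return compute_mhz / 2


# Assumed GF100 clocks, taken from the Specs section of this article.
for unit in DOMAINS["GF100"]:
    print(unit, unit_clock_mhz("GF100", unit, gpu_mhz=725, compute_mhz=1400))
```

With these assumed clocks, the texturing units and setup engines land at 700 MHz, while the ROPs stay at 725 MHz.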

Memory Architecture :

The GF100 first of all has many general-purpose registers, 32,768 32-bit registers per SM, then a general cache structure with a 16 KB L1 cache per SM (optionally 48 KB, in compute mode only), a 12 KB texture cache per SM (read only) and an L2 cache of 768 KB in total. The latter is connected to, or rather is part of, the six 64-bit memory controllers that form the 384-bit bus. The L1 and texture caches share the same read ports to the L2 cache. The bandwidth between the L1s and the L2 totals 384 bytes per cycle, or roughly 270 GB/s in each direction at the frequency that is likely to be adopted. While 32,768 registers may seem like a lot, the number is actually down relative to the number of computing units, which means that long latencies will be hidden less well by the number of threads. On the other hand, the cache structure will help reduce long latencies and will let register space spill over into the L1, so that more threads can be kept in the SMs when pressure on the general registers is high.
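
The figures in this paragraph can be checked with a few lines of arithmetic. The Python sketch below assumes a 700 MHz clock for the L1/L2 interface, which is our working hypothesis (the SM / scheduler frequency used in the Specs section) and not a value confirmed by Nvidia; the function names are illustrative.

```python
def bus_width_bits(controllers, bits_per_controller):
    """Total memory bus width from the number of memory controllers."""
    return controllers * bits_per_controller


def register_file_kb(registers, bytes_per_register=4):
    """Size of the per-SM register file in KB (32-bit registers = 4 bytes)."""
    return registers * bytes_per_register / 1024


def l1_l2_bandwidth_gb_s(bytes_per_cycle, clock_mhz):
    """Bandwidth in GB/s (10^9 bytes) in one direction."""
    return bytes_per_cycle * clock_mhz * 1e6 / 1e9


print("bus width:", bus_width_bits(6, 64), "bits")
print("registers per SM:", register_file_kb(32768), "KB")
# 700 MHz is an assumed clock, not an announced one.
print("L1 <-> L2:", l1_l2_bandwidth_gb_s(384, 700), "GB/s per direction")
```

At the assumed 700 MHz, 384 bytes per cycle works out to about 269 GB/s, consistent with the "roughly 270 GB/s" quoted above.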

It is difficult to know how effective the general cache architecture chosen by Nvidia will be. It is certainly more flexible and will make new things possible, but it replaces a series of very effective dedicated caches and will have to handle all of their work. If you followed the presentation of the Fermi compute architecture, you probably remember that the L1 cache and the shared memory (which allows threads of the same group to communicate with each other) are linked. They share 64 KB as follows: 16 KB for one and 48 KB for the other, one way round or the other, which leaves 2 options. In graphics mode it will always be 16 KB of L1 and 48 KB of shared memory, the L1 being really useful only when calculations are hard to predict, which is the opposite of 3D rendering. Direct3D 11 requires support for 32 KB of shared memory, shared among a maximum of 1024 threads (an SM of the GF100 holds a maximum of 1536 threads). Although the GF100 can go beyond that, in practice it is generally recommended to keep at least 2 groups of threads per SM. This means that, for better performance, developers will generally settle for 768 threads and 24 KB of shared memory, or 512 threads and 16 KB of shared memory. The latter option is preferable because it is a common denominator with the architecture of the Radeon HD 5000.
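
The thread-group figures above follow from dividing an SM's resources evenly between the groups it hosts. The following Python sketch reproduces that arithmetic under the numbers given in the text (1536 threads and 48 KB of shared memory per SM in graphics mode, Direct3D 11 limits of 1024 threads and 32 KB per group). The even split and the helper names are our simplifying assumptions.

```python
MAX_THREADS_PER_SM = 1536        # GF100 limit quoted in the text
SHARED_KB_PER_SM = 48            # graphics-mode split: 16 KB L1 + 48 KB shared
D3D11_MAX_GROUP_THREADS = 1024   # Direct3D 11 limit per thread group
D3D11_MAX_GROUP_SHARED_KB = 32   # Direct3D 11 shared memory per thread group


def per_group_budget(groups_per_sm):
    """Threads and shared memory (KB) available to each resident group,
    assuming the SM's resources are divided evenly between groups."""
    threads = MAX_THREADS_PER_SM // groups_per_sm
    shared_kb = SHARED_KB_PER_SM / groups_per_sm
    return threads, shared_kb


def within_d3d11_limits(threads, shared_kb):
    """True if a group of this size is allowed by Direct3D 11."""
    return (threads <= D3D11_MAX_GROUP_THREADS
            and shared_kb <= D3D11_MAX_GROUP_SHARED_KB)


for groups in (2, 3):
    threads, shared_kb = per_group_budget(groups)
    print(groups, "groups per SM:", threads, "threads,", shared_kb, "KB each,",
          "D3D11 OK" if within_d3d11_limits(threads, shared_kb) else "too big")
```

Two resident groups give 768 threads and 24 KB each, and three give 512 threads and 16 KB each, which are the two configurations mentioned above.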

Specs :

We have summarized the specifications of the GF100 for comparison with other GPUs. We also calculated its maximum throughput, assuming frequencies of 725 MHz for the GPU, 1400 MHz for the computing units (and therefore 700 MHz for the setup engines and texturing units) and 1200 MHz (2400 MHz for data transfer) for the GDDR5. As you can see, the GF100's strong point is its triangle rate, while its weakness lies in its filtered texture throughput.
As for computing power, it remains far lower than that of the Radeon HD 5870. On the other hand, the organization of the units is very different and allows for better efficiency on Nvidia's side. Still, since this organization has changed compared with the G80 and GT200, it remains to be verified what the efficiency level is for the GF100. Although it is clear that it is in any case higher than on the Radeon side, given the scalar behavior, we must make sure that the dual scheduler maintains the same level of efficiency as the previous GeForces.
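
As a rough illustration of how such peak rates are derived, the Python sketch below multiplies unit counts by the assumed frequencies given above. It rests on several assumptions of ours: an FMA counted as 2 floating-point operations, one triangle per cycle per setup engine, and one filtered texel per cycle per texturing unit. The Radeon HD 5870 figure uses its 1600 ALUs at 850 MHz for comparison. All function names are illustrative.

```python
def peak_gflops(lanes, clock_mhz, flops_per_lane_per_cycle=2):
    """Peak FP32 rate in GFLOPS, counting one FMA as 2 operations."""
    return lanes * flops_per_lane_per_cycle * clock_mhz / 1000


def peak_mtriangles_per_s(setup_engines, clock_mhz):
    """Peak triangle rate in millions per second,
    assuming one triangle per cycle per setup engine."""
    return setup_engines * clock_mhz


def peak_gtexels_per_s(texturing_units, clock_mhz):
    """Peak filtered texel rate in billions per second,
    assuming one filtered texel per cycle per texturing unit."""
    return texturing_units * clock_mhz / 1000


# GF100 with the assumed clocks: 1400 MHz compute, 700 MHz setup / texturing.
print("GF100 FP32:", peak_gflops(512, 1400), "GFLOPS")
print("GF100 triangles:", peak_mtriangles_per_s(4, 700), "Mtri/s")
print("GF100 texels:", peak_gtexels_per_s(64, 700), "Gtexels/s")

# Radeon HD 5870 for comparison: 1600 ALUs at 850 MHz.
print("HD 5870 FP32:", peak_gflops(1600, 850), "GFLOPS")
```

Under these assumptions the GF100 would peak at roughly 1.4 TFLOPS against about 2.7 TFLOPS for the Radeon HD 5870, which is the gap in raw computing power referred to above.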

Conclusion :

The GF100 is likely to be beaten in games with the simplest engines, but should fare better in others. Since the GeForce 8, Nvidia had always had an advantage in texture filtering power, which allowed it to show a huge performance boost right from the launch of its new GPUs. This time, the orientation is clearly toward engines that will make greater use of computing power and of very detailed geometry through tessellation. It is a complete reversal of the classic opposition between GeForce and Radeon. The GF100 will need PhysX and DirectX 11 games that exploit tessellation to be shown in its best light. Promises, then, which is paradoxically what Nvidia has regularly criticized AMD for in the past. The 4 setup engines, the new memory subsystem and all the changes made on the "compute" side are, on paper, the highlights of this architecture and, in absolute terms, significant advances in the evolution of the GPU. The more Nvidia unveils about the GF100, the more questions we have about it, and we are therefore eager to put all this to the test in practice, including of course the behavior in games of this innovative geometry architecture.

Let us finish with the problems related to arriving late on a new generation of Direct3D. History has shown that the first to arrive has always been right and the other always wrong. Will this affect the GF100 and the use made of the benefits of its architecture? Hard to say, but Nvidia is counting on its close relationship with many developers to prove history wrong, and it has a very important asset up its sleeve: Nexus. First presented as an environment for debugging CUDA code integrated with Visual Studio 2008, it is actually a more comprehensive development tool that has everything a developer could dream of for analyzing and profiling 3D rendering. Combined with a GF100 whose compute advances make debugging easier, Nexus could become essential for developers and thus erase Nvidia's lag on Direct3D 11.

Notes :

  • Playback of HDCP-protected content requires other HDCP-compatible components.
  • Certain GeForce GPUs ship with hardware support for NVIDIA PhysX technology. NVIDIA PhysX drivers are required to experience in-game GPU PhysX acceleration. Refer to www.nvidia.com/PhysX for more information.
  • NVIDIA SLI certified versions of GeForce PCI Express GPUs only. A GeForce GPU must be paired with an identical GPU, regardless of graphics card manufacturer. SLI requires sufficient system cooling and a compatible power supply. Visit www.slizone.com for more information and a listing of SLI-Certified components.
  • Requires external DisplayPort transmitter. 10-bit per component scanout requires future GeForce driver support.
  • Requires DVI-to-HDMI dongle and SPDIF audio cable from motherboard to graphics card.

