Cryo PC Workstations for
Software Engineering

Key Benefits

Design and Development

Modern, complex software development environments (IDEs), interpreters and compilers make maximum use of available memory and processing resources. The Intel Nehalem chipset and processors can support over 100GB of memory and up to 12 simultaneous threads of execution. For highly multi-threaded applications and development environments with large in-memory data sets and arrays, it is the platform of choice. With reduced code-compile-execute-test cycle times, the platform also gives more leverage to aggressive Agile, Extreme Programming and Rapid Development methodologies. It supports a potential step change in productivity and code quality. More processing power frees up valuable creative software engineering resource to focus on innovation and the uniquely value-creating aspects of the discipline.

The following processor- and memory-intensive benchmarks illustrate how the Nehalem platform's memory bandwidth and multiple execution threads improve performance (all based on a single CPU only, for comparison):

  • Fritzmark - score over 35
  • WPrime - under 4s
  • SiSoft Sandra - Dhrystone ALU iSSE4.2, 161 GIPS, 48 MIPS
  • SiSoft Sandra - Whetstone FPU iSSE3, 144 GFLOPS, 42 MFLOPS
  • SiSoft Sandra - Memory Bandwidth (iSSE2), 37GB/s
  • Linpack - 95 GFLOPS
  • Cinebench - OpenGL 2314 CB-GFX, CPU Rendering 29776 CB-CPU
  • SPECint_rate_base2006 - 255
  • SPECfp_rate_base2006 - 204

At a minimum this represents a 30-40% improvement over all other platforms, and in some cases an improvement of over 100% (see SPEC results table for full results).

Possibly Nehalem's greatest advantage comes from greater miniaturisation (down to 32nm fabrication in the next generation). In single-threaded or serially characterised workloads the platform helps by offering clock speeds of as much as 5GHz or more. Our experience is that the Nehalem architecture offers improvements even at clock speeds matching those of legacy architectures (i.e. clock for clock it is able to perform more work). However, it also has significantly greater performance headroom, and with our unique designs and Cryo Boost process we are able to more than double the CPU work rate. Typically, customers with this sort of application or program will see a several-fold improvement in execution times over previous platforms, delivering real, immediate cost benefits.

Testing and Debugging

The massive amount of processing, storage and bandwidth available means that it is finally possible to simulate production environments on a single host with many virtual machines and, at the same time, execute a meaningful workload. With many virtual machines installed on the single host, each one can have its own dedicated core (or virtual core), simulating the equivalent of up to 24 powerful single-core servers in a server farm. Our relatively modest workstations are capable of meeting the workload challenge of a small data centre server farm.

This means you can replicate real-world task metrics and synthetic loads with a single powerful workstation. You can emulate the concurrency of several thousand users with load-generating or injector software. Even the SAS RAID storage arrays can get close to Enterprise-level data access and transfer rates. Typically an eight-drive solid state storage array can achieve transfer rates close to 2GB/s with random access times under 0.1ms while storing 2TB of data. In an Enterprise environment, a fully managed fibre channel SAN (Storage Area Network) costing several hundred times more would be humbled by this performance, and would also require specialist, expensive management tools and resources.
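The load-injection idea described above can be sketched in a few lines. This is only an illustration written in Python: `handle_request` is a hypothetical stand-in for a call to the system under test (here it simply sleeps for about a millisecond), and the user and worker counts are arbitrary assumptions, not measured figures.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(user_id):
    """Hypothetical stand-in for one request to the system under test."""
    time.sleep(0.001)  # pretend the service takes about 1 ms to respond
    return user_id


def timed_request(user_id):
    """Issue one request and return its latency in seconds."""
    start = time.perf_counter()
    handle_request(user_id)
    return time.perf_counter() - start


def run_load(simulated_users, workers):
    """Fire one request per simulated user from a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_request, range(simulated_users)))


if __name__ == "__main__":
    latencies = run_load(simulated_users=200, workers=20)
    mean_ms = 1000 * sum(latencies) / len(latencies)
    print(f"requests completed: {len(latencies)}")
    print(f"mean latency: {mean_ms:.2f} ms")
```

In a real test, `handle_request` would be replaced by a call into the application running on the virtual machines, and the worker count would be raised until the target concurrency is reached.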

Massively Concurrent / Parallel Processing

It is now possible for all C language based software engineers to harness not only the power of the CPU but also the GPU. While a CPU runs at several GHz and offers six or fewer cores, the GPU runs at a more modest clock speed of 600MHz or so but offers several hundred cores. Reminiscent of the RISC vs CISC processing paradigms, the GPU is suited to very specific RISC-type workloads where a core stream of simple instructions can execute many complex mathematical or logical operations concurrently. This is what characterises the requirements of graphics and video processing, and hence the GPU has been honed to be formidable in this area. The chipset and CPU are still required to 'host' traditional x86-based instructions, provide the work to the GPUs and perform complex instruction sets outside the scope of the GPU.

Hence a balance has to be struck between the performance of the CPU and GPU to ensure the best possible performance is realised. For the right kinds of workloads, though, the GPU transforms the execution time required to complete the task. Take a video encode (H.264), for example: the introduction of High Definition video (1080p) has hugely increased the workload required to encode each frame, increasing the time taken to encode footage. nVidia provide a platform SDK (Software Development Kit), CUDA, for their GPUs that allows you to package a C-based program up for execution on the GPU across many of its cores simultaneously. Adobe added CUDA support to video encoding in CS4 (Creative Suite 4) and reduced a six-hour encode task to 40 minutes (Pinnacle and others are now also adding CUDA support).
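To make the programming model concrete, the sketch below shows how a per-pixel task is split into a small "kernel" that handles exactly one element. It is written in Python and runs serially on the CPU purely for illustration; it is not GPU SDK code, and the names `brighten_kernel`, `launch` and `brighten` are hypothetical. On a GPU, each index would be handed to its own hardware thread so that hundreds of elements are processed at once.

```python
def brighten_kernel(i, pixels, out, gain):
    """Per-element work: on a GPU, one thread would handle one index i."""
    out[i] = min(255, int(pixels[i] * gain))


def launch(kernel, n, *args):
    """Serial stand-in for a GPU launch: run the kernel once per index."""
    for i in range(n):
        kernel(i, *args)


def brighten(pixels, gain):
    """Host-side code: allocate the output, then hand the work to the kernel."""
    out = [0] * len(pixels)
    launch(brighten_kernel, len(pixels), pixels, out, gain)
    return out


if __name__ == "__main__":
    print(brighten([10, 100, 200], 1.5))  # prints [15, 150, 255]
```

The key design point is that the kernel touches only its own index, so the iterations are independent and can be run in any order or all at the same time. The host code (the CPU's role) prepares the data and collects the result.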



  • Improve Productivity
  • Reduce Compile Times
  • Increase Execution Speeds
  • Reduce Code-Test Cycle Times
  • Utilise Virtual Environments for Accurate Testing
  • Component Level Upgradability

 

Recommendations:
High Performance

Cryo Tetrad
  • Intel Core i7 920 D0 2.66GHz processor
  • Cryo Boost to 3.9GHz
  • Four cores, plus HyperThreading
  • X58 Chipset with Quad SLI/Crossfire support
  • Triple Channel DDR3 memory
  • Up to 24GB of PC3-14400 1866MHz memory
  • Up to 16TB RAID0/1/5/10 SATA storage
  • ATI or nVidia GX200 GPUs, up to 240 cores per GPU

 

Ultra Performance

Cryo Extreme or
Cryo Velox (Water Cooled)
  • Intel Core i7 975 Extreme 3.33GHz processor
  • Cryo Boost to 4.4GHz+
  • Four cores, plus HyperThreading
  • X58 Chipset with Quad SLI/Crossfire support
  • Triple Channel DDR3 memory
  • Up to 24GB of PC3-16000 2GHz memory
  • Up to 16TB RAID0/1/5/10 SATA storage
  • ATI or nVidia GX200 GPUs, up to 240 cores per GPU

 

Multiprocessor Workstation

Cryo Octane
  • Dual Intel Xeon W5590 3.46GHz processors
  • Eight cores, plus HyperThreading
  • Nehalem EP and EX 5540 Chipset & Processors
  • Triple Channel DDR3 memory
  • Up to 128GB of PC3-10666 1333MHz memory
  • Up to 16TB RAID0/1/5/10 SAS storage
  • ATI or nVidia GX200 GPUs, up to 240 cores per GPU