Archive for December, 2012

Artificial neural networks (ANNs) are used today to learn solutions to parallel processing problems that have proved impossible to solve using conventional algorithms. From cloud-based voice-driven apps like Apple’s Siri to real-time knowledge-mining apps like IBM’s Watson to gaming apps like Electronic Arts’ SimCity, ANNs are powering voice-recognition, pattern-classification and function-optimization algorithms perfect for acceleration with Intel hyper-threading technology.

“Artificial neural networks and hyper-threading technologies are ideally suited for each other,” says Chuck Desylva, a support engineer for Intel performance primitives. “By functionally decomposing ANN workloads–dividing them among logical processors and employing additional optimization methods–you can achieve significant performance gains.”

Desylva recently tested several widely available open-source ANN algorithms on a Pentium 4 Extreme Edition to demonstrate how using its dual threads can achieve significant speed-ups. For the forthcoming massively parallel Xeon Phi, Desylva predicts even more significant acceleration of ANN algorithms, since Xeon Phi supports four threads for each of its 50+ cores.

“I think that Xeon Phi will be a perfect fit for ANNs,” Desylva believes.

Biological neurons (upper left) are emulated by artificial neural network (ANN) mapping concepts that sum inputs (upper right) then supply an output (bottom) filtered by an activation function. Source: Intel


Artificial neural networks (ANNs) emulate how the brain’s billions of neurons and trillions of synaptic connections divide and conquer tough combinatorial problems involving detection of features, perception of objects and cognitive functions of association, generalization and attention. By implementing multiple layers of virtual parallel processors, each simulating a layer of interconnected neurons like those found in the cerebral cortex, ANNs are capable of learning the solution to programming problems impossible to execute in real time using conventional algorithms.
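The neuron model in the figure caption (sum the inputs, then filter the result through an activation function) can be sketched in a few lines of C++. This is only an illustration, not the code Desylva tested: the names `sigmoid` and `neuron_output`, the bias term and the choice of a logistic activation are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic activation (an assumed choice): squashes any real value into (0, 1).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One artificial neuron (hypothetical helper): weighted sum of the inputs
// plus a bias, passed through the activation function.
double neuron_output(const std::vector<double>& inputs,
                     const std::vector<double>& weights,
                     double bias) {
    assert(inputs.size() == weights.size());
    double sum = bias;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        sum += inputs[i] * weights[i];
    }
    return sigmoid(sum);
}
```

A layer is simply many such neurons reading the same inputs, which is why the work divides so naturally among threads.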

For instance, ANNs enable voice-recognition systems to instantaneously match your voice against millions of stored samples, in contrast with standard algorithms that would have to serially compare your voice to each sample and then calculate the best match, a task too computationally intensive for real-time execution.

To evaluate how to accelerate ANNs, Desylva adapted several popular algorithms for hyper-threading, such as the back-propagation-of-error (BPE) learning algorithm, which sends corrective feedback to previous layers in a multi-layer neural network until the desired real-time response is achieved.
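To make the idea of corrective feedback concrete, here is a minimal sketch of one back-propagation step for a tiny network with one hidden layer and a single output. It is not the source Desylva optimized: the `TinyNet` struct, the `forward` and `train_step` functions, the sigmoid activation and the squared-error measure are all assumptions chosen for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Hypothetical two-layer network: several hidden neurons, one output neuron.
struct TinyNet {
    std::vector<std::vector<double>> w_hidden;  // [hidden neuron][input]
    std::vector<double> w_out;                  // [hidden neuron]
};

// Forward pass: fills the hidden activations and returns the network output.
double forward(const TinyNet& net, const std::vector<double>& x,
               std::vector<double>& hidden) {
    assert(net.w_hidden.size() == net.w_out.size());
    hidden.assign(net.w_hidden.size(), 0.0);
    double out_sum = 0.0;
    for (std::size_t j = 0; j < net.w_hidden.size(); ++j) {
        assert(net.w_hidden[j].size() == x.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) sum += net.w_hidden[j][i] * x[i];
        hidden[j] = sigmoid(sum);
        out_sum += net.w_out[j] * hidden[j];
    }
    return sigmoid(out_sum);
}

// One training step: the output error is fed back to the hidden layer and
// both weight sets are nudged downhill. Returns the squared error measured
// before the update.
double train_step(TinyNet& net, const std::vector<double>& x,
                  double target, double rate) {
    std::vector<double> hidden;
    const double y = forward(net, x, hidden);
    const double err = y - target;
    const double delta_out = err * y * (1.0 - y);  // error signal at the output
    for (std::size_t j = 0; j < hidden.size(); ++j) {
        // Feedback to hidden neuron j, computed with the pre-update output weight.
        const double delta_h = delta_out * net.w_out[j] * hidden[j] * (1.0 - hidden[j]);
        net.w_out[j] -= rate * delta_out * hidden[j];
        for (std::size_t i = 0; i < x.size(); ++i) {
            net.w_hidden[j][i] -= rate * delta_h * x[i];
        }
    }
    return err * err;
}
```

Calling `train_step` repeatedly on the same sample should shrink the returned error as the weights settle toward the target.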

These neural-learning algorithms were tested on a virtual network of 10 million neurons. Performance boosts of over 10 percent were achieved immediately by using the Streaming SIMD Extensions 2 (SSE2) and thread-safe versions of the Microsoft Standard Template Library (STL). OpenMP pragmas were then used to direct the compiler to use threading, resulting in a 20 percent overall performance increase compared to the original source. VTune was then run to show a 3-to-4-times speedup in the commands OpenMP uses to synchronize threads.
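As a rough illustration of what an OpenMP pragma looks like in practice, the sketch below threads a loop that adds up the squared error across a network's outputs. The function name `total_squared_error` and the loop itself are assumptions for the example, not taken from the tested source. Only the `#pragma omp parallel for reduction` directive is standard OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum of squared differences between outputs and targets.
double total_squared_error(const std::vector<double>& outputs,
                           const std::vector<double>& targets) {
    assert(outputs.size() == targets.size());
    const long n = static_cast<long>(outputs.size());
    double sum = 0.0;
    // The pragma asks the compiler to split the loop across threads. Each
    // thread keeps a private partial sum, and OpenMP adds them together at
    // the end. If OpenMP is not enabled at compile time, the pragma is
    // ignored and the loop simply runs serially.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        const double diff = outputs[i] - targets[i];
        sum += diff * diff;
    }
    return sum;
}
```

With GCC or Clang, OpenMP is typically switched on with the `-fopenmp` flag. The source code is the same whether threading is enabled or not.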

Next, the same OpenMP-based optimization technique was applied to the update function, which calculates the output of each neural-network layer before passing it to the next, resulting in double the average performance of several different ANN applets.
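A layer update of this kind might look like the following sketch, in which each neuron's output is computed independently so the outer loop can be shared among threads. The `update_layer` name, the row-major weight layout and the `tanh` activation are assumptions for illustration. The article does not show the actual update function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layer update. weights is row-major: weights[j * n_in + i]
// connects input i to neuron j. Returns one output per neuron.
std::vector<double> update_layer(const std::vector<double>& inputs,
                                 const std::vector<double>& weights,
                                 std::size_t n_out) {
    const std::size_t n_in = inputs.size();
    assert(weights.size() == n_in * n_out);
    std::vector<double> outputs(n_out);
    const long rows = static_cast<long>(n_out);
    // Every iteration writes only its own outputs[j], so the neurons can be
    // processed in parallel without locking. Without OpenMP enabled the
    // pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long j = 0; j < rows; ++j) {
        const std::size_t row = static_cast<std::size_t>(j);
        double sum = 0.0;
        for (std::size_t i = 0; i < n_in; ++i) {
            sum += weights[row * n_in + i] * inputs[i];
        }
        outputs[row] = std::tanh(sum);  // assumed activation function
    }
    return outputs;
}
```

Feeding the returned vector into the next call to `update_layer` chains the layers together, which matches the layer-by-layer hand-off described above.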

Finally, dissecting the BPE learning algorithm itself resulted in as much as a 3.6-times speedup over the original unmodified source.

Posted on October 16, 2012 by R. Colin Johnson, Geeknet Contributing Editor

