<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=unicode">
<meta content="ja" http-equiv="Content-Language">
<title>Auto-Parallelizer</title>
<link href="../design.css" rel="stylesheet" type="text/css">
<style type="text/css">
.auto-style1 {
	border: 2px solid #808080;
}
.auto-style2 {
	background-color: #FFFF00;
}
</style>
</head>
<body>
<h1>自動パラレライザ (Auto-Parallelizer) (VC++2012)
</h1>
<p>VC++2012の心機能の一つである「自動パラレライザ(Auto-Parallelizer)」の動作を確認しました。<br>
結論を先に書くと、高速化されているように見えません。そんなはずはないので追跡評価が必要です。</p>
<p>以下、うまく高速化できた例を示しますが、単純な int 
型による四則演算では高速化の確認をとれませんでした。並列化して遅くなった、というわけでもありませんでしたが。<br>四苦八苦して、結果 float 型で 
sin, cos の演算をやらせたら高速化を確認できた、という感じです。</p>
<h3>評価環境：</h3>
<ul>
	<li>コンパイラ： Visual C++ 2012</li>
	<li>OS： Windows8 64bit 日本語版</li>
	<li>PC : DELL Inspiron14z</li>
</ul>
<table style="WIDTH: 688px; HEIGHT: 510px" border="1" cellspacing="0" cellpadding="0">
  <tr>
    <td style="width: 800px">
      <pre>#include "stdafx.h"
#include &lt;sstream&gt;
#include &lt;cmath&gt;


using namespace std;


void disp_laptime(stringstream &amp;output, DWORD &amp;dwStartCount, DWORD &amp;dwLapCount)
{
    DWORD   dwNowCount;

    dwNowCount = GetTickCount() - dwStartCount;
    output &lt;&lt; dwNowCount &lt;&lt; "(" &lt;&lt; dwNowCount - dwLapCount &lt;&lt; ")" &lt;&lt; endl;
    dwLapCount = dwNowCount;
}


int _tmain(int argc, _TCHAR* argv[])
{
    const size_t    n=10000000;
    static float    a[n], b[n], c[n];

    Sleep(2000);

    DWORD           dwStartCount = GetTickCount();
    DWORD           dwLapTime = 0;
    stringstream    output ;


    // initialize
    for ( int i=0; i&lt; n; ++i){
        a[i] = b[i] = c[i] = 1.0;
    }

    
    disp_laptime( output, dwStartCount, dwLapTime );
#pragma loop(no_vector)
    for (int i = 0; i &lt; n; ++i){
       a[i] += sin(b[i]) * cos(c[i]);
    }

    output &lt;&lt; "no_vector : ";
    disp_laptime( output, dwStartCount, dwLapTime );

    for (int i = 0; i &lt; n; ++i){
       a[i] += sin(b[i]) * cos(c[i]);
    }

    disp_laptime( output, dwStartCount, dwLapTime );

#pragma loop(no_vector)
    for (int i = 0; i &lt; n; ++i){
       a[i] += sin(b[i]) * cos(c[i]);
    }

    output &lt;&lt; "no_vector : ";
    disp_laptime( output, dwStartCount, dwLapTime );

    for (int i = 0; i &lt; n; ++i){
       a[i] += sin(b[i]) * cos(c[i]);
    }

    disp_laptime( output, dwStartCount, dwLapTime );

#pragma loop(hint_parallel(4))
#pragma loop(no_vector)
    for (int i = 0; i &lt; n; ++i){
       a[i] += sin(b[i]) * cos(c[i]);
    }
    
    output &lt;&lt; "loop(hint_parallen(0)), no_vector : ";
    disp_laptime( output, dwStartCount, dwLapTime );
    
#pragma loop(hint_parallel(0))
    for (int i = 0; i &lt; n; ++i){
       a[i] += sin(b[i]) * cos(c[i]);
    }

    output &lt;&lt; "loop(hint_parallel(4)) : ";
    disp_laptime( output, dwStartCount, dwLapTime );

    cout &lt;&lt; output.str() &lt;&lt; endl;

<span class="comment">    /*========*/
    /* 後処理 */
    /*========*/</span>
    {
        string str;

        cout &lt;&lt; "HIT [Enter] KEY !! " ;
        getline(cin, str);
    }

    return EXIT_SUCCESS;
}</pre>
	  </td></tr></table>

<font face="Meiryo UI">
<p>実行結果は以下の通り。<br><img alt="" src="Auto-Parallelizer/img28.gif"><br><br>

２<font face="Meiryo UI">~３分の一、ぐらいの実行時間になっているので、
Auto-Parallelizer は効いているように見えます。i5のCPUで4つの並行動作を指定しているのだから、妥当な値でしょうか。</font><br><br>コンパイルオプションは次の通り。<br>
<table style="width: 80%">
	<tr>
		<td class="auto-style1">cl /Yu"stdafx.h" /GS <span class="auto-style2">/Qpar</span> /GL /analyze- /W3 /Gy /Zc:wchar_t /Zi 
		/Gm- /O2 /sdl /Fd"Release\vc110.pdb" /fp:precise /D "WIN32" /D "NDEBUG" 
		/D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- 
		/Zc:forScope /arch:SSE2 /Gd /Oy- /Oi /MD /Fa"Release\" /EHsc /nologo 
		/Fo"Release\" /Fp"Release\test_AutoVectorizer.pch"&nbsp; /Qvec-report:2 
		<span class="auto-style2">/Qpar-report:2</span></td>
	</tr>
</table>
<br><br>コンパイル時の統合環境出力は以下の通り。<br>
<table style="width: 80%">
	<tr>
		<td class="auto-style1">
		test_autovectorizer.cpp(18) : info C5002: ループはベクター化されません。理由: '1400'<br>
		test_autovectorizer.cpp(23) : info C5001: ループがベクター化されています<br>
		test_autovectorizer.cpp(29) : info C5002: ループはベクター化されません。理由: '1400'<br>
		test_autovectorizer.cpp(34) : info C5001: ループがベクター化されています<br>
		test_autovectorizer.cpp(41) : info C5002: ループはベクター化されません。理由: '1400'<br>
		test_autovectorizer.cpp(47) : info C5001: ループがベクター化されています<br>
		test_autovectorizer.cpp(18) : info C5012: ループは並行化されません。理由: '1008'<br>
		test_autovectorizer.cpp(23) : info C5012: ループは並行化されません。理由: '1008'<br>
		test_autovectorizer.cpp(29) : info C5012: ループは並行化されません。理由: '1008'<br>
		test_autovectorizer.cpp(34) : info C5012: ループは並行化されません。理由: '1008'<br>
		test_autovectorizer.cpp(41) : info C5011: ループが並行化されています<br>
		test_autovectorizer.cpp(47) : info C5011: ループが並行化されています</td>
	</tr>
</table>
</p>
<p>&nbsp;</p>
<p>参考情報：<br>
<a href="http://blogs.microsoft.co.il/blogs/sasha/archive/2012/06/17/visual-studio-2012-c-auto-parallelizer.aspx">
Visual Studio 2012 C++ Auto-Parallelizer</a><br></p>
<hr>
<p>記載： 2012年11月15日 木下英俊<br>2013年7月15日 追記、木下英俊<br></p>
</font>
</body>
</html>