Tried compiling....SGT.o error

mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 596 - Posted: 7 Oct 2008, 22:23:26 UTC

Hi All,

Tried compiling from the source using the -march=amdfam10 flag. Got the following output.....

./compile enigma.c
./compile charmap.c
./compile cipher.c
./compile ciphertext.c
./compile date.c
./compile dict.c
./compile display.c
./compile error.c
./compile hillclimb.c
hillclimb.c: In function ‘handle_signal’:
hillclimb.c:56: warning: unused parameter ‘signum’
hillclimb.c: In function ‘hillclimb’:
hillclimb.c:85: warning: ‘ch.u1’ may be used uninitialized in this function
hillclimb.c:85: warning: ‘ch.u2’ may be used uninitialized in this function
hillclimb.c:85: warning: ‘ch.s1’ may be used uninitialized in this function
hillclimb.c:85: warning: ‘ch.s2’ may be used uninitialized in this function
./compile ic.c
ic.c: In function ‘ic_noring’:
ic.c:13: warning: unused parameter ‘to’
./compile input.c
./compile key.c
./compile result.c
./compile resume_in.c
./compile resume_out.c
./compile scan_int.c
./compile score.c
./compile stecker.c
./load enigma charmap.o cipher.o ciphertext.o date.o dict.o \
display.o error.o hillclimb.o ic.o input.o key.o result.o \
resume_in.o resume_out.o scan_int.o score.o stecker.o -lm
gmake: *** No rule to make target `SGT.c', needed by `tools/SGT.o'. Stop.

What's a mechanical engineer doing wrong here? ;-) I'll try -march=k8 in the interim, but I don't think that will help. I tried gmake and make, same result.

Mike Doerner

mdoerner
Message 597 - Posted: 8 Oct 2008, 15:41:47 UTC
Last modified: 8 Oct 2008, 15:41:59 UTC

Also, running OpenSUSE 11.0, running gcc 4.3.1 (included with distro).

Mike Doerner

TJM
Project administrator
Project developer
Project scientist
Joined: 25 Aug 07
Posts: 843
Credit: 267,994,998
RAC: 0
Message 598 - Posted: 8 Oct 2008, 19:19:58 UTC - in response to Message 597.  

Do you need the SGT tool? If not, open the Makefile and remove SGT from it.



M4 Project homepage
M4 Project wiki

mdoerner
Message 599 - Posted: 8 Oct 2008, 19:49:22 UTC

That's an excellent question. What is the SGT tool, and do I need it to run the compile? :-)

Mike Doerner

TJM
Message 600 - Posted: 8 Oct 2008, 20:44:20 UTC - in response to Message 599.  

That's an excellent question. What is the SGT tool, and do I need it to run the compile? :-)

Mike Doerner



* Simple Good-Turing Frequency Estimator
*
*
* Geoffrey Sampson, with help from Miles Dennis
*
* School of Cognitive and Computing Sciences
* University of Sussex, England
*
* http://www.grs.u-net.com/
*
*
* First release: 27 June 1995
* Revised release: 24 July 2000
*
* * Takes a set of (frequency, frequency-of-frequency) pairs, and
* applies the "Simple Good-Turing" technique for estimating
* the probabilities corresponding to the observed frequencies,
* and P.0, the joint probability of all unobserved species.
* requirements for using the SGT estimator are met.
*
* The output is a series of lines each containing an integer followed
* by a probability (a real number between zero and one), separated by a
* tab. In the first line, the integer is 0 and the real number is the
* estimate for P.0. In subsequent lines, the integers are the
* successive observed frequencies, and the reals are the estimated
* probabilities corresponding to those frequencies.
*
* The revised release cures a bug to which Martin Jansche of Ohio
* State University kindly drew attention. No warranty is given
* as to absence of further bugs.
*
*
*/


It isn't needed to compile or run either the standalone or the BOINC version of enigma.


mdoerner
Message 601 - Posted: 8 Oct 2008, 23:32:46 UTC

OK, so if I don't need SGT, how do I yank it from the makefile? Do I just # the line, or just remove the SGT from that line?

PS As a mechanical engineer, I haven't coded anything other than some simple fortran programs from the early 1990's. Then I was introduced to spreadsheets and the rest is history......;-) Thanks for your help so far, and hopefully I can get this thing compiled ASAP.

Mike Doerner

TJM
Message 602 - Posted: 8 Oct 2008, 23:38:05 UTC - in response to Message 601.  

Take a look at the screenshot a few posts up, just remove the SGT.
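
The project's Makefile is not reproduced in this thread, so the following is only a sketch of what "just remove the SGT" amounts to, written in Python against a made-up Makefile excerpt. The function name `drop_target` and the sample text are hypothetical; in practice the same edit can be made by hand in a text editor (keep a backup copy first).

```python
import re


def drop_target(makefile_text, name):
    """Remove `name` (or `some/dir/name`) wherever it appears as a
    whitespace-separated word, leaving names such as `name.o` alone."""
    pattern = r"[ \t]+(?:\S*/)?" + re.escape(name) + r"(?=\s|$)"
    return re.sub(pattern, "", makefile_text)


# Hypothetical Makefile excerpt, NOT the project's real file.
sample = (
    "all: enigma SGT\n"
    "enigma: enigma.o score.o\n"
)

print(drop_target(sample, "SGT"), end="")
```

With SGT gone from the list of things to build, make no longer looks for `SGT.c`, which is the file the error message complains about.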

mdoerner
Message 603 - Posted: 9 Oct 2008, 1:47:55 UTC

Thanks. I will try that tonight.

Mike Doerner

mdoerner
Message 604 - Posted: 9 Oct 2008, 3:03:05 UTC

Okie-Dokie!

I placed my enigma app compiled with the -march=amdfam10 flag w/ gcc 4.3.1 and here's a before/after shot of what's going on....

Before - awgly100_0_1998795_r0 3,747.89 seconds of cpu time (default app)
After - awgly100_0_2001488_r0 2,765.39 seconds of cpu time (optimized phenom app)

So a reduction in CPU time of about 26% (or, put the other way, roughly a 35.5% speedup)?
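
The two CPU times above can be compared in two ways, and the percentages differ. A minimal Python sketch of the arithmetic follows; the function names are illustrative only. The two results are different work units, so treat the figures as approximate.

```python
def time_reduction_pct(before_s, after_s):
    """Percentage of CPU time saved relative to the original run."""
    return (before_s - after_s) / before_s * 100.0


def speedup_pct(before_s, after_s):
    """Percentage increase in work done per second of CPU time."""
    return (before_s / after_s - 1.0) * 100.0


before = 3747.89  # default app, seconds of CPU time
after = 2765.39   # -march=amdfam10 build, seconds of CPU time

print(f"time saved: {time_reduction_pct(before, after):.1f}%")
print(f"speedup:    {speedup_pct(before, after):.1f}%")
```

The 35.5% figure is the speedup (3747.89 / 2765.39 ≈ 1.355), while the CPU time itself drops by about 26.2%.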

Now I did drop my cpu voltage a bit so I'm not generating as much heat, but this is still at 2.6 Ghz (not overclocking....yet) so we should be comparing apples to apples here.

How does this compare to some of the other optimized apps that have been generated?

Mike Doerner

mdoerner
Message 605 - Posted: 9 Oct 2008, 15:24:05 UTC
Last modified: 9 Oct 2008, 15:30:54 UTC

Well this is annoying. My other computer is a laptop w/ C2D T7500 2.2Ghz.

With the C2D optimized app in Windows.....

awgly100_0_1760349_r0 2,542.52 Secs CPU time. (220 seconds less than my 2.6 Ghz Phenom)

Is this just an Intel architecture issue (i.e. better at integer math?), or are there further optimizations I need to use in the app I compiled?

Mike Doerner

PS At least my Phenom beats my old Athlon XP+ 2000....;-) awgly100_0_1910365_r0 6,788.98 seconds.

TJM
Message 606 - Posted: 9 Oct 2008, 16:41:24 UTC - in response to Message 605.  

Well this is annoying. My other computer is a laptop w/ C2D T7500 2.2Ghz.

With the C2D optimized app in Windows.....


The Linux app compiled with the Intel C/C++ compiler is even faster, but it doesn't work on AMD processors.



Is this just an Intel architecture issue (i.e. better at integer math?), or are there further optimizations I need to use in the app I compiled?


Intel CPUs are known to perform better in this project; however, you could try some more compiler options. The fastest Athlon 64/Athlon 64 X2 executable I've seen was built with these gcc options:
-O2 -finline-functions -funroll-loops -ffast-math -mtune=athlon64 -march=x86-64 -fomit-frame-pointer -fschedule-insns
You could try these options on your Phenom, just change the -march and -mtune.
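
As a small sketch of "just change the -march and -mtune", this Python helper rewrites both values in a flag string. The helper name `retarget` is made up for illustration; `amdfam10` is the target already used earlier in this thread.

```python
def retarget(cflags, cpu):
    """Return cflags with every -march= and -mtune= value replaced by cpu."""
    result = []
    for flag in cflags.split():
        for prefix in ("-march=", "-mtune="):
            if flag.startswith(prefix):
                flag = prefix + cpu
        result.append(flag)
    return " ".join(result)


ATHLON64_FLAGS = (
    "-O2 -finline-functions -funroll-loops -ffast-math "
    "-mtune=athlon64 -march=x86-64 -fomit-frame-pointer -fschedule-insns"
)

print(retarget(ATHLON64_FLAGS, "amdfam10"))
```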
For some reason, older versions of gcc produce a faster enigma executable; I've had the best results with gcc v3.2.
One more thing: inlining functions can make the executable a bit faster. I did some benchmarks on my Phenom server, and after I changed all the scoring functions in score.c to inline, I gained around 3-4% speed.

Edited score.c:

http://tjm.boo.pl/enigma/app/score.tgz


mdoerner
Message 607 - Posted: 9 Oct 2008, 18:37:46 UTC
Last modified: 9 Oct 2008, 18:49:01 UTC

Hiya TJM,

Thanks for your help so far. I fear I may become a programmer after this is all said and done....;-)

I have also located some documentation from AMD regarding their generic performance switches. This is what they recommend for gcc 4.2 (and later, I would assume)....

-O3 -ffast-math -funroll-all-loops -fpeel-loops (I realize the floating point optimizations may not do anything for the Phenom in this instance)

I notice you are still using -O2, is -O3 counter-productive? Or just your experience that it is faster than -O3?

Also, AMD apparently has a Core Math Library (ACML) of numerical routines available to help AMD's chips in this area. Is there any benefit in your opinion to trying these libraries? They seem to be FORTRAN-based optimizations with C interfaces. Or do you think it will simply increase the size of the final executable? I do not believe it requires a re-write of any code, only additional compilation flags when making the executable.

Mike Doerner


PS My bad, it looks like the C language has to call the ACML intentionally, while FORTRAN just substitutes standard routines. Oh well.

TJM
Message 608 - Posted: 9 Oct 2008, 18:57:55 UTC - in response to Message 607.  



I notice you are still using -O2, is -O3 counter-productive? Or just your experience that it is faster than -O3?



That depends on other flags used. Make two executables and then compare the results between -O2 and -O3, you'll see that sometimes -O2 is faster. There's a simple benchmark script for linux which helps to check app performance: http://www.enigmaathome.net/forum_thread.php?id=17&nowrap=true#321 Runtime is around 5 minutes on 2,2-2,5GHz clocked AMDs.
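
For comparing two builds by hand, a rough timing harness along these lines may help. It is a sketch only: the two commands below are placeholders (they just start the Python interpreter) and would be replaced by whatever launches each enigma build on the benchmark input.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run a command several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder commands standing in for the -O2 and -O3 executables.
candidates = {
    "O2 build": [sys.executable, "-c", "sum(range(100000))"],
    "O3 build": [sys.executable, "-c", "sum(range(100000))"],
}

for label, argv in candidates.items():
    print(f"{label}: {time_command(argv):.3f} s")
```

Taking the best of several runs reduces the effect of other activity on the machine; with runs of several minutes, even two or three repeats per build give a feel for how much the numbers wander.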


mdoerner
Message 609 - Posted: 9 Oct 2008, 23:49:56 UTC

Hiya TJM,

I tried d/loading eb.tgz, but it looks like the archive is corrupt....


mdoerner@Linux-Quadzilla:~/Xfers> tar zxfvp eb.tgz
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Error exit delayed from previous errors
mdoerner@Linux-Quadzilla:~/Xfers>
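
The thread does not say what was actually wrong with the download. As one way to look inside such a file, this Python sketch classifies the bytes using only the standard library. The function names and result strings are made up for illustration, and the sample archive is built in memory rather than taken from the real eb.tgz.

```python
import gzip
import io
import tarfile
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def describe_download(data):
    """Classify downloaded bytes that were expected to be a .tgz file."""
    if data[:2] != GZIP_MAGIC:
        return "not gzip data (possibly an HTML error page or a plain tar)"
    try:
        inner = gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        return "damaged gzip data"
    if inner[:2] == GZIP_MAGIC:
        return "gzip inside gzip (compressed twice)"
    try:
        with tarfile.open(fileobj=io.BytesIO(inner), mode="r:") as archive:
            archive.getmembers()
    except tarfile.TarError:
        return "gzip data, but the contents are not a tar archive"
    return "gzip-compressed tar"


def make_sample_tgz():
    """Build a tiny, valid .tgz in memory for demonstration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        payload = b"hello\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


good = make_sample_tgz()
print(describe_download(good))
print(describe_download(gzip.compress(good)))
print(describe_download(b"<html>404 Not Found</html>"))
```

For a real file, read it with `open("eb.tgz", "rb").read()` and pass the bytes to `describe_download`.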


Mike Doerner

TJM
Message 610 - Posted: 9 Oct 2008, 23:57:15 UTC - in response to Message 609.  

Strange, it works for me. But I've repacked it anyway, here's the new link : http://tjm.boo.pl/enigma/eb.tgz

mdoerner
Message 611 - Posted: 10 Oct 2008, 1:57:06 UTC

Got it, thanks.

Mike Doerner

mdoerner
Message 612 - Posted: 10 Oct 2008, 2:47:05 UTC - in response to Message 608.  
Last modified: 10 Oct 2008, 2:51:20 UTC


Runtime is around 5 minutes on 2,2-2,5GHz clocked AMDs.


OK here's what I got with CFLAGS listed below.

mdoerner@Linux-Quadzilla:~/Xfers/enigma_benchmark> ./start
2008-10-09 22:31:05 enigma: working on range ...
2008-10-09 22:34:42 enigma: finished range

real 3m37.584s
user 3m36.582s
sys 0m0.008s
mdoerner@Linux-Quadzilla:~/Xfers/enigma_benchmark> echo $CFLAGS
-O3 -finline-functions -funroll-loops -ffast-math -mtune=amdfam10 -march=amdfam10 -fomit-frame-pointer -fschedule-insns
mdoerner@Linux-Quadzilla:~/Xfers/enigma_benchmark>

My previous app had the following CFLAGS....

-march=amdfam10 -O2 -msse3 -pipe

And the output was.....

mdoerner@Linux-Quadzilla:~/Xfers/enigma_benchmark> ./start
2008-10-09 22:39:35 enigma: working on range ...
2008-10-09 22:43:11 enigma: finished range

real 3m35.900s
user 3m34.701s
sys 0m0.008s
mdoerner@Linux-Quadzilla:~/Xfers/enigma_benchmark>

So less flags got me more speed.....sheesh. I wonder what -O1 will get me. Oh well. Time for bed....
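
To put a number on the difference, the `user` times from the two runs above can be converted to seconds and compared. A minimal Python sketch, with illustrative names:

```python
import re


def parse_time(value):
    """Convert a shell `time` value such as '3m37.584s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# 'user' times from the two benchmark runs above
runs = {
    "-O3 with the longer flag list": "3m36.582s",
    "-march=amdfam10 -O2 -msse3 -pipe": "3m34.701s",
}

seconds = {label: parse_time(value) for label, value in runs.items()}
fastest = min(seconds, key=seconds.get)
slowest = max(seconds, key=seconds.get)
gap_pct = (seconds[slowest] - seconds[fastest]) / seconds[slowest] * 100.0

print(f"fastest: {fastest} ({seconds[fastest]:.3f} s)")
print(f"gap:     {gap_pct:.2f}%")
```

The gap comes out below 1%, which may be within run-to-run noise; repeating each run a few times would show whether the difference is real.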

PS I'm compiling in 64-bit mode, is this part of the problem? Is 32-bit faster?

Mike Doerner

mdoerner
Message 613 - Posted: 10 Oct 2008, 13:35:03 UTC - in response to Message 605.  
Last modified: 10 Oct 2008, 13:41:16 UTC

Well this is annoying. My other computer is a laptop w/ C2D T7500 2.2Ghz.

With the C2D optimized app in Windows.....

awgly100_0_1760349_r0 2,542.52 Secs CPU time. (220 seconds less than my 2.6 Ghz Phenom)


This one must have been a fluke, all the other awgly100_0's I've seen from this processor are in the 2730-2750 second range. This makes me feel somewhat better (I thought I really had a fubar'ed AMD processor there for a while).

However, it seems that a C2D Windows 32-bit mode optimized app (@ 2.2 Ghz) = AMD Phenom 64-bit Linux optimized app (@ 2.6 Ghz).

Or that the Intel processor is 18% more efficient clock-for-clock in 32-bit protected mode compared to the AMD processor in 64-bit long mode.

Now that I have a better idea of what the performance difference is (the 2.6 GHz Phenom needs to complete an awgly100_0 in about 2310 seconds or so to equal the Intel C2D processor clock-for-clock), I'm going to dig into the ACML a bit more and see if it is an "easy code change" to incorporate acml.h into the enigma code (not being a programmer, this seems unlikely though). All testing will be done on the benchmark code you've published (no need to screw up any hard work done so far on the server ;-) ) to see if it makes things faster or not (getting the completion time from 212s to around 180s for the benchmark).
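
The clock-for-clock figures above follow from scaling by the ratio of clock speeds. A short Python sketch of that arithmetic, assuming runtime scales linearly with clock speed and that both machines process comparable work units (function names are illustrative):

```python
def per_clock_advantage_pct(slow_clock_ghz, fast_clock_ghz):
    """If two CPUs finish in the same time, the lower-clocked one
    does this much more work per clock cycle."""
    return (fast_clock_ghz / slow_clock_ghz - 1.0) * 100.0


def matching_runtime(runtime_s, reference_clock_ghz, target_clock_ghz):
    """Runtime the target CPU needs in order to match the reference
    CPU clock-for-clock."""
    return runtime_s * reference_clock_ghz / target_clock_ghz


c2d_ghz, phenom_ghz = 2.2, 2.6

print(f"per-clock advantage: {per_clock_advantage_pct(c2d_ghz, phenom_ghz):.0f}%")
print(f"work unit target:    {matching_runtime(2730.0, c2d_ghz, phenom_ghz):.0f} s")
print(f"benchmark target:    {matching_runtime(212.0, c2d_ghz, phenom_ghz):.0f} s")
```

This reproduces the 18% and 2310-second figures, and puts the benchmark target at about 179 seconds, in line with "around 180s".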

Also, the AMD documentation has compilation flags for the Intel compiler. Maybe this could speed things up even though it's an AMD processor?

Mike Doerner

mdoerner
Message 614 - Posted: 10 Oct 2008, 14:12:48 UTC
Last modified: 10 Oct 2008, 14:13:39 UTC

I talk a lot, don't I? ;-)

Here's the start of enigma.c

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <unistd.h>
#include <limits.h>
#include "charmap.h"
#include "cipher.h"
#include "ciphertext.h"
#include "dict.h"
#include "display.h"
#include "error.h"
#include "global.h"
#include "hillclimb.h"
#include "ic.h"
#include "input.h"
#include "key.h"
#include "result.h"
#include "resume_in.h"
#include "resume_out.h"
#include "scan.h"


Would I just insert a #include <acml.h> after #include <limits.h> ?

Here's a quote from the ACML documentation.....

2.5 ACML FORTRAN and C interfaces
All routines in ACML come with both FORTRAN and C interfaces. The FORTRAN
interfaces typically follow the relevant standard (e.g. LAPACK, BLAS). Here we document
how a C programmer should call ACML routines.
In C code that uses ACML routines, be sure to include the header file <acml.h>, which
contains function prototypes for all ACML C interfaces. The header file also contains C
prototypes for FORTRAN interfaces, thus the C programmer could call the FORTRAN
interfaces from C, though there is little reason to do so.
C interfaces to ACML routines differ from FORTRAN interfaces in the following major
respects:
• The FORTRAN interface names are appended by an underscore (except for the Windows
32-bit and 64-bit Microsoft C/Intel Fortran version of ACML, where FORTRAN
interface names are distinguished from C by being upper case rather than lower case -
this is the default for the Intel Fortran compiler)
• The C interfaces contain no workspace arguments; all workspace memory is allocated
internally.
• Scalar input arguments are passed by value in C interfaces. FORTRAN interfaces pass
all arguments (except for character string length arguments that are normally hidden
from FORTRAN programmers) by reference.
• Most arguments that are passed as character string pointers to FORTRAN interfaces
are passed by value as single characters to C interfaces. The character string length
arguments of FORTRAN interfaces are not required in the C interfaces.
• Unlike FORTRAN, C has no native complex data type. ACML C routines which
operate on complex data use the types complex and doublecomplex defined in <acml.h>
for single and double precision computations respectively. Some of the programs in the
ACML examples directory (see Section 2.9 [Examples], page 17) make use of these
types.


Or do I need to shove additional code in beyond the header statement? (If this even helps?) Also, the ACML includes routines for single core and multi-core processors. I would use the single core version even with the phenom as BOINC relegates each task to a single core, correct?

My apologies if this is taking up too much of your time, I'm just trying to get my Phenom a little more "competitive" with our Intel competition. :-)

Mike Doerner

TJM
Message 616 - Posted: 11 Oct 2008, 23:05:51 UTC - in response to Message 613.  
Last modified: 11 Oct 2008, 23:06:50 UTC



Also, the AMD documentation has compilation flags for the Intel compiler. Maybe this could speed things up even though it's an AMD processor?



I've tried building an ICC-optimized executable for AMD processors; the best compilation I got was 3-4% slower than the gcc build on my test AMD machine (A64 2,2GHz), so I gave up. I haven't tried on a Phenom, so feel free to try and tell me about the results :-D


Would I just insert a #include <acml.h> after #include <limits.h> ?


That's fine as long as the ACML headers are somewhere the compiler can find them.

Or do I need to shove additional code in beyond the header statement? (If this even helps?)

I think you'll have to edit the source to call acml's functions instead of default ones.


Also, the ACML includes routines for single core and multi-core processors. I would use the single core version even with the phenom as BOINC relegates each task to a single core, correct?


Yep, that's correct.




Copyright © 2025 TJM