llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-16 16:48:02 +00:00

Author	SHA1	Message	Date
Alkis Evlogimenos	0742b93bb9	Add memory operand folding support for SHLD and SHRD instructions. llvm-svn: 11905	2004-02-27 15:03:18 +00:00
Alkis Evlogimenos	b1f67f6741	Add memory operand folding support for SHL, SHR, SAR and SHLD instructions. llvm-svn: 11903	2004-02-27 09:28:43 +00:00
Alkis Evlogimenos	cf49d13ed2	Rename SHL, SHR, SAR, SHLD and SHLR instructions to make them consistent with the rest and also prepare for the addition of their memory operand variants. llvm-svn: 11902	2004-02-27 06:57:05 +00:00
John Criswell	0b01bff060	Fixes for PR258 and PR259. Functions with linkonce linkage are declared with weak linkage. Global floating point constants used to represent unprintable values (such as NaN and infinity) are declared static so that they don't interfere with other CBE generated translation units. llvm-svn: 11884	2004-02-26 22:20:58 +00:00
Alkis Evlogimenos	b15631fcfa	Uncomment assertions that register# != 0 on calls to MRegisterInfo::is{Physical,Virtual}Register. Apply appropriate fixes to relevant files. llvm-svn: 11882	2004-02-26 22:00:20 +00:00
Chris Lattner	4aff6ec077	Use a map instead of annotations llvm-svn: 11875	2004-02-26 08:02:17 +00:00
Chris Lattner	6a3796eaf9	Fix some warnings, some of which were spurious, and some of which were real bugs. Thanks Brian! llvm-svn: 11859	2004-02-26 01:20:02 +00:00
Misha Brukman	3d1720cdb9	Instructions to call and return from functions. llvm-svn: 11858	2004-02-26 00:37:12 +00:00
Misha Brukman	6a13621948	SparcV8 regs are really 32-bit, not 64! Thanks, Chris. llvm-svn: 11835	2004-02-25 21:03:02 +00:00
Misha Brukman	f12c1e5a55	Clean up the tablegen descriptions for SparcV8. llvm-svn: 11834	2004-02-25 21:02:21 +00:00
Misha Brukman	c8801eb5be	Fix the SparcV8 register definitions that were imported from PPC template. llvm-svn: 11833	2004-02-25 21:00:05 +00:00
Misha Brukman	a4b3e0f01b	SparcV8 has different types of instructions, but F1 is only used for CALL. llvm-svn: 11832	2004-02-25 20:52:20 +00:00
Chris Lattner	7c05e5d4d8	Fix failures in 099.go due to the cfgsimplify pass creating switch instructions where there did not used to be any before llvm-svn: 11829	2004-02-25 19:30:19 +00:00
Brian Gaeke	5166390fd2	SparcV8 skeleton llvm-svn: 11828	2004-02-25 19:28:19 +00:00
Brian Gaeke	c6de948cd1	Great renaming part II: Sparc --> SparcV9 (also includes command-line options and Makefiles) llvm-svn: 11827	2004-02-25 19:08:12 +00:00
Brian Gaeke	965df0b91b	Great renaming: Sparc --> SparcV9 llvm-svn: 11826	2004-02-25 18:44:15 +00:00
Chris Lattner	ab9628ad18	Teach the instruction selector how to transform 'array' GEP computations into X86 scaled indexes. This allows us to compile GEP's like this:

    int* %test([10 x { int, { int } }]* %X, int %Idx) {
      %Idx = cast int %Idx to long
      %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
      ret int* %X
    }

Into a single address computation:

    test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
        ret

Before it generated:

    test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        shl %ECX, 3
        add %EAX, %ECX
        lea %EAX, DWORD PTR [%EAX + 4]
        ret

This is useful for things like int/float/double arrays, as the indexing can be folded into the loads&stores, reducing register pressure and decreasing the pressure on the decode unit. With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot. On bzip2 for example, we go from this:

    10665 asm-printer - Number of machine instrs printed
       40 ra-local - Number of loads/stores folded into instructions
     1708 ra-local - Number of loads added
     1532 ra-local - Number of stores added
     1354 twoaddressinstruction - Number of instructions added
     1354 twoaddressinstruction - Number of two-address instructions
     2794 x86-peephole - Number of peephole optimization performed

to this:

     9873 asm-printer - Number of machine instrs printed
       41 ra-local - Number of loads/stores folded into instructions
     1710 ra-local - Number of loads added
     1521 ra-local - Number of stores added
      789 twoaddressinstruction - Number of instructions added
      789 twoaddressinstruction - Number of two-address instructions
     2142 x86-peephole - Number of peephole optimization performed

... and these types of instructions are often in tight loops.

Linear scan is also helped, but not as much. It goes from:

     8787 asm-printer - Number of machine instrs printed
     2389 liveintervals - Number of identity moves eliminated after coalescing
     2288 liveintervals - Number of interval joins performed
     3522 liveintervals - Number of intervals after coalescing
     5810 liveintervals - Number of original intervals
      700 spiller - Number of loads added
      487 spiller - Number of stores added
      303 spiller - Number of register spills
     1354 twoaddressinstruction - Number of instructions added
     1354 twoaddressinstruction - Number of two-address instructions
      363 x86-peephole - Number of peephole optimization performed

to:

     7982 asm-printer - Number of machine instrs printed
     1759 liveintervals - Number of identity moves eliminated after coalescing
     1658 liveintervals - Number of interval joins performed
     3282 liveintervals - Number of intervals after coalescing
     4940 liveintervals - Number of original intervals
      635 spiller - Number of loads added
      452 spiller - Number of stores added
      288 spiller - Number of register spills
      789 twoaddressinstruction - Number of instructions added
      789 twoaddressinstruction - Number of two-address instructions
      258 x86-peephole - Number of peephole optimization performed

Though I'm not complaining about the drop in the number of intervals. :)

llvm-svn: 11820	2004-02-25 07:00:55 +00:00
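The `lea` in the commit above folds the whole GEP into one x86 address of the form base + scale*index + displacement. The following is a minimal sketch of that arithmetic, not code from the repository: the names `ScaledAddress`, `effectiveAddress`, `Element`, and `addressOfSecond` are hypothetical, and it assumes a 4-byte `int`, so that one `{ int, { int } }` element is 8 bytes and the nested field sits at offset 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of an x86 address: base + scale*index + disp.
struct ScaledAddress {
    std::uint32_t base;   // value of the base register
    std::uint32_t scale;  // x86 only allows 1, 2, 4 or 8
    std::uint32_t index;  // value of the index register
    std::int32_t  disp;   // constant displacement
};

// Evaluate the address; unsigned arithmetic wraps modulo 2^32.
std::uint32_t effectiveAddress(const ScaledAddress &a) {
    assert(a.scale == 1 || a.scale == 2 || a.scale == 4 || a.scale == 8);
    return a.base + a.scale * a.index + static_cast<std::uint32_t>(a.disp);
}

// Mirrors one element of [10 x { int, { int } }], assuming 4-byte int.
struct Inner   { std::int32_t v; };
struct Element { std::int32_t first; Inner second; };

// Address of X[idx].second.v: the element size becomes the scale and the
// field offset becomes the displacement.
ScaledAddress addressOfSecond(std::uint32_t base, std::uint32_t idx) {
    return ScaledAddress{base,
                         static_cast<std::uint32_t>(sizeof(Element)),
                         idx,
                         static_cast<std::int32_t>(offsetof(Element, second))};
}
```

With these assumptions, `addressOfSecond(base, idx)` yields scale 8 and displacement 4, matching the `[%EAX + 8*%ECX + 4]` operand shown in the commit message.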
Chris Lattner	dccf14825c	* Make the previous patch more efficient by not allocating a temporary MachineInstr to do analysis.
* FOLD getelementptr instructions into loads and stores when possible, making use of some of the crazy X86 addressing modes.

For example, the following C++ program fragment:

    struct complex {
      double re, im;
      complex(double r, double i) : re(r), im(i) {}
    };
    inline complex operator+(const complex& a, const complex& b) {
      return complex(a.re+b.re, a.im+b.im);
    }
    complex addone(const complex& arg) {
      return arg + complex(1,0);
    }

Used to be compiled to:

    _Z6addoneRK7complex:
            mov %EAX, DWORD PTR [%ESP + 4]
            mov %ECX, DWORD PTR [%ESP + 8]
    *       mov %EDX, %ECX
            fld QWORD PTR [%EDX]
            fld1
            faddp %ST(1)
    *       add %ECX, 8
            fld QWORD PTR [%ECX]
            fldz
            faddp %ST(1)
    *       mov %ECX, %EAX
            fxch %ST(1)
            fstp QWORD PTR [%ECX]
    ***     add %EAX, 8
            fstp QWORD PTR [%EAX]
            ret

Now it is compiled to:

    _Z6addoneRK7complex:
            mov %EAX, DWORD PTR [%ESP + 4]
            mov %ECX, DWORD PTR [%ESP + 8]
            fld QWORD PTR [%ECX]
            fld1
            faddp %ST(1)
            fld QWORD PTR [%ECX + 8]
            fldz
            faddp %ST(1)
            fxch %ST(1)
            fstp QWORD PTR [%EAX]
            fstp QWORD PTR [%EAX + 8]
            ret

Other programs should see similar improvements, across the board. Note that in addition to reducing instruction count, this also reduces register pressure a lot, always a good thing on X86. :)

llvm-svn: 11819	2004-02-25 06:13:04 +00:00
Chris Lattner	10d08a2955	Add a helper to create an addressing mode given all of the pieces. llvm-svn: 11818	2004-02-25 06:01:07 +00:00
Chris Lattner	c0e2bc0250	add an inefficient way of folding structure and constant array indexes together into a single LEA instruction. This should improve the code generated for things like X->A.B.C[12].D. The bigger benefit is still coming though. Note that this uses an LEA instruction instead of an add, giving the register allocator more freedom. We should probably never generate ADDri32's. llvm-svn: 11817	2004-02-25 03:45:50 +00:00
Chris Lattner	969f90db77	Implement special case for storing an immediate into memory so that we don't need an intermediate register. llvm-svn: 11816	2004-02-25 02:56:58 +00:00
Brian Gaeke	eae0364189	FunctionLiveVarInfo.h moved: include/llvm/CodeGen -> lib/Target/Sparc/LiveVar llvm-svn: 11804	2004-02-24 19:46:00 +00:00
Chris Lattner	9da41150e8	Fix some unexpected fallout from the config.h changes. Because the CBE no longer was getting this #include, it always fell back on the less precise floating point initializer values, causing some testsuite failures. llvm-svn: 11803	2004-02-24 18:34:10 +00:00
Alkis Evlogimenos	9b103024ef	Refactor rewinding code for finding the first terminator of a basic block into MachineBasicBlock::getFirstTerminator(). This also fixes a bug in the implementation of the above in both RegAllocLocal and InstrSched, where instructions were added after the terminator if the basic block's only instruction was a terminator (it shouldn't matter for RegAllocLocal since this case never occurs in practice). llvm-svn: 11748	2004-02-23 18:14:48 +00:00
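The rewinding idea described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the actual `MachineBasicBlock::getFirstTerminator()` implementation: `Instr`, `firstTerminatorIndex`, and `insertBeforeTerminators` are hypothetical names, and a block is modelled as a plain `std::vector`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine instruction.
struct Instr {
    int  opcode;
    bool isTerminator;
};

// Walk backwards from the end of the block over the trailing run of
// terminators. Returns the index of the first terminator, or block.size()
// if the block does not end in a terminator.
std::size_t firstTerminatorIndex(const std::vector<Instr> &block) {
    std::size_t i = block.size();
    while (i > 0 && block[i - 1].isTerminator)
        --i;
    return i;
}

// Insert a new instruction ahead of the terminators. Because the rewind
// loop can reach index 0, this also works when the block's only
// instruction is a terminator.
void insertBeforeTerminators(std::vector<Instr> &block, Instr inst) {
    auto pos = block.begin() +
               static_cast<std::ptrdiff_t>(firstTerminatorIndex(block));
    block.insert(pos, inst);
}
```

The case the commit message calls out is a block holding nothing but a terminator: the loop stops at index 0, so the new instruction lands before the terminator rather than after it.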
Chris Lattner	40e15a6000	Simplify code a bit, don't go off the end of the block, now that the current block we are in might be empty llvm-svn: 11744	2004-02-23 07:42:19 +00:00
Chris Lattner	28e4e925eb	We were forgetting to add FP_REG_KILL instructions to basic blocks which will eventually get an assignment due to elimination of PHIs. llvm-svn: 11743	2004-02-23 07:29:45 +00:00
Chris Lattner	b200638dc4	Work around a gas bug. Print '-9223372036854775808' as unsigned. llvm-svn: 11729	2004-02-23 03:27:05 +00:00
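One way to read the workaround above: the most negative 64-bit value is emitted as its unsigned bit pattern instead of with a minus sign. A minimal sketch, assuming that only this one value needs special handling; `formatInt64ForAsm` is a hypothetical helper, not a function from the repository.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Format a signed 64-bit constant for assembler output. The minimum value
// is printed as the unsigned number with the same bit pattern (2^63), so
// the assembler never sees the literal '-9223372036854775808'.
std::string formatInt64ForAsm(std::int64_t v) {
    if (v == std::numeric_limits<std::int64_t>::min())
        return std::to_string(static_cast<std::uint64_t>(v));
    return std::to_string(v);
}
```

Both spellings denote the same 64 bits, so the emitted data is unchanged; only the text handed to the assembler differs.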
Chris Lattner	85f13fae06	Implement cast fp -> bool llvm-svn: 11728	2004-02-23 03:21:41 +00:00
Chris Lattner	795ca35cde	Stop passing iterators around by reference now that we have ilists! Implement cast Type::ULongTy -> double llvm-svn: 11726	2004-02-23 03:10:10 +00:00
Chris Lattner	f9acb33dfd	Add a new cmove instruction llvm-svn: 11722	2004-02-23 01:16:05 +00:00
Chris Lattner	cf8db3e8aa	Only insert FP_REG_KILL instructions in MachineBasicBlocks that actually use FP instructions. This reduces the number of instructions inserted in 176.gcc (for example) from 58074 to 101 (it doesn't use much FP, which is typical). This reduction speeds up the entire code generator. In the case of 176.gcc, llc went from taking 31.38s to 24.78s. The passes that sped up the most are the register allocator and the 2 live variable analysis passes, which sped up 2.3, 1.3, and 1.5s respectively. The asmprinter pass also sped up because it doesn't print the instructions in comments :) Note that this patch is likely to expose latent bugs in machine code passes, because now basic blocks can be empty, where they were never empty before. I cleaned out regalloclocal, but who knows about linscan :) llvm-svn: 11717	2004-02-22 19:47:26 +00:00
Alkis Evlogimenos	7f7d70a53c	Move MOTy::UseType enum into MachineOperand. This eliminates the switch statements in the constructors and simplifies the implementation of the getUseType() member function. You will have to specify defs using MachineOperand::Def instead of MOTy::Def though (similarly for Use and UseAndDef). llvm-svn: 11715	2004-02-22 19:23:26 +00:00
Chris Lattner	cc9a188e0a	Reduce the number of pointless copies inserted due to constant pointer refs. Also, make an assertion actually fireable! llvm-svn: 11713	2004-02-22 17:35:42 +00:00
Chris Lattner	ed03319931	Fix bug in previous checkout: leave the iterator at the first instruction AFTER the GEP that was emitted. :( llvm-svn: 11712	2004-02-22 17:05:38 +00:00
Chris Lattner	ade64c9839	Completely rewrite how getelementptr instructions are expanded. This has two (minor) benefits right now: 1. An extra dummy MOVrr32 is gone. This move would often be coalesced by both allocators anyway. 2. The code now uses the gep_type_iterator to walk the gep, which should future proof it a bit. It still assumes that array indexes are Longs though. These don't really justify rewriting the code. The big benefit will come later though. llvm-svn: 11710	2004-02-22 07:04:00 +00:00
Alkis Evlogimenos	6998610eda	When folding memory operands in machine instructions be careful to leave register operands with the same use/def flags as the original instruction. llvm-svn: 11709	2004-02-22 06:54:26 +00:00
Chris Lattner	3392d316e9	Wow this is out of date. When we have _real_ code generator documentation, this should be folded into it. llvm-svn: 11705	2004-02-22 05:53:54 +00:00
Chris Lattner	cf8afa52b8	The two address pass cannot handle two addr instructions where one incoming value is a physreg and one is a virtreg. For this reason, disable copy folding entirely for physregs. Also, use the new isMoveInstr target hook which gives us folding of FP moves as well. llvm-svn: 11700	2004-02-22 04:44:58 +00:00
Chris Lattner	b24f30de8d	It is totally unacceptable to print out (literally) millions of zeros when compiling 129.compress... so don't! llvm-svn: 11649	2004-02-20 05:49:22 +00:00
Alkis Evlogimenos	7ec1bad952	Fix argument size for MOVSX and MOVZX instructions. llvm-svn: 11576	2004-02-18 16:20:40 +00:00
Chris Lattner	f58d2dd6cf	Add support for GlobalAddress's for alkis llvm-svn: 11560	2004-02-17 18:23:55 +00:00
Alkis Evlogimenos	c6f0651e5c	These store to memory too. llvm-svn: 11558	2004-02-17 17:53:48 +00:00
Chris Lattner	88271db3bc	These store to memory, not read from it. llvm-svn: 11556	2004-02-17 17:46:50 +00:00
Alkis Evlogimenos	0528c59353	Instructions with 1 memory operand have 4 operands in our representation.. duh! llvm-svn: 11554	2004-02-17 15:58:13 +00:00
Alkis Evlogimenos	b1a61b72f2	Align case statements. llvm-svn: 11552	2004-02-17 15:50:41 +00:00
Alkis Evlogimenos	b815fd46ec	Add TEST and XCHG memory operand support. llvm-svn: 11550	2004-02-17 15:48:42 +00:00
Alkis Evlogimenos	32a5b0fd6c	Add OR and XOR memory operand support. llvm-svn: 11549	2004-02-17 15:33:14 +00:00
Alkis Evlogimenos	1e4b3b3c9b	Peephole optimize SUBmi{16,32} into SUBmi{16,32}b when immediate is 8 bits wide. llvm-svn: 11548	2004-02-17 15:14:29 +00:00
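The peephole rule in the commit above can be sketched as an opcode rewrite that fires when the immediate fits in 8 bits. This is an illustration only: the `Opcode` enum, `fitsInSignedImm8`, and `shrinkSubImmediate` are hypothetical, and the sketch assumes the short form takes a sign-extended 8-bit immediate, so the usable range is -128 to 127.

```cpp
#include <cstdint>

// Hypothetical opcode names modelled on the ones in the commit message;
// the 'b' suffix marks the form with an 8-bit immediate.
enum class Opcode { SUBmi16, SUBmi16b, SUBmi32, SUBmi32b };

// True when the value survives a round trip through a sign-extended byte.
bool fitsInSignedImm8(std::int32_t imm) {
    return imm >= -128 && imm <= 127;
}

// Pick the shorter encoding when the immediate allows it; otherwise keep
// the opcode unchanged.
Opcode shrinkSubImmediate(Opcode op, std::int32_t imm) {
    if (!fitsInSignedImm8(imm))
        return op;
    switch (op) {
    case Opcode::SUBmi16: return Opcode::SUBmi16b;
    case Opcode::SUBmi32: return Opcode::SUBmi32b;
    default:              return op;
    }
}
```

The rewrite changes only the encoding size, not the result of the subtraction, which is why it is safe to apply as a late peephole step.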
Alkis Evlogimenos	4f22bb4d4b	ADDmi{16,32} should be in the next case statement. llvm-svn: 11547	2004-02-17 15:10:11 +00:00
Alkis Evlogimenos	135c4faa55	Add memory operand folding support for MUL, DIV, IDIV, NEG, NOT, MOVSX, and MOVZX. llvm-svn: 11546	2004-02-17 09:14:23 +00:00

1380 Commits