I can see with the original code was that I forgot that this runs after
type legalization and hence the result type will always be i32. (Custom
legalization of EXTRACT_VECTOR_ELT is only enabled for vector types with
8- and 16-bit elements.)
Regarding the FIXME comment: any information about sign and zero-extension
should be captured by separate extension operations. The DAG combiner should
handle those to produce either VGETLANEu or VGETLANEs, and that seems to be
working now. If there are cases that we're missing, let me know.
llvm-svn: 84218
1. Emit external function type information for all COFF targets since it's
a feature of object format
2. Emit linker directives only for cygming (since this is ld-specific stuff)
llvm-svn: 84214
In the case where there are no good places to put constants and we fall back
upon inserting unconditional branches to make new blocks, allow all constant
pool references in range of those blocks to put constants there, even if that
means resetting the "high water marks" for those references. This will still
terminate because you can't keep splitting blocks forever, and in the bad
cases where we have to split blocks, it is important to avoid splitting more
than necessary.
llvm-svn: 84202
as expressions, code for parsing a few arm specific directives (still needs
the MCStreamer calls for these). Some clean up of the operand parsing code
and adding some comments.
llvm-svn: 84201
identifying the malloc as a non-array malloc. This broke GlobalOpt's optimization of stores of mallocs
to global variables.
The fix is to classify malloc's into 3 categories:
1. non-array mallocs
2. array mallocs whose array size can be determined
3. mallocs that cannot be determined to be of type 1 or 2 and cannot be optimized
getMallocArraySize() returns NULL for category 3, and all users of this function must avoid their
malloc optimization if this function returns NULL.
Eventually, currently unexpected codegen for computing the malloc's size argument will be supported in
isArrayMalloc() and getMallocArraySize(), extending malloc optimizations to those examples.
llvm-svn: 84199
When ARMConstantIslandPass cannot find any good locations (i.e., "water") to
place constants, it falls back to inserting unconditional branches to make a
place to put them. My recent change exposed a problem in this area. We may
sometimes append to the same block more than one unconditional branch. The
symptoms of this are that the generated assembly has a branch to an undefined
label and running llc with -debug will cause a seg fault.
This happens more easily since my change to prevent CPEs from moving from
lower to higher addresses as the algorithm iterates, but it could have
happened before. The end of the block may be in range for various constant
pool references, but the insertion point for new CPEs is not right at the end
of the block -- it is at the end of the CPEs that have already been placed
at the end of the block. The insertion point could be out of range. When
that happens, the fallback code will always append another unconditional
branch if the end of the block is in range.
The fix is to only append an unconditional branch if the block does not
already end with one. I also removed a check to see if the constant pool load
instruction is at the end of the block, since that is redundant with
checking if the end of the block is in-range.
There is more to be done here, but I think this fixes the immediate problem.
llvm-svn: 84172
don't bother every time going around the main worklist. This speeds up a
release-asserts opt -std-compile-opts on 403.gcc by about 4% (1.5s). It
seems to speed up the most expensive instances of instcombine by ~10%.
llvm-svn: 84171
instruction (which disqualifies stores, unreachable, etc) and at least the
first operand is a constant. This filters out a lot of obvious cases that
can't be folded. Also, switch the IRBuilder to a TargetFolder, which tries
harder.
llvm-svn: 84170
header is just the entry block to the loop, and it needn't be at
the top of the loop in the code layout.
Remove the code that suppressed loop alignment for outer loops,
so that outer loops are aligned.
llvm-svn: 84158
so get rid of eh.selector.i64 and rename eh.selector.i32 to eh.selector.
Likewise for eh.typeid.for. This aligns us with gcc, which always uses a
32 bit value for the selector on all platforms. My understanding is that
the register allocator used to assert if the selector intrinsic size didn't
match the pointer size, and this was the reason for introducing the two
variants. However my testing shows that this is no longer the case (I
fixed some bugs in selector lowering yesterday, and some more today in the
fastisel path; these might have caused the original problems).
llvm-svn: 84106