Some CPUs do not allow unaligned memory accesses, e.g. the 2k1000la,
which uses the la264 core, on which a misaligned access triggers an
exception.
In this patch, a backend feature called `ual` is defined to describe
whether the CPU supports unaligned memory accesses. This feature
can be toggled by the clang options `-m[no-]unaligned-access` or the
aliases `-m[no-]strict-align`. When this feature is on,
`allowsMisalignedMemoryAccesses` sets the speed number to 1 and returns
true, which allows codegen to generate unaligned memory access insns.
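The behaviour described above can be modelled in a few lines. This is a standalone sketch, not the LLVM hook itself: the function name `allowsMisalignedAccessModel`, its signature, and the `false` result when `ual` is off are assumptions made for illustration.

```cpp
// Standalone model of the described behaviour; the name and signature are
// hypothetical and do not match the real TargetLowering hook.
bool allowsMisalignedAccessModel(bool HasUAL, unsigned *Fast) {
  if (!HasUAL)
    return false; // Assumption: without `ual`, misaligned accesses are rejected.
  if (Fast)
    *Fast = 1; // Speed number reported when the feature is on.
  return true;
}
```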
The clang options `-m[no-]unaligned-access` are moved from `m_arm_Features_Group`
to `m_Group` because more than one target now uses them. A test
is added to show that they remain unused on a target that does not
support them. In addition, for compatibility with gcc, a new alias
`-mno-strict-align` is added, which is equivalent to `-munaligned-access`.
The feature name `ual` is consistent with the Linux kernel [1] and the
output of `lscpu` or `/proc/cpuinfo` [2].
There is an `LLT` variant of `allowsMisalignedMemoryAccesses`, but it
seems that it is currently only used in GlobalISel, which LoongArch
doesn't support yet, so this variant is not implemented in this patch.
[1]: https://github.com/torvalds/linux/blob/master/arch/loongarch/include/asm/cpu.h#L77
[2]: https://github.com/torvalds/linux/blob/master/arch/loongarch/kernel/proc.c#L75
Reviewed By: xen0n
Differential Revision: https://reviews.llvm.org/D149946
Compares write a mask result. Min/max write a full result. This
makes them sufficiently different to have their own classes.
Reviewed By: pcwang-thead
Differential Revision: https://reviews.llvm.org/D152020
This new representation means that a valid command line option may
potentially be used directly as a multilib flag without any translation.
To indicate that a flag is required not to be present, its first
character is replaced with '!', which is intended for consistency with
the logical not operator in many programming languages.
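As a rough illustration of the '!' convention described above, the sketch below derives the "must not be present" form of a flag by replacing its first character. The helper names `negateFlag` and `isNegatedFlag` are hypothetical and are not taken from the patch; the actual implementation may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the patch): build the "must be absent" form
// of a multilib flag by replacing its first character with '!'.
std::string negateFlag(const std::string &Flag) {
  assert(!Flag.empty() && "a flag needs at least one character");
  std::string Negated = Flag;
  Negated[0] = '!';
  return Negated;
}

// Hypothetical helper: true if the flag is in the "must be absent" form.
bool isNegatedFlag(const std::string &Flag) {
  return !Flag.empty() && Flag[0] == '!';
}
```

Under this sketch, `negateFlag("-fexceptions")` yields `!fexceptions`, while the unmodified `-fexceptions` is exactly the command line option.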
Reviewed By: simon_tatham
Differential Revision: https://reviews.llvm.org/D151438
Decouple the interface of the MultilibBuilder flag method from how flags
are stored internally. Likewise change the addMultilibFlag function.
Currently a multilib flag like "-fexceptions" means a multilib is
*incompatible* with the -fexceptions command line option, which is
counter-intuitive. This change is a step towards changing this scheme.
Differential Revision: https://reviews.llvm.org/D151437
The TODO was left there to verify that Assign() runtime handles
overlaps of allocatable components. It did not, and this change-set
fixes it. Note that the same Assign() issue can be reproduced
without HLFIR. In the following example, the LHS would be reallocated
before the value of the RHS (essentially the same memory) is read:
```
program main
  type t1
    integer, allocatable :: a(:)
  end type t1
  type(t1) :: x, y
  allocate(x%a(10))
  do i = 1, 10
    x%a(i) = 2*i
  end do
  x = x
  print *, x%a
  deallocate(x%a)
end program main
```
The test's output would be incorrect (though, this depends on the memory
reuse by malloc):
0 0 0 0 10 12 14 16 18 20
It is very hard to add a Flang unittest exploiting derived types.
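The hazard in the example above is a read from storage that has already been released. A common way to avoid it is to copy the RHS before touching the LHS. The sketch below shows that technique in C++; it is not the Flang runtime's Assign() and not necessarily how this change-set fixes the issue. The `Box` type and `assignOverlapSafe` function are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for a derived type with one allocatable component.
struct Box {
  std::unique_ptr<int[]> Data;
  std::size_t Size = 0;
};

// Copy-before-reallocate assignment: safe even when Lhs and Rhs alias.
void assignOverlapSafe(Box &Lhs, const Box &Rhs) {
  const std::size_t N = Rhs.Size;
  // Read the RHS into fresh storage while its memory is still valid.
  std::unique_ptr<int[]> Tmp = std::make_unique<int[]>(N);
  std::copy_n(Rhs.Data.get(), N, Tmp.get());
  // Only now release the old LHS storage and adopt the copy.
  Lhs.Data = std::move(Tmp);
  Lhs.Size = N;
}
```

With this ordering, a self-assignment such as `assignOverlapSafe(X, X)` leaves the contents of `X` unchanged.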
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D152306
The LoongArch ELF psABI document has changed location and versioning
scheme; this revision is v2.10 in the old scheme. Notably this revision
brings initial capability of linker relaxation to LoongArch.
Reviewed By: SixWeining, MaskRay
Differential Revision: https://reviews.llvm.org/D152184
In https://reviews.llvm.org/D147812 I created
`BalancedPartitioningTest.cpp` and inadvertently drastically increased the
number of files needed to compile `SupportTests`. Instead, let's move the
`BPFunctionNode` test to `unittests/ProfileData` so we can remove the
extra dependency.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D152325
It's possible that both multiplicands are being negated. This won't
change the opcode, but we can delete the two negates. Allow this
case to get through negateFMAOpcode.
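The identity behind this is that negating both multiplicands leaves the product unchanged: (-a) * (-b) + c equals a * b + c. A minimal sketch using `std::fma` follows; the function names are hypothetical and the code only demonstrates the arithmetic, not the RISC-V DAG combine.

```cpp
#include <cmath>

// (-a) * (-b) + c: both multiplicands carry a negation.
double fmaBothNegated(double A, double B, double C) {
  return std::fma(-A, -B, C);
}

// a * b + c: the same operation with both negations deleted.
double fmaPlain(double A, double B, double C) {
  return std::fma(A, B, C);
}
```

Both functions return the same value for the same non-NaN inputs, which is why the opcode can stay as it is while the two negates are removed.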
I think D152260 will also fix this test case, but in the future
it may be possible for an fneg to appear after we've already converted
to RISCVISD opcodes in which case D152260 won't help.
Reviewed By: fakepaper56
Differential Revision: https://reviews.llvm.org/D152296
On Darwin, we do not want to show the BuildId appended at the end of stack
frames in Sanitizers. The BuildId/UUID can be seen by using the
print_module_map=1 sanitizer option.
Differential Revision: https://reviews.llvm.org/D150298
rdar://108324403
This patch skips both `test_completion_target_create_from_root_dir`,
introduced in `e896612`, and `target-label.test`, introduced in `1e82b20`,
since I don't have a Windows machine to try to accommodate the filesystem
path style differences needed for these tests to pass.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
Fix debug printing, making it easier to compare two debug logs side by side:
- `BinaryFunction::addRelocation`: print function name instead of `this` ptr,
- `DataAggregator::doTrace`: remove duplicated function name.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D152314
This fixes a regression introduced by 27f27d15f6 that results in a
NameError: (name 'self' is not defined) when using crashlog with the -c
option.
rdar://110007391
CUDA-12 no longer supports 32-bit compilation.
Tests agnostic to 32/64 compilation mode are switched to use nvptx64.
Tests that do care about it have 32-bit ptxas compilation disabled with cuda-12+.
Differential Revision: https://reviews.llvm.org/D152199
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.
Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
This simplifies the usage of `__less` by making the `operator()`, rather than the class, depend on the types compared. We can't remove the template completely because we explicitly instantiate `std::__sort` with `__less<T>`.
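The general idea can be sketched as a comparator whose call operator is a template while the class itself carries no type parameters. The `Less` struct below is a hypothetical, simplified stand-in and not libc++'s `__less`, which, as noted above, still has to remain a template.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical simplified comparator: the class does not name the compared
// types; only operator() is a template.
struct Less {
  template <class T, class U>
  bool operator()(const T &Lhs, const U &Rhs) const {
    return Lhs < Rhs;
  }
};

// The same comparator type works regardless of the element type.
template <class T>
void sortAscending(std::vector<T> &Values) {
  std::sort(Values.begin(), Values.end(), Less{});
}
```

Callers no longer spell out the element type when naming the comparator; `Less{}` can compare `int` with `int` or `int` with `double` alike.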
Reviewed By: ldionne, #libc
Spies: arichardson, EricWF, libcxx-commits, mgrang
Differential Revision: https://reviews.llvm.org/D145285
Parametrize SampleProfileInference and SampleProfileLoaderBaseImpl by function
type (Function/MachineFunction) instead of block type
(BasicBlock/MachineBasicBlock). Move out specializations to appropriate
locations.
This change makes it possible to use GraphTraits instead of a custom TypeMap and
make SampleProfileInference not dependent on LLVM types, paving the way for
generalizing SampleProfileInference interfaces to BOLT IR types
(BinaryFunction/BinaryBasicBlock) in stale profile matching (D144500).
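A toy sketch of the parametrization pattern: a class templated on the function type that obtains its block type through a traits class, so a new IR only needs a traits specialization. All names here (`ToyFunction`, `ToyBlock`, `ToyTraits`, `ToyInference`) are hypothetical and much simpler than `GraphTraits` or the real `SampleProfileInference`.

```cpp
#include <vector>

// Hypothetical IR types standing in for Function/BasicBlock.
struct ToyBlock {
  long Weight;
};
struct ToyFunction {
  std::vector<ToyBlock> Blocks;
};

// Hypothetical traits: maps a function type to its block type and accessors.
template <class FT> struct ToyTraits;

template <> struct ToyTraits<ToyFunction> {
  using BlockT = ToyBlock;
  static const std::vector<BlockT> &blocks(const ToyFunction &F) {
    return F.Blocks;
  }
};

// Parametrized by function type; the block type is derived via the traits.
template <class FT> class ToyInference {
  using BlockT = typename ToyTraits<FT>::BlockT;

public:
  long totalWeight(const FT &F) const {
    long Sum = 0;
    for (const BlockT &B : ToyTraits<FT>::blocks(F))
      Sum += B.Weight;
    return Sum;
  }
};
```

Supporting another IR in this sketch would mean adding one more `ToyTraits` specialization, leaving `ToyInference` untouched.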
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D152187