Jakob Stoklund Olesen ca561ffcf3 Replace the SubRegSet tablegen class with a less error-prone mechanism.
A Register with subregisters must also provide SubRegIndices for adressing the
subregisters. TableGen automatically inherits indices for sub-subregisters to
minimize typing.

CompositeIndices may be specified for the weirder cases such as the XMM sub_sd
index that returns the same register, and ARM NEON Q registers where both D
subregs have ssub_0 and ssub_1 sub-subregs.

It is now required that all subregisters are named by an index, and a future
patch will also require inherited subregisters to be named. This is necessary to
allow composite subregister indices to be reduced to a single index.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@104704 91177308-0d34-0410-b5e6-96231b3b80d8
2010-05-26 17:27:12 +00:00
..
2010-04-05 04:04:10 +00:00

//===-- README.txt - Notes for Blackfin Target ------------------*- org -*-===//

* Condition codes
** DONE Problem with asymmetric SETCC operations
The instruction

  CC = R0 < 2

is not symmetric - there is no R0 > 2 instruction. On the other hand, IF CC
JUMP can take both CC and !CC as a condition. We cannot pattern-match (brcond
(not cc), target), the DAG optimizer removes that kind of thing.

This is handled by creating a pseudo-register NCC that aliases CC. Register
classes JustCC and NotCC are used to control the inversion of CC.

** DONE CC as an i32 register
The AnyCC register class pretends to hold i32 values. It can only represent the
values 0 and 1, but we can copy to and from the D class. This hack makes it
possible to represent the setcc instruction without having i1 as a legal type.

In most cases, the CC register is set by a "CC = .." or BITTST instruction, and
then used in a conditional branch or move. The code generator thinks it is
moving 32 bits, but the value stays in CC. In other cases, the result of a
comparison is actually used as am i32 number, and CC will be copied to a D
register.

* Stack frames
** TODO Use Push/Pop instructions
We should use the push/pop instructions when saving callee-saved
registers. The are smaller, and we may even use push multiple instructions.

** TODO requiresRegisterScavenging
We need more intelligence in determining when the scavenger is needed. We
should keep track of:
- Spilling D16 registers
- Spilling AnyCC registers

* Assembler
** TODO Implement PrintGlobalVariable
** TODO Remove LOAD32sym
It's a hack combining two instructions by concatenation.

* Inline Assembly

These are the GCC constraints from bfin/constraints.md:

| Code  | Register class                            | LLVM |
|-------+-------------------------------------------+------|
| a     | P                                         | C    |
| d     | D                                         | C    |
| z     | Call clobbered P (P0, P1, P2)             | X    |
| D     | EvenD                                     | X    |
| W     | OddD                                      | X    |
| e     | Accu                                      | C    |
| A     | A0                                        | S    |
| B     | A1                                        | S    |
| b     | I                                         | C    |
| v     | B                                         | C    |
| f     | M                                         | C    |
| c     | Circular I, B, L                          | X    |
| C     | JustCC                                    | S    |
| t     | LoopTop                                   | X    |
| u     | LoopBottom                                | X    |
| k     | LoopCount                                 | X    |
| x     | GR                                        | C    |
| y     | RET*, ASTAT, SEQSTAT, USP                 | X    |
| w     | ALL                                       | C    |
| Z     | The FD-PIC GOT pointer (P3)               | S    |
| Y     | The FD-PIC function pointer register (P1) | S    |
| q0-q7 | R0-R7 individually                        |      |
| qA    | P0                                        |      |
|-------+-------------------------------------------+------|
| Code  | Constant                                  |      |
|-------+-------------------------------------------+------|
| J     | 1<<N, N<32                                |      |
| Ks3   | imm3                                      |      |
| Ku3   | uimm3                                     |      |
| Ks4   | imm4                                      |      |
| Ku4   | uimm4                                     |      |
| Ks5   | imm5                                      |      |
| Ku5   | uimm5                                     |      |
| Ks7   | imm7                                      |      |
| KN7   | -imm7                                     |      |
| Ksh   | imm16                                     |      |
| Kuh   | uimm16                                    |      |
| L     | ~(1<<N)                                   |      |
| M1    | 0xff                                      |      |
| M2    | 0xffff                                    |      |
| P0-P4 | 0-4                                       |      |
| PA    | Macflag, not M                            |      |
| PB    | Macflag, only M                           |      |
| Q     | Symbol                                    |      |

** TODO Support all register classes
* DAG combiner
** Create test case for each Illegal SETCC case
The DAG combiner may someimes produce illegal i16 SETCC instructions.

*** TODO SETCC (ctlz x), 5) == const
*** TODO SETCC (and load, const) == const
*** DONE SETCC (zext x) == const
*** TODO SETCC (sext x) == const

* Instruction selection
** TODO Better imediate constants
Like ARM, build constants as small imm + shift.

** TODO Implement cycle counter
We have CYCLES and CYCLES2 registers, but the readcyclecounter intrinsic wants
to return i64, and the code generator doesn't know how to legalize that.

** TODO Instruction alternatives
Some instructions come in different variants for example:

  D = D + D
  P = P + P

Cross combinations are not allowed:

  P = D + D (bad)

Similarly for the subreg pseudo-instructions:

 D16L = EXTRACT_SUBREG D16, bfin_subreg_lo16
 P16L = EXTRACT_SUBREG P16, bfin_subreg_lo16

We want to take advantage of the alternative instructions. This could be done by
changing the DAG after instruction selection.


** Multipatterns for load/store
We should try to identify multipatterns for load and store instructions. The
available instruction matrix is a bit irregular.

Loads:

| Addr       | D | P | D 16z | D 16s | D16 | D 8z | D 8s |
|------------+---+---+-------+-------+-----+------+------|
| P          | * | * | *     | *     | *   | *    | *    |
| P++        | * | * | *     | *     |     | *    | *    |
| P--        | * | * | *     | *     |     | *    | *    |
| P+uimm5m2  |   |   | *     | *     |     |      |      |
| P+uimm6m4  | * | * |       |       |     |      |      |
| P+imm16    |   |   |       |       |     | *    | *    |
| P+imm17m2  |   |   | *     | *     |     |      |      |
| P+imm18m4  | * | * |       |       |     |      |      |
| P++P       | * |   | *     | *     | *   |      |      |
| FP-uimm7m4 | * | * |       |       |     |      |      |
| I          | * |   |       |       | *   |      |      |
| I++        | * |   |       |       | *   |      |      |
| I--        | * |   |       |       | *   |      |      |
| I++M       | * |   |       |       |     |      |      |

Stores:

| Addr       | D | P | D16H | D16L | D 8 |
|------------+---+---+------+------+-----|
| P          | * | * | *    | *    | *   |
| P++        | * | * |      | *    | *   |
| P--        | * | * |      | *    | *   |
| P+uimm5m2  |   |   |      | *    |     |
| P+uimm6m4  | * | * |      |      |     |
| P+imm16    |   |   |      |      | *   |
| P+imm17m2  |   |   |      | *    |     |
| P+imm18m4  | * | * |      |      |     |
| P++P       | * |   | *    | *    |     |
| FP-uimm7m4 | * | * |      |      |     |
| I          | * |   | *    | *    |     |
| I++        | * |   | *    | *    |     |
| I--        | * |   | *    | *    |     |
| I++M       | * |   |      |      |     |

* Workarounds and features
Blackfin CPUs have bugs. Each model comes in a number of silicon revisions with
different bugs. We learn about the CPU model from the -mcpu switch.

** Interpretation of -mcpu value
- -mcpu=bf527 refers to the latest known BF527 revision
- -mcpu=bf527-0.2 refers to silicon rev. 0.2
- -mcpu=bf527-any refers to all known revisions
- -mcpu=bf527-none disables all workarounds

The -mcpu setting affects the __SILICON_REVISION__ macro and enabled workarounds:

| -mcpu      | __SILICON_REVISION__ | Workarounds        |
|------------+----------------------+--------------------|
| bf527      | Def Latest           | Specific to latest |
| bf527-1.3  | Def 0x0103           | Specific to 1.3    |
| bf527-any  | Def 0xffff           | All bf527-x.y      |
| bf527-none | Undefined            | None               |

These are the known cores and revisions:

| Core        | Silicon            | Processors              |
|-------------+--------------------+-------------------------|
| Edinburgh   | 0.3, 0.4, 0.5, 0.6 | BF531 BF532 BF533       |
| Braemar     | 0.2, 0.3           | BF534 BF536 BF537       |
| Stirling    | 0.3, 0.4, 0.5      | BF538 BF539             |
| Moab        | 0.0, 0.1, 0.2      | BF542 BF544 BF548 BF549 |
| Teton       | 0.3, 0.5           | BF561                   |
| Kookaburra  | 0.0, 0.1, 0.2      | BF523 BF525 BF527       |
| Mockingbird | 0.0, 0.1           | BF522 BF524 BF526       |
| Brodie      | 0.0, 0.1           | BF512 BF514 BF516 BF518 |


** Compiler implemented workarounds
Most workarounds are implemented in header files and source code using the
__ADSPBF527__ macros. A few workarounds require compiler support.

|  Anomaly | Macro                          | GCC Switch       |
|----------+--------------------------------+------------------|
|      Any | __WORKAROUNDS_ENABLED          |                  |
| 05000074 | WA_05000074                    |                  |
| 05000244 | __WORKAROUND_SPECULATIVE_SYNCS | -mcsync-anomaly  |
| 05000245 | __WORKAROUND_SPECULATIVE_LOADS | -mspecld-anomaly |
| 05000257 | WA_05000257                    |                  |
| 05000283 | WA_05000283                    |                  |
| 05000312 | WA_LOAD_LCREGS                 |                  |
| 05000315 | WA_05000315                    |                  |
| 05000371 | __WORKAROUND_RETS              |                  |
| 05000426 | __WORKAROUND_INDIRECT_CALLS    | Not -micplb      |

** GCC feature switches
| Switch                    | Description                            |
|---------------------------+----------------------------------------|
| -msim                     | Use simulator runtime                  |
| -momit-leaf-frame-pointer | Omit frame pointer for leaf functions  |
| -mlow64k                  |                                        |
| -mcsync-anomaly           |                                        |
| -mspecld-anomaly          |                                        |
| -mid-shared-library       |                                        |
| -mleaf-id-shared-library  |                                        |
| -mshared-library-id=      |                                        |
| -msep-data                | Enable separate data segment           |
| -mlong-calls              | Use indirect calls                     |
| -mfast-fp                 |                                        |
| -mfdpic                   |                                        |
| -minline-plt              |                                        |
| -mstack-check-l1          | Do stack checking in L1 scratch memory |
| -mmulticore               | Enable multicore support               |
| -mcorea                   | Build for Core A                       |
| -mcoreb                   | Build for Core B                       |
| -msdram                   | Build for SDRAM                        |
| -micplb                   | Assume ICPLBs are enabled at runtime.  |