Merge pull request #817 from tobiasjakobi/exynos

Exynos video driver
Twinaphex 2014-07-17 18:32:50 +02:00
commit a645bf74a2
12 changed files with 1710 additions and 1016 deletions


@ -211,15 +211,16 @@ ifeq ($(HAVE_SDL), 1)
LIBS += $(SDL_LIBS)
endif
ifeq ($(HAVE_LIMA), 1)
OBJ += gfx/lima_gfx.o
LIBS += -llimare
endif
ifeq ($(HAVE_OMAP), 1)
OBJ += gfx/omap_gfx.o
endif
ifeq ($(HAVE_EXYNOS), 1)
OBJ += gfx/exynos_gfx.o memcpy-neon.o
LIBS += $(DRM_LIBS) $(EXYNOS_LIBS)
DEFINES += $(DRM_CFLAGS) $(EXYNOS_CFLAGS)
endif
ifeq ($(HAVE_OPENGL), 1)
OBJ += gfx/gl.o \
gfx/gfx_context.o \

60
README-exynos.md Normal file

@ -0,0 +1,60 @@
# RetroArch Exynos-G2D video driver
The Exynos-G2D video driver for RetroArch uses the Exynos DRM layer for presentation and the Exynos G2D block to scale and blit the emulator framebuffer to the screen. The G2D subsystem is a separate functional block on modern Samsung Exynos SoCs (in particular Exynos4412 and Exynos5250) that accelerates various kinds of 2D blit operations. It can fill, copy, scale and blend pixel buffers and therefore provides adequate functionality for RetroArch purposes.
## Reasons to use the driver
Hardware accelerated rendering on devices based on an Exynos SoC is usually restricted to the use of the GPU block, which is either a Mali or PowerVR IP. Both GPU types have the problem that interfacing with them requires a proprietary driver stack, consisting of kernel and userspace code. While the kernel code is open source, the userspace code is only available as a binary blob to the end user.
If you want to use such a device with an upstream kernel, the GPU block will most likely not work for you. Also, the chances of Mali or PowerVR kernel code being accepted upstream are very slim. Still, one might ask whether using the GPU block for such trivial operations (basically scale and blend) is the right approach in the first place.
Since the G2D block is present on all modern Exynos SoCs, the natural way of proceeding would be to use it instead of the GPU block. The G2D is still a dedicated piece of hardware, so all operations are offloaded from the CPU. Note, though, that using the G2D instead of the GPU removes the possibility of using GPU shaders to enhance the image quality of your emulator core of choice. Users who rely on these enhancements are advised to continue using the GPU, most likely through the EGL/GLES video driver.
The author uses a Hardkernel ODROID-X2, which is a developer board powered by an Exynos4412 SoC. The vendor supplied kernel, a Linux tree based on the 3.8.y branch, currently offers no way to use the G2D because of issues related to clock setup. However, upstreaming work is in progress and a tree based on 3.15.y, with some slight modifications, is available from here:
[odroid-3.15.y repository](https://github.com/tobiasjakobi/linux-odroid)
Please refer to the minimalistic documentation in README-ODROID for setup.
## Performance analysis
Some simple benchmarking was done to evaluate the performance of the G2D block. The test run was done with the snes9x-next emulation core and a game title that uses a native resolution of 256x224 pixels. The output screen was configured to a 1280x720 mode. Scaling to the output screen was done by keeping the native aspect ratio. In this case this would result in an output rectangle of size 822x720.
    total memcpy calls: 18795
    total g2d calls: 18795
    total memcpy time: 8.978532 seconds
    total g2d time: 29.703944 seconds
    average time per memcpy call: 477.708540 microseconds
    average time per g2d call: 1580.417345 microseconds
The average time to display the emulator framebuffer on screen is roughly 2058 microseconds, or around 486 frames per second. Assuming that the time consumption increases linearly with the number of pixels processed, which is usually a safe assumption, scaling to an output rectangle of size 1920x1080 would yield an average duration of 7207 microseconds, which is still 138 frames per second.
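The figures above can be reproduced with a few lines of arithmetic. The following C sketch is not part of the driver; the helper names (`fit_rect`, `avg_us`, `scale_by_pixels`, `fps_from_us`) are made up for illustration. It assumes that the output rectangle is rounded down to whole pixels and, as in the text, that the time per frame scales linearly with the pixel count.

```c
#include <assert.h>

struct rect_size {
    unsigned w, h;
};

/* Fit a src_w x src_h frame into a dst_w x dst_h screen, keeping the
 * source aspect ratio. Results are rounded down to whole pixels. */
static struct rect_size fit_rect(unsigned src_w, unsigned src_h,
                                 unsigned dst_w, unsigned dst_h)
{
    struct rect_size out;

    assert(src_w > 0 && src_h > 0);

    /* Compare src_w/src_h against dst_w/dst_h without dividing. */
    if (src_w * dst_h > dst_w * src_h) {
        /* Frame is wider than the screen: width is the limit. */
        out.w = dst_w;
        out.h = dst_w * src_h / src_w;
    } else {
        /* Frame is narrower (or equal): height is the limit. */
        out.w = dst_h * src_w / src_h;
        out.h = dst_h;
    }
    return out;
}

/* Average time per call in microseconds. */
static double avg_us(double total_seconds, unsigned calls)
{
    assert(calls > 0);
    return total_seconds * 1e6 / (double)calls;
}

/* Linear extrapolation of a per-frame time by output pixel count. */
static double scale_by_pixels(double frame_us,
                              unsigned from_w, unsigned from_h,
                              unsigned to_w, unsigned to_h)
{
    return frame_us * ((double)to_w * (double)to_h)
                    / ((double)from_w * (double)from_h);
}

/* Frames per second for a given per-frame time in microseconds. */
static double fps_from_us(double frame_us)
{
    return 1e6 / frame_us;
}
```

With the numbers above, `fit_rect(256, 224, 1280, 720)` gives 822x720, and the two averages sum to roughly 2058 microseconds. Scaling from 822x720 to 1920x1080 multiplies the pixel count by about 3.5, which lands at roughly 7.2 milliseconds per frame.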
## Configuration
The video driver uses the libdrm API to interface with the DRM. Some patches are still missing in the upstream tree, therefore the user is advised to use the 'exynos' branch of the repository mentioned below.
[libdrm repository](https://github.com/tobiasjakobi/libdrm)
Make sure that the Exynos API support is enabled. If you're building libdrm from source, then use
    ./configure --enable-exynos-experimental-api
to enable it.
The video driver name is 'exynos'. It honors the following video settings:
- video\_monitor\_index
- video\_fullscreen\_x and video\_fullscreen\_y
The monitor index maps to the DRM connector index. If it is zero, then it just selects the first 'sane' connector, which means that it is connected to a display device and it provides at least one usable mode. If the value is non-zero, it forces the selection of this connector. For example, on the author's ODROID-X2, with an odroid-3.15.y kernel, the HDMI connector has index 1.
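The selection rule can be sketched in plain C. This is an illustration only, not the code from gfx/exynos_gfx.c: `struct connector_info` and `select_connector` are hypothetical stand-ins for the libdrm connector data, and using a non-zero index directly as the array position is an assumption (with that reading, connector 0 can only be picked automatically).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a DRM connector. */
struct connector_info {
    bool connected;     /* a display device is attached */
    unsigned num_modes; /* number of usable modes it reports */
};

/* Returns the index of the selected connector, or -1 if none fits. */
static int select_connector(const struct connector_info *conns,
                            size_t count, unsigned monitor_index)
{
    size_t i;

    assert(conns != NULL || count == 0);

    /* Non-zero index: force this connector, if it exists at all. */
    if (monitor_index != 0)
        return (monitor_index < count) ? (int)monitor_index : -1;

    /* Zero: take the first 'sane' connector. */
    for (i = 0; i < count; ++i) {
        if (conns[i].connected && conns[i].num_modes > 0)
            return (int)i;
    }
    return -1;
}
```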
The two fullscreen parameters select the mode the DRM should select. If zero, the native connector mode is selected. If non-zero, the DRM tries to select the wanted mode. This might fail if the mode is not available from the connector.
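A minimal sketch of this mode lookup, again with hypothetical names (`struct mode_info`, `select_mode`) rather than the real libdrm types. It assumes that the native mode is the first entry in the connector's mode list and treats either dimension being zero as a request for the native mode; neither detail is confirmed by the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified display mode. */
struct mode_info {
    unsigned width, height;
};

/* Returns the index of the chosen mode, or -1 if the wanted mode
 * is not offered by the connector. */
static int select_mode(const struct mode_info *modes, size_t count,
                       unsigned want_w, unsigned want_h)
{
    size_t i;

    assert(modes != NULL || count == 0);

    if (count == 0)
        return -1;

    /* Zero means: use the native mode (assumed to be listed first). */
    if (want_w == 0 || want_h == 0)
        return 0;

    for (i = 0; i < count; ++i) {
        if (modes[i].width == want_w && modes[i].height == want_h)
            return (int)i;
    }
    return -1; /* requested mode not available */
}
```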
## Issues and TODOs
The driver still suffers from some issues.
- The aspect ratio computation can be improved. In particular the user supplied aspect ratio is currently unused.
- Font rendering and blitting are very inefficient since the backing buffer is cleared every frame. Introduce an invalidation rectangle which covers the region where font glyphs are drawn, and then only clear this region.
- Temporary GEM buffers are used as source for blitting operations. Support for the IOMMU has to be enabled, so that one can use the 'userptr' functionality.
- More TODOs are pointed out in the code itself.
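The invalidation rectangle mentioned in the font TODO could look roughly like the following. This is a sketch under assumptions, not code from the driver: `struct dirty_rect`, `dirty_add` and `dirty_clear` are hypothetical names, the buffer is assumed to hold 16-bit pixels, and the caller is expected to keep every glyph rectangle inside the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bounding box of everything drawn since the last clear. */
struct dirty_rect {
    bool valid;
    unsigned x0, y0; /* inclusive top-left corner */
    unsigned x1, y1; /* exclusive bottom-right corner */
};

/* Grow the bounding box so that it covers one more glyph. */
static void dirty_add(struct dirty_rect *d, unsigned x, unsigned y,
                      unsigned w, unsigned h)
{
    if (w == 0 || h == 0)
        return;

    if (!d->valid) {
        d->x0 = x;
        d->y0 = y;
        d->x1 = x + w;
        d->y1 = y + h;
        d->valid = true;
        return;
    }
    if (x < d->x0) d->x0 = x;
    if (y < d->y0) d->y0 = y;
    if (x + w > d->x1) d->x1 = x + w;
    if (y + h > d->y1) d->y1 = y + h;
}

/* Clear only the dirty region; pitch is given in pixels per row. */
static void dirty_clear(struct dirty_rect *d, uint16_t *buf, unsigned pitch)
{
    unsigned y;

    assert(buf != NULL);

    if (!d->valid)
        return;

    for (y = d->y0; y < d->y1; ++y)
        memset(buf + (size_t)y * pitch + d->x0, 0,
               (d->x1 - d->x0) * sizeof(*buf));

    d->valid = false;
}
```

Each glyph drawn would call `dirty_add`, and the next frame would call `dirty_clear` in place of clearing the whole backing buffer.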


@ -1,42 +0,0 @@
# RetroArch Lima video driver
The Lima video driver for RetroArch uses the open-source Lima driver, which implements the userspace code to enable the Mali GPU contained in many ARM SoCs. At the time of writing (24/01/2014) the Lima driver supports GPUs of the type Mali-200 and Mali-400. The full driver stack to enable the Mali GPU consists of a part in kernelspace, which is available as open-source from ARM itself, and the aforementioned userspace part, which ARM only supplies as a binary blob.
## Reasons to use the driver
The original binary blob provides hardware-accelerated GLES 2.0 rendering through EGL. Depending on which blob one uses, rendering is either done to a framebuffer provided by a fbdev device or a framebuffer provided by an X11 window. Neither of these choices is particularly good, and neither is very performant.
The author uses a Hardkernel ODROID-X2, which is a developer board powered by an Exynos4412 SoC. This SoC incorporates a Mali-400 GPU and dedicated blocks for 2D acceleration and HDMI interfacing. The functionality of the non-Mali graphics blocks of the SoC is exported through a DRM driver (Exynos DRM).
The DRM exposes a fbdev device through an emulation layer. The layer introduces overhead and also doesn't provide any decent support for proper vertical synchronisation, ruining the experience with lots of tearing artifacts. Switching back and forth between fbdev and X11 solved neither the vsync nor the performance issue.
Users with similar experiences on Exynos4-based hardware (coupled with a Mali GPU supported by the Lima driver) are invited to try this driver.
## Configuration
The original Lima driver suffers from a similar problem as the blob, since it can only render into a framebuffer provided by a fbdev device. In the repository mentioned below you can find a modified Lima version, which can utilize the Exynos DRM directly.
[lima-drm repository](https://github.com/tobiasjakobi/lima-drm)
The Lima video driver for RetroArch only works with this version. Proceed with the usual steps to install limare from the repository onto your system. Make sure that you have a recent version of [libdrm](http://cgit.freedesktop.org/mesa/drm/) installed on your system, and that Exynos API support is enabled in libdrm. If you're compiling libdrm from source, then use
    ./configure --enable-exynos-experimental-api
to enable the Exynos API. After finishing the limare build, compile RetroArch against the resulting limare library (*liblimare.so*). I usually just skip the make install step and manually place the library and header (which is *limare.h*) in *$HOME/local/lib/* and *$HOME/local/include/* respectively (this requires adjustments for *LD_LIBRARY_PATH* and *CFLAGS*).
The video driver name is 'lima'. It honors the following video settings:
- video\_monitor\_index
- video\_fullscreen\_x and video\_fullscreen\_y
The monitor index maps to the DRM connector index. If it's zero, then it just selects the first "sane" connector, which means that it is connected to a display device and it provides at least one usable mode. If the value is non-zero, it forces the selection of this connector. For example, on the ODROID-X2 the HDMI connector has index 2.
The two fullscreen parameters select the mode the DRM should select. If zero, the native connector mode is selected. If non-zero, the DRM tries to select the wanted mode. This might fail if the mode is not available from the connector.
## Issues and TODOs
The driver still suffers from some issues.
- The aspect ratio is wrong. The dimensions of the emulator framebuffer on the screen are not computed correctly at the moment.
- Limare should be able to handle a custom pitch, when uploading texture pixel data. This would save some memcpy for emulator cores which don't provide the framebuffer with full pitch (snes9x-next for example).
- Font rendering is kinda inefficient, since the whole font texture is invalidated each frame. It would be better to introduce something like an invalidated rectangle, which tracks the region which needs to be updated.


@ -45,6 +45,7 @@ enum
VIDEO_VG,
VIDEO_NULL,
VIDEO_OMAP,
VIDEO_EXYNOS,
AUDIO_RSOUND,
AUDIO_OSS,


@ -134,11 +134,11 @@ static const video_driver_t *video_drivers[] = {
#ifdef HAVE_NULLVIDEO
&video_null,
#endif
#ifdef HAVE_LIMA
&video_lima,
#endif
#ifdef HAVE_OMAP
&video_omap,
#endif
#ifdef HAVE_EXYNOS
&video_exynos,
#endif
NULL,
};


@ -624,8 +624,8 @@ extern const video_driver_t video_xdk_d3d;
extern const video_driver_t video_sdl;
extern const video_driver_t video_vg;
extern const video_driver_t video_null;
extern const video_driver_t video_lima;
extern const video_driver_t video_omap;
extern const video_driver_t video_exynos;
extern const input_driver_t input_android;
extern const input_driver_t input_sdl;
extern const input_driver_t input_dinput;

1491
gfx/exynos_gfx.c Normal file

File diff suppressed because it is too large


@ -1,958 +0,0 @@
/* RetroArch - A frontend for libretro.
* Copyright (C) 2013-2014 - Tobias Jakobi
* Copyright (C) 2013-2014 - Daniel Mehrwald
*
* RetroArch is free software: you can redistribute it and/or modify it under the terms
* of the GNU General Public License as published by the Free Software Found-
* ation, either version 3 of the License, or (at your option) any later version.
*
* RetroArch is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
* without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
* PURPOSE. See the GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along with RetroArch.
* If not, see <http://www.gnu.org/licenses/>.
*/
#include <stdlib.h>
#include <string.h>
#include <math.h> /* fabsf() in apply_aspect() */
#include <limare.h>
#include <GLES2/gl2.h>
#include "../general.h"
#include "gfx_common.h"
#include "fonts/fonts.h"
/* Rename to LIMA_GFX_DEBUG to enable debugging code. */
#define NO_LIMA_GFX_DEBUG 1
/* Current limare only natively supports a limited amount of formats for texture *
* data. We compensate for this limitation by swizzling the texture data in the *
* pixel shader. */
#define LIMA_TEXEL_FORMAT_BGR_565 0x0e
#define LIMA_TEXEL_FORMAT_RGBA_5551 0x0f
#define LIMA_TEXEL_FORMAT_RGBA_4444 0x10
#define LIMA_TEXEL_FORMAT_RGBA_8888 0x16
/* Limare is currently unable to deallocate individual texture objects and *
* only allows to destroy all objects at once. *
* We only create a maximum of 12 objects, before doing a full "reset", or *
* sooner, under the condition that limare's texture memory runs out. */
static const unsigned num_max_textures = 12;
typedef struct limare_state limare_state_t;
typedef struct limare_texture {
unsigned width;
unsigned height;
int handle;
unsigned format;
bool menu;
} limare_texture_t;
typedef struct vec2f {
float x, y;
} vec2f_t;
typedef struct vec3f {
float x, y, z;
} vec3f_t;
/* Create three shader programs. One is for displaying only the emulator core pixel data. *
* The other two are for displaying the menu, where the pixel data can be provided in *
* two different formats. Current RetroArch only seems to ever use a single format, but *
* this is not set in stone, therefore making two programs necessary. */
typedef struct limare_data {
limare_state_t *state;
int program;
int program_menu_rgba16;
int program_menu_rgba32;
float screen_aspect;
float frame_aspect;
unsigned upload_format;
unsigned upload_bpp; /* bytes per pixel */
vec3f_t *vertices;
vec2f_t *coords;
/* Generic buffer to create contiguous pixel data for limare
* or to use for font blitting. */
void *buffer;
unsigned buffer_size;
limare_texture_t **textures;
unsigned texture_slots;
limare_texture_t *cur_texture;
limare_texture_t *cur_texture_menu;
unsigned font_width;
unsigned font_height;
limare_texture_t *font_texture;
} limare_data_t;
/* Header for simple vertex shader. */
static const char *vshader_src =
"attribute vec4 in_vertex;\n"
"attribute vec2 in_coord;\n"
"\n"
"varying vec2 coord;\n"
"\n"
"void main()\n"
"{\n"
" gl_Position = in_vertex;\n"
" coord = in_coord;\n"
"}\n";
/* Header for simple fragment shader. */
static const char *fshader_header_src =
"precision highp float;\n"
"\n"
"varying vec2 coord;\n"
"\n"
"uniform sampler2D in_texture;\n"
"\n";
/* Main (template) for simple fragment shader. */
static const char *fshader_main_src =
"void main()\n"
"{\n"
" vec3 pixel = texture2D(in_texture, coord)%s;\n"
" gl_FragColor = vec4(pixel, 1.0);\n"
"}\n";
/* Header for menu fragment shader. */
/* Use mediump, which makes uColor into a (single-precision) float[4]. */
static const char *fshader_menu_header_src =
"precision mediump float;\n"
"\n"
"varying vec2 coord;\n"
"uniform vec4 uColor;\n"
"\n"
"uniform sampler2D in_texture;\n"
"\n";
/* Main (template) for menu fragment shader. */
static const char *fshader_menu_main_src =
"void main()\n"
"{\n"
" vec4 pixel = texture2D(in_texture, coord)%s;\n"
" gl_FragColor = pixel * uColor;\n"
"}\n";
static inline void put_pixel_rgba4444(uint16_t *p, unsigned r, unsigned g, unsigned b, unsigned a) {
*p = (a >> 4) | ((b >> 4) << 4) | ((g >> 4) << 8) | ((r >> 4) << 12);
}
static inline unsigned align_common(unsigned i, unsigned j) {
return (i + j - 1) & ~(j - 1);
}
static float get_screen_aspect(limare_state_t *state) {
unsigned w = 0, h = 0;
limare_buffer_size(state, &w, &h);
if (w != 0 && h != 0) {
return (float)w / (float)h;
}
return 0.0f;
}
static void apply_aspect(limare_data_t *pdata, float ratio) {
vec3f_t *vertices = pdata->vertices;
float x, y;
if (fabsf(pdata->screen_aspect - pdata->frame_aspect) < 0.0001f) {
x = 1.0f;
y = 1.0f;
} else {
if (pdata->screen_aspect > pdata->frame_aspect) {
x = pdata->frame_aspect / pdata->screen_aspect;
y = 1.0f;
} else {
x = 1.0f;
y = pdata->screen_aspect / pdata->frame_aspect;
}
}
/* TODO: use ratio parameter */
vertices[0].x = vertices[2].x = -x;
vertices[1].x = vertices[3].x = x;
vertices[0].y = vertices[1].y = -y;
vertices[2].y = vertices[3].y = y;
}
static int destroy_textures(limare_data_t *pdata) {
unsigned i;
int ret;
pdata->cur_texture = NULL;
pdata->cur_texture_menu = NULL;
for (i = 0; i < pdata->texture_slots; ++i) {
free(pdata->textures[i]);
pdata->textures[i] = NULL;
}
ret = limare_texture_cleanup(pdata->state);
pdata->texture_slots = 0;
return ret;
}
static limare_texture_t *get_texture_handle(limare_data_t *pdata,
unsigned width, unsigned height, unsigned format) {
unsigned i;
format = (format == 0) ? pdata->upload_format : format;
for (i = 0; i < pdata->texture_slots; ++i) {
if (pdata->textures[i]->width == width &&
pdata->textures[i]->height == height &&
pdata->textures[i]->format == format) return pdata->textures[i];
}
if (pdata->texture_slots == num_max_textures) {
/* All texture slots are used, do a reset. */
if (destroy_textures(pdata)) {
RARCH_ERR("video_lima: failed to reset texture storage\n");
}
}
return NULL;
}
static limare_texture_t *add_texture(limare_data_t *pdata,
unsigned width, unsigned height,
const void *pixels, unsigned format) {
int texture = -1;
unsigned retries = 2;
const unsigned i = pdata->texture_slots;
format = (format == 0) ? pdata->upload_format : format;
/* limare_texture_upload returns -1 when the upload fails for some reason. */
while (texture == -1 && retries > 0) {
texture = limare_texture_upload(pdata->state, pixels, width, height, format, 0);
if (texture != -1) break;
destroy_textures(pdata);
retries--;
}
if (texture == -1) return NULL;
/* Set magnification to linear and minification to nearest, since we will *
* probably only ever scale the image to larger dimensions. Also set *
* wrap mode for both coords to clamp, which should eliminate some artifacts. */
limare_texture_parameters(pdata->state, texture, GL_LINEAR, GL_NEAREST,
GL_CLAMP_TO_EDGE, GL_CLAMP_TO_EDGE);
pdata->textures[i] = calloc(1, sizeof(limare_texture_t));
pdata->textures[i]->width = width;
pdata->textures[i]->height = height;
pdata->textures[i]->handle = texture;
pdata->textures[i]->format = format;
pdata->texture_slots++;
return pdata->textures[i];
}
static const void *make_contiguous(limare_data_t *pdata,
unsigned width, unsigned height,
const void *pixels, unsigned bpp,
unsigned pitch) {
unsigned i;
unsigned full_pitch;
bpp = (bpp == 0) ? pdata->upload_bpp : bpp;
full_pitch = width * bpp;
if (full_pitch == pitch) return pixels;
RARCH_LOG("video_lima: input buffer not contiguous\n");
/* Enlarge our buffer, if it is currently too small. */
if (pdata->buffer_size < full_pitch * height) {
const unsigned aligned_size = align_common(full_pitch * height, 16);
free(pdata->buffer);
pdata->buffer = NULL;
posix_memalign(&pdata->buffer, 16, aligned_size);
if (pdata->buffer == NULL) {
RARCH_ERR("video_lima: failed to allocate buffer to make pixel data contiguous\n");
return NULL;
}
pdata->buffer_size = aligned_size;
}
for (i = 0; i < height; ++i) {
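/* Cast to byte pointers: arithmetic on void pointers is a GNU extension. */
memcpy((uint8_t*)pdata->buffer + i * full_pitch, (const uint8_t*)pixels + i * pitch, full_pitch);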
}
return pdata->buffer;
}
#ifdef LIMA_GFX_DEBUG
static void print_status(limare_data_t *pdata) {
unsigned i;
RARCH_LOG("video_lima: upload format = 0x%x, upload bpp = %u\n", pdata->upload_format, pdata->upload_bpp);
RARCH_LOG("video_lima: buffer at %p, buffer size = %u\n", pdata->buffer, pdata->buffer_size);
RARCH_LOG("video_lima: used texture slots = %u (from %u)\n", pdata->texture_slots, num_max_textures);
for (i = 0; i < pdata->texture_slots; ++i) {
RARCH_LOG("video_lima: texture slot %u, width = %u, height = %u, handle = %u, format = 0x%x\n",
i, pdata->textures[i]->width, pdata->textures[i]->height,
pdata->textures[i]->handle, pdata->textures[i]->format);
}
}
#endif
static void destroy_data(limare_data_t *pdata) {
free(pdata->vertices);
free(pdata->coords);
}
static int setup_data(limare_data_t *pdata) {
static const unsigned num_verts = 4;
static const unsigned num_coords = 4 * 4;
unsigned i;
static const vec3f_t vertices[4] = {
{-1.0f, -1.0f, 0.0f},
{ 1.0f, -1.0f, 0.0f},
{-1.0f, 1.0f, 0.0f},
{ 1.0f, 1.0f, 0.0f}
};
static const vec2f_t coords[16] = {
{0.0f, 1.0f}, {1.0f, 1.0f}, /* 0 degrees */
{0.0f, 0.0f}, {1.0f, 0.0f},
{0.0f, 0.0f}, {0.0f, 1.0f}, /* 90 degrees */
{1.0f, 0.0f}, {1.0f, 1.0f},
{1.0f, 0.0f}, {0.0f, 0.0f}, /* 180 degrees */
{1.0f, 1.0f}, {0.0f, 1.0f},
{1.0f, 1.0f}, {1.0f, 0.0f}, /* 270 degrees */
{0.0f, 1.0f}, {0.0f, 0.0f}
};
pdata->vertices = calloc(num_verts, sizeof(vec3f_t));
if (pdata->vertices == NULL) goto fail;
pdata->coords = calloc(num_coords, sizeof(vec2f_t));
if (pdata->coords == NULL) goto fail;
for (i = 0; i < num_verts; ++i) {
pdata->vertices[i] = vertices[i];
}
for (i = 0; i < num_coords; ++i) {
pdata->coords[i] = coords[i];
}
return 0;
fail:
return -1;
}
static int create_programs(limare_data_t *pdata) {
char tmpbufm[1024]; /* temp buffer for main function */
char tmpbuf[1024]; /* temp buffer for whole program */
const char* swz = (pdata->upload_bpp == 4) ? ".bgr" : ".rgb";
/* Create shader program for regular operation first. */
pdata->program = limare_program_new(pdata->state);
if (pdata->program < 0) goto fail;
snprintf(tmpbufm, 1024, fshader_main_src, swz);
strncpy(tmpbuf, fshader_header_src, 1024);
strcat(tmpbuf, tmpbufm);
if (vertex_shader_attach(pdata->state, pdata->program, vshader_src)) goto fail;
if (fragment_shader_attach(pdata->state, pdata->program, tmpbuf)) goto fail;
if (limare_link(pdata->state)) goto fail;
/* Create shader program for menu with RGBA4444 pixel data. */
pdata->program_menu_rgba16 = limare_program_new(pdata->state);
if (pdata->program_menu_rgba16 < 0) goto fail;
snprintf(tmpbufm, 1024, fshader_menu_main_src, ".abgr");
strncpy(tmpbuf, fshader_menu_header_src, 1024);
strcat(tmpbuf, tmpbufm);
if (vertex_shader_attach(pdata->state, pdata->program_menu_rgba16, vshader_src)) goto fail;
if (fragment_shader_attach(pdata->state, pdata->program_menu_rgba16, tmpbuf)) goto fail;
if (limare_link(pdata->state)) goto fail;
/* Create shader program for menu with RGBA8888 pixel data. */
pdata->program_menu_rgba32 = limare_program_new(pdata->state);
if (pdata->program_menu_rgba32 < 0) goto fail;
snprintf(tmpbufm, 1024, fshader_menu_main_src, ".abgr");
strncpy(tmpbuf, fshader_menu_header_src, 1024);
strcat(tmpbuf, tmpbufm);
if (vertex_shader_attach(pdata->state, pdata->program_menu_rgba32, vshader_src)) goto fail;
if (fragment_shader_attach(pdata->state, pdata->program_menu_rgba32, tmpbuf)) goto fail;
if (limare_link(pdata->state)) goto fail;
return 0;
fail:
return -1;
}
static void put_glyph_rgba4444(limare_data_t *pdata, const uint8_t *src, uint8_t *f_rgb,
unsigned g_width, unsigned g_height, unsigned g_pitch,
unsigned dst_x, unsigned dst_y) {
unsigned x, y;
uint16_t *dst;
dst = (uint16_t*)pdata->buffer + dst_y * pdata->font_width + dst_x;
for (y = 0; y < g_height; ++y, src += g_pitch, dst += pdata->font_width) {
for (x = 0; x < g_width; ++x) {
const uint8_t blend = src[x];
if (blend != 0) put_pixel_rgba4444(&dst[x], f_rgb[0], f_rgb[1], f_rgb[2], blend);
}
}
}
typedef struct lima_video {
limare_data_t *lima;
void *font;
const font_renderer_driver_t *font_driver;
uint8_t font_rgb[4];
/* current dimensions */
unsigned width;
unsigned height;
/* MENU data */
int menu_rotation;
float menu_alpha;
bool menu_active;
bool menu_rgb32;
bool aspect_changed;
} lima_video_t;
static void lima_gfx_free(void *data) {
lima_video_t *vid = data;
if (!vid) return;
if (vid->lima && vid->lima->state) limare_finish(vid->lima->state);
if (vid->font) vid->font_driver->free(vid->font);
destroy_data(vid->lima);
destroy_textures(vid->lima);
free(vid->lima->textures);
free(vid->lima);
free(vid);
}
static void lima_init_font(lima_video_t *vid, const char *font_path, unsigned font_size) {
if (!g_settings.video.font_enable) return;
if (font_renderer_create_default(&vid->font_driver, &vid->font,
*g_settings.video.font_path ? g_settings.video.font_path : NULL, g_settings.video.font_size)) {
int r = g_settings.video.msg_color_r * 255;
int g = g_settings.video.msg_color_g * 255;
int b = g_settings.video.msg_color_b * 255;
vid->font_rgb[0] = r < 0 ? 0 : (r > 255 ? 255 : r);
vid->font_rgb[1] = g < 0 ? 0 : (g > 255 ? 255 : g);
vid->font_rgb[2] = b < 0 ? 0 : (b > 255 ? 255 : b);
} else {
RARCH_LOG("video_lima: font init failed\n");
}
}
static void lima_render_msg(lima_video_t *vid, const char *msg) {
unsigned req_size;
limare_data_t *lima = vid->lima;
int msg_base_x = g_settings.video.msg_pos_x * lima->font_width;
int msg_base_y = (1.0 - g_settings.video.msg_pos_y) * lima->font_height;
if (vid->font == NULL) return;
/* Font texture uses RGBA4444 pixel data (2 bytes per pixel). */
req_size = lima->font_width * lima->font_height * 2;
if (lima->buffer_size < req_size) {
const unsigned aligned_size = align_common(req_size, 16);
free(lima->buffer);
lima->buffer = NULL;
posix_memalign(&lima->buffer, 16, aligned_size);
if (lima->buffer == NULL) {
RARCH_ERR("video_lima: failed to allocate buffer to render fonts\n");
return;
}
lima->buffer_size = aligned_size;
}
memset(lima->buffer, 0, req_size);
/* FIXME: Untested new font rendering code. */
const struct font_atlas *atlas = vid->font_driver->get_atlas(vid->font);
/* Stop at the terminating NUL; testing the pointer itself would never end. */
for (; *msg; msg++) {
const struct font_glyph *glyph = vid->font_driver->get_glyph(vid->font, (uint8_t)*msg);
if (!glyph)
continue;
int base_x = msg_base_x + glyph->draw_offset_x;
int base_y = msg_base_y + glyph->draw_offset_y;
const int max_width = lima->font_width - base_x;
const int max_height = lima->font_height - base_y;
int glyph_width = glyph->width;
int glyph_height = glyph->height;
const uint8_t *src = atlas->buffer + glyph->atlas_offset_x + glyph->atlas_offset_y * atlas->width;
if (base_x < 0) {
src -= base_x;
glyph_width += base_x;
base_x = 0;
}
if (base_y < 0) {
src -= base_y * (int)atlas->width;
glyph_height += base_y;
base_y = 0;
}
if (max_width <= 0 || max_height <= 0) continue;
if (glyph_width > max_width) glyph_width = max_width;
if (glyph_height > max_height) glyph_height = max_height;
put_glyph_rgba4444(lima, src, vid->font_rgb,
glyph_width, glyph_height,
atlas->width, base_x, base_y);
msg_base_x += glyph->advance_x;
msg_base_y += glyph->advance_y;
}
}
static void *lima_gfx_init(const video_info_t *video, const input_driver_t **input, void **input_data) {
lima_video_t *vid = NULL;
limare_data_t *lima = NULL;
void *lima_input = NULL;
struct limare_windowsys_drm limare_config = { 0 };
vid = calloc(1, sizeof(lima_video_t));
if (!vid) return NULL;
vid->menu_alpha = 1.0f;
lima = calloc(1, sizeof(limare_data_t));
if (!lima) goto fail;
/* Request the Exynos DRM backend for rendering. */
limare_config.type = LIMARE_WINDOWSYS_DRM;
limare_config.connector_index = g_settings.video.monitor_index;
lima->state = limare_init(&limare_config);
if (!lima->state) {
RARCH_ERR("video_lima: limare initialization failed\n");
goto fail;
}
limare_buffer_clear(lima->state);
if (limare_state_setup(lima->state, g_settings.video.fullscreen_x,
g_settings.video.fullscreen_y, 0xff000000)) {
RARCH_ERR("video_lima: limare state setup failed\n");
goto fail_lima;
}
lima->screen_aspect = get_screen_aspect(lima->state);
lima->font_height = 368;
lima->font_width = align_common((unsigned)(lima->screen_aspect * (float)lima->font_height), 16);
lima->upload_format = video->rgb32 ?
LIMA_TEXEL_FORMAT_RGBA_8888 : LIMA_TEXEL_FORMAT_BGR_565;
lima->upload_bpp = video->rgb32 ? 4 : 2;
limare_enable(lima->state, GL_DEPTH_TEST);
limare_depth_func(lima->state, GL_ALWAYS);
limare_depth_mask(lima->state, GL_TRUE);
limare_enable(lima->state, GL_CULL_FACE);
limare_blend_func(lima->state, GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
if (setup_data(lima)) {
RARCH_ERR("video_lima: data setup failed\n");
goto fail_lima;
}
if (create_programs(lima)) {
RARCH_ERR("video_lima: creating shader programs failed\n");
goto fail_lima;
}
lima->textures = calloc(num_max_textures, sizeof(limare_texture_t*));
if (input && input_data) {
*input = NULL;
*input_data = NULL;
}
vid->lima = lima;
lima_init_font(vid, g_settings.video.font_path, g_settings.video.font_size);
return vid;
fail_lima:
limare_finish(lima->state);
fail:
free(lima);
free(vid);
return NULL;
}
static bool lima_gfx_frame(void *data, const void *frame,
unsigned width, unsigned height,
unsigned pitch, const char *msg) {
lima_video_t *vid;
const void *pixels;
limare_data_t *lima;
bool upload_frame = true;
vid = data;
/* Check if neither menu nor emulator framebuffer is to be displayed. */
if (!vid->menu_active && frame == NULL) return true;
lima = vid->lima;
if (frame != NULL) {
/* Handle resolution changes from the emulation core. */
if (width != vid->width || height != vid->height) {
limare_texture_t *tex;
if (width == 0 || height == 0) return true;
RARCH_LOG("video_lima: resolution was changed by core to %ux%u\n", width, height);
tex = get_texture_handle(lima, width, height, 0);
if (tex == NULL) {
pixels = make_contiguous(lima, width, height, frame, 0, pitch);
tex = add_texture(lima, width, height, pixels, 0);
if (tex == NULL) {
RARCH_ERR("video_lima: failed to allocate new texture with dimensions %ux%u\n",
width, height);
return false;
}
upload_frame = false; /* pixel data already got uploaded during texture allocation */
}
lima->cur_texture = tex;
vid->width = width;
vid->height = height;
lima->frame_aspect = (float)width / (float)height;
vid->aspect_changed = true;
}
if (upload_frame) {
pixels = make_contiguous(lima, width, height, frame, 0, pitch);
limare_texture_mipmap_upload(lima->state, lima->cur_texture->handle, 0, pixels);
}
}
if (g_settings.fps_show) {
char buffer[128], buffer_fps[128];
gfx_get_fps(buffer, sizeof(buffer), g_settings.fps_show ? buffer_fps : NULL, sizeof(buffer_fps));
msg_queue_push(g_extern.msg_queue, buffer_fps, 1, 1);
}
if (vid->aspect_changed) {
apply_aspect(lima, g_extern.system.aspect_ratio);
vid->aspect_changed = false;
}
limare_frame_new(lima->state);
if (lima->cur_texture != NULL) {
limare_program_current(lima->state, lima->program);
limare_attribute_pointer(lima->state, "in_vertex", LIMARE_ATTRIB_FLOAT,
3, 0, 4, lima->vertices);
limare_attribute_pointer(lima->state, "in_coord", LIMARE_ATTRIB_FLOAT,
2, 0, 4, lima->coords + vid->menu_rotation * 4);
limare_texture_attach(lima->state, "in_texture", lima->cur_texture->handle);
if (limare_draw_arrays(lima->state, GL_TRIANGLE_STRIP, 0, 4)) return false;
}
/* Handle font rendering. */
if (msg) {
bool upload_font = true;
/* Both font_vertices and font_color are constant, but we can't make them *
* const, since limare_attribute_pointer expects (non-const) void pointers. */
static vec3f_t font_vertices[4] = {
{-1.0f, -1.0f, 0.0f},
{ 1.0f, -1.0f, 0.0f},
{-1.0f, 1.0f, 0.0f},
{ 1.0f, 1.0f, 0.0f}
};
static float font_color[4] = {1.0f, 1.0f, 1.0f, 1.0f};
lima_render_msg(vid, msg);
if (lima->font_texture == NULL) {
lima->font_texture = add_texture(lima, lima->font_width, lima->font_height,
lima->buffer, LIMA_TEXEL_FORMAT_RGBA_4444);
upload_font = false;
}
if (upload_font)
limare_texture_mipmap_upload(lima->state, lima->font_texture->handle, 0, lima->buffer);
/* We re-use the RGBA16 menu program here. */
limare_program_current(lima->state, lima->program_menu_rgba16);
limare_attribute_pointer(lima->state, "in_vertex", LIMARE_ATTRIB_FLOAT,
3, 0, 4, font_vertices);
limare_attribute_pointer(lima->state, "in_coord", LIMARE_ATTRIB_FLOAT,
2, 0, 4, lima->coords + vid->menu_rotation * 4);
limare_texture_attach(lima->state, "in_texture", lima->font_texture->handle);
limare_uniform_attach(lima->state, "uColor", 4, font_color);
limare_enable(lima->state, GL_BLEND);
if (limare_draw_arrays(lima->state, GL_TRIANGLE_STRIP, 0, 4)) return false;
limare_disable(lima->state, GL_BLEND);
}
if (vid->menu_active && lima->cur_texture_menu != NULL) {
float color[4] = {1.0f, 1.0f, 1.0f, vid->menu_alpha};
if (vid->menu_rgb32)
limare_program_current(lima->state, lima->program_menu_rgba32);
else
limare_program_current(lima->state, lima->program_menu_rgba16);
limare_attribute_pointer(lima->state, "in_vertex", LIMARE_ATTRIB_FLOAT,
3, 0, 4, lima->vertices);
limare_attribute_pointer(lima->state, "in_coord", LIMARE_ATTRIB_FLOAT,
2, 0, 4, lima->coords + vid->menu_rotation * 4);
limare_texture_attach(lima->state, "in_texture", lima->cur_texture_menu->handle);
limare_uniform_attach(lima->state, "uColor", 4, color);
limare_enable(lima->state, GL_BLEND);
if (limare_draw_arrays(lima->state, GL_TRIANGLE_STRIP, 0, 4)) return false;
limare_disable(lima->state, GL_BLEND);
}
if (limare_frame_flush(lima->state)) return false;
limare_buffer_swap(lima->state);
g_extern.frame_count++;
#ifdef LIMA_GFX_DEBUG
print_status(lima);
#endif
return true;
}
static void lima_gfx_set_nonblock_state(void *data, bool state) {
(void)data; /* limare doesn't export vsync control yet */
(void)state;
}
static bool lima_gfx_alive(void *data) {
(void)data;
return true; /* always alive */
}
static bool lima_gfx_focus(void *data) {
(void)data;
return true; /* limare doesn't use windowing, so we always have focus */
}
static void lima_gfx_set_rotation(void *data, unsigned rotation) {
lima_video_t *vid = data;
vid->menu_rotation = rotation;
}
static void lima_gfx_viewport_info(void *data, struct rarch_viewport *vp) {
lima_video_t *vid = data;
vp->x = vp->y = 0;
vp->width = vp->full_width = vid->width;
vp->height = vp->full_height = vid->height;
}
static void lima_set_aspect_ratio(void *data, unsigned aspect_ratio_idx) {
lima_video_t *vid = data;
switch (aspect_ratio_idx) {
case ASPECT_RATIO_SQUARE:
gfx_set_square_pixel_viewport(g_extern.system.av_info.geometry.base_width, g_extern.system.av_info.geometry.base_height);
break;
case ASPECT_RATIO_CORE:
gfx_set_core_viewport();
break;
case ASPECT_RATIO_CONFIG:
gfx_set_config_viewport();
break;
default:
break;
}
g_extern.system.aspect_ratio = aspectratio_lut[aspect_ratio_idx].value;
vid->aspect_changed = true;
}
static void lima_apply_state_changes(void *data) {
(void)data;
}
static void lima_set_texture_frame(void *data, const void *frame, bool rgb32,
unsigned width, unsigned height, float alpha) {
lima_video_t *vid = data;
limare_texture_t* tex;
const unsigned format = rgb32 ? LIMA_TEXEL_FORMAT_RGBA_8888 :
LIMA_TEXEL_FORMAT_RGBA_4444;
vid->menu_rgb32 = rgb32;
vid->menu_alpha = alpha;
tex = vid->lima->cur_texture_menu;
/* Current menu doesn't change dimensions, so we should hit this most of the time. */
if (tex != NULL && tex->width == width &&
tex->height == height && tex->format == format) goto upload;
if (tex == NULL) {
tex = get_texture_handle(vid->lima, width, height, format);
if (tex == NULL) {
tex = add_texture(vid->lima, width, height, frame, format);
if (tex != NULL) {
vid->lima->cur_texture_menu = tex;
goto upload;
}
RARCH_ERR("video_lima: failed to allocate new menu texture with dimensions %ux%u\n",
width, height);
}
}
return;
upload:
limare_texture_mipmap_upload(vid->lima->state, tex->handle, 0, frame);
}
static void lima_set_texture_enable(void *data, bool state, bool full_screen) {
lima_video_t *vid = data;
vid->menu_active = state;
}
static void lima_set_osd_msg(void *data, const char *msg, const struct font_params *params) {
(void)data;
/* TODO: what does this do? */
(void)msg;
(void)params;
}
static void lima_show_mouse(void *data, bool state) {
(void)data;
}
static const video_poke_interface_t lima_poke_interface = {
NULL, /* set_filtering */
#ifdef HAVE_FBO
NULL, /* get_current_framebuffer */
NULL, /* get_proc_address */
#endif
lima_set_aspect_ratio,
lima_apply_state_changes,
#ifdef HAVE_MENU
lima_set_texture_frame,
lima_set_texture_enable,
#endif
lima_set_osd_msg,
lima_show_mouse
};
static void lima_gfx_get_poke_interface(void *data, const video_poke_interface_t **iface) {
(void)data;
*iface = &lima_poke_interface;
}
const video_driver_t video_lima = {
lima_gfx_init,
lima_gfx_frame,
lima_gfx_set_nonblock_state,
lima_gfx_alive,
lima_gfx_focus,
NULL, /* set_shader */
lima_gfx_free,
"lima",
lima_gfx_set_rotation,
lima_gfx_viewport_info,
NULL, /* read_viewport */
#ifdef HAVE_OVERLAY
NULL, /* overlay_interface */
#endif
lima_gfx_get_poke_interface
};
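
The comment in `lima_set_texture_frame` above notes that the menu rarely changes dimensions, so the common case is re-uploading into the current texture. As written, that function returns without uploading when the current texture exists but has different dimensions, and also when `get_texture_handle` finds an existing texture. The following is a minimal sketch of a lookup-or-create selection that covers those cases too. It is not the driver's code: `menu_tex_t`, `tex_cache_t`, `MAX_TEXTURES` and `select_menu_texture` are hypothetical names used for illustration, and no limare calls are made.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TEXTURES 4 /* hypothetical capacity, not taken from the driver */

/* Hypothetical stand-in for a texture descriptor. */
typedef struct {
    unsigned width, height, format;
} menu_tex_t;

/* Hypothetical texture cache with a "current menu texture" pointer. */
typedef struct {
    menu_tex_t slots[MAX_TEXTURES];
    size_t count;
    menu_tex_t *current;
} tex_cache_t;

static int tex_matches(const menu_tex_t *t, unsigned w, unsigned h, unsigned fmt)
{
    return t->width == w && t->height == h && t->format == fmt;
}

/* Returns the texture to upload into, or NULL if the cache is full.
 * On failure the current texture is left unchanged. */
static menu_tex_t *select_menu_texture(tex_cache_t *c, unsigned w, unsigned h,
                                       unsigned fmt)
{
    size_t i;

    assert(c != NULL);

    /* Fast path: the current texture already has the right shape. */
    if (c->current != NULL && tex_matches(c->current, w, h, fmt))
        return c->current;

    /* Otherwise reuse a previously allocated texture if one matches. */
    for (i = 0; i < c->count; i++) {
        if (tex_matches(&c->slots[i], w, h, fmt)) {
            c->current = &c->slots[i];
            return c->current;
        }
    }

    /* No match: allocate a new slot, if there is room. */
    if (c->count == MAX_TEXTURES)
        return NULL;

    c->slots[c->count].width = w;
    c->slots[c->count].height = h;
    c->slots[c->count].format = fmt;
    c->current = &c->slots[c->count++];
    return c->current;
}
```

In a driver, every non-NULL return would be followed by the upload, so a frame is never silently dropped because the lookup took a slower path.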

memcpy-neon.S Normal file

@ -0,0 +1,139 @@
/*
* NEON code contributed by Siarhei Siamashka <siarhei.siamashka@nokia.com>.
* Origin: http://sourceware.org/ml/libc-ports/2009-07/msg00003.html
*
* The GNU C Library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License.
*
* Tweaked for Android by Jim Huang <jserv@0xlab.org>
*/
.arm
.fpu neon
.global memcpy_neon
/*
* ENABLE_UNALIGNED_MEM_ACCESSES macro can be defined to permit the use
* of unaligned load/store memory accesses supported since ARMv6. This
* will further improve performance, but can purely theoretically cause
* problems if somebody decides to set SCTLR.A bit in the OS kernel
* (to trap each unaligned memory access) or somehow mess with strongly
* ordered/device memory.
*/
#define ENABLE_UNALIGNED_MEM_ACCESSES 1
#define NEON_MAX_PREFETCH_DISTANCE 320
.align 4
memcpy_neon:
.fnstart
mov ip, r0
cmp r2, #16
blt 4f @ Have less than 16 bytes to copy
@ First ensure 16 byte alignment for the destination buffer
tst r0, #0xF
beq 2f
tst r0, #1
ldrneb r3, [r1], #1
strneb r3, [ip], #1
subne r2, r2, #1
tst ip, #2
#ifdef ENABLE_UNALIGNED_MEM_ACCESSES
ldrneh r3, [r1], #2
strneh r3, [ip], #2
#else
ldrneb r3, [r1], #1
strneb r3, [ip], #1
ldrneb r3, [r1], #1
strneb r3, [ip], #1
#endif
subne r2, r2, #2
tst ip, #4
beq 1f
vld4.8 {d0[0], d1[0], d2[0], d3[0]}, [r1]!
vst4.8 {d0[0], d1[0], d2[0], d3[0]}, [ip, :32]!
sub r2, r2, #4
1:
tst ip, #8
beq 2f
vld1.8 {d0}, [r1]!
vst1.8 {d0}, [ip, :64]!
sub r2, r2, #8
2:
subs r2, r2, #32
blt 3f
mov r3, #32
@ Main copy loop, 32 bytes are processed per iteration.
@ ARM instructions are used for doing fine-grained prefetch,
@ increasing prefetch distance progressively up to
@ NEON_MAX_PREFETCH_DISTANCE at runtime
1:
vld1.8 {d0-d3}, [r1]!
cmp r3, #(NEON_MAX_PREFETCH_DISTANCE - 32)
pld [r1, r3]
addle r3, r3, #32
vst1.8 {d0-d3}, [ip, :128]!
sub r2, r2, #32
cmp r2, r3
bge 1b
cmp r2, #0
blt 3f
1: @ Copy the remaining part of the buffer (already prefetched)
vld1.8 {d0-d3}, [r1]!
subs r2, r2, #32
vst1.8 {d0-d3}, [ip, :128]!
bge 1b
3: @ Copy up to 31 remaining bytes
tst r2, #16
beq 4f
vld1.8 {d0, d1}, [r1]!
vst1.8 {d0, d1}, [ip, :128]!
4:
@ Use ARM instructions exclusively for the final trailing part
@ not fully fitting into full 16 byte aligned block in order
@ to avoid "ARM store after NEON store" hazard. Also NEON
@ pipeline will be (mostly) flushed by the time when the
@ control returns to the caller, making the use of NEON mostly
@ transparent (and avoiding hazards in the caller code)
#ifdef ENABLE_UNALIGNED_MEM_ACCESSES
movs r3, r2, lsl #29
ldrcs r3, [r1], #4
strcs r3, [ip], #4
ldrcs r3, [r1], #4
strcs r3, [ip], #4
ldrmi r3, [r1], #4
strmi r3, [ip], #4
movs r2, r2, lsl #31
ldrcsh r3, [r1], #2
strcsh r3, [ip], #2
ldrmib r3, [r1], #1
strmib r3, [ip], #1
#else
movs r3, r2, lsl #29
bcc 1f
.rept 8
ldrcsb r3, [r1], #1
strcsb r3, [ip], #1
.endr
1:
bpl 1f
.rept 4
ldrmib r3, [r1], #1
strmib r3, [ip], #1
.endr
1:
movs r2, r2, lsl #31
ldrcsb r3, [r1], #1
strcsb r3, [ip], #1
ldrcsb r3, [r1], #1
strcsb r3, [ip], #1
ldrmib r3, [r1], #1
strmib r3, [ip], #1
#endif
bx lr
.fnend
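
The control flow of `memcpy_neon` above can be modeled in portable C to make the phases easier to follow: peel 1, 2, 4 and 8 bytes until the destination is 16-byte aligned, copy 32-byte blocks, copy one optional 16-byte block, then finish the last 0 to 15 bytes by testing the low four bits of the remaining count (the assembly does this with `movs ..., lsl #29` and `lsl #31`, which move bits 3 and 1 into the carry flag and bits 2 and 0 into the sign flag). This is only a sketch of the strategy; `memcpy_model` is a hypothetical name, the `pld` prefetching is not modeled, and no performance claim is implied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable model of the copy phases used by memcpy_neon.
 * Like the assembly routine, it returns the original destination. */
static void *memcpy_model(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t step;

    if (n >= 16) {
        /* Head: copy 1, 2, 4, then 8 bytes as needed so that d ends up
         * 16-byte aligned. At most 15 bytes are consumed here. */
        for (step = 1; step <= 8; step <<= 1) {
            if ((uintptr_t)d & step) {
                memcpy(d, s, step);
                d += step;
                s += step;
                n -= step;
            }
        }

        /* Bulk: 32 bytes per iteration (d0-d3 in the assembly). */
        while (n >= 32) {
            memcpy(d, s, 32);
            d += 32;
            s += 32;
            n -= 32;
        }

        /* One optional 16-byte block (d0, d1 in the assembly). */
        if (n & 16) {
            memcpy(d, s, 16);
            d += 16;
            s += 16;
            n -= 16;
        }
    }

    /* Tail: fewer than 16 bytes remain; bits 3..0 of n select
     * an 8-, 4-, 2- and 1-byte copy respectively. */
    assert(n < 16);
    for (step = 8; step >= 1; step >>= 1) {
        if (n & step) {
            memcpy(d, s, step);
            d += step;
            s += step;
        }
    }

    return dst;
}
```

Comparing the model against `memcpy` for every length and destination offset is a cheap way to check that the phase boundaries are right before touching the assembly.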


@ -85,6 +85,11 @@ if [ "$HAVE_EGL" != "no" ]; then
fi
fi
if [ "$HAVE_EXYNOS" != "no" ]; then
check_pkgconf EXYNOS libdrm_exynos
check_pkgconf DRM libdrm
fi
if [ "$LIBRETRO" ]; then
echo "Explicit libretro used, disabling dynamic libretro loading ..."
HAVE_DYNAMIC='no'
@ -169,11 +174,6 @@ fi
check_pkgconf ZLIB zlib
if [ "$HAVE_LIMA" = "yes" ]; then
check_lib LIMA -llimare limare_init
LIMA_LIBS="-llimare"
fi
if [ "$HAVE_THREADS" != 'no' ]; then
if [ "$HAVE_FFMPEG" != 'no' ]; then
check_pkgconf AVCODEC libavcodec 54
@ -276,6 +276,6 @@ add_define_make OS "$OS"
# Creates config.mk and config.h.
add_define_make GLOBAL_CONFIG_DIR "$GLOBAL_CONFIG_DIR"
VARS="RGUI LAKKA ALSA OSS OSS_BSD OSS_LIB AL RSOUND ROAR JACK COREAUDIO PULSE SDL OPENGL LIMA OMAP GLES GLES3 VG EGL KMS GBM DRM DYLIB GETOPT_LONG THREADS CG LIBXML2 ZLIB DYNAMIC FFMPEG AVCODEC AVFORMAT AVUTIL SWSCALE FREETYPE XKBCOMMON XVIDEO X11 XEXT XF86VM XINERAMA MALI_FBDEV NETPLAY NETWORK_CMD STDIN_CMD COMMAND SOCKET_LEGACY FBO STRL STRCASESTR MMAP PYTHON FFMPEG_ALLOC_CONTEXT3 FFMPEG_AVCODEC_OPEN2 FFMPEG_AVIO_OPEN FFMPEG_AVFORMAT_WRITE_HEADER FFMPEG_AVFORMAT_NEW_STREAM FFMPEG_AVCODEC_ENCODE_AUDIO2 FFMPEG_AVCODEC_ENCODE_VIDEO2 BSV_MOVIE VIDEOCORE NEON FLOATHARD FLOATSOFTFP UDEV V4L2 AV_CHANNEL_LAYOUT"
VARS="RGUI LAKKA ALSA OSS OSS_BSD OSS_LIB AL RSOUND ROAR JACK COREAUDIO PULSE SDL OPENGL OMAP GLES GLES3 VG EGL KMS EXYNOS GBM DRM DYLIB GETOPT_LONG THREADS CG LIBXML2 ZLIB DYNAMIC FFMPEG AVCODEC AVFORMAT AVUTIL SWSCALE FREETYPE XKBCOMMON XVIDEO X11 XEXT XF86VM XINERAMA MALI_FBDEV NETPLAY NETWORK_CMD STDIN_CMD COMMAND SOCKET_LEGACY FBO STRL STRCASESTR MMAP PYTHON FFMPEG_ALLOC_CONTEXT3 FFMPEG_AVCODEC_OPEN2 FFMPEG_AVIO_OPEN FFMPEG_AVFORMAT_WRITE_HEADER FFMPEG_AVFORMAT_NEW_STREAM FFMPEG_AVCODEC_ENCODE_AUDIO2 FFMPEG_AVCODEC_ENCODE_VIDEO2 BSV_MOVIE VIDEOCORE NEON FLOATHARD FLOATSOFTFP UDEV V4L2 AV_CHANNEL_LAYOUT"
create_config_make config.mk $VARS
create_config_header config.h $VARS


@ -16,10 +16,10 @@ HAVE_GLES=no # Use GLESv2 instead of desktop GL
HAVE_MALI_FBDEV=no # Enable Mali fbdev context support
HAVE_GLES3=no # Enable OpenGLES3 support
HAVE_X11=auto # Disable everything X11.
HAVE_LIMA=no # Enable Lima video support
HAVE_OMAP=no # Enable OMAP video support
HAVE_XINERAMA=auto # Disable Xinerama support.
HAVE_KMS=auto # Enable KMS context support
HAVE_EXYNOS=no # Enable Exynos video support
HAVE_EGL=auto # Enable EGL context support
HAVE_VG=auto # Enable OpenVG support
HAVE_CG=auto # Enable Cg shader support


@ -109,6 +109,8 @@ const char *config_get_default_video(void)
return "null";
case VIDEO_OMAP:
return "omap";
case VIDEO_EXYNOS:
return "exynos";
default:
return NULL;
}