some docs

2024-11-26 19:50:36 +00:00 · 2019-03-29 19:51:44 +05:00 · 2019-03-29 19:51:44 +05:00 · 418a5b503d
commit 418a5b503d
parent e0d8457730
14 changed files with 425 additions and 3 deletions
--- a/Clients/AppveyorClient/readme.md
+++ b/Clients/AppveyorClient/readme.md
@ -0,0 +1,12 @@
+AppVeyor Client
+===============
+
+See the [AppVeyor API documentation](https://www.appveyor.com/docs/api/) for reference, though it has some inaccuracies and legacy quirks. As we only use the API for read-only queries on publicly available data, we don't need any form of authentication. The `User-Agent` header is provided as a courtesy.
+
+The AppVeyor client is used to get direct download links for PR builds. For that we need the job artifacts, and for those we need the job ID, which can only be associated with the corresponding GitHub PR through the build info, which in turn can be found in the build history. Phew, easy.
+
+`FindBuildAsync` is a general history search function that walks through the build history until some criterion is met. At startup we call it with a bogus predicate that never matches, so it fetches and caches the whole history; this maps job IDs to their build info for quick lookups when we want to show a PR download for direct links.
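
The history walk described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the bot's C# code: `fetch_page` is a hypothetical stand-in for one paged history request that returns builds older than the given build ID (newest first), and the `buildId`/`jobs`/`jobId` field names are assumptions about the response shape.

```python
from typing import Callable, Dict, List, Optional

Build = dict
# Hypothetical: fetch_page(None) returns the newest page, fetch_page(id) the page after build `id`.
FetchPage = Callable[[Optional[int]], List[Build]]

def find_build(fetch_page: FetchPage,
               predicate: Callable[[Build], bool],
               job_cache: Dict[str, Build]) -> Optional[Build]:
    """Walk the build history page by page until `predicate` matches.

    Every build seen is recorded in job_cache (job id -> build), so a
    predicate that never matches simply caches the whole history.
    """
    start_id: Optional[int] = None
    while True:
        page = fetch_page(start_id)
        if not page:
            return None  # reached the end of the history
        for build in page:
            for job in build.get("jobs", []):
                job_cache[job["jobId"]] = build
            if predicate(build):
                return build
        start_id = page[-1]["buildId"]  # continue after the oldest build on this page
```

Passing `lambda build: False` reproduces the startup behaviour: nothing matches, so every page is fetched and every job ends up in the cache.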
+
+`GetMasterBuildAsync` is mainly used to get the build time, as the [CompatApi client](../CompatApiClient) currently provides the merge time instead of the build time.
+
+The `GetPrDownloadAsync` overload that accepts `githubStatusTargetUrl` uses `string.Replace()` instead of constructing the URL manually because of legacy quirks. AppVeyor changed the link format at some point, and there's no backwards compatibility, so old direct links only work with the old link format.
--- a/Clients/CompatApiClient/readme.md
+++ b/Clients/CompatApiClient/readme.md
@ -0,0 +1,29 @@
+Compatibility API Client
+========================
+
+There is no documentation, but the [source code is available](https://github.com/AniLeo/rpcs3-compatibility).
+
+This project also contains all of the Web API infrastructure to facilitate the automatic serialization/deserialization of data.
+
+Some terminology:
+* `POCO` - plain old C# object: a barebones class with only fields/properties, used for automatic [de]serialization.
+
+General advice on web client implementation and usage:
+* Do use `HttpClientFactory.Create()` instead of `new HttpClient()`, as every instance reserves an outgoing port number, while the factory keeps a pool.
+* Do reuse the same client instance whenever possible; it's thread-safe, and there's no reason not to keep a single copy of it.
+
+[Compression](Compression/) contains a handler implementation that provides transparent HTTP request compression (`Content-Encoding` header) and implements the standard gzip/deflate encodings.
+
+[Formatters](Formatters/) contains a JSON contract resolver that handles popular naming conventions for [de]serialization (`dashed-style`, `underscore_style`, and `PascalStyle`).
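
The name mapping such a resolver performs can be illustrated with a small Python sketch. It only shows the string transformation between a `PascalStyle` member name and the other two conventions; it is not the resolver's actual code, and the word-splitting rule (keeping acronyms like `PR` together) is an assumption.

```python
import re
from typing import List

# Words are Capitalized/lowercase runs ("Product", "code", "3")
# or all-caps acronyms not followed by a lowercase letter ("PR" in "PRNumber").
_WORD = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")

def split_words(name: str) -> List[str]:
    return [word.lower() for word in _WORD.findall(name)]

def to_dashed(name: str) -> str:
    return "-".join(split_words(name))

def to_underscore(name: str) -> str:
    return "_".join(split_words(name))

def to_pascal(name: str) -> str:
    # Accepts both dashed-style and underscore_style input.
    return "".join(part.capitalize() for part in re.split(r"[-_]", name) if part)
```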
+
+[Utils](Utils/) has some handy `Uri` extension methods for easy query parameter manipulation.
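
For comparison, the kind of query parameter manipulation these helpers provide looks like this with Python's standard `urllib.parse`. The function name `set_query_parameter` is hypothetical and does not mirror the actual extension method names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def set_query_parameter(url: str, name: str, value: str) -> str:
    """Return `url` with query parameter `name` set to `value` (added or replaced)."""
    parts = urlsplit(url)
    # Keep every other parameter, drop any existing value for `name`.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    params.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Note that a replaced parameter moves to the end of the query string, which is harmless for servers that don't care about parameter order.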
+
+Game Compatibility Status
+-------------------------
+
+Looks up game status by `product code`, `game title` (English, or romaji for Japanese titles), or `game title abbreviation`. We use this for most embeds, including log parsing results, standalone game information embeds, compatibility lists, etc.
+
+RPCS3 Update Information
+------------------------
+
+Accepts current build commit hash as an argument. Provides information about the build requested, as well as information about the latest build available.
--- a/Clients/GithubClient/readme.md
+++ b/Clients/GithubClient/readme.md
@ -0,0 +1,8 @@
+GitHub Client
+=============
+
+See the [GitHub API documentation](https://developer.github.com/v3/) for reference. Anonymous API calls require the `User-Agent` header; everything else is optional. Anonymous access is limited to 60 requests per hour, matched by client IP.
+
+We only use the GitHub API to get PR information and, optionally, links to CI builds. CI status information is unreliable though, as it's often outdated and its history is often inconsistent, so we prefer to find matching builds manually instead.
+
+As anonymous access is very limited, we try to cache every response. In the same vein, we try to limit GitHub API usage in general.
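
A minimal sketch of the response caching idea, in Python, assuming a simple time-to-live policy (the actual expiration policy isn't described here). `fetch` stands in for a real GitHub API call, and the injectable clock exists only to make the sketch testable.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")

class TtlCache(Generic[V]):
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[Hashable, Tuple[float, V]] = {}  # key -> (fetched at, value)

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], V]) -> V:
        now = self._clock()
        hit = self._items.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # still fresh: no API request spent
        value = fetch()            # one request against the hourly quota
        self._items[key] = (now, value)
        return value
```

In a long-running bot you would also want to bound the number of entries, in line with the project's general advice to limit memory usage.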
--- a/Clients/IrdLibraryClient/readme.md
+++ b/Clients/IrdLibraryClient/readme.md
@ -0,0 +1,6 @@
+IRD Library Client
+==================
+
+There's no official API or any documentation, so everything is reverse-engineered from the web UI. It's all rather straightforward.
+
+We use this to search and download IRD files for various purposes. One note though: we cache IRD files on-disk to limit the download requests.
--- a/Clients/PsnClient/readme.md
+++ b/Clients/PsnClient/readme.md
@ -0,0 +1,55 @@
+PSN Client
+==========
+
+For obvious reasons, there's no official documentation on any Sony APIs. Everything was reverse-engineered using web store UI or from various wikis/forums.
+
+PSN Store API
+-------------
+
+You can access the web store at https://store.playstation.com/. It works on top of the same store API, which is convenient.
+
+General workflow is as follows:
+1. Get session (even for anonymous access)
+2. Get storefront information
+3. Call various controllers to search or to get additional item information by its ID
+
+Some terminology:
+* [product code](http://www.psdevwiki.com/ps3/Productcode) is the game ID, of the form `NPEB12345`.
+* [content id](http://www.psdevwiki.com/ps3/Content_ID) is the unique PSN content ID that can be used to resolve its metadata. There's no direct way to map a `product code` to its associated `content id`s.
+* `container id` is the PSN content aggregation ID that is used to organize the content (e.g. a store navigation category like a menu entry, or a sale event). A container can include other containers.
+* `entitlement` is the content license granted to the account. You get this by purchasing the content, downloading free content, or by redeeming a PSN code.
+
+At startup we run a task that enumerates all available PSN storefronts, and then recursively scrapes every container on the respective front page to collect the `content id`s of all PS3 content that is still available.
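
The container scraping can be sketched as a breadth-first walk with a visited set, which avoids both deep recursion and infinite loops when containers link to each other. `resolve` is a hypothetical callback standing in for the store API call that lists a container's child containers and content IDs; it is an assumption, not the real endpoint.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical: resolve(container_id) -> (child container ids, content ids)
Resolver = Callable[[str], Tuple[List[str], List[str]]]

def collect_content_ids(root_containers: Iterable[str], resolve: Resolver) -> Set[str]:
    seen: Set[str] = set()
    found: Set[str] = set()
    queue = deque(root_containers)
    while queue:
        container = queue.popleft()
        if container in seen:
            continue  # the same container can be linked from several places
        seen.add(container)
        children, content_ids = resolve(container)
        found.update(content_ids)
        queue.extend(child for child in children if child not in seen)
    return found
```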
+
+Many de-listed or replaced titles are no longer available through anonymous API calls (they require an authenticated session with the respective `entitlement`s granted to the account).
+
+There are rare cases where resolving metadata by `content id` still works, but there are no links for it anywhere on the store. You can still find such content using the search API.
+
+Game Update API
+---------------
+
+This is a [separate API](https://www.psdevwiki.com/ps3/Online_Connections#Game_Updating_Procedure) that can give title update information by `product code`.
+
+One quirk of this endpoint is that Sony uses non-public root CAs for TLS certificates, only redistributing their public keys in the PS3 firmware updates.
+
+In dotnet core there's no easy way to implement custom certificate pinning / chain validation.
+
+There are two possible ways:
+1. Importing root CA certificates to the Trusted Root CAs certificate store, so the default validation can work as expected.
+
+   This, however, only works on Windows, _and_ will show a confirmation prompt for every certificate being imported.
+   
+   On Linux it's much worse: it requires black magic to work and is inconsistent between different distros (google `SSL_CERT_DIR` and `SSL_CERT_FILE`). The main problem is that this _overrides_ the system/user certificate store to contain _only_ the specified certificates, so any request to any other resource will fail.
+
+2. Manual certificate chain validation. As one might expect, this is not trivial or easy to implement.
+
+   What we do right now is check the certificate Issuer on every request, and if it matches the custom Sony CA, we do manual chain validation and then cache the result for this specific server certificate. We explicitly ignore any revocation checks (as the CAs are not public), and also ignore any errors due to an untrusted root (as, again, the CAs are not public).
+   
+   Otherwise we simply forward the validation call to the default handler that is using proper system/user certificate store.
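
The decision logic of the second approach can be sketched in Python as follows. The actual chain validation is abstracted into a hypothetical `validate_custom_chain` callback (that is where revocation and untrusted-root errors would be ignored), `default_validate` stands in for the platform's normal validation, and certificates are represented by their raw DER bytes so they can serve as cache keys.

```python
from typing import Callable, Dict, Set

Validator = Callable[[bytes], bool]

def make_validator(custom_issuers: Set[str],
                   validate_custom_chain: Validator,
                   default_validate: Validator) -> Callable[[bytes, str], bool]:
    # Results for server certificates issued by the custom CAs, keyed by DER bytes.
    cache: Dict[bytes, bool] = {}

    def validate(cert_der: bytes, issuer: str) -> bool:
        if issuer not in custom_issuers:
            # Not a custom CA: use the regular system/user certificate store.
            return default_validate(cert_der)
        if cert_der not in cache:
            # Expensive manual chain check, done once per server certificate.
            cache[cert_der] = validate_custom_chain(cert_der)
        return cache[cert_der]

    return validate
```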
+
+Title Metadata API
+------------------
+
+[TMDB API](https://www.psdevwiki.com/ps3/Keys#TMDB_Key) is used by the PS3 dashboard / shell / UI to show some game information using `product code`. We mainly use this to get the game thumbnail for embeds, falling back to PSN metadata when it's not available.
+
+The main quirk here is how the URL is constructed using a specific HMAC key and ID format.
--- a/Clients/readme.md
+++ b/Clients/readme.md
@ -0,0 +1,18 @@
+Clients
+=======
+
+Here we keep all the 3rd party service clients used by the bot. Most infrastructure is in the CompatApi client, and other clients reference it to use these classes, along with the configured `Log`.
+
+* CompatApi is the [custom API](https://github.com/AniLeo/rpcs3-compatibility) provided by the [RPCS3 website](https://rpcs3.net/). It provides information about game compatibility and RPCS3 updates.
+
+* [IRD Library](http://jonnysp.bplaced.net/) contains the largest public repository of [IRD files](http://www.psdevwiki.com/ps3/Bluray_disc#IRD_file). It has no official API, so everything is reverse-engineered from the website's web UI.
+
+  > The client implements automatic caching of downloaded IRD files on the local filesystem for future use.
+
+* PSN Client is the result of reverse-engineering the JSON API of the [PlayStation Store](https://store.playstation.com/). Currently it implements resolving content metadata by its ID, as well as full-text search.
+
+* GitHub Client implements a barebones [set of requests](https://developer.github.com/v3/) to resolve pull request information, along with some additional data about the CI states.
+
+  > We do not use any form of authentication, and are limited by the regular rate of 60 API requests per hour.
+
+* AppVeyor Client implements most of the [read-only calls](https://www.appveyor.com/docs/api/) to read the build history, job status, and artifact information.
--- a/CompatBot/EventHandlers/readme.md
+++ b/CompatBot/EventHandlers/readme.md
@ -0,0 +1,162 @@
+Event Handlers
+==============
+
+Some points to keep in mind:
+* Handlers for the same event run one after another in the order they have been subscribed.
+* Any handler can mark an event as handled, which cuts the chain, so subsequent handlers won't be invoked.
+* Even though events are asynchronous, they still run on the same thread pool and the same task queue, which means they should return fast. Any prolonged task will stall the queue and will effectively cause denial of service.
+  > If you need to run something heavy on an event, queue explicit background processing and return.
+* _Every_ event triggers the handler, including the actions made by the bot itself. Remember to do proper checks for commands and bot's own actions to prevent loops / unintended spam messages / etc.
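
The first two points can be illustrated with a small asyncio sketch. `MessageEvent` and its `handled` flag are hypothetical stand-ins for the library's event arguments; the point is only the subscription order and the early exit.

```python
import asyncio
from typing import Awaitable, Callable, List

class MessageEvent:
    def __init__(self, text: str):
        self.text = text
        self.handled = False

Handler = Callable[[MessageEvent], Awaitable[None]]

async def dispatch(handlers: List[Handler], event: MessageEvent) -> None:
    for handler in handlers:      # run in the order they were subscribed
        await handler(event)
        if event.handled:         # a handler cut the chain
            break
```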
+
+Antipiracy Monitor
+------------------
+
+Should be first in the event queue. Checks every message for a possible piracy trigger, and removes it to prevent Discord ToS violations / legal issues.
+
+For specifics of the filter, see [PiracyStringProvider](../Database/Providers/) implementation.
+
+AppVeyor Links
+--------------
+
+This handler checks messages for AppVeyor links and provides information about the associated PR build when available.
+
+Bot Reactions
+-------------
+
+This is a fun, silly feature that makes the bot react to user messages. The idea is to add a random matching `reaction`, or to send a matching random `message` when reactions are not permitted.
+
+Triggers are a simple substring lookup, with no regard for word boundaries. Usually there's nothing wrong with reacting to seemingly random messages.
+
+To reduce spam, some triggers explicitly check that the message was addressed to the bot. There's also a mechanism to verbally mute the bot for a while, which is also used by some other event handlers / commands. It is usually used when the bot does something wrong or is being annoying in a conversation.
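
A rough Python sketch of substring triggers combined with a temporary mute. The trigger table, the 10-minute default mute duration, and the `Reactor` class are all hypothetical; timestamps are passed in explicitly to keep the sketch deterministic.

```python
import random
from typing import Dict, List, Optional

class Reactor:
    def __init__(self, triggers: Dict[str, List[str]], mute_seconds: float = 600.0):
        self.triggers = triggers          # lowercase trigger -> candidate replies
        self.mute_seconds = mute_seconds
        self.muted_until = 0.0

    def shut_up(self, now: float) -> None:
        """Verbal mute: ignore all triggers for a while."""
        self.muted_until = now + self.mute_seconds

    def react(self, message: str, now: float) -> Optional[str]:
        if now < self.muted_until:
            return None
        text = message.lower()
        for trigger, replies in self.triggers.items():
            if trigger in text:           # plain substring match, no word boundaries
                return random.choice(replies)
        return None
```

Note how `"not a good bottle"` still triggers `"good bot"`: that is the price of ignoring word boundaries, which the text above considers acceptable.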
+
+Discord Invite Filter
+---------------------
+
+Monitors invite links to other servers, and checks them against the white list. As invite codes are random, we try to resolve the `guild` and match its id.
+
+We also try to resolve discord.me links, as it's a popular 3rd party service, but it doesn't provide any API and is often locked out by CloudFlare, so most of the time we simply block all of its links.
+
+For any link that is not white listed, we remove the message and verbally warn the user to ask mods first.
+
+There is a fun feature where we detect attempts at filter circumvention by posting the invite code alone, which is normally indistinguishable from regular text. It is done by caching the recent invite codes and doing explicit substring matches, which works OK in practice. We also issue a `warn` record for such an attempt, as it's very impolite to ignore a friendly request to follow the rules.
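
The bare-code detection can be sketched as a small bounded memory of recently seen codes. The invite URL pattern, the capacity of 100, and the class name are assumptions for illustration; the white-list lookup (resolving the code to a guild) is left out.

```python
import re
from collections import deque
from typing import Deque, Optional

INVITE_LINK = re.compile(r"(?:discord\.gg|discord(?:app)?\.com/invite)/([A-Za-z0-9-]+)", re.IGNORECASE)

class InviteCodeMemory:
    """Remembers recently seen invite codes to catch bare-code reposts."""

    def __init__(self, capacity: int = 100):
        self.recent: Deque[str] = deque(maxlen=capacity)  # oldest codes fall off automatically

    def check(self, message: str) -> Optional[str]:
        """Return the offending code if the message has an invite link or a remembered bare code."""
        match = INVITE_LINK.search(message)
        if match:
            code = match.group(1)
            if code not in self.recent:
                self.recent.append(code)
            return code
        for code in self.recent:
            if code in message:           # explicit substring match against recent codes
                return code
        return None
```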
+
+GitHub Links
+------------
+
+Similar to AppVeyor links monitor, we're looking for possible GitHub issues or PR mentions, and link them for convenience.
+
+As we currently do not authorize GitHub API client, to reduce its use, we simply construct the link and rely on default embed generator instead of doing something custom.
+
+Every PR has a hidden issue associated with it, which automatically redirects to the appropriate PR page, so we always link to issues.
+
+In practice, even with 60 requests per hour, we can do custom embeds, and only fall back to URL generation if we hit some kind of threshold on available calls.
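
A sketch of the link construction, in Python. The `#1234` mention syntax, the 3 to 5 digit range, and the limit of 5 links are assumptions; the repository URL is RPCS3's public GitHub repository. Linking to `/issues/N` relies on the issue-to-PR redirect described above, so PR numbers work too.

```python
import re
from typing import List

REPO_URL = "https://github.com/RPCS3/rpcs3"
# "#1234"-style mentions that aren't part of a URL or a word (assumed syntax).
MENTION = re.compile(r"(?<![\w/])#(\d{3,5})\b")

def issue_links(message: str, limit: int = 5) -> List[str]:
    links: List[str] = []
    for number in MENTION.findall(message):
        # /issues/N redirects to /pull/N when N is a pull request.
        link = f"{REPO_URL}/issues/{number}"
        if link not in links:
            links.append(link)
            if len(links) >= limit:
                break
    return links
```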
+
+Greeter
+-------
+
+A simple handler that sends a DM to every new `member`. For greater flexibility, the message is formed from the `motd` [explain](../Commands).
+
+Is the Game Playable
+--------------------
+
+This is part fun, part hopeless attempt to do natural language processing with regular expressions.
+
+The idea here is to answer the common "Is the game X playable yet?" questions, which mostly come from new users who don't know about the bot or the compatibility list.
+
+The challenge here, of course, is that every user is different, and many do not speak proper English to begin with. We can't extract the intent without any form of AI training, so instead we have a ginormous regex to match most common forms of questions.
+
+To reduce false positives, we only do game lookups in two channels (main and help), only for users without any role that would imply basic bot knowledge, and we only show the result if the fuzzy match has a high confidence score.
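
The gating described above could look roughly like this. `difflib.SequenceMatcher` is swapped in here as a stand-in for whatever fuzzy matcher the bot actually uses, and the 0.8 threshold, channel names, and role names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Set

def best_title_match(query: str, titles: Iterable[str], threshold: float = 0.8) -> Optional[str]:
    """Return the closest title, or None if even the best match isn't confident enough."""
    best, best_score = None, 0.0
    for title in titles:
        score = SequenceMatcher(None, query.lower(), title.lower()).ratio()
        if score > best_score:
            best, best_score = title, score
    return best if best_score >= threshold else None

def should_answer(channel: str, user_roles: Set[str],
                  allowed_channels: Set[str], experienced_roles: Set[str]) -> bool:
    """Only answer in the allowed channels, and only users with no 'knows the bot' role."""
    return channel in allowed_channels and not (user_roles & experienced_roles)
```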
+
+Log as Text
+-----------
+
+This simple monitor looks for logs copy-pasted from the UI, and prompts the user to upload the full log file instead.
+
+Log Parsing
+-----------
+
+The meat and potatoes of this bot. Looks for potential RPCS3 logs, and queues a background [log analysis](LogParsing/) job.
+
+The log analysis queue is limited by using a [Semaphore](https://docs.microsoft.com/en-us/dotnet/standard/threading/semaphore-and-semaphoreslim) and making an `async void` function call. One must be very careful with it, but you get the desired effect on the cheap.
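
Python's asyncio has the same building blocks, so the pattern can be sketched as follows: a semaphore bounds how many analyses run at once, and `enqueue` starts the work without awaiting it, which is the closest analog of the `async void` call. Since nobody awaits the result, the task body must catch and log every exception itself. The class name and the default of 2 parallel jobs are assumptions.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Set

log = logging.getLogger("log-parsing")
Analyzer = Callable[[str], Awaitable[None]]

class LogAnalysisQueue:
    def __init__(self, max_parallel: int = 2):
        self._slots = asyncio.Semaphore(max_parallel)
        self._tasks: Set[asyncio.Task] = set()

    def enqueue(self, analyze: Analyzer, log_name: str) -> None:
        """Start the analysis without awaiting it (fire and forget)."""
        task = asyncio.create_task(self._run(analyze, log_name))
        self._tasks.add(task)                       # keep a reference until it finishes
        task.add_done_callback(self._tasks.discard)

    async def _run(self, analyze: Analyzer, log_name: str) -> None:
        async with self._slots:                     # at most max_parallel at once
            try:
                await analyze(log_name)
            except Exception:
                # Nobody awaits this task, so failures must be handled here.
                log.exception("Log analysis failed for %s", log_name)
```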
+
+New Builds Monitor
+------------------
+
+This one has a background task implementation that is continuously checking for new RPCS3 builds through [Compatibility API](../Clients/CompatApiClient/).
+
+The main idea here is to do the check once in a while, mostly for the time when something goes wrong and we can't detect the new build trigger (Yappy / Discord are down).
+
+The event handler comes in handy to check for new successful build announcements in the [#github](https://discordapp.com/channels/272035812277878785/272363592077017098) channel. If we see such a message, there's a good chance we'll get new update information _soon_, so we _speed up_ the update check from once every however-many minutes/hours to once every few seconds/minutes.
+
+Once we detect a new build, or if nothing happened after a while, we reset the check interval to default.
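
The interval switching can be sketched as a tiny state machine. All durations below are hypothetical placeholders, not the bot's configured values, and time is passed in explicitly.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PollSchedule:
    normal: float = 600.0      # hypothetical default interval, seconds
    fast: float = 5.0          # hypothetical interval right after an announcement
    boost_for: float = 1800.0  # give up on the fast mode after this long
    boosted_at: Optional[float] = None

    def on_build_announcement(self, now: float) -> None:
        self.boosted_at = now

    def on_new_build_detected(self) -> None:
        self.boosted_at = None

    def interval(self, now: float) -> float:
        if self.boosted_at is not None and now - self.boosted_at < self.boost_for:
            return self.fast
        self.boosted_at = None  # new build found or nothing happened: back to default
        return self.normal
```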
+
+Post Log Help
+-------------
+
+This is another fun, but less complex attempt at natural language processing. The idea is to send instructions on how to upload full RPCS3 log in the [#help](https://discordapp.com/channels/272035812277878785/277227681836302338) channel.
+
+For greater flexibility, it tries to use the `log` [explain](../Commands/).
+
+Product Code Lookup
+-------------------
+
+This handler monitors for the `product code` mentions, and posts game compatibility embeds for them.
+
+We limit it to 5 unique codes in public channels, and to a greater number in DMs to avoid Discord API throttling and general possibility of bot spam.
+
+We also do `shut up` checks to reduce spam (see [Bot Reactions](#bot-reactions) for more info).
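
A sketch of the extraction and the per-context limit. The product code pattern (four letters plus five digits, optionally written with a dash) follows the `NPEB12345` format described in the PSN client notes; the DM limit of 50 is a hypothetical value, as the text only says it's higher.

```python
import re
from typing import List

# Four letters followed by five digits, e.g. "BLUS30443"; an optional dash is tolerated.
PRODUCT_CODE = re.compile(r"\b([A-Z]{4})-?(\d{5})\b", re.IGNORECASE)

def extract_product_codes(message: str, is_dm: bool,
                          public_limit: int = 5, dm_limit: int = 50) -> List[str]:
    limit = dm_limit if is_dm else public_limit
    codes: List[str] = []
    for letters, digits in PRODUCT_CODE.findall(message):
        code = (letters + digits).upper()
        if code not in codes:             # only unique codes count toward the limit
            codes.append(code)
            if len(codes) >= limit:
                break
    return codes
```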
+
+Starbucks
+---------
+
+This handler is a [#media](https://discordapp.com/channels/272035812277878785/272875751773306881) moderation handler that allows users to help with enforcing the no-chatting rule.
+
+People can react with the ☕ emoji, and if a certain threshold is met, a notice is added to the moderation queue.
+
+To prevent abuse, only users with certain roles are counted for this.
+
+Table Flip Monitor
+------------------
+
+This is a pure fun handler that does nothing useful. It looks for the table flip [kaomoji](http://japaneseemoticons.me/) and sends a message with the matching reversed one.
+
+For more fun, it is using pattern matching instead of hard coded samples, so in theory, it can find and generate appropriate response for any variation.
+
+Thumbnail Cache Monitor
+-----------------------
+
+This is a management handler that clears the re-uploaded [thumbnail](../Database/Providers/) URL for embeds when someone deletes said image from the appropriate channel.
+
+Unknown Command
+---------------
+
+This handler is used when the user issues an unknown command. This happens _a lot_. Most users _do not understand_ how to use the bot.
+
+So we try to guess what the intention was. We handle the two most common cases: [explain](../Commands) and [compat](../Commands/) lookups.
+
+To reduce spam and false positives, we only redirect the calls to the appropriate commands if we have high confidence score for fuzzy matched results.
+
+If everything else fails, we show the help message that explains how to properly use the bot. And to reduce the spam further, we DM the instructions most of the time, to keep public channels clean.
+
+There's also a fun part where you can mention the bot and ask a question (denoted by the question mark at the end of the message), in which case we redirect the query to the [8ball](../Commands/) command.
+
+Username Spoof Monitor
+----------------------
+
+This was created after an incident where a user tried to impersonate a developer and asked other users for money in DMs.
+
+To prevent such events in the future, we monitor every `username` and `nickname` change, and also check every new `member`.
+
+For better results, we also employ [homoglyph](../../HomoglyphConverter/) disambiguation.
+
+Username Zalgo Monitor
+----------------------
+
+In the same vein, this monitor checks every _display_ user name for [zalgo](https://knowyourmeme.com/memes/zalgo) and other Unicode abuses where the text can creep up on adjacent lines above or below.
+
+As there's no sure way to check if some symbol will be drawn above or below the base line, and how it reacts with stacking, we do a simple check:
+1. [Normalize](https://docs.microsoft.com/en-us/dotnet/api/system.string.normalize?view=netcore-2.1) the string to get rid of regular diacritics.
+2. Iterate through the symbols and check their Unicode category (in a special way, because UTF-16 can't handle higher planes without [surrogate pairs](https://en.wikipedia.org/wiki/UTF-16#U+010000_to_U+10FFFF)).
+3. Ignore visually invisible symbols, count the number of combining characters.
+4. If there are more than 2 visually consecutive combining characters, it's a good indication of Unicode abuse.
+5. On top of that, we check with a list of known normal characters that are often rendered above or below the base line.
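
The steps above can be approximated in Python. Two assumptions: NFC is used as the normalization form (it folds a base letter plus a regular diacritic into one precomposed character), and "invisible" is approximated by the `Cf`/`Cc` general categories. Python strings are already sequences of code points, so no surrogate pair handling is needed; step 5's list of known characters is omitted.

```python
import unicodedata

def looks_like_zalgo(name: str, max_consecutive: int = 2) -> bool:
    # Step 1: fold regular diacritics into precomposed characters.
    text = unicodedata.normalize("NFC", name)
    run = 0
    for ch in text:                          # Step 2: iterate code points
        category = unicodedata.category(ch)
        if category in ("Cf", "Cc"):         # Step 3: skip invisible format/control chars
            continue
        if category in ("Mn", "Me"):         # combining (non-spacing / enclosing) mark
            run += 1
            if run > max_consecutive:        # Step 4: too many stacked marks
                return True
        else:
            run = 0
    return False
```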
--- a/CompatBot/readme.md
+++ b/CompatBot/readme.md
@ -0,0 +1,50 @@
+RPCS3 Compatibility Bot
+=======================
+
+Configuration
+-------------
+
+Currently all configurable tunables are stored in the [Config](Config.cs) static class that is initialized once on startup. This is mostly done like this because it's very easy to implement and is enough for current needs.
+
+Some settings are grouped in additional static classes for easier use (e.g. `Reactions` and `Colors`).
+
+Everything is initialized with default values that correspond to the main bot instance (except for sensitive tokens), which must be overridden through configuration for any test instance.
+
+Currently, configuration is possible through the `$ dotnet user-secrets` command. Configuration through environment variables was disabled as it had some unintended consequences between bot restarts (values were preserved, and updating the configuration required a complete manual shutdown and restart).
+
+> Be careful with input validation during configuration, as unhandled exceptions in static constructor will lead to `TypeInitializationException` and program termination that might be tricky to debug.
+
+In addition to the configuration variables, `Config` contains the `Log` instance and the global `CancellationTokenSource`.
+
+We're using [NLog](https://nlog-project.org/) for logging, configured to mimic the default Log4Net layout (mostly because I already have a [syntax highlighter](https://github.com/13xforever/kontur-logs) for it in Sublime Text 3). It is also configured to ignore the `TaskCanceledException`s that occur when `CancellationToken`s are cancelled, to reduce spam in the logs.
+
+> Do log exceptions as an argument to log methods, instead of calling `.ToString()` on them
+
+Global `CancellationTokenSource` (`Config.Cts`) is used to signal the program shutdown.
+
+> Do check it and abort whenever possible to reduce the restart time and prevent infinite code execution.
+
+
+
+Program entry point
+-------------------
+
+On startup we check for other instances. We wait a bit for them to shut down, or shut down ourselves otherwise. This is done by spinning up a separate thread and using a global [mutex](https://docs.microsoft.com/en-us/dotnet/api/system.threading.mutex?view=netcore-2.1). Unlike semaphores, a mutex **must** be released on the same thread it was acquired on, which is impossible to guarantee in asynchronous code, hence the dedicated thread.
+
+> Note that `Config` initialization will happen on first call to the class.
+
+Next we open the databases and run [migrations](https://docs.microsoft.com/en-us/ef/core/managing-schemas/migrations/) to upgrade their structure if needed. Currently we have two of them:
+* BotDb is used to store all the settings and custom data for the bot.
+* ThumbsDb is used to store PSN metadata and game thumbnail links.
+
+When the databases are ready, we immediately restore bot [runtime statistics](Database/Providers/).
+
+Next we start all the background tasks that will run periodically while the bot is up and running. This includes [thumbnail scraping](ThumbScraper/), AppVeyor build history scraper, AMD Driver version updater, etc.
+
+Next we configure the discord client. This includes registering all the [commands](Commands/) and [event handlers](EventHandlers/). We use the built-in help formatter and hook up the built-in discord client logging to our own logger.
+
+Of particular note is the `GuildAvailable` event, where we check and make sure the bot is running in the configured guild. There was one time when the bot wasn't configured properly and someone quietly added it to their own private server, which caused a crash on startup. That was fun.
+
+We also run backlog checks where it makes sense for moderation, in case something slipped past while the bot was unavailable.
+
+On restart we try to gracefully wait a bit to let any outstanding tasks complete, but not too long. This is why it is important to check `Config.Cts` cancellation status whenever possible.
--- a/HomoglyphConverter/confusables.txt
+++ b/HomoglyphConverter/confusables.txt
@ -1,11 +1,11 @@
 # confusables.txt
-# Date: 2018-05-25, 00:12:52 GMT
+# Date: 2018-11-05, 07:39:47 GMT
 # © 2018 Unicode®, Inc.
 # Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
 # For terms of use, see http://www.unicode.org/terms_of_use.html
 #
 # Unicode Security Mechanisms for UTS #39
-# Version: 11.0.0
+# Version: 12.0.0
 #
 # For documentation and usage, see http://www.unicode.org/reports/tr39
 #
--- a/HomoglyphConverter/confusables.txt.gz
+++ b/HomoglyphConverter/confusables.txt.gz
--- a/HomoglyphConverter/readme.md
+++ b/HomoglyphConverter/readme.md
@ -0,0 +1,13 @@
+Homoglyph Converter
+===================
+
+This is a straight up implementation of the recommended [confusable detection algorithm](http://www.unicode.org/reports/tr39/#Confusable_Detection). It is mainly used to check for mod impersonation.
+
+You can get the latest version of the mappings from [Unicode.org](http://www.unicode.org/Public/security/latest/confusables.txt). You'll need to gzip it manually for embedding in the resources.
+
+Code is split in two parts:
+* Builder will load the mapping file from the resources and will build the mapping dictionary that can be used to quickly substitute the character sequences.
+
+  > One gotcha is that a lot of the characters are from the extended planes and require use of [surrogate pairs](https://en.wikipedia.org/wiki/UTF-16#U+010000_to_U+10FFFF), so we convert them to UTF32 and store as `uint`.
+
+* Normalizer implements the mapping and reducing steps of the algorithm
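
For reference, UTS #39 defines the skeleton of a string roughly as: convert to NFD, map every character through the confusables table, then convert to NFD again; two strings are confusable when their skeletons are equal. A Python sketch of that definition, together with a parser for the `source ; target ; type # comment` lines of `confusables.txt`, might look like this. It follows the spec rather than the project's C# code.

```python
import unicodedata
from typing import Dict

def parse_confusables(text: str) -> Dict[str, str]:
    """Parse lines like '0430 ;\t0061 ;\tMA\t# ...' into a source -> target mapping."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip().lstrip("\ufeff")  # drop comments and BOM
        if not line:
            continue
        source, target = [field.strip() for field in line.split(";")[:2]]
        mapping[chr(int(source, 16))] = "".join(chr(int(cp, 16)) for cp in target.split())
    return mapping

def skeleton(text: str, mapping: Dict[str, str]) -> str:
    """UTS #39 skeleton: NFD, map each character through the table, NFD again."""
    nfd = unicodedata.normalize("NFD", text)
    mapped = "".join(mapping.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)
```

Checking for mod impersonation then boils down to comparing the skeleton of a new name with the skeletons of protected names.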
--- a/README.md
+++ b/README.md
@ -3,6 +3,8 @@ RPCS3 Compatibility Bot

 This is a tech support / moderation / crowd entertainment bot for the [RPCS3 discord server](https://discord.me/rpcs3) [![RPCS3 discord server](https://discordapp.com/api/guilds/272035812277878785/widget.png)](https://discord.me/rpcs3)

+You can read the design and implementation notes by visiting the folders in the web interface, or from the [architecture overview notes](architecture.md).
+
 Development Requirements
 ------------------------
 * [.NET Core 2.1 SDK](https://www.microsoft.com/net/download/windows) or newer
@ -19,7 +21,7 @@ Runtime Requirements
 * Optionally Google API credentials to access Google Drive:
  * Create new project in the [Google Cloud Resource Manager](https://console.developers.google.com/cloud-resource-manager)
  * Select the project and enable [Google Drive API](https://console.developers.google.com/apis/library/drive.googleapis.com)
-  * Open [API & Services Credendials](https://console.developers.google.com/apis/credentials)
+  * Open [API & Services Credentials](https://console.developers.google.com/apis/credentials)
  * Create new credentials:
    * **Service account** credentials
    * New service account
@ -53,3 +55,4 @@ How to Run in Production
 External resources that need manual updates
 -------------------------------------------
 * [Unicode confusables](http://www.unicode.org/Public/security/latest/confusables.txt) gzipped, for Homoglyph checks
+* Optionally [Redump disc key database](http://redump.org/downloads/) in text format (requires membership)
--- a/Tests/readme.md
+++ b/Tests/readme.md
@ -0,0 +1,13 @@
+Tests
+=====
+
+I am using [NUnit](https://github.com/nunit/docs/wiki/NUnit-Documentation), mostly because it's the test framework I'm most familiar with. There aren't a lot of tests for the code itself; the project is mostly used for trying things out before implementation.
+
+You can use the regular `$ dotnet test` command to run the tests without any additional tools.
+
+If you want to contribute new test code, I have a couple of preferences:
+* Do use `Assert.That(expr, Is/Does/etc)` format instead of deprecated `Assert.AreEqual()` and similar.
+
+* Try to write the code in the way that does not require the use of `InternalsVisibleTo` attribute.
+
+* Tests that require external data that must be supplied manually should be disabled by default.
--- a/architecture.md
+++ b/architecture.md
@ -0,0 +1,53 @@
+Project file structure
+======================
+
+* [CompatBot](CompatBot/) contains the main bot logic, including all the commands and event handlers.
+* [Clients](Clients/) contains implementation of various 3rd party service clients with their respective object models.
+* [HomoglyphConverter](HomoglyphConverter/) is a library that implements Unicode text canonicalization and [homoglyph](https://en.wikipedia.org/wiki/Homoglyph) text comparison.
+* [Tests](Tests/) contains miscellaneous tests and is useful to try out things in general.
+
+High-level code structure overview
+==================================
+
+This version of the bot targets [dotnet core](https://docs.microsoft.com/en-us/dotnet/core/) 2.1+, using the [DSharp+](https://dsharpplus.github.io/api/index.html) 4.0 discord client library. For settings and state persistence we use the SQLite database engine accessed through [Entity Framework Core](https://docs.microsoft.com/en-us/ef/core/index).
+
+Historically speaking, this is a version 2.0 of the bot. From [the beginning](https://github.com/RPCS3/discord-bot/tree/python) it was built using python and [discord-py](https://discordpy.readthedocs.io/en/rewrite/api.html), which left some legacy traces after complete rewrite to C#, particularly for database format compatibility and command invocation syntax.
+
+On startup we check and run [database migrations](CompatBot/Database/Migrations/) to get to the expected table structure. Forward migration must always be lossless.
+
+Next we register all the [commands](CompatBot/Commands/) and [event handlers](CompatBot/EventHandlers/), and configure the client for the specific Discord server (test servers must override most settings).
+
+Command dispatching and scheduling are done by DSharp+ automatically. Event handlers run one by one and can terminate the call chain when necessary (e.g. if piracy is detected, we simply delete the message and abort every other check).
+
+In case of network problems the client will attempt to reconnect automatically, but after several failed retries it gives up, in which case our global error handler restarts the instance automatically.
+
+General considerations
+======================
+
+* Please familiarize yourself with the [official Discord documentation](https://discordapp.com/developers/docs/reference). You'll see a lot of terms defined there (like guilds vs servers, users vs members, etc).
+
+* Always keep in mind that users _will_ find and exploit everything they can, including, but not limited to: spamming through bot responses, abusing response wording with user input, degrading performance through specially crafted messages or data (speed, memory, task queue depth, etc.), provoking denial of service in the same vein, etc.
+  
+  >  Never trust user input. Validate and sanitize everything. Always limit access to the management commands.
+
+* This is not a big project; resources are limited and shared with other services.
+
+  > Use streaming processing whenever possible. Limit memory usage, don't keep caches in memory just because you can. If you write to the disk, try to remove the trash automatically when you're done. Limit queues for background tasks. Do run tasks asynchronously whenever possible.
+
+* Bot gets special permissions on a case by case basis, don't assume it will have the required permissions at all times, in every context.
+
+  > Check permissions when possible. Catch exceptions always, log them when it makes sense. Have contingency plans always. Ideally everything should be controllable at runtime, without updates.
+
+* Functionality and helpfulness trump fun and memes.
+
+  > This is a help / moderation bot first and foremost. Fun stuff is secondary, keep it out of the way.
+
+* Please do go to the trouble of making or joining a test server, and check basic functionality before making a pull request and deploying to the main instance.
+
+  > There's also an [Azure Pipelines](https://github.com/marketplace/azure-pipelines) config in this repo that you can set up on your fork to run basic CI checks.
+
+* Do use `dotnet user-secrets` to configure the bot, and user-accessible [app data folders](https://docs.microsoft.com/en-us/dotnet/api/system.environment.specialfolder?view=netcore-2.1) to store any persistent data.
+
+* Everything runs asynchronously. Please familiarize yourself with the [basics](https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/task-based-asynchronous-programming) and the [pitfalls](https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism) of asynchronous programming.
+
+  > Rule of thumb: use `.ConfigureAwait(false)` everywhere, and don't use the `async` modifier for synchronous code (return `Task.FromResult()` or `Task.CompletedTask` instead). Avoid `async void` like the plague unless you know what you're doing, and if you must use it, always, _always_ handle exceptions.