* feat(plugin-server): add env SESSION_RECORDING_PROCESSING_ENABLED
This allows us to enable or disable session recording processing in the
plugin server, which is useful so we can be specific about which
workloads run on which deployments (e.g. separate K8s deployments).
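A minimal sketch of how such a boolean env flag could gate the session recording workload, assuming a `stringToBoolean`-style helper and an illustrative consumer stub (not the actual plugin-server API; as later commits note, the env var was subsequently replaced by modes):

```typescript
// Hypothetical sketch only: gating a workload behind a boolean env flag.
// Helper and consumer names are illustrative, not the plugin-server's own.
function stringToBoolean(value: string | undefined): boolean {
    return ['true', '1', 'yes'].includes((value ?? '').trim().toLowerCase())
}

// Illustrative stub; the real consumer lives elsewhere in the plugin-server.
async function startSessionRecordingConsumer(): Promise<void> {
    console.info('session recording consumer started')
}

async function startWorkloads(): Promise<void> {
    if (stringToBoolean(process.env.SESSION_RECORDING_PROCESSING_ENABLED)) {
        // Only deployments with the flag set consume the session recording topic.
        await startSessionRecordingConsumer()
    }
}

void startWorkloads()
```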
* perf: avoid loading piscina if not needed
* fix test
* remove env var, use modes
* wip
* feat(ingestion-slowlane): Add token-bucket utility
* feat(ingestion-slowlane): Re-route overflow events
* fix: Import missing stringToBoolean
* fix(ingestion-slowlane): Flip around kafka topics according to mode
* refactor(ingestion-slowlane): Use dash instead of underscore in filename
* fix(ingestion-slowlane): Do not increase tokens beyond bucket capacity
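As a rough illustration of the token-bucket utility these commits describe, where tokens replenish over time but are never allowed to exceed the bucket capacity, here is a minimal sketch; the class and method names are assumptions, not the actual plugin-server implementation:

```typescript
// Minimal token-bucket sketch (illustrative, not the actual plugin-server utility).
export class TokenBucket {
    private tokens: number
    private lastReplenishedAt: number

    constructor(private capacity: number, private refillRatePerSecond: number) {
        this.tokens = capacity
        this.lastReplenishedAt = Date.now()
    }

    private replenish(now: number = Date.now()): void {
        const elapsedSeconds = (now - this.lastReplenishedAt) / 1000
        // Cap at capacity so idle time cannot build up an unbounded burst allowance.
        this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillRatePerSecond)
        this.lastReplenishedAt = now
    }

    consume(tokens = 1): boolean {
        this.replenish()
        if (this.tokens < tokens) {
            return false // out of tokens: the caller can re-route the event to the overflow topic
        }
        this.tokens -= tokens
        return true
    }
}
```

A consumer could then call `consume()` per distinct event key and re-route the event to the overflow topic whenever it returns `false`.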
* feat(ingestion-slowlane): Add ingestion-overflow mode/capability/consumer
* feat(ingestion-slowlane): Add ingestion warning for capacity overflow
* test(ingestion-slowlane): Add test for ingestion of overflow events
* fix(ingestion-slowlane): Rate limit warnings to 1 per hour
* test(ingestion-slowlane): Add a couple more tests for overflow re-route
* fix(slowlane-ingestion): Look at batch topic to determine message topic
* refactor(slowlane-ingestion): Use refactored consumer model
* fix(slowlane-ingestion): Undo topic requirement in eachMessageIngestion
* refactor(slowlane-ingestion): Only produce events if ingestionOverflow is also enabled
* refactor(slowlane-ingestion): Use an env variable to determine if ingestionOverflow is enabled
* chore(slowlane-ingestion): Add a comment explaining env variable
* refactor(plugin-server): split out plugin server functionality
To get better isolation we want to allow specific functionality to run
in separate pods. We already have the ingestion / async split, but there
are further divides we can make, e.g. the cron-style scheduler for
plugin server `runEveryMinute` tasks.
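As a sketch of how a server mode could translate into capabilities along the lines described above (the actual plugin-server mode names and capability flags may differ):

```typescript
// Illustrative mapping from a plugin-server mode to the workloads it runs;
// mode names and capability flags are assumptions, not the real ones.
type PluginServerMode = 'ingestion' | 'async' | 'scheduler' | null

interface ServerCapabilities {
    ingestion: boolean
    processAsyncHandlers: boolean
    pluginScheduledTasks: boolean
}

function getCapabilities(mode: PluginServerMode): ServerCapabilities {
    switch (mode) {
        case 'ingestion':
            return { ingestion: true, processAsyncHandlers: false, pluginScheduledTasks: false }
        case 'async':
            return { ingestion: false, processAsyncHandlers: true, pluginScheduledTasks: false }
        case 'scheduler':
            return { ingestion: false, processAsyncHandlers: false, pluginScheduledTasks: true }
        default:
            // No mode set: a single process runs everything.
            return { ingestion: true, processAsyncHandlers: true, pluginScheduledTasks: true }
    }
}
```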
* split jobs as well
* Also start Kafka consumers on processAsyncHandlers
* add status for async
* add runEveryMinute test
* avoid fake timers, just accept slower tests
* make e2e concurrent
* chore: also test ingestion/async split
* increase timeouts
* increase timeouts
* lint
* Add functional tests dir
* fix
* fix
* hack
* hack
* fix
* fix
* fix
* wip
* wip
* wip
* wip
* wip
* fix
* remove concurrency
* remove async-worker mode
* add async-handlers
* wip
* add modes to overrideWithEnv validation
* fix: async-handlers -> exports
* update comment
* chore: remove onSnapshot from e2e test
* chore(plugin-server): add option to enable buffer topic for all
This adds an option to specify `'*'` for the teams the buffer topic
should be enabled for, i.e. enabling it for every team.
It also improves the concurrency of the e2e tests.
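A hedged sketch of how a teams setting supporting `'*'` could be interpreted; the helper below is illustrative, assuming a comma-separated list like the `CONVERSION_BUFFER_TOPIC_ENABLED_TEAMS` setting mentioned further down:

```typescript
// Illustrative helper: a comma-separated team allow-list where '*' means "all teams".
function isBufferTopicEnabledForTeam(setting: string, teamId: number): boolean {
    if (setting.trim() === '*') {
        return true // wildcard: enabled for every team
    }
    const enabledTeamIds = new Set(
        setting
            .split(',')
            .map((part) => parseInt(part.trim(), 10))
            .filter((id) => !isNaN(id))
    )
    return enabledTeamIds.has(teamId)
}

// isBufferTopicEnabledForTeam('*', 42)     -> true
// isBufferTopicEnabledForTeam('1,2,3', 42) -> false
```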
* Tests/unify-multi-process-tests
Previously we were running different tests; now we run the same tests.
* remove unexpected changes
* further remove changes
* Remove concurrency in e2e tests; it seems a little flaky
* increase timeout, seems to be slow in CI
* truncate tables before e2e. We're not updating the id sequence properly.
* remove fake timers
* feat(ingestion): remove Graphile worker as initial ingest dependency
At the moment, if the Graphile enqueueing of an anonymous event fails,
e.g. because the database it uses to store scheduling information is
down, we end up pushing the event to the Dead Letter Queue and doing
nothing further with it.
Here, instead of sending the event directly to the DB, we first push it
to Kafka, to an `anonymous_events_buffer` topic, which is then committed
to the Graphile database. This means that if the Graphile DB is down but
later comes back up, we will end up with the same results as if it had
always been up*
(*) not entirely true, as what is ingested also depends on the timing of
other events being ingested
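A kafkajs-style sketch of the buffer flow described above; the real plugin-server code is structured and named differently, so treat this as an outline only:

```typescript
// Illustrative sketch of the anonymous-events buffer flow using kafkajs;
// topic, group, and function names are assumptions.
import { Kafka } from 'kafkajs'

const BUFFER_TOPIC = 'anonymous_events_buffer'
const kafka = new Kafka({ clientId: 'plugin-server', brokers: ['localhost:9092'] })
const producer = kafka.producer()

// Ingestion path: instead of enqueueing the anonymous event directly into the
// Graphile DB, publish it to the buffer topic. Assumes producer.connect() was
// called at startup.
export async function bufferAnonymousEvent(event: Record<string, unknown>): Promise<void> {
    await producer.send({
        topic: BUFFER_TOPIC,
        messages: [{ value: JSON.stringify(event) }],
    })
}

// Consumer path: read buffered events and enqueue them into the Graphile job
// queue. If enqueueing throws (e.g. the Graphile DB is down), the offset is not
// committed and the message is retried once the DB recovers.
export async function startBufferConsumer(
    enqueueToGraphile: (event: Record<string, unknown>) => Promise<void>
): Promise<void> {
    const consumer = kafka.consumer({ groupId: 'anonymous-event-buffer' })
    await consumer.connect()
    await consumer.subscribe({ topics: [BUFFER_TOPIC] })
    await consumer.run({
        eachMessage: async ({ message }) => {
            const event = JSON.parse(message.value!.toString())
            await enqueueToGraphile(event)
        },
    })
}
```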
* narrow typing for anonymous event consumer
* fix types import
* chore: add comment re todos for consumer
* wip
* wip
* wip
* wip
* wip
* wip
* fix typing
* Include error message in warning log
* Update plugin-server/jest.setup.fetch-mock.js
Co-authored-by: Guido Iaquinti <4038041+guidoiaquinti@users.noreply.github.com>
* Update plugin-server/src/main/ingestion-queues/anonymous-event-buffer-consumer.ts
Co-authored-by: Guido Iaquinti <4038041+guidoiaquinti@users.noreply.github.com>
* include warning icon
* fix crash message
* Update plugin-server/src/main/ingestion-queues/anonymous-event-buffer-consumer.ts
* Update plugin-server/src/main/ingestion-queues/anonymous-event-buffer-consumer.ts
Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>
* setup event handlers as KafkaQueue
* chore: instrument buffer consumer
* missing import
* avoid passing hub to buffer consumer
* fix statsd reference.
* pass graphile explicitly
* explicitly cast
* add todo for buffer healthcheck
* set NODE_ENV=production
* Update comment re. failed batches
* fix: call flush on emitting to buffer.
* chore: flush to producer
* accept that we may drop some anonymous events
* Add metrics for enqueue error/enqueued
* fix comment
* chore: add CONVERSION_BUFFER_TOPIC_ENABLED_TEAMS to switch on buffer topic
Co-authored-by: Guido Iaquinti <4038041+guidoiaquinti@users.noreply.github.com>
Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>
* refactor(plugin-server): use JSON logs when not in dev
To improve observability of the plugin-server, for instance making it
easy to view all error logs, we enable JSON logs in production. This
should eventually let us parse for KafkaJS events like GROUP_JOIN, but
initially we can just use it to filter on log level.
In development we will still get plain-text log lines.
* ensure stderr goes through pino-pretty as well
* output log level names not number
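A small sketch of this logging setup, assuming pino with a pino-pretty transport in development and plain JSON with level labels in production; exact options in the plugin-server may differ:

```typescript
// Sketch only: JSON logs with level names in production, pretty logs in dev.
import pino from 'pino'

const isDev = process.env.NODE_ENV !== 'production'

const logger = pino({
    // Emit the level name ("info", "error") rather than its numeric value.
    formatters: {
        level: (label) => ({ level: label }),
    },
    // Pretty, human-readable lines in development; plain JSON in production so
    // logs can be parsed and filtered by level.
    transport: isDev ? { target: 'pino-pretty' } : undefined,
})

logger.info('plugin-server started')
```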
* update versions
* Add PLUGIN_SERVER_MODE
* Make capabilities dependent on PLUGIN_SERVER_MODE
* Subscribe to kafka-events topic
* runAsyncHandlersEventPipeline
* Test fixup: fix typing error
* Test fixup: flush right after queueing message
* Parse clickhouse event correctly
* Different consumer group ids for kafka queue based on mode
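For example, the consumer group id could be derived from the mode so that ingestion and async-handler deployments track offsets independently; the group names below are assumptions, not the real configuration:

```typescript
// Illustrative only: distinct Kafka consumer group ids per plugin-server mode.
function consumerGroupId(mode: 'ingestion' | 'async' | null): string {
    switch (mode) {
        case 'async':
            return 'clickhouse-plugin-server-async'
        case 'ingestion':
        default:
            return 'clickhouse-plugin-server-ingestion'
    }
}
```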
* Set different prompts for different modes
* Capability for http, disabled in tests
* Elements chain handling in async ingestion
* Test for runner.test.ts
* Update a snapshot
* Update plugin-server/README.md
Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>
* Solve review-related issues
* Fix a test
* Fix imports
* Capabilities test fix
* Update tests
Co-authored-by: Yakko Majuri <38760734+yakkomajuri@users.noreply.github.com>