mirror of
https://github.com/BillyOutlast/posthog.com.git
synced 2026-02-04 03:11:21 +01:00
433 lines
16 KiB
Plaintext
---
title: Hog
sidebar: Docs
showTitle: true
availability:
    free: full
    selfServe: full
    enterprise: full
---

We want you to have the power to transform and export your data in real time, without the headache of maintaining your own service to get it done.

Enter Hog: the language that drives our [realtime destinations](/docs/cdp/destinations).

We’ve written Hog destination templates for dozens of services. But you can also write destinations yourself. Here’s how it works.

> **Note:** Hog shouldn't be confused with HogQL, the former name of our SQL-like query language used inside PostHog. If you're looking to query data in PostHog, see our [SQL docs](/docs/sql).

## Hog by example: webhook

When creating a destination, you’ll start with an input interface sheet. This enables easy changes and maintenance to your destination.

You can include additional inputs while editing the destination:

- Click **Edit source code**
- Click **Add input variable**

While editing your destination, each input's property name is noted beside its title.

Let's walk through the default [webhook template](/docs/cdp/destinations/webhook). It's a great starting point for your own destinations.

```hog
let payload := {
    // This is the object we use to specify our HTTP request
    'headers': inputs.headers,
    'body': inputs.body,
    'method': inputs.method
}

if (inputs.debug) {
    // If we're running in debug mode, we'll take an extra step:
    // Printing the request's URL and payload object
    print('Request', inputs.url, payload)
}

// Create and fire a request with the specified URL and payload object
let res := fetch(inputs.url, payload);

// When the request returns, execution continues
// The response is stored in the variable declared above, `res`
// You can write code that handles the response from here

if (inputs.debug) {
    // If we're in debug mode, print the status code and body for the response
    print('Response', res.status, res.body);
}
```

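The response-handling step mentioned in the template's comments could look something like the sketch below. The `400` threshold and the error message are assumptions made for illustration, and the sketch relies on `throw Error(...)`, which this page doesn't otherwise cover:

```hog
// Same request as in the webhook template
let res := fetch(inputs.url, {
    'headers': inputs.headers,
    'body': inputs.body,
    'method': inputs.method
})

// Sketch only: treat any 4xx or 5xx status as a failure (assumed threshold)
if (res.status >= 400) {
    throw Error(f'Request to {inputs.url} failed with status {res.status}')
}
```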
It’s okay if your destination service is a little pickier, though. Let’s say the endpoint you’re targeting expects to receive a body with a specific structure. You could wrap the `inputs.body` JSON content in another object, then pass this along to the payload:

```hog
let wrapped_input := {
    'some_necessary_property': 'its necessary value',
    'content': inputs.body
}

let payload := {
    'headers': inputs.headers,
    'body': wrapped_input,
    'method': inputs.method
}
```

You could even add a new input field to your destination, which is convenient if you think you’ll want to change the value again in the future, or reuse this destination for multiple targets.

Imagine a destination endpoint that can update one of many tables, and therefore requires a table name to be specified. With a string input named `table`, you could write this:

```hog
let table_patch := {
    'table': inputs.table,
    'content': inputs.body
}
```

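Putting the pieces together, a destination for that hypothetical table-updating endpoint might look roughly like this. It's a sketch that assumes a `table` input has been added alongside the `url`, `headers`, `body`, `method`, and `debug` inputs used by the webhook template:

```hog
// Wrap the event body together with the table name from the `table` input
let table_patch := {
    'table': inputs.table,
    'content': inputs.body
}

// Send the wrapped object instead of the raw body
let res := fetch(inputs.url, {
    'headers': inputs.headers,
    'body': table_patch,
    'method': inputs.method
})

if (inputs.debug) {
    print('Response', res.status, res.body)
}
```

Because the table name lives in an input rather than in the code, the same destination source can be reused for several tables.
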
### Keep in mind

Hog was designed to be syntax-compatible with our SQL flavor. This means all our SQL expressions work as Hog code.

This compatibility imposes a few key differences compared to other programming languages:

- Variable assignment in Hog is done with the `:=` operator, as `=` and `==` are both used for equality comparisons in SQL
- You must type out `and`, `or` and `not`. Currently `&&` and `!` raise syntax errors, whereas `||` is used as the string concatenation operator.
- All arrays in Hog start from index `1`. Yes, for real. Trust us, we know. However that's how SQL has always worked, so we adopted it.
- The easiest way to debug your code is to `print()` the variables in question, and then check the logs.
- Strings must always be written with `'single quotes'`. You may use `f`-string templates like `f'Hello {name}'`.

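As a quick illustration, the short sketch below touches each of these points in one place. The variable names and values are made up for the example:

```hog
// `:=` assigns, and strings use single quotes
let name := 'Hog'
let langs := ['SQL', 'Hog']

// `=` compares, and `and` / `not` are spelled out
if (name = 'Hog' and not empty(langs)) {
    // Arrays start at 1, so this prints the first element
    print(langs[1])

    // `||` concatenates strings, and f-strings interpolate values
    print('Hello ' || name)
    print(f'Hello {name}')
}
```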
Read more about the story behind Hog in [Why hog?](#why-hog) below.

## Syntax

### Comments

Hog comments start with `//`. You can also use SQL-style comments with `--` or C++-style multi-line blocks with `/* */`.

```
// Hog comments start with //
-- You can also use SQL style comments with --
/* or C++ style multi line
blocks */
```

### Variables

Use `:=` to assign a value to a variable because `=` is just equals in SQL.

```rust
// assign 12 to myVar
let myVar := 12
myVar := 13
myVar := myVar + 1
```

### Comparisons

On top of standard comparisons, `like`, `ilike`, `not like`, and `not ilike` work.

```rust
let myVar := 12
print(myVar = 12 or myVar < 10) // prints true
print(myVar < 12 and myVar > 12) // prints false

let string := 'mystring'
print(string ilike '%str%') // prints true
```

### Regex

Compare strings against regex patterns: `=~` matches case sensitively, `=~*` matches case insensitively, `!~` is true when there is no case-sensitive match, and `!~*` is true when there is no case-insensitive match.

```rust
print('string' =~ 'i.g$') // true
print('string' !~ 'i.g$') // false
print('string' =~* 'I.G$') // true, case insensitive
print('string' !~* 'I.G$') // false, case insensitive
```

### Arrays

Supports both dot notation and bracket notation.

Arrays in Hog (and our SQL flavor) are **1-indexed**!

```rust
let myArray := [1,2,3]
print(myArray.2) // prints 2
print(myArray[2]) // prints 2
```

### Tuples

Supports both dot notation and bracket notation.

Tuples in Hog (and our SQL flavor) are **1-indexed**!

```rust
let myTuple := (1,2,3)
print(myTuple.2) // prints 2
print(myTuple[2]) // prints 2
```

### Objects

You must use single quotes for string keys and values in objects.

```rust
let myObject := {'key': 'value'}
print(myObject.key) // prints 'value'
print(myObject['key']) // prints 'value'

print(myObject?.this?.is?.not?.found) // prints 'null'
print(myObject?.['this']?.['is']?.not?.found) // prints 'null'
```

### Strings

Strings must always start and end with a single quote. Includes f-string support.

```rust
let str := 'string'
print(str || ' world') // prints 'string world', SQL concat
print(f'hello {str}') // prints 'hello string'
print(f'hello {f'{str} world'}') // prints 'hello string world'
```

### Functions and lambdas

Functions are first class variables, just like in JavaScript. You can define them with `fun`, or inline as lambdas:

```rust
fun addNumbers(num1, num2) {
    let newNum := num1 + num2
    return newNum
}
print(addNumbers(1, 2))

let square := (a) -> a * a
print(square(4))
```

See [Hog's standard library](#hogs-standard-library) for a list of built-in functions.

### Logic

```rust
let a := 3
if (a > 0) {
    print('yes')
}
```

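Conditions can also be chained. The sketch below assumes that `else if` and `else` branches behave as they do in JavaScript; the variable name and thresholds are made up for illustration:

```hog
let width := 800

if (width < 600) {
    print('small')
} else if (width < 1200) {
    print('medium')
} else {
    print('large')
}
```
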
### Ternary operations

```rust
let a := 3
print(a < 2 ? 'small' : 'big') // prints 'big'
```

### Nulls

```rust
let a := null
print(a ?? 'is null') // prints 'is null'
```

### While loop

```rust
let i := 0
while(i < 3) {
    print(i) // prints 0, 1, 2
    i := i + 1
}
```

### For loop

```rust
for(let i := 0; i < 3; i := i + 1) {
    print(i) // prints 0, 1, 2
}
```

### For-in loop

```rust
// iterate over the values of an array
let arr := ['banana', 'tomato', 'potato']
for (let food in arr) {
    print(food)
}

// iterate over the keys and values of an object
let obj := {'banana': 3, 'tomato': 5, 'potato': 6}
for (let food, value in obj) {
    print(food, value)
}
```

## Hog's standard library

Hog's standard library includes the following functions and will expand. To see the most up-to-date list, check the [Python VM's `stl/__init__.py` file](https://github.com/PostHog/posthog/blob/master/common/hogvm/python/stl/__init__.py).

### Type conversion

- `toString(arg: any): string`
- `toUUID(arg: any): UUID`
- `toInt(arg: any): integer`
- `toFloat(arg: any): float`
- `toDate(arg: string | integer): Date`
- `toDateTime(arg: string | integer): DateTime`
- `tuple(...args: any[]): tuple`
- `typeof(arg: any): string`

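As a rough sketch of how these conversions fit together, the example below turns a string into a number, does some math, and converts the result back. The variable names and values are illustrative only:

```hog
let raw := '42'

// Convert the string to an integer before doing math with it
let count := toInt(raw) + 1

// Inspect the runtime types of both values
print(typeof(raw), typeof(count))

// Convert back to a string for concatenation
print('Total: ' || toString(count))

// Build a tuple from arbitrary values
print(tuple(1, 'two', 3.0))
```
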
### Comparisons

- `ifNull(value: any, alternative: any)`

### String functions

- `print(...args: any[])`
- `concat(...args: string[]): string`
- `match(arg: string, regex: string): boolean`
- `length(arg: string): integer`
- `empty(arg: string): boolean`
- `notEmpty(arg: string): boolean`
- `lower(arg: string): string`
- `upper(arg: string): string`
- `reverse(arg: string): string`
- `trim(arg: string, char?: string): string`
- `trimLeft(arg: string, char?: string): string`
- `trimRight(arg: string, char?: string): string`
- `splitByString(separator: string, str: string, maxParts?: integer): string[]`
- `jsonParse(arg: string): any`
- `jsonStringify(arg: object, indent = 0): string`
- `base64Encode(arg: string): string`
- `base64Decode(arg: string): string`
- `tryBase64Decode(arg: string): string`
- `encodeURLComponent(arg: string): string`
- `decodeURLComponent(arg: string): string`
- `replaceOne(arg: string, needle: string, replacement: string): string`
- `replaceAll(arg: string, needle: string, replacement: string): string`
- `generateUUIDv4(): string`
- `position(haystack: string, needle: string): integer`
- `positionCaseInsensitive(haystack: string, needle: string): integer`
- `substring(arg: string, start: int, length?: int): string` - `start` is 1-based, e.g. `substring('hello world',1,5)` will return `hello`

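A few of these are easier to follow in combination. The sketch below cleans up a made-up email address and round-trips some JSON; the sample values are assumptions for illustration:

```hog
let email := '  Max@Example.com '

// Trim surrounding whitespace, then lowercase
let clean := lower(trim(email))
print(clean)

// Note the argument order: the separator comes first
let parts := splitByString('@', clean)
print(parts[2]) // second element (arrays are 1-indexed), i.e. the part after the '@'

// Replace every occurrence of a substring
print(replaceAll(clean, '.', '-'))

// Serialize an object to JSON and parse a JSON string back into an object
print(jsonStringify({'email': clean}))
let parsed := jsonParse('{"ok": true}')
print(parsed.ok)
```
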
### Objects and arrays

- `length(arg: any[] | object): integer`
- `empty(arg: any[] | object): boolean`
- `notEmpty(arg: any[] | object): boolean`
- `keys(arg: any[] | object): string[]`
- `values(arg: any[] | object): string[]`
- `indexOf(array: any[], elem: any): integer`
- `has(array: any[], element: any)`
- `arrayPushBack(arr: any[], value: any): any[]`
- `arrayPushFront(arr: any[], value: any): any[]`
- `arrayPopBack(arr: any[]): any[]`
- `arrayPopFront(arr: any[]): any[]`
- `arraySort(arr: any[]): any[]`
- `arrayReverse(arr: any[]): any[]`
- `arrayReverseSort(arr: any[]): any[]`
- `arrayStringConcat(arr: any[], separator?: string): string`
- `arrayMap(callback: (arg: any): any, array: any[]): any[]`
- `arrayFilter(callback: (arg: any): boolean, array: any[]): any[]`
- `arrayExists(callback: (arg: any): boolean, array: any[]): boolean`
- `arrayCount(callback: (arg: any): boolean, array: any[]): integer`

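The callback-based functions take the lambda as the first argument and the array as the second. Here is a minimal sketch with made-up data:

```hog
let prices := [5, 12, 8]

// Callback first, array second
let doubled := arrayMap((p) -> p * 2, prices)
let expensive := arrayFilter((p) -> p > 6, prices)
print(doubled, expensive)

// Check whether any element satisfies the condition
print(arrayExists((p) -> p > 10, prices))

// Sort, then join into a single string
print(arrayStringConcat(arraySort(prices), ', '))

// Objects expose their keys and values as arrays
let stock := {'banana': 3, 'tomato': 5}
print(keys(stock), values(stock))
print(has(keys(stock), 'banana'))
```

Going by the signatures above, `arrayPushBack` and its siblings return an array, so the result is typically assigned back to a variable, as in `prices := arrayPushBack(prices, 20)`.
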
### Date functions

- `now(): DateTime`
- `toUnixTimestamp(input: DateTime | Date | string, zone?: string): float`
- `fromUnixTimestamp(input: number): DateTime`
- `toUnixTimestampMilli(input: DateTime | Date | string, zone?: string): float`
- `fromUnixTimestampMilli(input: integer | float): DateTime`
- `toTimeZone(input: DateTime, zone: string): DateTime | Date`
- `toDate(input: string | integer | float): Date`
- `toDateTime(input: string | integer | float, zone?: string): DateTime`
- `formatDateTime(input: DateTime, format: string, zone?: string): string` - we use the [ClickHouse formatDateTime syntax](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions#formatdatetime).
- `toInt(arg: any): integer` - Converts `arg` to a 64-bit integer. Converts `Date`s into days from epoch, and `DateTime`s into seconds from epoch
- `toFloat(arg: any): float` - Converts `arg` to a 64-bit float. Converts `Date`s into days from epoch, and `DateTime`s into seconds from epoch
- `toDate(arg: string | integer): Date` - `arg` must be a string `YYYY-MM-DD` or a Unix timestamp in seconds
- `toDateTime(arg: string | integer): DateTime` - `arg` must be an ISO timestamp string or a Unix timestamp in seconds

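As an illustration, the sketch below converts between Unix timestamps and `DateTime` values and formats the result. The timestamp, the one-hour offset, and the `Europe/Brussels` zone are arbitrary choices for the example:

```hog
// Start from a Unix timestamp in seconds
let created := fromUnixTimestamp(1704067200)

// Format with ClickHouse-style placeholders
print(formatDateTime(created, '%Y-%m-%d'))

// Convert back to seconds, add an hour, and build a new DateTime
let later := fromUnixTimestamp(toUnixTimestamp(created) + 3600)

// Shift to another time zone before formatting
print(formatDateTime(toTimeZone(later, 'Europe/Brussels'), '%Y-%m-%d'))

// The current time as a Unix timestamp
print(toUnixTimestamp(now()))
```
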
### Cryptographic functions

- `md5Hex(arg: string): string`
- `sha256Hex(arg: string): string`
- `sha256HmacChainHex(arg: string[]): string`

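One possible use is hashing an identifier before sending it to a third party. This is a sketch with a made-up email address, and whether a given service expects MD5 or SHA-256 is up to that service:

```hog
// Normalize first so the same address always produces the same hash
let email := lower(trim(' Max@Example.com '))

print(sha256Hex(email))
print(md5Hex(email))
```
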
## Why Hog?

Truth be told, we didn't set out to build our own programming language. We actually wanted to build the **HogVM**: a way to run HogQL expressions synchronously within our NodeJS ingestion pipeline. However, things escalated quickly.

### HogVM

The HogVM is a stack-based bytecode virtual machine (VM). It was built to power event filtering in our real-time delivery system.

It's paired with a compiler that converts SQL expressions like `properties.$screen_width < properties.$screen_height` into bytecode like `["_H", 1, 32, "$screen_height", 32, "properties", 1, 2, 32, "$screen_width", 32, "properties", 1, 2, 15, 35]`. The HogVM then executes that bytecode.

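To give a feel for what "stack-based" means, here is a toy stack machine written in Hog itself. It is only a sketch: the `push` and `lt` instruction names and the object-based program format are invented for this example and are not the real HogVM opcodes or bytecode layout.

```hog
// Toy stack machine. The instruction names are hypothetical, NOT real HogVM opcodes.
fun run(program) {
    let stack := []
    for (let instr in program) {
        if (instr.op = 'push') {
            // Push a constant onto the stack
            stack := arrayPushBack(stack, instr.arg)
        } else if (instr.op = 'lt') {
            // Pop the two operands, most recently pushed first
            let b := stack[length(stack)]
            stack := arrayPopBack(stack)
            let a := stack[length(stack)]
            stack := arrayPopBack(stack)
            // Push the comparison result back onto the stack
            stack := arrayPushBack(stack, a < b)
        }
    }
    // The result is whatever is left on top of the stack
    return stack[length(stack)]
}

// Roughly `800 < 1200`, standing in for `$screen_width < $screen_height`
let program := [
    {'op': 'push', 'arg': 800},
    {'op': 'push', 'arg': 1200},
    {'op': 'lt'}
]
print(run(program))
```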
The HogVM had a few design constraints:

- Virtually no startup time (thousands of different expressions must run against millions of events every minute).
- Fast enough to filter the firehose of our event stream without drowning us in hosting costs.
- Runtime flexibility: the ability to embed Hog in many target languages like Python, NodeJS and the browser.
- Small surface area that makes it easy to maintain multiple implementations.
- Secure enough to run user provided code.
- Syntax compatible with SQL.

Effectively, we wanted to run SQL (AKA HogQL) synchronously within a NodeJS process, at scale. So that's what we built, twice.

We now have two identical HogVMs: one in [Python](https://github.com/PostHog/posthog/blob/master/hogvm/python/execute.py) (to run alongside queries, unreleased), and one in [TypeScript](https://github.com/PostHog/posthog/blob/master/hogvm/typescript/src/execute.ts) (to run in the pipeline, and in the browser). We might even build a third version in Rust soon. The surface area for the VM is really that small.

### Async HogVM

We chose NodeJS for our [old ingestion pipeline](https://posthog.com/blog/how-we-built-an-app-server) because we wanted to give users the ability to write their own custom JavaScript functions. That was a mistake. We never released this on Cloud because it was too risky:

1. There's no way to run untrusted JavaScript without heavy sandboxing (our approach with VM2 fell short).
2. ~~User provided~~ JavaScript is unpredictable and leaky (unfinished promises, data in globals, etc).
3. Functions must run until the end. We can't pause or resume and must set short global runtime limits.

The "aha" moment came when we realized we could build a real programming language on top of HogVM to get around these limitations.

The key insight was this: we can pause and resume our own VM at any point, including right before `fetch()` calls. When a script calls `fetch`, we serialize the entire VM state (stack, bytecode, metadata), pass all of that onto the "fetch service" (via a queue like Kafka), push the response back onto the VM's stack, and resume execution on what might be a completely different machine. It doesn't matter if the fetch was called in a for loop or in an extremely recursive call stack, we can pause and resume the language at will. It's magic.

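From the script author's point of view, none of this machinery is visible. In the sketch below, `inputs.urls` is a hypothetical array input (it is not part of the default webhook template), and a real destination may limit how many `fetch` calls a single run can make:

```hog
// `inputs.urls` is a hypothetical array input, not part of the default webhook template
for (let url in inputs.urls) {
    // Each fetch() is a point where the VM can be paused and later resumed,
    // possibly on a different machine, with the loop state intact
    let res := fetch(url, {'method': 'POST', 'body': inputs.body})
    print(url, res.status)
}
```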
During the [MyKonos hackathon](https://posthog.com/blog/mykonos-hackathon) in May we built the first version of Hog, and then productized it over the next 5 months as part of our new CDP app.

So that's why we built Hog: to run multi-hop user-provided async scripts in a safe, predictable, and scalable way.

### Future Hog

These are the early days for Hog and the HogVM. We will be evolving the language and the experience based on user feedback, so please let us know what you think. We're excited to see what you're going to build with it.

## Running Hog locally

To run Hog, first, you need to [clone and set up PostHog locally](/handbook/engineering/developing-locally). The repo has VMs to run the source code and compiled bytecode as well as example files. The [default VM](https://github.com/PostHog/posthog/blob/master/hogvm/python) relies on PostHog's Python dependencies, but we also have a [TypeScript VM](https://github.com/PostHog/posthog/blob/master/hogvm/typescript) that relies on those dependencies.

Once you have PostHog set up, go into the repo and run `bin/hog` with a `.hog` file.

```bash
cd posthog
bin/hog hogvm/__tests__/mandelbrot.hog
```

You can add the `--debug` flag to step through and see the stack trace.

### Compiling Hog

You can compile a `.hog` file to a `.hoge` executable with `bin/hoge`.

```bash
bin/hoge hogvm/__tests__/mandelbrot.hog
```

You can then run the compiled `.hoge` file automatically with `bin/hog`.

```bash
bin/hog hogvm/__tests__/mandelbrot.hoge
```