Installation Instructions
Docker
The easiest way to install MacroBase SQL is by downloading our Docker image, which has everything properly configured for you out of the box. Simply run
docker pull macrobase/macrobase
to download the latest version. Then, you can run a new container by executing
docker run -i -t macrobase/macrobase /bin/bash
Building from Source
You can also download the latest release of MacroBase SQL from our GitHub page and install it from source: https://github.com/stanford-futuredata/macrobase/releases.
Building MacroBase SQL requires Apache Maven 3.3.9
and Java 8+. Once Maven and Java 8 are installed, simply run ./build.sh sql
in the top-level directory, and MacroBase SQL should successfully build.
Running MacroBase SQL
Once you've installed MacroBase SQL, you can run it by executing bin/macrobase-sql
.
You should see this:
Welcome to
__ ___ ____
/ |/ /___ _______________ / __ )____ _________
/ /|_/ / __ `/ ___/ ___/ __ \/ __ / __ `/ ___/ _ \
/ / / / /_/ / /__/ / / /_/ / /_/ / /_/ (__ ) __/
/_/ /_/\__,_/\___/_/ \____/_____/\__,_/____/\___/
macrobase-sql>
You can also import a .sql file with pre-written SQL queries; just run bin/macrobase-sql -f [path/to/file]
.
Demo
To make sure MacroBase SQL has been successfully built, we have a simple demo
on sample data to get you started. If you run the bin/macrobase-sql -f sql/demo.sql
, you should see the following output:
Welcome to
__ ___ ____
/ |/ /___ _______________ / __ )____ _________
/ /|_/ / __ `/ ___/ ___/ __ \/ __ / __ `/ ___/ _ \
/ / / / /_/ / /__/ / / /_/ / /_/ / /_/ (__ ) __/
/_/ /_/\__,_/\___/_/ \____/_____/\__,_/____/\___/
IMPORT FROM CSV FILE 'core/demo/sample.csv' INTO sample(usage double, latency
double, location string, version string);
1057 rows
-----------------------------------------------------
| usage | latency | location | version |
-----------------------------------------------------
| 30.77 | 238.0 | CAN | v2 |
| 31.28 | 611.0 | CAN | v2 |
| 31.17 | 768.0 | RUS | v4 |
| 30.94 | 192.0 | AUS | v3 |
| 35.36 | 401.0 | UK | v3 |
| 39.12 | 531.0 | RUS | v4 |
| 33.9 | 223.0 | UK | v3 |
...
| 1000.77 | 864.0 | CAN | v2 |
| 1000.77 | 864.0 | CAN | v2 |
| 1000.77 | 864.0 | CAN | v2 |
| 1000.77 | 864.0 | CAN | v2 |
| 1000.77 | 864.0 | CAN | v2 |
| 1000.77 | 864.0 | CAN | v2 |
| 1000.77 | 864.0 | CAN | v2 |
-----------------------------------------------------
SELECT * FROM
DIFF
(SELECT * FROM sample WHERE usage > 1000.0) outliers,
(SELECT * FROM sample WHERE usage < 1000.0) inliers
ON *;
2 rows
-------------------------------------------------------------------------------------------------------
| location | version | support | global_ratio | outliers | count |
-------------------------------------------------------------------------------------------------------
| CAN | - | 1.0 | 4.11284 | 38.0 | 257.0 |
| - | v1 | .789474 | 2.990945 | 30.0 | 279.0 |
-------------------------------------------------------------------------------------------------------
SELECT * FROM
DIFF
(SELECT * FROM sample WHERE usage > 1000.0) outliers,
(SELECT * FROM sample WHERE usage < 1000.0) inliers
ON
location, version;
2 rows
-------------------------------------------------------------------------------------------------------
| location | version | support | global_ratio | outliers | count |
-------------------------------------------------------------------------------------------------------
| CAN | - | 1.0 | 4.11284 | 38.0 | 257.0 |
| - | v1 | .789474 | 2.990945 | 30.0 | 279.0 |
-------------------------------------------------------------------------------------------------------
SELECT * FROM
DIFF
(SELECT * FROM sample WHERE usage > 1000.0) outliers,
(SELECT * FROM sample WHERE usage < 1000.0) inliers
ON
location, version
WITH MIN RATIO 5.0 MIN SUPPORT 0.75
COMPARE BY
risk_ratio(COUNT(*));
2 rows
-------------------------------------------------------------------------------------------
| location | version | support | risk_ratio | outliers | count |
-------------------------------------------------------------------------------------------
| CAN | - | 1.0 | ∞ | 38.0 | 257.0 |
| - | v1 | .789474 | 10.456989 | 30.0 | 279.0 |
-------------------------------------------------------------------------------------------
SELECT * FROM
DIFF
(SELECT * FROM sample WHERE usage > 1000.0) outliers,
(SELECT * FROM sample WHERE usage < 1000.0) inliers
ON *
WITH MIN SUPPORT 0.75 MIN RATIO 5.0
COMPARE BY
risk_ratio(COUNT(*));
2 rows
-------------------------------------------------------------------------------------------
| location | version | support | risk_ratio | outliers | count |
-------------------------------------------------------------------------------------------
| CAN | - | 1.0 | ∞ | 38.0 | 257.0 |
| - | v1 | .789474 | 10.456989 | 30.0 | 279.0 |
-------------------------------------------------------------------------------------------
SELECT * FROM
DIFF
(SPLIT (
SELECT *, percentile(usage) as pct FROM sample)
WHERE pct > 0.9641)
ON *
WITH MIN SUPPORT 0.75 MIN RATIO 5.0;
1 row
-------------------------------------------------------------------------------------------------------
| location | version | support | global_ratio | outliers | count |
-------------------------------------------------------------------------------------------------------
| CAN | v1 | .789474 | 10.562958 | 30.0 | 79.0 |
-------------------------------------------------------------------------------------------------------