Sampling With Replacement In Presto

2 min readJun 12, 2019

While analyzing an experiment data, I encountered an interesting brain teaser. I wanted to use the bootstrap method and for that needed to sample my data with replacement. After some iterations, I got the perfect way to do it. The trick is using UNNEST with sequence operation to duplicate the data. Below is an example.

In the below example, I assume we have two different treatment groups (A and B), and we need 3 samples of data. In each sample, we need 40% of users i.e. (2 of 5 users from each treatment group).

The most important step is UNNEST("sequence"(0, 3, 1)) T1(x) which duplicates each user 3 times. Then we use NTILE operator to split the data into 5 groups and thereafter select two buckets for each group. Since there are only 5 users per treatment group, each tile will only have one user. If you have more than 100 users, then set the number of tiles to 100 and replace tile < 3 with tile < 40 to select 40% of users. You can read more about how to use NTILE operator for sampling in my earlier blog over here

WITH dataset AS (
    SELECT *
    FROM (
        VALUES
            (1, 'A'), 
            (2, 'A'), 
            (3, 'A'),
            (4, 'A'),
            (5, 'A'),            
            (6, 'B'),
            (7, 'B'),
            (8, 'B'),            
            (9, 'B'),            
            (10, 'B')           
    ) AS t(user_uuid, treatment_group)
)
 
SELECT
    user_uuid
    , treatment_group 
    , iterNo    
FROM
(
    SELECT
        user_uuid
        , treatment_group 
        , iterNo
        , NTILE(5) OVER (PARTITION BY treatment_group, iterNo ORDER BY rnd) as tile
    FROM
    (
        SELECT
            user_uuid
            , treatment_group
            , x as iterNo
            , RAND() as rnd 
        FROM dataset
        CROSS JOIN UNNEST("sequence"(0, 3, 1)) T1(x)
    ) A 
) B 
WHERE tile < 3

Originally published at http://ragrawal.wordpress.com on June 12, 2019.

Sampling With Replacement In Presto

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Ritesh Agrawal

No responses yet