Here is as much of the pen-and-paper test as I remember:
<2 questions on Linux>
Maths and stats
1) 200 out of 1000 customers responded to a mail-out. They want to send it to another 500 customers; what is the probability that 100 respond?
2) Joint distribution with p(x) > p(y). What is the probability that x > y?
3) Why does milk powder come in a cylinder and not in a box?
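One possible reading of question 1 is a binomial model, assuming each of the 500 new customers responds independently at the observed rate of 200/1000 = 0.2. The question as remembered doesn't say whether "100 respond" means exactly 100 or at least 100, so this sketch computes both; the function names are illustrative only.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

p_hat = 200 / 1000   # assumption: the observed response rate carries over
n_new = 500

p_exactly_100 = binom_pmf(100, n_new, p_hat)
p_at_least_100 = binom_sf(100, n_new, p_hat)
```

Under this model the expected count is 500 × 0.2 = 100, so "exactly 100" is the single most likely outcome but still has a small probability, while "at least 100" comes out close to one half.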
Experiment Design
You create two landing pages for a new competition, aimed at increasing the number of subscribers, and want to see which one is more successful.
1) How would you test this?
2) How do you work out which landing page is more successful?
3) What’s the point? The competition is already over.
4) How could you improve subscriptions?
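For questions 1 and 2, one common approach (not necessarily the answer they were looking for) is to split visitors randomly between the two pages and compare subscription rates with a two-proportion z-test. A minimal sketch, with a hypothetical function name and made-up counts:

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: both pages convert at the same rate."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical counts, made up for illustration only.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
```

A small p-value suggests the difference in subscription rates is unlikely to be noise; the sample size and significance threshold should be fixed before the test starts.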
Programming
(i) Check to see if a string is a palindrome. How would you do unit testing?
(ii) An array A consists of the numbers 1 to n, but is missing a single value. Identify which one, in O(n) time and using only one additional variable.
(iii) Using a random number generator, generate a distribution where one value is weighted more heavily than the others (e.g., if the possible outputs are [1,2,3], 1 is output 80% of the time and 2 and 3 are output 10% of the time each).
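The three programming questions can be sketched as follows. These are my own assumptions about what was wanted: the palindrome check ignores case and punctuation, the missing value uses a running sum as the single extra variable (loop variable aside), and the weighted generator maps one uniform draw onto the 80/10/10 split. All names are illustrative.

```python
import random
import unittest

def is_palindrome(s: str) -> bool:
    """True if s reads the same forwards and backwards.
    Assumption: case, spaces and punctuation are ignored."""
    chars = [c.lower() for c in s if c.isalnum()]
    return chars == chars[::-1]

def missing_number(a: list[int], n: int) -> int:
    """Find the value missing from a, which holds 1..n minus one value.
    Single pass: start from the expected sum and subtract each element."""
    total = n * (n + 1) // 2
    for x in a:
        total -= x
    return total

def weighted_pick(rng: random.Random) -> int:
    """Return 1 with probability 0.8, and 2 or 3 with probability 0.1 each,
    using a single uniform draw on [0, 1)."""
    u = rng.random()
    if u < 0.8:
        return 1
    return 2 if u < 0.9 else 3

class PalindromeTests(unittest.TestCase):
    def test_simple(self):
        self.assertTrue(is_palindrome("racecar"))
        self.assertFalse(is_palindrome("python"))

    def test_edge_cases(self):
        self.assertTrue(is_palindrome(""))
        self.assertTrue(is_palindrome("a"))
        self.assertTrue(is_palindrome("A man, a plan, a canal: Panama"))
```

For the unit-testing part, the test cases cover a plain positive and negative case plus the edge cases (empty string, single character, mixed case and punctuation); they can be run with `python -m unittest`.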
Databases
Generate a database flowchart for a social network. Include users (with username, picture, and password), their messages, their friends, and any groups they’ve signed up to (each group has only one admin).
1) Write a query to:
(i) Extract a message list between two people.
(ii) Extract friends of friends of a user.
2) A) How is a LEFT JOIN different from an INNER JOIN?
B) How is WHERE different from HAVING?
3) A) Given a database of financial transactions (transaction ID, user ID, amount, date), how do you identify the balance of each user?
B) How do you find the mean of the user balances?
C) And their standard deviation?
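The queries for questions 1 and 3 could look roughly like this, run here through Python's `sqlite3` module against an in-memory database. The schema, table names, column names and sample rows are all my assumptions: friendships are stored as one row per direction, and transaction amounts are signed (deposits positive, withdrawals negative).

```python
import sqlite3
from math import sqrt

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and sample rows, for illustration only.
    CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INT,
                           recipient_id INT, body TEXT, sent_at TEXT);
    CREATE TABLE friendships (user_id INT, friend_id INT);
    CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY, user_id INT,
                               amount REAL, date TEXT);

    INSERT INTO messages (sender_id, recipient_id, body, sent_at) VALUES
        (1, 2, 'hi', '2020-01-01'), (2, 1, 'hello', '2020-01-02'),
        (1, 3, 'other', '2020-01-03');
    INSERT INTO friendships VALUES (1, 2), (2, 1), (2, 3), (3, 2), (2, 4), (4, 2);
    INSERT INTO transactions (user_id, amount, date) VALUES
        (1, 100, '2020-01-01'), (1, -40, '2020-01-05'),
        (2, 20, '2020-01-02'), (3, 10, '2020-01-03');
""")

def conversation(a: int, b: int) -> list[str]:
    """1(i): messages exchanged between users a and b, oldest first."""
    rows = conn.execute(
        """SELECT body FROM messages
           WHERE (sender_id = :a AND recipient_id = :b)
              OR (sender_id = :b AND recipient_id = :a)
           ORDER BY sent_at""", {"a": a, "b": b})
    return [r[0] for r in rows]

def friends_of_friends(u: int) -> list[int]:
    """1(ii): friends of u's friends, excluding u and u's direct friends."""
    rows = conn.execute(
        """SELECT DISTINCT f2.friend_id
           FROM friendships f1
           JOIN friendships f2 ON f2.user_id = f1.friend_id
           WHERE f1.user_id = :u
             AND f2.friend_id != :u
             AND f2.friend_id NOT IN
                 (SELECT friend_id FROM friendships WHERE user_id = :u)
           ORDER BY f2.friend_id""", {"u": u})
    return [r[0] for r in rows]

# 3A: the balance per user is the sum of that user's signed amounts.
BALANCES = "SELECT user_id, SUM(amount) AS balance FROM transactions GROUP BY user_id"
balances = dict(conn.execute(BALANCES))

# 3B and 3C: SQLite has no built-in STDDEV aggregate, so the population
# standard deviation is derived from AVG(x*x) - AVG(x)^2.
mean, mean_sq = conn.execute(
    f"SELECT AVG(balance), AVG(balance * balance) FROM ({BALANCES})").fetchone()
std = sqrt(mean_sq - mean ** 2)
```

On a database with a standard-deviation aggregate, 3C could use that directly over the same subquery; whether to exclude direct friends from the friends-of-friends result is a design choice worth asking the interviewer about.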
Machine Learning
What is a confusion matrix? Give an example.
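As a small example, a binary confusion matrix counts true positives, false positives, false negatives and true negatives by comparing predicted labels against the true ones. The labels below are made up, and the function is a hand-rolled sketch rather than any particular library's API.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, positive="spam"):
    """Count TP, FP, FN, TN for a binary classifier."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return {k: counts[k] for k in ("TP", "FP", "FN", "TN")}

# Made-up labels for illustration.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "ham", "spam"]
cm = confusion_matrix(y_true, y_pred)
# cm == {"TP": 2, "FP": 1, "FN": 1, "TN": 2}

precision = cm["TP"] / (cm["TP"] + cm["FP"])   # share of predicted spam that is spam
recall = cm["TP"] / (cm["TP"] + cm["FN"])      # share of actual spam that was caught
```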
Build an algorithm to determine spam based on the description somebody has written on the site.
(i) How would you clean the data?
(ii) What other data would be useful to determine this?
(iii) What features would you use?
(iv) What algorithm would you use?
(v) How would you evaluate the algorithm?
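One reasonable baseline for (i), (iii) and (iv), though certainly not the only acceptable answer, is a bag-of-words naive Bayes classifier. The sketch below is written from scratch with add-one smoothing; the cleaning step, class name and toy descriptions are all my own assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Cleaning step: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            n_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][w] + 1)
                                      / (n_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)

# Toy training data, made up for illustration.
train_texts = [
    "win free money now", "cheap pills click here", "free prize click now",
    "I enjoy hiking and photography", "software engineer who loves cooking",
    "looking for friends to go hiking",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("click now for free money")
```

For (v), predictions on a held-out set would be compared with the true labels using precision and recall rather than plain accuracy, since spam is usually the minority class.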