Publishing Your Data
In this final installment of the quick start tutorial, you will learn how to publish any dataset you've created so that you can access it programmatically.

Why Publish Data?

DataDistillr is a great tool for exploring data, but there will be use cases for which it is not well suited, such as machine learning. For these cases, we have made it easy to do your data exploration in DataDistillr, then publish your dataset and load it into a Pandas, PySpark, or R dataframe.

Publishing Data

After you've run a query, you will see a button labeled Manage API Access in the top bar. Click that button, as shown below.
Manage API Access Button

API Management Screen

The API Management Screen is where you can control access to the various datasets you have created.
When you publish a dataset, anyone with the access information can view the data. Exercise caution when exposing data, and protect your access tokens to prevent unauthorized access.
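One common way to keep an access token out of your scripts is to read it from an environment variable rather than pasting it into the code. The following is a minimal sketch under that assumption: the variable name `DATADISTILLR_API_KEY` and the helper `build_auth_headers` are hypothetical and are not part of DataDistillr. The header shape mirrors the `Authorization` header used in the access snippets later on this page.

```python
import os

# Hypothetical environment variable name; DataDistillr does not prescribe one.
API_KEY_ENV_VAR = "DATADISTILLR_API_KEY"


def build_auth_headers(env=None):
    """Return request headers with the API key read from the environment.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    api_key = env.get(API_KEY_ENV_VAR, "").strip()
    if not api_key:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} before requesting published data.")
    # The key is sent as the raw Authorization header value.
    return {"Authorization": api_key}
```

With this in place, a script can call `requests.get(url, headers=build_auth_headers())` and the token never appears in source control.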

Create an API Access Client

Since this is your first project, you will have to create an API Access Client. Enter a name and enable the API app.
Create an API client

Understanding API Permissions

When you create an API Access Client, all your projects and the queries within those projects will be visible. You can set a permission on each of these to allow or disallow API access to your datasets. Allow and deny are self-explanatory. When the permission is set to inherit, the dataset inherits its permission from the project.
Users must have an access token to access the data. Publishing a dataset does not automatically make the data visible to the world.
You can expand a project tab and see all the queries in a project. From there, you can set the access permissions for every dataset, as well as view API usage statistics.
API Access Management
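The allow, deny, and inherit rules described above can be illustrated with a short sketch. This is not DataDistillr's implementation; the names `Permission` and `is_api_access_allowed` are hypothetical, and the sketch assumes a project's own permission is always either allow or deny.

```python
from enum import Enum


class Permission(Enum):
    ALLOW = "allow"
    DENY = "deny"
    INHERIT = "inherit"


def is_api_access_allowed(dataset_permission, project_permission):
    """Resolve a dataset's effective permission and report whether access is allowed.

    Assumption: the project permission is ALLOW or DENY, never INHERIT.
    """
    if project_permission is Permission.INHERIT:
        raise ValueError("project permission must be ALLOW or DENY in this sketch")
    # A dataset set to inherit takes whatever the project specifies.
    if dataset_permission is Permission.INHERIT:
        effective = project_permission
    else:
        effective = dataset_permission
    return effective is Permission.ALLOW
```

For example, a dataset set to inherit inside a project set to deny is not accessible, while an explicit allow on the dataset takes precedence over the project's setting.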

Accessing Published Data

Once you've published your data, you can access it programmatically. To retrieve the access tokens and code snippets, click the </> icon next to the permissions. DataDistillr will generate code snippets similar to the examples below, prepopulated with your API key and dataset IDs, so all you have to do is copy and paste the code into your Python or R scripts.
Python

import pandas as pd
import requests

url = "https://app.datadistillr.io/v1/results/<dataset id>"
api_key = "<YOUR API KEY HERE>"
headers = {"Authorization": api_key}

response = requests.get(url, headers=headers)
# Parse the JSON body once, then build the dataframe from the rows and column names.
payload = response.json()
data = pd.DataFrame(payload['results'], columns=payload['summary']['columnNames'])

Python with DataDistillr SDK

import datadistillr.datadistillr as ddr

url = "https://app.datadistillr.io/v1/results/<dataset id>"
auth_token = "<YOUR API KEY HERE>"
dataframe = ddr.datadistillr.get_dataframe(url, auth_token)

R

library(dplyr)
library(httr)
library(data.table)

URL <- "https://app.datadistillr.io/v1/results/<dataset id>"
AUTH_HEADER <- "<YOUR API KEY>"

response <- GET(URL, add_headers(Authorization = AUTH_HEADER))
# Store the parsed body under its own name so it does not shadow httr's content().
parsed <- content(response, as = "parsed", type = "application/json")
data <- data.frame(rbindlist(parsed$results))
colnames(data) <- parsed$summary$columnNames
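If you want to work with the response before handing it to a dataframe library, the sketch below pairs each row with its column names using only the standard library. The payload shape is an assumption inferred from the snippets above, which read `results` and `summary.columnNames`. This page does not state whether each row arrives as a list of values or as an object keyed by column name, so the hypothetical helper `rows_to_records` accepts either.

```python
def rows_to_records(payload):
    """Convert a results payload into a list of dicts keyed by column name.

    Assumed shape: {"results": [...rows...], "summary": {"columnNames": [...]}}.
    """
    column_names = payload["summary"]["columnNames"]
    records = []
    for row in payload["results"]:
        if isinstance(row, dict):
            # Row is already keyed by name: keep only the declared columns, in order.
            records.append({name: row.get(name) for name in column_names})
        else:
            # Row is a positional list of values: pair each value with its column name.
            records.append(dict(zip(column_names, row)))
    return records
```

The resulting list can be passed straight to `pd.DataFrame(records)`, or written out with the `csv` module if pandas is not available.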
That's it!