DUCKDB, S3 Tables with iceberg using Iceberg Rest API
I wrote my previous article about Duckberg, a combination of PyIceberg with DuckDB to access Iceberg tables on AWS.
But on Pi Day (14 March) DuckDB released a preview with native Iceberg and Glue integration for read and write. I'll show you how this works.
AWS, create S3 table bucket
- Open S3
- Select Table Bucket
- Enable the integration with AWS analytics services (this only needs to be done once).
- Create Table Bucket
- Give it a name like
testtablebucket
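The same table bucket can also be created from code. Below is a minimal sketch using the boto3 S3 Tables client; the client name and response field are based on the AWS SDK for S3 Tables, and the bucket name is the one used in this post.
import boto3

# Sketch: create the table bucket programmatically instead of via the console.
# Assumes the boto3 "s3tables" client; region and bucket name as used in this post.
s3tables = boto3.client("s3tables", region_name="eu-west-1")
response = s3tables.create_table_bucket(name="testtablebucket")
print(response["arn"])  # ARN of the new table bucket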
AWS, create S3 Table
- Select the bucket you created before
- Click Create Table with Athena
- Create a namespace or select an existing one. This is similar to a Glue database, but not the same. (A programmatic sketch for this step follows the CREATE TABLE statement below.) Create
testnamespace
- This opens Athena
- If you haven't done this before, create an S3 bucket and edit the Athena workgroup to use this bucket as the query output location
- Execute the CREATE TABLE statement:
CREATE TABLE `testnamespace`.daily_sales (
sale_date date,
product_category string,
sales_amount double)
PARTITIONED BY (month(sale_date))
TBLPROPERTIES ('table_type' = 'iceberg')
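As referenced above, the namespace itself can also be created programmatically instead of through the Athena dialog. A minimal sketch with the boto3 S3 Tables client, assuming the parameter names of the CreateNamespace API; <account_id> in the ARN is a placeholder.
import boto3

# Sketch: create the namespace from code; replace <account_id> with your account ID.
s3tables = boto3.client("s3tables", region_name="eu-west-1")
s3tables.create_namespace(
    tableBucketARN="arn:aws:s3tables:eu-west-1:<account_id>:bucket/testtablebucket",
    namespace=["testnamespace"],
)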
- Create a new query to insert some test values
INSERT INTO daily_sales
VALUES
(DATE '2024-01-15', 'Laptop', 900.00),
(DATE '2024-01-15', 'Monitor', 250.00),
(DATE '2024-01-16', 'Laptop', 1350.00),
(DATE '2024-02-01', 'Monitor', 300.00),
(DATE '2024-02-01', 'Keyboard', 60.00),
(DATE '2024-02-02', 'Mouse', 25.00),
(DATE '2024-02-02', 'Laptop', 1050.00),
(DATE '2024-02-03', 'Laptop', 1200.00),
(DATE '2024-02-03', 'Monitor', 375.00);
- Test that the data is correctly loaded
SELECT
product_category,
COUNT(*) as units_sold,
SUM(sales_amount) as total_revenue,
AVG(sales_amount) as average_price
FROM daily_sales
WHERE sale_date BETWEEN DATE '2024-02-01' AND DATE '2024-02-29'
GROUP BY product_category
ORDER BY total_revenue DESC;
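For completeness, the Athena statements above can also be submitted from code. This is only a sketch: the catalog name s3tablescatalog/testtablebucket is an assumption about how the S3 Tables integration shows up in Athena, and the output location is a placeholder.
import boto3

# Sketch: run the verification query through the Athena API.
athena = boto3.client("athena", region_name="eu-west-1")

query = """
SELECT product_category, SUM(sales_amount) AS total_revenue
FROM daily_sales
WHERE sale_date BETWEEN DATE '2024-02-01' AND DATE '2024-02-29'
GROUP BY product_category
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={
        "Catalog": "s3tablescatalog/testtablebucket",  # assumption, adjust to your setup
        "Database": "testnamespace",
    },
    ResultConfiguration={"OutputLocation": "s3://<your-athena-output-bucket>/"},
)
print(execution["QueryExecutionId"])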
DuckDB, setup
This follows the DuckDB blog post.
Set the credentials in your shell, for example with export AWS_PROFILE=myuser, or with the AWS access key environment variables.
Make sure you have DuckDB 1.2.1 or newer installed.
Run the new DuckDB web interface with: duckdb -ui
FORCE INSTALL aws FROM core_nightly;
FORCE INSTALL httpfs FROM core_nightly;
FORCE INSTALL iceberg FROM core_nightly;
CREATE SECRET (
TYPE s3,
PROVIDER credential_chain
);
ATTACH 'arn:aws:s3tables:eu-west-1:<account_id>:bucket/testtablebucket'
AS s3_tables_db (
TYPE iceberg,
ENDPOINT_TYPE s3_tables
);
Replace <account_id> with your account ID and use the correct S3 table bucket name.
Run SHOW ALL TABLES; to view the internal tables and the available S3 tables.
DuckDB, Query Data
Run: FROM s3_tables_db.testnamespace.daily_sales;
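The same setup and query can also be driven from Python with the duckdb package. A minimal sketch, assuming the nightly extensions as above; <account_id> is a placeholder.
import duckdb

# Sketch: reproduce the CLI steps with the duckdb Python package.
con = duckdb.connect()
for ext in ("aws", "httpfs", "iceberg"):
    con.execute(f"FORCE INSTALL {ext} FROM core_nightly")
    con.execute(f"LOAD {ext}")

# Pick up credentials from the environment (AWS_PROFILE or access keys)
con.execute("CREATE SECRET (TYPE s3, PROVIDER credential_chain)")

# Attach the table bucket; replace <account_id> with your account ID
con.execute("""
    ATTACH 'arn:aws:s3tables:eu-west-1:<account_id>:bucket/testtablebucket'
    AS s3_tables_db (TYPE iceberg, ENDPOINT_TYPE s3_tables)
""")

print(con.sql("SHOW ALL TABLES"))
print(con.sql("FROM s3_tables_db.testnamespace.daily_sales"))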
DuckDB, Alter Data
In the blog post there is an option to use ALTER TABLE, but unfortunately I still get: Not implemented Error: Alter Schema Entry
ALTER TABLE s3_tables_db.testnamespace.daily_sales
ADD COLUMN test STRING;
Maybe this works only with the Iceberg REST method.
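Until ALTER TABLE works through the DuckDB extension, a possible workaround is to evolve the schema with PyIceberg instead. This is only a sketch: the S3 Tables Iceberg REST endpoint, signing name and warehouse value are assumptions and may need adjusting for your setup; <account_id> is a placeholder.
from pyiceberg.catalog import load_catalog
from pyiceberg.types import StringType

region = "eu-west-1"

# Assumed: the S3 Tables Iceberg REST endpoint with SigV4 signing;
# replace <account_id> with your account ID.
catalog = load_catalog(
    "s3tables",
    **{
        "type": "rest",
        "uri": f"https://s3tables.{region}.amazonaws.com/iceberg",
        "warehouse": f"arn:aws:s3tables:{region}:<account_id>:bucket/testtablebucket",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": region,
    },
)

table = catalog.load_table("testnamespace.daily_sales")

# Add the column through PyIceberg's schema evolution API
with table.update_schema() as update:
    update.add_column("test", StringType())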
DuckDB, Attach Catalog via Iceberg REST
The steps above access the tables directly on S3 Tables, but the latest extension also supports the Iceberg REST API of the Glue Data Catalog / AWS SageMaker Lakehouse, which exposes the data through a standardized interface.
ATTACH '311141556126:s3tablescatalog/testnamespace'
AS glue_db (
TYPE iceberg,
ENDPOINT_TYPE glue
);
SHOW ALL TABLES;
Gives me:
IO Error: Failed to query https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/311141556126:s3tablescatalog:testnamespace/namespaces, http error 404 thrown. Message: {"error":{"code":404,"message":"Catalog not found.","type":"EntityNotFoundException"}}
My PyIceberg script also only sees the Glue databases, not the S3 Tables namespace.
from pyiceberg.catalog import load_catalog

region = 'eu-west-1'

# REST catalog against the Glue Iceberg REST endpoint
rest_catalog = load_catalog(
    "testnamespace",
    **{
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    },
)
print(rest_catalog.list_namespaces())
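One thing I still want to try: the Glue Iceberg REST endpoint appears to scope requests to a specific catalog via a warehouse property, so passing the S3 Tables catalog ID there might surface the namespace. A sketch only; the exact warehouse format is an assumption.
from pyiceberg.catalog import load_catalog

region = 'eu-west-1'
account_id = '311141556126'

# Assumed: scoping the REST catalog to the S3 Tables catalog via "warehouse"
rest_catalog = load_catalog(
    "s3tablescatalog",
    **{
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "warehouse": f"{account_id}:s3tablescatalog/testtablebucket",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    },
)
print(rest_catalog.list_namespaces())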
Time for some more investigation.