20240828

How to download a file from Amazon S3 in GitHub Actions

Motivation

Git is designed to manage text files, not binary files, and it's generally not recommended to have binary files that tend to have large sizes.

About large files on GitHub - GitHub Docs

GitHub limits the size of files allowed in repositories. If you attempt to add or update a file that is larger than 50 MiB, you will receive a warning from Git. The changes will still successfully push to your repository, but you can consider removing the commit to minimize performance impact.

GitHub blocks files larger than 100 MiB.

However, it is totally possible that in some cases, we want to perform integration tests with a machine-learning model. For example, running a classification algorithm and assert that the accuracy is above 90% or something.

and as I did some research, I think using Amazon S3 bucket for that is the most straightforward approach.

AWS CLI: cp and aws-actions/configure-aws-credentials

cp - AWS CLI 1.34.7 Command Reference

There is a convenient command we can use to copy a S3 object to a location locally or in S3.

$ aws s3 cp {s3 uri} .

aws-actions/configure-aws-credentials

This action seems to provide a concise way to configure credentials in GitHub Actions. When setting up AWS CLI, $ aws configure is what we do in general, but doing it correctly in a workflow looks a bit tricky because we need to provide multiple values as prompts.

Authenticate with IAM user credentials - AWS Command Line Interface

Using secrets in GitHub Actions

Using secrets in GitHub Actions - GitHub Docs

Repository-level secrets for GitHub Actions can be added under “Settings” > “Secrets and variables” > “Actions”.

After adding secrets, they can be accessed from a workflow by ${{ secrets.YOUR_SECRET_NAME }}

Sample workflow YAML

name: Testing

on: push

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout a commit
        uses: actions/checkout@v4

      - name: Set up AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          # loaded from GitHub secrets
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          # region is required
          aws-region: us-east-1

      - name: Download a S3 object from S3
        run: aws s3 cp s3://BUCKET_NAME/OBJECT_NAME .

Future improvements

Assume Role directly using GitHub OIDC provider

aws-actions/configure-aws-credentials

API calls to AWS need to be signed with credential information, so when you use one of the AWS SDKs or an AWS tool, you must provide it with AWS credentials and and AWS region. One way to do that in GitHub Actions is to use a repository secret with IAM credentials, but this doesn’t follow AWS security guidelines on using long term credentials. Instead, we recommend that you use a long term credential or JWT to fetch a temporary credential, and use that with your tools instead. This GitHub Action facilitates just that.

We recommend using the first option above: GitHub’s OIDC provider. This method uses OIDC to get short-lived credentials needed for your actions. See OIDC for more information on how to setup your AWS account to assume a role with OIDC.

I need some time to digest the authentication flow. For now, I'm manually rotating access keys.

Cache the S3 object

It might make sense to cache the S3 object you need to run a workflow if the object does not change frequently.

On second thought, caching does not provide much advantages because $ aws s3 cp is well optimized, and the step to copy objects finish very quickly. Also, the cost to access the S3 objects is very low and negligible.

Amazon S3 Simple Storage Service Pricing - Amazon Web Docs

You pay for requests made against your S3 buckets and objects.

S3 Standard / GET/SELECT, and all the other requests (per 1000 requests) -> $0.0004

I think keeping the workflow simple is more important than optimizing the tiny metrics. Not every engineer is great at engineering, and from my experience, I believe making it possible for most collaborators maintain the feature is critical.


Rice 400 Mexican rice 500 Protein shake 200 Sushi Salad 500 Nori 100

Total 1700 kcal

6k run


MUST:

TODO:


index 20240827 20240829