Git is designed to manage text files, not binary files, and it is generally discouraged to commit binary files, which tend to be large.
About large files on GitHub - GitHub Docs
GitHub limits the size of files allowed in repositories. If you attempt to add or update a file that is larger than 50 MiB, you will receive a warning from Git. The changes will still successfully push to your repository, but you can consider removing the commit to minimize performance impact.
GitHub blocks files larger than 100 MiB.
However, in some cases we do want to run integration tests against a machine-learning model, for example running a classification algorithm and asserting that its accuracy is above 90%.
After doing some research, I think using an Amazon S3 bucket for that is the most straightforward approach.
The two pieces we need are aws s3 cp and aws-actions/configure-aws-credentials.
cp - AWS CLI 1.34.7 Command Reference
There is a convenient command we can use to copy an S3 object to a local path or to another S3 location.
$ aws s3 cp {s3 uri} .
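For example, uploading a local model file to a bucket and then pulling it back down would look something like this (the bucket and file names are placeholders):
$ aws s3 cp ./model.bin s3://BUCKET_NAME/model.bin
$ aws s3 cp s3://BUCKET_NAME/model.bin .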
aws-actions/configure-aws-credentials
This action seems to provide a concise way to configure credentials in GitHub Actions. When setting up the AWS CLI locally, $ aws configure is what we generally run, but doing the equivalent in a workflow looks a bit tricky because the command interactively prompts for multiple values.
Authenticate with IAM user credentials - AWS Command Line Interface
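As far as I know, the AWS CLI also reads credentials from environment variables, so a manual alternative to answering the prompts would be something like the following (the values are placeholders):
$ export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
$ export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
$ export AWS_DEFAULT_REGION=us-east-1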
Using secrets in GitHub Actions - GitHub Docs
Repository-level secrets for GitHub Actions can be added under “Settings” > “Secrets and variables” > “Actions”.
After adding secrets, they can be accessed from a workflow with ${{ secrets.YOUR_SECRET_NAME }}.
name: Testing
on: push
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout a commit
        uses: actions/checkout@v4
      - name: Set up AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          # loaded from GitHub secrets
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          # region is required
          aws-region: us-east-1
      - name: Download an S3 object
        run: aws s3 cp s3://BUCKET_NAME/OBJECT_NAME .
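To connect this back to the motivation, a follow-up step could run the integration tests that load the downloaded model; the test runner and path below are placeholders for whatever the project actually uses:
      - name: Run integration tests
        run: pytest tests/integration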
aws-actions/configure-aws-credentials
API calls to AWS need to be signed with credential information, so when you use one of the AWS SDKs or an AWS tool, you must provide it with AWS credentials and an AWS region. One way to do that in GitHub Actions is to use a repository secret with IAM credentials, but this doesn't follow AWS security guidelines on using long term credentials. Instead, we recommend that you use a long term credential or JWT to fetch a temporary credential, and use that with your tools instead. This GitHub Action facilitates just that.
We recommend using the first option above: GitHub’s OIDC provider. This method uses OIDC to get short-lived credentials needed for your actions. See OIDC for more information on how to setup your AWS account to assume a role with OIDC.
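As far as I understand from the README, the OIDC flow replaces the long-lived keys with a role assumption. A rough sketch (the role ARN and account ID are placeholders, and the IAM role plus GitHub OIDC identity provider have to be set up on the AWS side first) would look like:
permissions:
  id-token: write   # required so the job can request the OIDC token
  contents: read    # still needed for actions/checkout
steps:
  - name: Set up AWS credentials
    uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/ROLE_NAME
      aws-region: us-east-1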
I need some time to digest the authentication flow. For now, I'm manually rotating access keys.
It might make sense to cache the S3 object you need to run a workflow if the object does not change frequently.
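If the object really were static, a sketch with actions/cache might look like this (the cache key is a placeholder and would need to change whenever the object changes):
      - name: Cache the S3 object
        uses: actions/cache@v4
        with:
          path: OBJECT_NAME
          key: s3-OBJECT_NAME-v1   # bump the suffix when the object changes
      - name: Download the S3 object if not cached
        run: test -f OBJECT_NAME || aws s3 cp s3://BUCKET_NAME/OBJECT_NAME .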
On second thought, caching does not provide much advantage: $ aws s3 cp is well optimized, so the copy step finishes very quickly, and the cost of accessing the S3 objects is negligible.
Amazon S3 Simple Storage Service Pricing - Amazon Web Services
You pay for requests made against your S3 buckets and objects.
S3 Standard: GET, SELECT, and all other requests -> $0.0004 per 1,000 requests
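To put that in perspective, at $0.0004 per 1,000 requests, even 10,000 workflow runs that each issue a single GET add up to roughly $0.004 in request charges (data transfer out of S3 is billed separately).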
I think keeping the workflow simple is more important than optimizing such tiny metrics. Not every engineer is great at this kind of plumbing, and from my experience, I believe it is critical that most collaborators can maintain the feature.
Rice 400
Mexican rice 500
Protein shake 200
Sushi salad 500
Nori 100
Total 1700 kcal
6k run
MUST:
TODO: