Skills Operations
Deployment, secret-readiness, collector invocation, smoke-check, and rollback runbook.
Last updated: 2026-05-17
Use this runbook to deploy Skills, confirm data collection readiness, and avoid leaking secret material.
Safety Rules
- Never print PATs, passwords, Cognito tokens, .env values, or raw deployment logs that could contain credentials.
- Treat AWS Secrets Manager values as secret-bearing even when only checking whether a secret is ready.
- Enable collector schedules only when the target environment has an approved Azure DevOps PAT secret.
- Production collector schedules must stay disabled until the production secret has been replaced with an approved value.
Safe Secret Readiness Check
This pattern checks whether a secret still contains the scaffold replacement marker without printing the secret. It prints empty when nothing was read (for example, when the aws call fails), placeholder when the marker is still present, and non-placeholder otherwise:
aws secretsmanager get-secret-value \
  --secret-id "$SKILLS_PAT_SECRET_ID" \
  --query SecretString \
  --output text |
  node -e 'let input="";process.stdin.on("data",(c)=>input+=c);process.stdin.on("end",()=>{const s=input.trim();process.stdout.write(s===""?"empty\n":s.includes("REPLACE_WITH_APPROVED_AZURE_DEVOPS_PAT")?"placeholder\n":"non-placeholder\n")})'
Set AWS_PROFILE and AWS_REGION to the intended account and region before running the check.
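Where node is not installed on the operator machine, the same three-way classification can be done in bash alone. This is a minimal sketch; classify_secret_marker is a hypothetical helper name, not something the Skills repository is known to ship. It reads the secret from stdin into a local variable and prints only a classification word.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Skills repository): classify a secret read from
# stdin without echoing it. Prints exactly one of: empty, placeholder, non-placeholder.
classify_secret_marker() {
  local value
  value="$(cat)"  # kept in a local variable only; never written to stdout or stderr
  if [ -z "$value" ]; then
    # Nothing was read, e.g. the aws call failed or the secret string is blank.
    printf 'empty\n'
  elif [[ "$value" == *REPLACE_WITH_APPROVED_AZURE_DEVOPS_PAT* ]]; then
    printf 'placeholder\n'
  else
    printf 'non-placeholder\n'
  fi
}

# Usage, with AWS_PROFILE and AWS_REGION already set:
# aws secretsmanager get-secret-value \
#   --secret-id "$SKILLS_PAT_SECRET_ID" \
#   --query SecretString \
#   --output text | classify_secret_marker
```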
Deploy
Development deploy with collector schedules enabled requires the existing approved dev secret ARN:
AWS_PROFILE=iDPCC-DEV-New \
AWS_REGION=us-east-1 \
STACK_NAME=skillz-idpcc-dev \
ENVIRONMENT_NAME=skills-idpcc \
COGNITO_DOMAIN_PREFIX=skills-idpcc-dev-852507783007 \
AZURE_DEVOPS_PAT_SECRET_ARN="$DEV_SKILLS_PAT_SECRET_ARN" \
DASHBOARD_DOMAIN_NAME=skills.idpcc.ceirr-network-dev.org \
DASHBOARD_HOSTED_ZONE_ID=Z05416731M9AYVKD2JHC \
scripts/deploy-skills-aws.sh
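If DEV_SKILLS_PAT_SECRET_ARN is unset in the operator's shell, the command above passes an empty AZURE_DEVOPS_PAT_SECRET_ARN, and how the deploy script treats that is not specified here. A minimal preflight sketch, assuming bash; require_env is a hypothetical helper that reports only variable names, never their values.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of scripts/deploy-skills-aws.sh): fail fast when
# any named variable is unset or empty. Only variable names are reported, never values.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable whose name is in $name.
    if [ -z "${!name:-}" ]; then
      printf 'missing: %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage before the development deploy:
# require_env DEV_SKILLS_PAT_SECRET_ARN || exit 1
```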
Production deploy keeps schedules disabled unless AZURE_DEVOPS_PAT_SECRET_ARN is explicitly set to an approved production secret:
AWS_PROFILE=iDPCC-PROD-PowerUser \
AWS_REGION=us-east-1 \
STACK_NAME=skillz-idpcc-prod \
ENVIRONMENT_NAME=skills-idpcc-prod \
COGNITO_DOMAIN_PREFIX=skills-idpcc-prod-643366142385 \
AZURE_DEVOPS_PAT_SECRET_NAME=skills/idpcc/prod/azure-devops-pat \
DASHBOARD_DOMAIN_NAME=skills.idpcc.ceirr-network.org \
DASHBOARD_HOSTED_ZONE_ID=Z03823772NH0I14KS76J9 \
scripts/deploy-skills-aws.sh
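The production command relies on AZURE_DEVOPS_PAT_SECRET_ARN being absent, so a value still exported from an earlier development session could be inherited by the deploy script. A minimal guard, assuming bash; guard_prod_secret_arn is a hypothetical name. Alternatively, `env -u AZURE_DEVOPS_PAT_SECRET_ARN` placed before the variable assignments removes the variable for that one command.

```shell
#!/usr/bin/env bash
# Hypothetical guard: refuse to continue when AZURE_DEVOPS_PAT_SECRET_ARN is already set,
# for example exported from an earlier development deploy in the same shell.
guard_prod_secret_arn() {
  if [ -n "${AZURE_DEVOPS_PAT_SECRET_ARN:-}" ]; then
    printf 'AZURE_DEVOPS_PAT_SECRET_ARN is set; unset it unless an approved production secret is intended\n' >&2
    return 1
  fi
  return 0
}

# Usage: run the guard, then the production deploy command from this section.
# guard_prod_secret_arn || exit 1
# Or strip the variable for a single command:
# env -u AZURE_DEVOPS_PAT_SECRET_ARN AWS_PROFILE=iDPCC-PROD-PowerUser <remaining variables as above> scripts/deploy-skills-aws.sh
```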
Manual Collector Run
After a development deploy with an approved secret, invoke the collector once before trusting the schedule:
aws lambda invoke \
--function-name "$SKILLS_COLLECTOR_FUNCTION_NAME" \
--payload '{}' \
--cli-binary-format raw-in-base64-out \
/tmp/skills-collector-result.json
Then run the recommendation function:
aws lambda invoke \
--function-name "$SKILLS_RECOMMENDATION_FUNCTION_NAME" \
--payload '{}' \
--cli-binary-format raw-in-base64-out \
/tmp/skills-recommendation-result.json
The Lambda responses should contain only counts and status fields. If an invocation fails because of source access or secret readiness, disable the schedules before finishing the ship.
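`aws lambda invoke` exits 0 when the Invoke call itself succeeds, even if the function raised an error; that failure shows up in the FunctionError field of the command's own output rather than in its exit status. A minimal wrapper sketch, assuming bash; invoke_checked is a hypothetical helper. It prints only the function name and the error type, never the response payload.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: invoke a Lambda function once with an empty payload and fail
# if the CLI call errors or the function reports FunctionError. The response payload
# goes to the outfile and is not printed here.
invoke_checked() {
  local function_name="$1" outfile="$2" function_error
  function_error="$(aws lambda invoke \
    --function-name "$function_name" \
    --payload '{}' \
    --cli-binary-format raw-in-base64-out \
    --query FunctionError \
    --output text \
    "$outfile")" || return 1
  # With --output text, an absent FunctionError field is rendered as "None".
  if [ "$function_error" != "None" ]; then
    printf '%s reported FunctionError=%s\n' "$function_name" "$function_error" >&2
    return 1
  fi
  printf '%s: ok\n' "$function_name"
}

# Usage:
# invoke_checked "$SKILLS_COLLECTOR_FUNCTION_NAME" /tmp/skills-collector-result.json &&
#   invoke_checked "$SKILLS_RECOMMENDATION_FUNCTION_NAME" /tmp/skills-recommendation-result.json
```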
Smoke Checks
Run these checks after each deploy:
curl -fsSI https://skills.idpcc.ceirr-network-dev.org/
curl -fsSI https://skills.idpcc.ceirr-network.org/
curl -fsS https://skills.idpcc.ceirr-network.org/data/skills.json
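To get a single pass/fail result for all three checks, they can be wrapped in a small function. A minimal sketch, assuming bash; check_url and run_smoke_checks are hypothetical names. Bodies and headers are discarded, so only one PASS or FAIL line per URL is printed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the three smoke checks above. Output from curl is
# discarded; one PASS/FAIL line is printed per URL, and run_smoke_checks returns
# non-zero if any check failed.
check_url() {
  local flags="$1" url="$2"
  if curl "$flags" "$url" >/dev/null; then
    printf 'PASS %s\n' "$url"
  else
    printf 'FAIL %s\n' "$url"
    return 1
  fi
}

run_smoke_checks() {
  local failed=0
  check_url -fsSI https://skills.idpcc.ceirr-network-dev.org/ || failed=1
  check_url -fsSI https://skills.idpcc.ceirr-network.org/ || failed=1
  check_url -fsS https://skills.idpcc.ceirr-network.org/data/skills.json || failed=1
  return "$failed"
}
```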
When the CloudFront default distribution domain is known, confirm it redirects to the meaningful hostname:
curl -sI "https://$SKILLS_CLOUDFRONT_DOMAIN/index.html?host-check=1" | sed -n '1,8p'
Expected result: HTTP 301 with a Location header that points at the skills.idpcc.ceirr-network... hostname.
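The same redirect check can be made pass/fail instead of read by eye, using curl's `--write-out` variables `http_code` and `redirect_url`. A minimal sketch, assuming bash; check_cloudfront_redirect is a hypothetical name, and it only checks that the redirect host starts with skills.idpcc.ceirr-network, which covers both the dev and prod hostnames in this runbook.

```shell
#!/usr/bin/env bash
# Hypothetical check for the redirect described above: pass only when the default
# CloudFront domain answers 301 and the redirect target's host starts with
# skills.idpcc.ceirr-network.
check_cloudfront_redirect() {
  local domain="$1" result code location host
  # -w prints the status code and the Location target curl would follow; the body is discarded.
  result="$(curl -s -o /dev/null -w '%{http_code} %{redirect_url}' \
    "https://$domain/index.html?host-check=1")"
  code="${result%% *}"
  location="${result#* }"
  host="${location#*://}"
  if [ "$code" = "301" ] && [[ "$host" == skills.idpcc.ceirr-network* ]]; then
    printf 'PASS %s -> %s\n' "$domain" "$location"
  else
    printf 'FAIL %s (status %s)\n' "$domain" "$code" >&2
    return 1
  fi
}

# Usage:
# check_cloudfront_redirect "$SKILLS_CLOUDFRONT_DOMAIN"
```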
Rollback
- Static-site rollback: redeploy from the previous merge commit and invalidate CloudFront.
- Collector safety rollback: redeploy without AZURE_DEVOPS_PAT_SECRET_ARN, or set COLLECTOR_SCHEDULE_STATE=DISABLED and RECOMMENDATION_SCHEDULE_STATE=DISABLED.
- CloudFront hostname issue: remove the custom-domain parameters only for an emergency test stack; production should normally keep the meaningful hostname.
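The collector safety rollback and the CloudFront invalidation step can be sketched as two small helpers. This assumes bash, and it assumes scripts/deploy-skills-aws.sh reads COLLECTOR_SCHEDULE_STATE and RECOMMENDATION_SCHEDULE_STATE from the environment the same way it reads the Deploy variables; rollback_disable_schedules, invalidate_site, and SKILLS_CLOUDFRONT_DISTRIBUTION_ID are hypothetical names. The schedule helper applies both rollback options at once.

```shell
#!/usr/bin/env bash
# Hypothetical rollback helpers. Export the Deploy-section variables (AWS_PROFILE,
# STACK_NAME, and so on) before calling rollback_disable_schedules.

# Collector safety rollback: redeploy with no secret ARN and both schedules DISABLED.
# The subshell keeps the unset from leaking into the caller's shell.
rollback_disable_schedules() {
  local deploy="${1:-scripts/deploy-skills-aws.sh}"
  (
    unset AZURE_DEVOPS_PAT_SECRET_ARN
    COLLECTOR_SCHEDULE_STATE=DISABLED \
    RECOMMENDATION_SCHEDULE_STATE=DISABLED \
      "$deploy"
  )
}

# Static-site rollback, after redeploying the previous merge commit: invalidate every
# cached path and print only the invalidation ID.
invalidate_site() {
  aws cloudfront create-invalidation \
    --distribution-id "$1" \
    --paths '/*' \
    --query 'Invalidation.Id' \
    --output text
}

# Usage:
# rollback_disable_schedules
# invalidate_site "$SKILLS_CLOUDFRONT_DISTRIBUTION_ID"
```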
Record every rollback or schedule-state change in Skills Status.
Note: the CloudFormation stack names still carry the original pre-rename spelling so existing Cognito users, DynamoDB data, and CloudFront resources are updated in place instead of recreated.