APEX Admin

Total Revenue

—

All time

Active Keys

—

Currently valid

Total Keys

—

All time issued

Expiring Soon

—

Within 48 hours

```

Live Skill Rating

👑

—

Auto-updates every 30 min

Rating History

Collecting data...

Waiting for first auto-refresh reading...

Recent Orders

Date	Email	Plan	Amount	Key
Loading...

Revenue Chart

📖 Overview Guide

Total Revenue

Total money earned from all Stripe purchases ever made.

Active Keys

Number of keys that are currently valid — not expired, not revoked.

Total Keys

Every key ever generated, including expired and revoked ones.

Expiring Soon

Keys expiring within the next 48 hours. Customers may need reminders.

Recent Orders

The last 8 purchases made through the website. Shows who bought which plan.

All Orders

Date	Email	Plan	Amount	Session ID
Loading...

📖 Orders Guide

Session ID

The unique Stripe transaction ID. Use this if you need to look up a payment in the Stripe dashboard.

Filter orders by customer email to find a specific person's purchase history.

Plans

Daily = $15 (3 days), Weekly = $20 (7 days), Monthly = $50 (30 days), Lifetime = $200 (never expires).

Key Manager

Show Revoked

⏱ Bulk Extend All Active Keys

Adds days to every non-revoked, non-lifetime key at once. Use after downtime or updates to compensate customers. Lifetime keys are automatically skipped.

Amount to Add

Unit

Reason (optional)

Plan Type

Quantity

Note (optional)

Key	Plan	Status	HWID	Expires	Actions
Loading...

📖 Key Manager Guide

HWID Bound

Once a customer activates their key, it locks to their PC hardware. "Unbound" means they haven't used it yet.

Reset HWID

Clears the machine binding so a customer can activate on a new PC. Use when they upgrade or reinstall.

Extend

Add days to an existing key without revoking it. Great for comp prizes, loyalty rewards, or fixing issues.

Revoke

Instantly cuts off a customer's access. Use for chargebacks, rule violations, or fraud.

Restore

Re-enables a previously revoked key. Use if you revoked by mistake or resolved a dispute.

Bulk Generate

Generate up to 50 keys at once. All keys appear in the display box — copy them for giveaways or events.

Customers

Email	Plan	Key	Status	Expires	Notes
Loading...

📖 Customers Guide

Status: Active

Customer's key is valid and working. They have full access to APEX.

Status: Expired

Their key ran out. They need to buy a new plan to regain access.

Status: Revoked

Access was manually cut off by staff. Go to Key Manager to restore if needed.

Notes

Private staff notes on a customer. Click the note field to edit; changes are saved automatically and are never visible to customers.

Discord Lookup

Enter a Discord User ID to find all keys associated with that person.

📖 How to Find a Discord User ID

Enable Developer Mode

In Discord, go to Settings → Advanced → turn on Developer Mode.

Copy User ID

Right-click any user's name in Discord → Click "Copy User ID". Paste it in the field above.

Announcements

Post a banner announcement on the public website. Customers will see it at the top of the page.

Announcement Text

📖 Announcements Guide

Post Announcement

Saves the text to Firebase and immediately shows it as a banner at the top of apexbot.store for all visitors.

Clear Announcement

Removes the banner from the website instantly. Use when the announcement is no longer relevant.

Good uses

New training milestone, upcoming maintenance, limited-time discount, new feature launch, major bot update.

Launcher

Push a new version label to all launchers. Customers see the new label the next time they start the launcher.

Version Label

Channel

📖 Launcher Guide

Version Label

The text shown in the launcher's "VERSION" stat box (e.g. "BETA 0.6"). Customers see this when they open the launcher.

Channel

Reserved for future beta/stable/dev channels. For now it is informational only and doesn't change customer behavior.

When does it update

Customers see the new version on their next launcher startup. The endpoint is edge-cached for 60 seconds, so it can take up to a minute to propagate worldwide.

When to update

Whenever you ship new launcher logic, a new SDK build, or a new model: anything customers should recognize as "the new version".

Training Monitor

Loading wandb data...

👑

Skill Rating — 1v1

—

Estimated competitive MMR based on simulated 1v1 performance

Manual Override

Override shown on website rank card

Total Timesteps

—

Total Iterations

—

PPO updates

Overall Steps/Sec

—

Training speed

Episode Length

—

Avg game length

Avg Step Reward

—

Higher = better

Policy Entropy

—

Exploration level

Policy Update Mag

—

Learning rate indicator

Critic Update Mag

—

Value learning

Collection Steps/Sec

—

Data gathering speed

Consumption Steps/Sec

—

Learning speed

Player Speed

—

Avg velocity

In Air Ratio

—

Time airborne

Ball Touch Ratio

—

Touch frequency

Run Info

Run Name

—

Project

gigalearncpp

Entity

ttvabstractonyt-apex-reinforcement-learning

Last Updated

—

Rating Trends

Multi-window trends — 1h / 6h / 24h / 7d, velocity, peak.

1 Hour Avg

—

6 Hour Avg

—

24 Hour Avg

—

7 Day Avg

—

All-Time Peak

—

Velocity (6h)

—

points/hour

Direction

—

Min (30d)

—

rollback reference

Rating Chart

Orange = rating · Gold dashed = 24h rolling avg · Vertical dashed = deploy markers

Rollback Triggers

Click Refresh to evaluate.

Deploy Log

When	Phase	Rating @ Deploy	Current Δ	Description
No deploys logged yet.

Reward Health

Fire-rate-based stats — replaces the v1 “latest value” check that falsely flagged sparse rewards as dormant.

Active

—

fires >5% of samples

Sparse

—

0.1–5% — events

Rare

—

<0.1% — rare events

Dormant

—

weight>0, zero fires

Reward	Weight	Fire Rate	Mean (firing)	Peak	Effective	Status
Click Refresh to load.

⚙ Reward Weight Config

⚙ Deployed Rewards Allowlist

Aerial Engagement

Score

—

/ 10

Sanity Checks

Reward Landscape

Effective signal weight by group. Intent-check: does the landscape match what you meant to train?

Group	Rewards	Effective Sum	% of Total	Share
Click Refresh.

⚙ Reward Group Config

Behavior Metrics

Current vs pre-deploy baseline.

Metric	Current	Baseline	Δ (abs)	Δ %
Click Refresh.

📖 Training Stats Guide

Core Training Metrics

Skill Rating (1v1)

Estimated MMR from internal self-play 1v1 matches against past checkpoints. Higher = stronger. SSL starts ~2200. Climbing = improving; stalled or dropping = regression.

Total Timesteps

Simulation steps trained on. Accumulates forever. Useful for comparing run progress: ~10B steps = early, ~50B+ = mature.

Total Iterations

PPO network update cycles. Each iteration batches 50k steps and runs a policy + critic update. 1M+ iterations = deep training.

Steps/Sec (Overall)

End-to-end throughput. GPU target is 50k–100k+. Below 20k suggests CPU bottleneck or heavy reward computation.

Collection SPS

How fast RocketSim generates gameplay data. Should be close to Overall SPS — if much lower, the env is the bottleneck.

Consumption SPS

How fast the neural net learns from collected data. Should exceed Collection SPS; if it does not, the learner (GPU) cannot keep up with the collected data and is the bottleneck.

Avg Step Reward

Mean reward per step across all agents. Expected to rise over training. Sudden drops = regression or reward weight change.

Policy Entropy

How much exploration the bot is doing. Healthy zone is 0.4–0.7. Below 0.3 = overconfident / stuck. Above 1.5 = too random / broken.

Policy Update Mag

Size of weight changes each iteration. 0.01–0.05 is healthy. Spikes above 0.1 = instability; near-zero = stuck / learning collapsed.

Critic Update Mag

Same for the value estimator. Should roughly track Policy Update Mag. A large divergence between the two indicates instability.

Episode Length

Average game length in steps before a reset (goal or timeout). Longer = more meaningful gameplay per episode. 500+ is typical; rising usually means less aimless behavior.

Player Speed

Average velocity across all agents. ~1000–1300 = active play. Dropping sharply = agents stopped moving (bad).

In Air Ratio

Fraction of time airborne. 40%–60% is healthy for aerial-heavy training. Above 68% = likely abusing AirReward instead of landing plays.

Ball Touch Ratio

Fraction of steps where a ball touch occurred. Low (2–4%) is normal — ball is only touched briefly. Near zero = passive / not engaging.

Rating & Trends Panels

1h/6h/24h/7d Avg

Rolling averages of skill rating. The delta vs current tells you if the bot is above or below its recent trend. Current above 6h = momentum up; below 24h = softening.

All-Time Peak

Highest rating ever reached in this run, with how long ago. If peak is recent and current is close to it, you're at the top. If peak is days old, you're in a drawdown.

Velocity (6h)

Points gained or lost per hour from a linear fit over the last 6 hours. Positive = climbing, negative = declining. Anything within ±1 pt/hr counts as stable.

Direction

CLIMBING (strongly up), RECOVERING (modestly up from a dip), STABLE (sideways), SOFTENING (modestly down), DECLINING (sharp down). Quick health indicator.

30-Day Min

Lowest rating in the last 30 days. Acts as a floor reference for rollback decisions — if you dip near this, something regressed.

Rating Chart

Orange line = raw rating. Gold dashed = 24-hour rolling average (smoothed). Vertical orange dashed lines = deploy markers from the Deploy Log. Gold dot = peak, bigger orange dot = current.

Reward Panels

Fire Rate

Percentage of sampled steps where the reward was non-zero. Continuous rewards (velocity, air) should be ~100%. Event rewards (goals, flip resets) are expected to be much lower.

Status: ACTIVE

Fire rate ≥ 5%. Reward is contributing to the training signal regularly. This is the default healthy state for continuous rewards.

Status: SPARSE

Fire rate 0.1%–5%. Normal for event-based rewards (flip resets, double taps, ceiling shots). Seeing SPARSE on an event reward is good.

Status: RARE

Fire rate below 0.1%. Very rare events — should be reserved for milestones. If a reward you expect often shows RARE, investigate state setters or terminal conditions.

Status: DORMANT

Zero fires in the sampled window despite weight > 0. Means sampling missed all firings OR the reward's state setters never trigger it. Cross-reference with the wandb chart before assuming it's broken.

Status: DISABLED

Weight is 0. Reward is loaded but not contributing. Either intentional (turned off) or needs its weight set.

Mean (firing)

Average value when the reward does fire. Tells you the magnitude. At equal weight, a reward firing 1% of the time with mean 0.5 (0.01 × 0.5 = 0.005) contributes more than one firing 100% of the time with mean 0.001 (1 × 0.001 = 0.001).

Effective

fire_rate × mean × weight — the actual signal contribution. This is what matters for training. Sort by this to see which rewards dominate.

Reward Landscape

Aggregate Effective signal by group (ground / aerial / defense / kickoff / terminal). Intent check: does the group balance match what you meant to train? If you're pushing aerial but ground is 60%, you're actually training ground.

Aerial Engagement Score

Composite 0–10 score combining In-Air Ratio, aerial signal share of total, and AirDribbleChain fire rate. Rough target is 7+. Higher = aerial behavior is firing as intended.

Stability & Deploys

Rollback Triggers

Hard rules (floors, entropy range, velocity). If any trigger fires, the banner turns red. This does NOT auto-rollback — it only tells you to evaluate. Rollback decisions are always manual.

Sanity Checks

Broader health indicators beyond rollback triggers: update magnitudes, SPS targets, wandb freshness, in-air ceiling. ✓ pass, ⚠ watch, ✗ fail. A few ⚠ is normal; any ✗ warrants attention.

Deploy Log

Record of every reward weight / state distribution change you've made. Each entry captures the rating and metrics at deploy time (the "baseline"). Log a deploy every time you change reward weights or state setters.

Behavior Metrics

Current values vs the baseline from your latest Deploy Log entry. Shows what's changed since your last phase. If In Air Ratio is up 4 percentage points and rating dropped, the phase change likely caused the regression.

Capture Baseline

Shortcut button on Behavior Metrics — saves current metrics as a baseline without requiring a full deploy entry. Use this if you just want a quick "remember this state" reference.

Copy Full Snapshot

Generates a complete markdown report (every panel above) and copies it to your clipboard. Paste it into your training chat so another Claude instance has full context for phase-change decisions.

Typical Workflows

"Is training OK right now?"

Scroll to Rollback Triggers + Sanity Checks. Green across the board = fine. Then glance at Rating Trends Direction — CLIMBING or RECOVERING is good.

"Did my last change help or hurt?"

Look at Rating Chart (deploy marker shows when). Check Behavior Metrics — which metrics shifted? Cross-reference with what your phase change targeted.

"Should I make a new change?"

Check Direction. If it has been STABLE for hours with velocity near 0, training has plateaued and is ready for the next phase. If it is CLIMBING or DECLINING, wait for it to stabilize first.

"Making a phase change"

Click Log Deploy → fill in phase name + description of weight changes → Save. This captures baseline + places a marker on the chart. Then edit your training code and restart.

"Decision time, need Claude's help"

Click Copy Full Snapshot → paste into training chat → ask for phase recommendation. The markdown contains everything Claude needs.

SDK DLL

Ready.

Single slot. Uploading replaces the live DLL atomically. Launchers fetch the new build on next startup.

Bot Library

How it works

The bot library lives in Firebase under /bots/. Each bot has metadata (name, tags, inference shape) and a list of versions. Model files live in R2 at bots/<id>/<version>.onnx.

The launcher reads /api/bots/list on startup, shows the user a picker, then fetches the chosen bot via /api/model/fetch. Default bot is what new launchers select before the user picks anything. Current version is the version served when no specific version is requested.

```
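The Bot Library rules above (default bot when nothing is picked, current version when no version is requested) can be sketched as a small resolution function. This is only an illustration: the `LIBRARY` dictionary and its field names (`default_bot`, `current_version`, `versions`) and the bot IDs are hypothetical and do not reflect the real Firebase schema. Only the `bots/<id>/<version>.onnx` key layout comes from the text.

```python
from typing import Optional

# Hypothetical snapshot of the /bots/ metadata; all names here are assumptions.
LIBRARY = {
    "default_bot": "apex-main",
    "bots": {
        "apex-main": {"current_version": "v3", "versions": ["v1", "v2", "v3"]},
        "apex-aerial": {"current_version": "v1", "versions": ["v1"]},
    },
}


def resolve_model_key(library: dict, bot_id: Optional[str] = None,
                      version: Optional[str] = None) -> str:
    """Return the R2 object key for a bot, applying the fallback rules."""
    # No bot chosen yet: fall back to the library's default bot.
    chosen_id = bot_id or library["default_bot"]
    bot = library["bots"].get(chosen_id)
    if bot is None:
        raise KeyError(f"unknown bot: {chosen_id}")
    # No version requested: serve the bot's current version.
    chosen_version = version or bot["current_version"]
    if chosen_version not in bot["versions"]:
        raise ValueError(f"unknown version {chosen_version!r} for {chosen_id}")
    # Key layout taken from the text: bots/<id>/<version>.onnx
    return f"bots/{chosen_id}/{chosen_version}.onnx"
```

Calling `resolve_model_key(LIBRARY)` with no arguments models a fresh launcher that has not picked anything yet; passing both a bot ID and a version models an explicit request.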
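The Reward Panels guide defines the status bands by fire rate and the Effective column as fire_rate × mean × weight. A minimal sketch of those rules follows. The function names are hypothetical, fire rate is assumed to be a fraction between 0 and 1, and the 5% boundary is assumed to belong to ACTIVE, as the guide text states. The dashboard's own implementation may differ.

```python
def reward_status(weight: float, fire_rate: float) -> str:
    """Classify a reward using the bands from the Reward Panels guide.

    fire_rate is a fraction of sampled steps (0.05 means 5%).
    """
    if weight == 0:
        return "DISABLED"   # loaded but not contributing
    if fire_rate == 0:
        return "DORMANT"    # weighted, but zero fires in the sampled window
    if fire_rate >= 0.05:
        return "ACTIVE"     # fires on at least 5% of samples
    if fire_rate >= 0.001:
        return "SPARSE"     # 0.1% to 5%, normal for event rewards
    return "RARE"           # below 0.1%


def effective_signal(fire_rate: float, mean_when_firing: float,
                     weight: float) -> float:
    """Effective contribution: fire_rate x mean (when firing) x weight."""
    return fire_rate * mean_when_firing * weight
```

With equal weights, `effective_signal(0.01, 0.5, 1.0)` is larger than `effective_signal(1.0, 0.001, 1.0)`, which matches the guide's point that a rarely firing reward with a large mean can outweigh a constantly firing reward with a tiny one.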