Total Revenue
β
All time
Active Keys
β
Currently valid
Total Keys
β
All time issued
Expiring Soon
β
Within 48 hours
Live Skill Rating
π
β
β
Auto-updates every 30 min
Rating History
Collecting data...
Waiting for first auto-refresh reading...
Recent Orders
| Date | Plan | Amount | Key | |
|---|---|---|---|---|
| Loading... | ||||
Revenue Chart
π Overview Guide
Total Revenue
Total money earned from all Stripe purchases ever made.
Active Keys
Number of keys that are currently valid β not expired, not revoked.
Total Keys
Every key ever generated including expired and revoked ones.
Expiring Soon
Keys expiring within the next 48 hours. Customers may need reminders.
Recent Orders
The last 8 purchases made through the website. Shows who bought what plan.
All Orders
| Date | Plan | Amount | Session ID | |
|---|---|---|---|---|
| Loading... | ||||
π Orders Guide
Session ID
The unique Stripe transaction ID. Use this if you need to look up a payment in Stripe dashboard.
Search
Filter orders by customer email to find a specific person's purchase history.
Plans
Daily = $15 (3 days), Weekly = $20 (7d), Monthly = $50 (30d), Lifetime = $200 (forever).
Key Manager
β± Bulk Extend All Active Keys
Adds days to every non-revoked, non-lifetime key at once. Use after downtime or updates to compensate customers. Lifetime keys are automatically skipped.
| Key | Plan | Status | HWID | Expires | Actions |
|---|---|---|---|---|---|
| Loading... | |||||
π Key Manager Guide
HWID Bound
Once a customer activates their key, it locks to their PC hardware. "Unbound" means they haven't used it yet.
Reset HWID
Clears the machine binding so a customer can activate on a new PC. Use when they upgrade or reinstall.
Extend
Add days to an existing key without revoking it. Great for comp prizes, loyalty rewards, or fixing issues.
Revoke
Instantly cuts off a customer's access. Use for chargebacks, rule violations, or fraud.
Restore
Re-enables a previously revoked key. Use if you revoked by mistake or resolved a dispute.
Bulk Generate
Generate up to 50 keys at once. All keys appear in the display box β copy them for giveaways or events.
Customers
| Plan | Key | Status | Expires | Notes | |
|---|---|---|---|---|---|
| Loading... | |||||
π Customers Guide
Status: Active
Customer's key is valid and working. They have full access to APEX.
Status: Expired
Their key ran out. They need to buy a new plan to regain access.
Status: Revoked
Access was manually cut off by staff. Go to Key Manager to restore if needed.
Notes
Private staff notes on a customer. Click the note field to edit. Saved automatically. Never visible to customers.
Discord Lookup
Enter a Discord User ID to find all keys associated with that person.
π How to Find a Discord User ID
Enable Developer Mode
In Discord, go to Settings β Advanced β turn on Developer Mode.
Copy User ID
Right-click any user's name in Discord β Click "Copy User ID". Paste it in the field above.
Announcements
Post a banner announcement on the public website. Customers will see it at the top of the page.
π Announcements Guide
Post Announcement
Saves the text to Firebase and immediately shows it as a banner at the top of apexbot.store for all visitors.
Clear Announcement
Removes the banner from the website instantly. Use when the announcement is no longer relevant.
Good uses
New training milestone, upcoming maintenance, limited time discount, new feature launch, major bot update.
Launcher
Push a new version label to all running launchers. The label updates on customers' next launcher startup.
π Launcher Guide
Version Label
The text shown in the launcher's "VERSION" stat box (e.g. "BETA 0.6"). Customers see this when they open the launcher.
Channel
For future use β beta/stable/dev channels. For now just informational; doesn't change customer behavior.
When does it update
Customers see the new version on their next launcher startup. The endpoint is edge-cached for 60 seconds, so it can take up to a minute to propagate worldwide.
When to update
Whenever you ship new launcher logic, a new SDK build, a new model β anything customers should know is "the new version".
Training Monitor
Loading wandb data...
π
Skill Rating β 1v1
β
β
Estimated competitive MMR based on simulated 1v1 performance
Manual Override
Override shown on website rank card
Total Timesteps
β
β
Total Iterations
β
PPO updates
Overall Steps/Sec
β
Training speed
Episode Length
β
Avg game length
Avg Step Reward
β
Higher = better
Policy Entropy
β
Exploration level
Policy Update Mag
β
Learning rate indicator
Critic Update Mag
β
Value learning
Collection Steps/Sec
β
Data gathering speed
Consumption Steps/Sec
β
Learning speed
Player Speed
β
Avg velocity
In Air Ratio
β
Time airborne
Ball Touch Ratio
β
Touch frequency
Run Info
Run Name
β
Project
gigalearncpp
Entity
ttvabstractonyt-apex-reinforcement-learning
Last Updated
β
Rating Trends
Multi-window trends β 1h / 6h / 24h / 7d, velocity, peak.
1 Hour Avg
β
β
6 Hour Avg
β
β
24 Hour Avg
β
β
7 Day Avg
β
β
All-Time Peak
β
β
Velocity (6h)
β
points/hour
Direction
β
β
Min (30d)
β
rollback reference
Rating Chart
Orange = rating Β· Gold dashed = 24h rolling avg Β· Vertical dashed = deploy markers
Rollback Triggers
Click Refresh to evaluate.
Deploy Log
| When | Phase | Rating @ Deploy | Current Ξ | Description |
|---|---|---|---|---|
| No deploys logged yet. | ||||
Reward Health
Fire-rate-based stats β replaces the v1 βlatest valueβ check that falsely flagged sparse rewards as dormant.
Active
β
fires >5% of samples
Sparse
β
0.1β5% β events
Rare
β
<0.1% β rare events
Dormant
β
weight>0, zero fires
| Reward | Weight | Fire Rate | Mean (firing) | Peak | Effective | Status |
|---|---|---|---|---|---|---|
| Click Refresh to load. | ||||||
β Reward Weight Config
[expand]
β Deployed Rewards Allowlist
[expand]
Aerial Engagement
Score
β
/ 10
Sanity Checks
Loading...
Reward Landscape
Effective signal weight by group. Intent-check: does the landscape match what you meant to train?
| Group | Rewards | Effective Sum | % of Total | Share |
|---|---|---|---|---|
| Click Refresh. | ||||
β Reward Group Config
[expand]
Behavior Metrics
Current vs pre-deploy baseline.
| Metric | Current | Baseline | Ξ (abs) | Ξ % |
|---|---|---|---|---|
| Click Refresh. | ||||
π Training Stats Guide
Core Training Metrics
Skill Rating (1v1)
Estimated MMR from internal self-play 1v1 matches against past checkpoints. Higher = stronger. SSL starts ~2200. Climbing = improving; stalled or dropping = regression.
Total Timesteps
Simulation steps trained on. Accumulates forever. Useful for comparing run progress: ~10B steps = early, ~50B+ = mature.
Total Iterations
PPO network update cycles. Each iteration batches 50k steps, runs a policy + critic update. 1M+ iterations = deep training.
Steps/Sec (Overall)
End-to-end throughput. GPU target is 50kβ100k+. Below 20k suggests CPU bottleneck or heavy reward computation.
Collection SPS
How fast RocketSim generates gameplay data. Should be close to Overall SPS β if much lower, the env is the bottleneck.
Consumption SPS
How fast the neural net learns from collected data. Should exceed Collection SPS; if not, GPU is idle waiting for data.
Avg Step Reward
Average reward per step averaged across all agents. Expected to rise over training. Sudden drops = regression or reward weight change.
Policy Entropy
How much exploration the bot is doing. Healthy zone is 0.4β0.7. Below 0.3 = overconfident / stuck. Above 1.5 = too random / broken.
Policy Update Mag
Size of weight changes each iteration. 0.01β0.05 is healthy. Spikes above 0.1 = instability; near-zero = stuck / learning collapsed.
Critic Update Mag
Same for the value estimator. Should track Policy mag roughly. Big divergence between the two signals instability.
Episode Length
Average game length in steps before a reset (goal or timeout). Longer = more meaningful gameplay per episode. 500+ is typical; rising usually means less aimless behavior.
Player Speed
Average velocity across all agents. ~1000β1300 = active play. Dropping sharply = agents stopped moving (bad).
In Air Ratio
Fraction of time airborne. 40%β60% is healthy for aerial-heavy training. Above 68% = likely abusing AirReward instead of landing plays.
Ball Touch Ratio
Fraction of steps where a ball touch occurred. Low (2β4%) is normal β ball is only touched briefly. Near zero = passive / not engaging.
Rating & Trends Panels
1h/6h/24h/7d Avg
Rolling averages of skill rating. The delta vs current tells you if the bot is above or below its recent trend. Current above 6h = momentum up; below 24h = softening.
All-Time Peak
Highest rating ever reached in this run, with how long ago. If peak is recent and current is close to it, you're at the top. If peak is days old, you're in a drawdown.
Velocity (6h)
Points gained or lost per hour from a linear fit over the last 6 hours. Positive = climbing, negative = declining. Β±1 pt/hr is stable.
Direction
CLIMBING (strongly up), RECOVERING (modestly up from a dip), STABLE (sideways), SOFTENING (modestly down), DECLINING (sharp down). Quick health indicator.
30-Day Min
Lowest rating in the last 30 days. Acts as a floor reference for rollback decisions β if you dip near this, something regressed.
Rating Chart
Orange line = raw rating. Gold dashed = 24-hour rolling average (smoothed). Vertical orange dashed lines = deploy markers from the Deploy Log. Gold dot = peak, bigger orange dot = current.
Reward Panels
Fire Rate
Percentage of sampled steps where the reward was non-zero. Continuous rewards (velocity, air) should be ~100%. Event rewards (goals, flip resets) are expected to be much lower.
Status: ACTIVE
Fire rate β₯ 5%. Reward is contributing to the training signal regularly. This is the default healthy state for continuous rewards.
Status: SPARSE
Fire rate 0.1%β5%. Normal for event-based rewards (flip resets, double taps, ceiling shots). Seeing SPARSE on an event reward is good.
Status: RARE
Fire rate below 0.1%. Very rare events β should be reserved for milestones. If a reward you expect often shows RARE, investigate state setters or terminal conditions.
Status: DORMANT
Zero fires in the sampled window despite weight > 0. Means sampling missed all firings OR the reward's state setters never trigger it. Cross-reference with the wandb chart before assuming it's broken.
Status: DISABLED
Weight is 0. Reward is loaded but not contributing. Either intentional (turned off) or needs its weight set.
Mean (firing)
Average value when the reward does fire. Tells you the magnitude. A reward firing 1% of time with mean 0.5 may be more impactful than one firing 100% of time with mean 0.001.
Effective
fire_rate Γ mean Γ weight β the actual signal contribution. This is what matters for training. Sort by this to see which rewards dominate.
Reward Landscape
Aggregate Effective signal by group (ground / aerial / defense / kickoff / terminal). Intent check: does the group balance match what you meant to train? If you're pushing aerial but ground is 60%, you're actually training ground.
Aerial Engagement Score
Composite 0β10 score combining In-Air Ratio, aerial signal share of total, and AirDribbleChain fire rate. Rough target is 7+. Higher = aerial behavior is firing as intended.
Stability & Deploys
Rollback Triggers
Hard rules (floors, entropy range, velocity). If any trigger fires, the banner turns red. This does NOT auto-rollback β it only tells you to evaluate. Rollback decisions are always manual.
Sanity Checks
Broader health indicators beyond rollback triggers: update magnitudes, SPS targets, wandb freshness, in-air ceiling. β pass, β watch, β fail. A few β is normal; any β warrants attention.
Deploy Log
Record of every reward weight / state distribution change you've made. Each entry captures the rating and metrics at deploy time (the "baseline"). Log a deploy every time you change reward weights or state setters.
Behavior Metrics
Current values vs the baseline from your latest Deploy Log entry. Shows what's changed since your last phase. If In Air Ratio is +4pp and rating dropped, the phase change likely caused regression.
Capture Baseline
Shortcut button on Behavior Metrics β saves current metrics as a baseline without requiring a full deploy entry. Use this if you just want a quick "remember this state" reference.
Copy Full Snapshot
Generates a complete markdown report (every panel above) and copies it to your clipboard. Paste into your training chat so another Claude instance has full context for Phase change decisions.
Typical Workflows
"Is training OK right now?"
Scroll to Rollback Triggers + Sanity Checks. Green across the board = fine. Then glance at Rating Trends Direction β CLIMBING or RECOVERING is good.
"Did my last change help or hurt?"
Look at Rating Chart (deploy marker shows when). Check Behavior Metrics β which metrics shifted? Cross-reference with what your phase change targeted.
"Should I make a new change?"
Check Direction. If it's STABLE for hours and velocity ~0 = plateau, ready for the next phase. If CLIMBING or DECLINING = wait for it to stabilize first.
"Making a phase change"
Click Log Deploy β fill in phase name + description of weight changes β Save. This captures baseline + places a marker on the chart. Then edit your training code and restart.
"Decision time, need Claude's help"
Click Copy Full Snapshot β paste into training chat β ask for phase recommendation. The markdown contains everything Claude needs.
SDK DLL
Ready.
Single slot. Uploading replaces the live DLL atomically. Launchers fetch the new build on next startup.
Bot Library
Loading...
How it works
Bot library lives in Firebase /bots/. Each bot has metadata (name, tags, inference shape) and a list of versions.
Model files live in R2 at bots/<id>/<version>.onnx.The launcher reads
/api/bots/list on startup, shows the user a picker, then fetches the chosen bot via /api/model/fetch.
Default bot is what new launchers select before the user picks anything.
Current version is the version served when no specific version is requested.