
AI Did Not Replace Junior Developers. It Created Fake Senior Code.
AI-generated backend code is not hard to reject when it is obviously broken. The harder case is code that looks organized and still skips production rules.
This study uses a small Laravel wallet-transfer project. One user sends credits to another user. The same feature is implemented twice: once as code that looks clean but misses key boundaries, and once as production-aware code with validation, authorization, transactions, row locks, failure records, logs, and tests.
The study command runs 16 scenarios against both services. The fake service passed 9 and failed 7, scoring 51.7 out of 100. The real service passed all 16 and scored 100 out of 100.
The Claim
The problem is not that AI can generate bad code. Developers wrote bad code before AI. The current risk is different: generated code can look complete before it has handled the rules that decide whether it is safe to ship.
The fake implementation in this project is not nonsense. It has a service class, readable method names, a shared result object, a numeric amount check, a negative amount check, and a successful transaction record. On the happy path, it works.
That is still not enough for wallet logic. A transfer service must check who is spending, reject invalid amounts, prevent self-transfer, keep balances consistent, record failed attempts, and update both wallets inside one database transaction.
External Context
The evidence around AI coding tools is mixed. The GitHub Copilot productivity study reported faster completion on a bounded JavaScript HTTP-server task. A later METR study on experienced open-source developers found slower completion when AI tools were allowed in familiar repositories.
Stack Overflow's 2025 Developer Survey on AI reported that more developers distrusted the accuracy of AI output than trusted it. The DORA 2025 State of AI-assisted Software Development report also points to the same practical issue: tools do not replace the engineering system around them.
This project stays smaller. It does not ask whether AI is good or bad in general. It asks whether one Laravel service survives 16 explicit backend checks.
Project Shape
The project is called fake-senior-code-laravel-wallet. It is a runnable Laravel app with migrations, models, services, a policy, tests, a scoring helper, and an Artisan command.
The domain uses Laravel's default users table and two wallet tables:
wallets: one wallet per user, with a decimal balance.wallet_transactions: one row for each successful or failed transfer attempt.
Both services expose the same method:
public function transfer(
User $actor,
int $senderId,
int $receiverId,
mixed $amount
): TransferResult;That keeps the comparison narrow. Same input. Same expected behavior. Different implementation quality.
Wallet Contract
The benchmark does not model a full payment system. It checks the minimum backend rules a wallet transfer should obey.
| Item | Rule |
|---|---|
actor | Must be allowed to spend from the sender wallet. |
sender_id | Must identify an existing wallet with enough balance. |
receiver_id | Must identify an existing wallet and must not equal the sender. |
amount | Must be numeric, greater than zero, and valid to two decimal places. |
| Balance update | Debit and credit must happen inside one database transaction. |
| Wallet rows | Rows should be locked before balance calculation. |
| Failure path | Failed attempts should be recorded with a reason. |
The Two Services
FakeSeniorTransferService looks reasonable at first glance. It checks whether the amount is numeric. It rejects negative amounts. It loads sender and receiver wallets. It checks available balance. It updates balances. It records a successful transaction.
The missing parts are the issue:
No authorization check. An actor can spend from another user's wallet.
0.00is treated as a successful transfer.Self-transfer is allowed and can corrupt the balance.
No
DB::transaction.No
lockForUpdate.Most failed attempts are not written to
wallet_transactions.
RealSeniorTransferService handles the same feature with explicit checks: amount normalization, self-transfer rejection, actor ownership, wallet policy, wallet existence checks, transaction wrapper, row locking, failed transaction rows, and warning logs.
Scenario Protocol
The command php artisan wallet:fake-senior-study creates fresh users and wallets for each case. It runs both services through the same scenario set and scores the result.
The measured scenario count is:
S = 16The core scenarios are:
successful transfer
insufficient balance
transfer to self
zero amount
negative amount
missing sender wallet
missing receiver wallet
unauthorized transfer attempt
repeated transfer attempt
transaction history consistency
failure reason logging
balance unchanged after failed transfer
The remaining checks cover happy-path coverage, failure-path coverage, structured result object, and stable service API.
Scoring
The Production Readiness Score uses a weighted checklist:
PRS =
0.20 Validation
+ 0.20 Authorization
+ 0.20 Database Consistency
+ 0.15 Failure Handling
+ 0.15 Test Coverage
+ 0.10 MaintainabilityEach category score is based on passed checks:
Category Score = Passed Checks / Total Checks x 100Measured Results
The project was run locally with:
php artisan test
php artisan wallet:fake-senior-study --jsonThe test suite passed: 20 tests and 72 assertions.
| Service | Passed | Failed | Score |
|---|---|---|---|
| Fake Senior Code | 9 | 7 | 51.7 / 100 |
| Real Senior Code | 16 | 0 | 100 / 100 |
The core scenario result:
| Scenario | Fake | Real |
|---|---|---|
| Happy path result | PASS | PASS |
| Transfer to self | FAIL | PASS |
| Zero amount | FAIL | PASS |
| Negative amount | PASS | PASS |
| Insufficient balance | PASS | PASS |
| Missing sender wallet | FAIL | PASS |
| Missing receiver wallet | FAIL | PASS |
| Unauthorized transfer attempt | FAIL | PASS |
| Repeated transfer attempt | PASS | PASS |
| Transaction history consistency | FAIL | PASS |
| Failure reason logging | FAIL | PASS |
| Balance unchanged on failed transfer | PASS | PASS |
Why Some Scores Match
The fake service scored 100% in the measured database-consistency category. That does not make it safe under production traffic.
That category only checked sequential cases: balances after success, insufficient balance, repeated transfer, and failed transfer. The fake service passed those checks. It still lacks DB::transaction and lockForUpdate. This benchmark does not run parallel workers against the same sender wallet.
A stronger version should split that category into two checks:
sequential consistency
concurrency safety
Test coverage and maintainability also match because both services expose the same method and return TransferResult. That is the point. Fake senior code can have structure and still miss the rules that matter.
Reading the Numbers
The fake service passed 9 of 16 scenarios. The passes are real. The happy path works. Negative amounts are rejected. Basic insufficient-balance behavior leaves balances unchanged.
The failures matter more for shipping. Unauthorized transfers work. Zero-value transfers are accepted. Self-transfer can corrupt the wallet balance. Missing-wallet failures are not recorded. Failure reasons are not stored. The update path has no explicit transaction or row lock.
The real service passed all 16 scenarios. That does not make it a complete payment system. It means it satisfies the contract defined in this project.
Practical Notes
The benchmark uses SQLite. Check locking behavior and performance again on MySQL or PostgreSQL before using this as a production pattern.
The concurrency scenario is not a full concurrent load test. It verifies the real service has transaction and row-lock boundaries, and it checks repeated transfer behavior sequentially. A next version should run parallel requests against the same wallet.
A real wallet system should also add idempotency keys, immutable ledger rows, currency handling, retry behavior, reconciliation, and stronger audit rules.
Conclusion
The result is simple: the fake service passed 9 of 16 scenarios and scored 51.7. The real service passed 16 of 16 and scored 100.
The fake service is useful because it is not bad everywhere. It has structure. It passes simple checks. It fails when the test asks about ownership, invalid amounts, failure history, and database boundaries.
For AI-assisted backend work, do not stop at a service class and a success response. Define the production rules and test them. In this Laravel project, 16 scenarios were enough to show the difference.
For anyone who wants to dive deeper, I’ve attached the full study as a PDF here with all its details.

