Silicon Valley AI bait-and-switch

vbresults

There has been a lot of talk about the US H-1B visa fee being increased from $5,000 to $100,000. What was surprising was this policy's positive reception by Silicon Valley executives. At first, one would think they were just being sycophantic or trying to avoid upsetting Trump.

When AI started going mainstream, I had a realization. Yes, AI is not ready to replace most white-collar work. Instead, these companies are offshoring work to humans who stay in their home country and work at a far lower salary. The new H-1B visa pricing accelerated this strategy.

They can now do aggressive mass layoffs, shift the blame to "AI" while using offshore consulting firms to obscure cheap new hires, and evade political consequences and social backlash at home. Silicon Valley is not betting on flawed AI, but on offshored, AI-augmented humans.

They are making use of AI, partly as malicious compliance. Many think mass displacement requires AGI, but it does not; look at Waymo. This blows open the doors to offshoring white-collar jobs as well as in-person jobs heavily occupied by new immigrants, like taxi driving and trucking.

Right now, AI works best with human augmentation. It's diabolical, but really clever.. and it seems to be working?
 
Personally.. I couldn't care less about Silicon Valley.
Their software quality has gone to crap lately, and so much of the output is VC-funded too, meaning it's guaranteed they'll do a "first one's cheap, last one's super expensive" kind of pricing play.

I got so tired of using their crap as an IT company that I started my own dev shop, then a software company, just to get out from under them + give others an escape hatch someday too.

We use AI at our shop as an assist, but it is not pivotal so far; preserving our level of code quality and attention to detail means there's still a lot of manual work to do.. it also means asking our programmers to think differently.. but stay balanced!


On the topic of AI, the signal-to-noise ratio from Silicon Valley's region is awful.. most of these AI companies are cash-burn operations, and if you get dependent on them, one day you are going to get surprised by the real retail cost, lol.

Also, whatever you feed into them creates multiple logs, and we know how good American cybersecurity is today :LOL:

To really sour the deal, we find out that all these AI companies are getting used for automated weapons.. which is morally repugnant to me.

For these reasons we built our own AI system ( 1100W ) and run a 197B open-source AI model on it.. giving us quality and speed that were basically state of the art a year ago.

( attached image: baby agi 2.5.webp )


..for a mere $13k 🤕

But.... small price to pay to get the capability for a dev shop of 8 people.. privacy for our clients.. and to not have to give these companies a dime.
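
For context, "running your own AI system" day to day mostly means talking to a local server. A minimal sketch, assuming the model is served through an OpenAI-compatible endpoint ( llama.cpp's llama-server and vLLM both expose one ) — the host, port, and model id below are placeholders for whatever your own rig serves:

```python
# Minimal client for a locally hosted model behind an
# OpenAI-compatible endpoint. Endpoint and model id are
# placeholders, not a specific vendor's API.
import json
import urllib.request

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed local server

payload = {
    "model": "step-3.5-flash",  # placeholder model id
    "messages": [
        {"role": "system", "content": "You are a coding assistant for senior developers."},
        {"role": "user", "content": "Write a docstring for a binary search function."},
    ],
    "temperature": 0.2,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

# Responses follow the standard chat-completions shape.
print(reply["choices"][0]["message"]["content"])
```

Nothing proprietary in the loop: swap the endpoint and the same client code talks to any local or hosted model.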
 
Apple hardware:
  • reasonably efficient
  • gobs of RAM
  • ok speed ( M5 and up )
  • expensive
  • little benefit in parallelizing multiple units

Nvidia hardware:
  • reasonably efficient, but out of the box usually tuned to use 33% more power for 10-15% more perf ( if this is tuned down, it's very close to Apple's efficiency; see the sketch just below this list )
  • not enough ram
  • awesome speed
  • expensive
  • huge benefit in parallelizing multiple cards
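
On that power-tuning point: a minimal sketch of capping the board's power draw, assuming NVIDIA's standard driver tools are installed. The 450 W figure is purely illustrative, and setting a limit requires admin rights:

```python
# Query and cap GPU power limits via nvidia-smi.
import subprocess

# Show the current and maximum power limits for each GPU.
subprocess.run(
    ["nvidia-smi", "--query-gpu=name,power.limit,power.max_limit",
     "--format=csv"],
    check=True,
)

# Cap the board at an illustrative 450 W (needs sudo/admin rights).
subprocess.run(["nvidia-smi", "-pl", "450"], check=True)
```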

The larger the AI model you want to run, the more speed you need. This puts Apple in an awkward place where it has the RAM to run a big model, but not the performance.

Case in point:
Mac M5 Max running GPT OSS 120b: 78 tokens/sec
RTX PRO 6000 running GPT OSS 120b: 220 tokens/sec
Commercial quality speed: 60-300 tokens/sec
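
Those decode speeds track memory bandwidth fairly closely, since each generated token has to stream the active weights through memory. A back-of-envelope sketch — the bandwidth figures and the ~5B-active MXFP4 footprint of GPT OSS 120b are assumptions for illustration, not measurements:

```python
# Decode-speed ceiling: tokens/sec <~ memory bandwidth / bytes per token.
# GPT OSS 120b is MoE: only ~5.1B of its ~117B params are active per
# token, and MXFP4 packs roughly 4.25 bits per weight.
ACTIVE_PARAMS = 5.1e9
BYTES_PER_PARAM = 4.25 / 8
bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM  # ~2.7 GB per token

# Bandwidth figures below are assumptions for illustration.
for name, bw_gbs in [("Mac (~550 GB/s assumed)", 550),
                     ("RTX PRO 6000 (~1790 GB/s assumed)", 1790)]:
    print(f"{name}: ceiling ~{bw_gbs * 1e9 / bytes_per_token:.0f} tok/s")
```

Real numbers land well below the ceiling ( kernel efficiency, KV-cache reads, prompt processing ), but the ratio between the two machines roughly matches the 78 vs 220 tok/s above.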

Now consider that my rig must:
  • provide near-commercial intelligence quality ( ~200B model ) because my users are all senior developers and won't benefit from subpar AI
  • provide near-commercial speed for 2 users at once, or be very snappy for 1

For this case, Nvidia makes more sense.
If you don't have such intense needs, you could definitely get by with an Apple for slightly less money.
 
@ES Dev Team GPT OSS 120b is a MoE model designed to use less compute and post-trained on MXFP4 to consume less RAM, and this comes at the cost of consistency and quality as it compounds errors.

With 512GB of RAM and 80 GPU cores on the M3 Ultra (not the Max's 40), you should be able to run a 200B dense model (FP16, no quantization) with a bigger context.

That results in less manual tuning of the output, so while you get fewer tokens/sec, the result should be vastly superior. Tokens per second are not a 1:1 comparison in this scenario; total time and quality are something you can benchmark with your team.
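
The weights-only memory math behind that, as a quick sketch ( KV cache and runtime overhead come on top ):

```python
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billion * bits_per_param / 8

print(f"200B dense @ FP16 : ~{weights_gb(200, 16):.0f} GB")    # ~400 GB
print(f"117B MoE @ MXFP4  : ~{weights_gb(117, 4.25):.0f} GB")  # ~62 GB
```

400 GB of weights fits in the M3 Ultra's 512 GB of unified memory with room left for a large context, which is exactly the trade it offers: capacity over raw speed.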

The upcoming M5 Ultra reportedly has 1.2TB/s memory bandwidth and is also said to be tuned for better prompt-processing performance.
 
Yeah, I know a lot about open-source models, been running them for over a year now, just not at this scale.

GPT OSS 120b is a good model for programming, but it is based on ChatGPT-4-era technology, so it is prone to hallucination. This means you really need to direct it.
We run Step 3.5 Flash 197B. It's a substantial step up, and, outside of a lower 'breadth of information' versus DeepSeek v3.2/R1, its coding capabilities are outstanding and it feels like a much bigger model.

It's written very good documentation for our software, better than I write, so it's killer at English too!

If what you are saying is correct about the M5 Ultra, then it should be 70%-90% as fast as one of my 5090 or RTX PRO 6000 cards.
What'd be unfortunate is if it's priced accordingly... which I'm betting is the case.

But yeah, as of this year.. it's going to be feasible for more people to run their own AI systems. Next year, more so. It makes me wonder how that's going to affect these big AI companies.. when you can 'have ChatGPT at home'.
 
> If what you are saying is correct about the M5 Ultra, then it should be 70%-90% as fast as one of my 5090 or RTX PRO 6000 cards.
> What'd be unfortunate is if it's priced accordingly... which I'm betting is the case.
A 5090 or 6000 requires a motherboard, RAM, disk, PSU, man-hours for diagnostics, assembly/disassembly of the PC, and exercising the warranty (or warranties) if one or more parts fail (with associated downtime for each), plus a higher electric bill sprinkled on top.

The M3/M5 Ultra is a complete unit. This should be factored into the price when making comparisons. Also, as your team grows, it should be substantially less complex to add another M Ultra than another RTX.
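
To put one of those line items in numbers — a throwaway sketch using the 1100 W figure quoted earlier in the thread; the Mac's draw, the duty cycle, and the electricity rate are assumptions:

```python
# Rough electricity-cost comparison. The 1100 W figure comes from the
# rig described earlier; everything else is assumed for illustration.
RATE_USD_PER_KWH = 0.15    # assumed average electricity rate
HOURS_PER_YEAR = 8 * 250   # assumed one-shift, weekday-only duty cycle

for name, watts in [("Nvidia rig (1100 W)", 1100),
                    ("M3 Ultra (~270 W assumed)", 270)]:
    kwh = watts / 1000 * HOURS_PER_YEAR
    print(f"{name}: ~${kwh * RATE_USD_PER_KWH:,.0f}/year in electricity")
```

Not decisive on its own, but over a few years of service it belongs in the same spreadsheet as the sticker price.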
> But yeah, as of this year.. it's going to be feasible for more people to run their own AI systems. Next year, more so. It makes me wonder how that's going to affect these big AI companies.. when you can 'have ChatGPT at home'.
I've thought since the beginning that profit was going to shift away from software (including AI companies) to hardware manufacturers, as software no longer has a moat, but I could not figure out when it would happen.

I did not expect Chinese companies to open-source frontier models, one way or another, and now here we are. Just think about how that single move completely screwed over OpenAI. What a time to be alive.
 