Honest Take On DS Automation?

Curious about other data scientists’ honest takes on automating different aspects of our roles.

I work at a top tech company and we’re building a DS agent that’s too unreliable to be handed to PMs and ENG but still unlocks enormous productivity when used (and validated) by DS.

I’ve personally built two LLM-integrated statistical analysis tools that will eventually automate 40-60% of the analytical work I did last year.

I find that building and validating Python packages that cover a core area of my analytical work, and then exposing them to Claude as a skill (along with skills that capture the judgment I apply when interrogating analyses), gets me 80% of the way to automating a major DS responsibility. It’s much more reliable than giving Claude open agency to define and execute every aspect of an analysis. When Claude’s execution isn’t compartmentalized by validated analysis templates, it produces data or statistical hallucinations too frequently.
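To make the "validated template" idea concrete, here is a minimal sketch of what one function in such a package might look like. Everything in it is hypothetical and not from my actual tools: the names `compare_means` and `MeanComparison` are made up for illustration, and the normal-approximation confidence interval is just a simple stand-in for whatever method a real package would validate. The point is that the model only chooses the inputs, while the arithmetic and the input checks live in tested code.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist, mean, variance


@dataclass(frozen=True)
class MeanComparison:
    """Structured result so the caller reports numbers instead of inventing them."""
    diff: float
    std_err: float
    ci_low: float
    ci_high: float
    n_control: int
    n_treatment: int


def compare_means(control, treatment, confidence=0.95):
    """Difference in means (treatment - control) with a normal-approximation CI.

    Hypothetical template: it rejects inputs it cannot analyze safely
    instead of returning a plausible-looking number.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    for name, sample in (("control", control), ("treatment", treatment)):
        if len(sample) < 2:
            raise ValueError(f"{name} needs at least 2 observations")
        if not all(math.isfinite(x) for x in sample):
            raise ValueError(f"{name} contains NaN or infinite values")

    diff = mean(treatment) - mean(control)
    # Unpooled standard error of the difference (sample variances, n - 1 denominator).
    std_err = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    # Two-sided critical value from the standard normal (an assumption of this sketch).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return MeanComparison(
        diff=diff,
        std_err=std_err,
        ci_low=diff - z * std_err,
        ci_high=diff + z * std_err,
        n_control=len(control),
        n_treatment=len(treatment),
    )
```

A skill would then tell the model to call something like `compare_means(control, treatment)` and report the returned fields verbatim, rather than writing its own statistics code on the fly.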

From that experience, I’m guessing that significant partial automation of junior data scientist tasks is feasible today. In 1-2 years, I would only be interested in hiring junior DS who are comfortable with fairly open-ended and ambiguous analysis tasks. For anything else, I can ask a senior or staff DS to do the task well once, add abstraction and parameterization, ship it as a Python package, and then turn it into a Claude skill.

Is everyone else arriving at a similar conclusion?

submitted by /u/anomnib