• intensely_human
    link
    fedilink
    English
    arrow-up
    2
    ·
    3 months ago

    I mean, I hate myself for being this pedantic but technically there is code. But the code to run an LLM as it trains or generates responses is almost analogous to the hardware in the traditional hardware/software split.

    I guess the layers are:

    • Actual hardware: GPUs etc
    • ”The algorithm” / “The software hardware”: Matrix math, back propagation, etc
    • The configuration: a number of layers, number of parameters, etc
    • The … test suite?: training dataset
    • The app: a trained model
    • Data: prompts, including the prompt that is the entire conversation so far

    I dunno. It’s harder than I thought to make an analogy between these layers.

    • Kg. Madee Ⅱ.@mathstodon.xyz
      link
      fedilink
      arrow-up
      3
      ·
      3 months ago

      @intensely_human yes, that’s about what I meant: you can’t make any directed changes to the actual code level, so the vendor has to make their customization at the same data level that users make their inputs. And that’s why there is no way to prevent users from overriding the initial prompt

      • intensely_human
        link
        fedilink
        English
        arrow-up
        1
        ·
        3 months ago

        Well, the vendor can also make their customization in the training data.

        It’s hard, because it takes a lot more depth of connections to encapsulate a concept like “hide the following fact”, but just like with spies, the best time to thwart interrogation is during their training, not during their mission briefing.