I’m mostly using ChatGPT (GPT-4), because I don’t use VS Code (I use Helix), and from what I’ve seen from colleagues, the current Copilot (X) is not helpful at all…
I describe the problem (context etc.), maybe paste some code, and hope that it gets what I mean. When it doesn’t (which is rather often), I try to give it the context it’s missing, but it still fails very often unless the code is rather simple (i.e. boilerplate-like). But even when I want GPT-4 to generate a bunch of boilerplate, it inserts something like
// repeat this 20 times
in the middle of the code it should actually generate. Even if I tell it multiple times to generate the exact code, it fails pretty much every time, also with the larger context size via the API, where it should be able to do it in one go. The gpt4-0314 model (via the API) seems to be a bit better here.
I’m absolutely interested in where this leads, and I’m the first to follow all the changes, but right now it slows me down rather than really helping me. Copilot may be interesting in the future, but right now it’s dumb as fu… I’m not writing boilerplate code, it’s rather complex stuff, and it fails catastrophically there; I don’t see that changing in the near future. GPT-4 got dumber over the last half year; it was certainly better at the beginning. I remember being rather impressed by it, but now, meh…
It’s good for natural language stuff though, but not really for novel creative stuff in code (I’m doing most stuff in Rust btw.).
But GPT-5 will be interesting. I doubt that I’ll really profit from it for code-related stuff (maybe GPT-6 or so), but we’ll see… All the other developments in that space are also quite interesting. Once it’s actually viable to train or constrain your own LLM on your own larger codebase, so that it really gets the details and gives actually helpful suggestions (e.g. something like the recent CodeLlama release), this stuff may become more interesting for actual coding.
I’m not even letting it generate comments (e.g. above functions), because currently they look kinda like this (figuratively; fancier and wordier, but not really helpful):
// this variable is of type int
let a = 8;
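To make the contrast concrete, here is a small sketch in Rust. The function `retry_delay_ms` and its parameters are made up purely for illustration; the point is only that a doc comment written for humans explains intent and edge cases, while the comment on `a` merely restates the code.

```rust
/// Returns the delay before retry number `attempt` (0-based), in milliseconds.
///
/// The delay doubles with each attempt and is capped at `max_ms`, so a
/// flaky peer is not hammered and a dead one does not stall us forever.
/// (Hypothetical helper, only here to show a comment that adds information.)
fn retry_delay_ms(attempt: u32, base_ms: u64, max_ms: u64) -> u64 {
    // `checked_shl` returns `None` once the shift amount reaches 64 bits;
    // treat that as "far beyond the cap" instead of overflowing.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    base_ms.saturating_mul(factor).min(max_ms)
}

fn main() {
    // Unhelpful: the comment only repeats what the code already says.
    // this variable is of type int
    let a = 8;

    println!("a = {a}, third retry waits {} ms", retry_delay_ms(3, 100, 5_000));
}
```

The second kind of comment is the one worth keeping around, whoever (or whatever) writes it.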
I couldn’t disagree with your colleagues more; I suppose they’re reporting experiences from a fresh codebase or from early in Copilot’s release.
With a mature codebase, it feeds a lot of that in as context, and so suggestions match your naming conventions, style, etc.
It could definitely use integration with a linter, so it doesn’t introduce subtle bugs where generated names don’t match actual methods/variables, but it’s become remarkably good, particularly in the past few weeks.
BTW, if you want more mileage out of ChatGPT, I would actually encourage it to be extremely verbose with comments. You can always strip them out later, but the way generative models work, the things it generates along the way affect where it ends up. There’s a whole technique for having it work through problems in detailed thoughts, called “chain of thought prompting”, and you’ll probably get much better results by instructing it to work through what needs to be done in a comment before it writes the code than by just having it write the code.
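As a rough sketch of what such an instruction could look like when assembled in code (Rust, since that’s the language in this thread): `build_cot_prompt` is a hypothetical helper, and the exact wording of the instructions is an assumption, not a tested recipe. It only builds the prompt text; sending it to a model is left out.

```rust
/// Builds a prompt that asks the model to plan in comments before coding.
/// Hypothetical helper: the instruction wording is an assumption.
fn build_cot_prompt(task: &str, context: &str) -> String {
    let mut prompt = String::new();
    prompt.push_str("You are helping with a Rust codebase.\n\n");

    // Only include the context section when there is something to show.
    if !context.trim().is_empty() {
        prompt.push_str("Relevant context:\n");
        prompt.push_str(context.trim());
        prompt.push_str("\n\n");
    }

    prompt.push_str("Task: ");
    prompt.push_str(task.trim());
    prompt.push_str("\n\n");

    // The chain-of-thought part: reasoning first, as a comment, then code.
    prompt.push_str(
        "Before writing any code, write a detailed comment block that works \
         through what needs to be done, step by step. Only then write the code, \
         and keep explanatory comments in it.\n",
    );
    prompt
}

fn main() {
    let prompt = build_cot_prompt(
        "Parse a `key=value` line into a tuple.",
        "fn parse_line(line: &str) -> Option<(&str, &str)>",
    );
    println!("{prompt}");
}
```

The ordering is the design choice that matters here: the reasoning is requested before the code, so whatever the model writes in the comment is already part of what it conditions on when it generates the code.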
And yes, I’m particularly excited to see where the Llama models go, especially as edge hardware is increasingly tailored for AI workloads over the next few years.
Maybe I should try it again. I doubt, though, that it would really help me: I’m a fast typist, and I don’t like being interrupted all the time by something wrong (or not really useful) when I’m in a creative phase (a good LSP like rust-analyzer seems to be the sweet spot, I think). And something like Copilot just seems to confuse me all the time, either by showing plainly wrong stuff, or by something like: what does it want? Ah, makes sense. But why this way? That way is better (and then I write it the way I would have done it anyway). So I’ll just skip it, at least for more complex stuff.
It would be interesting to see how it does with code that’s a little less exotic / living on the edge of the language, like typical frontend or backend stuff.
In what context are you using it such that it gives good results?
As for encouraging it to be extremely verbose with comments: yeah, I don’t know. I’m not writing code to feed it to an LLM; I like to write it for humans, with good function docs (for humans). I hope an LLM will be smart enough someday to get the context. That may be soon enough, but until then I don’t see a real benefit of LLMs for code (other than as (imprecise) boilerplate generators).