char* strInput = (char*) malloc(sizeof(char));
int ch;
int letNum = 0;
while((ch = getchar()) != EOF){
    letNum++;
    strInput = (char*)realloc(strInput,letNum*sizeof(char));
    *(strInput + letNum - 1) = ch;
}
printf("\n");
printf("%s\n",strInput);
free(strInput);
This is the contents of main in a program I wrote that reads an unknown number of chars and prints the final string. I don’t understand why, but it only works if I press Ctrl+D twice, or just once if I press Enter first.
Does anyone get what’s going on? And how would you have written the program?
It’s due to the way getchar() and console input work. When you enter “abcdCTRL+D” on the keyboard, here’s what happens:
- abcd characters are added to the stdin buffer and consumed by getchar()
- CTRL+D (or EOF on Unix, CTRL+Z on Windows) is left in the input buffer (it’s not yet consumed by getchar()!)
- The console is waiting for more input, and the next time you press ENTER or CTRL+D the previous one will be consumed by getchar()
Think about this scenario: What happens if you only enter “abcd” and don’t press anything else? The program will still be waiting for more input. It needs to receive a signal telling it to stop reading input and proceed with the code execution. But if you press enter, it won’t automatically add a new line to the string, because the new line character is still in the input buffer.
This guy is right. I saw OP’s post but I did not have enough time to reply at the time. I came back to reply to OP’s post, but you are already right.
Thanks a lot for your answer. This might be because of my shallow understanding of how a buffer works, but I don’t understand why EOF isn’t consumed by getchar() when the other bytes are consumed. Isn’t a char just a number and EOF too (-1 I think)? I probably should try and understand buffers more.
If you’re on Linux then I’m pretty sure the confusing behavior you’re seeing is due to the line buffering the kernel does by default.
Ctrl+D does not actually mean “send EOF”, and it’s not the “EOF character”; rather it means “complete the current or next stdin read() request immediately”. That’s a very different thing, and sometimes it means EOF and other times it does not.
In practice what this means is that, if there is no data waiting to be sent on stdin, then read() returns zero, and read() returning zero is how getchar() knows an EOF happened. The flow looks like this:
- Your program calls getchar().
- getchar() calls read() on stdin and your program blocks waiting for input.
- The user presses Ctrl+D on the tty, having not typed anything else.
- The kernel immediately ends the blocked read() call and returns zero bytes read.
- getchar() sees that it got no bytes from read() and returns EOF.
- Your program sees that and exits the loop.
However, in practice it doesn’t work that cleanly because the tty is normally operating in “cooked” mode, where the kernel sends input to your program line by line, allowing the user to edit a single line before sending it. The way this works is by buffering the stdin contents and sending them when the user hits enter. Going back to Ctrl+D, you can see how this screws things up, leading to the behavior you see:
- Your program calls getchar().
- getchar() calls read() on stdin and your program blocks waiting for input.
- The user types some input, but does not hit enter. This data sits in the kernel’s stdin buffer and is not sent to your program yet.
- The user presses Ctrl+D on the tty.
- The kernel immediately ends the blocked read() call and starts returning the currently buffered stdin input, without waiting for an enter press.
- getchar() sees that it got a byte from read() and thus returns it.
- Your program starts getting all the previously buffered bytes and keeps running until getchar() has seen all of them.
- getchar() calls read() on stdin. There are now no bytes in the buffer so you block waiting for input, the same as before. The previous Ctrl+D was already “used up” to end the previous read() call so it doesn’t matter any more.
- The user types Ctrl+D.
- Because there is currently no input in the line buffer, read() returns zero. getchar() sees this and returns EOF.
In the above case Ctrl+D doesn’t work as expected because of the line buffering. The read() call ended early without waiting as expected, but your program just starts receiving all the buffered input so it doesn’t have any idea you pressed Ctrl+D and never gets the read() == 0 EOF condition. Additionally the Ctrl+D is a one-time deal, it ends one read() call early and sends the buffered input. When you call read() again with nothing to send it just blocks and you have to do another Ctrl+D to actually get read() to return zero.
You can see the line buffering behavior if you add a putchar() inside your loop. The putchar() doesn’t actually print while you type the characters, it only prints after you hit either enter or Ctrl+D, showing that your program did not receive any of the characters until one of those two actions happened.
Thanks a lot for the in depth explanation, this makes things a lot clearer. I’ll try ‘putchar()’ and test a few more things and then come back to read this post again
Here are a couple of suggestions about how to improve your algorithm:
First of all, you should reduce the number of calls to the realloc function. Every call has some overhead, and when the allocator runs out of room it may need to switch to kernel space to get more memory, or copy the whole buffer to a new location. I think it is nice to allocate the same size as a single page, or a multiple of the page size, from virtual memory. I think you should allocate 4 KB or 2 MB of memory at the beginning of the function, then reallocate in multiples of the page size when you need more memory.
Second of all, reading the input one character at a time is also time-consuming. Repeatedly calling getchar() means you will end up going to the kernel space, grabbing a single character from there, then coming back to the user space (I used this as an example, there are many buffers between your application and kernel space). Instead I would suggest reading 4 KB at a time using the read or fread functions.
If you do not know about files, caching, virtual memory, page sizes, kernel space, user space, and optimization then please disregard everything I said. This will only confuse you now. I know it is a lot of fun to start thinking about optimization when you are learning a new programming language. On the other hand, as mathematician and computer scientist Donald Knuth said, premature optimization is the root of all evil.
I hope this answer helps you.
Thank you, I realize that there’s a whole other aspect I didn’t even consider. I’m new to C and Linux so I’ll follow your advice but it’s making me want to learn more. Thanks again to both you and @adriator for your answers