Tokens

Tokens are chunks of text, simply groups of characters, into which a given input or output is divided.

As a rule of thumb, 1 token is approximately 4 characters of English text.

Example: The phrase "She sells seashells by the seashore." comes to about 6 tokens; the exact count depends on the tokenizer.
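To get a quick ballpark before sending any text, the 4-characters-per-token rule of thumb can be sketched in a few lines of Python. This is only an estimate, not the count a real tokenizer would report, and the function name `estimate_tokens` and the sample question are made up for illustration.

```python
import math

# Rule of thumb from the text: roughly 4 characters per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Return a rough token estimate based on character count alone."""
    return math.ceil(len(text) / chars_per_token)


# Hypothetical sample question, used only to show the arithmetic.
question = "What are your opening hours?"
print(len(question), "characters ->", estimate_tokens(question), "tokens (estimate)")
```

Because the estimate only looks at length, it can differ noticeably from the real count for short phrases; it is most useful for sizing long texts.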

How tokens are calculated

Tokens are calculated by breaking down text into smaller units, which can include words, punctuation marks, and even sub-words or characters depending on the tokenization method.
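To illustrate how the tokenization method changes the count, here is a toy sketch that splits the same phrase two ways: into words and punctuation marks, and into single characters. It is not the sub-word method that real AI models use, and the helper name `split_words_and_punctuation` is an assumption made for this example.

```python
import re


def split_words_and_punctuation(text: str) -> list[str]:
    """Toy tokenizer: each word and each punctuation mark becomes one unit."""
    return re.findall(r"\w+|[^\w\s]", text)


phrase = "She sells seashells by the seashore."

# Word-and-punctuation units.
pieces = split_words_and_punctuation(phrase)
print(pieces)       # ['She', 'sells', 'seashells', 'by', 'the', 'seashore', '.']
print(len(pieces))  # 7

# Character-level units: every character, including spaces, is one unit.
characters = list(phrase)
print(len(characters))  # 36
```

The same phrase yields 7 units with one method and 36 with the other, which is why token counts are always tied to a specific tokenizer.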

Tokens are utilized in both the input (data sent to the AI model) and the output (data received from the AI model).

Let's take an example: when you train on a website, say www.reply.cx, tokens are used as follows:

  • Training – Total number of characters extracted from the website (Counted once, as you only train once)

  • Input tokens – All the tokens used to send a question to OpenAI are counted here. They include:

    • Chunks – When a question is asked to the AI model, the system fetches a chunk that closely matches the question.

    • Input question – Question asked by the user.

    • Prompt – System prompt and the instructions defined.

  • Output tokens – Size of the response received from OpenAI.
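The breakdown above can be sketched as a small calculation for a single question. This is a minimal illustration, not the platform's actual accounting: the names `QueryUsage` and `usage_for_query` are hypothetical, and token counts are approximated with the 4-characters-per-token rule of thumb rather than a real tokenizer.

```python
import math
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about 4 characters per token."""
    return math.ceil(len(text) / 4)


@dataclass
class QueryUsage:
    """Estimated token usage for one question (hypothetical structure)."""
    chunk_tokens: int      # chunks fetched to match the question
    question_tokens: int   # question asked by the user
    prompt_tokens: int     # system prompt and instructions
    output_tokens: int     # response received from the model

    @property
    def input_tokens(self) -> int:
        # Input = chunks + input question + prompt, as in the list above.
        return self.chunk_tokens + self.question_tokens + self.prompt_tokens

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def usage_for_query(chunks: list[str], question: str,
                    prompt: str, response: str) -> QueryUsage:
    """Estimate input and output tokens for a single question."""
    return QueryUsage(
        chunk_tokens=sum(estimate_tokens(chunk) for chunk in chunks),
        question_tokens=estimate_tokens(question),
        prompt_tokens=estimate_tokens(prompt),
        output_tokens=estimate_tokens(response),
    )


# Made-up texts, used only to show how the parts add up.
usage = usage_for_query(
    chunks=["Our store is open Monday to Friday, 9 am to 6 pm."],
    question="What are your opening hours?",
    prompt="Answer using only the provided content.",
    response="We are open Monday to Friday, 9 am to 6 pm.",
)
print("input:", usage.input_tokens, "output:", usage.output_tokens)
```

Training is left out of the per-question figure because it is counted once, while input and output tokens are counted on every question.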

Token usage varies with the size of your AI model, the length of your prompts, and the length of the AI's responses.
