Generating Text

This page covers how to make requests to the text generation API. If you're not a developer, you can use the API through the web interface.

All requests to the API must be authenticated.

The new topic and keyword controls are experimental and can't yet be used through the API.

Request format

Requests are made to https://api.inferkit.com/v1/models/standard/generate.

POST JSON in the following format. Most fields are optional.

prompt (object, optional)
    Context for the generator to build off of. The response will return what
    the neural network thinks comes next. See the prompt format below.

length (integer, 10 to 1000)
    Maximum number of characters (Unicode code points) to generate. For
    technical reasons, slightly fewer will usually be generated. Billing is
    based on the actual number generated, or 100, whichever is greater. You
    can generate arbitrary lengths by making multiple requests (see
    "Unlimited length" below).

startFromBeginning (boolean, optional, default false)
    When set, your prompt will always be interpreted as starting at the
    beginning of a document (e.g. the title of a news article). If you use no
    prompt, this determines whether to start generating from the beginning or
    an arbitrary part of a document.

streamResponse (boolean, optional, default false)
    Return text in chunks as it's generated rather than all at once. See
    "Streaming responses" below. Use this to minimize latency and create a
    smoother experience.

forceNoEnd (boolean, optional, default false)
    Prevents the generator from ending the text early. Normally it stops when
    it thinks it has found a good place to end the text.

topP (float, 0 to 1.0, optional, default 0.9)
    Advanced setting. A probability threshold for discarding unlikely text in
    the sampling process. For example, 0.9 means that at each step only the
    most likely tokens with probabilities adding up to 90% will be sampled
    from. This is called nucleus sampling. Values closer to 0 will produce
    less variety and more repetition, as the network only chooses the text it
    thinks is most probable.

temperature (float, 0.001 to 100, optional, default 1.0)
    Advanced setting. Controls the randomness of sampling, i.e. the
    "creativity". Values greater than 1 increase the chance of sampling
    unusual (low-probability) text, which tends to make it less sensible.
    Values between 0 and 1 cause the network to prefer the text it thinks is
    most likely, even more than it normally would, which can make it
    repetitive or overly conservative. You'll probably want a value quite
    close to 1.0.
Prompt format:

text (string, 1 to 3000 characters / Unicode code points)
    Some text for the generator to build off of. The response will return
    what the AI thinks comes after this. The size of the prompt has no effect
    on billing.

isContinuation (boolean, optional, default false)
    Indicates whether the text you've provided is a substring taken from the
    end of a longer piece of text (to fit within the 3000-character limit).
    For technical reasons it helps the network to know this, but it's not
    critical.
Simple example
{
  "prompt": {
    "text": "Hello world!"
  },
  "length": 500
}
Complex streamed example using most settings
{
  "prompt": {
    "text": "mmand below in the root directory of your project.",
    "isContinuation": true,
  },
  "length": 500,
  "streamResponse": true,
  "forceNoEnd": true,
  "topP": 0.8,
  "temperature": 1.1
}

Unlimited length

Each request generates a maximum of 1000 characters from a prompt no longer than 3000. But you can generate arbitrarily long text by doing the following (a code sketch follows these steps):

  1. Make a request with the initial prompt (e.g. "Hello") and length set to 1000. isContinuation should be false.
  2. Concatenate the prompt and generated text (e.g. "Hello" + " world!" == "Hello world!") and make a request with that as a new prompt. If the total length is still under 3000, isContinuation should still be false.
  3. Once the total text length exceeds 3000, start inputting the last 3000 characters as your prompt and set isContinuation to true to let the generator know that you've truncated the text.
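Here is a minimal sketch of that loop in JavaScript. The global fetch (Node 18+), the Bearer-style Authorization header, and the helper names generate and generateLong are assumptions for illustration; this page only says that requests must be authenticated, so adjust the auth to whatever your account actually uses.

// Sketch: generate arbitrarily long text by chaining requests.
// Assumptions (not from this page): Node 18+ global fetch and a
// Bearer-token Authorization header.

const API_URL = 'https://api.inferkit.com/v1/models/standard/generate';
const API_KEY = process.env.INFERKIT_API_KEY; // assumed env var

async function generate(promptText, isContinuation) {
  const res = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${API_KEY}`, // auth scheme assumed
    },
    body: JSON.stringify({
      prompt: { text: promptText, isContinuation },
      length: 1000,
    }),
  });
  const { data, error } = await res.json();
  if (error) throw new Error(`${error.code}: ${error.message}`);
  return data;
}

async function generateLong(initialPrompt, targetTotalLength) {
  let fullText = initialPrompt;
  // Loop until the total text reaches the desired length.
  while ([...fullText].length < targetTotalLength) {
    // Slice by code points, not UTF-16 units (see "Working with
    // Unicode code points" below), and flag truncated prompts.
    const codePoints = [...fullText];
    const isContinuation = codePoints.length > 3000;
    const promptText = codePoints.slice(-3000).join('');
    const data = await generate(promptText, isContinuation);
    if (!data.text) break;      // nothing more was generated
    fullText += data.text;
    if (data.reachedEnd) break; // the generator chose to stop
  }
  return fullText;
}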

Response format

The response will contain either a single JSON "chunk" (when streamResponse is not set) or multiple chunks in the same format (when it is). Each chunk has either a data or error property. A data chunk looks like this:

text (string)
    The generated text.

isFinalChunk (boolean)
    When using streamResponse, indicates whether this is the final chunk of
    the response. Otherwise the value is always true.

reachedEnd (boolean)
    Always present when streamResponse is false; otherwise only present in
    the final chunk. Indicates whether generation stopped because the
    generator thinks it found a good place to end the text. Always false
    when forceNoEnd is set.
Example without streamResponse (formatted for readability)
{
  "data": {
    "text": " run the command below in the root directory of your",
    "isFinalChunk": true,
    "reachedEnd": false
  }
}

If something goes wrong, an error chunk will be sent. It will have a message and code. If you are using streamResponse, it is possible to receive an error chunk even after a status code of 200 and data chunks, if the error occurs after the response begins (e.g. if generation times out later on).
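Based on that description, an error chunk presumably takes a shape like the following (illustrative only; the exact structure and values aren't documented here):

{
  "error": {
    "code": "...",
    "message": "..."
  }
}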

Streaming responses

Since it can take several seconds to generate a long piece of text, it's nice to receive chunks as they're completed rather than all at once after a delay.

When you set streamResponse to true, the response will use chunked transfer encoding, sending each chunk as a line of JSON the moment it's done. Here's a short example of what that looks like:

{ "data": { "text": " run the command below", "isFinalChunk": false } }
{ "data": { "text": " in the root directory of your", "isFinalChunk": false } }
{ "data": { "text": "", "isFinalChunk": true, "reachedEnd": false } }

This is what enables TalkToTransformer.com to present text to the user almost as soon as they submit their request.
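Here is a sketch of consuming such a stream in JavaScript. It assumes Node 18+, where the fetch response body is an async-iterable stream of bytes, and the same assumed Bearer-token auth as the earlier sketch; streamGenerate is a hypothetical helper name. Buffering by newline matters because transfer chunks need not align with JSON lines.

// Sketch: consume a streamed response as newline-delimited JSON.
// Assumptions (not from this page): Node 18+ fetch with an
// async-iterable response body, and a Bearer-token Authorization header.

const API_URL = 'https://api.inferkit.com/v1/models/standard/generate';
const API_KEY = process.env.INFERKIT_API_KEY; // assumed env var

async function streamGenerate(request, onChunk) {
  const res = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${API_KEY}`, // auth scheme assumed
    },
    body: JSON.stringify({ ...request, streamResponse: true }),
  });

  const decoder = new TextDecoder();
  let buffered = '';
  for await (const bytes of res.body) {
    buffered += decoder.decode(bytes, { stream: true });
    // A transfer chunk may carry zero, one, or several JSON lines,
    // so split on newlines rather than trusting chunk boundaries.
    let newline;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      const line = buffered.slice(0, newline).trim();
      buffered = buffered.slice(newline + 1);
      if (!line) continue;
      const chunk = JSON.parse(line);
      if (chunk.error) throw new Error(`${chunk.error.code}: ${chunk.error.message}`);
      onChunk(chunk.data);
    }
  }
}

// Usage: print text as soon as it arrives.
streamGenerate(
  { prompt: { text: 'Hello world!' }, length: 500 },
  (data) => process.stdout.write(data.text),
);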

Working with Unicode code points

Code points mostly correspond to characters (with some exceptions). In most cases, modern programming languages will iterate over strings by their code points and give their length in code points (e.g. Python 3).

In JavaScript, the length property counts UTF-16 code units rather than code points, for historical reasons and backwards compatibility. The substring and substr methods and indexing likewise operate on code units, so they don't work as desired either. However, iterating over a string steps one code point at a time, so you can get around these issues by converting to an array:

'🤔'.length === 2       // Bad! Counts UTF-16 code units
[...'🤔'].length === 1  // Correct number of code points
'🤔'[0]                 // Renders �: an incomplete code point
[...'🤔'][0] === '🤔'   // First code point, as desired
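For example, here's how you might safely take the last 3000 code points of a long text before using it as a prompt. This is a sketch of the array-spread approach above; lastCodePoints is a hypothetical helper name, not part of the API.

// Take the last n code points of a string without splitting a
// surrogate pair (e.g. an emoji) in half.
function lastCodePoints(text, n) {
  return [...text].slice(-n).join('');
}

const accumulated = 'Hello world! 🤔 '.repeat(300); // stand-in for generated text
const promptText = lastCodePoints(accumulated, 3000);
console.log([...promptText].length); // 3000 code points, never a broken emoji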