This page covers how to make requests to the text generation API. If you're not a developer, you can use the API through the web interface.
All requests to the API must be authenticated.
The new topic and keyword controls are experimental and can't yet be used through the API.
Requests are made to `https://api.inferkit.com/v1/models/standard/generate`.
POST JSON in the following format. Most fields are optional.
Parameter | Type | Description |
---|---|---|
prompt | Object, optional | Context for the generator to build off of. The response will return what the neural network thinks comes next. See the table below for the object's properties. |
length | Integer, 10 to 1000 | Maximum number of characters (Unicode code points) to generate. For technical reasons, slightly fewer will usually be generated. Billing is based on the actual number generated, or 100 (whichever is greater). You can generate arbitrary lengths by making multiple requests. |
startFromBeginning | Boolean, optional, default `false` | When set, your prompt will always be interpreted as starting at the beginning of a document (e.g. the title of a news article). If you use no prompt, this determines whether to start generating from the beginning or an arbitrary part of a document. |
streamResponse | Boolean, optional, default `false` | Return text in chunks as it's generated rather than all at once; see the streaming section below. Use this to minimize latency and create a smoother experience. |
forceNoEnd | Boolean, optional, default `false` | Prevents the generator from ending the text early. Normally it stops when it thinks it's found a good place to end the text. |
topP | Float, optional, 0 to 1.0, default 0.9 | Advanced setting. A probability threshold for discarding unlikely text in the sampling process. For example, 0.9 means that at each step only the most likely tokens with probabilities adding up to 90% will be sampled from. This is called nucleus sampling. Values closer to 0 will produce less variety and more repetition as the network only chooses the text it thinks is most probable. |
temperature | Float, optional, 0.001 to 100, default 1.0 | Advanced setting. Controls the randomness of sampling—the "creativity". Values greater than 1 will increase the chance of sampling unusual (low-probability) text. This will tend to make the text less sensible. Values between 0 and 1 will cause the network to prefer the text it thinks is most likely, even more than it normally would. This can cause it to become repetitive or overly conservative. You'll probably want to choose a value quite close to 1.0. |
The `prompt` object has the following properties:

Parameter | Type | Description |
---|---|---|
text | String, 1 to 3000 characters (Unicode code points) | Some text for the generator to build off of. The response will return what the AI thinks comes after this. The size of the prompt has no effect on billing. |
isContinuation | Boolean, optional, default `false` | Indicates whether the text you've provided is a substring taken from the end of a longer piece of text (to fit within the 3000-character limit). For technical reasons it helps the network to know this, but it's not critical. |
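As a sketch, a request to the endpoint above might be assembled like this. The `Authorization` header shape is an assumption (consult the authentication documentation for the actual scheme), and `buildGenerateRequest` is a hypothetical helper name, not part of the API:

```javascript
// Sketch: assemble a generate request for the endpoint above.
// NOTE: the bearer-token Authorization header is an assumption;
// check the authentication docs for the real scheme.
function buildGenerateRequest(apiKey, body) {
  return {
    url: "https://api.inferkit.com/v1/models/standard/generate",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

const req = buildGenerateRequest("YOUR_API_KEY", {
  prompt: { text: "Hello world!" },
  length: 200,
});
// Pass the pieces to fetch: fetch(req.url, req.options)
```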
A minimal example request body:

{
  "prompt": {
    "text": "Hello world!"
  },
  "length": 500
}
An example request body using every parameter:

{
  "prompt": {
    "text": "mmand below in the root directory of your project.",
    "isContinuation": true
  },
  "length": 500,
  "streamResponse": true,
  "forceNoEnd": true,
  "topP": 0.8,
  "temperature": 1.1
}
Each request generates a maximum of 1000 characters from a prompt no longer than 3000. But you can generate arbitrarily longer text by doing the following:

1. Make a request with `length` set to 1000. `isContinuation` should be `false`.
2. Append the generated text to your prompt text and make another request. While the combined text is still within the 3000-character limit, `isContinuation` should still be `false`.
3. Once the combined text exceeds 3000 characters, send only the last 3000 characters and set `isContinuation` to `true` to let the generator know that you've truncated the text.
4. Repeat until you have as much text as you need.

The response will contain either one JSON "chunk" (when `streamResponse` is not set) or multiple chunks in the same format (when it is). Each chunk has either a `data` or an `error` property. A `data` chunk looks like this:
Parameter | Type | Description |
---|---|---|
text | String | The generated text. |
isFinalChunk | Boolean | When using `streamResponse`, indicates if this is the final chunk of the response. Otherwise the value is always `true`. |
reachedEnd | Boolean. Always present when `streamResponse` is `false`; otherwise only present in the final chunk. | Indicates whether generation stopped because the generator thinks it found a good place to end the text. Always `false` when `forceNoEnd` is set. |
An example response when not using `streamResponse` (formatted for readability):

{
  "data": {
    "text": " run the command below in the root directory of your",
    "isFinalChunk": true,
    "reachedEnd": false
  }
}
If something goes wrong, an `error` chunk will be sent. It will have a `message` and a `code`. If you are using `streamResponse`, it is possible to receive an `error` chunk even after a status code of 200 and `data` chunks, if the error occurs after the response begins (e.g. if generation times out later on).
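A sketch of telling the two chunk shapes apart. The nesting of `message` and `code` under the `error` property is an assumption based on the description above, and `handleChunk` is a hypothetical name:

```javascript
// Return the generated text from a data chunk, or throw on an error chunk.
// Assumed error shape: { error: { message: "...", code: "..." } }
function handleChunk(chunk) {
  if (chunk.error) {
    throw new Error(`${chunk.error.code}: ${chunk.error.message}`);
  }
  return chunk.data.text;
}
```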
Since it can take several seconds to generate a long piece of text, it's nice to receive chunks as they're completed rather than all at once after a delay.
When you set `streamResponse` to `true`, the response will use chunked transfer encoding, sending each chunk as a line of JSON the moment it's done. Here's a short example of what that looks like:
{ "data": { "text": " run the command below", "isFinalChunk": false } }
{ "data": { "text": " in the root directory of your", "isFinalChunk": false } }
{ "data": { "text": "", "isFinalChunk": true, "reachedEnd": false } }
This is what enables TalkToTransformer.com to present text to the user almost as soon as they submit their request.
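As a sketch, a client can split the stream on newlines and parse each line as it arrives. The parsing is shown here on a fixed string (the example lines above) rather than a live response; `parseChunkLine` is a hypothetical name:

```javascript
// Each line of a streamed response is a standalone JSON chunk.
function parseChunkLine(line) {
  return JSON.parse(line);
}

// The three example lines from above, as one buffer.
const raw =
  '{ "data": { "text": " run the command below", "isFinalChunk": false } }\n' +
  '{ "data": { "text": " in the root directory of your", "isFinalChunk": false } }\n' +
  '{ "data": { "text": "", "isFinalChunk": true, "reachedEnd": false } }\n';

// Split on newlines, drop blanks, parse, and accumulate the generated text.
const chunks = raw.split("\n").filter((l) => l.trim() !== "").map(parseChunkLine);
const text = chunks.map((c) => c.data.text).join("");
```

With a real streamed response, the same per-line parsing would run as each line arrives, rather than on a complete buffer.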
Code points mostly correspond to characters (with some exceptions). In most cases, modern programming languages will iterate over strings by their code points and give their length in code points (e.g. Python 3).
In JavaScript the `length` property actually gives the wrong value for historical reasons and backwards compatibility. The `substring` and `substr` methods and indexing also don't work as desired. However, iterating over a string steps one code point at a time, so you can get around these issues by converting to an array:
'🤔'.length === 2 // Bad!
[...'🤔'].length === 1 // Correct number of code points
'🤔'[0] // renders � (an incomplete code point)
[...'🤔'][0] === '🤔' // First code point, as desired
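Putting the code-point handling together with the chaining procedure described earlier, here is a sketch. `lastCodePoints`, `buildPrompt`, and `generateLong` are hypothetical names, and `generate` stands in for a real API call (which would normally be asynchronous):

```javascript
// Keep only the last `max` code points of a string (not UTF-16 units).
function lastCodePoints(str, max) {
  return [...str].slice(-max).join("");
}

// Build the prompt object for a follow-up request, truncating to the
// API's 3000-code-point limit and flagging the truncation.
function buildPrompt(fullText, max = 3000) {
  return {
    text: lastCodePoints(fullText, max),
    isContinuation: [...fullText].length > max,
  };
}

// Repeatedly extend `text` until `target` new code points have been added.
// `generate` takes a request body and returns the generated text; a real
// implementation would await an HTTP request instead.
function generateLong(generate, initialPrompt, target) {
  let text = initialPrompt;
  const start = [...initialPrompt].length;
  while ([...text].length - start < target) {
    const chunk = generate({ prompt: buildPrompt(text), length: 1000 });
    if (chunk === "") break; // the generator produced nothing further
    text += chunk;
  }
  return text;
}
```

Counting in code points throughout (via the spread operator) keeps the truncation consistent with how the API measures the 3000-character limit.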