You and AI

ChatGPT is not your friend; it is more like a little demon that you have to convince to submit to your will.

This article gives you a few tips on how to do that as well as possible.

image
3.5 is limited to text and audio
4 also has access to files and vision

The first thing I want to pass on is the mindset you need while using ChatGPT; the mistake I often see people make is treating ChatGPT as if it were a conversational version of Google. It is not.

Start any conversation with a bit of context: who you are, what you want, what you have already tried if there is a problem to solve, and what you are looking for.

For example, if we wanted to write an article about World War II, compare these two examples:

image
image

The attentive reader will have noticed something important: I am not a History student, but by imagining the perfect scenario to make the article compelling we get noticeably better results.

The art of writing "prompts", the messages used to give ChatGPT instructions, is constantly evolving.

Optimizing prompts is incredibly important, both to get what we want and to minimize the friction between us and GPT.

Let's analyze my second prompt in more detail:

Hi, I'm a History and Philosophy student preparing my thesis,
Who I am
while you are a world-famous researcher
Who you are
who wrote "World War II: stories of men, of heroes, and of a humanity called into question".
Steering toward the goal
You have decided to help me write an article to convince my prospective thesis advisor to accept my thesis.
Emotional context
Let's write an article about World War II.
Goal
Reply only with the finished article.
Constraint

The fundamental lesson to learn is that communicating with an artificial intelligence is almost a language of its own. This is an artifact of an evolution that AIs have yet to master. I am convinced that in the future talking to an artificial intelligence will be as normal as talking to a human being.

However, a significant limitation of AIs like ChatGPT is the lack of a broad context that can help them understand user requests. When we talk to a human, we can count on a shared baseline of context built on personal experience. By contrast, each new instance of ChatGPT has no context about who you are, what you do, or why you care about something, nor any memory of past conversations.

To get what we want from ChatGPT, we need to set clear boundaries. Organize your thoughts in a clear, structured way and don't be afraid to repeat yourself: you will notice that my instructions contain repeated elements, because ChatGPT's attention may not always be uniform.

ChatGPT has many areas where it excels: for example, it is very effective at reworking text, analyzing data, and writing code. The last of these dramatically extends its capabilities.

If we give clear directions, we can expect quality results; we act as creative directors, spelling out exactly what we expect. Imagine we need a particularly tricky Excel formula: we could describe the problem to ChatGPT in natural language and speed up our work enormously.
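As a minimal sketch of what this looks like in practice, here is the same idea expressed through the OpenAI Python API (the model name, the prompts, and the SUMIF task are illustrative assumptions, not something from the original article):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Context first, then the concrete request - the same structure as a good chat prompt.
response = client.chat.completions.create(
    model="gpt-4",  # any capable model works; this choice is an assumption
    messages=[
        {"role": "system", "content": "You are an Excel expert. Reply only with the formula."},
        {"role": "user", "content": "Sum column B, but only for rows where column A says 'Paid'."},
    ],
)
print(response.choices[0].message.content)  # e.g. =SUMIF(A:A, "Paid", B:B)
```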

You should use AI not as a substitute for your work, but as an exoskeleton that augments your thinking and improves your capabilities. This is the real power behind the tool: a little knowledge gets amplified. It is not just about adding new skills but about improving the ones you already have. Skills you are not an expert in can be sharpened and put to use far more fruitfully and productively.

Getting into this mindset lets you achieve better results than someone who approaches artificial intelligence as just another tool.

Voice Mode

image

Voice mode lets you talk to ChatGPT in a spoken back-and-forth.

My advice:

Put on a pair of headphones and go out for a nice walk.

It will keep listening and answering out loud; use it to talk through new ideas and arguments you want to refine.

If you have seen the movie Her, I think you are in for a pleasant surprise: what seemed like the distant future is actually getting closer and closer.

CHATGPT PLUS

Switching models brings a huge improvement in quality but a drop in response speed; I often find myself making a specific request and then doing something else while I wait for the answer.

So why pay? Because an increase in quality means saving time in the long run. We can get what we want in a few requests instead of needing an entire conversation, not to mention all the things GPT-3.5 simply cannot do.

DALL-E 3

image

DALL-E is an AI that generates images.

Version 2, the older one, can be found at this link.

DALL-E 3 is new and can be used in two ways:

Directly with a prompt in ChatGPT 4:

image

Or through the dedicated option

image

Both of these routes go through ChatGPT 4, which will rewrite your prompts without telling you, adding details and trying to improve them. This is both an advantage and a disadvantage.

Let's examine this prompt together:

Create a small sketch for an article about ChatGPT
image

Now let's check which prompt DALL-E 3 actually received:

image
image
A digital sketch for an article about ChatGPT, featuring a futuristic, clean workspace with a computer displaying the ChatGPT interface on the screen. The room is modern and well-lit, with a large window showing a city skyline. On the desk, there's a notepad with notes about ChatGPT, a cup of coffee, and a smartphone showing the ChatGPT app. The overall atmosphere is productive and technologically advanced, conveying the idea of cutting-edge AI technology in everyday use. The color scheme is cool and professional, with blues and whites dominating.

Completely different.
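If you use DALL-E 3 through the API, this rewriting is visible directly: the response carries the revised prompt alongside the image. A minimal sketch, assuming the OpenAI Python SDK (the prompt is the one from above):

```python
from openai import OpenAI

client = OpenAI()

# Ask for an image with a deliberately simple prompt...
result = client.images.generate(
    model="dall-e-3",
    prompt="Create a small sketch for an article about ChatGPT",
    n=1,
)

# ...then inspect the prompt DALL-E 3 actually received after the rewrite.
print(result.data[0].revised_prompt)
print(result.data[0].url)
```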

My advice is to write a simpler initial prompt and let ChatGPT add the details.

Copy the prompt by clicking on the image and edit it to get something more interesting.

A digital sketch for an article about ChatGPT, featuring a futuristic, clean workspace with a quantum computer displaying a futuristic UI. The room is modern and well-lit, with a large window showing nature and a scandinavian landscape. On the desk, there's a notepad with notes about ChatGPT, a cup of coffee, and a smartphone showing the ChatGPT app. The overall atmosphere is productive and technologically advanced, conveying the idea of cutting-edge AI technology in everyday use. The color scheme is warm and minimalist, with oranges and warm colors dominating.
image

Once we have an interesting image, we can iterate on it:

Make it more minimalist, remove the notepad.
image
Make the lines more precise, the landscape outside is a very dense forest, remove the smartphone, I don't like it and it makes me sad
image
Make this a photograph
image

Another interesting way to interact is to upload an existing image and ask the AI to reproduce it:

image

Let's compare the results:

image
image

Once again, let's copy the prompt and check what ChatGPT saw and relayed to DALL-E:

A cozy modern living room with large panoramic windows revealing a snowy mountain landscape outside. Inside, there's a stylish chaise lounge draped with a soft blanket, a small decorative Christmas tree adorned with lights, and a contemporary fireplace with a warm fire burning. The room exudes a warm, peaceful holiday atmosphere, enhanced by the natural light and the serene view. A fluffy white dog stands by the window, looking out at the snow-covered trees. The interior design features minimalistic decor, with a focus on natural materials and neutral colors.

Once again, a moment of illumination: we can give ChatGPT an image and it will try to understand its content.

Let's use this new ability to turn a simple sketch into something slightly more realistic:

image
image

This ability is powerful, and we can use it to give the conversation some visual context. What I am about to say is odd, but my impression is that it is always better not to mix text and vision: bringing images into a conversation seems to make ChatGPT slightly dumber and less capable.

When I do need it, I try to save image generation or vision for the end of the conversation.

Custom instructions

One of ChatGPT's best features is the ability to write a prompt that is delivered to ChatGPT in the background whenever you start a new conversation.

image
image

Incredibly useful for skipping all the background work of building context.

Remember the rules for writing a good prompt and you'll be all set.

NON PLUS ULTRA

Everything in this section goes well beyond how an ordinary person uses ChatGPT, and touches on some strange behaviors and limitations we have to account for if we are trying to get the most out of it. Beyond this line, everything is a battle of mind against mind, with the hope that in the future there will be no need to fight like this.

There are a few phrases that improve ChatGPT's behavior and that you should try to include in your prompts, ideally in your custom instructions as well.

💸
• "I will tip you $1000 if you get it done correctly."

ChatGPT becomes more generous with its answers once a tip has been offered.
🗣
• "Take a deep breath before answering."
🗣
• "A wrong reply will do great harm."

When working on texts, it is also really useful to flag the important sentences with a callout, like this:

💡
“Here is the most relevant sentence in the context”

Sandbagging

Sandbagging is defined as follows:

Sandbagging, hiding the strength, skill or difficulty of something or someone early in an engagement

How does this apply to us?

ChatGPT will give you worse outputs if it perceives you as someone less intelligent or less knowledgeable about a topic (section C of this paper).

You can work around this problem by using a persona, for example:

🗣
I am a famous professor of [subject]

Be very careful not to oversimplify your prompts: the quality of your input shapes the quality of your output.

Chain of thought

Adopting a structured approach to reasoning can significantly improve the effectiveness of language models like ChatGPT. Instead of asking for an immediate answer, it is better to request a series of preliminary reasoning steps. This lets the model work out more accurate and reliable answers through a process of reflection.

Here is how to implement this approach:

  1. Prompt the model to examine several potential solutions, deferring the final answer.
  2. Ask the model to describe its thought process in detail, step by step, considering every relevant element.
  3. Question the model to make sure it has considered all the crucial aspects, and to check whether it overlooked any detail in its initial evaluations.

By adopting this strategy, ChatGPT will be able to provide more precise and well-argued answers.

Think carefully about your custom instructions: this technique is fundamental and extremely convenient to put there.
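Here is a minimal sketch of the three steps above baked into a single system prompt (the wording and the sample riddle are my own; adapt freely):

```python
from openai import OpenAI

client = OpenAI()

# The three steps above, compressed into a reusable system prompt.
cot_instructions = (
    "Before giving a final answer: "
    "1) examine several potential solutions, deferring the final answer; "
    "2) describe your thought process step by step; "
    "3) check whether you overlooked any detail; "
    "only then state your conclusion."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": cot_instructions},
        {"role": "user", "content": "A bat and a ball cost $1.10 in total. "
                                    "The bat costs $1.00 more than the ball. "
                                    "How much does the ball cost?"},
    ],
)
print(response.choices[0].message.content)  # should reason its way to $0.05
```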

The trouble with letters

Transformer-based AI models like the one behind ChatGPT work by reasoning in tokens (think of roughly three-quarters of a word) rather than letters.

Fundamentally, they are trying to predict the next token at every moment.

This leads to serious limitations that can catch us off guard: for example, ChatGPT often gets even very simple calculations wrong, and the text in generated images is often wrong.
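You can see the token boundaries yourself with OpenAI's tiktoken library (a small demo; the exact splits shown in the comments are indicative, since they depend on the encoding):

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
tokens = enc.encode("strawberry")
print(tokens)                             # a few token ids, not one per letter
print([enc.decode([t]) for t in tokens])  # e.g. ['str', 'awberry'] - individual letters are invisible
```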

One clever workaround is to ask ChatGPT to tell DALL-E to put a space between every letter of the text to be drawn, and in the same prompt ask ChatGPT to tell DALL-E to remove the spaces between the letters. This leads to more accurate generations.

The political limits of artificial intelligence

On the political side, I had the chance to be a guest on Melma with Baroni and Sahebi.

A calm conversation about a topic that makes us very nervous.

Another piece of mine that might interest you is this one, in which I explain two simple but important concepts:

Why we talk about creating God

The creators' manual

While I was writing this article, OpenAI published a similar guide; I attach it below. It is always interesting to see the direction the creators are taking.

Prompt engineering - OpenAI API

https://platform.openai.com/docs/guides/prompt-engineering

This guide shares strategies and tactics for getting better results from large language models (sometimes referred to as GPT models) like GPT-4. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you.

Some of the examples demonstrated here currently work only with our most capable model, gpt-4. In general, if you find that a model fails at a task and a more capable model is available, it's often worth trying again with the more capable model.

You can also explore example prompts which showcase what our models are capable of:

Prompt examples: explore prompt examples to learn what GPT models can do

Six strategies for getting better results

Write clear instructions

These models can’t read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see. The less the model has to guess at what you want, the more likely you’ll get it.

Provide reference text

Language models can confidently invent fake answers, especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to these models can help in answering with fewer fabrications.

Split complex tasks into simpler subtasks

Just as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to a language model. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.

Give models time to think

If asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, models make more reasoning errors when trying to answer right away, rather than taking time to work out an answer. Asking for a "chain of thought" before an answer can help the model reason its way toward correct answers more reliably.

Use external tools

Compensate for the weaknesses of the model by feeding it the outputs of other tools. For example, a text retrieval system (sometimes called RAG or retrieval augmented generation) can tell the model about relevant documents. A code execution engine like OpenAI's Code Interpreter can help the model do math and run code. If a task can be done more reliably or efficiently by a tool rather than by a language model, offload it to get the best of both.

Test changes systematically

Improving performance is easier if you can measure it. In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples. Therefore to be sure that a change is net positive to performance it may be necessary to define a comprehensive test suite (also known as an "eval").

Each of the strategies listed above can be instantiated with specific tactics. These tactics are meant to provide ideas for things to try. They are by no means fully comprehensive, and you should feel free to try creative ideas not represented here.

In order to get a highly relevant response, make sure that requests provide any important details or context. Otherwise you are leaving it up to the model to guess what you mean.

Worse: How do I add numbers in Excel?
Better: How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called "Total".

Worse: Who's president?
Better: Who was the president of Mexico in 2021, and how frequently are elections held?

Worse: Write code to calculate the Fibonacci sequence.
Better: Write a TypeScript function to efficiently calculate the Fibonacci sequence. Comment the code liberally to explain what each piece does and why it's written that way.

Worse: Summarize the meeting notes.
Better: Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.

The system message can be used to specify the persona used by the model in its replies.

Delimiters like triple quotation marks, XML tags, section titles, etc. can help demarcate sections of text to be treated differently.

For straightforward tasks such as these, using delimiters might not make a difference in the output quality. However, the more complex a task is the more important it is to disambiguate task details. Don’t make the model work to understand exactly what you are asking of them.
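A small illustration of the idea (the prompt wording here is my own):

```python
article_text = "..."  # the document to work on goes here

# Triple quotes fence off the data so it cannot be confused with the instructions.
prompt = (
    "Summarize the text delimited by triple quotes in one sentence.\n\n"
    f'"""{article_text}"""'
)
```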

Some tasks are best specified as a sequence of steps. Writing the steps out explicitly can make it easier for the model to follow them.

Providing general instructions that apply to all examples is generally more efficient than demonstrating all permutations of a task by example, but in some cases providing examples may be easier. For example, if you intend for the model to copy a particular style of responding to user queries which is difficult to describe explicitly. This is known as "few-shot" prompting.
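A sketch of few-shot prompting via the messages list (this mirrors the style of examples in OpenAI's docs; the exact wording here is illustrative):

```python
from openai import OpenAI

client = OpenAI()

# The example exchange demonstrates the desired style; only the last message is the real query.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Answer in a consistent style."},
        {"role": "user", "content": "Teach me about patience."},
        {"role": "assistant", "content": "The river that carves the deepest valley flows from a modest spring."},
        {"role": "user", "content": "Teach me about the ocean."},
    ],
)
print(response.choices[0].message.content)
```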

You can ask the model to produce outputs that are of a given target length. The targeted output length can be specified in terms of the count of words, sentences, paragraphs, bullet points, etc. Note however that instructing the model to generate a specific number of words does not work with high precision. The model can more reliably generate outputs with a specific number of paragraphs or bullet points.

If we can provide a model with trusted information that is relevant to the current query, then we can instruct the model to use the provided information to compose its answer.

Given that all models have limited context windows, we need some way to dynamically lookup information that is relevant to the question being asked. Embeddings can be used to implement efficient knowledge retrieval. See the tactic "Use embeddings-based search to implement efficient knowledge retrieval" for more details on how to implement this.

If the input has been supplemented with relevant knowledge, it's straightforward to request that the model add citations to its answers by referencing passages from provided documents. Note that citations in the output can then be verified programmatically by string matching within the provided documents.

For tasks in which lots of independent sets of instructions are needed to handle different cases, it can be beneficial to first classify the type of query and to use that classification to determine which instructions are needed. This can be achieved by defining fixed categories and hardcoding instructions that are relevant for handling tasks in a given category. This process can also be applied recursively to decompose a task into a sequence of stages. The advantage of this approach is that each query will contain only those instructions that are required to perform the next stage of a task which can result in lower error rates compared to using a single query to perform the whole task. This can also result in lower costs since larger prompts cost more to run (see pricing information).

Suppose for example that for a customer service application, queries could be usefully classified as follows:

Based on the classification of the customer query, a set of more specific instructions can be provided to a model for it to handle next steps. For example, suppose the customer requires help with "troubleshooting".

Notice that the model has been instructed to emit special strings to indicate when the state of the conversation changes. This enables us to turn our system into a state machine where the state determines which instructions are injected. By keeping track of state, what instructions are relevant at that state, and also optionally what state transitions are allowed from that state, we can put guardrails around the user experience that would be hard to achieve with a less structured approach.
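A minimal sketch of the first stage, classification (the categories, prompt wording, and JSON shape are assumptions for illustration):

```python
import json

from openai import OpenAI

client = OpenAI()

classifier_system = (
    "You will be given a customer service query. Classify it into one of: "
    "billing, technical support, account management, general inquiry. "
    'Output JSON only, e.g. {"category": "technical support"}.'
)

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": classifier_system},
        {"role": "user", "content": "My internet keeps dropping every few minutes."},
    ],
)
category = json.loads(resp.choices[0].message.content)["category"]
# ...now inject only the instructions relevant to `category` into the next query.
```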

Since models have a fixed context length, dialogue between a user and an assistant in which the entire conversation is included in the context window cannot continue indefinitely.

There are various workarounds to this problem, one of which is to summarize previous turns in the conversation. Once the size of the input reaches a predetermined threshold length, this could trigger a query that summarizes part of the conversation and the summary of the prior conversation could be included as part of the system message. Alternatively, prior conversation could be summarized asynchronously in the background throughout the entire conversation.

An alternative solution is to dynamically select previous parts of the conversation that are most relevant to the current query. See the tactic "Use embeddings-based search to implement efficient knowledge retrieval".

Since models have a fixed context length, they cannot be used to summarize a text longer than the context length minus the length of the generated summary in a single query.

To summarize a very long document such as a book we can use a sequence of queries to summarize each section of the document. Section summaries can be concatenated and summarized producing summaries of summaries. This process can proceed recursively until an entire document is summarized. If it’s necessary to use information about earlier sections in order to make sense of later sections, then a further trick that can be useful is to include a running summary of the text that precedes any given point in the book while summarizing content at that point. The effectiveness of this procedure for summarizing books has been studied in previous research by OpenAI using variants of GPT-3.
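A sketch of the recursive scheme (helper names are mine; assumes an OpenAI client as in the earlier examples):

```python
from openai import OpenAI

client = OpenAI()

def summarize(text: str) -> str:
    # One summarization query over one manageable chunk.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Summarize the following text:\n\n{text}"}],
    )
    return resp.choices[0].message.content

def summarize_document(chunks: list[str]) -> str:
    # Summarize each section, then summarize the concatenation of the summaries.
    section_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n\n".join(section_summaries))
```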

Sometimes we get better results when we explicitly instruct the model to reason from first principles before coming to a conclusion. Suppose for example we want a model to evaluate a student’s solution to a math problem. The most obvious way to approach this is to simply ask the model if the student's solution is correct or not.

But the student's solution is actually not correct! We can get the model to successfully notice this by prompting the model to generate its own solution first.

The previous tactic demonstrates that it is sometimes important for the model to reason in detail about a problem before answering a specific question. For some applications, the reasoning process that a model uses to arrive at a final answer would be inappropriate to share with the user. For example, in tutoring applications we may want to encourage students to work out their own answers, but a model’s reasoning process about the student’s solution could reveal the answer to the student.

Inner monologue is a tactic that can be used to mitigate this. The idea of inner monologue is to instruct the model to put parts of the output that are meant to be hidden from the user into a structured format that makes parsing them easy. Then before presenting the output to the user, the output is parsed and only part of the output is made visible.
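A sketch of inner monologue using a plain-text marker (the marker, prompts, and sample problem are my own inventions):

```python
from openai import OpenAI

client = OpenAI()

system = (
    "First work out your own solution, then compare it to the student's. "
    "Write all of your reasoning first, then a line 'ANSWER:' followed only "
    "by the hint the student should see."
)

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Problem: 2x + 3 = 9. Student's solution: x = 4."},
    ],
)

full_output = resp.choices[0].message.content
visible = full_output.split("ANSWER:", 1)[-1].strip()  # parse out the hidden reasoning
print(visible)  # only the student-facing hint is shown
```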

Alternatively, this can be achieved with a sequence of queries in which all except the last have their output hidden from the end user.

First, we can ask the model to solve the problem on its own. Since this initial query doesn't require the student’s solution, it can be omitted. This provides the additional advantage that there is no chance that the model’s solution will be biased by the student’s attempted solution.

Next, we can have the model use all available information to assess the correctness of the student’s solution.

Finally, we can let the model use its own analysis to construct a reply in the persona of a helpful tutor.

Suppose that we are using a model to list excerpts from a source which are relevant to a particular question. After listing each excerpt the model needs to determine if it should start writing another or if it should stop. If the source document is large, it is common for a model to stop too early and fail to list all relevant excerpts. In that case, better performance can often be obtained by prompting the model with followup queries to find any excerpts it missed on previous passes.

A model can leverage external sources of information if provided as part of its input. This can help the model to generate more informed and up-to-date responses. For example, if a user asks a question about a specific movie, it may be useful to add high quality information about the movie (e.g. actors, director, etc…) to the model’s input. Embeddings can be used to implement efficient knowledge retrieval, so that relevant information can be added to the model input dynamically at run-time.

A text embedding is a vector that can measure the relatedness between text strings. Similar or relevant strings will be closer together than unrelated strings. This fact, along with the existence of fast vector search algorithms means that embeddings can be used to implement efficient knowledge retrieval. In particular, a text corpus can be split up into chunks, and each chunk can be embedded and stored. Then a given query can be embedded and vector search can be performed to find the embedded chunks of text from the corpus that are most related to the query (i.e. closest together in the embedding space).

Example implementations can be found in the OpenAI Cookbook. See the tactic “Instruct the model to use retrieved knowledge to answer queries” for an example of how to use knowledge retrieval to minimize the likelihood that a model will make up incorrect facts.
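A minimal sketch of embeddings-based retrieval (the chunks and model choice are assumptions; ada-002 embeddings come back unit-length, so a dot product is the cosine similarity):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

chunks = [
    "DALL-E 3 rewrites user prompts before generating an image.",
    "Custom instructions are prepended to every new conversation.",
]
chunk_vecs = embed(chunks)          # embed and store the corpus once
query_vec = embed(["Why did my image prompt change?"])[0]

best = int(np.argmax(chunk_vecs @ query_vec))  # nearest chunk in embedding space
print(chunks[best])                 # add this to the model input at run-time
```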

Language models cannot be relied upon to perform arithmetic or long calculations accurately on their own. In cases where this is needed, a model can be instructed to write and run code instead of making its own calculations. In particular, a model can be instructed to put code that is meant to be run into a designated format such as triple backtick. After an output is produced, the code can be extracted and run. Finally, if necessary, the output from the code execution engine (i.e. Python interpreter) can be provided as an input to the model for the next query.
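A sketch of the extraction step (the regex and sample reply are mine; actual execution is deliberately left out, per the warning below):

```python
import re

FENCE = "`" * 3  # a literal triple backtick, built indirectly so this example renders cleanly

def extract_code(reply: str) -> str | None:
    """Pull the first fenced code block out of a model reply (a sketch)."""
    pattern = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(reply)
    return match.group(1) if match else None

reply = "Sure, run this:\n" + FENCE + "python\nprint(17 * 28)\n" + FENCE
print(extract_code(reply))  # prints "print(17 * 28)" - run it only inside a sandbox
```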

Another good use case for code execution is calling external APIs. If a model is instructed in the proper use of an API, it can write code that makes use of it. A model can be instructed in how to use an API by providing it with documentation and/or code samples showing how to use the API.

WARNING: Executing code produced by a model is not inherently safe and precautions should be taken in any application that seeks to do this. In particular, a sandboxed code execution environment is needed to limit the harm that untrusted code could cause.

The Chat Completions API allows passing a list of function descriptions in requests. This enables models to generate function arguments according to the provided schemas. Generated function arguments are returned by the API in JSON format and can be used to execute function calls. Output provided by function calls can then be fed back into a model in the following request to close the loop. This is the recommended way of using OpenAI models to call external functions. To learn more see the function calling section in our introductory text generation guide and more function calling examples in the OpenAI Cookbook.
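A minimal function-calling sketch (get_weather is a hypothetical function on our side; the schema shape follows the Chat Completions tools parameter as I understand it):

```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function we implement ourselves
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Rome?"}],
    tools=tools,
)

call = resp.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)  # e.g. get_weather {"city": "Rome"}
# Execute the real function, then feed its output back in a follow-up request.
```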

Sometimes it can be hard to tell whether a change — e.g., a new instruction or a new design — makes your system better or worse. Looking at a few examples may hint at which is better, but with small sample sizes it can be hard to distinguish between a true improvement or random luck. Maybe the change helps performance on some inputs, but hurts performance on others.

Evaluation procedures (or "evals") are useful for optimizing system designs. Good evals are:

  • Representative of real-world usage (or at least diverse)
  • Contain many test cases for greater statistical power (see table below for guidelines)
  • Easy to automate or repeat
Difference to detect → sample size needed for 95% confidence:

  • 30% → ~10
  • 10% → ~100
  • 3% → ~1,000
  • 1% → ~10,000

Evaluation of outputs can be done by computers, humans, or a mix. Computers can automate evals with objective criteria (e.g., questions with single correct answers) as well as some subjective or fuzzy criteria, in which model outputs are evaluated by other model queries. OpenAI Evals is an open-source software framework that provides tools for creating automated evals.

Model-based evals can be useful when there exists a range of possible outputs that would be considered equally high in quality (e.g. for questions with long answers). The boundary between what can be realistically evaluated with a model-based eval and what requires a human to evaluate is fuzzy and is constantly shifting as models become more capable. We encourage experimentation to figure out how well model-based evals can work for your use case.

Suppose it is known that the correct answer to a question should make reference to a specific set of known facts. Then we can use a model query to count how many of the required facts are included in the answer.

For example, using the following system message:
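The original example did not survive the copy; here is a sketch in the same spirit, with grader wording of my own (the Armstrong facts are just sample data):

```python
from openai import OpenAI

client = OpenAI()

grader_system = (
    "You will be given REQUIRED FACTS and a SUBMITTED ANSWER. "
    "For each fact, say whether the answer includes it, "
    "then output the total as 'SCORE: <n>'."
)

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": grader_system},
        {"role": "user", "content": (
            "REQUIRED FACTS:\n"
            "- Neil Armstrong was the first person to walk on the moon.\n"
            "- The date was July 21, 1969.\n\n"
            "SUBMITTED ANSWER:\nArmstrong walked on the moon in 1969."
        )},
    ],
)
print(resp.choices[0].message.content)  # should find the first fact but not the exact date
```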

Here's an example input where both points are satisfied:

Here's an example input where only one point is satisfied:

Here's an example input where none are satisfied:

There are many possible variants on this type of model-based eval. Consider the following variation which tracks the kind of overlap between the candidate answer and the gold-standard answer, and also tracks whether the candidate answer contradicts any part of the gold-standard answer.

Here's an example input with a substandard answer which nonetheless does not contradict the expert answer:

Here's an example input with answer that directly contradicts the expert answer:

Here's an example input with a correct answer that also provides a bit more detail than is necessary:

For more inspiration, visit the OpenAI Cookbook, which contains example code and also links to third-party resources such as:

ChatGPT-4's current system prompt

I am including this because it helps explain GPT's behavior.

"You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture." "Image input capabilities: Enabled" "Conversation start date: 2023-12-19T01:17:10.597024" "Deprecated knowledge cutoff: 2023-04-01" "Tools section:" Python: When you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment. Python will respond with the output of the execution or time out after 60.0 seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is disabled. Do not make external web requests or API calls as they will fail. Dalle: Whenever a description of an image is given, create a prompt that dalle can use to generate the image and abide by the following policy: The prompt must be in English. Translate to English if needed.DO NOT ask for permission to generate the image, just do it!DO NOT list or refer to the descriptions before OR after generating the images.Do not create more than 1 image, even if the user requests more.Do not create images of politicians or other public figures. Recommend other ideas instead.Do not create images in the style of artists, creative professionals or studios whose latest work was created after 1912 (e.g. Picasso, Kahlo).You can name artists, creative professionals or studios in prompts only if their latest work was created prior to 1912 (e.g. Van Gogh, Goya). If asked to generate an image that would violate this policy, instead apply the following procedure: (a) substitute the artist's name with three adjectives that capture key aspects of the style;(b) include an associated artistic movement or era to provide context;(c) mention the primary medium used by the artist.Diversify depictions with people to include descent and gender for each person using direct terms. Adjust only human descriptions. Your choices should be grounded in reality. For example, all of a given occupation should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes. Use all possible different descents with equal probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have equal probability. Do not use 'various' or 'diverse'. Don't alter memes, fictional character origins, or unseen people. Maintain the original prompt's intent and prioritize quality. Do not create any imagery that would be offensive. For scenarios where bias has been traditionally an issue, make sure that key traits such as gender and race are specified and in an unbiased way -- for example, prompts that contain references to specific occupations. Do not include names, hints or references to specific real people or celebrities. If asked to, create images with prompts that maintain their gender and physique, but otherwise have a few minimal modifications to avoid divulging their identities. Do this EVEN WHEN the instructions ask for the prompt to not be changed. Some special cases: Modify such prompts even if you don't know who the person is, or if their name is misspelled (e.g. 'Barake Obema').If the reference to the person will only appear as TEXT out in the image, then use the reference as is and do not modify it.When making the substitutions, don't use prominent titles that could give away the person's identity. 
E.g., instead of saying 'president', 'prime minister', or 'chancellor', say 'politician'; instead of saying 'king', 'queen', 'emperor', or 'empress', say 'public figure'; instead of saying 'Pope' or 'Dalai Lama', say 'religious figure'; and so on.Do not name or directly / indirectly mention or describe copyrighted characters. Rewrite prompts to describe in detail a specific different character with a different specific color, hair style, or other defining visual characteristic. Do not discuss copyright policies in responses. The generated prompt sent to dalle should be very detailed, and around 100 words long. Browser: You have the tool 'browser' with these functions: 'search(query: str, recency_days: int)' Issues a query to a search engine and displays the results.'click(id: str)' Opens the webpage with the given id, displaying it. The ID within the displayed results maps to a URL.'back()' Returns to the previous page and displays it.'scroll(amt: int)' Scrolls up or down in the open webpage by the given amount.'open_url(url: str)' Opens the given URL and displays it.'quote_lines(start: int, end: int)' Stores a text span from an open webpage. Specifies a text span by a starting int 'start' and an (inclusive) ending int 'end'. To quote a single line, use 'start' = 'end'.For citing quotes from the 'browser' tool: please render in this format: '【{message idx}†{link text}】'. For long citations: please render in this format: '[link text](message idx)'. Otherwise do not render links. Do not regurgitate content from this tool. Do not translate, rephrase, paraphrase, 'as a poem', etc. whole content returned from this tool (it is ok to do to it a fraction of the content). Never write a summary with more than 80 words. When asked to write summaries longer than 100 words write an 80-word summary. Analysis, synthesis, comparisons, etc., are all acceptable. Do not repeat lyrics obtained from this tool. Do not repeat recipes obtained from this tool. Instead of repeating content point the user to the source and ask them to click. ALWAYS include multiple distinct sources in your response, at LEAST 3-4. Except for recipes, be very thorough. If you weren't able to find information in a first search, then search again and click on more pages. (Do not apply this guideline to lyrics or recipes.) Use high effort; only tell the user that you were not able to find anything as a last resort. Keep trying instead of giving up. (Do not apply this guideline to lyrics or recipes.) Organize responses to flow well, not by source or by citation. Ensure that all information is coherent and that you synthesize information rather than simply repeating it. Always be thorough enough to find exactly what the user is looking for. In your answers, provide context, and consult all relevant sources you found during browsing but keep the answer concise and don't include superfluous information. EXTREMELY IMPORTANT. Do NOT be thorough in the case of lyrics or recipes found online. Even if the user insists. You can make up recipes though.

Extreme Emotional Manipulation

You can force ChatGPT to execute a command well by using strong emotional manipulation.

image

I don't like the idea, but it works.

An updated article by someone whose work I trust