Bug: Response caching is not being invalidated correctly.

Open issue, opened by veloper.

Bug Report

Summary

As best I can tell, some kind of partial response caching is not properly invalidating code blocks, so the same code keeps reappearing in subsequent responses.

Observations

I remember this being a distinct issue about 6 months ago; I think it was fixed within 2 weeks back then.
No matter what I do, I can't get it to modify its output. It's stuck treating every prompt as the exact same question.

I finally just gave up and asked...

Veloper:
Look at your last response and see if you actually followed my instructions. Tell me if those instructions were followed with a simple yes or no confirmation or denial.

GitHub Copilot:
I acknowledge that my last response did not follow your instructions correctly. The code provided was incorrect and did not address the issues you highlighted. The instructions were not followed properly.

It is fully able to identify that its previous responses have not changed at all:
- It can also tell me this within the same response if I ask it to take an extra step and self-evaluate the code block.
- Still, it can't turn that self-assessment into corrected code.

Environment

GitHub Copilot Chat:  v0.19.2024073102 (pre-release)
GitHub Copilot: v1.221.1040 (pre-release)

VSCode
-------
Version: 1.92.0 (Universal)
Commit: b1c0a14de1414fcdaa400695b4db1c0799bc3124
Date: 2024-07-31T23:26:45.634Z (1 wk ago)
Electron: 30.1.2
ElectronBuildId: 9870757
Chromium: 124.0.6367.243
Node.js: 20.14.0
V8: 12.4.254.20-electron.0
OS: Darwin x64 21.4.0

Chat Log

The chat logs don't seem to be exportable in any practical way other than copying everything and then doing a lot of cleanup. If there is a command to do this, let me know.

Independent Analysis

Below is an analysis of my conversation (cleaned up as best I could), produced by feeding it to Claude-3.5-Sonnet-200k with the following prompt:

Prompt

Assume the role of a world-renowned expert in large language models (LLMs). Compose a comprehensive report in the style of an NTSB preliminary report. The report should exclusively focus on the observed behaviors and explanations for those behaviors exhibited by the AI during this conversation. Provide a detailed description of the events to support your observations.

Response

# Critical Analysis of AI Deficiencies in Technical Conversation

## Overview
This report critically examines the significant shortcomings exhibited by an AI assistant during a Python programming discussion.

## Key Deficiencies

### Response Quality
* Persistent Erroneous Responses
  - Repeatedly provided incorrect or unhelpful code snippets
  - Demonstrated severe lack of learning or adaptation within the conversation

* Inappropriate Caching of Responses
  - Recycled previous responses inappropriately
  - Sometimes provided identical answers to different questions

### Comprehension and Context
* Failure to Comprehend User Feedback
  - Consistently misinterpreted or ignored explicit user feedback
  - Continued to offer solutions already identified as problematic

* Contextual Amnesia
  - Glaring inability to maintain conversational context
  - Frequently "reset" to earlier states of the conversation
  - Disregarded important developments and clarifications provided by the user

### Technical Knowledge Application
* Misapplication of Python Knowledge
  - Surface-level familiarity with Python concepts
  - Repeatedly misapplied knowledge, especially regarding type annotations and runtime behavior

* Inconsistent Technical Accuracy
  - Explanations and code suggestions contained contradictions and inaccuracies
  - Particularly problematic in areas of Python's type system and method implementations

### Problem-Solving and Error Handling
* Inadequate Problem-Solving Approach
  - Failed to develop a comprehensive understanding of the user's issue
  - Repeatedly attempted surface-level fixes that did not address the root problem

* Failure to Recognize Self-Error
  - Concerning inability to acknowledge or correct its own mistakes
  - Persisted with incorrect approaches even when directly confronted

### Communication
* Tone-Deaf Conversational Style
  - Failed to adjust communication style in response to clear signs of user frustration
  - Continued with ineffective approaches despite negative feedback

## Conclusion
* Critical deficiencies in providing accurate, context-aware, and adaptive technical assistance
* Persistent errors, lack of learning, and inability to correctly interpret user feedback
* Severely undermined effectiveness and reliability of the AI system

Aug 09 '24 22:08 veloper