Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PGN 129029 position error #3902

Closed
Hakansv opened this issue May 9, 2024 · 30 comments
Closed

PGN 129029 position error #3902

Hakansv opened this issue May 9, 2024 · 30 comments
Labels

Comments

@Hakansv
Copy link
Contributor

Hakansv commented May 9, 2024

Describe the bug
The PGN 129029 does not work properly all the time.
This is a user report, Timo, so I can't reproduce it my self:
"I get this effect if there is no rapid location PGN 129025. I can produce this error by feeding with Simulator position"
4:vessel latitude = 60° 20.055996899781974' N
5:vessel longitude = 22° 05.658677820260749' E

If I then run to direction 61 deg position will be fixed in position about
4:vessel latitude = 60° 20.081490651050301' N
5:vessel longitude = 22° 05.752147627347526' E

So there is some error on position parsing.
bild

Desktop (please complete the following information if applicable):

  • OS: Windows
  • Version 5.9.0. ?

Additional context

@Hakansv Hakansv added the bug label May 9, 2024
@TwoCanPlugIn
Copy link
Contributor

I would ask for a candump log.

My money is a bug in their simulator software.

@ttlappalainen
Copy link
Contributor

Do you mean my Simulator sw? I have cross tested this with Actisense NGT-1 + Actisense Reader, device build with my NMEA2000 library and Garmin GMI20 MFD. All other except OpenCPN shows longitude right.

One problem is that result depends of other data on the bus. If I feed only 129029, error is stable at given position. If I feed also some long fast packet e.g., 126998, longitude value start to jump between 2 values.

I expect there some overlapping data, while reading it to vector.

@bdbcat
Copy link
Member

bdbcat commented May 18, 2024

Timo...
Sorry, I cannot reproduce. Testing using your simulator, OCPN running on both Win and linux using YDEN-02 interface. Setting static position given above (first), result decoded and reported by OCPN is steady with PGN 129029 isolated. Adding 129025 makes no difference.
Any other ideas to reproduce?

@ttlappalainen
Copy link
Contributor

Strange, I can do it at any time. Do you test with OpenCPN Windows or Linux?

Fails also on official version 5.8.4.

@TwoCanPlugIn
Copy link
Contributor

I can't duplicate using my own socket-can code.

Sending PGN 129029 at 1Hz without PGN 129025, starting at 60.3342666149963662 N, 22.094311297004345816 E, on a course of 61 degrees, and a speed of 100knots (I'm impatient), I can't replicate the error.

Even if I also send PGN 127250 (Heading) , 128258 (Speed & Heading), 128267 (Depth), 130306 (Wind) at 10Hz and a multi-frame message such as 126996 (Product Information) at 1Hz, I still can't reproduce the problem.

What are the exact replication scenarios ? Is this isolated to the Dashboard plugin ?

Happy to try and assist.

@ttlappalainen
Copy link
Contributor

No, position jumps and so also chart goes to crazy place. And appears on Windows - I do not know how it is on other systems. I have it at least on two W10 computers. Navigation computer is with 5.8.4 and develop computer has 5.9.xxx

You do not need to run. It seem that if I just take static:
vessel latitude = 60° 20.055996899781974' N
vessel longitude = 22° 05.658677820260749' E

, it fails. There should not be any other position information feeded at same time. 127250 overrides 129029.

Could we have remote session so I coud try to generate it for you - for windows? I have Anydesk license, which allows me infinite connction.

@bdbcat
Copy link
Member

bdbcat commented May 18, 2024

I have marginal internet connection right now, but happy to try.
I will email my AD address.

@ttlappalainen
Copy link
Contributor

Never mind I found the error and try to fix it. Luckily I have not uses N2k input yet on navigation. Error influences to all N2k messages - also to 129025.

@TwoCanPlugIn
Copy link
Contributor

There should not be any other position information feeded at same time. 127250 overrides 129029.

I'm sorry I don't understand. PGN 127250 is Heading and PGN 129029 is GNSS Position. Unless a user has a fancy new Furuno SCX-20 Satellite Compass, these PGN's are generated independently by separate devices, a compass sensor and a GNSS sensor.
Their values are quite different.

@ttlappalainen
Copy link
Contributor

129025 is lat/lon rapid. But error is also for heading and for all other PGNs.

@TwoCanPlugIn
Copy link
Contributor

Good that you found the problem.

Out of interest, not that I use either, but is the Is the bug in your simulator software or in your library ?

@ttlappalainen
Copy link
Contributor

Bug is in OpenCPN and influencies to all N2k messages.

@Hakansv
Copy link
Contributor Author

Hakansv commented May 18, 2024

I'm using N2k here and see no trouble.
I think all PGN discussed are used?
bild

@ttlappalainen
Copy link
Contributor

If you use NGT-1 or format it uses you have been just lucky.

@ttlappalainen
Copy link
Contributor

Like Airbus Max passangers

@Hakansv
Copy link
Contributor Author

Hakansv commented May 18, 2024

Lucky for using OCPN or what? :)
The picture shown is on Win using YD. I see the same result on Rpi's using Canbus by MCS or a CANable device.
All positions are received correct. The source is a Simrad GS-25

@TwoCanPlugIn
Copy link
Contributor

Now I'm doubly confused.

I presume you meant Boeing 737 Max, which is possibly in poor taste.

But I digress. Just for the hell of it I cobbled together a simulator using the Actisense RAW format. Similar to my observations with Socket CAN, I could not replicate the problem and this supports Hakan's observations.

However, always being the curious sod that I am, I donned my "hackers" hat and fed invalid data. I think the picture tells a thousand words. Not that "proper" marine equipment would ever do this, but it brings into question data validation (or lack thereof)

weird

I will also add that if lat & long are set to "Data Unavailable", eg for PGN 129025, 0xFFFFFFFF or for PGN 129029 0xFFFFFFFFFFFFFFFF, and Fix Method, Fix Quality and Fix Integrity set to 0 (GPS, No GNSS Fix, No Integrity Checking respectively) satellite icon still remains green, and my position is given as 0.0N, 0.0E. I'm not sure what O should do when position is unavailable.

@ttlappalainen
Copy link
Contributor

ttlappalainen commented May 19, 2024

Sorry I mixed Airbus and Boeing, but not this bug. Why it is difficult to believe that it exists? I have done the fix, byt not yet published. Now mine works properly, but I'll do more tests first.

@ttlappalainen
Copy link
Contributor

ttlappalainen commented May 19, 2024

I did PR now. The problem was that 10 03 (DLE ETX end of data transfer) in middle of message was not handled properly and caused message stop there. Then partial message were anyway feeded to the parser, which does not check integrity and just used random data in memory.

Original code had:

        if (ESCAPE == next_byte) {
          data.push_back(next_byte);
          bGotESC = false;
        }
      }

      if (bGotESC && (ENDOFTEXT == next_byte)) {
     ...
      } else {
        bGotESC = (next_byte == ESCAPE);

        if (!bGotESC) {
          data.push_back(next_byte);
        }
      }

So if e.g., PGN 129025 had data 34 6B 10 03 64 C7 12 0D (5° 08.446008' N, 21° 56.007000' E) reading were stopped to first 10 03. Data was in format as 34 6B 10 10 03 64 C7 12 0D. So:

  • On first 10 bGotESC=true
  • For next 10 it was pushed to buffer. At this point handling should have been stopped.
  • Since it did not stop if (bGotESC && (ENDOFTEXT == next_byte)) { was false and bGotESC = (next_byte == ESCAPE); were handled for already handled byte.
  • Then for 03 bGotESC==true and if (bGotESC && (ENDOFTEXT == next_byte)) { was true and data was handled as finished.

So flaw was for any message having 10 03 pair in the middle of message. It happend to be on my longitude with 129029 message. But also for latitude above and naturally for longitude around 5° 08 E. Also heading and COG/SOG messages were in danger.

This error was for anybody using any actisense format as input.

I added also network NGT message handling, since some devices uses that.

@ttlappalainen
Copy link
Contributor

About missing data NA provided should be handled somehow. I have definition N2kIsNA(val), which will return true, if NA has been returned. For double values there is definition N2kDoubleNA=-1e9; for historically reason - the original environment where I used and developed library did not handle NaN properly. Also library published version is missing N2kIsOR (out of range) definition.

One way to handle missing data would be to define N2kDoubleNA=NaN; and test inline bool N2kIsNA(double v) { return v==N2kDoubleNA; } as inline bool N2kIsNA(double v) { return isNaN(v); } Then at least NA data would be handled.

@bdbcat
Copy link
Member

bdbcat commented May 19, 2024

Timo...
Thanks for digging in.
I'm satisfied that the error is caused by a specific data sequence. This will occur randomly, and so be very difficult to catch in real time. But like most data-dependent logic errors, it will definitely occur sometime for someone, somewhere.
A complicating factor: In real life, not using simulator, the next 129029 message will most likely not contain the offending data sequence. Shows the value of static simulator testing in this case.

I'll merge the PR now, and close this issue.

The data NA validation problem is a separate question, probably worth a new filed issue.
Thanks

@bdbcat
Copy link
Member

bdbcat commented May 19, 2024

Closing

@bdbcat bdbcat closed this as completed May 19, 2024
@TwoCanPlugIn
Copy link
Contributor

If you use NGT-1 or format it uses you have been just lucky.

That made me think the problem was elsewhere and not in the NGT-1 code. Silly me, but I doubt I would have ever read through the code.

I personally think this may be closed prematurely. While Timo's fix corrects the "escaped" DLE (0x10) character, what concerns me is that verification of the data length was failing. The suspect packet should never have been processed because byte 1 (payload length) - 11 did not equal byte 12 (data length).

Also I'll happily stand corrected but does the ESCAPE character (0x1B) also have to be escaped ? I believe that 0x1B is used to "escape" the BEMSTART (0x01) and BEMEND (0x1A), characters, although these latter two may only be present in Actisense EBL files so perhaps it may not be appropriate in this scenario.

@bdbcat
Copy link
Member

bdbcat commented May 19, 2024

Lets take the "verification of the data length was failing" to a new issue, for a fresh look.

@ttlappalainen
Copy link
Contributor

Escape here does not mean ESC. It should be actually called DLE. It is old Data Link Protocol, what Actisense happen to use for paketizing. It is simple DLE STX DLE ETX packet and if DLE is inside data, it will be doubled. I also used originally name ESCAPE, which comes from Data Link Escape. So there is no reason to double anything else than DLE. See e.g., https://users.encs.concordia.ca/~dongyu/ELEC6851/lec09.pdf

Payload length contains parsed payload length. It means that if there is 8 DLE, payload length is 8 but input length is 16 all doubled. But when that will be parsed to data, then data length is again 8.

I could not test ProcessActisense_N2K using format d0 and either ProcessActisense_RAW using format 95, but both uses same Data Link Protocol and so getting data out uses same method. There could have been single function to parse out data from Data Link Protocol and feed result to protocol handlers.

@TwoCanPlugIn
Copy link
Contributor

The DLE stuff brings back memories from over twenty years ago sending and receiving stuff to my old Gamin handheld GPS.

Re Lengths, As you correctly stated, the packet is parsed to "unescape" any values that have the value of the DLE character (0x10).

Packet length is the length of the parsed (aka "unescaped") Actisense packet excluding the command, packet length and checksum bytes. This is value I said was in byte 1. (Byte 0 being the Actisense Command byte).

The data length in Byte 12 is the size of the NMEA 2000 data.

As there are 11 other values in addition to the data: Priority(1), PGN (3), Destination (1), Source (1), Timestamp (4) and Data Length (1), if the packet length - 11 != data length, something is wrong.

So there is no reason to double anything else than DLE

I believe that if you were to parse Actisense EBL files, then the BEMSTART and BEMEND characters are "escaped" using the ESC character (0x1B). Therefore I think that if a data value has the same value as the ESC character, it must also be "escaped" in a similar way to how the DLE character is escaped. But as OpenCPN does not consume EBL files, perhaps a non-issue. Although it is a great way to compare OpenCPN with the Actisense EBL reader and verify that the parsing is correct.

@ttlappalainen
Copy link
Contributor

Did you open issue for position error?

I have to fix that 129025 lat/lon rapid data 0xFFFFFFFFFFFFFFFF translates to position
latitude 0° 00.000006' S
longitude 0° 00.000006' W
And seem to work for me on OpenCPN.

You probably meant position 0xFFFFFFF7FFFFFFF7, which seem to gray out the boat as it should.

Also tested e.g., Water temperature, which is handled on dashboard, and it seem to be properly handled: if (!N2kIsNA(WaterTemperature))...

@bdbcat
Copy link
Member

bdbcat commented May 20, 2024

No, I did not open a new issue.
Please do, if convenient. This helps our focus.

@ttlappalainen
Copy link
Contributor

As I mentioned I did not found error on behaviour and do not yet see reason for issue.

Both 0xFFFFFFFFFFFFFFFF and 0xFFFFFFF7FFFFFFF7 (NA,NA) translates properly. Also if I provide only lat and lon=NA or opposite, boat shows gray.

@bdbcat
Copy link
Member

bdbcat commented May 20, 2024

OK, maybe TwoCanPlugin (Steve) has more info.
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants