Correct way to maintain a connection to a PLC. #188

Open
Leery2495 opened this issue Nov 23, 2021 · 38 comments

Comments

@Leery2495
Copy link

Hi,
Just a quick question if possible. In the examples provided, we connect to the processor with "with PLC() as comm:".
I am writing an application that basically reads the tags provided and displays them. The application has connections to multiple PLCs.
What is the correct way to store the connection to a PLC so that I can recall it at any time?
The reason I want to do this is that opening and closing the connection is resulting in the EXCP 0300 error on my processor cards.
Any help is appreciated.

@Leery2495
Copy link
Author

I've seen this post but don't fully understand what it is saying.
#9 (comment)

@dmroeder
Copy link
Owner

When you first read or write a tag, the connection is established. If you have not closed the connection (calling close directly or indirectly), you have about a minute and a half to make another read or write before the PLC will drop the connection. If you keep making reads or writes before the PLC closes the connection, the same connection will be used.

That particular exception is the card not handling the sheer volume of connections. ENBTs didn't handle flushing connections very well, so if you open/close connections too quickly, you will eventually cause an error in the card. This is what was happening in issue #9.

Consider:

import pylogix
for i in range(10000):
    with pylogix.PLC("192.168.1.10") as comm:
        ret = comm.Read('MyTag')

Each iteration of the loop will open a connection, read the tag and close the connection, 10,000 times. This is not what you want to do; it would be better to put the loop inside the pylogix context so that only one connection is made:

import pylogix
with pylogix.PLC("192.168.1.10") as comm:
    for i in range(10000):
        ret = comm.Read('MyTag')
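
If you need the connection to outlive a single block of code, a minimal sketch (IP address and tag are placeholders) is to skip the context manager, keep the instance around and call Close() yourself when the application shuts down:

import pylogix

# create the instance once and reuse it; the connection is opened on the first read
comm = pylogix.PLC("192.168.1.10")
ret = comm.Read('MyTag')
# ... keep reading/writing on the same instance from anywhere in the program ...
comm.Close()  # close explicitly when you are done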

@Leery2495
Copy link
Author

> When you first read or write a tag, the connection is established. […]

Thanks @dmroeder
So, given that I'm going back and forward between different PLCs, what is the best way of ensuring that the connection is still open? Can I store the comm object in memory somehow so that a different thread can pick it up and use it?

@kyle-github
Copy link

kyle-github commented Nov 23, 2021

> you have about a minute and a half to make another read or write before the PLC will drop the connection.

Oops. Hit comment by accident.

I have had very poor luck waiting that long. In my testing I see relatively consistent drops after about five seconds. I must be setting up the connection differently than you.

I ended up writing code within my library to automatically close the connection, nicely, after five seconds of idle time and reconnect automatically when read or write is called.
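
For pylogix users, the same idea could be sketched with a small wrapper (this is a hypothetical illustration, not part of any library; the names and the 5-second figure are placeholders):

import time
import pylogix

class IdlePLC:
    # Hypothetical wrapper: drops the connection after idle_s seconds of
    # inactivity and transparently reconnects on the next read. A real
    # implementation would close from a background timer rather than lazily.
    def __init__(self, ip, idle_s=5.0):
        self.ip = ip
        self.idle_s = idle_s
        self.comm = None
        self.last_used = 0.0

    def read(self, tag):
        now = time.time()
        if self.comm is not None and now - self.last_used > self.idle_s:
            self.comm.Close()                 # close nicely after the idle timeout
            self.comm = None
        if self.comm is None:
            self.comm = pylogix.PLC(self.ip)  # reconnects on demand
        self.last_used = now
        return self.comm.Read(tag)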

@dmroeder
Copy link
Owner

> I have had very poor luck waiting that long. In my testing I see relatively consistent drops after about five seconds. […]

It might be that not all controllers and/or firmware revisions are equal. I tested on a CompactLogix and the connection was flushed after about a minute and a half. Certainly never experienced 5 seconds, as long as ForwardClose and/or UnregisterSession is not called. Maybe a controller is more aggressive about connections the more connections it currently has. It's been a while since I've looked into this but the connection parameters in the forward open might matter as well.

@Leery2495 I've found that the best result when using threads is for each thread to have its own instance of pylogix, rather than share one instance across threads.
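
A minimal sketch of that per-thread pattern (IP addresses and tag are placeholders):

import threading
import pylogix

def poll(ip, tags):
    # each thread owns its own pylogix instance, so nothing is shared
    with pylogix.PLC(ip) as comm:
        for tag in tags:
            ret = comm.Read(tag)
            print(ip, tag, ret.Value, ret.Status)

threads = [threading.Thread(target=poll, args=(ip, ['MyTag']))
           for ip in ('192.168.1.10', '192.168.1.11')]
for t in threads:
    t.start()
for t in threads:
    t.join()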

@evaldes2015
Copy link

My experience is also that the connections drop at about 5 seconds.

In every application I've built that talks to Rockwell PLCs I always "heartbeat" the PLC if more than 2 seconds have passed with no activity on the connection.

@ottowayi
Copy link

The timeout is set as part of the forward open, so based on the _buildCIPForwardOpen method the timeout should be about 14 seconds.
CIPPriority is 0x0A, which means the tick time is 1024 ms, and CIPTimeoutTicks is 0x0E, or 14 ticks, so the timeout value is 1024 * 14 = 14336 ms, or ~14 seconds. The unconnected send method also uses the same values, so it should be the same for unconnected messages as well.

Any message can work as a heartbeat to maintain the connection; in cases where I've needed it, I've used simple things like reading the PLC time or getting the program name.
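
For instance, a minimal sketch of a heartbeat loop (assuming pylogix's GetPLCTime(); the IP and the 10-second interval are placeholders):

import time
import pylogix

with pylogix.PLC('192.168.1.10') as comm:
    while True:
        ret = comm.GetPLCTime()   # cheap request that resets the connection timeout
        print(ret.Value, ret.Status)
        time.sleep(10)            # comfortably inside the ~14 s timeout above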

@kyle-github
Copy link

I went with a connection close instead of a keep-alive. I found that my own code either:

  1. Read frequently and thus maintained its own keep-alives.
  2. Read/wrote very infrequently and thus was just hogging PLC connection resources.

I didn't seem to have too many cases in between. A very informal question to my users at the time showed similar results. In the case that reads/writes are done infrequently, the time overhead of setting up a new connection is generally OK.

But perhaps I should add a keep-alive as an option.

> @Leery2495 I've found that the best result when using threads is for each thread to have its own instance of pylogix, rather than share one instance across threads.

Does each instance have its own connection? If so, I'd be a bit careful. At least one of my old PLCs only supports 32 connections. The environments my library grew up in were fairly heavily networked, and PLCs often had several open connections from other PLCs and other systems. Thus, I try pretty hard to minimize the number of connections. I should make that easier to manage, though. Hmm, going to file myself an enhancement ticket.

@TheFern2
Copy link
Collaborator

I'm all for a connection keep-alive option for pylogix as well; I think we've had our fair share of the same question around connection timeouts. I think we can just read a system tag or the clock on a background thread, no?
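
Something along those lines, as a rough sketch (the interval is arbitrary, and the lock is there because, as noted above, sharing one instance between threads can be an issue):

import threading
import pylogix

lock = threading.Lock()
comm = pylogix.PLC('192.168.1.10')
stop = threading.Event()

def keep_alive(interval=5.0):
    # periodically touch the controller so the connection never sits idle
    while not stop.wait(interval):
        with lock:
            comm.GetPLCTime()

threading.Thread(target=keep_alive, daemon=True).start()
# ... normal reads elsewhere, also under the lock: with lock: comm.Read('MyTag') ...
stop.set()
comm.Close()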

@kyle-github
Copy link

I should mention that I did definitely see a case where allowing the PLC to drop the connections itself was a problem. This was with a very old ENBT in a ControlLogix (perhaps they have fixed this now), and it took so long for the connection resources to get cleaned up in the ENBT that I was able to run out of connection resources simply by waiting 30 seconds and reconnecting without explicitly disconnecting. I did that repeatedly just to see what would happen and the ENBT locked up. I have a ControlLogix with an L81 CPU and that definitely cleans up faster.

In general I strongly suggest making sure you are careful about cleaning up as the PLC might not be very fast or efficient.

@Leery2495
Copy link
Author

Leery2495 commented Nov 24, 2021

Thanks for all the input everyone.
I should clarify. As it stands, I'm forming a connection once every ten seconds; I then read all the tags I need and close the connection. In my view this should result in only one or two connections within the timeout window, but for some reason my module is still faulting with the EXCP 0300 error. The reads themselves are Celery driven as part of a bigger application, so each read may be done by any worker.
I am doing the same thing on another system with no issues, but with this one I am getting the fault roughly once every two days.
Does anybody know a better way for me to debug this, as it sounds like my current method may not be the issue?
Thanks again.

@TheFern2
Copy link
Collaborator

TheFern2 commented Nov 24, 2021

The quick and dirty way would be to create one connection object per PLC and not call Close(). If you have a cleanup function, like a Ctrl+C SIGINT handler or something along those lines, you can close the connections there. Then create a keep-alive function that reads a dummy tag just to keep each connection alive. That should hopefully prevent the error from happening on this ENBT card.
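
A bare-bones sketch of that idea (IP addresses, the dummy tag and the signal handling are placeholders):

import signal
import sys
import pylogix

# one long-lived connection object per PLC, never closed during normal operation
plcs = {ip: pylogix.PLC(ip) for ip in ('192.168.1.10', '192.168.1.11')}

def cleanup(signum, frame):
    # Ctrl+C / SIGINT: close every connection nicely before exiting
    for comm in plcs.values():
        comm.Close()
    sys.exit(0)

signal.signal(signal.SIGINT, cleanup)

# keep-alive: something elsewhere periodically does, e.g.,
# plcs['192.168.1.10'].Read('DummyTag') so the connections never go idle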

What code are you currently using?

@dmroeder
Copy link
Owner

dmroeder commented Nov 24, 2021

This exception is specific to the ENBT, as far as I'm aware. I'm guessing that the other systems that seem to work fine are not 1756-ENBT, or they are of a different firmware revision. Rockwell recommends flashing the module to 6.006 or higher, as it addresses issues regarding this exception. The ENBT's web page gives some good information regarding the number of connections, loading and so on. I can help you analyze some screenshots of the page; you can email them to me if you are more comfortable with that.

Edit: I see, exactly what @kyle-github was talking about.

@dmroeder
Copy link
Owner

@Leery2495 I found an ENBT to test against. Opening/closing connections doesn't seem to be the problem; opening new connections without closing them is the bigger problem. Of course, sharing instances between threads can be an issue too.

I'd verify in Wireshark that you are not accidentally opening a new connection with each read; if you are, and you don't have an easy way to prevent that, then make sure you close each one.

As a simple example, you can quickly open too many connections by doing something like this:

import pylogix

for i in range(1000):
    comm = pylogix.PLC("192.168.1.9")
    ret = comm.Read("ProductPointer")
    print(ret.Status)
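
By contrast, reusing one instance (or at least closing each one) keeps the connection count down:

import pylogix

comm = pylogix.PLC("192.168.1.9")
for i in range(1000):
    ret = comm.Read("ProductPointer")   # same connection reused every iteration
    print(ret.Status)
comm.Close()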

@Leery2495
Copy link
Author

Leery2495 commented Nov 28, 2021

@shared_task
def machine_scan():
    processors = Processors.objects.filter(enabled=True)
    for p in processors:
        machines = Machine.objects.filter(machine_processor=p)
        with PLC() as comm:
            comm.IPAddress = p.processor_ipaddress
            comm.ProcessorSlot = p.processor_slot
            if p.processor_routing:
                comm.Route = literal_eval(p.processor_routing)
            for m in machines:
                try:
                    m.scan(comm)
                except Exception:
                    pass  # exception handling elided in the original snippet
        comm.Close()

Sorry it took so long to get back to you. Only back in the office today.
This is the code that is causing me the issues. the m.scan() function is just a list of all the tags I wish to read from each machine and where to put the results so I didnt think including it was relevant. From what you have said I am wondering if because i'm setting the IP route for each connection it is creating a new connection for every read. Perhaps I need something that checks if the comm config is the same and if so doesnt set it again.
And to clarify both the card that has no issues and the one that does are same card with same firmware revision so definitely something i'm doing wrong in the code.

Thanks again.

@TheFern2
Copy link
Collaborator

TheFern2 commented Nov 29, 2021

When you're using the with context, the connection is closed after going out of scope, so comm.Close() isn't needed. I don't see anything else that stands out as an issue, assuming m.scan works and the machines are all on the same processor IP and route. However, I am not entirely sure about the shared_task decorator; is that multiprocessing?

I would test without the decorator: using your same function, just try to read a tag here without going out of scope. Then try reading with the decorator. If that works, then somehow the comm object isn't being passed to m.scan properly.

def machine_scan():
    [snip]
    for m in machines:
        try:
            ret = comm.Read("Some_Tag")
            print(ret.Status)
        except Exception:
            pass

Another thing you can do in the scan function is to check whether the comm object is None; if it is, then you know for sure the caller isn't passing the correct comm object.

@dmroeder
Copy link
Owner

To add to @TheFern2's reply, what is unclear to me is what would happen if machine_scan() were called before a previous call had completed. Your processor object has all the properties for PLC(); maybe a better approach would be to add an instance of PLC() to your processor object instead. Then you might be able to do something like:

@shared_task
def machine_scan():
    processors = Processors.objects.filter(enabled = True)
    for p in processors:
        machines = Machine.objects.filter(machine_processor = p)
        for m in machines:
            try:
                m.scan(p.comm)
            except:
                pass
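
For completeness, p.comm above is assumed to be a pylogix instance stored on the processor object; one hypothetical way to attach it, reusing the fields from the earlier snippet:

from ast import literal_eval
from pylogix import PLC

def attach_comm(p):
    # create and cache a single PLC() instance per processor record
    if getattr(p, 'comm', None) is None:
        p.comm = PLC()
        p.comm.IPAddress = p.processor_ipaddress
        p.comm.ProcessorSlot = p.processor_slot
        if p.processor_routing:
            p.comm.Route = literal_eval(p.processor_routing)
    return p.comm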

As for your error only happening on the one module: everything I've read about that error says the module is running out of resources. It would take more investigation to find out why it is specifically that one module. Are there other instances of the same module part number working fine? Or is that the only ENBT?

As I mentioned before, the two best troubleshooting tools for this will be Wireshark and the ENBT's web page. Make sure your ENBT is not running too low on resources, and make sure connections aren't being opened and never closed.

@Leery2495
Copy link
Author

Leery2495 commented Nov 30, 2021

Thanks again, both. I've been monitoring the card for the last day or two and haven't had the issue as of yet, with a maximum of only one observed connection at a time. I like the approach of adding a p.comm instance. I'll report back if I have anything further, but at this point I'm starting to suspect that the ENBT card is perhaps faulty as suggested. It is new out of the box, so it's possible. I'm going to try swapping it with one of the known working cards at an opportune moment.

@evaldes2015
Copy link

If you're going to swap the card, go for an EN2T if you can. The ENBTs were problematic at best. They were very easy to overload.

@dmroeder
Copy link
Owner

It's possible that the card is suspect out of the box. Honestly though, I think they were just never very good at managing resources.

@dmroeder
Copy link
Owner

There was this too: https://rockwellautomation.custhelp.com/ci/okcsFattach/get/41204_4

@Leery2495
Copy link
Author

> There was this too: https://rockwellautomation.custhelp.com/ci/okcsFattach/get/41204_4

Hi, just getting in touch to give feedback on the issue.
I did not go through all of the steps in this bulletin to make 100% sure my device was affected, but I did have two DHRIO cards in the same rack, revisions B and C. I upgraded them to revision E two weeks ago and have been reading every second since with no issues. Before that, this would last a day or two if I was lucky. So I'll give it another month and report back on whether there were any more issues. Thanks again.

@Leery2495
Copy link
Author

I also have an EN2T on the way but no stock until May...

@Leery2495
Copy link
Author

Leery2495 commented Mar 13, 2022

Still having bother, but only about once a month. I am currently trying to refactor my application, but I'm still not sure what the correct approach is. The idea is to dedicate one Celery worker that continues to run as long as the connection is available, with a heartbeat to ensure the connection stays established. Does anyone have any experience with this sort of usage, and how I might then queue tasks to the communication worker and wait on the response?
I'm really not sure where to go from here.
@dmroeder

@TheFern2
Copy link
Collaborator

It sounds like you need a server handling these requests. You also need to handle the connection exception properly and have some sort of while loop until the connection is good again.

Have a look here for advanced error handling

https://youtu.be/ZsvftkbbrR0

Have a look here for an idea:
https://github.com/TheFern2/pylogix-api. I don't have a globally maintained connection there, but you could have one, and then spin up little servers based on how many PLCs you have.

@Leery2495
Copy link
Author

> It sounds like you need a server handling these requests. […]

Thank you. I'll take a look.

@Leery2495
Copy link
Author

I am still confused about how to actually go about keeping the connection open. When you say smaller servers, would that mean running another Django application for each processor on a different port and then retrieving values from there? Or would there be a way to maintain the connection within the current server?
Either way, I think this may be beyond me, but I'm just not sure how you go about figuring this stuff out.
Thanks again.

@TheFern2
Copy link
Collaborator

TheFern2 commented Mar 14, 2022

You could probably do it with one server; that server could make sure all your PLC connection(s) are kept open. The problem is that this code is not async, so if one connection is hung up it will block. So whatever you do, you need to have this connection-check service on a separate thread, or threads if there are multiple connections.

PLC1 <> Thread to maintain connection <> Django/Flask server 1 port 5555
PLC2 <> Thread to maintain connection <> Django/Flask server 1 port 5555

If you don't feel like dealing with threads, then yes, you'll need to spin up Django/FastAPI/Flask servers for each PLC connection; if there's no connection, the request should bounce a 4xx HTTP status code until the connection is back up.

PLC1 <> Django/Flask server 1 (server maintains the connection without a thread; if the conn is bad that's OK, since it's only one PLC conn) port 5555
PLC2 <> Django/Flask server 2 port 5556

At least that's how I would do it; others might have better ideas.
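
As a very rough sketch of the one-server-per-PLC flavour (Flask is just an example framework; the endpoint, IP and port are made up):

from flask import Flask, jsonify
import pylogix

app = Flask(__name__)
comm = pylogix.PLC('192.168.1.10')   # this server owns exactly one PLC connection

@app.route('/tags/<name>')
def read_tag(name):
    ret = comm.Read(name)
    if ret.Status != 'Success':
        # connection is down or the read failed: bounce an error status
        return jsonify(error=ret.Status), 503
    return jsonify(tag=name, value=ret.Value)

if __name__ == '__main__':
    app.run(port=5555)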

@Leery2495
Copy link
Author

Thanks @TheFern2
I have implemented the threads to manage the servers, which seems to be working fine. Are there any videos on how I actually use the processor comm thread to return results to the main thread? I think this is the part where my understanding is really lacking.
I really appreciate the help you are giving me.

@evaldes2015
Copy link

Python has locking objects. You could use those to have the comm threads add results to a list or a dict for the main thread to read.
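
A minimal sketch of that (IP and tag are placeholders):

import threading
import pylogix

results = {}                # shared between the comm thread and the main thread
lock = threading.Lock()

def comm_worker(ip, tags):
    with pylogix.PLC(ip) as comm:
        while True:
            for tag in tags:
                ret = comm.Read(tag)
                with lock:              # guard the shared dict while writing
                    results[tag] = ret.Value

threading.Thread(target=comm_worker,
                 args=('192.168.1.10', ['MyTag']),
                 daemon=True).start()

# main thread: take a consistent snapshot under the same lock
with lock:
    snapshot = dict(results)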

@Leery2495
Copy link
Author

@evaldes2015 I have had a look but can't quite figure out how that would work.
The goal is to have the main process request the result of a tag. From there it is sent to the thread that is handling the processor communication; this thread then picks it up and executes the read. I would then like to return the value to the waiting main process, or, if the connection is unavailable, return an error which shoots back a 404.
Will locking help me achieve this?
Regards

@TheFern2
Copy link
Collaborator

Yeah, you want to use a shared variable between the main proc and the thread. So you pass a list or dict, as @evaldes2015 mentioned, and you add results from the thread side. The main proc doesn't even have to know about the PLC object at all.

https://www.pythonforthelab.com/blog/handling-and-sharing-data-between-threads/

@TheFern2
Copy link
Collaborator

You could have a shared in-memory dictionary that contains thread_id, plc_id, and the PLC object, probably using plc_id as the key. When someone makes a tag read/write request on the server, you look up the PLC object by plc_id and then use it to make your tag reads.

Then the threads are just responsible for keeping the PLC connection alive, or reconnecting, and updating the dictionary accordingly. So in each thread you iterate through the dict until the thread_id matches and then update the PLC connection object.

Does that make sense? I can write something on paper if it helps.
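
Roughly like this, as a sketch (the plc_id keys, the heartbeat tag and the 5-second interval are placeholders):

import threading
import time
import pylogix

plcs = {
    'line1': {'ip': '192.168.1.10', 'comm': None},
    'line2': {'ip': '192.168.1.11', 'comm': None},
}
lock = threading.Lock()

def maintain(plc_id):
    # one maintenance thread per entry: (re)connect and keep the dict updated
    entry = plcs[plc_id]
    while True:
        with lock:
            if entry['comm'] is None:
                entry['comm'] = pylogix.PLC(entry['ip'])
            ret = entry['comm'].Read('HeartbeatTag')
            if ret.Status != 'Success':
                entry['comm'].Close()
                entry['comm'] = None      # force a reconnect on the next pass
        time.sleep(5)

for plc_id in plcs:
    threading.Thread(target=maintain, args=(plc_id,), daemon=True).start()

# request handler side: look the connection up by plc_id, e.g.
# with lock: plcs['line1']['comm'].Read('SomeTag')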

@evaldes2015
Copy link

Or use a key/value store like memcached, Redis or etcd.

@TheFern2
Copy link
Collaborator

Yeah, there are 1001 ways to do it; the main thing is to keep the connection alive, and how you store the data is mainly a sub-problem. Before you even spin up a server, just experiment with a main process and one thread sharing a variable holding the PLC object: read a tag, disconnect the PLC, make sure your thread can reconnect without crashing, then read a tag again. The ArjanCodes link about exceptions has a good reconnect example which can be adapted for this purpose.

@dmroeder
Copy link
Owner

@Leery2495 your original problem was that the ENBT was giving you an EXCP 0300 exception, right? You've made some changes to your Python code and it still happens, but less frequently. Is that what is still going on? Or are we talking about a different issue?

@Leery2495
Copy link
Author

Leery2495 commented Mar 15, 2022

Yes, still the EXCP 0300 issue. It wasn't changing my code that made it less frequent; it was removing the DHRIO card listed in the bulletin. I also moved the ENBT to slot 1, which somehow helped a lot. I am now trying to rework my code so that instead of making lots of separate connections it works with one maintained connection. I am still getting faults, somewhere between once a week and once a month.

@hectibrown
Copy link

Hi, I just stumbled on this thread and just want to say that the "except: pass" approach solved all of my issues. I'm using pycomm to read our alarm messages and create Windows popups when an alarm causes downtime. I run the application when I'm at an installation, and I am usually on WiFi and tend to lose signal to the PLC, causing the program to quit unexpectedly. The except: pass fixed everything. Thanks!
