Losing connection cloud <-> edge after a few hours --- RPC Error: NO_ACTIVE_CONNECTION #8

Open
AdrienAdB opened this issue May 18, 2022 · 13 comments

@AdrienAdB
Contributor

Describe the bug

The self-hosted TB Cloud seems to lose its connection with Edge after some time (hours).
The issue applies only to RPC calls made from the cloud server. Error from the TB Cloud audit log: "RPC Error: NO_ACTIVE_CONNECTION"
Unassigning and reassigning the Edge device seems to restart the connection; RPC calls work after that.

Your Server Environment

  • own setup
    • cloud
    • ThingsBoard Version 3.3.4.1 Community
    • TB Edge 3.3.4.1 Community
    • Ubuntu server

To Reproduce
Steps to reproduce the behavior:

  1. Make an RPC call from the cloud. It works.
  2. Wait a few hours. In my case the problem appears overnight and the call fails the next morning.
  3. Try the same RPC call from the TB Cloud dashboard (audit log: RPC Error: NO_ACTIVE_CONNECTION). I can confirm the same RPC call works fine from the TB Edge dashboard.
  4. Unassign the Edge device.
  5. Assign the Edge device.
  6. The RPC call succeeds.
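
Steps 4 and 5 could in principle be scripted instead of clicked through in the UI. The sketch below is only an illustration: it assumes the cloud exposes `DELETE` and `POST` on `/api/edge/{edgeId}/device/{deviceId}` for unassigning and assigning a device, and that a tenant JWT is passed in the `X-Authorization` header. Check both against the Swagger UI of your ThingsBoard version. The helper names, host, and IDs are hypothetical.

```python
import urllib.request


def edge_device_requests(base_url, edge_id, device_id, jwt_token):
    """Build the unassign (DELETE) and assign (POST) requests, in that order.

    The endpoint path is an assumption about the ThingsBoard REST API;
    verify it for your version before relying on it.
    """
    url = f"{base_url.rstrip('/')}/api/edge/{edge_id}/device/{device_id}"
    headers = {"X-Authorization": f"Bearer {jwt_token}"}
    unassign = urllib.request.Request(url, headers=headers, method="DELETE")
    assign = urllib.request.Request(url, headers=headers, method="POST")
    return unassign, assign


def reassign_device(base_url, edge_id, device_id, jwt_token, timeout=10):
    """Unassign the device from the edge, then assign it again."""
    for request in edge_device_requests(base_url, edge_id, device_id, jwt_token):
        # urlopen raises urllib.error.HTTPError on a non-2xx response.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()
```

This only automates the workaround described in the steps; it does not address why the connection is reported as inactive.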

Screenshots

Screenshot 2022-05-18 at 10 56 17

@AdrienAdB AdrienAdB added the bug Something isn't working label May 18, 2022
@volodymyr-babak
Collaborator

Hi @AdrienAdB,
this is most probably related to the active status of the device on the cloud.
It is a known bug that is targeted to be fixed in the next release.
Because the device is connected to the edge, its status on the cloud is not updated properly, so the RPC call fails.
I'm going to review it in detail later and get back to you with the exact reason.

@AdrienAdB
Contributor Author

Thanks @volodymyr-babak.

@volodymyr-babak
Collaborator

Hello @AdrienAdB
I'm trying to reproduce this problem and fix it.
Could you please provide your device protocol?
Are you using MQTT?
Do you send some data overnight from edge to cloud?

@AdrienAdB
Contributor Author

AdrienAdB commented Jun 12, 2022

Hello,

  • HTTP from device to edge.
  • I use the default configuration from edge to cloud (not sure about the protocol). My configuration is below.
  • No data is sent overnight.
  • TB-Edge is behind NAT. It has access to the internet (TB-Cloud), but there is no incoming port forwarding to its instance.
  • TB-Cloud has ports 1883, 7070, 80, and 443 open.
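
Since the edge sits behind NAT and opens the connection to the cloud itself, one basic thing to rule out is whether the cloud's RPC port is reachable from the edge host. A minimal sketch of such a check follows. It assumes port 7070 is the cloud RPC port, as in the configuration below; the host name is a placeholder. A successful TCP connect only shows the port is reachable, not that the edge session is considered active.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example (hypothetical host name), run from the edge machine:
# print(can_reach("tb-cloud.example.com", 7070))
```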

Overall the setup works really well; TB-Edge makes remote access very fast. The only issue is this overnight disconnection.

I will try to send extra data overnight, something every 10 min, and see if the problem persists. That could be an easy workaround until the issue is resolved.

# /etc/tb-edge/conf/tb-edge.conf 
#
# Copyright © 2016-2022 The Thingsboard Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

export JAVA_OPTS="$JAVA_OPTS -Dplatform=deb -Dinstall.data_dir=/usr/share/tb-edge/data"
export JAVA_OPTS="$JAVA_OPTS -Xlog:gc*,heap*,age*,safepoint=debug:file=/var/log/tb-edge/gc.log:time,uptime,level,tags:filecount=10,filesize=10M"
export JAVA_OPTS="$JAVA_OPTS -XX:+IgnoreUnrecognizedVMOptions -XX:+HeapDumpOnOutOfMemoryError"
export JAVA_OPTS="$JAVA_OPTS -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+PerfDisableSharedMem -XX:+UseCondCardMark"
export JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+UseStringDeduplication -XX:+ParallelRefProcEnabled -XX:MaxTenuringThreshold=10"
export LOG_FILENAME=tb-edge.out
export LOADER_PATH=/usr/share/tb-edge/conf,/usr/share/tb-edge/extensions
export SQL_DATA_FOLDER=/usr/share/tb-edge/data/sql

# UNCOMMENT NEXT LINES AND PUT YOUR CLOUD CONNECTION SETTINGS:
export CLOUD_ROUTING_KEY=xxxxxx-xxxxxxxx
export CLOUD_ROUTING_SECRET=xxxxxx

# UNCOMMENT NEXT LINES IF EDGE CONNECTS TO CE 'DEMO.THINGSBOARD.IO' SERVER:
export CLOUD_RPC_HOST=xxxxxx

# UNCOMMENT NEXT LINES IF YOU CHANGED DEFAULT CLOUD RPC HOST/PORT SETTINGS:
# export CLOUD_RPC_HOST=xxxxxx
# export CLOUD_RPC_PORT=7070

# UNCOMMENT NEXT LINES IF YOU ARE RUNNING EDGE ON THE SAME MACHINE WHERE THINGSBOARD SERVER IS RUNNING:
# export HTTP_BIND_PORT=18080
# export MQTT_BIND_PORT=11883
# export COAP_BIND_PORT=15683

# UNCOMMENT NEXT LINES IF YOU HAVE CHANGED DEFAULT POSTGRESQL DATASOURCE SETTINGS:
# export SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/tb_edge
export SPRING_DATASOURCE_USERNAME=postgres
export SPRING_DATASOURCE_PASSWORD=xxxxxx

@AdrienAdB
Contributor Author

The device is now sending a "keepAlive" attribute every minute.
I'll let you know tomorrow...
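
For readers who want to try the same thing, here is a rough sketch of what such a keep-alive sender might look like. It is not the code used in this thread. It assumes the device talks to the edge over the ThingsBoard HTTP device API (`POST /api/v1/{accessToken}/attributes`); the base URL, access token, and function names are placeholders.

```python
import json
import time
import urllib.request


def build_keepalive_request(edge_base_url, access_token, now_ms):
    """Build a POST that publishes a client-side "keepAlive" attribute."""
    url = f"{edge_base_url.rstrip('/')}/api/v1/{access_token}/attributes"
    body = json.dumps({"keepAlive": now_ms}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_keepalive_forever(edge_base_url, access_token, interval_s=60):
    """Send the attribute once per interval; log failures and keep going."""
    while True:
        request = build_keepalive_request(
            edge_base_url, access_token, int(time.time() * 1000)
        )
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                response.read()
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print(f"keepAlive failed: {exc}")
        time.sleep(interval_s)


# Example (hypothetical values):
# send_keepalive_forever("http://localhost:8080", "DEVICE_ACCESS_TOKEN")
```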

@AdrienAdB
Contributor Author

Hi, sending the "keepAlive" attribute every minute did not fix the issue.

@volodymyr-babak
Collaborator

Hello @AdrienAdB

Thanks for the updates.
A pull request that should fix this issue has been created:
https://github.com/thingsboard/thingsboard-pe/pull/897

It should be available in the next release.
I'm going to validate this use case separately and let you know the results before the release.
Note that the fix will only work if the device sends "keepAlive" events to keep the session active on the cloud.

volodymyr-babak pushed a commit that referenced this issue Nov 9, 2022
…s_fetch_to_msg_data

added ALLOW_UNQUOTED_FIELD_NAMES_MAPPER to JacksonUtil
@truongvanhuy2000

Do you know of a fix for this?

@volodymyr-babak
Collaborator

@truongvanhuy2000

Please provide additional details on your issue:

  1. Are you seeing the 'RPC Error: NO_ACTIVE_CONNECTION' error in the logs?
  2. Do you actively send data from the edge to the cloud, or are there pauses in sending data?

Any additional data you can share will help us reproduce and fix the issue.

@AndreMaz

AndreMaz commented May 4, 2023

> Hi @AdrienAdB, this is most probably related to the active status of the device on the cloud. It is a known bug that is targeted to be fixed in the next release. Because the device is connected to the edge, its status on the cloud is not updated properly, so the RPC call fails. I'm going to review it in detail later and get back to you with the exact reason.

Hi @volodymyr-babak I'm on TB Edge 3.4.4 and just got exactly the same problem. I'm using MQTT between the TB Edge and devices. Here's what I get in Audit Logs of the device
image

After the "unassign -> assign" the problem is gone.

@volodymyr-babak
Collaborator

@AndreMaz

Can you kindly check if there's an RPC_CALL event logged immediately before or after you notice the NO_ACTIVE_CONNECTION error? This will help us ascertain whether the RPC_CALL request is being sent to the edge, or if it's not leaving the cloud.

image

Also, it would be insightful to determine if there are any RPC_CALL cloud events being logged on the edge. If the RPC_CALL is being sent from the cloud but is not being received at the edge, it may signify network issues or problems with the edge's ability to process the RPC_CALL.

Your observations on these points will be very valuable for us to pinpoint the issue and help you further.

I look forward to your response. If you have any additional questions or need further clarification, please don't hesitate to ask.

@akseerali

Hi, after the latest ThingsBoard update, RPC call requests made from the cloud show NO_ACTIVE_CONNECTION. The same RPC request can still be sent from the Edge.

Kindly share any hints, thanks.

image

@akseerali

akseerali commented May 29, 2023

> Hi, after the latest ThingsBoard update, RPC call requests made from the cloud show NO_ACTIVE_CONNECTION. The same RPC request can still be sent from the Edge.
>
> Kindly share any hints, thanks.
>
> image

The problem is resolved. Please refer to this issue for more details.
